
Generators

By Marcelo Fernandes, Aug 27, 2017

The Python iter() function

Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).
The iter built-in function will:

  • Check whether the object implements __iter__, and call that to obtain an iterator;
  • If __iter__ is not implemented but __getitem__ is, create an iterator that attempts to fetch items in order, starting from index 0 (zero); __getitem__ should therefore accept integer keys starting from 0;
  • If that fails, raise TypeError, usually saying "'C' object is not iterable", where C is the class of the target object.
This explains why every Python sequence is iterable: they all implement __getitem__. The standard sequences also implement __iter__, and your own sequences should too, because the special handling of __getitem__ exists only for backward compatibility and may be removed in the future.
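For example, a class that implements only __getitem__ is already iterable thanks to that fallback. Here is a small sketch (the Squares class is made up for illustration):

class Squares:
    # Hypothetical class: no __iter__, only __getitem__ with 0-based integer keys.
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError  # signals the end of the sequence to the fallback iterator
        return index * index

# iter() falls back to __getitem__, fetching items from index 0 until IndexError:
list(Squares())
# [0, 1, 4, 9, 16]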

Testing if a class is an iterable


from collections import abc

class Foo:
    def __iter__(self):
        pass

# abc.Iterable checks for the presence of __iter__ via a __subclasshook__,
# so no explicit registration or inheritance is needed:
issubclass(Foo, abc.Iterable)
# True

f = Foo()
isinstance(f, abc.Iterable)
# True
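Note that the ABC check only detects __iter__; it does not catch the __getitem__ fallback described above. The most accurate test is simply to call iter() on the object and handle TypeError, as in this hypothetical helper:

def is_iterable(obj):
    # Hypothetical helper: iter() also covers the __getitem__ fallback,
    # which isinstance(obj, abc.Iterable) does not detect.
    try:
        iter(obj)
        return True
    except TypeError:
        return False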


Iterables versus Iterators

Iterable:

An iterable is any object, not necessarily a data structure, that can return an iterator (with the purpose of returning all of its elements).
Moreover, objects implementing an __iter__ method that returns an iterator are iterable. Sequences are always iterable, as are objects implementing a __getitem__ method that takes 0-based indexes.
In the following example, x is the iterable, while y and z are two independent iterators over it.

x = [1,2,3]
y = iter(x)
y
# <list_iterator object at 0x7f6e35561a90>
z = iter(x)
next(y)
# 1
next(y)
# 2
next(z)
# 1


Iterators:

So what is an iterator then? It's a stateful helper object that will produce the next value when you call next() on it. Any object that has a __next__() method is therefore an iterator. How it produces a value is irrelevant.
So an iterator is a value factory. Each time you ask it for "the next" value, it knows how to compute it because it holds internal state.
The standard interface for an iterator has two methods:

  • __next__: Returns the next available item, raising StopIteration when there are no more items.
  • __iter__: Returns self; this allows iterators to be used where an iterable is expected, for example, in a for loop.
Example:

from itertools import count
counter = count(start=13)
next(counter)
# 13
next(counter)
# 14
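
This interface is exactly what a for loop relies on. The loop below is a rough sketch of what for item in x: print(item) does under the hood:

x = [1, 2, 3]

it = iter(x)             # get an iterator from the iterable
while True:
    try:
        item = next(it)  # fetch the next item
    except StopIteration:
        break            # the iterator is exhausted; leave the loop
    print(item)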


Building a Fibonacci iterator:

from itertools import islice
# islice helps create a limited iterator from an infinite iterator.
from collections.abc import Iterator

class fib:

    def __init__(self):
        self.prev = 0
        self.curr = 1

    def __iter__(self):
        return self

    def __next__(self):
        value = self.curr
        self.curr += self.prev
        self.prev = value
        return value

f = fib()
list(islice(f, 0, 10))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

issubclass(fib, Iterator)
# True

NOTE:
Often, for pragmatic reasons, iterable classes will implement both __iter__() and __next__() in the same class, and have __iter__() return self, which makes the class both an iterable and its own iterator. It is perfectly fine to return a different object as the iterator, though.
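
A minimal sketch of that second approach (the Countdown classes are made up for illustration): the iterable hands out a fresh, independent iterator from __iter__ each time it is asked for one.

class Countdown:
    # Hypothetical iterable that delegates to a separate iterator object.
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

class CountdownIterator:
    # Hypothetical iterator: holds the iteration state for one pass.
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

list(Countdown(3))
# [3, 2, 1]

Returning a new iterator on each call means several clients can iterate over the same iterable at the same time without sharing state.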


Therefore: iterators are also iterable, but iterables are not necessarily iterators.

Generators

Any Python function that has the yield keyword in its body is a generator function: a function which, when called, returns a generator object. In other words, a generator function is a generator factory.


Rules of thumb:
  • Any generator is also an iterator (not vice versa!);
  • Any generator, therefore, is a factory that lazily produces values.

Example:

def gen_1234():
    yield 1
    yield 2
    yield 3
    yield 4


for i in gen_1234():
    print(i)

# 1
# 2
# 3
# 4

g = gen_1234()
next(g)
# 1
next(g)
# 2
next(g)
# 3
next(g)
# 4
next(g)

# Traceback (most recent call last):
# ...
# StopIteration
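
To see how much boilerplate this removes, the Fibonacci iterator from earlier can be sketched as a generator function (fib_gen is a hypothetical name):

from itertools import islice

def fib_gen():
    # Generator-function version of the fib iterator class above:
    # the local variables hold the state that __next__ had to manage.
    prev, curr = 0, 1
    while True:
        yield curr
        prev, curr = curr, prev + curr

list(islice(fib_gen(), 0, 10))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]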



Being Pythonic with generators

Instead of:

def something():
    result = []
    for x in ...:
        result.append(x)
    return result


do:

def iter_something():
    for x in ...:
        yield x

# it = list(iter_something())
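
As a concrete sketch of the pattern (even_squares is a made-up example), the generator version yields values lazily instead of building a list up front:

def even_squares(numbers):
    # Hypothetical example: yield each value as it is produced
    # rather than appending it to a result list.
    for n in numbers:
        if n % 2 == 0:
            yield n * n

list(even_squares(range(10)))
# [0, 4, 16, 36, 64]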

Useful iterators can be found in the itertools module.

Using yield from

Nested for loops are the traditional solution when a generator function needs to yield values produced by another generator (or by any other iterable).


For example:


def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i

s = 'ABC'
t = tuple(range(3))
list(chain(s, t))
# ['A', 'B', 'C', 0, 1, 2]

PEP 380 introduced the yield from syntax to do just that:


def chain(*iterables):
    for i in iterables:
        yield from i

list(chain(s, t))
# ['A', 'B', 'C', 0, 1, 2]

Besides replacing a loop, yield from creates a channel connecting the inner generator directly to the client of the outer generator. This channel becomes really important when generators are used as coroutines and not only produce but also consume values from the client code.
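
A minimal sketch of that channel (the averager and delegator names are illustrative): values passed to send() on the outer generator are routed straight to the inner one.

def averager():
    total, count, average = 0.0, 0, None
    while True:
        term = yield average   # receives values sent by the client
        total += term
        count += 1
        average = total / count

def delegator():
    # yield from forwards next() and send() calls to the subgenerator
    yield from averager()

d = delegator()
next(d)       # prime the delegating generator
d.send(10)
# 10.0
d.send(30)
# 20.0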