chapter 4: Iterators and Generators

4.1. Manually Consuming an Iterator, by next(iterator[, default]) function

with open('python-cookbook-3rd.org') as f:
    print(next(f))

open(filename, …) function will return a iterator of lines in that file

a list object is not an iterator

use the iter(iterable) function to create an iterator given an iterable

the for x in X syntax works both for iterator and list object

iterator and iterable

An object is said to be iterable if it has the iter method defined. The _iter_() will reutrn the iterator object.

An object is said to be a iterator if it has following method defined:

  1. iter: which return itself Can be tested the it._iter_() == it is true
  2. next: return the next value every time it is invoked.

So an iterator is an iterable, call iter(iterable) to get an iterator.

The iter(iterable) function: it will return 'iterable._iter_()'

So if obja is an iterable, then iter(obja) equal obja._iter()

obja = [1, 2, 3]
ia = obja.__iter__()
ib = iter(obja)
ic = iter(ib)
print(ia)
print(ib)
print(ib is ic)
print(next(ia), next(ib))

if obja is iterator, then iter(obja) and obja is the same object.

A good ref: http://www.shutupandship.com/2012/01/understanding-python-iterables-and.html

a example of create a iterable class

class MyList(list):
    def __iter__(self):
	return MyListIter(self)

class MyListIter(object):
    """ A sample implementation of a list iterator. NOTE: This is just a 
    demonstration of concept!!! YOU SHOULD NEVER IMPLEMENT SOMETHING LIKE THIS!
    Even if you have to (for any reason), there are many better ways to 
    implement this."""
    def __init__(self, lst):
	self.lst = lst
	self.i = -1
    def __iter__(self):
	return self
    def __next__(self):
	if self.i<len(self.lst)-1:
	    self.i += 1         
	    return self.lst[self.i]
	else:
	    raise StopIteration

if __name__ == '__main__':
    a = MyList([1, 2, 3, 4])
    ia = iter(a)
    print('type(a): %r, type(ia): %r' %(type(a), type(ia)))
    for i in a: 
	print (i)

how does the for in loop works

  1. it first get the iterable's iterator object, by calling its _iter_() method
  2. get the element by invoke the iterator's _next_() method, and bind the value to the variable.
  3. stop when an 'StopIteration' exception happens.

the next(iterator) function

it just return iterator._next_()

the iter(iterable) function

it just return iterable._iter_()

the len(obj) function

it just return obj._len_()

4.2. Delegating Iteration

When create a class the with a underline container, just define an _iter_() method that forward the request to the underlineing container object.

4.3. Creating New Iteration Patterns with Generators

what is a generator?

a generator is a function that contains at lease one 'yeild' statement.

Unlike normal function, it's boyd will not be executed when it is be called, instead, it will return a generator object.

4.4. Implementing the Iterator Protocol

use the generator instead of the next method, which will be much simple.

使用yeild 创建一个Tree Node,比使用_next_函数简单多了。

yeild from syntax.

4.5. Iterating in Reverse, by the reversed(obj) function

reversed only works if the obj

  • the obj defined a _reversed_() method. or
  • the obj's size can be determined.

It returns an iterator.

For example, a file handler returned by the 'open()' function can't be used with the reversed function. to use it, first convert it to a list, then pass it to the reversed() function.

with open("1.txt") as f:
    a = reversed(list(f))
    print(next(a), next(a))

defined a customized reversed iterator, by define the _reversed_() method

class CountDown():
    def __init__(self, start):
	self._start = start

    def __iter__(self):
	return self

    def __next__(self):
	if self._start >=0:
	    n = self._start
	    self._start -= 1
	    return n
	else:
	    raise StopIteration

    def __reversed__(self):
	return ReversedCountDown(self)

class ReversedCountDown():
    def __init__(self, orig):
	self._orig = orig
	self._n = -1
    def  __iter__(self):
	return self
    def __next__(self):
	if self._n <= self._orig._start:
	    self._n += 1
	    return self._n
	else:
	    raise StopIteration

# if __name__ == '__main__':
cd = CountDown(2)
# for a in cd:
#     print(a)

print("reversed")
for a in reversed(cd):
    print(a)

Implemet the iterator protocal by next method is a little complex compared to by use the yield statement. The differenc is that then the object is …

class CountDown():
    def __init__(self, start):
	self._start = start

    def __iter__(self):
	n = self._start
	while n >=0:
	    yield n
	    n -=1

    def __reversed__(self):
	n = 0
	while n <=self._start:
	    yield n
	    n+=1

cd = CountDown(3)
for a in cd:
    print(a)

print ("reversed")
for a in reversed(cd):
    print(a)

4.6. Defining Generator Functions with Extra State

print the surrounding previous lines if pattern matched, by use a generator, implemented by a class

Here previous lines are states.

from collections import deque
class HistoryLines():
    def __init__(self, lines, histlen=3):
	self.lines = lines
	self.history = deque(maxlen=histlen)

    def __iter__(self):
	for line in self.lines:
	    self.history.append(line)
	    yield line

with open('1.txt') as f:
    hist_lines = HistoryLines(f)
    for line in hist_lines:
	if  'wrap' in line:
	    for hl in hist_lines.history:
		print('%s' % hl)

Good practice: if you need save some states, then don't use a function to create a generator, use a class.

4.7. Taking a Slice of an Iterator

by use of the itertools.islice(start, end, step) functon

Because we don't know the size of a iterator or a generator, so we can't slice it directly.

from  itertools import islice as slice_iter
a = range(8)
for b  in slice_iter(iter(a), 2, 5, 1):
    print(b)

with open('1.txt') as f:
    for line in slice_iter(f, 2, 5, 2):
	print(line.strip())

The result is the same as my impllemented one.

a try by me, works

def slice_iter(aiter, start, end, step):
    n = 0
    idx = range(end)[start:end:step]
    for i in range(end):
	v = next(aiter)
	if i in idx:
	    yield v

a = range(8)
for b  in slice_iter(iter(a), 2, 5, 1):
    print(b)

with open('1.txt') as f:
    for line in slice_iter(f, 2, 5, 2):
	print(line.strip())

4.8. Skipping the First Part of an Iterable, by itertools.dropwhile(testfunc, iterable)

import itertools
with open('1.txt') as f:
    for line in itertools.dropwhile(lambda x: x.startswith('#'), f):
	print(line, end='')

This is different from filtering

if the position is known, then we can use itertools.islice(iterable, start, None) to drop the first 'start' items.

4.9. Iterating Over All Possible Combinations or Permutations

An important aspect of itertools module: for complex iteration tasks, it is very likely there is an exist solution.

create permutations from a iterable collection of items, by itertools.permutations(iterable[, len])

The return value is an iterator

from itertools import permutations
a = ['a', 'b', 'c']
for b in permutations(a, 2):
    print(b)

create combinations from a iterable collection of items, by itertools.combinations(iterable, len)

The order of items does not matter

from itertools import combinations
a = ['a', 'b', 'c']
for b in combinations(a, 2):
    print(b)

create combinations from a iterable collection of items, by itertools.combinationswithreplacement(iterable, len), same item can exist more than one times.

The order of items does not matter

from itertools import combinations_with_replacement
a = ['a', 'b', 'c']
for b in combinations_with_replacement(a, 4):
    print(b)

4.10. Iterating Over the Index-Value Pairs of a Sequence, by enumerate(iterable[, startindex])

a = ['a', 'b', 'c']
for i, v in enumerate(a, 1):
    print(i, v)

4.11. Iterating Over Multiple Sequences Simultaneously, by zip(iterable1, iterable2, …), shortest

The zip function will create an iterator that return tuples: first element from iterable1, second element from iterable2, … Should the size of all iterables be the same? => No, it can be different. the returned size is the same as the shortest size of all iterables.

a =  [1,  2, 3]
b = ['a', 'b', 'c', 'd']
for v in zip(a, b):
    print(v)

Iterating Over Multiple Sequences Simultaneously, by itertools.ziplongest(iterable1, iterable2, …), longest

If you want the returned iterator take the longest size, then use ziplongest. The element value will be None if that iterable is exzasted.

From the two functions: zip and ziplongest, there is a lesson: it better to create different function name, than add a more parameter.

from itertools import zip_longest
a =  [1,  2, 3]
b = ['a', 'b', 'c', 'd']
for v in zip_longest(a, b):
    print(v)

4.12. Iterating on Items in Separate Containers, by itertools.chain(iterable1, iterable2, …), concat iterables

from itertools import chain
a =  [1,  2, 3]
b = ['a', 'b', 'c', 'd']
for v in chain(a, b):
    print(v)

4.13. Creating Data Processing Pipelines

This section is about divide a task to many small pipelines(steps), by use of generator Generator is a producer, for loop is a comsumer.

example: iterate all matched lines from all files in a directory, recursively

相当于把多重QIAN TAO循环给扁平化了。但执行的顺序完全相同。generator确实比较好用。

import os
def gen_filenames(top):
    for dirpath, dirs, files in os.walk(top):
	for f in files:
	    yield os.path.join(dirpath, f)

def gen_open(filenames):
    for f in filenames:
	# print('file names: %s' % f)
	fh = open(f, encoding='utf-8')
	yield fh
	fh.close()

def gen_lines(files):
    for f in files:
	yield from f

def gen_match(lines, pattern):
    for v in  lines:
	if pattern in v:
	    yield v

filenames = gen_filenames('..')
files = gen_open(filenames)
lines = gen_lines(files)
matched_lines = gen_match(lines, 'slice')

for v in matched_lines:
    print(v, end='')

[not work]change two embeded for loop to two seperate one by generator

a = [1, 2, 3]
b = ['a', 'b']

for i in a:
    for j in b:
	print(i, j)

def gen_a(aiter):
    for v in aiter:
	yield v

def gen_b(aiter, biter):
    for v in aiter:

4.14. Flattening a Nested Sequence, by generator, recursively

Why this function is not included in itertools module?

from collections import Iterable
def  flatten(items, ignored_types=(str, bytes)):
    for v in items:
	if isinstance(v, Iterable) and not isinstance(v, ignored_types):
	    yield from flatten(v, ignored_types)
	else:
	    yield v

a = [1, 2, [3, 4, [5, 6], 7],  8, 'abc']
for v in a:
    print(v)

print("the flattened version")
for v in flatten(a):
    print(v)

yield from just like a for loop

def gen_a():
    for v in range(3):
	yield v

def gen_b(gena):
    yield from gena

def for_b(gena):
    for v  in gena:
	yield v

# the gen_b and for_b works exactly the same, but the yield from is better
for v in gen_b(gen_a()):
    print(v)

print('the for version')
for v in for_b(gen_a()):
    print(v)

4.15. Iterating in Sorted Order Over Merged Sorted Iterables, by heapq.merge(iterable1, iterable2, …)

the input iterables should in sorted order. then it will create an new iterable of sorted items from all input.

a = [1, 4, 8]
b = [2, 3,  7, 9]

import heapq
for v in heapq.merge(a, b):
    print(v)

The function will only get the needed items into memory. So it better to merge two sorted files.

Similar to sorted(itertools.chain(*iterables)), but will not read all content to memory.

4.16. Replacing Infinite conditional while Loops with an Iterator, by iter(callable, sentinel) function

invoke the callable UNTIL it returns the sentinel

Means: repeated invoke the callable, and return its return value, until the return value equal to the sentinel.

a = [1, 2, 3, 4, 5]
idx = -1
def foo():
    global idx
    idx+=1
    return a[idx]

for v in iter(foo, 3):
    print(v)