Python Cookbook: Reading Notes

chapter 1: Data Structures and Algorithms

1.6 Mapping Keys to Multiple Values in a Dictionary, by list value and defaultdict

example: create a dictionary with default value, by defaultdict

dictionary properties:

  1. you can add a new key to a dictionary at any time
  2. but accessing a key that does not exist raises a KeyError
  3. defaultdict fixes this problem by creating a default value for a missing key on first access.
from collections import defaultdict
`dict` = defaultdict(`func list`)
# `dict`['aaa'].append(2)

1.6 example

from collections import defaultdict
frequency = defaultdict(int)
frequency['colorless'] = 4
frequency['ideas'] # will be 0

frequency = defaultdict(list)
# first, frequency['colorless'] returns an empty list, then one element is appended to that list.
frequency['colorless'].append(4)
frequency['ideas'] # will be []

# Or you can pass any function that takes no arguments

# the idiom (pseudocode):
my_dictionary = defaultdict(function to create default value)
for item in sequence:
    my_dictionary[item_key] is updated with information about item
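
The idiom above can be sketched concretely as follows. The word list and the choice to group by first letter are assumptions made only for illustration; they are not from the book.

```python
from collections import defaultdict

# Hypothetical sample data, chosen only for illustration.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list() takes no arguments and returns [], so every missing key
# starts out as an empty list the first time it is accessed.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# A lambda works too: any callable that takes no arguments is accepted.
counts = defaultdict(lambda: 0)
for word in words:
    counts[word[0]] += 1

print(dict(by_letter))
print(dict(counts))
```

Grouping and counting are the two most common uses; in both cases the loop body never has to check whether the key already exists.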

1.7 Keeping Dictionaries in Order, OrderedDict

from collections import OrderedDict
d = OrderedDict()
# the insertion order will be preserved.

A typical application is serialization, where the order of the fields in the output must be controlled.
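
As a small sketch of the serialization use case, the snippet below encodes an OrderedDict as JSON. The key names are placeholders chosen for illustration.

```python
import json
from collections import OrderedDict

d = OrderedDict()
d['foo'] = 1
d['bar'] = 2
d['spam'] = 3

# json.dumps writes the keys in the order they were inserted.
encoded = json.dumps(d)
print(encoded)  # {"foo": 1, "bar": 2, "spam": 3}
```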

1.8. Calculating with Dictionaries

get max key/value in a dictionary, based on the value, by inverting the dict

max(zip(`dict`.values(), `dict`.keys()))
# another solution
# max(`dict`, key=lambda k:`dict`[k])

first convert the dict to (value, key) pairs; the max function then compares the values first, and compares the keys only when the values are equal.

what max returns for different containers

for a tuple or a list, it is simply the largest element, but for a dict it returns only a key. Why? Because max accepts an iterable as its first parameter, and iterating over a dictionary yields its keys.
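
The three behaviours can be compared side by side. This is a minimal sketch; the prices dictionary is illustrative data.

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55}

# Iterating a dict yields its keys, so max() compares the keys themselves.
largest_key = max(prices)                             # 'IBM'

# key= ranks the keys by their associated value instead.
priciest_name = max(prices, key=lambda k: prices[k])  # 'AAPL'

# zip produces (value, key) tuples, compared by value first, then by key.
priciest = max(zip(prices.values(), prices.keys()))   # (612.78, 'AAPL')

print(largest_key, priciest_name, priciest)
```

The zip form returns both the value and the key in one step, while the key= form returns only the key.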

understanding multiple assignment (unpacking)

---> (b11, b12) = 1
TypeError: 'int' object is not iterable

The right-hand side must be an iterable. Each element of the iterable is assigned to one variable on the left-hand side, with each variable consuming exactly one element. If the number of elements and the number of variables do not match, there will be an error.

To consume more than one value, use the '*varname' expression; the variable 'varname' will then be a list holding the remaining elements.

Back to the error printed in the example: the 'int' object refers to the '1' on the right-hand side.
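
The rules above can be checked with a short sketch. The variable names are arbitrary.

```python
# The number of names matches the number of elements.
x, y = (1, 2)

# A starred name collects the leftover elements into a list.
first, *rest = [1, 2, 3, 4]
print(first, rest)  # 1 [2, 3, 4]

# Too few elements raises ValueError.
try:
    a, b = [1]
except ValueError as exc:
    count_error = type(exc).__name__
    print('ValueError:', exc)

# A right-hand side that is not iterable raises TypeError.
try:
    a, b = 1
except TypeError as exc:
    type_error = type(exc).__name__
    print('TypeError:', exc)
```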

PS: I find Python much simpler and more fun than Java.

1.9. Finding Commonalities in Two Dictionaries

A dictionary's d.keys() and d.items() views support set operations (d.values() does not). So to find the keys/items that two dictionaries have in common, just use the set intersection operator '&'.

get all keys of a dictionary as an iterable, by keys()

`dict`.keys()

get all values of a dictionary as an iterable, by values()

`dict`.values()

get all (key, value) pairs of a dictionary as an iterable, by items()

`dict`.items()

set operations

'|': union
'&': intersection
'-': difference
's1 <= s2': check if s1 is a subset of s2 ('s1 < s2' checks for a proper subset)

Example:

In [1]: e.keys(), d.keys()
Out[1]: (dict_keys([1, 4, 'a', 9]), dict_keys([1, 3, 5]))

In [2]: e.keys() & d.keys()
Out[2]: {1}

In [3]: e.keys() | d.keys()
Out[3]: {1, 3, 4, 5, 'a', 9}

In [4]: e.keys() - d.keys()
Out[4]: {9, 4, 'a'}
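
A further sketch of the same operations, including items() and a common use of the difference operator: building a new dictionary with some keys left out. The two dictionaries are illustrative data.

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'w': 10, 'x': 11, 'y': 2}

common_keys = a.keys() & b.keys()      # keys present in both
only_in_a = a.keys() - b.keys()        # keys in a but not in b
common_items = a.items() & b.items()   # identical (key, value) pairs

# Build a new dict from a with selected keys removed.
c = {key: a[key] for key in a.keys() - {'z', 'w'}}

print(common_keys, only_in_a, common_items, c)
```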

1.10. Removing Duplicates from a Sequence while Maintaining Order

problem: what is hashable(and the link to python glossary)

From the python glossary: https://docs.python.org/3/glossary.html

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

how yield/generator iterator is implemented

From the glossary: a yield suspends the function, returns the value, and saves the current state. The next time the generator is asked for a value, execution resumes from the place where it was last suspended. Great!! Now I understand this.

generator A function which returns a generator iterator. It looks like a normal function except that it contains yield expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function. Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.

generator iterator An object created by a generator function. Each yield temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the generator iterator resumes, it picks-up where it left-off (in contrast to functions which start fresh on every invocation).
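
The suspend-and-resume behaviour can be observed with a small sketch. The countdown function is a made-up example, and the print calls are only there to show when each part of the body runs.

```python
def countdown(n):
    print('starting')
    while n > 0:
        yield n      # execution pauses here; the local variable n is remembered
        n -= 1
    print('done')

gen = countdown(2)       # nothing in the body has run yet
first = next(gen)        # prints 'starting', then yields 2
second = next(gen)       # resumes after the yield, decrements n, yields 1
remaining = list(gen)    # runs to the end, prints 'done'; nothing left to yield

print(first, second, remaining)
```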

a file object is also an iterable; each element is a line

with open(somefile, 'r') as f:
    for line in f:
        print(line)

a function that deletes all duplicates in a list, with order preserved

def dedupe(items, key=None):
    seen = set()
    for item in items:
        # compare by the item itself, or by key(item) when the item is not hashable
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

If the elements are hashable, the key function is not needed. Otherwise, provide a function that converts each element to a hashable value.

examples:

>>> a = [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
>>> list(dedupe(a, key=lambda d: (d['x'],d['y'])))
[{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
>>> list(dedupe(a, key=lambda d: d['x']))
[{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
>>>

delete all duplicates in a list, don't preserve order, by set

set(`list`)

Then all duplicate elements in list will be removed.

the a?b:c expression in python, if else in one line

val = b if a else c

looks good

1.11. Naming a Slice

the slice object

create a slice

a=[1,2,3,4]
s = slice(1,2)
print(a[s])
print(a[1:2])

'1:2' is just a shortcut to 'slice(1,2)'

slice attributes

s = slice(1,2,2)
print(s.start, s.stop, s.step)
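
A sketch of why naming a slice helps: the names below replace hard-coded index pairs when reading a fixed-width record. The record layout (name in columns 0-9, quantity in columns 10-14) is a hypothetical format invented for this example.

```python
# Hypothetical fixed-width record: name in columns 0-9, quantity in columns 10-14.
record = 'widget'.ljust(10) + '00042'

NAME = slice(0, 10)
QUANTITY = slice(10, 15)

name = record[NAME].strip()
quantity = int(record[QUANTITY])
print(name, quantity)  # widget 42

# indices() clips a slice to a given length and returns (start, stop, step).
s = slice(1, 50, 2)
clipped = s.indices(len(record))
print(clipped)  # (1, 15, 2)
```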

1.12. Determining the Most Frequently Occurring Items in a Sequence

A method by me

a = [1, 2, 1, 3, 2,3,3]
from collections import defaultdict
d = defaultdict(int)
# b = [d[k]+=1 for k in a]  # syntax error here
for k in a:
    d[k]+=1

r = max(zip(d.values(), d.keys()))
print(r[1])

the collections.Counter class: maps each element to its count; most_common(n) returns a list of (element, count) tuples

a = [1, 2, 1, 3, 2,3,3]
from collections import Counter
b = Counter(a)
c = b.most_common(1)
print(c[0][0])

# get the count
print(b[3]) # 3 is the element in a


# update with more elements
b.update([4, 2, 5])

# a Counter object also supports the math operations '+' and '-'

When you need to count data, use the Counter class. It is such a small class that, in practice, I always used to write this logic from scratch.
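
A minimal sketch of the '+' and '-' operations on Counter objects, reusing the list from the example above.

```python
from collections import Counter

a = Counter([1, 2, 1, 3, 2, 3, 3])   # 3 appears 3 times, 1 and 2 twice each
b = Counter([4, 2, 5])

combined = a + b      # counts are added element by element
difference = a - b    # counts are subtracted; results of zero or less are dropped

print(combined)
print(difference)
```

Note that subtraction never produces negative counts, so 4 and 5 do not appear in difference at all.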

1.13. Sorting a List of Dictionaries by a Common Key

the operator.itemgetter function

it returns a callable that can be passed to the key parameter of 'sorted', for records that are lists or dictionaries

# return value of
import operator
operator.itemgetter("name")
# is the same as this one
lambda r:r["name"]
# but the former  is a little faster
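
A short sketch of itemgetter used as a sort key. The rows are made-up sample records.

```python
from operator import itemgetter

# Hypothetical records, for illustration only.
rows = [
    {'name': 'Brian', 'uid': 1003},
    {'name': 'Alice', 'uid': 1004},
    {'name': 'Carol', 'uid': 1001},
]

by_name = sorted(rows, key=itemgetter('name'))
by_uid = sorted(rows, key=itemgetter('uid'))

# With several keys, the callable returns a tuple, which sorts field by field.
get_pair = itemgetter('name', 'uid')
print(get_pair(rows[0]))  # ('Brian', 1003)
```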

Still a very small feature; why make it so fine-grained?

1.14. Sorting Objects Without Native Comparison Support

the operator.attrgetter function

it returns a callable that can be passed to the key parameter of 'sorted', for instances of user-defined classes

class User:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'User({})'.format(self.name)

# return value of
import operator
operator.attrgetter("name")
# is the same as this one
lambda o: o.name
# but the former is a little faster
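
A sketch of attrgetter in use, with the User class repeated so the snippet runs on its own. The user names are sample data.

```python
from operator import attrgetter

class User:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'User({})'.format(self.name)

users = [User('carol'), User('alice'), User('bob')]

# Sort the instances by their name attribute.
ordered = sorted(users, key=attrgetter('name'))
print(ordered)  # [User(alice), User(bob), User(carol)]

# The same key works with min() and max().
first_user = min(users, key=attrgetter('name'))
```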

1.15. Grouping Records(a sequence of dictionaries) Together Based on a Field

the itertools.groupby function: sequentially groups the list into (key, items) tuples

import itertools
from operator import itemgetter
rows = [{1: 2}, {1: 4}, {1: 3}]
# groupby only groups consecutive items with the same key, so sort first
rows.sort(key=itemgetter(1))
# a is an iterator that yields (key, group) pairs
a = itertools.groupby(rows, key=itemgetter(1))

another way is to group with a defaultdict(list); then no sort is needed.
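
Both approaches can be compared in a short sketch. The records and the 'city' field are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical records, for illustration only.
rows = [
    {'city': 'Oslo', 'name': 'a'},
    {'city': 'Rome', 'name': 'b'},
    {'city': 'Oslo', 'name': 'c'},
]

# groupby only merges adjacent records, so sort by the same key first.
grouped = {}
by_city_key = itemgetter('city')
for city, items in groupby(sorted(rows, key=by_city_key), key=by_city_key):
    grouped[city] = [row['name'] for row in items]

# defaultdict(list) needs no sort and keeps every group in memory for random access.
by_city = defaultdict(list)
for row in rows:
    by_city[row['city']].append(row['name'])

print(grouped)
print(dict(by_city))
```

The defaultdict version uses more memory on large inputs but avoids the sort, so it is usually the simpler choice when the groups need to be looked up later.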

1.16. Filtering Sequence Elements

To filter, just use a list comprehension with an if condition

itertools.compress function, a filtering tool

it takes two parameters:

  1. the iterable to be filtered
  2. a sequence of Boolean selectors, normally the same length as the first parameter; where a selector is True, the element at the same position in the first iterable is included in the output

    An example:

addresses = [
    '5412 N CLARK',
    '5148 N CLARK',
    '5800 E 58TH',
    '2122 N CLARK',
    '5645 N RAVENSWOOD',
    '1060 W ADDISON',
    '4801 N BROADWAY',
    '1039 W GRANVILLE',
]
counts = [ 0, 3, 10, 4, 1, 7, 6, 1]

import itertools
b = [e > 5 for e in counts]
a = itertools.compress(addresses, b)
# a is an iterator over all addresses whose count is larger than 5
print(list(a))

1.17. Extracting a Subset of a Dictionary

dictionary comprehension, just like list comprehension, but use '{' instead of '['

prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}
# Make a dictionary of all prices over 200
p1 = { key:value for key, value in prices.items() if value > 200 }
# Make a dictionary of tech stocks
tech_names = { 'AAPL', 'IBM', 'HPQ', 'MSFT' }
p2 = { key:value for key,value in prices.items() if key in tech_names }

1.18. Mapping Names to Sequence Elements

the collections.namedtuple function: maps an index to a name, so an element can be accessed by that name

example:

from collections import namedtuple
People =  namedtuple('People', ['name', 'age'])
p = People(name='Jim', age=12)
print(p, p.name, p.age)

A good application: naming the columns of rows returned by a database query.

The ._replace method: because a tuple is immutable, to change an element you use _replace, which replaces a field and returns a new tuple. A typical usage is to first create a prototype instance with every field set to its default value, then update some fields with _replace. Why is there a '_' in the method name? (To avoid clashing with field names, which cannot start with an underscore.)

from collections import namedtuple
Stock = namedtuple('Stock', ['name', 'shares', 'price', 'date', 'time'])
# Create a prototype instance
stock_prototype = Stock('', 0, 0.0, None, None)
# Function to convert a dictionary to a Stock
def dict_to_stock(s):
    return stock_prototype._replace(**s)

a = {'name': 'ACME', 'shares': 100, 'price': 123.45}
dict_to_stock(a)
# Stock(name='ACME', shares=100, price=123.45, date=None, time=None)

1.19. Transforming and Reducing Data at the Same Time

use generator-expression argument

A reducing function takes an iterable and returns a single value (for example sum, min, max, any).
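
A minimal sketch of passing a generator expression directly as the argument, so the data is transformed and reduced in one pass without building a temporary list. The numbers and the portfolio records are sample data.

```python
nums = [1, 2, 3, 4, 5]

# The generator expression needs no extra parentheses when it is the only argument.
total = sum(x * x for x in nums)
print(total)  # 55

# Hypothetical records, for illustration only.
portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
]

# Reduce over a transformed field...
min_shares = min(s['shares'] for s in portfolio)

# ...or keep the whole record by using key= instead.
smallest = min(portfolio, key=lambda s: s['shares'])
print(min_shares, smallest)
```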

the any function: checks whether any element of an iterable is true

check if any .py files exist in a directory

# Determine if any .py files exist in a directory
import os
files = os.listdir('dirname')
if any(name.endswith('.py') for name in files):
    print('There be python!')
else:
    print('Sorry, no python.')

get all files in a directory as a list

import os
files = os.listdir('dirname')

change a tuple/list/iterable to a csv line

This is much better than the string format method

# Output a tuple as CSV
s = ('ACME', 50, 123.45)
print(','.join(str(x) for x in s))

1.20. Combining Multiple Mappings into a Single Mapping

the collections.ChainMap

combines many maps/dictionaries; a lookup tries the first map, then the second, ...

Operations that mutate the mapping always affect the first map/dictionary.

typical application: scoped variables in a programming language.

Difference from the dict.update function: ChainMap keeps links to the original dictionaries, so later changes to them are visible through it, while dict.update copies the entries into the target dictionary, so later changes to the source dictionaries are not reflected.

  • check if an element exists in many dictionaries/maps, sequentially
    a = {'x': 1, 'z': 3 }
    b = {'y': 2, 'z': 4 }
    from collections import ChainMap
    c = ChainMap(a,b)
    print(c['x']) # Outputs 1 (from a)
    print(c['y']) # Outputs 2 (from b)
    print(c['z']) # Outputs 3 (from a)
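
The mutation rule and the difference from dict.update can be seen in a short sketch that reuses the same two dictionaries. The name merged is introduced here only for the comparison.

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}

c = ChainMap(a, b)
c['z'] = 10      # writes go to the first mapping, a; b['z'] is untouched
c['w'] = 40      # new keys are also added to a
a['x'] = 42      # the ChainMap sees changes made to the underlying dict

# dict.update copies entries instead of linking to the sources.
merged = dict(b)
merged.update(a)   # a's values win for duplicate keys
a['x'] = 99        # merged does not see this later change

print(c['x'], merged['x'])  # 99 42
print(a, b)
```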
    
