PythonTips

Packages:

www.python.org

www.python.org/pypi

http://py.vaults.ca/parnassus/

Text:

http://www.python.org/doc/

http://www.diveintopython.org/

http://aspn.activestate.com/ASPN/Python/Cookbook/

http://gnosis.cx/TPiP/

Install Python 2.3 and also install win32all toolkit from Mark Hammond which

comes with PythonWin (IDE with integrated debugger). The debugger feature is

not obvious but the traditional shortcuts of F9, F10 and so on will work.

When you press F5 it will try to run the program using the default run mode.

If you want to change the run mode, you should click on the running man icon

and change the Debugging combo box value (Step-in the debugger will step the

code from the beginning, Run in the debugger requires you to set up a

breakpoint with F9 first).

Even better, get a copy of Komodo (Personal or Professional) and get the

power of auto-completion, more integrated debugger, remote debugging, cross-

platform development (Linux, Solaris, Windows).

How to add modules into a Python installation without setting $PYTHONPATH?

Locate the site-packages

directory (run python, type "import sys", type "sys.path") and create modulename.pth. Inside this file put a relative

directory path (or absolute directory path) to the new module that you are

trying to add. Inside this relative directory path, you should have the

modulename.py that can be used via "import modulename" or "from modulename

import classname"

If the newmodule is within the same directory, you can just import it as is

without using silly pth file.

Change a file.py into an executable? C:\py\cx_freeze-3.0.beta2-win32-py23

\cx_Freeze-3.0.beta2\FreezePython.exe --install-dir hello.py

There are six sequence types: strings, Unicode strings, lists, tuples,

buffers, and xrange objects.

There is currently only one standard mapping type, the dictionary.

tuple = immutable sequence

dictionary == hash

Accessing an element in a dictionary uses the same x[key] construct as

accessing a sequence, so Python cannot infer which one you want. Initialize the variable first, like so:

x = {} or x = []

or more verbosely

x = dict() or x = list()

x = { 1:"one", 2:"two" } # dictionary

x = (1, 2) # tuple

x = [1, 2] # list

There is a difference between y = x and y = x[:] assuming x is a list. The

first one creates another reference to x. The second one copies the elements

of x into a new list; this is a shallow copy, not a deepcopy.
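
A minimal sketch of the difference, assuming x is a list of lists (the names alias, shallow and deep are illustrative and not from these notes):

```python
import copy

x = [[1, 2], [3, 4]]

alias = x                  # another name for the same list object
shallow = x[:]             # new outer list that still shares the inner lists
deep = copy.deepcopy(x)    # new outer list and new inner lists

x.append([5, 6])           # seen through alias only
x[0].append(99)            # seen through alias and shallow, but not deep
```

After this runs, alias has three items, shallow has two but its first item ends in 99, and deep is unchanged.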

Instead of using filter, use list comprehension.

def evennumber(x):

    if x % 2 == 0: return True

    else: return False

>>> filter(evennumber, [4, 5, 6])

[4, 6]

>>> [x for x in [4, 5, 6] if evennumber(x)]

[4, 6]

Instead of using map, use list comprehension.

>>> map(evennumber, [4, 5, 6])

[True, False, True]

>>> [evennumber(x) for x in [4, 5, 6]]

[True, False, True]

Remember starting in Python 2.2, built-in function open() is an alias to

file().

I am confused about setattr/getattr vs. property.

Built-in functions str vs. repr, both return string representation of an

object but str doesn't try to return a string that is acceptable to eval.
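
A small sketch of the str/repr difference for a string value (the variable names are illustrative):

```python
text = "it's\n"

as_str = str(text)           # the text itself, newline included
as_repr = repr(text)         # a quoted, escaped literal describing the text
round_trip = eval(as_repr)   # evaluating the repr rebuilds the original string
```

For built-in types like strings and numbers the repr can usually be fed back to eval; for arbitrary objects that is not guaranteed.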

Check out built-in function range that creates a list containing an arithmetic

progression. Eg. range(0, 10) -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] or range(1,

5, 2) -> [1, 3] or range(-1, -8, -3) -> [-1, -4, -7].

Built-in function reduce is interesting. Definition: Apply function of two arguments cumulatively to the items of sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). Another example. reduce(func, [1, 2, 3, 4, 5]) is the same as:

x = func(1, 2)

x = func(x, 3)

x = func(x, 4)

x = func(x, 5)

return x

AND

Yet another example of reduce:

reduce(operator.add, range(n), m) # really is the same as the command: sum(range(n), m)

AND

It seems that reduce is great for associative operations but gets confusing if used for other purposes.
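
A short sketch of why non-associative operations get confusing: reduce always folds from the left. The import guard is an assumption to keep the snippet runnable on newer interpreters, where reduce lives in functools.

```python
import operator

try:
    from functools import reduce   # Python 2.6+ and Python 3
except ImportError:
    pass                           # older Python 2: reduce is a built-in

# Addition is associative, so the grouping does not matter.
total = reduce(operator.add, range(5), 10)      # 10+0+1+2+3+4

# Subtraction is not: reduce computes (10-3)-2, never 10-(3-2).
left_fold = reduce(operator.sub, [10, 3, 2])
```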

What is the difference of built-in functions eval and exec/execfile?

Built-in functions enumerate vs. iter? Similar but iter returns only the

value whereas enumerate returns the index position as well. From its

returned object, use its next() to go through each item.

list = ['1', '2', '3']

e = enumerate(list)

>>> e.next()

(0, '1')

>>> e.next()

(1, '2')

>>> e.next()

(2, '3')

>>> e.next()

Traceback (most recent call last):

File "<interactive input>", line 1, in ?

StopIteration

---------

>>> i = iter(list)

>>> i.next()

'1'

>>> i.next()

'2'

>>> i.next()

'3'

>>> i.next()

Traceback (most recent call last):

File "<interactive input>", line 1, in ?

StopIteration

How can an optional argument be the first and third position for built-in

function slice?

list[start:stop:step] # possible to specify "step"
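
The slice built-in follows the same convention as range: one argument is taken as stop, two or three as start, stop and step. A minimal sketch (the list and names are illustrative):

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_four = slice(4)                # same as slice(None, 4, None)
every_other = slice(None, None, 2)   # same as [::2]
middle = slice(1, 5, 2)              # same as [1:5:2]

picked_first = letters[first_four]
picked_other = letters[every_other]
picked_middle = letters[middle]
```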

Seems that module operator contains functions corresponding to Python operators like +, -,

and so on.

If you import a module that imports another module, the scope prevents you

from using the second module unless, of course, you import it yourself.

() is a tuple, an immutable (read-only) sequence

[] is a list, a mutable sequence

{} is a dictionary

Use built-in function type to query an object's type. To compare it against

a known type, import the types module to get the list of possible types.

What is built-in function slice, staticmethod?

What good is zip?

zip([[1, 2, 3], ['a', 'b', 'c']])

[([1, 2, 3],), (['a', 'b', 'c'],)]
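
zip becomes useful when each sequence is passed as its own argument rather than wrapped in one list. A small sketch (list() is wrapped around zip so the snippet also works where zip returns an iterator):

```python
numbers = [1, 2, 3]
letters = ['a', 'b', 'c']

pairs = list(zip(numbers, letters))   # pair up items by position
lookup = dict(pairs)                  # handy for building a dictionary
unzipped = list(zip(*pairs))          # the * spreads the pairs back out
```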

Boolean operations "or" and "and" always return one of their operands.
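
A quick sketch of what "return one of their operands" means in practice (the values are illustrative):

```python
name = "" or "anonymous"     # first operand is false, so "or" returns the second
kept = "alice" or "unused"   # first operand is true, so "or" returns it
empty = [] and "unused"      # first operand is false, so "and" returns it
last = [1] and "reached"     # first operand is true, so "and" returns the second
```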

"import foo" doesn't require a module object named foo to exist. Why?

A special member of every module is __dict__.

Functions have another special attribute f.__dict__ (a.k.a. f.func_dict)

which contains the namespace used to support function attributes.

Variables with double leading underscore are "mangled" to provide a simple

but effective way to define class private variables. Any identifier of the

form __spam (at least two leading underscores, at most one trailing

underscore) is textually replaced with _classname__spam, where classname is

the current class name with any leading underscores stripped.
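
A minimal sketch of the mangling rule, using a hypothetical Account class:

```python
class Account(object):
    def __init__(self):
        self.__balance = 100        # stored as _Account__balance

    def balance(self):
        return self.__balance       # mangled the same way inside the class


acct = Account()
inside = acct.balance()
mangled = acct._Account__balance          # the mangled name is still reachable
plain_exists = hasattr(acct, '__balance') # the unmangled name does not exist
```

So the mechanism discourages accidental access rather than enforcing privacy.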

What is the difference between function and method?

Special attributes of several types of object: __dict__, __class__,

__bases__, and __name__.

When dealing with exception, remember that Python will traverse through the

list of handled exceptions and pick the first one that matches. Check

http://docs.python.org/lib/module-exceptions.html for a class hierarchy of

exceptions.

Wanting to write an extension module (usually a module written in C which

Python can call)? You can use Pyrex or SWIG.

Missing your switch/case statement? Consider using a dictionary:

functions = { 'a':function_1, 'b':function_2, 'c':function_3 }

func = functions[value]

func()

Or to call a function by name (a la Java's reflection),

method = getattr(self, 'theMethod_' + str(value))

method()
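
Both approaches can be sketched with a fallback for unknown values, which plays the role of a default case. All function and class names here are hypothetical:

```python
def start():
    return "starting"

def stop():
    return "stopping"

def unknown():
    return "unknown command"

commands = {'start': start, 'stop': stop}

def dispatch(value):
    # dict.get supplies the "default:" branch of a switch
    return commands.get(value, unknown)()


class Handler(object):
    def handle_start(self):
        return "starting"

    def handle(self, value):
        # look the method up by name; None means no such method
        method = getattr(self, 'handle_' + str(value), None)
        if method is None:
            return "unknown command"
        return method()
```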

Want to speed up Python? See, for example, Psyco, Pyrex, PyInline, Py2Cmod,

and Weave.

Default values are created exactly once, when the function is defined. Thus,

it is usually a bad practice to use a mutable object as a default value since

subsequent calls will remember the previous value. Use an immutable default instead. This

feature can be useful though if you want to cache and to avoid using global

variables. Eg.

def foo(D=[]): # Danger: one list object is shared by all calls

    D.append("duh")

    return D

foo() # you get ['duh']

foo() # here you get ['duh', 'duh']
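
A common way to avoid the shared default is to use None as a sentinel, while the caching trick keeps the mutable default on purpose. A minimal sketch (add_duh and square are hypothetical names):

```python
def add_duh(items=None):
    # None is immutable, so a fresh list is built on every call
    if items is None:
        items = []
    items.append("duh")
    return items


def square(n, _cache={}):
    # the shared dict is intentional here: it remembers earlier results
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]


first = add_duh()
second = add_duh()
```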

Never use relative package imports. If you're writing code that's in the

package.sub.m1 module and want to import package.sub.m2, do not just write

import m2, even though it's legal. Write from package.sub import m2 instead.

Relative imports can lead to a module being initialized twice, leading to

confusing bugs.

If only instances of a specific class use a module, then it is reasonable to

import the module in the class's __init__ method and then assign the module

to an instance variable so that the module is always available (via that

instance variable) during the life of the object.

What the heck is * and **? Optional arguments and keyword parameters? Used

to pass from one function to another?
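
In a definition, *args collects extra positional arguments into a tuple and **kwargs collects extra keyword arguments into a dictionary; in a call, the same symbols unpack them again. A small sketch with hypothetical functions report and logged:

```python
def report(*args, **kwargs):
    # args is a tuple, kwargs is a dict
    return args, kwargs

def logged(func, *args, **kwargs):
    # pass everything through to another function unchanged
    return func(*args, **kwargs)


positional, named = logged(report, 1, 2, colour="red")

# unpacking an existing list and dict at the call site
same = report(*[1, 2], **{'colour': 'red'})
```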

Many ways to call a function by-reference, the best is to actually have the

function return a tuple. A trick is to by convention designate the first item in the

tuple to be the return code. If the first item says it is okay, then we can use

the rest of the tuple; otherwise don't. This is helpful if you want to return

(2,) to indicate 2 as an error code -> lets caller know not to check the rest

of the tuple.

In a class, several special methods (or are they called functions?):

__init__: the constructor

__call__: called when an instance of the class is called, as in instance().

Want to copy an object? For sequences use [:], for dictionaries use copy(),

for others use copy.copy() or copy.deepcopy().

Why does -22 / 10 return -3?

Integer division rounds toward negative infinity (floor division), so a result that has to be rounded is always rounded down.
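
A short sketch using the explicit floor-division operator //, which behaves the same on old and new interpreters:

```python
quotient = -22 // 10          # floors toward negative infinity
remainder = -22 % 10          # takes the sign of the divisor
both = divmod(-22, 10)        # quotient and remainder together

# the two always recombine to the original number
recombined = quotient * 10 + remainder

# truncating toward zero instead needs a float division first
truncated = int(-22 / 10.0)
```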

String to number? Use int(). Watch out if you are trying to convert a

string representation of a hexadecimal value (eg. "0x0f"): pass the base explicitly, as in int("0x0f", 16).

Number to string? For decimal, use str(). For hexadecimal, use hex(). For

octal, use oct(). You can also use the % operator for formatting, eg.

"%06d" % 144 # '000144'
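
The conversions in both directions, as a minimal sketch (variable names are illustrative):

```python
decimal = int("42")
from_hex = int("0x0f", 16)    # explicit base 16; the 0x prefix is accepted
auto_base = int("0x0f", 0)    # base 0 infers the base from the prefix
no_prefix = int("ff", 16)

as_text = str(144)
as_hex = hex(255)
padded = "%06d" % 144         # zero-padded to six digits
hex_digits = "%x" % 255       # hexadecimal without the 0x prefix
```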

Interestingly, in order to insert some characters into a string, you need to

first convert the string into a list (a = list("hello there")), use a list

operation to insert the characters (a[6:8] = "you"), and finally convert

it back into a string (''.join(a)).

Three techniques to call functions by names: dictionary, getattr(), and

locals()/eval().

Perl's chomp equivalence? Use s.rstrip() or s.splitlines()[0].

Reverse a sequence? For a list, call l.reverse(); it reverses the list in place and

returns None, so copy the list first if you need the original. For non-list, use either:

a) for i in range(len(sequence)-1, -1, -1): print sequence[i]

b) for i in sequence[::-1]: print i # works only on Python 2.3 and above

Replicating a list with * doesn't create copies, it only creates references

to the existing objects.
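
A minimal sketch of the pitfall with nested lists, plus one way around it (names are illustrative):

```python
shared = [[0] * 3] * 2        # two references to the same inner list
shared[0][0] = 1              # changes "both" rows

independent = [[0] * 3 for _ in range(2)]   # a new inner list per row
independent[0][0] = 1         # changes only the first row
```

The inner [0] * 3 is harmless because integers are immutable; the trouble only starts when the replicated items are themselves mutable.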

Sort a list based on values in another list? First merge them using zip,

then sort the zipped tuple, next extract using list comprehension:

>>> list1 = ["what", "I'm", "sorting", "by"]

>>> list2 = ["something", "else", "to", "sort"]

>>> pairs = zip(list1, list2)

>>> pairs

[('what', 'something'), ("I'm", 'else'), ('sorting', 'to'), ('by', 'sort')]

>>> pairs.sort()

>>> result = [ x[1] for x in pairs ]

>>> result

['else', 'sort', 'to', 'something']

The del statement does not necessarily call __del__ -- it simply decrements

the object's reference count, and if this reaches zero __del__ is called.

Despite the cycle collector, it's still a good idea to define an explicit

close() method on objects to be called whenever you're done with them. The

close() method can then remove attributes that refer to subobjects. Don't call

__del__ directly -- __del__ should call close() and close() should make sure

that it can be called more than once for the same object.

What is weakref? If the only reference to an object is a weakref, then

garbage collector is free to destroy it. Primarily used to implement caches

or mappings holding large objects.

Threading? Use threading module, not the low-level thread module. In CPython

the global interpreter lock lets only one thread run Python bytecode at a time, so threads help I/O-bound work more than CPU-bound work.

Truncate a file? f = open(filename, "r+"), and use f.truncate(offset);

Or also, os.ftruncate(fd, offset) for fd opened by os.open().

Copy file? Use module shutil.

Read/write binary data? Use struct module. Then you can use its members

pack and unpack. Example:

import struct

f = open(filename, "rb") # Open in binary mode for portability

s = f.read(8)

x, y, z = struct.unpack(">hhl", s)

The '>' in the format string forces big-endian data; the letter 'h' reads one

"short integer" (2 bytes), and 'l' reads one "long integer" (4 bytes) from

the string.
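
The same format string can be checked without a file by packing values first and unpacking them again. A minimal sketch with made-up values:

```python
import struct

fmt = ">hhl"                              # big-endian: short, short, long
size = struct.calcsize(fmt)               # 2 + 2 + 4 bytes

packed = struct.pack(fmt, 1, -2, 300000)  # values chosen for illustration
x, y, z = struct.unpack(fmt, packed)
```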

For homogeneous list of ints or floats, you can use the array module.

Bi-directional pipe trying to avoid deadlocks? Use temporary file (not

really an elegant solution) or use expectpy or pexpect.

Random number generator? Use module random.

Want to redirect stdout/stderr to a file?

sys.stdout = file(logFile, "w")

sys.stderr = sys.stdout

pathlist = os.environ['PATH'].split(os.pathsep)

except can take a tuple of exception classes
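
A minimal sketch of one except clause catching several exception classes (to_number is a hypothetical helper):

```python
def to_number(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        # int(None) raises TypeError, int("abc") raises ValueError
        return None


good = to_number("12")
bad_text = to_number("abc")
bad_type = to_number(None)
```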

DOTALL in re makes . match \n as well

import types, types can be used to tell what type of object reference you

have

There is no ternary operator (aka. conditional operator) aka ( a ? b : c)

in Python 2.3. There is a workaround but it makes the code even more

confusing to read. Python 2.5 and later add the form: b if a else c.

Locking a file can be achieved using fcntl. On some systems, like HP-UX,

the file needs to be opened with w+ or r+ (or something with +,

check documentation). Example:

import fcntl

...

file = open(filename, "w+")

fcntl.lockf(file.fileno(), fcntl.LOCK_EX)

...

file.close()

Silly way to retrieve current time in string:

import time, shutil

now = time.localtime()

timestr = ".%s%s%s.%s%s%s" % (now.tm_year, now.tm_mon, now.tm_mday, now.tm_hour, now.tm_min, now.tm_sec)
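
A less silly variant, as a sketch: time.strftime zero-pads every field, which the %s version does not. The fixed timestamp below is an assumption used only to make the difference visible.

```python
import time

# a fixed moment (2004-01-05 09:03:07) so the result is reproducible
moment = time.struct_time((2004, 1, 5, 9, 3, 7, 0, 5, 0))

naive = ".%s%s%s.%s%s%s" % (moment.tm_year, moment.tm_mon, moment.tm_mday,
                            moment.tm_hour, moment.tm_min, moment.tm_sec)
padded = time.strftime(".%Y%m%d.%H%M%S", moment)
```

Here naive comes out as '.200415.937' while padded is '.20040105.090307', which sorts correctly and is unambiguous. For the current time, pass time.localtime() instead of the fixed value.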

Want to backup while preserving permission (like cp -p)?

shutil.copy2(src, dst)

Want to compare type or want to know what type of an object you have?

Built-in function type will do it. Use as follows:

if type(foo) == type([]): print "it is a list"

# You can also compare it against the constants in the types module, eg. types.ListType.

Want to modify the sys.path at runtime without modifying source code?

Set environment PYTHONPATH and its content will be prefixed in front

of the default values of sys.path.

An example on regular expression, rex, re, regexp:

rex = re.compile("File (.*) is removed; (.*) not included in release tag (.*)")

sreMatch = rex.search(line)

if sreMatch: # or perhaps can be written as if sreMatch != None

    print "%s, %s, %s" % (sreMatch.group(1), sreMatch.group(2), sreMatch.group(3))

Best way to remove duplicates of items in a list:

foo = [1, 1, 2, 2, 3, 4]

set = {}

map(set.__setitem__, foo, [])

foo = set.keys()
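
On interpreters that have the built-in set type (Python 2.4 and later, which is newer than the version these notes target), the same idea can be sketched more directly. unique_in_order is a hypothetical helper for when the original order matters:

```python
foo = [1, 1, 2, 2, 3, 4]

unique_sorted = sorted(set(foo))   # duplicates removed, order normalized


def unique_in_order(items):
    # keep the first occurrence of each item, preserving input order
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


ordered = unique_in_order([3, 1, 3, 2, 1])
```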

Declare and initialize a dictionary?

foo = {

"CCU":("weight","SBP","DBP"),

"OE":("weight","HR","Temp","blood glucose")

}

input = sys.stdin.readline()

try:

    id = int(input)

except ValueError:

    print "Can't convert input to an integer."

    id = -1

input = sys.stdin.readline()

if input[0] in ("y", "Y"):

    print "it is a yes"

# check if python version is at least 2.3

if sys.version_info < (2, 3): print "gotta run at least 2.3."

# check if euid is root or 0

if os.geteuid() != 0: print "gotta run as root."

#environment variable CCSYSDIR

os.environ['CCSYSDIR']

# get password line entry for username from /etc/passwd

import pwd

userattr = pwd.getpwnam("username")

# userattr[2] == user id (uid)

# userattr[3] == group id (gid)

# set permission of a file so that it is rwx for the owner only

import os, stat

os.chmod(file, stat.S_IREAD|stat.S_IWRITE|stat.S_IEXEC)

# pause until the user presses enter (raw_input reads a whole line, not one keystroke)

raw_input("press enter to continue...")

# quickly open and write

file = open("thefile.txt", "w")

file.write("junk goes into this file\nsecond line of junk file\nand third and final line\n")

file.close()

LIBRARY MODULES

# get reference count of an object, usually one higher than expected

sys.getrefcount(obj)

import os

from os.path import join, getsize

for root, dirs, files in os.walk('python/Lib/email'):

    print root, "consumes",

    print sum([getsize(join(root, name)) for name in files]),

    print "bytes in", len(files), "non-directory files"

    if 'CVS' in dirs:

        dirs.remove('CVS') # don't visit CVS directories

import os

from os.path import join

# Delete everything reachable from the directory named in 'top'.

# CAUTION: This is dangerous! For example, if top == '/', it

# could delete all your disk files.

for root, dirs, files in os.walk(top, topdown=False):

    for name in files:

        os.remove(join(root, name))

    for name in dirs:

        os.rmdir(join(root, name))

Instead of using os.listdir(), use dircache.

lst = dircache.listdir(path) # get a list of all directories/files under path

dircache.annotate(path, lst) # extra, add trailing / for directory name

Single file comparison? filecmp.cmp(fname1, fname2 [,shallow=1])

filecmp.cmp returns True if the files seem equal, False otherwise.

Multiple files comparison in two directories? filecmp.cmpfiles(dirname1, dirname2, fnamelist [,shallow=1])

filecmp.cmpfiles returns a tuple of three lists: (matches,mismatches,errors)

Single directory comparison? filecmp.dircmp(dirname1, dirname2 [,ignore=... [,hide=...]])

ignore defaults to '["RCS","CVS","tags"]' and hide defaults to '[os.curdir,os.pardir]' (i.e., '[".",".."]').
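
A self-contained sketch of filecmp.cmp and filecmp.cmpfiles, assuming a scratch directory created with tempfile (the file names and the write helper are made up for the example):

```python
import filecmp
import os
import tempfile

workdir = tempfile.mkdtemp()

def write(name, text):
    # create a small text file in the scratch directory and return its path
    path = os.path.join(workdir, name)
    f = open(path, "w")
    try:
        f.write(text)
    finally:
        f.close()
    return path

a = write("a.txt", "same\n")
b = write("b.txt", "same\n")
c = write("c.txt", "different\n")

# shallow=False forces a comparison of the file contents
same = filecmp.cmp(a, b, shallow=False)
differs = filecmp.cmp(a, c, shallow=False)

# comparing a directory against itself: every listed file matches
matches, mismatches, errors = filecmp.cmpfiles(
    workdir, workdir, ["a.txt", "c.txt"], shallow=False)
```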

fileinput, something like cat?

glob: list pathnames matching pattern

pathnames = glob.glob('/Users/quilty/Book/chap[3-4].txt')

linecache: efficient random access to a file

linecache.getline('/etc/hosts', 15)

linecache.checkcache # has file been modified since last cached

os.path: path name manipulation

os.path.commonprefix

os.path.expanduser

os.path.expandvars

os.path.join

os.path.normpath: remove redundant path information

os.path.splitdrive: useful on Windows

os.path.walk ~= os.walk?

file.readline() # slow but memory-friendly

file.readlines() # fast but memory-hungry

xreadlines is better but is deprecated. Use idiom 'for line in open("bla"):' instead.

The commands module exists primarily as a convenience wrapper for calls to os.popen*().

commands.getoutput can be implemented as such:

def getoutput(cmd):

    import os

    return os.popen('{ '+cmd+'; } 2>&1').read()

# '+cmd+' is plain string concatenation: cmd is wrapped in shell braces { ...; } so that 2>&1 merges its stderr into stdout

Use dbm module to create a 'dictionary-on-disk'. This allows you to store key/value pairs on disk where both key and value are strings. You work with the dbm as though it is an in-memory dictionary.

If you need to store a key/value pair where the value is not just a string, use shelve module. Of course, it still can't store objects that are not pickle-able like file objects.

If shelve is not powerful enough for your need, try ZODB.

Prefer cPickle over pickle module. Example usage:

import cPickle

from somewhere import my_complex_object

s = cPickle.dumps(my_complex_object)

new_obj = cPickle.loads(s)

Module name collision? Use the keyword as.

import repr as _repr

from repr import repr as newrepr

datetime module giving you a headache since you don't know how to tell it to set the dst? The datetime.datetime constructor has no dst argument; the flag lives in the time tuple. Pass -1 as the last field (tm_isdst) of the tuple given to time.mktime() and Python will figure out the dst for you.