  What is Python?
    Why Python3?
    Hello, world
    Basics
      Commenting
      Indenting
      Variables
      Calling functions
      "pass"
      "None"
      "is" vs. "=="
  Data types
    boolean
      Logic operations
    integer
      math
      bit manipulation
    floating-point
      fixed-point
    string
      comparison
      concatenation [global "+" operator]
      replication [global "*" operator]
      length [global "len" function]
      subset check [global "in" operator]
      substring [global "[]" operator]
      formatting with format [method]
      formatting with rjust/ljust/center/zfill [member functions]
      formatting with pprint
      count [method]
      startswith [method]
      endswith [method]
      find/index [method]
      isdigit [method]
      join [method]
      strip [method]
      lstrip [method]
      replace [method]
      partition [method]
      split [method]
      regex (regular expressions)
    set
      add [method]
      remove/discard [methods]
      "in" [global operator]
      "len" [global function]
      issubset ("<=", "<") [method]
      issuperset (">=", ">") [method]
      union ("|", "|=") [method]
      intersection ("&", "&=") [method]
      difference ("-") [method]
      symmetric_difference ("^") [method]
      comprehensions
    list/tuple
      slicing
      adding elements (append/extend/+)
      inserting elements (insert)
      removing elements (del, remove, pop)
      size (global 'len' function)
      searching (index)
      counting (count)
      sorting (sort method)
      sorting (sorted global function)
      reverse
      comprehensions
      tuple packing/unpacking
    dict
      size (global "len" function)
      query (global "in" operator)
      retrieval (global "[]" operator)
      removal (global "del" function)
      items [method]
      keys [method]
      values [method]
      pop [method]
      popitem [method]
      update [method]
      comprehensions
      the defaultdict class
    collections.deque
    array.array
  Control statements
    condition checking
    if
      ye olde ternary operator
      assert
    while
    for
      range
  Exceptions
    catching
    raising your own
    common exceptions to know about
  I/O
    Input from stdin
    Output to stdout
    Output to stderr
    Input from files
    Output to files
    os.listdir(..)
    repr vs. str
  Functions
    Return value(s)
    Pass-by-value-of-reference
    Default parameter values
      DANGER: mutable default values
    Named arguments
    Variable arguments
      variadic-style
      hash-style
      using all of the above
      unpacking variable args
    lambda functions
  Modules
    module search path
    "import" vs "from .. import"
    renaming identifiers
  Packages
  Classes
    instantiating
    inheriting
    constructor
    destructor
    "string operator"
    __getitem__
    __setitem__
    __getattr__ (autoload)
    operator overloading
    RTTI
    iterators
    generators
    static methods
    static variables
  Scoping
    function
    enclosing function
    module
    built-in
  OS and system functions
    sys.argv
      argparse
    os.getcwd() / os.chdir()
    os.environ[]
    sys.executable
    os.system()
    shutil.copyfile() / shutil.move()
    os.mkdir / os.makedirs
    glob
    date and time
    os.path
  Processes
    subprocess.call (a.k.a. "system")
    subprocess.check_output (reading stdout)
    subprocess.Popen (driving stdin)
    os.getpid()
    socket.gethostname()
  Introspection
    type
    dir
    callable
    isinstance
    issubclass
    getattr
  docstrings
  Serialization with pickle
  Misc
    zip
    global variables
      __name__
    networking
    threading
    logging
    profiling

What is Python?

Python is an interpreted programming language whose design priority was to bridge the gap between C (which requires consideration of an immense number of very low-level details) and shell scripting (which is missing so many features it's not usable on even medium-sized projects). Some have called it "a cleaned-up improvement of perl", which is true, but that wasn't its design goal. Python has become wildly successful, especially at Google (due in no small part to its creator (Guido) having worked at Google, I'm sure). It is far from being a perfect language, but it is much better than C, shell, and perl for creating and maintaining production-quality code.

This page is meant to be two things:

  1. a tutorial for people learning python3
  2. a reference
Which is why it's called a "tutref".

The official python documentation is a little weird, in a few ways. First, due to the schism between python2 and python3, googling for python help usually gets you to a python2 page, so you have to use the pulldown menu to find the Python3 version of the documentation. (Which, actually, is pretty sophisticated, given that some things have moved around.) Second, and worse, their "search the documentation" feature lists results in alphabetical order (and very slowly), so it goes from section "1" to section "10" (then "11", "12", etc) instead of going to section "2". If you need something from section 6, good news: you have time for a Starbucks run!

Why Python3?

This page is focused (exclusively) on Python3, because most Python tutorials and books are for Python2.

There are so many changes between Python2 and Python3 that Congress is considering declaring the official release notes cruel and unusual punishment. However, if you need an executive summary: Python3 is an even more cleaned-up version of Python2. And if you need a shorter executive summary: 3 > 2.

Hello, world

What is any programming-language tutorial without a "hello, world" example to kick it off? But it can't just be "hello, world", it has to be "hello, world" from Marvin, the depressed robot in The Hitchhiker's Guide to the Galaxy:

print("Hello, cruel world.")
Things to notice right away:

  1. print is a function, so its argument goes in parentheses.
  2. there are no semicolons, no main(), and no boilerplate -- one line is a complete program.
  3. strings can be enclosed in double quotes (single quotes work too; more on that shortly).

Basics

Commenting

Python's comment marker is a single "#", which makes the rest of the line a comment. (Just like perl, and kind of just like tcl.)

Python does not really have a block-comment mechanism, but like other languages without a real block-comment mechanism there's always some other element of the language you can misuse to get it. In Python's case, you can abuse "docstrings", which are like normal strings except they can wrap around lines without any special handling. The most common docstring to use for commenting is the three double-quotes (but you can also use three single-quotes if you prefer).

Predictably, docstring-based block comments do not nest. My best-practices tip: use three double-quotes for docstrings, and use three single-quotes for block commenting. Then at least they don't trample on each other.
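
For example, a sketch of commenting out a chunk of code with a triple-quoted string:

x = 1
'''
Everything in here is "commented out" -- really it's just an unused
string constant sitting in the middle of the code.
x = 2
'''
print(x)   #=> 1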

Indenting

Python is unusual in that the source code's leading whitespace is significant. Instead of identifying blocks of code with surrounding curly braces ("{}"), Python identifies them by their relative indentation levels.

This system has its advantages and disadvantages, but there's no way to subvert it so you better start getting used to it. :)
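
For example, the body of this if/else is defined purely by its indentation:

x = 3
if x > 0:
  print("positive")
  print("still inside the if")
else:
  print("zero or negative")
print("outside the if/else entirely")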

Variables

Python's variables have no sigils (such as perl's dollar ($), percent (%) and at (@)). They are not declared, but rather autovivify on assignment.

Variables do have an associated type; we'll get to that in a bit.

The scope of variables is limited to their local function. There are at least two surprises as a result:

  1. if you need to write to a global variable, you have to declare it (in the function) as global using the global keyword.
  2. even if you autovivify a variable inside a loop (or other sub-block), the variable still lives as long as the function.
Curiously, you can use del to unscope a variable!
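
A quick sketch of all three points (counter and increment are hypothetical names):

counter = 0              # a global variable

def increment():
  global counter         # required, since we write to it
  counter += 1
  for i in range(3):
    last = i             # autovivified inside the loop...
  print(last)            #=> 2 ...but still alive after the loop ends
  del last               # ...and now it's gone

increment()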

You can chain assignment statements:

x = y = z = 0

You can also multi-assign:

a, b = 0, 1

You can also chain multi-assigns:

x1,x2 = y1,y2 = 0,MAX

Calling functions

Python keeps function-calls the same as other languages:

ret = function_name(arg1, arg2)

Note that the parentheses are required; leaving them off gives you a reference to the function object itself instead of calling it.

One of the coolest things about python is that functions are allowed to return multiple values:

ret1, ret2, ret3 = function_name(arg1, arg2)

"pass"

Python has a special keyword called pass that lets you define stub functions, classes, loops, or anything else that needs an indented body. (A special keyword is necessary because a block can't be empty -- it must contain at least one statement.)

def my_function():
  pass
class my_class:
  pass
while 1:   # busy-wait
  pass

"None"

Python has a keyword specifically for undefined values: None. You can treat it as a regular constant (think of NULL in C++, or undef in perl). It is what is returned if a return has no argument.

a = None
def no_return():
  return
if no_return() is None:   # ("== None" also works, but "is" is idiomatic; see the next section)
  ..

"is" vs. "=="

Python makes a distinction between "do these two things look the same" (==) versus "do these two things live at the same spot in memory" (is). The difference is important when you start dealing with objects, and in particular when they overload the __eq__ function.

A quick example:

a = "abcd"
b = "cdef"

a[2:4] == b[0:2]   # is True because "cd" looks like "cd"
a[2:4] is b[0:2]   # is False because the two slices are separate string objects (even though they look alike)

Data types

Python's variables have data types -- they are either an integer, a floating-point number, a string, a boolean, or an object. (In contrast, perl lumps all of those together as a generic scalar, and tcl considers everything a string. C also has data types, but has way more than Python -- it refines "integer" as signed vs. unsigned, and with a specific bit-width.) Python does not have a character type; characters are just single-element strings.

(Aside: python has direct support for complex numbers. However, those things are useless to 99.99867% of the planet, so I'm ignoring them here. Though, as an electrical engineer I do have to give props to Python for using "j" instead of "i" in their notation.)

Curiously, variables are not declared as a specific type. Their type is determined when they are assigned (and yes they can be reassigned as a different type later).

boolean

The most basic of types, boolean values are either True or False. The following are all considered False:

False    direct boolean "false" value
0        integer zero
0.0      floating-point zero
None     Python's "undefined" value
""       empty string
()       empty tuple
[]       empty list
{}       empty dictionary or set

Everything else is considered True in boolean context.
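
You can check a value's truthiness directly with the bool function:

bool("")       #=> False
bool("moo")    #=> True
bool([])       #=> False
bool([0])      #=> True (the list isn't empty, even though its one element is falsy)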

Logic operations

Logical operations are different from other languages because Python spells out the operation name (instead of using "||" or "&&" or "!"):

x and y
x or y
not x

Note: in Python, these operators use short-circuit evaluation, which means that if the answer can be determined from just the left part, then the right part won't be evaluated at all. For example, for and to return True, both parts have to be True -- if the left part is false, we already know the whole expression will be False, so the right part doesn't matter. This can be used for several sneaky things; my favorite such sneaky thing is printing debugging messages only when the variable VERBOSE is set to True:

  VERBOSE and print("got to this point!")
More universally relevant, though, short-circuiting is also worth leveraging as an optimization, to avoid unnecessary expensive operations:
  # order this so we don't do slow_operation() if fast_operation() is already false:
  if fast_operation() and slow_operation():
  ..
  # order this so we don't do slow_operation() if fast_operation() is already true:
  if fast_operation() or slow_operation():
  ..

integer

Python3's built-in integer type has arbitrary precision -- it grows to hold whatever value you give it, so there's no overflow to worry about.

Tip: you can specify octal (base-8) numbers with a "0o" prefix, hexadecimal (base-16) with a "0x" prefix, and binary (base-2) with a "0b" prefix.
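
For example:

a = 0o755    # octal   (493 in decimal)
b = 0x1ff    # hex     (511 in decimal)
c = 0b1010   # binary  (10 in decimal)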

Integer comparisons are the same as other languages:

<      less than
<=     less than or equal to
>      greater than
>=     greater than or equal to
==     is equal to
!=     is not equal to

You can typecast strings into ints with the int function:

my_str = "12"
my_int = int(my_str) 

math

Math is done with the usual infix operators, so this is also the same as other languages. Python's math operators are:

x + y    addition
x - y    subtraction
x * y    multiplication
x / y    division
x // y   integer (floored) division
x % y    modulo
x ** y   exponent (same as pow)
-x       negation
+x       no-op

Python's (useful) built-in math functions are:

abs         divmod
pow         round
Additionally, there is a math module that provides some more functions:
ceil        cos
floor       fmod
log         pow
sin         sqrt
tan         trunc

math.e and math.pi are there as well, as attributes.
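
For example:

import math
math.sqrt(2)       #=> 1.4142135623730951
math.floor(3.7)    #=> 3
math.log(math.e)   #=> 1.0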

Random numbers (and random list choices!) can be generated from the random module:

import random
random.choice(['a', 'b', 'c'])  #=> returns one of them at random!
random.random()                 #=> returns a random floating-point num between 0 and 1
random.randrange(10)            #=> returns a random int from 0 up to (but not including) 10

bit manipulation

Python's bitwise operations are the same as most languages:

~x       not
x & y    and
x | y    or
x ^ y    xor
x << y   shift-left
x >> y   shift-right

floating-point

For all intents and purposes, these are treated the same as integer values, so the same operators and comparisons work.

(I reiterate the common warning not to use direct equality checks on floating-point values, since that has issues with binary floating-point representations.)

You can typecast a variable to floating-point with the float function.

fixed-point

Python has a module for fixed-point math called decimal. It's slower than binary floating-point, but if you're dealing with money it may well be necessary due to the evilness of floating-point math.
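
A tiny sketch of what it buys you:

from decimal import Decimal
0.1 + 0.2                         #=> 0.30000000000000004  (binary float noise)
Decimal("0.1") + Decimal("0.2")   #=> Decimal('0.3')        (exact)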

string

Python allows you to enclose strings in either single- or double-quotes. Unlike perl, the two are completely interchangeable in Python -- escape sequences are treated the same in both. (If you want Python to not interpret escape sequences, prefix the string with "r" to get a raw string.)

For multi-line strings, you must escape the end-of-lines, and the end-of-line does not translate as a carriage return.

Python also has the triple-quoting mechanism, which is where you use either single-quotes (''') or double-quotes (""") three times in a row. It behaves differently only for multi-line strings: line breaks inside triple-quoted strings are preserved in the string and don't need to be escaped.

a = 'two-line\nstring'
b = "two-line\nstring"
c = r'one-line\nstring'
d = r"one-line\nstring"
e = 'another one-line\
string'
f = 'another two-line\n\
string'
g = """yet another
two-line string"""
h = '''and yet
another two-liner'''

You can "typecast" anything into a string with the str function. (It's not really typecasting so much as interpreting in a string context, which you need for printing.)

my_num = 10
my_str = str(my_num)

Python's string operations are either global functions or methods of the string class. I'm not entirely sure why they're not all methods, but fortunately there aren't too many that aren't. Unfortunately, they're the most common ones.

comparison

In Python, you compare strings for equality with == and !=. How nice! <ignore my glaring at perl>

concatenation [global "+" operator]

Python concatenates strings with +.

a = "s1" + "s2"   #=> "s1s2"
a += "s3"         #=> "s1s2s3"

Curiously, Python also allows you to implicitly concatenate adjacent string constants, which is a C thing:

a = "s1" "s2"   #=> "s1s2"

replication [global "*" operator]

You can replicate strings by "multiplying" them:

print("-" * 80)
print("-- SECTION 4")
...
print("-" * 80)

length [global "len" function]

this_length = len(this_string)

subset check [global "in" operator]

in returns whether or not one string exists in another.

if 'Py' in 'Python':
  ...
Note: as syntactic sugar, you have the option of negating this by putting the "not" next to the "in", so that it reads like English:
"b" not in "asdf"   # works
not "b" in "asdf"   # exact same

substring [global "[]" operator]

You can fetch substrings of a string with the array brackets ("[]"). This is exactly the same syntax as we'll see later for extracting slices from lists.

You can fetch a single character with a single, zero-based index:

one_char = big_string[3]  # fourth character of big_string
last_char = big_string[-1]

Or you can fetch a range of characters with [n:m], which fetches characters n up to but not including m!

sub_string = big_string[2:7]  # third through seventh (not eighth!) characters

You can also leave off one or the other to implicitly say "beginning" or "end":

first_part = big_string[:10]
last_part = big_string[10:]

You can use any arbitrary expression for those slice indices; these examples all used constant numbers, but you can use variables and math all you like.

formatting with format [method]

format is a string method that lets you embed values into a string and control exactly how they're displayed (field width, justification, precision, and so on).

It's Python's 10-ton hammer to handle lots of things in a single place, so bear with me as we get increasingly more sophisticated with it.

The most basic thing to do is embed variables in strings. Here, you can think of "{}" as the "%s" from printf:

# the manual way:
name1 = "cats"
name2 = "dogs"
madlibs = name1 + " hate " + name2   #=> "cats hate dogs"
# using format:
madlibs = "{} hate {}".format(name1, name2)  #=> "cats hate dogs"

However, these are actually positional, so you can specify which of the arguments to use in which position:

madlibs = "{0} hate {1}".format(name1, name2)  #=> "cats hate dogs"
madlibs = "{1} hate {0}".format(name1, name2)  #=> "dogs hate cats"

More usefully, though, they can actually be named:

madlibs = "{animal1} hates {animal2}".format(animal1="cats", animal2="dogs")

And since they can be named, we can also shove them into a hash, giving us the final and most sophisticated way to embed variables in strings:

h = {'animal1':'cats', 'animal2':'dogs'}
madlibs = "{animal1} hates {animal2}".format(**h))
(No, we haven't covered hashes or the "**" thing yet. Hashes have their own section coming up, and the "**" thing is described in the "Functions" section.)

So now that we've embedded variables in strings, here's how to format them: after that (optional) field name/number, you can add ":" and a printf-like modifier. A bare number specifies a minimum field width (strings are left-justified within it by default; numbers are right-justified):

madlibs = "{:6} hate {:6}".format("cats", "dogs")  #=> "cats   hate dogs  "
Most printf-style formatting directives have an equivalent here. A particularly useful ability is rounding:
print("12/7 is {:.3f}".format(12/7))  #=> "12/7 is 1.714"

History lesson: there's a global "%" operator that works kind of like format and printf/sprintf. It was the standard formatting mechanism for a long time, so you'll see it all over the place in existing code, but format has superseded it, so try to avoid "%" in new code.
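
For recognition purposes only, the old style looks like this:

madlibs = "%s hate %s" % (name1, name2)   #=> "cats hate dogs"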

formatting with rjust/ljust/center/zfill [member functions]

While the full power of the printf functions will let you do right/left/center justification with either spaces or zeros, Python strings also have a few methods to do that directly without having to remember printf syntax.

rjust takes a field width and returns the original string with prefixed spaces to fill the width.

ljust adds spaces to the end of the string to fill the requested width.

center adds spaces to both ends.

zfill is meant for numeric values. It does the same as rjust except it prefixes zeros instead of spaces. It will preserve an initial negative sign if necessary.

str = "foo"
print(str.rjust(5))  #=> "  foo"
print(str.ljust(5))  #=> "foo  "
print(str.center(5)) #=> " foo "
print(str.zfill(5))  #=> "00foo"

formatting with pprint

Python's equivalent to perl's Dumper module is called pprint. The object-oriented way to use it is:

import pprint
pprintobj = pprint.PrettyPrinter()
pprintobj.pprint(my_thing)
(If creating an object just to print something feels like too much overhead -- it does to me -- there's also a module-level shortcut: pprint.pprint(my_thing).)

count [method]

Counts the number of times a given substring appears. Note that if you just want to know if a substring exists, use in instead!

str = "this is a test"
if str.count("is"): ...   # works, but inefficient
if "is" in str: ...       # much better

startswith [method]

Returns True/False if the string starts with the given string.

b = "moo".startswith("mo")  # yep

endswith [method]

Returns True/False if the string ends with the given thing.

str = "moo"
str.endswith("oo")  # yep

find/index [method]

Returns the first place where one string exists in another. If the string is not found, then find will return -1, and index will throw an exception.

idx = big_string.find('moo')
Note: if you just want to know if a substring exists (not where it is), it is much more efficient to use the aforementioned in instead of find.

isdigit [method]

Returns True if the string is non-empty and consists entirely of digit characters.

There are many other is* functions.
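
For example (isalpha and isspace are two of the other is* functions):

"12345".isdigit()   #=> True
"123x5".isdigit()   #=> False
"abc".isalpha()     #=> True
" \t  ".isspace()   #=> True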

join [method]

Joins the given thingamajigs together with the current string. The thingamajigs come from a list or tuple (which we haven't gotten to yet):

full_string = " ".join(my_list)

strip [method]

Removes certain characters from the beginning and ending of the string. By default it's whitespace, but you can make it whatever you need.

full_string = "   eek!   ".strip()   # just "eek!"

lstrip [method]

Just like strip (above) except that it only strips the left (beginning) of the string. (There's a matching rstrip for the right side.)

full_string = "   eek!   ".lstrip()   # just "eek!   "

replace [method]

Replaces all occurrences of one sub-string with another.

new_string = old_string.replace("old", "new")

partition [method]

Splits a string on the first occurrence of a sub-string. It returns three values: the part before, the thing it found, and the part after.

splits = "this is a test".partition(" is") # returns ["this", " is", " a test"]

split [method]

Splits a string on every occurrence of a given sub-string.

my_list = " 1 2  3 ".split(" ")  # returns ["", "1", "2", "", "3", ""]

If you don't specify a sub-string, then it's subtly different -- it splits on all sequences of whitespace, and auto-trims the ends:

my_list = " 1  2  3 ".split()     # returns ["1", "2", "3"]

If you want to split on a regex instead of a string, you need to use the split function in the re module:

import re
fields = re.split(r"\s*,\s*", "this,is , a ,CSV, line")

regex (regular expressions)

(Full docs on Python's regexes are in the official re module documentation.)

The more I work with regexes in various languages, the more I start to appreciate the brevity of perl. And since perl is the gold standard for regexes, here are the pythonic ways to do various things you may want to do:

basic match

perl:   if ($str =~ /f.o/)
python: if re.search("f.o", str):

capture of one element

perl:   if ($str =~ /f(.o)/) { my $match = $1; ...
python:
  res = re.search("f(.o)", str)
  if res:
    match = res.group(1) ...

capture of two elements

perl:   if ($str =~ /(ab)?cd(e.*)?f/) { my ($m1, $m2) = ($1, $2); ...
python:
  res = re.search("(ab)?cd(e.*)?f", str)
  if res:
    m1 = res.group(1)
    m2 = res.group(2)
    ...

non-capture of parentheses

perl:   if ($str =~ /f(?:.o)*/)
python: if re.search("f(?:.o)", str):

case-insensitive match

perl:   if ($str =~ /f.o/i)
python: if re.search("f.o", str, re.I)

return captured strings, instead of success flag

perl:   my @matches = ($str =~ /f.o/g);
python: matches = re.findall("f.o", str)

substitution

perl:   $str =~ s/f.o/$newtext/;
python: str = re.sub("f.o", newtext, str)

substitution with back references

perl:   $str =~ s/f(.o)/m\1/;
python: str = re.sub("f(.o)", r"m\1", str)

set

Sets are implemented as hash tables, which means they are really good at two things:

  1. quickly telling you if something's in it
  2. uniquifying a list
It also means that there's no concept of ordering between the elements, so you won't be able to extract elements in the order you put them in.
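
The uniquifying trick is worth showing right away (my_list is a hypothetical list with duplicates):

unique_items = set(my_list)        # duplicates silently dropped
unique_list = list(set(my_list))   # ...and back to a list, in arbitrary order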

Sets are a built-in type with their own literal syntax. However, this syntax (curly braces) overlaps with the syntax for creating hashes, so the one ambiguous case (an empty set) has different syntax:

empty_set = set()
nonempty_set = {'a', 'b', 'c'}

Sets support an awesome number of functions, many of which exist in both method and operator form.

Worth mentioning is that there is also a frozenset class, which is the immutable version of set.

add [method]

Adds something to a set.

myset.add("asdf")

If you have the list ahead of time, you can add them all when you create the set:

myset = set(some_list)

remove/discard [methods]

Removes something from a set. If the thing is not found, remove will throw an exception, and discard will just keep calm and carry on.

myset.remove("asdf")
myset.discard("fdsa")

"in" [global operator]

Checks if something's in the set.

if "asdf" in myset: ...

"len" [global function]

Returns the number of elements in the set.

len(myset)

issubset ("<=", "<") [method]

Returns whether one set is a subset of another. Predictably, the "<" form checks whether it's a proper subset.

if small_set <= big_set: ...

issuperset (">=", ">") [method]

Opposite of issubset.
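
Mirroring the issubset example above:

if big_set >= small_set: ...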

union ("|", "|=") [method]

Returns the union (addition) of two sets.

big_set = small_set1 | small_set2

intersection ("&", "&=") [method]

Returns the intersection (overlap) of two sets.

small_set = big_set1 & big_set2

difference ("-") [method]

Returns one set without any of the elements of a second set.

good_tv = fox_programs - fox_news

symmetric_difference ("^") [method]

Returns the elements that are in one but not both of the sets. ("xor".)

my_xor = set1 ^ set2

comprehensions

Holy crap, comprehensions apply to sets, too! (OK we haven't gotten to comprehensions yet, but they're next, in the "list" section.)

s1 = {'a', 'b', 'c', 'd', 'e', 'f'}
s2 = {c for c in s1 if c not in 'powerade'}   #=> s2 is {'f', 'b', 'c'}..order is not preserved!

list/tuple

Lists and tuples are conceptually the same thing (arrays), but they have different syntax and semantics. (Don't read that sentence too carefully..)

         mutable?   notation
list     yes        brackets "[]"
tuple    no         parentheses "()"

Lists have direct syntax support in Python, so you create one directly like so:

my_list = ["a", "b", "c"]

(Note that the elements of a list do not all have to be the same type.)

Lists work the same as most other languages: you access elements by their zero-based numeric index, using square brackets. Python also lets you use negative indexes to step back from the end of the list:

a = my_list[0]    #=> "a"
b = my_list[-1]   #=> "c"

Lists may also nest. Yo dawg, I heard you like lists, so I put a list in your list:

a = [1, 2, 3]
b = [4, 5, 6]
c = [a, b, "foo"]   #=> [[1, 2, 3], [4, 5, 6], 'foo']

slicing

You can use slice notation with lists, and it works much the same as for strings. Note, however, that you can write to list slices (unlike string slices).

sub_list = my_list[4:7]  #=> [my_list[4], my_list[5], my_list[6]]
my_list[1:3] = ["moo"]   #=> replaces two items with the single "moo"

Also note that the way to copy a list is to use a slice on the whole thing:

copied_list = orig_list[:]

adding elements (append/extend/+)

append and extend both add elements to the end of the list. The difference is when the element you're adding is itself a list: append will add the list as a single object, whereas extend (and +) will add all the elements of the list as multiple objects.

my_list1 = ["a", "b"]
my_list2 = ["c", "d"]
my_list1.append(my_list2)   #=> ["a", "b", ["c", "d"]]
my_list1.extend(my_list2)   #=> ["a", "b", ["c", "d"], "c", "d"]

The only difference between + and extend is that + returns a new list instead of modifying an existing one.

inserting elements (insert)

insert inserts the given element at the given location, pushing the existing elements out of the way. It returns None, so you cannot chain it.

l = ['a', 'b', 'c']
l.insert(1, 'd')    #=> ['a', 'd', 'b', 'c']

removing elements (del, remove, pop)

del is a statement (not a function) that takes a list element (or slice!), removes those elements from the list, and returns nothing:

del my_list[2]
del my_list[4:6]

remove is a member function that takes a list element value. It will not return anything. Note that it will remove the first element of the list that has the given value:

my_list.remove('asdf')

pop will remove and return an element of the list. By default it operates on the last element, but you can give it an index to specify a different one:

my_list = ['a', 'b', 'c', 'd']
val = my_list.pop()   #=> val is 'd', my_list is ['a', 'b', 'c']
val = my_list.pop(1)  #=> val is 'b', my_list is ['a', 'c']

size (global 'len' function)

list_len = len(my_list)

searching (index)

Once again, using the in operator is more efficient if you just want to know if something's in a list, but index will tell you where exactly it is:

idx = my_list.index('asdf')

index will throw an exception if the element is not found; in merely returns False.

counting (count)

count returns the number of times something occurs in a list. You cannot use regexs or anything sophisticated here.

num_overachievers = test_scores.count(100)

sorting (sort method)

sort sorts the elements of the list. This operates on the list itself, rather than returning a new list. However, it returns None, so don't go trying to chain things together:

my_list.sort()

sorting (sorted global function)

sorted sorts the elements of a list, but returns a new list instead of modifying the list itself.

sorted_list = sorted(my_list)

sorted lets you specify a key function (typically a lambda), so you can sort by custom criteria. By default, sorted sorts by increasing value; here's how to make it sort by decreasing value:

sorted_list = sorted(my_list, key=lambda n: -n)   # (for this particular case, reverse=True also works)

reverse

Reverses the elements of a list, in place. It also returns None.

my_list.reverse()

comprehensions

You will undoubtedly encounter list comprehensions as you explore the world, for they are considered one of the most pythonic things one could possibly do. I think it's equivalent to genuflecting or visiting Mecca.

List comprehensions are very similar to perl's map function (in fact, Python has a map function as well). The idea is to succinctly convert one list into a second list using some translation that you specify. Let's say we have a list of numbers and we also want the list of them scaled by 0.2. The "long" way might be:

scaled_list = []
for x in unscaled_list:
  scaled_list.append(x*0.2)

List comprehensions are syntactic sugar to convert that to the following:

scaled_list = [x*0.2 for x in unscaled_list]

Notice that we basically just moved the "for x in unscaled_list" and "x*0.2" around so that they're inside the brackets, and in the other order.

So that's the Python equivalent of perl's map, but what about grep, which returns only elements matching certain criteria? List comprehensions can do that, too. Suppose we're filtering results from match.com and we want only girls between 16 and 19 and a half. (Stop looking at me like that. That's a Monty Python reference.) The "long" way:

hawties = []
for g in girls:
  if g.age >= 16 and g.age <= 19.5:
    hawties.append(g)

List comprehensions make this much more succinct:

hawties = [g for g in girls if g.age >= 16 and g.age <= 19.5]

You can also nest list comprehensions, but before you do that you should consult a licensed psychiatrist.

tuple packing/unpacking

Tuples are the read-only version of lists. (Python has mutable and immutable versions of everything..)

Tuples are used for packing and unpacking multiple values into a single variable. You can pack like so:

t = ()         #=> empty tuple
t = 'a',       #=> one-element tuple
t = 'a', 'b'   #=> two-element tuple
t = ('a', 'b') #=> exact same as above

You can then unpack a tuple by assigning it the other direction:

e1, e2, e3 = t    #=> must be 3 elements in 't' for this to work
(e1, e2, e3) = t  #=> exact same as above

dict

Python's dictionaries are what everyone else calls hash tables. (Or associative arrays.) They map keys to values, where the keys can be arbitrary strings. (Actually, they can be any immutable type, which is one of the reasons why Python has mutable and immutable versions of everything.)

Dictionaries are a primitive, so there is direct syntax support for creating them. Here's the most common way to create them:

english_to_german = {'one':'eins', 'two':'zwei', 'three':'drei'}
There are at least four other ways to create dict objects, but I'm not going to show them to you because this one is by far the best one to use.

size (global "len" function)

Our friend the "len" function is back, this time to tell us the number of elements in the dictionary.

num_keys = len(my_dict)

query (global "in" operator)

To see if something is in the dictionary, use in, just like for strings and lists.

exists = some_key in my_dict

retrieval (global "[]" operator)

To get an item from a dictionary, we use the brackets like we're looking up an array, except the index is a string (or other immutable data type) instead of a number!

my_val = my_dict[some_key]

If the key does not exist, python will actually throw a fit (a KeyError exception). To avoid that, you could either use in to see if you should try the retrieval at all, or you could specify a default value to return if the key's missing. Further, you can also have python add that default key if you want. To just return a default, use get, whose default default is None:

my_val = my_dict.get(some_key1)  # my_val will be None if some_key1 doesn't exist,
                                 # and my_dict will still not have some_key1
If you want python to set the missing key's value to the default at the same time, use setdefault:
my_val = my_dict.setdefault(some_key2, "moo")  # my_val would be "moo" instead
                                 # of None, and my_dict now has some_key2 for sure.

removal (global "del" function)

To remove an item from a dictionary, pretend you're fetching it with brackets but then put del in front of it.

del my_dict[some_key]

items [method]

Returns all the (key, value) pairs in the dict. This is the preferred way to walk a dictionary:

for this_key, this_value in my_dict.items():
  ..

keys [method]

Returns the list of all the keys in the dictionary.

for this_key in my_dict.keys():
  ..

Note that this is (more or less) the function called when you try to iterate on a dictionary, so the above example is equivalent to:

for this_key in my_dict:
  ..

A frequent need is to walk a dictionary in sorted order. For this, use the global sorted function. Also, remember you can use its key parameter to determine the sort order. Here's how to sort a dictionary by value (instead of by key):

filename2length = { ... }
print("Files by size:")
for this_file in sorted(filename2length, key=lambda n:-filename2length[n]):
  print("{} {}".format(filename2length[this_file], this_file)

values [method]

Returns the list of all the values in the dictionary.

for this_value in my_dict.values():
  ..

pop [method]

Removes and returns a specific element of the dictionary.

old_value = my_hash.pop(some_key)

popitem [method]

Removes and returns an arbitrary (key, value) pair from the dictionary. This might be useful if you need to destructively iterate. (Remember that it's dangerous to iterate over a list that's changing, so putting a del inside a loop going over keys is perilous.)

while my_hash:
  (key, value) = my_hash.popitem()
  ..

update [method]

Adds the key/value pairs from another dictionary to this one, overwriting any common items.

my_hash = {'a':0, 'b':1}
other_hash = {'a':2, 'c':3}
my_hash.update(other_hash)   #=> my_hash is now {'a':2, 'b':1, 'c':3}

comprehensions

OK, my mind is blown; there are comprehensions for dictionaries too. Use a colon (":") to separate the keys and values:

my_list = [0, 1, 2, 3]
num_to_square = {x:x*x for x in my_list}  #=> "{0:0, 1:1, 2:4, 3:9}", in some order

the defaultdict class

While the dict class has a setdefault function to let you specify what to do when the dict is asked for a key that doesn't exist, it only applies to the first level of hierarchy. If you want to truly emulate perl, you need to use the defaultdict class so that it will fill in everything for you.

from collections import defaultdict
d = defaultdict(int)  # unknown keys are auto-created with int()
for this_num in numbers:
  d[this_num] += 1    # first call initializes d[this_num] to int(), which is 0
for this_num in d:
  print(this_num, "occurred", d[this_num], "times")

collections.deque

Since lists aren't very good as queues, Python has a deque class in the collections module:

import collections
queue = collections.deque(['a'])   #=> ['a']
queue.append('b')                  #=> ['a', 'b']
queue.appendleft('c')              #=> ['c', 'a', 'b']
next = queue.pop()                 #=> next is 'b', queue is ['c', 'a']
next = queue.popleft()             #=> next is 'c', queue is ['a']
queue.extend(another_list)         # (adds elements of another_list to end of queue)
queue.extendleft(another_list)     # (adds reversed elements of another_list to front)

array.array

Python's lists can contain heterogenous objects, and that generality comes with a price: every element of a list is a reference to a full-blown Python object, which costs a surprising amount of memory per element. If you need better memory compaction and happen to know your list will contain homogenous numeric data, you can use array.array instead.

import array
arr = array.array('H', [0, 14, 3, 57])  # "H" means two-byte unsigned
I believe you can use these anywhere you could use a list. Do check the docs for the various ways to specify what kind of numeric data you have.

Control statements

condition checking

Before we launch into how Python uses conditions to control program flow, let's go over how conditions work.

The usual boolean operators are spelled out as and and or and not (instead of "&&" or "||" or "!"). They have the same precedence they do in other languages: not has highest, followed by and, followed by or.

Python's comparison operators (including in, not in, is, and is not) all have the same precedence, which is higher than the booleans.

Math operators are familiar, and have higher precedence than the comparison operators. However, one biggie in Python is that you can chain comparisons; the following are exactly the same:

if a < b == c: ..
if a < b and b == c: ..

Final note: you may not do an assignment inside a condition. I guess the Python designers got tired of that particular C bug.

if

The only things odd about Python's if are that it doesn't require parentheses, and that "else if" is condensed down to elif.

Tip: Python imported (from C) the ability to put short if statements on a single line.

if x < 0: print("[ERROR]")

if x < 0:
  print("[ERROR]")
elif x == 0:
  print("[WARNING]")
else:
  print("blah blah blah...")

There is no "switch" or "case" statement in Python; use a bunch of elifs instead.

ye olde ternary operator

Python decided to implement the typical ternary operator with a "suffix" form of if and else:

result = "ok" if num_errors==0 else "failed"

This is, to date, the most awkward construct I've seen in python. Yikes.

assert

Python has a built-in assert statement that you can use to make sure certain expected conditions are what you expect them to be. When the condition is false, assert raises an AssertionError exception. (And when the condition is true, it does nothing whatsoever.)

assert x>0
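
You can also attach your own message, which gets included in the AssertionError:

assert x > 0, "x must be positive"   # the message shows up in the AssertionError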

while

Like if, while doesn't require parentheses.

As usual, you can use continue to jump up to the next iteration of the loop, and break to break out of the loop right away.

while x < 10:
  x += 1

The weirdest thing ever, though, is that python has an else for loops (both while and for), which runs when the loop finishes normally -- i.e. the condition becomes false or the iterable runs out. (It's skipped if you leave with break.) The example from the real python tutorial is pretty cool -- it uses a for/else to tell you if a number is prime, by virtue of it not being divisible by anything:

def is_prime(x):
  for i in range(2, x):
    if x % i == 0:
      res = False
      break
  else:
    res = True
  print("Is", x, "prime? ", res)
  return res

for

Python's for is a little strange because it only does list iteration. That is, you only give it one thing: a list, or other iterable object. You do not give it a start command, a loop condition, and an iteration command.

for x in my_list:
  ...

Like while, Python's for also allows you to specify an else, which runs when the list is depleted (and you didn't exit the for with break).

(Note: avoid modifying a list in the middle of iterating over it. That doesn't tend to work very well in any language.)
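
If you really do need to remove items while looping, one common trick is to iterate over a slice-copy of the list (is_useless here is a hypothetical filter function):

for x in my_list[:]:       # iterate over a copy...
  if is_useless(x):        # (is_useless is hypothetical)
    my_list.remove(x)      # ...so modifying the original is safe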

range

Since for is only for iterables, how can you do a common iteration between two numbers? For that, Python gives you the function range, which returns an iterable over all the numbers you normally would have managed yourself. Voila:

for x in range(0, 2):
  print(x)

It may be important to note that range returns an iterable, not a list with all the actual numbers in it. That's good! (For memory.)

range assumes a start number of 0 and an increment of 1; both of those can be overridden:

range(3)   #=> 0, 1, and 2
range(1, 3)  #=> 1 and 2
range(1, 10, 3)  #=> 1, 4, and 7

Exceptions

All of Python's exceptions inherit from the base Exception class. (No, we have not yet covered either classes or inheritance, but exceptions are really one of the basic parts of Python so we can't wait any longer.)

Python's exceptions look very similar to C++'s: there's a try block, a subsequent except section, and a raise function. (When I say "look very similar", I mean architecturally, since C++'s terms are try, catch, and throw.)

Other oddities: Python introduces an else, which executes only if no exceptions were thrown, and you didn't leave the try with a break, continue, or return. try also has a finally, which always executes whether or not there were exceptions or you left with a break, continue, or return.

catching

A try can be followed by any number of except class [as var] catchers, and the first one that matches the actual exception will be used. (For a catch-all, you could either just leave off any exception class, or use Exception, which is the parent of nearly every exception class.)

try:
  ...
except ZeroDivisionError:  # catches just division-by-zero exceptions
  ...
except (RuntimeError, TypeError, SomeCustomErrorICameUpWith):  # catches any of these
  ...
except:  # catches any other exceptions not listed above
  ...

A confusing point: you can have an else here, but it does not mean the same thing as a catch-all except, even though that's exactly what you'd think reading it. else runs if the try block ran without any exceptions. else will not run if you leave the try block with break, continue, or return. (The reason to put code in else rather than at the end of the try block is that exceptions raised in the else part are not caught by the except handlers.)
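
Here's a sketch of where else and finally fit (risky_operation is a hypothetical function):

try:
  result = risky_operation()      # hypothetical
except IOError as e:
  print("failed:", e)
else:
  print("worked, got", result)    # only runs if no exception was raised
finally:
  print("this runs no matter what")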

Note that if you want the exception object itself, you can specify a "parameter" to the except. (To get the exception object in a catch-all except, you have to use the parent Exception class form.)

try:
  ...
except ZeroDivisionError as e:
  print("caught a div-zero exception:", e)
except Exception as e:
  print("caught generic exception:", e)

raising your own

Most languages "throw" exceptions; Python "raise"s them.

To raise an exception, you use the raise statement. For a built-in, you just use the built-in exception class:

raise ZeroDivisionError
raise ZeroDivisionError('custom message addition!')  # you can add your own text!

To make your own exceptions, define a new class that inherits Exception, and define at least the __init__ and __str__ functions:

class myexception(Exception):
  def __init__(self, msg):
    self.msg = msg
  def __str__(self):
    return str(self.msg)
  ...
try:
  raise myexception("wah!!")
except myexception as e:
  print("Caught something:", e)   #=> "Caught something: wah!!"

Note that you can trap and re-raise exceptions, which allows you to know what's going on without interfering:

try:
  ...
except:
  print("Eep!  Exception!  Re-throwing..")
  raise

Lastly, you can create an exception object without immediately raising it, so you can play with it before sending it along:

e = myexception("eek!")
e.do_something()
raise e

common exceptions to know about

Some built-in exceptions you'll run into constantly:

  KeyError            asked a dict for a key it doesn't have
  IndexError          list/string index out of range
  ValueError          right type, nonsensical value (e.g. int("moo"))
  TypeError           wrong type entirely (e.g. "a" + 1)
  ZeroDivisionError   division (or modulo) by zero
  IOError / OSError   file and operating-system problems
  KeyboardInterrupt   the user hit Ctrl-C

I/O

Input from stdin

You can read from stdin with the input function. It is line-based (as opposed to character-based).

print("WHAT...is your name?")
name = input()
print("WHAT...is your quest?")
quest = input()
print("ok, the 1960s called, they want their terminal-based input back.")

Output to stdout

You've already seen the print function. The only useful thing to add is the way to make it not print out the carriage return at the end:

print("A line without a carriage return!", end='')

Output to stderr

stderr is an awkward abomination in UNIX architecture, but for some reason people keep using it. Fortunately, using stderr in Python is also an awkward abomination:

import sys
print("I'm going to stderr!", file=sys.stderr)
A second way to do it, for some reason:
sys.stderr.write("I'm going to stderr!\n")  # carriage return needed for write()

Input from files

Python has file objects:

try:
  fh = open(filename, "r")
  for line in fh:
    # 'line' includes the terminating \n:
    line = line.strip("\n")
    ...
  fh.close()
except IOError as e:
  print("[ERROR] could not read", filename, ":", e)

A second way to do the exact same thing is using the with keyword. It automatically closes the file when the block is exited (no matter how -- end of block, exception, or return), which is more robust for resource cleanup:

with open(filename, "r") as fh:
  for line in fh:
    # 'line' includes the terminating \n
    line = line.strip("\n")
    ..
(The with function is totally crazy. Its whole deal is to set up "constructors" and "destructors" for generic blocks of code. In this case, the open returns a filehandle object that happens to also be a "Context Manager", and with calls special "Context Manager" functions at the beginning and end of the real block of code. It's..crazy. You can make your own Context-Manager-aware classes that could also be used with with. Assuming you, too, are crazy.)

You can read in the entire file in one shot ("slurping") with the read() function. You can also read in all the lines of the file in one shot with the readlines() function.

If the open fails, it raises an IOError exception.

The valid modes you can give to open are:

  "r"   read (the default)
  "w"   write (truncating any existing content)
  "a"   append
  "x"   create a new file for writing (fails if it already exists)
  "b"   binary mode (added to one of the above, e.g. "rb")
  "+"   open for both reading and writing (e.g. "r+")

Output to files

Since this calls write directly, you have to do your own object-to-string conversion. (One would think this would be built-in, just like print, but hey.)

fh = open(filename, "w")
fh.write("Line 1\n")          #=> note the \n is needed here
fh.writelines(list_of_stuff)  #=> each element must be a string with a \n!
fh.close()

If the open fails, it raises an IOError exception. (Or, if you used mode "x" for strictly creating a new file, open will raise a FileExistsError exception if the file already exists.)

os.listdir(..)

Returns a list of the contents of the given directory. It does not include the special "." and ".." entries (it does include hidden dot-files, though).

stuffs = os.listdir("/tmp")

repr vs. str

Python has two functions for converting generic values to human-readable strings: repr and str.

str is meant to convert things for humans to read on a terminal or file.

repr is meant to convert things for the Python interpreter to read.

Numbers are the same for both:

a = 10
print(str(a))   #=> 10
print(repr(a))  #=> 10

Strings get quoted by repr:

a = "moo"
print(str(a))   #=> moo
print(repr(a))  #=> 'moo'

Functions

Python functions are defined with the def keyword (for defined!). Function signatures include only input variable names; outputs are not declared, and there are no variable types at all.

def empty_func():
  pass

def useless_func(val):
  return val

Python treats functions as first-class values, meaning you can assign a variable to a function:

foo = useless_func
foo("bar")

Return value(s)

As mentioned earlier, one of Python's coolest things is the ability to return multiple values (without having to wrap them in an object!). Python does this by packing and unpacking them as a tuple:

def my_func():
  ...
  return "no errors", 17, False
...
err_str, line_count, is_flammable = my_func()

In a completely unrelated observation, I am mildly surprised that Python3 made print a function (and thus now requires parentheses) but did not change return (which is still allowed to not use parentheses).

Pass-by-value-of-reference

Python claims that it passes argument by value, though the values of arguments are always references. This is mostly pass-by-reference, except that you get a local copy of the reference for you to play with. Here's a quick illustration:

def myfunc(param):
  param.do_something()  # actually affects RealObj
  param = ...           # does not affect RealObj
  return
RealObj = ...
myfunc(RealObj)
I snarkily call this "pass by value of reference". The Python docs think maybe it should be called "pass by object reference".
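
Here's the same idea as a concrete, runnable sketch using a list (the names are made up):

def mangle(param):
  param.append("changed")         # mutates the caller's object
  param = ["totally new list"]    # rebinds only the local name

real_list = ["original"]
mangle(real_list)
print(real_list)   #=> ['original', 'changed']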

Default parameter values

Parameters can have default values, like so:

def my_func(arg1, arg2 = 42, arg3="asdf"):
  pass

Python requires that the parameters with defaults go after ones without them.

DANGER: mutable default values

The default value is only evaluated once, when the def statement runs, and thereafter it behaves like a static variable. Subsequent calls will inherit any changes made to it. Actual example from the Python docs:

def f(a, L=[]):
    L.append(a)
    return L

print(f(1))    #=> "[1]"
print(f(2))    #=> "[1, 2]"
print(f(3))    #=> "[1, 2, 3]"

This is ... just weird as hell. The default value for L is not actually what the code says it is, it's actually whatever it's been mangled to by whatever code happens to have run. Yikes.
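
The standard workaround is to default to None and create the list inside the function:

def f(a, L=None):
    if L is None:
        L = []       # a fresh list on every call
    L.append(a)
    return L

print(f(1))    #=> "[1]"
print(f(2))    #=> "[2]"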

Named arguments

A handy (and fairly unusual) Python feature is the ability to name your arguments at the call site, which allows you to pass them in any order. (Verilog does this too, but that's not a software language.)

def my_string_to_int(src, base = 10):
  ...

my_string_to_int(base = 8, src="0755")

Note one odd constraint: you can't put positional (that is, non-named) arguments after named arguments. :/

Variable arguments

Python has two ways of supporting an arbitrary number of parameters: variadic-style, and hash-style. Both of these can exist in the presence of formal parameters (that is, the usual style), and with each other.

variadic-style

Variadic-style is also known as varargs-style. What it does is package up all the unknown arguments into a tuple and put it into a single parameter (which you identify with an asterisk):

def my_concat(*args):
  return "::".join(args)

hash-style

In hash-style, unknown key=value arguments are put into a single hash parameter (which you identify with two asterisks):

def my_func(**stuff):
  for k in sorted(stuff.keys()):
    print(k, stuff[k])
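
Calling it:

my_func(animal="cow", sound="moo")   # prints "animal cow" then "sound moo"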

using all of the above

So now we have four different ways to specify parameters and arguments to functions:

formal, unnamed:
def f(arg):
  pass
..
f(foo)

formal, named:
def f(arg):
  pass
..
f(arg=foo)

variable, unnamed:
def f(*arg):
  pass
..
f(foo)

variable, named:
def f(**arg):
  pass
..
f(arg=foo)

Note: if you use both variadic-style and hash-style, you have to list the variadic-style arg before the hash-style arg:

def myfunc(real_param, *extra_args, **named_args):
  for arg in extra_args:
    ..
  for arg in named_args.keys():
    ..

unpacking variable args

The above let you pack a bunch of arguments into a single parameter; what if you have a single argument that needs to fill a bunch of parameters instead? Python lets you unpack these things with the same sort of syntax, though it's kind of out of place:

def myfunc(a, b):
  ...
# the unnamed version:
mylist = [10, 27]
myfunc(*mylist)
# the named version:
myhash = {'a':10,'b':27}
myfunc(**myhash)

lambda functions

Lambda functions are small, anonymous functions, and are primarily useful when function objects are the best way to write a particular piece of code.

add = lambda x:x+1
sub = lambda x:x-1
if start < end:
  incr = add
else:
  incr = sub
while start != end:
  ...
  start = incr(start)

You cannot do anything complex in lambda functions -- the body has to be a single expression (no statements), though that expression is free to call other functions.

The Python docs call this a nod to functional programming, but honestly I don't see why it's necessary. Functions are already first-class objects, so you can do the above already. /shrug

Modules

Packaging up code is pretty straightforward in Python. There are two ways to do it: one is by classes (which you can instantiate as objects) and the other is by modules (which you cannot instantiate, so they're more like a namespace).

To create a module foo, all you do is name your Python file foo.py. There is no package keyword like in perl; the expectation is that modules and files have a one-to-one mapping, so foo.py exactly describes the foo module.

To import your new foo.py module from another script, you run import foo. Python appends the ".py" extension and loads/runs whatever it finds in foo.py.

Any code at the top level of a module is executed the first time (and only the first time) the module is imported.

Modules are also a namespace (so, they have their own symbol table); a global declaration inside a module's functions refers to that module's own variables, not to some program-wide globals.

module search path

So where does Python find your foo.py, since it could be anywhere on the file system? Python searches the paths in the sys.path list until it finds one that has a foo.py. Initially, sys.path contains the following:

  1. the directory containing the script being run (or the cwd for interactive sessions)
  2. ENV{PYTHONPATH}
  3. system- and installation-dependent paths
You can write to sys.path at any time, so you have full control over where Python looks for foo.py. Suppose you have a dir that contains a top-level main.py that uses an adjacent Stuff.py; here's a good way to make sure main.py picks up its associated Stuff.py:
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.abspath(sys.argv[0])))
import Stuff

"import" vs "from .. import"

"import foo" imports whatever's in foo.py and preserves the namespace, so you have to qualify any identifiers you use. For example:

import mystuff
mystuff.conquerTheWorld()
mystuff.evilLaugh()

"from foo import *" also imports whatever's in foo.py, but rolls everything into the local namespace:

from mystuff import *
conquerTheWorld()
evilLaugh()
(Note that the "*" form does not import any identifiers that begin with an underscore; semantically those are considered internal functions.)

If you don't want to steamroll your local namespace like that, you can also selectively import just the things you want:

from mystuff import conquerTheWorld
conquerTheWorld()
evilLaugh()           # NameError -- evilLaugh wasn't imported

renaming identifiers

Should the need arise, you have the option of renaming identifiers when you import them. This can be abused to much merriment:

from mystuff import conquerTheWorld as petKittens
petKittens()   # d'awww...er, wait...

Packages

Packages are just collections of modules. Python has a little overhead and a little syntactic sugar for dealing with them. First, the overhead.

In order for Python to recognize packages, you must create an __init__.py file in the package directory. (It can actually be empty!) This file is a marker so that Python can avoid false positives looking for your package.

Example:

% find .
./myscript.py
./foo/bar/__init__.py
./foo/bar/mod1.py
./foo/bar/mod2.py

The syntactic sugar is that you can now use periods (".") in import names to denote subdirectories. (It's similar to perl's double-colon in things like use List::Util.)

% cat myscript.py
import foo.bar.mod1
from foo.bar import mod2
...

Since "import *" is a little strange for these things (do you mean all the identifiers in a module? or all the files in a dir? or all the subdirs in a dir? or all of the above?), Python has a mechanism for the package to declare what "import *" should do. See the python docs for how to define the __all__ attribute in your __init__.py.

You can do relative imports within a package, when you want to pick up a related module in the same package. The syntax is "from . import foo".

Classes

Classes work almost the same as any other language. To define a class, you use the class keyword, and then define any attributes (data) or methods (functions) you want in that class. Usually you only need to define methods, since Python autovivifies instead of declaring variables. (However, see the 'static' section below.)

class myclass:
  def myfunc(self):
    self.myvariable = "asdf"
Almost all of the methods and members in a class are public, so anyone can view and change anything.

instantiating

Creating an instance of a class is done a little strangely: you call the class name like a function, and Python implicitly knows that means to create an object:

myobj = myclass()

inheriting

To inherit from a base class, you include it in the class definition like so:

class Derived(Base):
  ...
Python supports multiple inheritance:
class Derived(Base1, Base2):
  ...

Everything in classes is virtual, which means that you will always pick up any overrides of methods (or attributes!) by derived classes. However, there may be cases where you don't want to pick up the override, so Python's syntax for picking a specific definition of a method/attribute is to scope it with the class name:

class Derived(Base):
  def myfunc(self):
    # call Base's first:
    Base.myfunc(self)
    # then do my stuff:
    ...

constructor

The constructor is a special function named __init__. Note that Python does not call parent constructors for you (because why would such a perfect language do such a silly, silly thing?), so you have to do it yourself with super:

class Derived(Base):
  def __init__(self):
    super().__init__()   # explicitly run Base's constructor first
You can add parameters to __init__ so that you can pass arguments during instantiation.
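For example, here's a minimal sketch (the class and parameter names are made up) of a constructor that takes arguments:

class Monster:
  def __init__(self, name, hitpoints=10):
    self.name = name            # stash the arguments on the instance
    self.hitpoints = hitpoints

m = Monster("grue", hitpoints=25)
print(m.name)   #=> "grue"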

destructor

The destructor is a special function named __del__. It is so named because you can use the del statement on any name to drop that reference at any time. (Rebinding a name to None likewise drops the reference.) The object itself is destroyed -- and __del__ runs -- once no references to it remain; you rarely need to force this because Python has garbage collection, but it's nice to know you can encourage cleanup when necessary.

class myclass:
  def __del__(self):
    pass

"string operator"

If you try to print out an object, you will usually get gibberish:

myobj = myclass()
print(myobj)   #=> "<__main__.myclass object at 0x7f3a2c1d5f98>"
But if we define a __str__ function we can format it however we like:
class myclass:
  def __str__(self):
    return "Moo!"
myobj = myclass()
print(myobj)   #=> "Moo!"

__getitem__

Defining this function allows consumers to reference your object with an index. Here's how to roll your own list or dictionary!

class MyClass:
  def __getitem__(self, id):
    ..
myobj = MyClass()
..
print(myobj[3])

__setitem__

This is the inverse of __getitem__. It lets indexed objects of your class be lvalues.

class MyClass:
  def __setitem__(self, id, value):
    ..
myobj = MyClass()
myobj['asdf'] = "foo!"
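Putting the two together, here's a minimal made-up sketch of a dict-like wrapper that logs every access:

class LoggingDict:
  def __init__(self):
    self.data = {}
  def __getitem__(self, key):
    print("reading", key)
    return self.data[key]
  def __setitem__(self, key, value):
    print("writing", key)
    self.data[key] = value

d = LoggingDict()
d['asdf'] = "foo!"    # goes through __setitem__
print(d['asdf'])      # goes through __getitem__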

__getattr__ (autoload)

Perl has a nifty function called AUTOLOAD, which gets called whenever you try to invoke a function that isn't defined. The AUTOLOAD function has a parameter with the name of the function that you tried to call, and you can try to figure out what to do with it on your own.

Python's equivalent is the __getattr__ function, which is called whenever normal attribute lookup fails. (Since methods are just attributes, it covers those too -- but you get the attribute name, not the call or its arguments.) It may still be useful:

class myclass:
  def __getattr__(self, attrname):
    print("You tried to get", attrname, "??  I give you '3' instead!")
    return 3

operator overloading

You can overload some operators for your class. Python calls your method whenever your object is an operand, regardless of what's on the other side; return NotImplemented if you can't handle the other operand's type, and Python will try the other object (or fall back to the default behavior).

__eq__     used by ==
__ne__     used by !=
__lt__     used by <
__gt__     used by >
__index__  used when an integer is required (e.g. indexing/slicing)
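Here's a minimal sketch (the Money class is made up) showing __eq__ and __lt__, with NotImplemented as the polite way to punt on types you don't understand:

class Money:
  def __init__(self, amount):
    self.amount = amount
  def __eq__(self, other):
    if not isinstance(other, Money):
      return NotImplemented   # let Python try the other operand
    return self.amount == other.amount
  def __lt__(self, other):
    if not isinstance(other, Money):
      return NotImplemented
    return self.amount < other.amount

print(Money(5) == Money(5))   #=> True
print(Money(3) < Money(5))    #=> True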

RTTI

Python has two (essentially identical) ways of getting an object's class. The first is the __class__ attribute, and the other is passing the object to the type function.

class myclass():
  ...
x = myclass()
print(x.__class__)  #=> "<class '__main__.myclass'>"
print(type(x))      #=> "<class '__main__.myclass'>"

If you don't need the exact name of the class, you can also check just to see if it's an instance of a particular class:

if isinstance(x, myclass): ...

You can also check to see if an object somehow inherits from a particular class:

if issubclass(DerivedClass, BaseClass): ...

iterators

You can implement your own iterators by defining a class that has __iter__ and __next__ functions. Instances of that class can then be used directly in a for loop!

(Section 9.9 of the tutorial has a great example that I don't need to repeat.)
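In lieu of repeating it, here's a minimal made-up sketch of the protocol:

class Countdown:
  def __init__(self, start):
    self.current = start
  def __iter__(self):
    return self            # an iterator returns itself
  def __next__(self):
    if self.current <= 0:
      raise StopIteration  # signals the end of iteration
    self.current -= 1
    return self.current + 1

for i in Countdown(3):
  print(i)   #=> 3, 2, 1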

generators

In Python, generators are just functions that call yield, and they're just another way to implement iterators. (In other languages, things that call yield are called coroutines.)

I'm also not going to reprint the great example the tutorial has, but I wanted to at least mention these things because they could be useful.
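Still, here's a tiny made-up sketch, just to show the shape of the thing -- it does the same job as the iterator class above:

def countdown(start):
  while start > 0:
    yield start   # suspends here and resumes on the next loop iteration
    start -= 1

for i in countdown(3):
  print(i)   #=> 3, 2, 1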

static methods

A static method is one that does not have an implicit self argument -- these exist when you have a method that applies to a class but not so much to any instances of it. To designate a method as static, you use the special @staticmethod decorator:

class myclass:
  ...
  @staticmethod
  def myfunc():  #=> look, no self!
    ...

For flexibility, you can call static methods on either the class or on an instance of it. (The instance form ignores the instance, except for figuring out which class it belongs to.)

myclass.myfunc()
o = myclass()
o.myfunc()

static variables

Python's implementation of static variables is..indirect. To understand how this works, you first need to understand that Python is pervasively object-oriented. When you define a class, you're used to instantiating it to create objects. However (and this is the key), the class itself is also an object. And since it's an object, it has its own namespace! Therefore, we can create static class variables by navigating Python's namespacing rules. So here we go:

class MyClass:
  asdf = 1    #=> a class attribute (also visible through instances until shadowed)
  def my_func(self):
    # these assignments are all orthogonal:
    asdf = 2          #=> this is local to my_func
    self.asdf = 3     #=> this is the instance's attribute
    MyClass.asdf = 4  #=> this is the class's attribute

Scoping

Python has 4 specific scopes for identifiers, and now that we've covered them all in other sections, you'll know what I'm talking about!

Of particular note is that none of these is global. That's right - python does not have global variables.

function

The smallest scope is actually function. You're probably used to block-level scoping from other languages, but the following works in python just fine:

..no mention of 'foo'..
if some_condition:
  foo = "bar"
else:
  foo = "bas"
print(foo)  # works just fine

enclosing function

This is a little odd, but since you can define nested functions, python lets you peek into the parent function's namespace via the nonlocal keyword. Yay, it's like we're programming in tcl!

def outer():
  myvar = 1
  def inner():
    nonlocal myvar
    myvar = 3  # outer's myvar is 3
Curiously enough, python will actually raise a SyntaxError if you try to mess with myvar before declaring it nonlocal.

Also curiously enough, nonlocal will continue tracing up nested namespaces looking for your identifier.

module

Modules, which are really objects like everything else, can have any attributes you want. You can read those attributes from anywhere within the module; to assign to them from inside a function, you have to declare them as global.

Such attributes look a lot like global variables (especially because of that deceptive word global), but when you try to access them from another module you do need scope resolution.

(from foo.py)
asdf = 1
..
def myfunc():
  global asdf
  asdf = 2
..
print("asdf:", asdf)   # prints 1
myfunc()
print("asdf:", asdf)   # prints 2

(from bar.py)
import foo
print("foo.asdf:", foo.asdf)  # prints 1
foo.myfunc()
print("foo.asdf:", foo.asdf)  # prints 2

built-in

The last scope is for built-in python identifiers such as int and len. You, as a mere mortal, do not have the ability to add things to this scope.

OS and system functions

sys.argv

Python puts all of the command-line arguments into the sys.argv variable. Like C, sys.argv[0] is the path to the script; the first argument is actually in sys.argv[1].
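A trivial sketch:

import sys
print("script:", sys.argv[0])
for arg in sys.argv[1:]:
  print("arg:", arg)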

Python has not one but two packages for handling command-line arguments: getopt (which is apparently like the UNIX one) and argparse (which is the endorsed one).

argparse

Here's an executive-summary example of how to use argparse. (Treat each add_argument below as an independent recipe -- registering the same option name twice in one real parser raises an error.)

import argparse
p = argparse.ArgumentParser(
    description="this script does blah blah blah"  #optional; shows up with -h
  )

# an option with an argument:
p.add_argument(
    "--infile",         #long option
    help="input file")  #optional; shows up with -h

# an option without an argument:
p.add_argument(
    "-v",                 #short option
    action="store_true")  #"exists" instead of "has a value"

# an argument without an option (e.g. the "update" part of "cvs update"):
p.add_argument(
    "mode")            #instead of "--mode"

# a required option:
p.add_argument(
    "--moo",
    required=True)

# an option with a default value:
p.add_argument(
    "--out",
    default="/dev/null")

# an option with an argument that could be specified more than once:
p.add_argument(
    "--infile",
    action='append')  #resulting field will be a list!

# an option without an argument that could be specified more than once, and
# in either short or long form:
p.add_argument(
    "-v",             #detects "-vv"
    "--verbose",      #detects "--verbose --verbose"
    action='count')

# an option with an optional argument:
p.add_argument(
    "--foo",
    nargs="?")  #detects either "--foo A" or just "--foo"

# an option with multiple arguments:
p.add_argument(
    "--foo",
    nargs=4)   #detects "--foo A B C D" 

# an option that can only be one of a few different things:
p.add_argument(
    "--darth",
    choices=['vader', 'maul', 'sidious'])


args = p.parse_args()

# and here's how to access things.  The field name is the first double-dash option name
# you give for an arg (or, the first single-dash option name if there are no
# double-dash options):
if args.verbose:
  print("[info] verbose on")
print("Using input file", args.infile)
print("Running as mode", args.mode)

From reading through the docs, I believe (but can't definitively confirm) that argparse follows the usual POSIX/GNU conventions for arguments -- for example, "--" ends option processing, and short boolean options can be bundled ("-vv").

The description and help arguments are optional, but one of the ten-ton-hammer things about argparse is that it will automatically print usage and an error message when the arguments fail validation (and the full help text with -h).

argparse is immense. If you're wondering if you can do XYZ: you can, and go look through the documentation. (Skip the tutorial though; the authors spend more time showing us how they do debugging than showing how to use the module.)

os.getcwd() / os.chdir()

Also known as 'pwd' or the environment variable $PWD, this returns the process's current directory. It's safer to use this than $PWD because anyone can change environment variables, and os.chdir() isn't guaranteed to keep $PWD up to date.

import os
pwd = os.getcwd()
os.chdir("/tmp")
...
os.chdir(pwd)

os.environ[]

A hash of the ENV variables. Modifying this will call the underlying putenv function, but calling os.putenv directly will not update os.environ. So, it's recommended to use os.environ directly as much as possible, unless you happen to be using SWIG'd C code that calls putenv itself, in which case you should make sure all your ENV reading comes from os.getenv.

I hate ENV management.

import os
print("Orig LD_LIB path:", os.environ["LD_LIBRARY_PATH"])
os.environ["LD_LIBRARY_PATH"] = "/some/custom/path:"+os.environ["LD_LIBRARY_PATH"]

sys.executable

A string containing the path to the version of Python executing this code.

import sys
print("being run by:", sys.executable)

os.system()

Runs a program in a sub-shell, as usual. On Unix, the return value is the raw wait status, not just the exit code, so you need to shift right by 8 bits to get the exit code.

import os
status = os.system("mkdir -p foo/bar")
retcode = status >> 8   # extract the exit code from the wait status (Unix)

shutil.copyfile() / shutil.move()

The shutil module contains a few functions that could save you from calling system a kajillion times.

import shutil
shutil.copyfile(src, dest)
shutil.move(src, dest)
path_to_mkdir = shutil.which("mkdir")

os.mkdir / os.makedirs

These functions create directories. The difference between them is that os.mkdir creates only leaf directories, whereas os.makedirs will create all necessary parent directories.

import os
os.mkdir("/foo/bar/bas")  # creates just "bas"
os.makedirs("/foo/bar/bas")  # creates "foo", then "bar", then "bas", if needed

glob

The glob module implements shell-style file globbing:

import glob
code_files = glob.glob("*.py")

date and time

Dates and times are both handled by the datetime module. It has classes for just date, just time, and for both (datetime).

import datetime
str(datetime.date.today())    #=> '2012-12-31'
str(datetime.datetime.now())  #=> '2012-12-31 14:36:05.788937'

Another module handling time is the time module. One of its most interesting functions is localtime, which returns a 9-field struct (it unpacks like a tuple) for the local time zone:

import time
tm_year, tm_month, tm_day, tm_hour, tm_minute, tm_second, tm_weekday, tm_yearday, tm_isdst = time.localtime()
# tm_year = the full year, e.g. 2015
# tm_month = 1-12
# tm_day = 1-31
# tm_hour = 0-23
# tm_minute = 0-59
# tm_second = 0-61.  Seriously (leap seconds)
# tm_weekday = 0-6, starting with Monday
# tm_yearday = 1-366
# tm_isdst = -1, 0, or 1.  "is Daylight Savings", with -1 being "you go figure it out"

Another commonly needed function is getting the number of seconds since the epoch, which python does with time.time. Unlike C's time(), this returns a floating-point number with better-than-second resolution.

start_time = time.time()
...
end_time = time.time()
print("{} seconds elapsed".format(end_time - start_time)

os.path

This module contains a ton of things for playing with paths: joining and splitting them (os.path.join, os.path.split, os.path.splitext), extracting pieces (os.path.basename, os.path.dirname), normalizing them (os.path.abspath, os.path.normpath), and testing them (os.path.exists, os.path.isfile, os.path.isdir).
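A few of them in action (the path is made up):

import os
p = os.path.join("/tmp", "foo", "bar.txt")   #=> "/tmp/foo/bar.txt"
print(os.path.dirname(p))                    #=> "/tmp/foo"
print(os.path.basename(p))                   #=> "bar.txt"
print(os.path.splitext(p))                   #=> ("/tmp/foo/bar", ".txt")
print(os.path.exists(p))                     #=> probably False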

Processes

Python's interfacing to subprocesses is a bit on the clunky side, but at least it's object oriented. (Sigh.)

subprocess.call (a.k.a. "system")

This function behaves like system in other languages -- it runs the specified program and waits for it to exit. Its stdin/stdout/stderr channels are connected to the current ones, and the function returns the process's return code.

Passing shell=True is optional, but runs the command through the shell first, which means your commandline can contain shell goodies like wildcards, pipes, I/O redirection, and environment-variable expansion.

If you use the shell, though, they suggest passing the commandline as a string instead of as an array.

import subprocess
retcode = subprocess.call(["myscript", "-in", infile, "-out", outfile])
..
retcode = subprocess.call("myscript -in infile -out outfile", shell=True)

subprocess.check_output (reading stdout)

Calls a given sub-program and returns its stdout.

If you want stderr as well, you can redirect it into the stdout channel by passing in stderr=subprocess.STDOUT.

Note that you will probably want to call it with universal_newlines=True, because otherwise the returned output is "encoded bytes", which is Pythonese for "useless crap that you can't get any data from because it's a string that isn't a string so thbptbtpbtptptpbt!!"

If your commandline has any pipes, be sure to turn on shell=True and specify it as a string instead of a list.

When the called process returns an error code, this function will throw a subprocess.CalledProcessError exception.

import subprocess
try:
  output = subprocess.check_output(
    ["someprogram", "-in", "infile.txt"],
    universal_newlines=True)
except subprocess.CalledProcessError as e:
  print("ERROR: someprogram terminated with error code", e.returncode)

try:
  output = subprocess.check_output("grep foo bar.txt | grep -v bas",
    universal_newlines=True,
    shell=True)
except subprocess.CalledProcessError as e:
  print("blah blah blah")

subprocess.Popen (driving stdin)

If you want to drive a subprocess's input, there's no convenience function, so you have to drop down to the ten-ton-hammer function Popen. For driving a sub-process's stdin, the key thing to do is pass stdin=subprocess.PIPE. Note that you only get one chance to drive its input, because the subsequent communicate function you call will close the sub-process's stdin. That means all your input has to be put in one string.

Additionally, probably due to unicode support, you have to call bytes() on your input to convert it for communicate.

Brilliant, python.

p = subprocess.Popen(
    ['/usr/bin/mail',
      'foo@bar.com',
      '-s "subject line"',
    ],
    stdin=subprocess.PIPE,
  )
input_str = "Automated notification of blah blah blah"
p.communicate(input=bytes(input_str, "UTF-8"))

I found another way to do this, which is far more straightforward, though there's a scary warning in the python docs that doing this can deadlock:

p = subprocess.Popen(
    ['/usr/bin/mail',
      'foo@bar.com',
      '-s "subject line"',
    ],
    stdin=subprocess.PIPE,
  )
p.stdin.write(bytes("Moo moo moo!\n", "UTF-8"))
p.stdin.write(bytes("haha, you've been moo'd\n", "UTF-8"))
p.stdin.close()
p.wait()

Your mileage may vary. Have a nice day!

os.getpid()

This is the function to get the current process's PID.

import os
mypid = os.getpid()

socket.gethostname()

Returns the name of the current machine.

import socket
hostname = socket.gethostname()

Introspection

Interpreted languages usually provide you with a means of asking about runtime objects in a way that thoroughly breaks encapsulation. For example, given a variable foo, give me a string that's the name of its class! Or, given a class name, tell me all the functions and variables! This is called introspection and is occasionally very handy. Here are some of the things you can do in python.

type

type returns the class of the given value. You can then compare the result directly against a class to see if there's a match.

a = 0
type(a)   # returns "<class 'int'>"
b = "asdf"
type(b)   # returns "<class 'str'>"
if type(a) == int:
  ...

Note that you can also use type to create a class dynamically. See the 3-argument form, and then wash your eyes out with bleach. Actually, do that in the other order, so that you never use type for that.

dir

Give dir an object, and it returns the list of attributes on it. It's really meant for interactive play-time on the command line, so don't put too much stock in it.

Remember that attributes usually means "member variables and methods", but also remember that classes can define the __dir__ function to override the default behavior of dir.

dir(str)   # returns: ['__add__', '__class__', '__contains__',
           #   '__delattr__', '__dir__', '__doc__', ..., 'endswith', 'expandtabs',
           #   'find', 'format', 'format_map', 'index', ...]

callable

callable tells you whether the thing you passed it can be called like a function. This is infinitely useful for when you can't remember what you called your function.

a = print
if callable(a):
  a("moo!")

isinstance

isinstance returns whether a given variable is an instance of a given class (or any class derived from it).

class B: pass
class D(B): pass
b = B()
d = D()
isinstance(b, B)  # returns True
isinstance(b, D)  # returns False
isinstance(d, B)  # returns True

issubclass

issubclass is like isinstance except it queries a class instead of an object.

class B: pass
class D(B): pass
issubclass(B,B)  # returns True
issubclass(B,D)  # returns False
issubclass(D,B)  # returns True

getattr

getattr allows you to ask an object for an attribute by name.

a = MyClass()
func = getattr(a, "myfunc")
func("hi")   # same as a.myfunc("hi")

docstrings

For functions and classes, Python has some syntactic sugar to make documentation a bit more consistent. You can establish a function's/class's docstring by creating a string as the first thing in the body. Behind the scenes, Python stores that special string in the __doc__ attribute, but consumers can get it in a much more friendly manner with the help function:

def my_func():
  """Doesn't do much, but I wrote it so it must be awesome."""
  return 4

help(my_func)   # prints the signature and the docstring

Docstrings have several conventions that are completely unenforced by the compiler: use triple quotes, make the first line a short one-sentence summary, and (for multi-line docstrings) follow that summary with a blank line and then the gory details.

One thing you may come across in docstrings is copy-pasted output from an interactive Python session. Combined with the doctest module, these are actually embedded tests -- a nifty way to not only document usage of a function but also to test it at the same time:

import doctest
def myfunc(asdf):
  """Blah blah blah

  >>> print(myfunc(10))
  0
  >>> myfunc("asdf")
  'moo'
  """
  ...
doctest.testmod()   # checks all the embedded tests!

Serialization with pickle

Data serialization is awesome because it allows you to dump a Python data structure to a file in such a way that Python can rebuild the data structure from the file directly. Python's built-in data serializer is the pickle module.

import pickle
settings = {"common": {"host":"mordor", "user":"sauron"}, "date","3849739103"}
fh = open("datafile", "w")
data = pickle.dump(settings, fh)
fh.close()
Then, later:
import pickle
fh = open("datafile", "r")
data = pickle.load(fh)
fh.close()

No more manually parsing settings files! Yay! Without this, you'd have to traverse your data structure, and convert all your non-string values to strings when writing your file. Worse, you'd have to do the inverse when reading it back in.

Misc

zip

The global function zip takes two (or more) lists and zippers them up into pairs. Feed the result to dict and you get a hash where one list is all the keys and the other is all the values:

k = ['name',  'quest',           'favorite color']
v = ['Borat', 'Pamela Anderson', 'puce']
h = dict(zip(k, v))

global variables

__name__

Usually contains the name of the current module, as a string. That is, inside foo.py this will be set to 'foo'. The one exception is that when you run foo.py from the commandline, __name__ will be set to '__main__' instead. This allows you to use a module as either a packaged module or as the top-level code. In practice, this is best reserved for putting testing code in the same module:

mylib.py:
  def func1(): ...
  def func2(): ...
  def func3(): ...
  if __name__ == '__main__':
    # run tests!
    func1()
    ...

networking

Of course Python has direct support for networking. Two modules that you may want to check into are urllib.request (for grabbing data from a web page) and smtplib (for sending email).
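As a minimal sketch (the URL is a placeholder), grabbing a page with urllib.request looks like this:

import urllib.request
with urllib.request.urlopen("http://www.example.com/") as resp:
  html = resp.read().decode("utf-8")   # raw bytes -> string
print(html[:80])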

threading

Python actually does expose os.fork and the os.exec* family (on Unix), but for everyday concurrency the friendlier route is the threading module, which gives you a nifty API for dealing with multi-threading.

Note also that for sufficiently complex threading applications you will probably also want to look at the queue module, whose Queue class is a thread-safe synchronized queue.
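A minimal sketch (the worker function is made up):

import threading

def worker(n):
  print("worker", n, "doing stuff")

# start a few threads and wait for them all to finish
threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
  t.start()
for t in threads:
  t.join()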

logging

Python has a logging module. It appears to be rather heavyweight, where by heavyweight I mean it's cumbersome to the point of sucking. But I haven't really given it a fair chance, so I dunno.

profiling

Python has a few ways to do profiling:

The timeit module has a class named Timer that runs code you give it (many times over) and reports the elapsed time as a floating-point number of seconds.

The profile and pstats modules have less granularity but help with profiling entire programs.
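A quick sketch of the timeit flavor (the statement being timed is arbitrary):

import timeit
t = timeit.Timer("sorted(range(1000))")
print(t.timeit(number=10000), "seconds for 10000 runs")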


Chris verBurg
2015-03-08