What is Python?
Why Python3?
Hello, world
Basics
Commenting
Indenting
Variables
Calling functions
"pass"
"None"
"is" vs. "=="
Data types
boolean
Logic operations
integer
math
bit manipulation
floating-point
fixed-point
string
comparison
concatenation [global "+" operator]
replication [global "*" operator]
length [global "len" function]
subset check [global "in" operator]
substring [global "[]" operator]
formatting with format [method]
formatting with rjust/ljust/center/zfill [member functions]
formatting with pprint
count [method]
startswith [method]
endswith [method]
find/index [method]
isdigit [method]
join [method]
strip [method]
lstrip [method]
replace [method]
partition [method]
split [method]
regex (regular expressions)
set
add [method]
remove/discard [methods]
"in" [global operator]
"len" [global function]
issubset ("<=", "<") [method]
issuperset (">=", ">") [method]
union ("|", "|=") [method]
intersection ("&", "&=") [method]
difference ("-") [method]
symmetric_difference ("^") [method]
comprehensions
list/tuple
slicing
adding elements (append/extend/+)
inserting elements (insert)
removing elements (del, remove, pop)
size (global 'len' function)
searching (index)
counting (count)
sorting (sort method)
sorting (sorted global function)
reverse
comprehensions
tuple packing/unpacking
dict
size (global "len" function)
query (global "in" operator)
retrieval (global "[]" operator)
removal (global "del" function)
items [method]
keys [method]
values [method]
pop [method]
popitem [method]
update [method]
comprehensions
the defaultdict class
collections.deque
array.array
Control statements
condition checking
if
ye olde ternary operator
assert
while
for
range
Exceptions
catching
raising your own
common exceptions to know about
I/O
Input from stdin
Output to stdout
Output to stderr
Input from files
Output to files
os.listdir(..)
repr vs. str
Functions
Return value(s)
Pass-by-value-of-reference
Default parameter values
DANGER: mutable default values
Named arguments
Variable arguments
variadic-style
hash-style
using all of the above
unpacking variable args
lambda functions
Modules
module search path
"import" vs "from .. import"
renaming identifiers
Packages
Classes
instantiating
inheriting
constructor
destructor
"string operator"
__getitem__
__setitem__
__getattr__ (autoload)
operator overloading
RTTI
iterators
generators
static methods
static variables
Scoping
function
enclosing function
module
built-in
OS and system functions
sys.argv
argparse
os.getcwd() / os.chdir()
os.environ[]
sys.executable
os.system()
shutil.copyfile() / shutil.move()
os.mkdir / os.makedirs
glob
date and time
os.path
Processes
subprocess.call (a.k.a. "system")
subprocess.check_output (reading stdout)
subprocess.Popen (driving stdin)
os.getpid()
socket.gethostname()
Introspection
type
dir
callable
isinstance
issubclass
getattr
docstrings
Serialization with pickle
Misc
zip
global variables
__name__
networking
threading
logging
profiling
What is Python?
Python is an interpreted programming language whose design priority was to
bridge the gap between C (which requires consideration of an immense number of
very low-level details) and shell scripting
(which is missing so many features it's not usable on even medium-sized
projects). Some have called it "a cleaned-up improvement of perl", which
is true, but that wasn't its design goal. Python has become wildly successful,
especially at Google (due in no small part to its creator (Guido) having
worked at Google, I'm sure). It is far from being a perfect language, but it is
much better than C, shell, and perl for creating and maintaining
production-quality code.
This page is meant to be two things:
- a tutorial for people learning python3
- a reference
Which is why it's called a "tutref".
The official python documentation (here)
is a little weird, in a few ways. First,
due to the schism between python2 and python3, googling for python help usually
gets you to a python2 page, so you have to use the pulldown menu to find the
Python3 version of the documentation. (Which, actually, is pretty sophisticated,
given that some things have moved around.)
Second, and worse, their "search the documentation"
feature lists sections in alphabetical (not numerical) order, and does so very slowly, so it goes from
section "1" to section "10" (then "11", "12", etc.) instead of to section
"2". If you need something from section 6, good news: you have time for a Starbucks run!
Why Python3?
This page focuses (exclusively) on Python3, since most existing Python tutorials
and books still cover Python2.
There are so many changes between Python2 and Python3 that Congress is
considering declaring the
official release notes
cruel and unusual punishment. However, if you need an
executive summary: Python3 is an even more cleaned-up version of Python2. And
if you need a shorter executive summary: 3 > 2.
Hello, world
What is any programming-language tutorial without a "hello, world"
example to kick it off? But it can't just be "hello, world", it has to be
"hello, world" from Marvin, the depressed robot in
The Hitchhiker's Guide to the Galaxy:
print("Hello, cruel world.")
Things to notice right away:
- you can print to stdout right away, without having to #include anything
and/or drag a namespace into local scope. I hate C++ sometimes.
- there's no semicolon ending the statement. (Though there could be)
- there's no carriage return in the string, yet it shows up in the output.
- you need to have parentheses around the arguments to
print
.
Basics
Commenting
Python's comment marker is a single "#", which makes the rest of the line a
comment. (Just like perl, and kind of just like tcl.)
Python does not really have a block-comment mechanism, but like other languages
without a real block-comment mechanism there's always some other element of
the language you can misuse to get it. In Python's case, you can abuse "docstrings",
which are like normal strings except they can wrap around lines without any
special handling. The most common docstring to use
for commenting is the three double-quotes (but you can also use three
single-quotes if you prefer).
Predictably, docstring-based block comments do not nest. My best-practices tip:
use three double-quotes for docstrings, and use three single-quotes for block
commenting. Then at least they don't trample on each other.
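For example, both styles in action:
# a normal one-line comment
x = 5  # comments can also follow code on the same line
'''
This whole block is a triple-single-quoted string that isn't assigned to
anything, so it effectively acts as a block comment.
'''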
Indenting
Python is unusual in that all of the source code's leading whitespace
is critical. Instead of identifying blocks of code with surrounding curly
braces ("{}"), Python identifies them by their relative indentation levels.
This system has its advantages and disadvantages, but there's no way to
subvert it, so you'd better start getting used to it. :)
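A tiny illustration -- the middle two prints belong to the if block purely because of their indentation:
if x > 0:
    print("x is positive")
    print("still inside the if block")
print("this line is outside the if block")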
Variables
Python's variables have no sigils (such as perl's dollar ($), percent (%)
and at (@)). They are not declared, but rather autovivify on assignment.
Variables do have an associated type; we'll get to that in a bit.
The scope of variables is limited to their local function. There are a few
surprises as a result:
- if you need to write to a global variable, you have to declare it (in the
function) as global using the
global
keyword.
- there's also a
nonlocal
keyword, which appears to be applicable to
things like nested functions.
- even if you autovivify a variable inside a loop (or other sub-block), the
variable still lives as long as the function.
Curiously, you can use del
to unscope a variable!
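Here's a small sketch of global (and del) in action; "counter" is just a made-up module-level variable:
counter = 0          # a module-level (global) variable

def increment():
    global counter   # without this line, "counter = ..." would create a new local
    counter += 1

increment()
print(counter)       #=> 1
del counter          # the name "counter" no longer exists after this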
You can chain assignment statements:
x = y = z = 0
You can also multi-assign:
a, b = 0, 1
You can probably also chain multi-assigns:
x1,x2 = y1,y2 = 0,MAX
Calling functions
Python keeps function-calls the same as other languages:
ret = function_name(arg1, arg2)
Note that the parentheses are required, because the semantics of not
including them is asking for the function address instead of calling it.
One of the coolest things about python is that functions are allowed to
return multiple values:
ret1, ret2, ret3 = function_name(arg1, arg2)
"pass"
Python has a special keyword called pass
that lets you define stub functions,
classes, loops, or anything else that needs to be indented. (A special keyword
is only necessary because the first line of any kind of block structure must
be indented.)
def my_function():
pass
class my_class:
pass
while 1: # busy-wait
pass
"None"
Python has a keyword specifically for undefined values: None
. You can
treat it as a regular constant (think of NULL
in C++, or undef
in perl).
It is what is returned if a return
has no argument.
a = None
def no_return():
return
if no_return() == None:
..
"is" vs. "=="
Python makes a distinction between "do these two things look the same" (==)
versus "do these two things live at the same spot in memory" (is
). The
difference is important when you start dealing with objects, and in particular
when they overload the __eq__
function.
A quick example:
a = "abcd"
b = "cdef"
a[2:4] == b[0:2] # is True because "cd" looks like "cd"
a[2:4] is b[0:2] # is False because they're parts of different strings
Data types
Python's variables have data types -- they are either an integer, a
floating-point number, a string, a boolean, or an object. (In
contrast, perl lumps all of those together as a generic scalar
, and
tcl considers everything a string. C also has data types, but has way
more than Python -- it refines "integer" as signed vs. unsigned, and
with a specific bit-width.) Python does not have a character
type; characters
are just single-element strings.
(Aside: python has direct support for complex numbers. However,
those things are useless to 99.99867% of the planet, so I'm
ignoring them here. Though, as an electrical engineer I do have to
give props to Python for using "j" instead of "i" in their notation.)
Curiously, variables are not declared as a specific type. Their type
is determined when they are assigned (and yes they can be reassigned
as a different type later).
boolean
The most basic of types, boolean values are either True
or False
.
The following are all considered False
:
False | direct boolean "false" value |
0 | integer zero |
0.0 | floating-point zero |
None | Python's "undefined" value |
"" | empty string |
() | empty tuple |
[] | empty list |
{} | empty dictionary or set |
Everything else is considered True
in boolean context.
Logic operations
Logical operations are different from other languages because Python
spells out the operation name (instead of using "||" or "&&" or "!"):
Note: in Python, these operators use short-circuit evaluation, which
means that if the answer can be determined from just the left part,
then the right part won't be evaluated at all. For example, for
and
to return True
, both parts have to be True
-- if the left part
is false, we already know the whole expression will be False
, so
the right part doesn't matter. This can be used for several sneaky
things; my favorite such sneaky thing is printing debugging messages
only when the variable VERBOSE
is set to True
:
VERBOSE and print("got to this point!")
More universally relevant, though, this is also good to leverage as
an optimization, because you can leverage the short-circuiting to
avoid unnecessary expensive operations:
# order this so we don't do slow_operation() if fast_operation() is already false:
if fast_operation() and slow_operation():
..
# order this so we don't do slow_operation() if fast_operation() is already true:
if fast_operation() or slow_operation():
..
integer
Python3's built-in integer type has arbitrary precision -- it is not limited to
32 or 64 bits, so it never overflows.
Tip: you can specify octal (base-8) numbers with a "0o" prefix, binary (base-2)
numbers with a "0b" prefix, and hexadecimal (base-16) with a "0x" prefix.
Integer comparisons are the same as other languages:
< | less than |
<= | less than or equal to |
> | greater than |
>= | greater than or equal to |
== | is equal to |
!= | is not equal to |
You can typecast strings into ints with the int function:
my_str = "12"
my_int = int(my_str)
math
Math is done with the usual infix operators, so this is also the same as
other languages. Python's math operators are:
x + y | addition |
x - y | subtraction |
x * y | multiplication |
x / y | division |
x // y | integer (floored) division |
x % y | modulo |
x ** y | exponent (same as pow ) |
-x | negation |
+x | no-op |
Python's (useful) built-in math functions are:
abs divmod
pow round
Additionally, there is a math
module that provides some more functions:
ceil cos
floor fmod
log pow
sin sqrt
tan trunc
math.e
and math.pi
are there as well, as attributes.
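For example (nothing fancy, just the import and a couple of calls):
import math
print(math.sqrt(2))     #=> 1.4142135623730951
print(math.floor(3.7))  #=> 3
print(math.pi)          #=> 3.141592653589793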
Random numbers (and random list choices!) can be generated from the random
module:
import random
random.choice(['a', 'b', 'c']) #=> returns one of them at random!
random.random() #=> returns a random floating-point num between 0 and 1
random.randrange(10) #=> returns a random int between 0 and 9 (the 10 is excluded)
bit manipulation
Python's bitwise operations are the same as most languages:
~x | not |
x & y | and |
x | y | or |
x ^ y | xor |
x << y | shift-left |
x >> y | shift-right |
floating-point
For all intents and purposes, these are treated the same as integer values, so
the same operators and comparisons work.
(I reiterate the common warning not to use direct equality checks on floating-
point values, since that has issues with binary floating-point representations.)
You can typecast a variable to floating-point with the float
function.
fixed-point
Python has a module for fixed-point math called decimal
. I'm not sure if
it's faster, but if you're dealing with money it may well be necessary due
to the evilness of floating-point math.
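Here's a tiny sketch of why you might care -- binary floating-point can't represent 0.1 exactly, but Decimal can:
from decimal import Decimal
print(0.1 + 0.2)                        #=> 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  #=> 0.3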
string
Python allows you to enclose strings in either single- or double-quotes.
Unlike perl, the two are completely interchangeable in Python -- escape
sequences are treated the same in both. (If you want Python to not
interpret escape sequences, prefix the string with "r" to get a raw string.)
For an ordinary (non-triple-quoted) string to span multiple source lines, you must
escape each end-of-line with a backslash, and the escaped end-of-line does not
become a carriage return in the string.
Python also has the triple-quoting mechanism, which is where you use either
single-quotes (''') or double-quotes (""") three times in a row. It behaves
differently only for multi-line strings: carriage returns in triple-quoted
strings are preserved in the string and do not get escaped.
a = 'two-line\nstring'
b = "two-line\nstring"
c = r'one-line\nstring'
d = r"one-line\nstring"
e = 'another one-line\
string'
f = 'another two-line\n\
string'
g = """yet another
two-line string"""
h = '''and yet
another two-liner'''
You can "typecast" anything into a string with the str
function. (It's not
really typecasting so much as interpreting in a string context, which you
need for printing.)
my_num = 10
my_str = str(my_num)
Python's string operations are either global functions or methods of the string
class. I'm not entirely sure why they're not all methods, but fortunately
there aren't too many that aren't. Unfortunately, they're the most
common ones.
comparison
In Python, you compare strings for equality with == and !=. How nice!
<ignore my glaring at perl>
concatenation [global "+" operator]
Python concatenates strings with +
.
a = "s1" + "s2" #=> "s1s2"
a += "s3" #=> "s1s2s3"
Curiously, Python also allows you to implicitly concatenate adjacent string
constants, which is a C thing:
a = "s1" "s2" #=> "s1s2"
replication [global "*" operator]
You can replicate strings by "multiplying" them:
print("-" * 80)
print("-- SECTION 4")
...
print("-" * 80)
length [global "len" function]
this_length = len(this_string)
subset check [global "in" operator]
in
returns whether or not one string exists in another.
if 'Py' in 'Python':
...
Note: as syntactic sugar, you have the option of negating this by putting the
"not" next to the "in", so that it reads like English:
"b" not in "asdf" # works
not "b" in "asdf" # exact same
substring [global "[]" operator]
You can fetch substrings out of a string with the array brackets ("[]"). This
is exactly the same syntax as we'll see later for extracting slices from lists.
You can fetch a single character with a single, zero-based index:
one_char = big_string[3] # fourth character of big_string
last_char = big_string[-1]
Or you can fetch a range of characters with [n:m]
, which fetches characters
n
up to but not including m
!
sub_string = big_string[2:7] # third through seventh (not eighth!) characters
You can also leave off one or the other to implicitly say "beginning" or "end":
first_part = big_string[:10]
last_part = big_string[10:]
You can use any arbitrary expression for those slice indices; these examples all
used constant numbers, but you can use variables and math all you like.
formatting with format [method]
format
is a string function that lets you:
- embed variables in strings, in a flexible order
- specify whether to use
str
vs repr
- control formatting such as rounding and field width.
It's Python's 10-ton hammer to handle lots of things in a single place, so
bear with me as we get increasingly more sophisticated with it.
The most basic thing to do is embed variables in strings. Here, you can think
of "{}" as the "%s" from printf:
# the manual way:
name1 = "cats"
name2 = "dogs"
madlibs = name1 + " hate " + name2 #=> "cats hate dogs"
# using format:
madlibs = "{} hate {}".format(name1, name2) #=> "cats hate dogs"
However, these are actually positional, so you can specify which of the
arguments to use in which position:
madlibs = "{0} hate {1}".format(name1, name2) #=> "cats hate dogs"
madlibs = "{1} hate {0}".format(name1, name2) #=> "dogs hate cats"
More usefully, though, they can actually be named:
madlibs = "{animal1} hates {animal2}".format(animal1="cats", animal2="dogs")
And since they can be named, we can also shove them into a hash, giving us the
final and most sophisticated way to embed variables in strings:
h = {'animal1':'cats', 'animal2':'dogs'}
madlibs = "{animal1} hates {animal2}".format(**h))
(No, we haven't covered hashes or the "**" thing yet. Hashes have their
own section coming up, and the "**" thing is described in the "Functions"
section.)
So now that we've embedded variables in strings, here's how to format them: after
that (optional) field name/number, you can add ":" and a printf-like modifier.
A bare number specifies a left-justified field width:
madlibs = "{:6} hate {:6}".format("cats", "dogs") #=> "cats hate dogs "
but I think any printf formatting works. A particularly useful ability is
rounding:
print("12/7 is {:.3f}".format(12/7)) #=> "12/7 is 1.714"
History lesson: there used to be a global "%" operator that worked kind of
like format
and printf/sprintf. It was pretty common, so you'll see it all
over the place in existing code. But it's considered old-style now (format is preferred), so try to avoid it in new code.
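Purely for recognition purposes, here's roughly what that old style looks like (reusing the name1/name2 madlibs from above):
madlibs = "%s hate %s" % (name1, name2)   #=> "cats hate dogs"
print("12/7 is %.3f" % (12/7))            #=> "12/7 is 1.714"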
formatting with rjust/ljust/center/zfill [member functions]
While the full power of the printf functions will let you do right/left/center
justification with either spaces or zeros, Python strings also have a few
methods to do that directly without having to remember printf syntax.
rjust
takes a field width and returns the original string with prefixed
spaces to fill the width.
ljust
adds spaces to the end of a string to fill the requested width.
center
adds spaces to both ends.
zfill
is meant for numeric values. It does the same as rjust
except it
prefixes zeros instead of spaces. It will preserve an initial negative sign
if necessary.
str = "foo"
print(str.rjust(5)) #=> " foo"
print(str.ljust(5)) #=> "foo "
print(str.center(5)) #=> " foo "
print(str.zfill(5)) #=> "00foo"
formatting with pprint
Python's equivalent to perl's Dumper
module is called pprint
, and the
most heavyweight way to use it is as follows:
import pprint
pprintobj = pprint.PrettyPrinter()
pprintobj.pprint(my_thing)
(Fortunately you don't actually need to create an object: there's a module-level
shortcut, pprint.pprint(my_thing), and also pprint.pformat(my_thing), which just
returns the formatted string so you can print it yourself.)
count [method]
Counts the number of times a given substring appears. Note that if you just
want to know if a substring exists, use in
instead!
str = "this is a test"
if str.count("is"): ... # works, but inefficient
if "is" in str: ... # much better
startswith [method]
Returns True/False
if the string starts with the given string.
b = "moo".startswith("mo") # yep
endswith [method]
Returns True/False
if the string ends with the given thing.
str = "moo"
str.endswith("oo") # yep
find/index [method]
Returns the first place where one string exists in another. If the
string is not found, then find
will return -1, and index
will throw
an exception.
idx = big_string.find('moo')
Note: if you just want to know if a substring exists (not where it is),
it is much more efficient to use the aforementioned in
instead of find
.
isdigit [method]
Returns True
if the string consists entirely of legal numeric characters.
There are many other is* functions.
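A few of the others, just for flavor:
"12345".isdigit()   #=> True
"moo".isalpha()     #=> True
"moo!".isalpha()    #=> False
" \t\n".isspace()   #=> True
"MOO".isupper()     #=> True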
join [method]
Joins the given thingamajigs together with the current string. The
thingamajigs come from a list or tuple (which we haven't gotten to yet):
full_string = " ".join(my_list)
strip [method]
Removes certain characters from the beginning and ending of the string.
By default it's whitespace, but you can make it whatever you need.
full_string = " eek! ".strip() # just "eek!"
lstrip [method]
Just like strip
(above) except that it only strips the left
(beginning) of the string.
full_string = " eek! ".lstrip() # just "eek! "
replace [method]
Replaces all occurrences of one sub-string with another.
new_string = old_string.replace("old", "new")
partition [method]
Splits a string on the first occurrence of a sub-string. It returns
three values: the part before, the thing it found, and the part after.
splits = "this is a test".partition(" is") # returns ["this", " is", " a test"]
split [method]
Splits a string on every occurrence of a given sub-string.
my_list = " 1 2 3 ".split(" ") # returns ["", "1", "2", "", "3", ""]
If you don't specify a sub-string, then it's subtly different -- it splits
on all sequences of whitespace, and auto-trims the ends:
my_list = " 1 2 3 ".split() # returns ["1", "2", "3"]
If you want to split on a regex instead of a string, you need to use the
split
function in the re
module:
import re
fields = re.split(r"\s*,\s*", "this,is , a ,CSV, line")
regex (regular expressions)
(Full docs on Python's regexes are
here.)
The more I work with regexes in various languages, the more I start to appreciate
the brevity of perl. And since perl is the gold standard for regexes, here
are the pythonic ways to do various things you may want to do:
basic match
perl: if ($str =~ /f.o/)
python: if re.search("f.o", str):
capture of one element
perl: if ($str =~ /f(.o)/) { my $match = $1; ...
python:
res = re.search("f(.o)", str)
if res:
match = res.group(1) ...
capture of two elements
perl: if ($str =~ /(ab)?cd(e.*)?f/) { my ($m1, $m2) = ($1, $2); ...
python:
res = re.search("(ab)?cd(e.*)?f", str)
if res:
m1 = res.group(1)
m2 = res.group(2)
...
non-capture of parentheses
perl: if ($str =~ /f(?:.o)*/)
python: if re.search("f(?:.o)", str):
case-insensitive match
perl: if ($str =~ /f.o/i)
python: if re.search("f.o", str, re.I):
return captured strings, instead of success flag
perl: my @matches = ($str =~ /f.o/g);
python: matches = re.findall("f.o", str)
substitution
perl: $str =~ s/f.o/$newtext/;
python: str = re.sub("f.o", newtext, str)
substitution with back references
perl: $str =~ s/f(.o)/m\1/;
python: str = re.sub("f(.o)", r"m\1", str)
set
Sets are implemented as hash tables, which means they are really good at
two things:
- quickly telling you if something's in it
- uniquifying a list
It also means that there's no concept of ordering between the elements, so
you won't be able to extract elements in the order you put them in.
Sets are mostly primitives, so there is syntax specifically for creating
them. However, this syntax (curly braces) overlaps with the syntax for
creating hashes, so the one ambiguous case (an empty set) has different
syntax:
empty_set = set()
nonempty_set = {'a', 'b', 'c'}
Sets support an awesome number of functions, many of which exist in both method
and operator form.
Worth mentioning is that there is also a frozenset
class, which is the
immutable version of set
.
add [method]
Adds something to a set.
myset.add("asdf")
If you have the list ahead of time, you can add them all when you create
the set:
myset = set(some_list)
remove/discard [methods]
Removes something from a set. If the thing is not found, remove
will
throw an exception, and discard
will just keep calm and carry on.
myset.remove("asdf")
myset.discard("fdsa")
"in" [global operator]
Checks if something's in the set.
if "asdf" in myset: ...
"len" [global function]
Returns the number of elements in the set.
len(myset)
issubset ("<=", "<") [method]
Returns whether one set is a subset of another. Predictably, the
"<" form checks whether it's a proper subset.
if small_set <= big_set: ...
issuperset (">=", ">") [method]
Opposite of issubset
.
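For example:
if big_set >= small_set: ...   # is every element of small_set in big_set?
if big_set > small_set: ...    # proper superset (and the two sets aren't equal)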
union ("|", "|=") [method]
Returns the union (addition) of two sets.
big_set = small_set1 | small_set2
intersection ("&", "&=") [method]
Returns the intersection (overlap) of two sets.
small_set = big_set1 & big_set2
difference ("-") [method]
Returns one set without any of the elements of a second set.
good_tv = fox_programs - fox_news
symmetric_difference ("^") [method]
Returns the elements that are in one but not both of the sets. ("xor".)
my_xor = set1 ^ set2
comprehensions
Holy crap, comprehensions apply to sets, too! (OK we haven't gotten to
comprehensions yet, but they're next, in the "list" section.)
s1 = {'a', 'b', 'c', 'd', 'e', 'f'}
s2 = {c for c in s1 if c not in 'powerade'} #=> s2 is {'f', 'b', 'c'}..order is not preserved!
list/tuple
Lists and tuples are conceptually the same thing (arrays), but they have different
syntax and semantics. (Don't read that sentence too carefully..)
| mutable? | notation? |
list | y | brackets "[]" |
tuple | n | parentheses "()" |
Lists have direct syntax support in Python, so you create one directly like so:
my_list = ["a", "b", "c"]
(Note that the elements of a list do not all have to be the same type.)
Lists work the same as most other languages: you access elements by
their zero-based numeric index, using square brackets. Python also lets you
use negative indexes to step back from the end of the list:
a = my_list[0] #=> "a"
b = my_list[-1] #=> "c"
Lists may also nest. Yo dawg, I heard you like lists, so I put a list in
your list:
a = [1, 2, 3]
b = [4, 5, 6]
c = [a, b, "foo"] #=> "[[1, 2, 3], [4, 5, 6], "foo"]"
slicing
You can use slice notation with lists, and it works much the same as for
strings. Note, however, that you can write to list slices (unlike string
slices).
sub_list = my_list[4:7] #=> [my_list[4], my_list[5], my_list[6]]
my_list[1:3] = ["moo"] #=> replaces two items with the single "moo"
Also note that the way to copy a list is to use a slice on the whole thing:
copied_list = orig_list[:]
adding elements (append/extend/+)
append
and extend
both add elements to the end of the list. The difference
is when the element you're adding is itself a list: append
will add the list
as a single object, whereas extend
(and +
) will add all the
elements of the list as multiple objects.
my_list1 = ["a", "b"]
my_list2 = ["c", "d"]
my_list1.append(my_list2) #=> ["a", "b", ["c", "d"]]
my_list1.extend(my_list2) #=> ["a", "b", ["c", "d"], "c", "d"]
The only difference between +
and extend
is that +
returns a new list instead of modifying an existing one.
inserting elements (insert)
insert
inserts the given element at the given location, pushing the existing
elements out of the way. It returns None
, so you cannot chain it.
l = ['a', 'b', 'c']
l.insert(1, 'd') #=> ['a', 'd', 'b', 'c']
removing elements (del, remove, pop)
del
is actually a statement (not a function) that takes a list element (or slice!) as a
parameter, removes those elements from the list, and returns nothing:
del my_list[2]
del my_list[4:6]
remove
is a member function that takes a list element value. It will not
return anything. Note that it will remove the first element of the list that
has the given value:
my_list.remove('asdf')
pop
will remove and return an element of the list. By default it operates
on the last element, but you can give it an index to specify a different one:
my_list = ['a', 'b', 'c', 'd']
val = my_list.pop() #=> val is 'd', my_list is ['a', 'b', 'c']
val = my_list.pop(1) #=> val is 'b', my_list is ['a', 'c']
size (global 'len' function)
list_len = len(my_list)
searching (index)
Once again, using the in
operator is more efficient if you just want to know if
something's in a list, but index
will tell you where exactly it is:
idx = my_list.index('asdf')
index
will throw an exception if the element is not found; in
merely
returns False
.
counting (count)
count
returns the number of times something occurs in a list. You cannot use
regexs or anything sophisticated here.
num_overachievers = test_scores.count(100)
sorting (sort method)
sort
sorts the elements of the list. This operates on the list itself, rather
than returning a new list. However, it returns None
, so don't go trying to
chain things together:
my_list.sort()
sorting (sorted global function)
sorted
sorts the elements of a list, but returns a new list instead of
modifying the list itself.
sorted_list = sorted(my_list)
sorted
lets you specify a sort function (as a lambda), so you can specify
custom sort criteria. By default, sorted
sorts by increasing value; here's
how to make it sort by decreasing value:
sorted_list = sorted(my_list, key=lambda n: -n)
reverse
Reverses the elements of a list, in place. It also returns None
.
my_list.reverse()
comprehensions
You will undoubtedly encounter list comprehensions as you explore the world, for
they are considered one of the most pythonic things one could possibly do. I
think it's equivalent to genuflecting or visiting Mecca.
List comprehensions are very similar to perl's map
function (in fact, Python
has a map
function as well). The idea is to succinctly convert one list into
a second list using some translation that you specify. Let's say we have a list
of numbers and we also want the list of them scaled by 0.2. The "long" way
might be:
scaled_list = []
for x in unscaled_list:
scaled_list.append(x*0.2)
List comprehensions are syntactic sugar to convert that to the following:
scaled_list = [x*0.2 for x in unscaled_list]
Notice that we basically just moved the "for x in unscaled_list" and "x*0.2"
around so that they're inside the brackets, and in the other order.
So that's the Python equivalent of perl's map
, but what about grep
, which
returns only elements matching certain criteria? List comprehensions can do
that, too. Suppose we're filtering results from match.com and we want only
girls between 16 and 19 and a half. (Stop looking at me like that. That's
a Monty Python reference.) The "long" way:
hawties = []
for g in girls:
if g.age >= 16 and g.age <= 19.5:
hawties.append(g)
List comprehensions make this much more succinct:
hawties = [g for g in girls if g.age >= 16 and g.age <= 19.5]
You can also nest list comprehensions, but before you do that you should
consult a licensed psychiatrist.
tuple packing/unpacking
Tuples are the read-only version of lists. (Python has mutable and
immutable versions of everything..)
Tuples are used for packing and unpacking multiple values into a single
variable. You can pack like so:
t = () #=> empty tuple
t = 'a', #=> one-element tuple
t = 'a', 'b' #=> two-element tuple
t = ('a', 'b') #=> exact same as above
You can then unpack a tuple by assigning it the other direction:
e1, e2, e3 = t #=> must be 3 elements in 't' for this to work
(e1, e2, e3) = t #=> exact same as above
dict
Python's dictionaries are what everyone else calls hash tables. (Or associative
arrays.) They map keys to values, where the keys can be arbitrary strings.
(Actually, they can be any immutable type, which is one of the reasons why
Python has mutable and immutable versions of everything.)
Dictionaries are a primitive, so there is direct syntax support for
creating them. Here's the most common way to create them:
english_to_german = {'one':'eins', 'two':'zwei', 'three':'drei'}
There are at least four other ways to create dict
objects, but I'm not
going to show them to you because this one is by far the best one to use.
size (global "len" function)
Our friend the "len" function is back, this time to tell us the number of
elements in the dictionary.
num_keys = len(my_dict)
query (global "in" operator)
To see if something is in the dictionary, use in
, just like for strings and lists.
exists = some_key in my_dict
retrieval (global "[]" operator)
To get an item from a dictionary, we use the brackets like we're looking up
an array, except the index is a string (or other immutable data type)
instead of a number!
my_val = my_dict[some_key]
If the key does not exist, python will actually throw a fit (a KeyError exception). To avoid that,
you could either use in
to see if you should try the retrieve at all, or
you could specify a default value to return if the key's missing. Further,
you can also have python add that default key if you want. To just return a
default, use get
, whose default default is None
:
my_val = my_dict.get(some_key1) # my_val will be None if some_key1 doesn't exist,
# and my_dict will still not have some_key1
If you want python to set the missing key's value to the default at the same
time, use setdefault
:
my_val = my_dict.setdefault(some_key2, "moo") # my_val would be "moo" instead
# of None, and my_dict now has some_key2 for sure.
removal (global "del" function)
To remove an item from a dictionary, pretend you're fetching it with
brackets but then put del
in front of it.
del my_dict[some_key]
items [method]
Returns all the (key, value) pairs in the dict. This is the
preferred way to walk a dictionary:
for this_key, this_value in my_dict.items():
..
keys [method]
Returns the list of all the keys in the dictionary.
for this_key in my_dict.keys():
..
Note that this is (more or less) the function called when you try to iterate on
a dictionary, so the above example is equivalent to:
for this_key in my_dict:
..
A frequent need is to walk a dictionary in sorted order. For this, use the
global sorted
function. Also, remember you can use its key
parameter to
determine the sort order. Here's how to sort a dictionary by value (instead
of by key):
filename2length = { ... }
print("Files by size:")
for this_file in sorted(filename2length, key=lambda n:-filename2length[n]):
print("{} {}".format(filename2length[this_file], this_file)
values [method]
Returns the list of all the values in the dictionary.
for this_value in my_dict.values():
..
pop [method]
Removes and returns a specific element of the dictionary.
old_value = my_hash.pop(some_key)
popitem [method]
Removes and returns an arbitrary (key, value) pair from the dictionary. This
might be useful if you need to destructively iterate. (Remember that it's
dangerous to iterate over a list that's changing, so putting a del
inside
a loop going over keys
is perilous.)
while my_hash:
(key, value) = my_hash.popitem()
..
update [method]
Adds the key/value pairs from another dictionary to this one, overwriting any
common items.
my_hash = {'a':0, 'b':1}
other_hash = {'a':2, 'c':3}
my_hash.update(other_hash) #=> my_hash is now {'a':2, 'b':1, 'c':3}
comprehensions
OK, my mind is blown; there are comprehensions for dictionaries too. Use a
colon (":") to separate the keys and values:
my_list = [0, 1, 2, 3]
num_to_square = {x:x*x for x in my_list} #=> "{0:0, 1:1, 2:4, 3:9}", in some order
the defaultdict class
While the dict
class has a setdefault
function to let you specify what to
do when the dict is asked for a key that doesn't exist, it only applies to the
first level of hierarchy. If you want to truly emulate perl, you need to use
the defaultdict
class so that it will fill in everything for you.
from collections import defaultdict
d = defaultdict(int) # unknown keys are auto-created with int()
for this_num in numbers:
d[this_num] += 1 # first call initializes d[this_num] to int(), which is 0
for this_num in d:
print(this_num, "occurred", d[this_num], "times")
collections.deque
Since lists aren't very good as queues, Python has a deque
class in
the collections
module:
import collections
queue = collections.deque(['a']) #=> ['a']
queue.append('b') #=> ['a', 'b']
queue.appendleft('c') #=> ['c', 'a', 'b']
next = queue.pop() #=> next is 'b', queue is ['c', 'a']
next = queue.popleft() #=> next is 'c', queue is ['a']
queue.extend(another_list) # (adds elements of another_list to end of queue)
queue.extendleft(another_list) # (adds reversed elements of another_list to front)
array.array
Python's lists can contain heterogeneous objects, and that generality comes with
a price: each list element costs a pointer plus a full Python object. If you need better
memory compaction and happen to know your list will contain homogeneous
numeric data, you can use array.array
instead.
import array
arr = array.array('H', [0, 14, 3, 57]) # "H" means two-byte unsigned
I believe you can use these anywhere you could use a list. Do check the
docs
for the various ways to specify what kind of numeric data you have.
Control statements
condition checking
Before we launch into how Python uses conditions to control program flow, let's
go over how conditions work.
The usual boolean operators are spelled out as and
and or
and not
(instead
of "&&" or "||" or "!"). They have the same precedence they do in other
languages: not
has highest, followed by and
, followed by or
.
Python's comparison operators (in
, not in
, is
, and is not
) all have the
same precedence, which is higher than the booleans.
Math operators are familiar, and have higher precedence than the comparison
operators. However, one biggie in Python is that you can chain comparison operators;
the following are exactly the same:
if a < b == c: ..
if a < b and b == c: ..
Final note: you may not do an assignment inside a condition. I guess the
Python designers got tired of that particular C bug.
if
The only things odd about Python's if
are that it doesn't require parentheses, and
that "else if" is condensed down to elif
.
Tip: Python imported (from C) the ability to put short if
statements on a single line.
if x < 0: print("[ERROR]")
if x < 0:
print("[ERROR]")
elif x == 0:
print("[WARNING]")
else:
print("blah blah blah...")
There is no "switch" or "case" statement in Python; use a bunch of elif
s
instead.
ye olde ternary operator
Python decided to implement the typical ternary operator with a "suffix" form
of if
and else
:
result = "ok" if num_errors==0 else "failed"
This is, to date, the most awkward construct I've seen in python. Yikes.
assert
Python has a built-in assert
statement that you can use to make sure conditions you expect to hold actually
do hold. When the condition is
false, assert
raises an AssertionError exception. (And when the condition is true, it
does nothing whatsoever.)
assert x>0
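You can also tack on a message, which gets attached to the AssertionError:
assert x > 0, "x must be positive, but it is {}".format(x)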
while
Like if
, while
doesn't require parentheses.
As usual, you can use continue
to jump up to the next iteration of the loop, and
break
to break out of the loop right away.
while x < 10:
x += 1
The weirdest thing ever, though, is that python has an else
for loops (both while
and for
), which runs only when the loop finishes without hitting a break
. The
example from the real python tutorial is pretty cool -- it uses this to tell you
if a number is prime, by virtue of it not being divisible by anything:
def is_prime(x):
    for i in range(2, x):
        if x % i == 0:
            res = False
            break
    else:
        res = True
    print("Is", x, "prime? ", res)
    return res
for
Python's for
is a little strange because it only does list iteration. That
is, you only give it one thing: a list, or other iterable object. You do not
give it a start command, a loop condition, and an iteration command.
for x in my_list:
...
Like while
, Python's for
also allows you to specify an else
, which runs
when the list is depleted (and you didn't exit the for
with break
).
(Note: avoid modifying a list in the middle of iterating over it. That doesn't
tend to work very well in any language.)
range
Since for
is only for iterables, how can you do a common iteration between
two numbers? For that, Python gives you the function range
, which returns
an iterable over all the numbers you normally would have managed yourself. Voila:
for x in range(0, 2):
print(x)
It may be important to note that range
returns an iterable, not a list with
all the actual numbers in it. That's good! (For memory.)
range
assumes a start number of 0 and an increment of 1; both of those can
be overridden:
range(3) #=> 0, 1, and 2
range(1, 3) #=> 1 and 2
range(1, 10, 3) #=> 1, 4, and 7
Exceptions
All of Python's exceptions inherit from the base Exception
class. (No, we
have not yet covered either classes or inheritance, but exceptions are really one
of the basic parts of Python so we can't wait any longer.)
Python's exceptions look very similar to C++'s: there's a try
block, a
subsequent except
section, and a raise
function. (When I say "look very
similar", I mean architecturally, since C++'s terms are try
, catch
, and throw
.)
Other oddities: Python introduces an else
, which executes only if no exceptions
were thrown, and you didn't leave the try
with a break
, continue
, or
return
. try
also has a finally
, which always executes whether or not there
were exceptions or you left with a break
, continue
, or return
.
catching
A try
can be followed by any number of except class [as var]
catchers, and the first one
that matches the actual exception will be used. (For a catch-all, you could
either just leave off any exception class, or use Exception
, which is the
parent of all exception classes.)
try:
    ...
except ZeroDivisionError: # catches just division-by-zero exceptions
    ...
except (RuntimeError, TypeError, SomeCustomErrorICameUpWith): # catches any of these
    ...
except: # catches any other exceptions not listed above
    ...
A confusing point: you can have an else
here, but it does not mean the same
thing as catch-all except
, even though that's exactly what you'd think reading
it. else
runs if the try
block ran without any exceptions. else
will
not run if you leave the try
block with break
, continue
,
or return
. (Honestly I think it's totally superfluous then, because you
could just put that code at the end of the try
block.)
Note that if you want the exception object itself, you can specify a "parameter"
to the except
. (To get the exception object in a catch-all except
, you have
to use the parent Exception
class form.)
try:
    ...
except ZeroDivisionError as e:
    print("caught a div-zero exception:", e)
except Exception as e:
    print("caught generic exception:", e)
raising your own
Most languages "throw" exceptions; Python "raise"s them.
To raise an exception, you use the raise
statement. For
a built-in, you just use the built-in exception class:
raise ZeroDivisionError
raise ZeroDivisionError('custom message addition!') # you can add your own text!
To make your own exceptions, define a new class that inherits Exception, and
define at least the __init__
and __str__
functions:
class myexception(Exception):
    def __init__(self, msg):
        self.msg = msg
    def __str__(self):
        return str(self.msg)
...
try:
    raise myexception("wah!!")
except myexception as e:
    print("Caught something:", e) #=> "Caught something: wah!!"
Note that you can trap and re-raise exceptions, which allows you to know what's
going on without interfering:
try:
...
except:
print("Eep! Exception! Re-throwing..")
raise
Lastly, you can create an exception object without immediately raising it,
so you can play with it before sending it along:
e = myexception("eek!")
e.do_something()
raise e
common exceptions to know about
- Pressing ctrl-c generates a
KeyboardInterrupt
exception. Woo, you can
trap them!
- int
typecasting generates a ValueError
when the value cannot be
converted to a number.
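For example, a quick sketch that traps both:
try:
    count = int(input("How many? "))
except ValueError:
    print("That wasn't a number.")
except KeyboardInterrupt:
    print("Fine, don't answer.")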
I/O
Input from stdin
You can read from stdin with the input function. It is line-based (as opposed
to character-based).
print("WHAT...is your name?")
name = input()
print("WHAT...is your quest?")
quest = input()
print("ok, the 1960s called, they want their terminal-based input back.")
Output to stdout
You've already seen the print function. The only useful thing to add
is the way to make it not print out the carriage return at the end:
print("A line without a carriage return!", end='')
Output to stderr
stderr is an awkward abomination in UNIX architecture, but for some reason
people keep using it. Fortunately, using stderr in Python is also an awkward
abomination:
import sys
print("I'm going to stderr!", file=sys.stderr)
A second way to do it, for some reason:
sys.stderr.write("I'm going to stderr!\n") # carriage return needed for write()
Input from files
Python has file objects:
try:
    fh = open(filename, "r")
    for line in fh:
        # 'line' includes the terminating \n:
        line = line.strip("\n")
        ...
    fh.close()
except IOError as e:
    print("[ERROR] could not read", filename, ":", e)
A second way to do the exact same thing is using the with
keyword. It
automatically closes the file when the block exits (even if an exception is thrown), which
is more robust for resource cleanup:
with open(filename, "r") as fh:
    for line in fh:
        # 'line' includes the terminating \n
        line = line.strip("\n")
        ..
(The with
function is totally crazy. Its whole deal is to set up
"constructors" and "destructors" for generic blocks of code. In this case,
the open
returns a filehandle object that happens to also be a "Context
Manager", and with
calls special "Context Manager" functions at the
beginning and end of the real block of code. It's..crazy. You can make your
own Context-Manager-aware classes that could also be used with with
. Assuming
you, too, are crazy.)
You can read in the entire file in one shot ("slurping") with the read()
function. You
can also read in all the lines of the file in one shot with the readlines()
function.
If the open
fails, it raises an IOError
exception.
The valid modes you can give to open
are:
- r: read
- w: write
- x: write, but die if the file already exists
- a: write, but append if the file already exists
- b: binary-mode, either for reading or writing. Basically just turns
off encoding.
Output to files
Since this calls write
directly, you have to do your own object-to-string
conversion. (One would think this would be built-in, just like print
, but
hey.)
fh = open(filename, "w")
fh.write("Line 1\n") #=> note the \n is needed here
fh.writelines(list_of_stuff) #=> each element must be a string, and must include its own \n!
fh.close()
If the open
fails, it raises an IOError
exception. (Or, if you used
mode "x" for strictly creating a new file, open
will raise a FileExistsError
exception if the file already exists.)
os.listdir(..)
Returns a list of the contents of the given directory. By default, it does
not include dot-files.
stuffs = os.listdir("/tmp")
repr vs. str
Python has two functions for converting generic values to human-readable
strings: repr
and str
.
str
is meant to convert things for humans to read on a terminal or file.
repr
is meant to convert things for the Python interpreter to read.
Numbers are the same for both:
a = 10
print(str(a)) #=> 10
print(repr(a)) #=> 10
Strings get quoted by repr
:
a = "moo"
print(str(a)) #=> moo
print(repr(a)) #=> 'moo'
Functions
Python functions are defined with the def
keyword (for defined!). Function
signatures include only input variable names; outputs are not declared, and
there are no variable types at all.
def empty_func():
pass
def useless_func(x):
    return x
Python treats functions as first-class values, meaning you can assign a
variable to a function:
foo = useless_func
foo("bar")
Return value(s)
As mentioned earlier, one of Python's coolest things is the ability to return
multiple values (without having to wrap them in an object!). Python does this
by packing and unpacking them as a tuple:
def my_func():
...
return "no errors", 17, False
...
err_str, line_count, is_flammable = my_func()
In a completely unrelated observation, I am mildly surprised that Python3 made
print
a function (and thus now requires parentheses) but did not change
return
(which is still allowed to not use parentheses).
Pass-by-value-of-reference
Python claims that it passes arguments by value, though the values of arguments
are always references. This is mostly pass-by-reference, except that you get
a local copy of the reference for you to play with. Here's a quick illustration:
def myfunc(param):
param.do_something() # actually affects RealObj
param = ... # does not affect RealObj
return
RealObj = ...
myfunc(RealObj)
I snarkily call this "pass by value of reference". The Python docs think
maybe it should be called "pass by object reference".
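Here's the same idea with a concrete (made-up) list example, since lists are mutable:
def myfunc(param):
    param.append("added")   # mutates the caller's list
    param = ["brand new"]   # rebinds only the local name; caller unaffected

real_list = ["original"]
myfunc(real_list)
print(real_list)            #=> ['original', 'added']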
Default parameter values
Parameters can have default values, like so:
def my_func(arg1, arg2 = 42, arg3="asdf"):
pass
Python requires that the parameters with defaults go after ones without them.
DANGER: mutable default values
The default value is only evaluated once, when the def statement is executed, and thereafter
appears to become a static variable. Subsequent calls will inherit any
local changes made to it. Actual example from the Python docs:
def f(a, L=[]):
L.append(a)
return L
print(f(1)) #=> "[1]"
print(f(2)) #=> "[1, 2]"
print(f(3)) #=> "[1, 2, 3]"
This is ... just weird as hell. The default value for L is not actually
what the code says it is, it's actually whatever it's been mangled to by
whatever code happens to have run. Yikes.
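The usual workaround -- and this is the standard idiom, not something I made up -- is to default to None and build the list inside the function:
def f(a, L=None):
    if L is None:
        L = []          # a fresh list on every call
    L.append(a)
    return L

print(f(1)) #=> "[1]"
print(f(2)) #=> "[2]"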
Named arguments
Very unique to Python is the ability to name your arguments, which allows you to
put them in any order. (Verilog does this too, but that's not a
software language.)
def my_string_to_int(src, base = 10):
...
my_string_to_int(base = 8, src="0755")
Note one odd constraint: you can't put positional (that is, non-named) arguments
after named arguments. :/
Variable arguments
Python has two ways of supporting an arbitrary number of parameters: variadic-style,
and hash-style. Both of these can exist in the presence of formal parameters (that
is, the usual style), and with each other.
variadic-style
Variadic-style is also known as varargs-style. What it does is package up all the
unknown arguments into a tuple and put it into a single parameter (which you identify
with an asterisk):
def my_concat(*args):
return "::".join(args)
hash-style
In hash-style, unknown key=value arguments are put into a
single hash parameter (which you identify with two asterisks):
def my_func(**stuff):
    for k in sorted(stuff.keys()):
        print(k, stuff[k])
using all of the above
So now we have four different ways to specify parameters and arguments to functions:
- formal, unnamed:
    def f(arg):
        pass
    ..
    f(foo)
- formal, named:
    def f(arg):
        pass
    ..
    f(arg=foo)
- variable, unnamed:
    def f(*arg):
        pass
    ..
    f(foo)
- variable, named:
    def f(**arg):
        pass
    ..
    f(arg=foo)
Note: if you use both variadic-style and hash-style, you have to list the
variadic-style arg before the hash-style arg:
def myfunc(real_param, *extra_args, **named_args):
    for arg in extra_args:
        ..
    for arg in named_args.keys():
        ..
unpacking variable args
The above let you pack a bunch of arguments into a single parameter; what if
you have a single argument that needs to fill a bunch of parameters instead?
Python lets you unpack these things with the same sort of syntax, though it's
kind of out of place:
def myfunc(a, b):
...
# the unnamed version:
mylist = [10, 27]
myfunc(*mylist)
# the named version:
myhash = {'a':10,'b':27}
myfunc(**myhash)
lambda functions
Lambda functions are small, anonymous functions, and are primarily useful when
function objects are the best way to write a particular piece of code.
add = lambda x: x+1
sub = lambda x: x-1
if start < end:
    incr = add
else:
    incr = sub
while start != end:
    ...
    start = incr(start)
You cannot do anything complex in lambda functions -- they must be a single
expression, so they cannot contain statements (assignments, loops, and so on).
The Python docs call this a nod to functional programming, but honestly I
don't see why it's necessary. Functions are already first-class objects, so
you can do the above already. /shrug
Modules
Packaging up code is pretty straightforward in Python. There are two ways to do
it: one is by classes (which you can instantiate as objects) and the other is
by modules (which you cannot instantiate, so they're more like a namespace).
To create a module foo
, all you do is name your Python file foo.py
. There is
no package keyword like in perl; the expectation is that modules and files have
a one-to-one mapping, so foo.py
exactly describes the foo
module.
To import your new foo.py
module from another script, you run import foo
. Python
appends the ".py" extension and loads/runs whatever it finds in foo.py
.
Any code at the top level of a module is executed the first time (and only the
first time) the module is imported.
Modules are also a namespace (so, they have their own symbol table). Thus you
would need to use global
to write to outside variables.
module search path
So where does Python find your foo.py
, since it could be
anywhere on the file system? Python searches the paths in the sys.path
list
until it finds one that has a foo.py
. Initially, sys.path
contains the
following:
- the cwd (not necessarily the dir where the script is!)
- ENV{PYTHONPATH}
- system- and installation-dependent paths
You can write to sys.path at any time, so you have full
control over where Python looks for foo.py
. Suppose you have a dir that
contains a top-level main.py
that uses an adjacent Stuff.py
; here's a good
way to make sure main.py
picks up its associated Stuff.py
:
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.abspath(sys.argv[0])))
import Stuff
"import" vs "from .. import"
"import foo" imports whatever's in foo.py
and preserves the namespace, so you have
to qualify any identifiers you use. For example:
import mystuff
mystuff.conquerTheWorld()
mystuff.evilLaugh()
"from foo import *" also imports whatever's in foo.py
, but rolls everything
into the local namespace:
from mystuff import *
conquerTheWorld()
evilLaugh()
(Note that the "*" form does not import any identifiers that begin with
an underscore; semantically those are considered internal functions.)
If you don't want to steamroll your local namespace like that, you can also
selectively import just the things you want:
from mystuff import conquerTheWorld
conquerTheWorld()
# (evilLaugh was not imported; calling it here would also need a separate
# "import mystuff" followed by mystuff.evilLaugh())
renaming identifiers
Should the need arise, you have the option of renaming identifiers when you import
them. This can be abused to much merriment:
from mystuff import conquerTheWorld as petKittens
petKittens() # d'awww...er, wait...
Packages
Packages are just collections of modules. Python has a little overhead and
a little syntactic sugar for dealing with them. First, the overhead.
In order for Python to recognize packages, you must create an __init__.py
file in the package directory. (It can actually be empty!) This file is a
marker so that Python can avoid false positives looking for your package.
Example:
% find .
./myscript.py
./foo/bar/__init__.py
./foo/bar/mod1.py
./foo/bar/mod2.py
The syntactic sugar is that you can now use periods (".")
in import
names to denote subdirectories. (It's similar to perl's
double-colon in things like use List::Util
.)
% cat myscript.py
import foo.bar.mod1
from foo.bar import mod2
...
Since "import *" is a little strange for these things (do you mean all the
identifiers in a module? or all the files in a dir? or all the subdirs in
a dir? or all of the above?), Python has a mechanism for the package to
declare what "import *" should do. See the python docs for how to define the
__all__
attribute in your __init__.py
.
You can do relative imports within a package, when you want to pick up a related
module in the same package. The syntax is "from . import foo".
Classes
Classes work almost the same as any other language. To define a class, you use
the class
keyword, and then define any attributes (data) or methods (functions)
you want in that class. Usually you only need to define methods, since Python
autovivifies instead of declaring variables.
(However, see the 'static' section below.)
class myclass:
def myfunc(self):
self.myvariable = "asdf"
Almost all of the methods and members in a class are public, so
anyone can view and change anything.
instantiating
Creating an instance of a class is done a little strangely: you call the
class name like a function, and Python implicitly knows that means to create an
object:
myobj = myclass()
inheriting
To inherit from a base class, you include it in the class definition like so:
class Derived(Base):
...
Python supports multiple inheritance:
class Derived(Base1, Base2):
...
Everything in classes is virtual, which means that you will always pick up
any overrides of methods (or attributes!) by derived classes. However, there
may be cases where you don't want to pick up the override, so Python's
syntax for picking a specific definition of a method/attribute is to scope it
with the class name:
class Derived(Base):
def myfunc(self):
# call Base's first:
Base.myfunc(self)
# then do my stuff:
...
constructor
The constructor is a special function named __init__. Note that python does
not call parent constructors for you, because why would such a perfect
language do such a silly, silly thing, so you have to do it yourself with
super:
class myclass:
def __init__(self):
super().__init__()
You can add parameters to __init__
so that you can pass arguments during
instantiation.
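For example (a small sketch):
class myclass:
    def __init__(self, name, size=10):
        super().__init__()
        self.name = name
        self.size = size
obj = myclass("fred", size=42)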
destructor
The destructor is a special function named __del__. It is so named because
you can use the global del statement on any object's name to drop that
reference; once the last reference is gone, Python reclaims the object and
calls __del__. (You can also drop a reference by reassigning the name to
None. You don't need to do either of those things because Python has garbage
collection, but it's nice to know you can nudge cleanup along when necessary.)
class myclass:
def __del__(self):
pass
"string operator"
If you try to print out an object, you will usually get gibberish:
myobj = myclass()
print(myobj) #=> "<__main__.myclass object at 0x...>"
But if we define a __str__
function we can format it however we like:
class myclass:
def __str__(self):
return "Moo!"
myobj = myclass()
print(myobj) #=> "Moo!"
__getitem__
Defining this function allows consumers to reference your object with an
index. Here's how to roll your own list or dictionary!
class MyClass:
def __getitem__(self, id):
..
myobj = MyClass()
..
print(myobj[3])
__setitem__
This is the inverse of __getitem__. It lets indexed objects of your class be
lvalues.
class MyClass:
def __setitem__(self, id, value):
..
myobj = MyClass()
myobj['asdf'] = "foo!"
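Here's a tiny self-contained sketch that wires both of these to a plain dict,
just to show the plumbing:
class MyWrapper:
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
w = MyWrapper()
w["moo"] = 3
print(w["moo"]) #=> 3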
__getattr__ (autoload)
Perl has a nifty function called AUTOLOAD, which gets called whenever you try
to invoke a function that isn't defined. The AUTOLOAD function has a
parameter with the name of the function that you tried to call, and you can
try to figure out what to do with it on your own.
Python's equivalent (the __getattr__ function) intercepts attribute lookups
rather than calls -- it only sees the attribute name, not any arguments --
but it may still be useful:
class myclass:
def __getattr__(self, attrname):
print("You tried to get", attrname, "?? I give you '3' instead!")
return 3
operator overloading
You can overload some operators for your class. Python calls the method on
the left-hand object; if that object doesn't handle it, Python will try the
mirrored method on the right-hand object.
__eq__ | Used in == |
__ne__ | Used in != |
__lt__ | Used in < |
__gt__ | Used in > |
__index__ | Used where an actual integer is required (e.g. as a list index) |
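A minimal sketch of what overloading looks like, comparing by a made-up
"size" attribute:
class Box:
    def __init__(self, size):
        self.size = size
    def __eq__(self, other):
        return self.size == other.size
    def __lt__(self, other):
        return self.size < other.size
print(Box(2) == Box(2)) #=> True
print(Box(1) < Box(5)) #=> True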
RTTI
Python has two (identical?) ways of getting the name of an object's class.
The first is with the __class__ attribute, and the other is by passing it to
the type function.
class myclass():
...
x = myclass()
print(x.__class__) #=> "<class '__main__.myclass'>"
print(type(x)) #=> "<class '__main__.myclass'>"
If you don't need the exact name of the class, you can also check just to see
if it's an instance of a particular class:
if isinstance(x, myclass): ...
You can also check to see if one class somehow inherits from another:
if issubclass(DerivedClass, BaseClass): ...
iterators
You can implement your own iterators by defining a class that has __iter__
and __next__ functions. Instances of that class can then be used directly in
a for loop!
(Section 9.9
of the tutorial has a great example that I don't need to repeat.)
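If you just want the shape of it, here's a bare-bones countdown sketch:
class Countdown:
    def __init__(self, start):
        self.n = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration # tells the for loop to stop
        self.n -= 1
        return self.n + 1
for i in Countdown(3):
    print(i) #=> 3, then 2, then 1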
generators
In Python, generators are just functions that call yield, and they're just
another way to implement iterators. (In other languages, things that call
yield are called coroutines.)
I'm also not going to reprint the great example
the tutorial
has, but I wanted to at least mention these things because they could be useful.
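Still, the basic shape is tiny; here's a minimal sketch of the same countdown
idea as a generator:
def countdown(start):
    while start > 0:
        yield start # execution pauses here until the next value is asked for
        start -= 1
for i in countdown(3):
    print(i) #=> 3, then 2, then 1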
static methods
A static method is one that does not have an implicit self
argument -- these
exist when you have a method that applies to a class but not so much to any
instances of it. To designate a method as static, you use the special
@staticmethod
decorator:
class myclass:
...
@staticmethod
def myfunc(): #=> look, no self!
...
For flexibility, you can call static methods on either the class or on an
instance of it. (The instance form ignores the instance, except for figuring
out which class it belongs to.)
myclass.myfunc()
o = myclass()
o.myfunc()
static variables
Python's implementation of static variables is..indirect. To understand how
this works, you first need to understand that Python is pervasively
object-oriented. When you define a class, you're used to instantiating it to
create objects. However (and this is the key), the class itself is also an
object. And since it's an object, it has its own namespace! Therefore, we
can create static class variables by navigating Python's namespacing rules.
So here we go:
class MyClass:
asdf = 1 #=> this is both a class and an instance attribute
def my_func(self):
# these assignments are all orthogonal:
asdf = 2 #=> this is local to my_func
self.asdf = 3 #=> this is the instance's attribute
MyClass.asdf = 4 #=> this is the class's attribute
Scoping
Python has 4 specific scopes for identifiers, and now that we've covered
them all in other sections, you'll know what I'm talking about!
Of particular note is that none of these is global. That's right - python
does not have global variables.
function
The smallest scope is actually function. You're probably used to block-level
scoping from other languages, but the following works in python just fine:
..no mention of 'foo'..
if some_condition:
foo = "bar"
else:
foo = "bas"
print(foo) # works just fine
enclosing function
This is a little odd, but since you can define nested functions, python lets
you peek into the parent function's namespace via the nonlocal
keyword. Yay,
it's like we're programming in tcl!
def outer():
myvar = 1
def inner():
nonlocal myvar
myvar = 3 # outer's myvar is 3
Curiously enough, python will actually raise a SyntaxError if you try to mess
with myvar before declaring it nonlocal.
Also curiously enough, nonlocal will continue tracing up nested namespaces
looking for your identifier.
module
Modules, which are really objects like everything else, can have any
attributes you want. You can read those attributes directly from anywhere in
the module; to assign to one from inside a function, declare it as global.
Such attributes look a lot like global variables (especially because of that
deceptive word global), but when you try to access them from another module
you do need scope resolution.
(from foo.py)
asdf = 1
..
def myfunc():
global asdf
asdf = 2
..
print("asdf:", asdf) # prints 1
myfunc()
print("asdf:", asdf) # prints 2
(from bar.py)
import foo
print("foo.asdf:", foo.asdf) # prints 1
foo.myfunc()
print("foo.asdf:", foo.asdf) # prints 2
built-in
The last scope is for built-in python identifiers such as int and len. You,
as a mere mortal, do not have the ability to add things to this scope.
OS and system functions
sys.argv
Python puts all of the command-line arguments into the sys.argv variable.
Like C, sys.argv[0] is the path to the script; the first argument is actually
in sys.argv[1].
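For example (a tiny sketch):
import sys
print("script:", sys.argv[0])
if len(sys.argv) > 1:
    print("first arg:", sys.argv[1])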
Python has not one but two packages for handling command-line arguments:
getopt
(which is apparently like the UNIX one) and argparse
(which is the
endorsed one).
argparse
Here's an executive-summary example of how to use argparse:
import argparse
p = argparse.ArgumentParser(
description="this script does blah blah blah" #optional; shows up with -h
)
# an option with an argument:
p.add_argument(
"--infile", #long option
help="input file") #optional; shows up with -h
# an option without an argument:
p.add_argument(
"-v", #short option
action="store_true") #"exists" instead of "has a value"
# an argument without an option (e.g. the "update" part of "cvs update"):
p.add_argument(
"mode") #instead of "--mode"
# a required option:
p.add_argument(
"--moo",
required=True)
# an option with a default value:
p.add_argument(
"--out",
default="/dev/null")
# an option with an argument that could be specified more than once:
p.add_argument(
"--infile",
action='append') #resulting field will be a list!
# an option without an argument that could be specified more than once, and
# in either short or long form:
p.add_argument(
"-v", #detects "-vv"
"--verbose", #detects "--verbose --verbose"
action='count')
# an option with an optional argument:
p.add_argument(
"--foo",
nargs="?") #detects either "--foo A" or just "--foo"
# an option with multiple arguments:
p.add_argument(
"--foo",
nargs=4) #detects "--foo A B C D"
# an option that can only be one of a few different things:
p.add_argument(
"--darth",
choices=['vader', 'maul', 'sidious'])
args = p.parse_args()
# and here's how to access things. The field name is the first double-dash option name
# you give for an arg (or, the first single-dash option name if there are no
# double-dash options):
if args.verbose:
print("[info] verbose on")
print("Using input file", args.infile)
print("Running as mode", args.mode)
From reading through the docs, I believe (but can't definitively confirm)
that argparse
follows the POSIX convention for arguments, which means:
- "short" options (those with a single dash) can be arbitrarily compacted
together, so "-foo bar" is actually "-f -o -o bar", whatever that may mean.
- "long" options (those with a double dash) are supposed to be the normal
ones that everyone uses. (Which has not been my experience in practice,
but hey.)
The description
and help
arguments are optional, but one of the ten-ton-hammer
things about argparse
is that it will automatically print the inline help
for a script when the arguments fail validation.
argparse
is immense. If you're wondering if you can do XYZ: you can, and
go look through
the documentation.
(Skip the tutorial though; the authors
spend more time showing us how they do debugging than showing how to use the
module.)
os.getcwd() / os.chdir()
Also known as 'pwd' or the environment variable $PWD, this returns the process's
current directory. It's safer to use this than $PWD because anyone can change
environment variables, and os.chdir() isn't guaranteed to keep $PWD up to date.
import os
pwd = os.getcwd()
os.chdir("/tmp")
...
os.chdir(pwd)
os.environ[]
A hash of the ENV variables. Modifying this will call the underlying putenv
function, but calling os.putenv directly will not update os.environ. So,
it's recommended to use os.environ directly as much as possible, unless you
happen to be using SWIG'd C code that calls putenv itself, in which case you
should make sure all your ENV reading comes from os.getenv.
I hate ENV management.
import os
print("Orig LD_LIB path:", os.environ["LD_LIBRARY_PATH"])
os.environ["LD_LIBRARY_PATH"] = "/some/custom/path:"+os.environ["LD_LIBRARY_PATH"]
sys.executable
A string containing the path to the version of Python executing this code.
import sys
print("being run by:", sys.executable)
os.system()
Runs a program in a sub-shell, as usual. The return value is the full error
code, not just the return code, so you need to shift-right by 8 to get the
return code.
import os
os.system("mkdir -p foo/bar")
shutil.copyfile() / shutil.move()
The shutil
module contains a few functions that could save you from calling
system
a kajillion times.
import shutil
shutil.copyfile(src, dest)
shutil.move(src, dest)
path_to_mkdir = shutil.which("mkdir")
os.mkdir / os.makedirs
These functions create directories. The difference between them is that
os.mkdir
creates only leaf directories, whereas os.makedirs
will create
all necessary parent directories.
import os
os.mkdir("/foo/bar/bas") # creates just "bas"
os.makedirs("/foo/bar/bas") # creates "foo", then "bar", then "bas", if needed
glob
The glob
module implements shell-style file globbing:
import glob
code_files = glob.glob("*.py")
date and time
Dates and times are both handled by the datetime module. It has classes for
just date, just time, and for both (datetime).
import datetime
str(datetime.date.today()) #=> '2012-12-31'
str(datetime.datetime.now()) #=> '2012-12-31 14:36:05.788937'
Another module handling time is the time module. One of its most interesting
functions is localtime, which returns a 9-element list of things for the
local time zone:
import time
tm_year, tm_month, tm_day, tm_hour, tm_minute, tm_second, tm_weekday, tm_yearday, tm_isdst = time.localtime()
# tm_year = 1993
# tm_month = 1-12
# tm_day = 1-31
# tm_hour = 0-23
# tm_minute = 0-59
# tm_second = 0-61. Seriously
# tm_weekday = 0-6, starting with Monday
# tm_yearday = 1-366
# tm_isdst = -1-1. Boolean for "is Daylight Savings", with -1 being "you go figure it out"
Another common function needed is getting the number of seconds since the
epoch, which python does with time.time. The only difference in python is
that this may return a floating-point number with better-than-second
resolution.
start_time = time.time()
...
end_time = time.time()
print("{} seconds elapsed".format(end_time - start_time)
os.path
This module contains a ton of things for playing with paths:
- os.path.abspath: returns an absolute path to the given file, but leaves
links intact
- os.path.realpath: returns an absolute path to the given file, but
resolves links
- os.path.basename: returns just the file name part of a path
- os.path.dirname: returns just the directory part of a path
- os.path.exists: returns whether a file/directory/link exists
- os.path.getmtime: returns the last modification time (in epoch-seconds)
of the given file
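A quick sketch using a few of these (the path is just a made-up example):
import os.path
p = "/tmp/stuff/notes.txt"
print(os.path.dirname(p)) #=> "/tmp/stuff"
print(os.path.basename(p)) #=> "notes.txt"
if os.path.exists(p):
    print("modified at", os.path.getmtime(p))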
Processes
Python's interfacing to subprocesses is a bit on the clunky side, but at least
it's object oriented. (Sigh.)
subprocess.call (a.k.a. "system")
This function behaves like system in other languages -- it runs the specified
program and waits for it to exit. Its stdin/stdout/stderr channels are
connected to the current ones, and the function returns the process's return
code.
Passing shell=True is optional, but runs the command through the shell first,
which means your commandline can contain any of the following:
- pipes
- wildcards
- environment variables
- "~" for home directories
If you use the shell, though, they suggest passing the commandline as a string
instead of as an array.
import subprocess
retcode = subprocess.call(["myscript", "-in", infile, "-out", outfile])
..
retcode = subprocess.call("myscript -in infile -out outfile", shell=True)
subprocess.check_output (reading stdout)
Calls a given sub-program and returns its stdout.
If you want stderr as well, you can redirect it into the stdout channel by
passing in stderr=subprocess.STDOUT.
Note that you will probably want to call it with universal_newlines=True,
because otherwise the returned output is "encoded bytes", which is Pythonese
for "useless crap that you can't get any data from because it's a string that
isn't a string so thbptbtpbtptptpbt!!"
If your commandline has any pipes, be sure to turn on shell=True and specify
it as a string instead of a list.
When the called process returns an error code, this function will throw a
subprocess.CalledProcessError exception.
import subprocess
try:
output = subprocess.check_output(
["someprogram", "-in", "infile.txt"],
universal_newlines=True)
except subprocess.CalledProcessError as e:
print("ERROR: someprogram terminated with error code", e.returncode)
try:
output = subprocess.check_output("grep foo bar.txt | grep -v bas",
universal_newlines=True,
shell=True)
except subprocess.CalledProcessError as e:
print("blah blah blah")
subprocess.Popen (driving stdin)
If you want to drive a subprocess's input, there's no convenience function,
so you have to drop down to the ten-ton-hammer function Popen. For driving a
sub-process's stdin, the key thing to do is pass stdin=subprocess.PIPE. Note
that you only get one chance to drive its input, because the subsequent
communicate function you call will close the sub-process's stdin. That means
all your input has to be put in one string.
Additionally, probably due to unicode support, you have to call bytes() on
your input to convert it for communicate.
Brilliant, python.
import subprocess
p = subprocess.Popen(
['/usr/bin/mail',
'foo@bar.com',
'-s "subject line"',
],
stdin=subprocess.PIPE,
)
input_str = "Automated notification of blah blah blah"
p.communicate(input=bytes(input_str, "UTF-8"))
I found another way to do this, which is far more straightforward, though
there's a scary warning in the python docs that doing this has a possible
deadlock:
p = subprocess.Popen(
['/usr/bin/mail',
'foo@bar.com',
'-s "subject line"',
],
stdin=subprocess.PIPE,
)
p.stdin.write(bytes("Moo moo moo!\n", "UTF-8"))
p.stdin.write(bytes("haha, you've been moo'd\n", "UTF-8"))
p.stdin.close()
p.wait()
Your mileage may vary. Have a nice day!
os.getpid()
This is the function to get the current process's PID.
import os
mypid = os.getpid()
socket.gethostname()
Returns the name of the current machine.
import socket
hostname = socket.gethostname()
Introspection
Interpreted languages usually provide you with a means of asking about
runtime objects in a way that thoroughly breaks encapsulation. For example,
given a variable foo
, give me a string that's the name of its class! Or,
given a class name, tell me all the functions and variables! This is called
introspection and is occasionally very handy. Here are some of the things
you can do in python.
type
type returns the type of the given variable (as a type object, not a plain
string). You can then compare the result against a class to see if there's a
match.
a = 0
type(a) # returns "<class 'int'>"
b = "asdf"
type(b) # returns "<class 'str'>"
if type(a) == int:
...
Note that you can also use type
to create a class dynamically. See the
3-argument form, and then wash your eyes out with bleach. Actually, do that
in the other order, so that you never use type
for that.
dir
Give dir an object, and it returns the list of attributes on it. It's really
meant for interactive play-time on the command line, so don't put too much
stock in it.
Remember that attributes usually means "member variables and methods", but
also remember that classes can define the __dir__ function to override the
default behavior of dir.
dir(str) # returns: ['__add__', '__class__', '__contains__',
'__delattr__', '__dir__', '__doc__'....'endswith', 'expandtabs', 'find',
'format', 'format_map', 'index',....]
callable
callable
tells you whether the thing you passed it can be called like a
function. This is infinitely useful for when you can't remember what you
called your function.
a = print
if callable(a):
a("moo!")
isinstance
isinstance
returns whether a given variable is an instance of a given
class (or any class derived from it).
class B: pass
class D(B): pass
b = B()
d = D()
isinstance(b, B) # returns True
isinstance(b, D) # returns False
isinstance(d, B) # returns True
issubclass
issubclass
is like isinstance
except it queries a class instead of an
object.
class B: pass
class D(B): pass
issubclass(B,B) # returns True
issubclass(B,D) # returns False
issubclass(D,B) # returns True
getattr
getattr
allows you to ask an object for an attribute by name.
a = MyClass()
func = getattr(a, "myfunc")
func("hi") # same as a.myfunc("hi")
docstrings
For functions and classes, Python has some syntactic sugar to make documentation
a bit more consistent. You can establish a function's/class's docstring by
creating a string as the first thing in the body. Behind the scenes, Python
stores that special string in the __doc__
attribute, but consumers
can get it in a much more friendly manner with the help
function:
def my_func():
"""Doesn't do much, but I wrote it so it must be awesome."""
return 4
help(my_func)
Docstrings have several conventions that are completely unenforced by the
compiler:
- First line should be a summary.
- Second line should be blank.
- Remaining lines are the full documentation.
- Since they span multiple lines, use triple-quoted strings.
One thing you may come across in docstrings is copy-pasted output from an
interactive Python session. Combined with the doctest module, these are
actually embedded tests -- a nifty way to not only document usage of a
function but also to test it at the same time:
import doctest
def myfunc(asdf):
"""Blah blah blah
>>> print(myfunc(10))
0
>>> print(myfunc("asdf"))
'moo'
"""
...
doctest.testmod() # checks all the embedded tests!
Serialization with pickle
Data serialization is awesome because it allows you to dump a Python data
structure to a file in such a way that Python can rebuild the data structure
from the file directly. Python's built-in data serializer is the pickle
module.
import pickle
settings = {"common": {"host":"mordor", "user":"sauron"}, "date","3849739103"}
fh = open("datafile", "w")
data = pickle.dump(settings, fh)
fh.close()
Then, later:
import pickle
fh = open("datafile", "r")
data = pickle.load(fh)
fh.close()
No more manually parsing settings files! Yay! Without this, you'd have to
traverse your data structure, and convert all your non-string values to strings
when writing your file. Worse, you'd have to do the inverse when reading it
back in.
Misc
zip
The global function zip takes two lists and zippers them up into pairs; feed
the result to dict() to get a hash where one of the lists is all the keys and
the other is all the values:
k = ['name', 'quest', 'favorite color']
v = ['Borat', 'Pamela Anderson', 'puce']
h = dict(zip(k, v))
global variables
__name__
Usually contains the name of the current module, as a string. That is, inside
foo.py this will be set to 'foo'. The one exception is that when you run
foo.py from the commandline, __name__ will instead be set to '__main__'.
This allows you to use a module as either a packaged module or as the
top-level code. In practice, this is best reserved for putting testing code
in the same module:
mylib.py:
def func1..
def func2..
def func3..
if __name__=='__main__':
# run tests!
func1()
..
networking
Of course Python has direct support for networking. Two modules that you may
want to check into are urllib.request
(for grabbing data from a web page) and
smtplib
(for sending email).
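For example, a minimal sketch of grabbing a page with urllib.request (the URL
is just a placeholder):
import urllib.request
with urllib.request.urlopen("http://www.example.com/") as response:
    html = response.read().decode("utf-8")
print(len(html), "characters fetched")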
threading
Python has no built-in fork and exec. (Well, okay, it has a built-in exec,
but it's not the one that goes with fork; the POSIX-style calls live off in
the os module.) For most purposes, Python instead has a threading module that
gives you a nifty API for dealing with multi-threading.
Note also that for sufficiently complex threading applications you will
probably also want to look at the queue module's Queue class, which is a
thread-safe synchronous queue.
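A bare-bones sketch of spinning up a few threads and waiting for them:
import threading
def worker(n):
    print("worker", n, "doing stuff")
threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join() # wait for them all to finish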
logging
Python has a logging
module. It appears to be rather heavyweight, where by
heavyweight I mean it's cumbersome to the point of sucking. But I haven't
really given it a fair chance, so I dunno.
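The two-minute version, if you want to try it anyway (a minimal sketch):
import logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
log.info("starting up")
log.warning("something looks fishy")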
profiling
Python has a few ways to do profiling:
The timeit module has a class named Timer that will run code you give it and
report its runtime as a floating-point number of seconds.
The profile
and pstats
modules have less granularity but help with
profiling entire programs.
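For example, a small sketch with timeit:
import timeit
t = timeit.Timer("sorted(range(1000))")
print(t.timeit(number=1000), "seconds for 1000 runs")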
Chris verBurg
2015-03-08