Python Basics

These are my reading notes on the appendix of Python for Data Analysis (O’Reilly, 2012). The appendix covers basic Python syntax and concepts, which I think is very useful for Python beginners like me.

Semicolons can be used to separate multiple statements on a single line: a = 5; b = 6; c = 7

# starts a comment: everything after it on that line is ignored by the interpreter.

Functions can take both positional and keyword arguments: result = f(a, b, c, d=5, e='foo')


Python assignment binds a name to an object; it does not copy the object. After b = a, both names refer to the same object, so a change made through a is visible through b:

a = [1, 2, 3]
b = a
a.append(4)
# => b == [1, 2, 3, 4]
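The same binding rule applies to function arguments. A function receives a reference to the caller's object, so mutating the object inside the function is visible outside. Rebinding the parameter name, by contrast, only affects the function's local scope. Here is a minimal sketch; the function names are hypothetical and used for illustration only:

```python
def append_element(some_list, element):
    # Mutates the caller's list in place: the change is visible outside.
    some_list.append(element)

def rebind(some_list):
    # Rebinding the local name does not touch the caller's list.
    some_list = [0]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
print(data)          # prints [1, 2, 3, 4]

result = rebind(data)
print(data, result)  # prints [1, 2, 3, 4] [0]
```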

Python is a strongly typed language: objects are not implicitly converted between unrelated types, so '5' + 5 raises a TypeError.

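In Python, isinstance can be used to check a variable's type. It also accepts a tuple of types:

a = 5
isinstance(a, int)           # => True
isinstance(a, float)         # => False
isinstance(a, (int, float))  # => True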

The iter function can be used to check whether a variable is iterable: iter(x) raises a TypeError if x is not iterable. In Python, strings are iterable.
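A minimal sketch of that check wraps iter() in try/except. The helper name isiterable is an assumption chosen for illustration:

```python
def isiterable(obj):
    """Return True if obj supports iteration, using iter() as the test."""
    try:
        iter(obj)
    except TypeError:
        # iter() raises TypeError for objects that are not iterable
        return False
    return True

print(isiterable('a string'))  # prints True
print(isiterable([1, 2, 3]))   # prints True
print(isiterable(5))           # prints False
```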

In Python, a module is simply a .py file containing Python code such as function and variable definitions.

import supports the as keyword to give a module a shorter alias: import some_module as sm

In Python, list() always creates a new list object, even when it is given an existing list:

a = [1, 2, 3]
c = list(a)  # => a == c, but a is not c

is and is not are the idiomatic way to check whether a variable is None: a = None; a is None # => True

a // b: floor division. It divides, drops any fractional remainder and rounds down: 7 // 2 # => 3
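"Rounds down" matters for negative operands: the result is floored toward negative infinity, not truncated toward zero. A short sketch comparing // with true division /:

```python
q1 = 7 // 2    # 3
q2 = -7 // 2   # -4: floored toward negative infinity, not truncated to -3
q3 = 7.5 // 2  # 3.0: float operands give a float result
q4 = 7 / 2     # 3.5: true division, for comparison

print(q1, q2, q3, q4)  # prints 3 -4 3.0 3.5
```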

In Python, expressions are evaluated immediately (strict evaluation, not lazy evaluation).

In Python, strings and tuples are immutable: they cannot be changed after they are created. Lists, dicts, NumPy arrays and most user-defined classes are mutable.

In Python, string literals can use single or double quotes. Triple quotes (''' or """) create strings that span multiple lines:

c = '''
This is a string with line breaks
'''

Strings are immutable, so replace() does not change the original string. It returns a new string with the substring replaced:

a = "This is string"
b = a.replace("string", "long string")
# => b == "This is long string", and a is unchanged

Most objects can be converted to a string using str(): a = 5.5; b = str(a) # => b == '5.5'

+ concatenates strings: a = 'part A'; b = 'part B'; a + b # => 'part Apart B'

Strings support %-style formatting, which combines a template string and a tuple of values with the % operator:

template = "%.2f %s are $%d"
template % (4.558, 'Tomcat', 1)
# => '4.56 Tomcat are $1'
# %.2f: float with 2 decimal places
# %s: string
# %d: integer

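In Python, bool() can be used to check whether a value is empty or zero. Empty sequences, empty strings and 0 evaluate to False:

bool([])       # => False
bool([1,2,3])  # => True
bool('ok')     # => True
bool('')       # => False
bool(0)        # => False
bool(1)        # => True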

str, bool, int and float can also be called as functions to convert (cast) values between types, e.g. int('5') # => 5.
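Conversion does not always succeed, and some results follow the truthiness rules above rather than the text's meaning. The sketch below shows a few cases and a conversion helper. to_float is a hypothetical name, used for illustration only:

```python
print(int('42') + 1)  # prints 43
print(float('1.5'))   # prints 1.5
print(str(3) + '4')   # prints 34
print(bool('False'))  # prints True: any non-empty string is truthy

def to_float(value):
    """Convert value to float, returning None if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # ValueError for unparseable strings, TypeError for e.g. None
        return None

print(to_float('2.5'), to_float('abc'))  # prints 2.5 None
```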

pass is a no-op statement. Use it where a block is syntactically required but no action should be taken:

if x < 0:
    pass  # nothing to do here (yet)

Tuple

A tuple is a one-dimensional, fixed-length, immutable sequence of Python objects:

tup = 4, 5, 6                   # => (4, 5, 6)
nested_tup = (4, 5, 6), (7, 8)  # => ((4, 5, 6), (7, 8))
tuple('string')                 # => ('s', 't', 'r', ...)

Once a tuple is created, it is impossible to modify which object is stored in each slot: tup[2] = 'm' # => TypeError

Tuples can be concatenated using the + operator: tup + nested_tup # => (4, 5, 6, (4, 5, 6), (7, 8))

Multiplying a tuple by an integer concatenates that many copies: tup * 2 # => (4, 5, 6, 4, 5, 6)

Tuples can be unpacked into variables: a, b, c = tup # => a == 4, b == 5, c == 6

count() counts the occurrences of a value: a = (1, 2, 2, 3); a.count(2) # => 2
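Unpacking has a few common idioms beyond a, b, c = tup. This sketch shows swapping two variables, unpacking a nested tuple with a matching structure, and unpacking inside a for loop. All variable names are illustrative:

```python
# Swap two variables with tuple packing/unpacking.
a, b = 1, 2
a, b = b, a
print(a, b)  # prints 2 1

# Nested tuples unpack with a matching nested pattern.
tup = 4, 5, (6, 7)
x, y, (z, w) = tup
print(z)  # prints 6

# Unpacking while iterating over a sequence of tuples.
pairs = [(1, 'one'), (2, 'two')]
for number, name in pairs:
    print(number, name)  # prints "1 one", then "2 two"
```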

List

Lists are like tuples, but they are variable-length and mutable: their contents can be modified in place. Create them with [] or the list() function:

a_list = [2, 3, 7, None]
tup = ('a', 'b')
b_list = list(tup)  # => b_list == ['a', 'b']

Add to list:

  • append(value): add an element to the end of the list
  • insert(index, value): insert an element at a specific position: b_list.insert(1, 'c') # => ['a', 'c', 'b']

Remove from list: remove(value) removes the first occurrence of that value.

The in operator checks whether a value is in a list: 'b' in b_list # => True

Like tuples, lists can be concatenated with +.

extend() appends all elements of another list in place. This is faster than +, which has to create a new list and copy the elements over.

sort() sorts a list in place. Its key argument takes a function whose result is used as the sort value, e.g. some_list.sort(key=len) sorts the elements by their length.
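The list methods above can be exercised together in one short sketch. The word list is made-up example data:

```python
b_list = ['a', 'b']
b_list.append('d')         # ['a', 'b', 'd']
b_list.insert(1, 'c')      # ['a', 'c', 'b', 'd']
b_list.remove('b')         # ['a', 'c', 'd']
print('c' in b_list)       # prints True

b_list.extend(['e', 'f'])  # in place: no new list is created
print(b_list)              # prints ['a', 'c', 'd', 'e', 'f']

words = ['saw', 'small', 'He', 'foxes', 'six']
words.sort(key=len)        # stable sort by string length
print(words)               # prints ['He', 'saw', 'six', 'small', 'foxes']
```

Because the sort is stable, words of equal length ('saw' and 'six', 'small' and 'foxes') keep their original relative order.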

  • The bisect module implements binary search and insertion into a sorted list
  • bisect.bisect(list_name, element) finds the position where the element should be inserted to keep the list sorted
  • bisect.insort(list_name, element) inserts the element at that sorted position

import bisect
d_list = [1, 2, 3, 4, 4, 4, 6]
bisect.bisect(d_list, 4)  # => 6
bisect.insort(d_list, 4)  # => d_list == [1, 2, 3, 4, 4, 4, 4, 6]
# the list must already be sorted: bisect does not check this!

  • Slicing selects a section of a list or tuple with [start:stop]. The element at start is included and the element at stop is not. Assigning to a slice replaces that section:

# continuing with d_list from above: [1, 2, 3, 4, 4, 4, 4, 6]
d_list[1:3]               # => [2, 3]
d_list[1:3] = ['a', 'b']  # => d_list == [1, 'a', 'b', 4, 4, 4, 4, 6]

  • start or stop can be omitted to slice from the beginning or to the end: d_list[:2] # => [1, 'a']
  • negative indices count from the end: d_list[-2:] # => [4, 6]
  • a step after a second colon takes every n-th element, so [::2] takes every other element: d_list[::2] # => [1, 'b', 4, 4]
  • a step of -1 reverses the list: d_list[::-1] # => [6, 4, 4, 4, 4, 'b', 'a', 1]

Built-in Sequence Functions

Enumerate

Python has a built-in enumerate function, which yields (index, value) pairs while iterating: for i, value in enumerate(collection):

# build a dict mapping each value to its index:
a = ['a', 'b', 'c']
mapping = dict((v, i) for i, v in enumerate(a))
# => mapping == {'a': 0, 'b': 1, 'c': 2}

Sorted

sorted() returns a new sorted list from the elements of any sequence. Punctuation and uppercase letters sort before lowercase letters:

sorted('Hello!')  # => ['!', 'H', 'e', 'l', 'l', 'o']

Combined with set(), sorted() gives a sorted list of the unique elements of a sequence:

sorted(set('Hello'))  # => ['H', 'e', 'l', 'o']

Zip

zip() pairs up the elements of sequences into tuples. In Python 3 it returns a lazy iterator, so wrap it in list() to see the result:

seq1 = ['a', 'b', 'c']
seq2 = ['d', 'e', 'f']
list(zip(seq1, seq2))  # => [('a', 'd'), ('b', 'e'), ('c', 'f')]

zip() with * "unzips" a sequence of pairs back into separate tuples:

seq1, seq2 = zip(*[('a', 'd'), ('b', 'e'), ('c', 'f')])
# => seq1 == ('a', 'b', 'c')
# => seq2 == ('d', 'e', 'f')

Dict

Create a dict with {}. Elements are inserted and accessed with square brackets, as with list/tuple indexing, but using keys instead of positions:

Di = {'a': 'hello', 'b': [1, 2, 3]}
Di['b']  # => [1, 2, 3]

in checks whether a dict contains a key: 'b' in Di # => True

del or pop can be used to remove keys:

del Di['a']  # => Di == {'b': [1, 2, 3]}
Di.pop('b')  # => returns [1, 2, 3] and removes the key 'b'

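keys() and values() return the dict's keys and values. In Python 3 these are view objects; wrap them in list() to get lists.

update() merges another dict into this one, overwriting values for keys that already exist:

Di.update({'c': 'world'})
# => Di == {..., 'c': 'world'}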
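The dict operations above can be put together in one runnable sketch. It also builds a dict from two parallel sequences with zip and iterates with items(). The key order shown assumes Python 3.7+, where dicts preserve insertion order:

```python
Di = {'a': 'hello', 'b': [1, 2, 3]}
Di['c'] = 'world'            # insert a new key
print('b' in Di)             # prints True
print(list(Di.keys()))       # prints ['a', 'b', 'c']
print(list(Di.values()))     # prints ['hello', [1, 2, 3], 'world']

removed = Di.pop('a')        # removes 'a' and returns its value
print(removed)               # prints hello

# Build a dict from two parallel sequences.
mapping = dict(zip(['x', 'y', 'z'], [1, 2, 3]))
print(mapping)               # prints {'x': 1, 'y': 2, 'z': 3}

# Iterate over key/value pairs.
for key, value in mapping.items():
    print(key, value)        # prints "x 1", "y 2", "z 3"
```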

Default Values

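dict.get() replaces the pattern of checking for a key before looking it up:

some_dict = {'x': 1}  # hypothetical example data
key, default_value = 'y', 0
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
# the code above is equivalent to: value = some_dict.get(key, default_value)
# get() returns None if the key is missing and no default is given

setdefault() does the same when building up values, such as grouping words by their first letter:

words = ['apple', 'bat', 'atom']
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
# the if/else above is equivalent to: by_letter.setdefault(letter, []).append(word)
# => by_letter == {'a': ['apple', 'atom'], 'b': ['bat']}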
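The standard library offers a further shortcut for the grouping pattern: collections.defaultdict takes a function (here list) that it calls to create the value for any missing key. A minimal sketch with the same example words:

```python
from collections import defaultdict

words = ['apple', 'bat', 'atom']

# defaultdict(list) calls list() to create an empty list for any missing key,
# so neither setdefault() nor an explicit membership check is needed.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))  # prints {'a': ['apple', 'atom'], 'b': ['bat']}
```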

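Dict keys have to be hashable. In practice this means immutable objects such as ints, floats, strings, or tuples whose elements are all immutable. hash() checks this: hash((1, 2, [2, 3])) # => TypeError, because the tuple contains a list.
So a tuple can be used as a key, as long as all of its elements are hashable.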
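The sketch below checks hashability and shows the usual workaround for keying a dict by a list's contents: convert the list to a tuple. is_hashable is a hypothetical helper name, used for illustration only:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (2, 3))))  # prints True: all elements are immutable
print(is_hashable((1, 2, [2, 3])))  # prints False: the inner list is mutable
print(is_hashable([1, 2, 3]))       # prints False

# To key a dict by a list's contents, convert the list to a tuple first.
d = {}
d[tuple([1, 2, 3])] = 5
print(d)  # prints {(1, 2, 3): 5}
```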

Set

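A set is an unordered collection of unique elements. It can be created with set() or with a set literal such as {1, 2, 3}. Note that {} alone creates an empty dict, not an empty set.

Sets support & (intersection), | (union), - (difference) and ^ (symmetric difference, xor).

issubset() and issuperset() check whether one set is a subset or superset of another.

Two sets compare equal with == if they contain the same elements.

isdisjoint() checks whether two sets have no elements in common.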
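A short sketch of these set operations on two made-up sets. The comments give the resulting sets; when printed, the element order of a set is not guaranteed:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

union = a | b         # {1, 2, 3, 4, 5, 6, 7, 8}
common = a & b        # {3, 4, 5}
only_a = a - b        # {1, 2}
either_not_both = a ^ b  # {1, 2, 6, 7, 8}

print({1, 2}.issubset(a))      # prints True
print(a.issuperset({1, 2}))    # prints True
print({1, 2, 3} == {3, 2, 1})  # prints True: order does not matter
print(a.isdisjoint({9, 10}))   # prints True: no common elements

empty = set()  # {} would create an empty dict instead
```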

Comprehensions of List, Set and Dict

List

[expr for val in collection if condition]:

strings = ['a', 'as', 'bat', 'cat']
[x.upper() for x in strings if len(x) > 2]
# => ['BAT', 'CAT']

Dict, Set

dict_comp = {key_expr: value_expr for value in collection if condition}
set_comp = {expr for value in collection if condition}

For example, mapping each string to its position in a list:

loc_mapping = {val: index for index, val in enumerate(strings)}
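A runnable sketch of both comprehension forms, using a hypothetical list of strings:

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Set comprehension: the distinct string lengths.
unique_lengths = {len(x) for x in strings}
print(unique_lengths)       # the set {1, 2, 3, 4, 6}

# Dict comprehension: map each string to its index.
loc_mapping = {val: index for index, val in enumerate(strings)}
print(loc_mapping['dove'])  # prints 4

# Dict comprehension with a condition: only strings longer than 2 characters.
long_upper = {s: s.upper() for s in strings if len(s) > 2}
print(long_upper)
# prints {'bat': 'BAT', 'car': 'CAR', 'dove': 'DOVE', 'python': 'PYTHON'}
```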

Nested List

data = [['tom', 'mat'], ['ann', 'beth']]
result = [name for names in data for name in names if name.count('n') < 2]
# => result == ['tom', 'mat', 'beth']

# The two for clauses nest in the same order as ordinary loops:
result = []
for names in data:
    for name in names:
        if name.count('n') < 2:
            result.append(name)

Functions

Positional arguments: def my_function(x, y)
Keyword arguments: def my_function(x, y, z=1.5). A keyword argument has a default value, which the caller can override.
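A minimal sketch of how the default value behaves at the call site (my_function is the hypothetical name used above). Note that in a call, keyword arguments must come after positional ones:

```python
def my_function(x, y, z=1.5):
    # x and y are positional arguments; z is a keyword argument with a default.
    return (x + y) * z

print(my_function(1, 2))       # z defaults to 1.5: prints 4.5
print(my_function(1, 2, z=2))  # z overridden by keyword: prints 6
print(my_function(1, 2, 2))    # z passed positionally: prints 6
print(my_function(y=2, x=1))   # positional parameters can also be named: prints 4.5
```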

Return multiple values (Python actually returns a single tuple, which is then unpacked):

def f():
    a = 5
    b = 3
    return a, b

c, d = f()  # => c == 5, d == 3

In Python, functions are objects. Like any other object, they can be stored in variables and lists, passed as arguments, and returned from other functions.
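As a sketch of what that enables, a list of functions can serve as a pipeline of cleaning steps applied to each string. The names remove_punctuation, clean_ops and clean_strings, and the sample data, are illustrative assumptions:

```python
import re

def remove_punctuation(value):
    # Remove the characters !, # and ? from the string.
    return re.sub('[!#?]', '', value)

# Functions stored in a list like any other objects.
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    """Apply each function in ops, in order, to every string."""
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

states = ['   Alabama ', 'Georgia!', 'south   carolina##']
print(clean_strings(states, clean_ops))
# prints ['Alabama', 'Georgia', 'South   Carolina']
```

Adding or reordering steps only means changing the clean_ops list, not clean_strings itself.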

Lambda functions are anonymous functions consisting of a single expression:

def short_f(x):
    return x * 2

# equivalent to: short_f = lambda x: x * 2
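Lambdas are most useful as short function arguments, for example as a transformation or as a sort key. A sketch with illustrative data (apply_to_list is a hypothetical helper):

```python
def apply_to_list(some_list, f):
    # Apply the function f to every element of some_list.
    return [f(x) for x in some_list]

print(apply_to_list([4, 0, 1, 5, 6], lambda x: x * 2))  # prints [8, 0, 2, 10, 12]

# Sort by the number of distinct letters in each string.
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key=lambda x: len(set(x)))
print(strings)  # prints ['aaaa', 'foo', 'abab', 'bar', 'card']
```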

Closure

A closure is a dynamically generated function returned by another function.

A closure keeps access to the variables of the scope in which it was created, even after the enclosing function has finished executing.

def make_formatter(template):
    def formatter(x):
        # template comes from the enclosing function's scope
        return template % x
    return formatter

fmt = make_formatter('%.4f')
fmt(1.756)  # => '1.7560'
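The formatter example only reads from the enclosing scope. A closure can also keep mutable state between calls. Rebinding an enclosing variable requires the nonlocal statement, which exists in Python 3 only. A minimal sketch (make_counter is a hypothetical name):

```python
def make_counter():
    count = 0

    def counter():
        # nonlocal lets the closure rebind count in the enclosing scope.
        nonlocal count
        count += 1
        return count

    return counter

tick = make_counter()
tick()
tick()
print(tick())   # prints 3: count survives between calls

other = make_counter()
print(other())  # prints 1: each closure has its own count
```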

Extended Call Syntax: *args and **kwargs

*args collects extra positional arguments into a tuple
**kwargs collects extra keyword arguments into a dict

def call_f(f, *args, **kwargs):
    print('args is', args)
    print('kwargs is', kwargs)
    return f(*args, **kwargs)

def g(x, y, z=1):
    return (x + y) / z

call_f(g, 1, 2, z=5)
# => args is (1, 2)
# => kwargs is {'z': 5}
# => returns 0.6, i.e. (1 + 2) / 5

Currying

Currying means deriving new functions from existing ones by partial argument application:

def add_numbers(x, y):
    return x + y

# currying with functools.partial
from functools import partial

add_five = partial(add_numbers, 5)
# add_five(y) returns 5 + y, e.g. add_five(10) # => 15

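Generators

A generator is created by using yield instead of return in a function. Calling the function returns a generator without running its body. The code only executes when values are requested from the generator:

def squares(n=10):
    # loop bounds chosen for illustration
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()  # => a generator object; no code has run yet
for x in gen:
    print(x)     # prints 1, 4, 9, ..., 100 (one per line)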
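To see the laziness directly, the sketch below uses a variant of squares that prints when its body starts running, and drives it with next(). squares_with_log is a hypothetical name:

```python
def squares_with_log(n=3):
    print('generating squares from 1 to', n ** 2)
    for i in range(1, n + 1):
        yield i ** 2

gen = squares_with_log()  # nothing is printed yet: the body has not started
print(next(gen))          # prints the message, then 1
print(next(gen))          # prints 4
rest = list(gen)          # consumes the remaining values
print(rest)               # prints [9]
# The generator is now exhausted; next(gen) would raise StopIteration.
```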

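Generator Expressions

A generator expression is a concise way to create a generator. It looks like a list comprehension but uses parentheses, which can be omitted when it is the only argument of a function call:

sum(x ** 2 for x in range(100))  # => 328350

# equivalent to using a generator function:
def make_gen():
    for x in range(100):
        yield x ** 2

sum(make_gen())  # => 328350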

itertools module

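The itertools module in the standard library is a collection of generators for many common data algorithms.

Useful functions:

  • imap: lazy version of map (Python 2 only; in Python 3 the built-in map() is already lazy)
  • ifilter: lazy version of filter (Python 2 only; in Python 3 use the built-in filter())
  • combinations(iterable, k): all possible k-tuples of elements, ignoring order
  • permutations(iterable, k): all possible k-tuples of elements, respecting order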
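A short sketch contrasting combinations and permutations on a three-letter list. It also shows the Python 3 built-ins that replace imap/ifilter. Each call returns an iterator, so list() is used to materialize the results:

```python
import itertools

letters = ['a', 'b', 'c']

pairs = list(itertools.combinations(letters, 2))
print(pairs)
# prints [('a', 'b'), ('a', 'c'), ('b', 'c')]

ordered_pairs = list(itertools.permutations(letters, 2))
print(ordered_pairs)
# prints [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

# In Python 3, the built-in map() and filter() are lazy, like imap/ifilter.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))
evens = list(filter(lambda x: x % 2 == 0, range(10)))
print(doubled, evens)  # prints [2, 4, 6] [0, 2, 4, 6, 8]
```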

Files and the Operating System

open(path) opens a file. The default mode is read-only ('r').

A file object can be iterated like a list of lines: for line in open(path): #...

  • open(path, 'w'): write-only mode; creates a new file or overwrites an existing one
  • open(path, 'r'): read-only mode
  • open(path, 'a'): append to an existing file, or create it if it does not exist
  • 'b': add to the mode for binary files, e.g. open(path, 'rb')

File Methods:

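  • read([size]): return data from the file as a string; the optional size limits how many characters (bytes in binary mode) are read
  • readlines([size]): return a list of the lines in the file
  • write(str) / writelines(strings): write a string / a sequence of strings to the file
  • close(): close the file
  • flush(): flush the internal I/O buffer to disk
  • seek(pos): move to position pos in the file
  • tell(): return the current file position as an integer
  • closed: True if the file is closed (an attribute, not a method)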
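The modes and methods above can be combined in one sketch. It works in a temporary directory so no real files are touched; the file name example.txt is a hypothetical choice. It uses with blocks, which close the file automatically. Binary mode is used for seek/tell because there the position is a plain byte offset:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt')  # hypothetical file name

    # 'w' creates the file (or truncates an existing one).
    with open(path, 'w') as f:
        f.write('first line\n')
        f.writelines(['second line\n', 'third line\n'])

    # 'a' appends to the end of the existing file.
    with open(path, 'a') as f:
        f.write('fourth line\n')

    # Default mode 'r': iterate over the file line by line.
    with open(path) as f:
        lines = [line.rstrip('\n') for line in f]

    # Binary mode: read() returns bytes and tell() is a byte offset.
    with open(path, 'rb') as f:
        first_five = f.read(5)    # b'first'
        position = f.tell()       # 5
        f.seek(0)                 # move back to the start
        all_lines = f.readlines()

    print(f.closed)  # prints True: the with block closed the file

print(lines)
# prints ['first line', 'second line', 'third line', 'fourth line']
```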