Python is a high-level, general-purpose programming language. The language is interpreted, has code blocks based on indentation, has an extensive standard library, and many third-party libraries. Python has slower runtime performance and higher memory overhead compared to compiled languages. Python is used across many domains, for exapmle in scripting, web development, data analysis, and machine learning.

Key characteristics:

  • Paradigm: multi-paradigm (imperative, object-oriented, functional)
  • Memory model: automatic memory management with garbage collection
  • Type system: dynamically typed with optional static typing (type hints)
  • Concurrency: threads, multiprocessing, async/await
  • Execution: interpreted (bytecode executed by a virtual machine)
  • Ecosystem tools: pip, venv, pytest, mypy, black

Project Setup

Create a simple project

mkdir hello_python
cd hello_python
python3 -m venv .venv
source .venv/bin/activate

File layout

hello_python/
├── .venv/
└── main.py

Minimal example

# main.py
print("Hello World!")

Run the program

python main.py

Language Basics

Compared to other languages, there is no main function. Entry point is the file we invoke when running python <file>. Note that Python source code is interpreted line by line, so we may not call functions that are defined later in code. This issue is often handled as follows.

"""
Multiline comment / docstring, ''' also works
"""

def main():
    hello()
    hello("Joe")

def hello(to_whom: str = "World"):
    print("Hello", to_whom + "!")

main()      # Single-line comment
Hello World!
Hello Joe!

Memory Model

In Python, all variables are references to values stored on the heap. Primitive types are immutable, changing the variable changes their reference. Variables can be None. Python uses garbage collection.

All primitive values are immutable, even though variables pointing to them can be easily reassigned, e.g. through +=. Additionally, tuples are immutable lists – their structure cannot change, but we can mutate their mutable elements.

Primitive Types

x = 42          # int, unbounded values - uses bigint automatically
y = 3.14        # float, normal -- bounded precision
l = True        # bool
n = None        # NoneType is the null value
s = "text"      # str, array of unicode characters
b = b"data"     # bytes, array of raw 8-bit values
c = 3 + 4j      # complex, access with .real, .imag
  • Arithmetic: +, -, *, /, % (modulo ↗, least positive residue), // (floored division), ** (power)
  • Logical: and, or, not
  • Comparison: <, <=, >, >=, ==, !=
  • Bitwise: &, |, ^, ~, <<, >>
  • Ternary: True if x else False
  • internally, all of these are classes; by convention in lower-case

Standard Types

Due to dynamic typing Python’s collections can be non-uniform.

l = [0, 1, "hi"]    # list
t = (0, 1, "hi")    # tuple
s = {0, 1, "hi"}    # set, needs to be hashable
d = {0: "hi"}       # dict, needs hashable key

Each of the above types has a constructor that takes an iterable as its parameter. Hence, list(t) given tuple transforms it to a list, same works with tuple and set. Similarly with dict, but each element must consist of two values, e.g. dict([(0, "hi")]) converts list of tuples of size 2 to {0: "hi"}.

Hashable requirement means we cannot use lists, sets, or dicts as keys. For this, we can transform list to tuple using tuple(l), note that its elements must be hashable; nested lists can be transformed recursively.

Flow Control

if 0 < x <= 10:     # comparisons can be merged
    pass
elif x == 0 or x == -2:
    pass    # nil operation, needed here as blocks cannot be empty
else:
    pass
match value:
    case 1:
        pass
    case 2 | 3:
        pass
    case _:     # default branch
        pass

Loops

for i in range(5):
    continue

while condition:
    break

Error Handling

try:
    risky()
except ValueError as e:
    handle(e)
    raise e     # optionally propagate
else:
    success()
finally:
    cleanup()

Functions, Lambdas, Closures

def func(a, b=0, *args, **kwargs):
    return a + b

lambda_func = lambda x: x * 2

Closures capture variables from outer scopes.

Unpacking like func(*list) can be used to pass values of arr as list of arguments. Similarly, func(**dict) can pass named arguments to a function, where argument names are the same as the dictionary keys.

Classes and Data Structures

By convention our classes should be upper-case named.

class Base:
    def __init__(self, x):
        self.x = x      # calls setter

    def __str__(self):
        return f"{self.x}"

    @property
    def x(self):        # getter for x
        self._x = value

    @x.setter
    def x(self, value): # setter for x
        self._x = value

    letters = ["a", "b"]    # class variable
    @classmethod
    def get_letters(cls):
        return cls.letters

Base.get_letters()

class Derived(Base):
    def value(self):
        return super().x * 2
  • by convention variables starting with _ are meant to be private; and __ are “very” private
  • __repr__(self) representation for developers
  • __add__(self, other) overloading the + operator

Enums:

from enum import Enum

class Color(Enum):
    RED = 1
    BLUE = 2

Tuples and unions:

from typing import Union

Point = tuple[int, int]
Number = Union[int, float]

Iterators, Map, Filter, Reduce

map(function, data)
filter(predicate, data)

from functools import reduce
reduce(function, data)

Generics

from typing import TypeVar, Generic

T = TypeVar("T")

class Box(Generic[T]):
    def __init__(self, value: T):
        self.value = value

Expanding the Project

Standard library functions

function call complexity meaning
list.sort() | sorted(iter) $O(N \log N)$ sort a list | sorted copy
sorted(a, reverse=True) in the decreasing order
sorted(a, key=func) custom sorting
a, b = b, a $O(1)$ swaps elements
range(start, stop) $O(1)$ lazy sequence of integers
list.reverse() | reversed(a) $O(N)$ reverse a list | reversed copy
list[shift:] + list[:shift] $O(N)$ left shifts contents of the list
collections.deque.rotate(-shift) $O(\mathrm{shift})$ in-place rotation on deque
a, b, c = tup $O(|\mathrm{tup}|)$ tuple unpacking

Standard library data structures

structure operation returns complexity note
a: list[V] dynamically-sized array, fast element access
list(iter) list $O(|\mathrm{iter}|)$ initialize with references to the iterator elements
[default]*len list $O(\mathrm{len})$ initialize with references to the default
a[idx] V $O(1)$
a.append(V) None $O'(1)$ amortized
a.pop() V $O(1)$ removes and returns the last element
a[-1] V $O(1)$ access the last element
a.insert(i, V) None $O(N)$ insert at position
a.extend(iter) None $O(K)$ append K elements
s: set[V] collection of unique hashable keys, unordered
a.add(V) None $O(1)$
V in a bool $O(1)$
a.remove(V) None $O(1)$ KeyError if missing
a.discard(V) None $O(1)$ no error if missing
a | b set $O(N+M)$ union
a & b set $O(\min(N,M))$ intersection
a - b set $O(N)$ difference
frozenset immutable version of set, can be used as dict key or set element
dict key-value pairs, unique keys, insertion-ordered (Python 3.7+)
a[K] V $O(1)$ KeyError if missing
a.get(K, def) V $O(1)$ returns def if missing
a[K] = V None $O(1)$ insert or update
K in a bool $O(1)$
del a[K] -- $O(1)$ KeyError if missing
a.pop(K) V $O(1)$ remove and return
a.setdefault(K,V) V $O(1)$ get or insert default
collections.defaultdict(factory) dict subclass, auto-creates missing keys using factory (e.g. int, list)
collections.Counter dict subclass for counting hashable objects (like C++ multiset for counting)
Counter(iter) Counter $O(N)$ count elements
c.most_common(k) list $O(N \log k)$ top k elements
c.update(k) -- $O(k)$ add elements
list (as stack) last in first out
a.append(V) None $O(1)$ push
a[-1] V $O(1)$ peek top
a.pop() V $O(1)$ pop
collections.deque double-ended queue, fast access to both ends
a.appendleft(V) None $O(1)$
a[0] V $O(1)$
a.popleft() V $O(1)$
a.append(V) None $O(1)$
a[-1] V $O(1)$
a.pop() None $O(1)$
a.rotate(k) None $O(k)$ rotate right by k
also usable as queue (FIFO): append() to push, popleft() to pop
heapq (module) min-heap operations on a list, smallest element first
for max-heap, negate values or use tuples with negated priority
heapq.heappush(a, V) None $O(\log N)$
a[0] V $O(1)$ peek smallest
heapq.heappop(a) V $O(\log N)$
heapq.heapify(a) None $O(N)$ turn list into heap in-place
str immutable sequence of characters
s[start:] str $O({\rm len}-{\rm start})$ suffix of a string
s[start:start+count] str $O({\rm count})$ substring
s.find(sub) int $O(N \cdot M)$ index or -1
s.split(sep) list $O(N)$ split into parts
sep.join(parts) str $O(N)$ join parts
tuple immutable sequence, usable as dict key or set element
t[idx] V $O(1)$
a, b = t -- $O(N)$ unpacking
  • number of elements: len(a),
  • membership test: V in a,
  • copying: a.copy() or a[:] (shallow),
  • deep copy: copy.deepcopy(a),
  • iterating: for x in a, for i, x in enumerate(a).

Strings

name = "Alice"
f"Hello {name}"
"{} {}".format(a, b)
f"Hello {d['key']}"     # for dictionaries

.strip(), .title(), a, b = s.split(" "), s.replace("what", "with")

Regex:

import re
re.match(r"\\d+", text)
if matches := re.search(pattern, text):
    matches.group(1)        # to retrieve capture groups
result = re.sub("pattern", "replace with", text)
  • using r"pattern" says this is a raw string so that, e.g., \n is not interpreted as newline
  • . any
  • * 0 or more
  • + 1 or more
  • ? optional
  • {m} m repetitions
  • {m,n} m to n repetitions
  • (...) captured group, (?:...) non-captured group
  • a|b or
  • re.IGNORECASE, re.MULTILINE, re.DOTALL
  • re.split and re.findall

Modules and Packages

project/
├── package/
│   ├── __init__.py
│   └── module.py
└── main.py
from package.module import func

Any importing interprets the file, so any code gets executed. Libraries should only define the library functions. To fix this define the called code in if __name__ == '__main__': which is true only if the code is called directly.

External dependencies:

pip install requests

Tooling

  • Interpreter: python / python3
  • Package manager: pip
  • Virtual environments: venv
  • Linter: pylint, flake8
  • Formatter: black
  • Type checker: mypy
  • REPL: python

Language Advanced (Optional)

Concurrency

  • threading: shared memory, GIL-limited
  • multiprocessing: true parallelism
  • asyncio: cooperative async I/O
async def main():
    await task()

Interfacing with Other Languages

  • C extensions via CPython API
  • CFFI / ctypes
  • Embedding Python in C/C++

Other Resources


  • slices
  • res=sorted(arr)
  • first, second = myline.split(",") # fails if there are not 2
  • type(object) to find the type of an object
  • global var to have access to a global variable
  • constant variables are by a convention capitalized
  • for idx, val in enumerate(iterable):

Generators

We may return from a function using yield which creates a generator with this one value given.

Comprehension

List comprehension is [returned_value for value in iterable if predicate] where if predicate is optional. This has the same result as map(returned_value, filter(predicate, iterable)).

upper_cased_words = [word.upper() for word in words]

Dictionary comprehension is {key: value for value in iterable if predicate}.

File I/O

We open files with open, defining type of access as "r" to read, "w" to write, "a" to append.

with open("file.txt", "a") as myfile
    myfile.write("whatever")
myfile = open("file.txt", "r")
for line in myfile:
    print(line.rstrip())
myfile.close()      # manually closing the file

Testing

Using pytest we may invoke a file that includes our library and runs each function that starts with test_ or ends with _test to test everything works as intended using asserts. Pytest then gives more information on values of the failed asserts. May expect an error through with pytest.raises(ErrorType). We may run pytest test on director test that contains __init__.py (defines a package) then it runs all

Documentation

def myfun(n: int) -> str:
    """
    Docstring for my function in the restructured text format.

    :param n: Description of what the n parameter is.
    :type n: int
    :raise TypeError: When does this happen?
    :rtype: str
    """
    return str(n) + " was a number\n"

Libraries

  • sys.argv list for passed arguments, sys.exit("message")
  • random.choice([options ...]), randint(from, to), shuffle(arr) in place,
  • statistics.mean([values])
  • json.dumps("json_string")
  • requests.get, response.json()
  • csv.reader(file), csv.DictReader(file) when first line has field names, w=csv.vwriter(file) and w.writeline(line), csv.DictWriter(file, fieldnames=["col1", "col2"])
  • pillow for processing image files
  • argparse for parsing command line arguments

Resources