Preface
1. Understanding Performant Python
The Fundamental Computer System
Computing Units
Memory Units
Communications Layers
Putting the Fundamental Elements Together
Idealized Computing Versus the Python Virtual Machine
So Why Use Python?
2. Profiling to Find Bottlenecks
Profiling Efficiently
Introducing the Julia Set
Calculating the Full Julia Set
Simple Approaches to Timing--print and a Decorator
Simple Timing Using the Unix time Command
Using the cProfile Module
Using runsnakerun to Visualize cProfile Output
Using line_profiler for Line-by-Line Measurements
Using memory_profiler to Diagnose Memory Usage
Inspecting Objects on the Heap with heapy
Using dowser for Live Graphing of Instantiated Variables
Using the dis Module to Examine CPython Bytecode
Different Approaches, Different Complexity
Unit Testing During Optimization to Maintain Correctness
No-op @profile Decorator
Strategies to Profile Your Code Successfully
Wrap-Up
3. Lists and Tuples
A More Efficient Search
Lists Versus Tuples
Lists as Dynamic Arrays
Tuples As Static Arrays
Wrap-Up
4. Dictionaries and Sets
How Do Dictionaries and Sets Work?
Inserting and Retrieving
Deletion
Resizing
Hash Functions and Entropy
Dictionaries and Namespaces
Wrap-Up
5. Iterators and Generators
Iterators for Infinite Series
Lazy Generator Evaluation
Wrap-Up
6. Matrix and Vector Computation
Introduction to the Problem
Aren't Python Lists Good Enough?
Problems with Allocating Too Much
Memory Fragmentation
Understanding perf
Making Decisions with perf's Output
Enter numpy
Applying numpy to the Diffusion Problem
Memory Allocations and In-Place Operations
Selective Optimizations: Finding What Needs to Be Fixed
numexpr: Making In-Place Operations Faster and Easier
A Cautionary Tale: Verify "Optimizations" (scipy)
Wrap-Up
7. Compiling to C
What Sort of Speed Gains Are Possible?
JIT Versus AOT Compilers
Why Does Type Information Help the Code Run Faster?
Using a C Compiler
Reviewing the Julia Set Example
Cvthon
Compiling a Pure-Python Version Using Cython
Cython Annotations to Analyze a Block of Code
Adding Some Type Annotations
Shed Skin
Building an Extension Module
The Cost of the Memory Copies
Cython and numpy
ParaUelizing the Solution with OpenMP on One Machine
Numba
Pythran
PyPy
Garbage Collection Differences
Running PyPy and Installing Modules
When to Use Each Technology
Other Upcoming Projects
A Note on Graphics Processing Units (GPUs)
A Wish for a Future Compiler Project
Foreign Function Interfaces
ctypes
cffi
f2py
CPython Module
Wrap-Up
8. Concurrency
Introduction to Asynchronous Programming
Serial Crawler
gevent
tornado
AsyncIO
Database Example
Wrap-Up
9. lhe multiprocessing Module
An Overview of the Multiprocessing Module
Estimating Pi Using the Monte Carlo Method
Estimating Pi Using Processes and Threads
Using Python Objects
Random Numbers in Parallel Systems
Using numpy
Finding Prime Numbers
Queues of Work
Verifying Primes Using Interprocess Communication
Serial Solution
Naive Pool Solution
A Less Naive Pool Solution
Using Manager.Value as a Flag
Using Redis as a Flag
Using RawValue as a Flag
Using mmap as a Flag
Using mmap as a Flag Redux
Sharing numpy Data with multiprocessing
Synchronizing File and Variable Access
File Locking
Locking a Value
Wrap-Up
10. Clusters and Job Queues
Benefits of Clustering
Drawbacks of Clustering
$462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
Skype's 24-Hour Global Outage
Common Cluster Designs
How to Start a Clustered Solution
Ways to Avoid Pain When Using Clusters
Three Clustering Solutions
Using the Parallel Python Module for Simple Local Clusters
Using IPython Parallel to Support Research
NSQ for Robust Production Clustering
Queues
Pub/sub
Distributed Prime Calculation
Other Clustering Tools to Look At
Wrap-Up
11. Using Less RAM
Objects for Primitives Are Expensive
The Array Module Stores Many Primitive Objects Cheaply
Understanding the RAM Used in a Collection
Bytes Versus Unicode
Efficiently Storing Lots of Text in RAM
Trying These Approaches on 8 Million Tokens
Tips for Using Less RAM
Probabilistic Data Structures
Very Approximate Counting with a 1-byte Morris Counter
K-Minimum Values
Bloom Filters
LogLog Counter
Real-World Example
12. Lessons from the Field
Adaptive Lab's Social Media Analytics (SOMA)
Python at Adaptive Lab
SoMA's Design
Our Development Methodology
Maintaining SoMA
Advice for Fellow Engineers
Making Deep Learning Fly with RadimRehurek.com
The Sweet Spot
Lessons in Optimizing
Wrap-Up
Large-Scale Productionized Machine Learning at Lyst.com
Pythons Place at Lyst
Cluster Design
Code Evolution in a Fast-Moving Start-Up
Building the Recommendation Engine
Reporting and Monitoring
Some Advice
Large-Scale Social Media Analysis at Smesh
Pythons Role at Smesh
The Platform
High Performance Real-Time String Matching
Reporting, Monitoring, Debugging, and Deployment
PyPy for Successful Web and Data Processing Systems
Prerequisites
The Database
The Web Application
OCR and Translation
Task Distribution and Workers
Conclusion
Task Queues at Lanyrd.com
Python's Role at Lanyrd
Making the Task Queue Performant
Reporting, Monitoring, Debugging, and Deployment
Advice to a Fellow Developer
Index