Python 3.6 &
Performance

A Love Story

I am...

Python 3.6

What's new?

PEP 498: f-strings

name = 'Igor'
country = 'Belarus'
f"Hi, {country}! I'm {name}"
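
Beyond plain names, f-strings accept arbitrary expressions, format specifications and conversions. A minimal sketch; the `visits` value is made up purely for illustration:

```python
name = 'Igor'
country = 'Belarus'
visits = 1234567.891  # hypothetical value, only for illustration

# Any expression is allowed inside the braces
greeting = f"Hi, {country.upper()}! I'm {name}"

# Format specifications follow a colon, as with str.format()
formatted = f"{visits:,.2f}"

# !r applies repr() to the value before formatting
quoted = f"{name!r}"
```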

PEP 525: Async generators

async def coro():
    for item in range(10):
        yield item


async def main():
    async for item in coro():
        ...

PEP 530: Async comprehensions

data = [item async for item in fetch_data()]
data_gen = (item async for item in fetch_data())
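
To try async comprehensions end to end, something like the following should work. Here `fetch_data()` is a hypothetical stand-in written as an async generator, and `asyncio.run()` assumes Python 3.7+ (on 3.6, use `loop.run_until_complete()` instead):

```python
import asyncio


async def fetch_data():
    # Hypothetical stand-in for the fetch_data() used above:
    # an async generator that yields three items
    for item in range(3):
        await asyncio.sleep(0)  # give control back to the event loop
        yield item


async def main():
    # Async list comprehension
    data = [item async for item in fetch_data()]
    # Async generator expression, consumed lazily
    data_gen = (item * 2 async for item in fetch_data())
    doubled = [item async for item in data_gen]
    return data, doubled


data, doubled = asyncio.run(main())
```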

PEP 526: Variable annotations

from typing import Any, Dict, List

data: List[Dict[str, Any]] = []

key: str  # No initial value

class Validator(object):

    data: Dict[str, Any] = {}
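
Annotations are stored in `__annotations__`, and an annotation without a value does not create the attribute. A small sketch that combines the two cases above in one class:

```python
from typing import Any, Dict


class Validator:
    data: Dict[str, Any] = {}
    key: str  # No initial value


annotations = Validator.__annotations__
has_data = hasattr(Validator, 'data')  # annotated and assigned
has_key = hasattr(Validator, 'key')    # annotated only, never assigned
```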

And more…

  • number = 1_000_000 aka underscores in numeric literals
  • New secrets module
  • os.PathLike aka file system path protocol
  • PYTHONMALLOC var & DTrace probing support
  • Simpler metaclasses & enhancements to descriptors
  • Local Time Disambiguation
  • Change Windows filesystem/console encoding to UTF-8
  • New dict implementation
  • And even more…
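
A few of these fit in a handful of lines. This is only a sketch; the `Workspace` class is hypothetical and exists just to show the path protocol:

```python
import os
import secrets

number = 1_000_000    # underscores are ignored by the parser
mask = 0b_1111_0000   # works for binary, octal and hex literals too

token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


class Workspace:
    # Hypothetical class implementing the file system path protocol
    def __init__(self, root):
        self.root = root

    def __fspath__(self):
        return self.root


# os.path functions accept any os.PathLike object since Python 3.6
path = os.path.join(Workspace('data'), 'results.json')
```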

Python & Performance

The Problem

  • Why switch to Python 3 if it is slower than Python 2?
  • Plus, there is PyPy

Acceptance

  • We understand that Python 3 is slow
  • We need a tool to compare Python 2 / Python 3 performance
  • We don't need to enforce benchmark-driven development

perf

  • perf.readthedocs.io
  • Toolkit to write, run, analyze & modify benchmarks
  • Store results in JSON
  • Has tools to display, compare, analyze and modify benchmark results
  • Includes statistical tools to analyze the distribution of results
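
The idea of collecting samples, summarizing them and storing them as JSON can be sketched with the standard library alone. This is not the perf API, just an illustration of the workflow; the benchmark name and statement are made up:

```python
import json
import statistics
import timeit


def bench(stmt, repeat=5, number=1000):
    # Take `repeat` samples of `number` loops each, keep seconds per loop
    raw = timeit.repeat(stmt, repeat=repeat, number=number)
    return [sample / number for sample in raw]


samples = bench("sorted(range(100))")
result = {
    "name": "sort_range",  # hypothetical benchmark name
    "samples": samples,
    "mean": statistics.mean(samples),
    "stdev": statistics.stdev(samples),
}
serialized = json.dumps(result)
```

A dedicated tool is still preferable for real measurements: hand-rolled timings like these are easily skewed by system noise.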

perf

$ python3 -m perf compare_to 2016-11-03_15-36-2.7-91f024fc9b3a.json.gz \
    2016-11-03_15-38-3.6-c4319c0d0131.json.gz -G --min-speed=5
Slower (40):
- python_startup: 7.74 ms +- 0.28 ms -> 26.9 ms +- 0.6 ms: 3.47x slower
- python_startup_no_site: 4.43 ms +- 0.08 ms -> 10.4 ms +- 0.4 ms: 2.36x slower
- ...

Faster (15):
- telco: 707 ms +- 22 ms -> 22.1 ms +- 0.4 ms: 32.04x faster
- unpickle_list: 15.0 us +- 0.3 us -> 7.86 us +- 0.16 us: 1.90x faster
- ...

Benchmark hidden because not significant (8): 2to3, dulwich_log,
nbody, pidigits, regex_dna, tornado_http, unpack_sequence, unpickle
Ignored benchmarks (3) of 2016-11-03_15-36-2.7-91f024fc9b3a.json:
hg_startup, pyflate, spambayes

python/performance

  • speed.python.org
  • Authoritative source of benchmarks for all Python implementations
  • Focus on real-world benchmarks, rather than synthetic benchmarks
  • pyperformance will run Student's two-tailed T test on the benchmark results at the 95% confidence level to indicate whether the observed difference is statistically significant
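
For intuition, a two-sample Student's t-test might look like the sketch below. It is not pyperformance's own code; the timings are invented, and the critical value 2.101 assumes two samples of 10 values each (18 degrees of freedom) at the 95% two-tailed level:

```python
import math
import statistics


def student_t(sample_a, sample_b):
    # Two-sample Student's t statistic with pooled variance
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(pooled * (1 / n_a + 1 / n_b))


def is_significant(sample_a, sample_b, critical=2.101):
    # 2.101: two-tailed 95% critical value for 18 degrees of freedom
    return abs(student_t(sample_a, sample_b)) > critical


# Made-up timings in milliseconds, not real benchmark results
py27 = [7.6, 7.9, 7.7, 7.8, 7.5, 7.9, 7.7, 7.6, 7.8, 7.9]
py36 = [26.5, 27.1, 26.9, 27.3, 26.8, 26.4, 27.0, 27.2, 26.7, 27.1]
```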

python/performance

Examples

  • django_template: use the Django template system to build a 150x150-cell HTML table
  • dulwich_log: Iterate on commits of the asyncio Git repository using the Dulwich module
  • hexiom: Solver of Hexiom board game (level 25 by default)
  • sqlalchemy_declarative: SQLAlchemy Declarative benchmark using SQLite
  • telco: Benchmark the decimal module
  • tornado_http: Benchmark HTTP server of the tornado module

Performance Trends

  • Python 2.7 is still the fastest Python
  • Python 3.5 is much faster than Python 3.4
  • Python 3.6 is faster than Python 3.5
  • Python 3.7 has some significant improvements already

Python 2.7 vs Python 3.5

Python 2.7 vs Python 3.6

Python 3.5 vs Python 3.6

Python 3.6
Performance Improvements

Asyncio Improvements

  • asyncio.Future now has an optimized C implementation
  • asyncio.Task now has an optimized C implementation

Python Future & Task   |   C Future & Py Task   |   C Future & C Task
      23K req/s        |       26K req/s        |      30K req/s
                       |     ~10-15% boost      |     ~15% boost

After these optimizations, uvloop also runs up to 5% faster

String & Bytes Improvements

  • ASCII decoder for surrogateescape, ignore and replace up to 60 times faster
  • ASCII & Latin-1 encoder for surrogateescape up to 3 times faster
  • UTF-8 decoder for error handlers up to 75 times faster
  • UTF-8 encoder for error handlers up to 15 times faster
  • bytes % args / bytearray % args up to 5 times faster
  • More optimizations for bytes.fromhex(), bytearray.fromhex() & bytes.replace(...), bytearray.replace(...) methods
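
The code paths in question are the ones exercised by calls like these. A short sketch with made-up input bytes:

```python
raw = b'caf\xc3\xa9 \xff'  # valid UTF-8 followed by one invalid byte

# The three error handlers named above
replaced = raw.decode('utf-8', 'replace')
ignored = raw.decode('utf-8', 'ignore')
escaped = raw.decode('utf-8', 'surrogateescape')

# surrogateescape round-trips the original bytes
restored = escaped.encode('utf-8', 'surrogateescape')

# printf-style formatting for bytes
header = b'%s: %d' % (b'Content-Length', 42)

payload = bytes.fromhex('deadbeef').replace(b'\xbe', b'\x00')
```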

Glob Improvements

  • Optimized glob() and iglob() functions; up to 6 times faster
  • Optimized globbing in pathlib; up to 4 times faster
  • Optimization gained by using os.scandir() introduced in Python 3.5
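
To see what glob builds on, here is a rough comparison of `glob.glob()` with a direct `os.scandir()` loop. It is only a sketch; the file names are arbitrary and live in a temporary directory:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ('a.txt', 'b.txt', 'c.log'):
        open(os.path.join(root, name), 'w').close()

    # High-level API
    via_glob = sorted(glob.glob(os.path.join(root, '*.txt')))

    # Rough equivalent using os.scandir() directly; entries carry
    # file type information, so no extra stat() call is usually needed
    with os.scandir(root) as entries:
        via_scandir = sorted(
            entry.path for entry in entries
            if entry.is_file() and entry.name.endswith('.txt')
        )
```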

New dict implementation

  • dict type now uses a "compact" representation
  • First implemented in PyPy
  • The memory usage of the new dict() is between 20% and 25% smaller compared to 3.5

New dict implementation

  • Preserving class attribute definition order
  • Preserving keyword argument order

data = {'a': 1, 'b': 2}
data['c'] = 3
assert str(data) == "{'a': 1, 'b': 2, 'c': 3}"
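
The two ordering guarantees from the bullets can be checked in the same way. A small sketch; `collect()` and `Settings` are hypothetical names used only for the demonstration:

```python
def collect(**kwargs):
    # kwargs keeps the order in which the caller passed the arguments
    return list(kwargs)


class Settings:
    # Hypothetical class, used only to show definition order
    zeta = 1
    alpha = 2
    middle = 3


kwarg_order = collect(zeta=1, alpha=2, middle=3)
attr_order = [name for name in vars(Settings) if not name.startswith('__')]
```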

New dict implementation

# t = number of slots in the hash table, n = number of stored entries
curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t
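
Plugging in numbers makes the saving concrete. This sketch assumes a 64-bit build (24 bytes per entry) and assumes the index width grows with the table size in steps of 1, 2, 4 and 8 bytes; the 8-slot, 3-entry table is just an example:

```python
ENTRY_SIZE = 24  # bytes per entry: hash, key pointer, value pointer


def index_size(slots):
    # Assumption: 1, 2, 4 or 8 bytes per index, depending on table size
    if slots <= 0xff:
        return 1
    if slots <= 0xffff:
        return 2
    if slots <= 0xffffffff:
        return 4
    return 8


def curr_size(t):
    # Old layout: every slot holds a full entry, used or not
    return ENTRY_SIZE * t


def new_size(t, n):
    # New layout: dense entries plus a small index per slot
    return ENTRY_SIZE * n + index_size(t) * t


old_bytes = curr_size(8)     # 8 slots
new_bytes = new_size(8, 3)   # 8 slots, 3 entries
```

With these assumptions the table shrinks from 192 to 80 bytes; the gain narrows as the table fills up.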

New dict implementation

data = {'one': 'один', 'two': 'два', 'three': 'три'}
# Before
entries = [['--', '--', '--'],
           [-8522787127447073495, 'two', 'два'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'one', 'один'],
           ['--', '--', '--'],
           [-6480567542315338377, 'three', 'три']]

New dict implementation

data = {'one': 'один', 'two': 'два', 'three': 'три'}
# After
indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'one', 'один'],
            [-8522787127447073495, 'two', 'два'],
            [-6480567542315338377, 'three', 'три']]
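
The same layout can be modelled in pure Python. This is a toy sketch, not CPython's implementation: it uses linear probing for simplicity and supports neither resizing nor deletion:

```python
class CompactDict:
    # Toy model of the indices/entries split

    def __init__(self, slots=8):
        self.indices = [None] * slots  # sparse table of entry positions
        self.entries = []              # dense [hash, key, value] rows

    def _probe(self, key):
        # Linear probing for simplicity; CPython probes differently
        slots = len(self.indices)
        start = hash(key) % slots
        for step in range(slots):
            slot = (start + step) % slots
            pos = self.indices[slot]
            if pos is None or self.entries[pos][1] == key:
                return slot
        raise RuntimeError('table is full')

    def __setitem__(self, key, value):
        slot = self._probe(key)
        pos = self.indices[slot]
        if pos is None:
            # New key: append to the dense array, remember its position
            self.indices[slot] = len(self.entries)
            self.entries.append([hash(key), key, value])
        else:
            self.entries[pos][2] = value

    def __getitem__(self, key):
        pos = self.indices[self._probe(key)]
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        # Insertion order falls out of the dense entries array
        return [entry[1] for entry in self.entries]


data = CompactDict()
data['one'] = 'один'
data['two'] = 'два'
data['three'] = 'три'
```

Because entries are only ever appended, iteration order matches insertion order, which is where the ordering behaviour of the new dict comes from.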

A few other things

  • Reduced overhead of passing keyword arguments compared to passing positional arguments
  • pickle.load() & pickle.loads() up to 10% faster on many small objects
  • Various implementation improvements in typing module
  • xml.etree.ElementTree parsing, iteration and deepcopy improvements
  • Creation of fractions.Fraction from floats and decimals up to 3 times faster
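
Two of these are easy to exercise directly. A brief sketch; the `points` list is made-up data standing in for "many small objects":

```python
import pickle
from decimal import Decimal
from fractions import Fraction

half = Fraction(0.5)              # exact: 0.5 is representable in binary
tenth = Fraction(Decimal('0.1'))  # Decimal keeps 0.1 exact

# Many small objects, the case the pickle optimization targets
points = [(x, str(x)) for x in range(1000)]
restored = pickle.loads(pickle.dumps(points))
```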

Conclusion?

Python 3.6
is faster than
Python 3.5

Python 3.7
will be faster than
Python 3.6

Questions?

Twitter: @playpausenstop
GitHub: @playpauseandstop