Use connection pooling in Python Requests

Requests is a great library for making HTTP requests and handling responses.
As its description says, it supports connection pooling automatically:

Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.

Actually, the following snippet will not benefit from the connection pool provided by urllib3.

requests.get('http://developer.github.com/v3/')
requests.get('http://developer.github.com/v3/media/')

Requests uses urllib3.PoolManager in requests.adapters.HTTPAdapter, which is mounted in requests.Session.__init__. However, every module-level call (requests.get, requests.post, and so on) goes through

def request(method, url, **kwargs):
    session = sessions.Session()
    return session.request(method=method, url=url, **kwargs)

which creates a brand-new Session each time, so requests.request and friends gain no benefit from the connection pool. Instead, use requests.Session directly so that Requests can reuse connections through the urllib3 pool.

session = requests.Session()
session.get('http://developer.github.com/v3/')
session.get('http://developer.github.com/v3/media/')

I rewrote a benchmark_request.py based on urllib3’s benchmark.py.
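
That script is not reproduced here, but a minimal sketch of what it measures could look like the following (the function names mirror the result labels below; the URL list and repetition count are my own):

import timeit

import requests

URLS = ['http://developer.github.com/v3/',
        'http://developer.github.com/v3/media/'] * 10

def requests_get():
    # Module-level API: a new Session (and pool) is created per call,
    # so no connection is reused.
    for url in URLS:
        requests.get(url)

def requests_session_get():
    # One shared Session: keep-alive and connection pooling kick in.
    session = requests.Session()
    for url in URLS:
        session.get(url)

for func in (requests_session_get, requests_get):
    elapsed = timeit.timeit(func, number=1)
    print('Completed %s in %.3fs' % (func.__name__, elapsed))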

The result

Completed requests_session_get in 7.067s
Completed requests_get in 9.162s

concurrent.futures.ProcessPoolExecutor.map may be slow in some cases

A simple test using Python 3.3

Sample Code

from __future__ import print_function

from concurrent import futures
import math
import multiprocessing


def is_prime(num):
    # 0 and 1 are not prime; 2 is the only even prime.
    if num < 2:
        return False
    if num == 2:
        return True
    if num % 2 == 0:
        return False

    sqrt_num = int(math.floor(math.sqrt(num)))
    for i in range(3, sqrt_num + 1, 2):
        if num % i == 0:
            return False

    return True


def prime_worker(count):
    return sorted(num for num in range(count) if is_prime(num))


def future_prime_worker(count):
    with futures.ProcessPoolExecutor(4) as executor:
        numbers = range(count)
        return sorted(num for num, prime in
                      zip(numbers, executor.map(is_prime, numbers)) if prime)


def multiprocess_prime_worker(count):
    pool = multiprocessing.Pool(4)
    numbers = range(count)
    return sorted(num for num, prime in
                  zip(numbers, pool.map(is_prime, numbers)) if prime)


if __name__ == '__main__':
    import timeit
    t = timeit.timeit("prime_worker(200000)",
                      number=1,
                      setup="from __main__ import prime_worker")
    print(t)

    t = timeit.timeit("multiprocess_prime_worker(200000)",
                      number=1,
                      setup="from __main__ import multiprocess_prime_worker")
    print(t)

    t = timeit.timeit("future_prime_worker(200000)",
                      number=1,
                      setup="from __main__ import future_prime_worker")
    print(t)

Result

1.1414704178459942
0.7401300449855626
88.23592492006719
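
The gap is so large because, in Python 3.3, executor.map submits one task per argument, so the inter-process round trips dominate for a function as cheap as is_prime (see the reference below). A rough sketch of a manual chunking workaround, assuming the sample code above is in scope (the helper names are my own; later Python versions added a chunksize argument to executor.map for exactly this reason):

from concurrent import futures
import itertools

# is_prime() is the function defined in the sample code above.

def chunked(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def is_prime_chunk(nums):
    # One task (and one round trip) per chunk instead of per number.
    return [is_prime(n) for n in nums]

def chunked_future_prime_worker(count, chunksize=10000):
    with futures.ProcessPoolExecutor(4) as executor:
        numbers = range(count)
        flags = itertools.chain.from_iterable(
            executor.map(is_prime_chunk, chunked(numbers, chunksize)))
        return sorted(num for num, prime in zip(numbers, flags) if prime)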

References:
concurrent.futures.ProcessPoolExecutor.map() doesn’t batch function arguments by chunks

Using wheel for python deployment

I’ve been discussing a better deployment process with my colleagues to replace our existing RPM-based deployment.

There are several projects that provide binary packaging, virtualenv manipulation, and caching for Python deployment.

For example:

They are both great projects, but our requirements are fairly simple:

  • Install Python packages system-wide (not necessarily into a virtualenv)
  • Avoid recompiling binaries; compilation should happen only once, on a build machine
  • The deployment machines should not need gcc or other development tools installed
  • The deployment machines may have limited internet access

After some investigation, wheel looks like a perfect solution.

Usage:

# wheel needs pip 1.4+
pip install --upgrade pip==dev
pip install wheel
pip wheel --wheel-dir=/tmp/wheelhouse flask
pip install --use-wheel --no-index --find-links=/tmp/wheelhouse flask

It also handles C extensions well (we’re using MySQL-Python).
We can also copy /tmp/wheelhouse to the other deployment machines.

The result looks like

$ time pip install --pre --use-wheel --no-index --find-links=/tmp/wheelhouse flask
Ignoring indexes: https://pypi.python.org/simple/
Downloading/unpacking flask
Downloading/unpacking itsdangerous>=0.21 (from flask)
Downloading/unpacking Werkzeug>=0.7 (from flask)
Downloading/unpacking Jinja2>=2.4 (from flask)
Downloading/unpacking markupsafe (from Jinja2>=2.4->flask)
Installing collected packages: flask, itsdangerous, Werkzeug, Jinja2, markupsafe
Successfully installed flask itsdangerous Werkzeug Jinja2 markupsafe
Cleaning up...
pip install --pre --use-wheel --no-index flask 0.31s user 0.07s system 90% cpu 0.423 total

PEP 427 has been accepted, and upcoming pip releases will include wheel support.
I’m looking forward to seeing wheel used in our environment.

Use textwrap.dedent

If you’re writing a long string in Python, it breaks your indentation and ends up looking like this:

import string
import random

def generate_long_string(n):
    for _ in xrange(n):
        name = "".join([random.choice(string.letters) for i in xrange(15)])
        msg = """\
Hi %s,
Greeting

Hubert
""" % name
        yield msg

for msg in generate_long_string(3):
    print msg

You can use textwrap.dedent to solve it:

import string
import random

from textwrap import dedent

def generate_long_string(n):
    for _ in xrange(n):
        name = "".join([random.choice(string.letters) for i in xrange(15)])
        msg = """\
            Hi %s,
            Greeting

            Hubert
            """ % name
        yield dedent(msg)

for msg in generate_long_string(3):
    print msg

Using tox and Jenkins for a Python continuous integration environment

While browsing a few projects on GitHub, such as celery, I noticed that their repos contain a tox.ini. I happened to have a small Python project at hand, so I took the opportunity to find out what tox is.

Quoting the introduction from the tox website:

Tox is a generic virtualenv management and test command line tool you can use for:

  • checking your package installs correctly with different Python versions and interpreters
  • running your tests in each of the environments, configuring your test tool of choice
  • acting as a frontend to Continuous Integration servers, greatly reducing boilerplate and merging CI and shell-based testing.

The most useful feature for me is probably testing the differences between Python 2.6 and 2.7; I keep accidentally writing dictionary comprehensions
(actually, if you remember to install syntastic, it will remind you …)

Another benefit is that it runs everything in virtualenvs, so every test run gets an isolated environment. If a dependency is not declared properly, tox complains right away, instead of everything passing because the package happens to be installed in your development environment and the problem only showing up at deployment time.

I installed Python 2.6 and 2.7 here using pythonbrew:

pip install pythonbrew
pythonbrew_install
pythonbrew install 2.7.3 2.6.8

Another nice thing is that integrating it with Jenkins is not hard; just follow the instructions on the official site.

In the end, my tox.ini looks like this:

[tox]
envlist = py26, py27

[testenv]
commands = nosetests {posargs:--with-cov --cov-report=xml --with-xunit --cov package}
           flake8 --exit-zero package

deps = nose
       nose-cov
       coverage
       mock
       flake8

[testenv:py26]
basepython={homedir}/.pythonbrew/pythons/Python-2.6.8/bin/python

[testenv:py27]
basepython={homedir}/.pythonbrew/pythons/Python-2.7.3/bin/python

It also runs flake8 on your code, and with nose-cov it generates a Cobertura coverage report and xUnit test results; hooked up to Jenkins, it looks like this:

That said, if your project lives on GitHub, simply using travis-ci is probably much easier…

Filter is good

Java

import org.apache.commons.collections.*;
CollectionUtils.filter(list, new Predicate() {
  public boolean evaluate(Object o) {
    return !((String)o).isEmpty();
  }
});

PHP

$list = array_filter($list, function($s) {return !empty($s);});

Python

l = [x for x in l if x]

JavaScript

l = _.filter(l, function(s){return s.length;});

I wrote these from intuition; if any of them doesn’t run, I’ll deal with it later XD
This style is usually readable and simple, although some languages make you write it more verbosely…

Translating properties files with the Microsoft Translator API

Since the other solution costs money, let’s use the Microsoft Translator API instead.

#!/usr/bin/env python

# coding: utf-8

import urllib
import sys
import xml.dom.minidom

# Get Bing AppID from https://ssl.bing.com/webmaster/developers/appids.aspx
BING_APPID = 'KERO~'

FILE_FROM = 'lang.properties'
FILE_TO = 'translated.properties'

LANG_FROM = 'en'
LANG_TO = 'zh-chs'

def translate(text, from_lang, to_lang):
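    # Call the Microsoft Translator Translate HTTP endpoint and return the
    # translated text as a UTF-8 encoded string.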
    base_url = 'http://api.microsofttranslator.com/v2/Http.svc/Translate?'
    data = urllib.urlencode({'appId':BING_APPID,
                             'from': from_lang.encode('utf-8'),
                             'to': to_lang.encode('utf-8'),
                             'text': text.encode('utf-8')
                            })

    url = base_url + data
    response = urllib.urlopen(url).read()

    dom = xml.dom.minidom.parseString(response)
    result = dom.documentElement.childNodes[0].nodeValue

    return result.encode('utf-8')

def parse_properties(filename):
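    # Parse simple key=value lines from a .properties file into a dict;
    # lines without an '=' or without a key are skipped.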
    langs = {}
    with open(filename, 'r') as f:
        lines = [ line.strip() for line in f.readlines() ]
        for line in lines:
            idx = line.find('=')
            if idx == -1:
                continue

            (key, val) = (line[:idx], line[idx+1:])
            if not key:
                continue

            langs[key] = val

    return langs

def main():
    langs = parse_properties(FILE_FROM)
    translated = { k: translate(v, LANG_FROM, LANG_TO) if v else ""
                   for k,v in langs.items() }

    with open(FILE_TO, 'w+') as f:
        for k,v in sorted(translated.items()):
            f.write("%s=%s\n" % (k,v))

if __name__ == '__main__':
    main()

Just jotting this down. I haven’t written a post in a year; it may feel a bit forced, but it’s about time for a change.

Use VERSIONER_PYTHON_PREFER_32_BIT=yes in Snow Leopard

In Python, you can use ctypes (dl is deprecated now) to load a dynamic link library. On 10.4 and 10.5 this may work fine, but on 10.6 it can produce errors such as:

/Library/Frameworks/dummy.framework/dummy: no matching architecture in universal wrapper
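
For illustration, a load along these lines (a sketch reusing the example framework path from the error above) is what raises that error when the interpreter runs as 64-bit but the library only ships i386 code:

import ctypes

# Raises OSError with "no matching architecture in universal wrapper"
# when the library has no slice for the interpreter's architecture.
lib = ctypes.CDLL('/Library/Frameworks/dummy.framework/dummy')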

Since the Python shipped with Snow Leopard runs 64-bit by default, this affects libraries that do not include a 64-bit build. You can check your library with:

% file /Library/Frameworks/dummy.framework/Versions/Current/dummy

If your library is i386-only, you can work around it like this:

% export VERSIONER_PYTHON_PREFER_32_BIT=yes
% python your_script.py

ericsk has also mentioned this for wxPython on his Plurk.

For more information, you can read the python man page on your Snow Leopard machine.