Behind the Words

Language is a form of expression, a way of conveying ideas.

When we were young we often heard our parents say, "it's all for your own good," even though in our hearts we could never fully understand what they meant by it. Over time, whenever we truly wanted something, standing before that high wall of "for your own good," what we chose was often rebellion, argument, or hiding what we really hoped for and pretending to comply.

Frustratingly, this way of communicating sometimes takes a long time before both sides understand each other's real intent, and sometimes the preconceptions and misunderstandings are never cleared up at all. People tend to fixate on the surface, the "attitude" and the "tone," and fail to see the substance of the matter.

Standing at either end of the scale, when I face such a conversation, perhaps I can do the following:

  • Patiently express my own views, and listen to the other person's
  • Set the emotionally charged words aside and carefully tease out what the other person really intends

When I want to say "it's for your own good," can I let the other person see my full intent, instead of arbitrarily making the decision for them?
When I want to say "I'm fine," can I first understand the other person's intent, and then properly state my own views?

Perhaps in the end there will be no conclusion and no common ground, but the process of understanding often matters more than the result.

Because it is the process itself that shapes the relationship from here on.
The outcome of a single event does not amount to a lasting victory.

My C++11 async practice

I’d like to rewrite the test from "concurrent.futures.ProcessPoolExecutor.map may be slow in some cases" using C++11 std::async and std::future.

It takes some time to remember C++ stuff, but it’s still fun to write C++ code 🙂

Sample Code

#include <iostream>
#include <cmath>
#include <future>
#include <queue>
#include <vector>
#include <tuple>
#include <chrono>
#include <iterator>
#include <memory>
#include <functional>

typedef uint32_t NUM_TYPE;

const size_t REPEAT_TIMES = 100;
const NUM_TYPE TEST_NUM = 200000;

const size_t MAX_WORKER = 4;

typedef std::tuple<NUM_TYPE, bool> RESULT_TYPE;
typedef std::priority_queue<NUM_TYPE, std::vector<NUM_TYPE>, std::greater<NUM_TYPE>> STORE_DATA_TYPE;

// Return (num, whether num is prime); 0 and 1 are not prime, and 2 is the only even prime.
RESULT_TYPE is_prime(NUM_TYPE num) {
  if (num < 2) {
    return RESULT_TYPE(num, false);
  }
  if (num % 2 == 0) {
    return RESULT_TYPE(num, num == 2);
  }

  NUM_TYPE sqrt_num = static_cast<NUM_TYPE>(std::floor(std::sqrt(num)));
  for (NUM_TYPE i = 3; i < sqrt_num + 1; i+=2) {
    if (num % i == 0) {
      return RESULT_TYPE(num, false);
    }
  }
  return RESULT_TYPE(num, true);
}


// Check a batch of numbers and return the primes in a FIFO queue.
std::queue<NUM_TYPE> is_prime_wrapper(std::vector<NUM_TYPE> nums) {
  std::queue<NUM_TYPE> result;
  for (auto & num : nums) {
    RESULT_TYPE prime = is_prime(num);
    if (std::get<1>(prime)) {
      result.push(std::get<0>(prime));
    }
  }
  return result;
}

// Split [0, num) across MAX_WORKER async tasks and merge the primes they find.
STORE_DATA_TYPE async_worker(NUM_TYPE num) {
  STORE_DATA_TYPE result;
  std::vector<NUM_TYPE> nums;

  for (NUM_TYPE i = 0; i < num; ++i) {
    nums.push_back(i);
  }

  // Each chunk holds num / MAX_WORKER candidates; the last chunk also takes the remainder.
  const size_t NUM_SIZE = num / MAX_WORKER;
  std::vector< std::future<std::queue<NUM_TYPE>> > futures;
  // Split the candidates into chunks and launch one asynchronous task per chunk.
  for (size_t i = 0; i < MAX_WORKER; ++i) {
    std::vector<NUM_TYPE> split_nums;
    if (i == MAX_WORKER - 1)  {
      split_nums = std::vector<NUM_TYPE>(std::begin(nums) + NUM_SIZE * i, std::end(nums));
    } else {
      split_nums = std::vector<NUM_TYPE>(std::begin(nums) + NUM_SIZE * i,
                                         std::begin(nums) + NUM_SIZE * (i+1));
    }
    futures.push_back(std::async(std::launch::async,
                                 is_prime_wrapper,
                                 std::move(split_nums)));
  }

  // Drain each worker's result queue into the min-heap (smallest prime on top).
  for (auto& worker : futures) {
    auto partial = worker.get();
    while (!partial.empty()) {
      result.push(partial.front());
      partial.pop();
    }
  }
  return result;
}

int main() {
  using std::chrono::high_resolution_clock;
  using std::chrono::milliseconds;

  milliseconds total_ms(0);
  for (size_t i = 0; i < REPEAT_TIMES; ++i) {
    auto t0 = high_resolution_clock::now();
    STORE_DATA_TYPE result = async_worker(TEST_NUM);

    auto t1 = high_resolution_clock::now();
    total_ms += std::chrono::duration_cast<milliseconds>(t1 - t0);
  }
  std::cout << "takes " << total_ms.count() / REPEAT_TIMES << " ms" << std::endl;
  return 0;
}
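
For reference, the sample needs C++11 and thread support enabled at compile time, e.g. something like g++ -std=c++11 -O2 -pthread (the exact flags are only an assumption about a typical Linux toolchain); std::async with std::launch::async spawns real threads under the hood.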

Result

takes 49 ms

concurrent.futures.ProcessPoolExecutor.map may be slow in some cases

A simple test using Python 3.3

Sample Code

from __future__ import print_function

from concurrent import futures
import math
import multiprocessing


def is_prime(num):
    # 0 and 1 are not prime; 2 is the only even prime.
    if num < 2:
        return False
    if num % 2 == 0:
        return num == 2

    sqrt_num = int(math.floor(math.sqrt(num)))
    for i in range(3, sqrt_num + 1, 2):
        if num % i == 0:
            return False

    return True


def prime_worker(count):
    return sorted(num for num in range(count) if is_prime(num))


def future_prime_worker(count):
    with futures.ProcessPoolExecutor(4) as executor:
        numbers = range(count)
        return sorted(num for num, prime in
                      zip(numbers, executor.map(is_prime, numbers)) if prime)


def multiprocess_prime_worker(count):
    # multiprocessing.Pool supports the context manager protocol since Python 3.3.
    with multiprocessing.Pool(4) as pool:
        numbers = range(count)
        return sorted(num for num, prime in
                      zip(numbers, pool.map(is_prime, numbers)) if prime)


if __name__ == '__main__':
    import timeit
    t = timeit.timeit("prime_worker(200000)",
                      number=1,
                      setup="from __main__ import prime_worker")
    print(t)

    t = timeit.timeit("multiprocess_prime_worker(200000)",
                      number=1,
                      setup="from __main__ import multiprocess_prime_worker")
    print(t)

    t = timeit.timeit("future_prime_worker(200000)",
                      number=1,
                      setup="from __main__ import future_prime_worker")
    print(t)

Result (in seconds: prime_worker, then multiprocess_prime_worker, then future_prime_worker)

1.1414704178459942
0.7401300449855626
88.23592492006719
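
The ProcessPoolExecutor version is slow because executor.map() submits one task per number, so every is_prime call pays its own pickling and inter-process round trip (see the reference below). As a rough, untested sketch of the usual workaround, the numbers can be batched into chunks before being handed to the executor; the chunk_size value and the helper names are my own, and the snippet assumes it is appended to the sample code above so that is_prime and futures are available.

def prime_chunk_worker(nums):
    # Check a whole chunk per task so the dispatch overhead is paid once
    # per chunk instead of once per number.
    return [num for num in nums if is_prime(num)]


def chunked_future_prime_worker(count, chunk_size=5000):
    numbers = list(range(count))
    chunks = [numbers[i:i + chunk_size] for i in range(0, count, chunk_size)]
    with futures.ProcessPoolExecutor(4) as executor:
        return sorted(num
                      for chunk in executor.map(prime_chunk_worker, chunks)
                      for num in chunk)

Newer Python versions (3.5 and later) added a chunksize parameter to ProcessPoolExecutor.map() for exactly this reason.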

Reference:
concurrent.futures.ProcessPoolExecutor.map() doesn’t batch function arguments by chunks

Apache HTTP Server 2.4 + PHP-FPM + mod_proxy_fcgi

You may use mod_fastcgi to integrate Apache with PHP-FPM.

Unfortunately, it does not seem to support Apache 2.4 at the moment.
Alternatively, you can use ByteInternet/libapache-mod-fastcgi.

Or you can use mod_proxy_fcgi instead; it depends on mod_proxy, so both modules need to be loaded.

Configuration for DocumentRoot

ProxyPassMatch ^/(.*\.php(/.*)?)$ fcgi://127.0.0.1:9000/path/to/your/documentroot/$1

For virtual hosts, add a matching proxy rule inside each VirtualHost, for example:

<VirtualHost *:80>
    DocumentRoot /path/to/your/documentroot/wordpress
    <LocationMatch ^(.*\.php)$>
        ProxyPass fcgi://127.0.0.1:9000/path/to/your/documentroot/wordpress
    </LocationMatch>
    ServerName somewhere.com
</VirtualHost>

Reference:
http://wiki.apache.org/httpd/PHP-FPM

Using wheel for python deployment

I’ve been discussing a better deployment process with my colleagues, one that would replace our existing RPM-based deployment.

There are several projects that provide binary packaging, virtualenv manipulation, and caching for Python deployment.

For example:

They are both great projects, but our requirements turn out to be fairly simple:

  • Install Python packages system-wide (not necessarily inside a virtualenv)
  • Avoid recompiling binaries; compilation should happen once, on the build machine
  • The deployment machines should not need gcc or other development tools installed
  • The deployment machines may have limited internet access

After some investigation, wheel seems like a perfect fit.

Usage:

# wheel needs pip 1.4+
pip install --upgrade pip==dev
pip install wheel
pip wheel --wheel-dir=/tmp/wheelhouse flask
pip install --use-wheel --no-index --find-links=/tmp/wheelhouse flask

It also handles C extensions nicely (we’re using MySQL-Python).
We can also copy /tmp/wheelhouse to the other deployment machines.

The result looks like this:

$ time pip install --pre --use-wheel --no-index --find-links=/tmp/wheelhouse flask
Ignoring indexes: https://pypi.python.org/simple/
Downloading/unpacking flask
Downloading/unpacking itsdangerous>=0.21 (from flask)
Downloading/unpacking Werkzeug>=0.7 (from flask)
Downloading/unpacking Jinja2>=2.4 (from flask)
Downloading/unpacking markupsafe (from Jinja2>=2.4->flask)
Installing collected packages: flask, itsdangerous, Werkzeug, Jinja2, markupsafe
Successfully installed flask itsdangerous Werkzeug Jinja2 markupsafe
Cleaning up...
pip install --pre --use-wheel --no-index flask 0.31s user 0.07s system 90% cpu 0.423 total

PEP 427 has been accepted, and upcoming pip releases will include wheel support.
I’m looking forward to seeing wheel used in our environment.

Use textwrap.dedent

If you’re writing a long string inside indented Python code, it breaks the indentation and ends up looking like this:

import string
import random

def generate_long_string(n):
    for _ in xrange(n):
        name = "".join([random.choice(string.letters) for i in xrange(15)])
        msg = """\
Hi %s,
Greeting

Hubert
""" % name
        yield msg

for msg in generate_long_string(3):
    print msg

You can use textwrap.dedent to solve this:

import string
import random

from textwrap import dedent

def generate_long_string(n):
    for _ in xrange(n):
        name = "".join([random.choice(string.letters) for i in xrange(15)])
        msg = """\
            Hi %s,
            Greeting

            Hubert
            """ % name
        yield dedent(msg)

for msg in generate_long_string(3):
    print msg
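
Note that dedent() only removes leading whitespace that is common to every line, so the embedded lines have to be indented consistently; lines containing nothing but whitespace are ignored when the common prefix is computed.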

Revenue Splits for Digital Music Downloads

Working the numbers this way, buying an album on iTunes gives the label or the artist the largest share; but among the local platforms, KKBOX and iNDIEVOX are both decent choices.

iNDIEVOX also adjusted its pricing strategy on July 1, and with the current pricing I would lean toward buying albums on iNDIEVOX.

Starting from July 1, 2013, iNDIEVOX will adjust its music prices. The pricing rules for "buy the whole album" change as follows:

  • For albums with 1 to 20 tracks: if the total exceeds NT$200, the album is priced at NT$200; if it is under NT$200, the album is priced according to the "album price table" at an amount less than or equal to the sum of the single-track prices.
  • For albums with 21 or more tracks: if the total exceeds NT$380, the album is priced at NT$380; if it is under NT$380, the album is priced according to the "album price table" at an amount less than or equal to the sum of the single-track prices.
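
Just to make the announced rule concrete, here is a tiny sketch of my own (not anything published by iNDIEVOX): it takes the album price to be the sum of the single-track prices, capped at NT$200 for albums with 1 to 20 tracks or NT$380 for 21 or more, and it ignores the finer details of the official album price table.

def album_price(track_prices):
    # Sum of the single-track prices, capped at NT$200 (1-20 tracks)
    # or NT$380 (21+ tracks), per the announcement above.
    cap = 200 if len(track_prices) <= 20 else 380
    return min(sum(track_prices), cap)

print(album_price([30] * 10))  # 200: ten NT$30 tracks exceed the cap
print(album_price([20] * 8))   # 160: below the cap, so the singles total applies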