Some other interested readers pointed out that using numpy's RNGs in the pure Python version would sure improve the performance. Again I went back and tested it.
So without further ado, here is the new timings in the lame machine I am writing this in:
- Pure Python: 107.5 seconds
- Pure Python + Numpy: 106 seconds
- Pure Python + Numpy + storing the results in an array: 103.7 seconds
- Cython + standard library's random and math modules: 102 seconds
- Cython +Numpy: 93.26 seconds
- Cython + GSL RNGs: 5.3 seconds