Monday, July 30, 2012

More PyPy discussions

I'm very glad that my co-founder at Continuum Analytics, Peter Wang, has published a follow-up blog post that I hope clarifies his perspective on the ongoing dialogue about CPython and PyPy.

Peter is a fundamentally good-natured person, and he is a lot of fun to be around, even when he is disagreeing with you. I'm very fortunate to work with him every day. He can be opinionated, but because he connects deeply with such a wide variety of subjects, you come away from a conversation with him having learned something, even if you remain unconvinced by his views.

Peter is also one of the smartest people I've ever met. One of my great memories is sitting at dinner with Peter and Eric Weinstein while those two great minds treated me, Wes McKinney, and Adam Klein to the most impressive display of metaphor ping-pong I've ever seen, covering topics from social justice to string theory. I could keep up with the conversation, but not well enough to participate meaningfully, and my two Ivy League-educated fellow diners were in the same boat.

I fundamentally agree with Peter that CPython-the-runtime is, and will remain, the centerpiece of the Python conversation. In fact, I would argue that CPython-the-runtime deserves even more focus. It is great to see improvements in Python 3.3, such as the completed memoryview implementation and the fixed internal string (Unicode) representation, but many other improvements could still be made.

It is wonderful and inspiring to see great developers think outside the box with novel projects like Jython, IronPython, and PyPy. Nonetheless, from my perspective we still have a long way to go to connect the average developer with the ideas of array-oriented computing, ideas that could really help with the continuing onslaught of parallel devices in search of software. As a result, it seems to me that those wanting Java, .NET, and machine-code integration would be better served by more attention to JPype, Python.NET, llvmpy, and even CorePy. Such efforts would also be better for the entire Python user base, especially for the majority of industrial uses of Python.

But whatever my own views, I'm encouraged by the enthusiasm of the PyPy developers, and I want to encourage dialogue. So I am very happy to report that NumFOCUS and Continuum Analytics recently joined forces to sponsor Maciej Fijalkowski on a small project to create an embedded version of PyPy: a "PyPy-in-a-Box." It integrates PyPy with the CPython runtime, so that you can speed up a particular CPython function by calling out to a library version of PyPy. This is proof-of-concept code and is not appropriate for production use, but it is a good example of what is possible when we all work together to promote the Python ecosystem.

The online project is here:  and you can get a binary version that works on 64-bit Linux here:

This approach needs more development before it is a viable tool in the CPython ecosystem, but one of my suggestions to the PyPy community is that they focus on "shedding-tools" like this one for the CPython world, so that everyone can benefit from their innovations. An integration effort like embedded PyPy also makes it possible to compare PyPy more directly with tools like Numba, another dynamic-compilation runtime, built on LLVM and llvmpy.

Numba has made a lot of progress in the last few months. I recently gave a talk on the project at the well-attended SciPy2012 conference in Austin, and my slides outlining and motivating the project are available online. An official release is imminent, but you can already use Numba to easily write significant Python code that operates on NumPy arrays and executes at "C speeds." But that is worth a blog post of its own....