Saturday, June 18, 2011

Python Enhancement Proposals I Wish I Had Time to Champion

Today I was trying to make progress on a few different NumPy enhancement proposals and ended up frustrated, knowing that come Monday morning I will not have any time to follow up on them. Managing a growing consulting company takes a lot of time (Enthought is over 30 people now and growing to nearly 50 by the end of the year). There are countless meetings devoted to new hires, program development, project reviews, customer relations, budgeting, and sales. I also take a direct role in delivering training and select consulting projects. Someday I may get a chance to write something of use about things I've learned along the way, but that is for another day (and likely another blog). This post is to get a few ideas I've been sitting on written down in the hope that somebody might read them and get excited about contributing. At the very least, anybody who reads this post will know (at least some of) my current opinion about a few technical proposals.

About a month ago, I had the privilege of organizing a "data-array" summit in which several people in the NumPy and SciPy community came together at the Enthought offices to discuss ideas for improving data analysis with the NumPy and SciPy stack. We spent three days thinking and brainstorming, which led to many fruitful discussions. I expect that some of the ideas generated will result in important and interesting changes to NumPy and SciPy over the coming months and years. You can learn more about the summit by listening to the relevant inSCIght podcast.

It's actually a very exciting time to get involved in the SciPy community as Python takes its place as one of the approaches people will be using to analyze all the data that we are generating. In that spirit, I wanted to describe a few of what I consider to be important enhancements needed in Python and NumPy.

I will start with Python and leave NumPy to another post. Here there are three big missing features that would really benefit those of us who use Python for technical computing. Unfortunately, I don't think there is enough representation of the Python for Science crowd in the current batch of Python developers. This is not due to any exclusion by the Python developers, who have always been very accommodating. It is simply due to the scarcity of people who both understand the SciPy perspective and use-cases and are willing to engage with developers in the Python world. Those (like Mark Dickinson) who cross the chasm are real gems.

If anyone has an interest in shepherding a PEP in any of the following directions, you will have my '+1' support (and any community-organizing that I can muster to help you).  Honestly, if these things were put into Python 3, there would be a serious motivation to move to Python 3 for the scientific community (which is otherwise going to lag in the great migration).

Python Enhancements I Want


Adding additional operators


We need additional operators to easily represent at least matrix multiplication, matrix power, and matrix solve. I could possibly back off the last two if we at least had matrix multiplication. This should have been done a long time ago. If I had been able to spare the time, I would have pushed to hold off porting NumPy to Python 3 until we got matrix multiplication operators. Yes, I know that blackmail usually backfires, and thankfully Pauli Virtanen and Charles Harris acted before I even had a chance to suggest such a thing :-). But, seriously, we need this support in the language.

The reasons are quite simple:
  • Syntax matters: writing d = numpy.linalg.solve(numpy.dot(numpy.dot(a,b),c), x) is a whole lot uglier than something like d = (a*b*c) \ x.  If the former is fine, then we should all just go back to writing LISP.  The point of having nice syntax is to minimize the line-noise and mental overhead of mapping the mental idea to working code.  For Python to be used with mental efficiency in technical computing, you need to be able to write expressions involving higher-order operations like this all the time.
  • Right now, the recommended way to do this is to convert a, b, c, and x to "matrices", perform the computation in a nice expression and then convert back to arrays.  This is clunky at best.
I've been back and forth on this for 13 years and can definitively say that we would be much better off in Python if we had a matrix multiplication operator.  Please, please, can we get one!  The relevant PEPs where this has been discussed are PEP 211 and PEP 225.  I think I like having more than just one operator added (à la PEP 225), but the subject would have to be re-visited by a brave soul.
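To make the readability argument concrete, here is a minimal sketch in plain Python. The `dot` helper and the `Mat` wrapper are hypothetical stand-ins written for this illustration (they are not NumPy's implementation); `Mat` only mimics the idea of a "matrix" type whose `*` means matrix multiplication. Matrix solve is left out to keep the sketch short.

```python
def dot(a, b):
    """Matrix product of two lists of rows (a plain-Python stand-in for numpy.dot)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class Mat:
    """Hypothetical wrapper type whose * operator means matrix multiplication."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __mul__(self, other):
        return Mat(dot(self.rows, other.rows))


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
c = [[2, 0], [0, 2]]

# Function-call style: the nesting order has to be read inside-out.
nested = dot(dot(a, b), c)

# Operator style: reads left to right, but only after wrapping every operand
# and unwrapping the result, which is the clunky part.
infix = (Mat(a) * Mat(b) * Mat(c)).rows

print(nested == infix)
```

Both spellings compute the same product; the difference is purely in how much conversion and nesting the reader has to wade through.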

Overloadable Boolean Operations


PEP 335 was a fantastic idea. I really wish we had the ability to overload and, or, and not. Among other things, this would allow the very nice syntax mask = 2<a<10 to generate an array of True and False values when a is an array. Currently, to generate this same mask you have to write (2<a)&(a<10). The PEP has other important use-cases as well. It would be excellent if this PEP were re-visited, championed, and put into Python 3.
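The limitation can be sketched with two small classes. `Vec` and `Mask` are hypothetical stand-ins for a numeric array and a boolean array (assumed here only for illustration, using Python 3's `__bool__`). A chained comparison expands to `(2 < a) and (a < 10)`, and since `and` cannot be overloaded it asks the intermediate result for a single truth value:

```python
class Mask:
    """Hypothetical stand-in for a boolean array."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Element-wise "and", spelled with the overloadable & operator.
        return Mask(x and y for x, y in zip(self.values, other.values))

    def __bool__(self):
        # A multi-element mask has no single truth value.
        raise ValueError("truth value of a multi-element mask is ambiguous")


class Vec:
    """Hypothetical stand-in for a 1-D numeric array."""

    def __init__(self, values):
        self.values = list(values)

    def __lt__(self, other):
        return Mask(x < other for x in self.values)

    def __gt__(self, other):
        # Also reached through reflection when the scalar is on the left: 2 < a.
        return Mask(x > other for x in self.values)


a = Vec([1, 3, 7, 12])

# The spelling that works today: explicit parentheses and &.
mask = (2 < a) & (a < 10)
print(mask.values)

# The chained spelling calls bool() on the intermediate Mask and fails.
try:
    2 < a < 10
    chained_ok = True
except ValueError:
    chained_ok = False
print(chained_ok)
```

With an overloadable `and`, the chained form could return the same element-wise mask instead of raising.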

Allowing slice object literals outside of []


Python's syntax allows construction of a slice object inside brackets so that one can write a[1:3], which is equivalent to a.__getitem__(slice(1,3)). Many times over the years, I have wanted to be able to specify a slice object using the syntax start:stop:step outside of the getitem. Even if Python's parser were extended only to accept the slice literal as an argument to a function, that would be an improvement. The biggest wart this would remove is the (ab)use of getitem to return new ranges and grids in NumPy (go use mgrid and r_ in NumPy to see what I mean). I would prefer that these were functions, but I would need mgrid(1:5, 1:5) to work.

There was a PEP for range literals (PEP 204) once upon a time.  There were some interesting aspects about that proposal, but frankly I don't want the slice syntax to produce ranges.  I would just be content for it always to produce slice objects --- just allow it outside of brackets.
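A short sketch of the current situation: the only way to get the colon syntax is to route it through `__getitem__`, which is the same trick NumPy's `s_` helper relies on. The `SliceMaker` class and the `axes` function below are hypothetical names made up for this illustration; `axes` stands in for a grid-style function that would like to accept slices as ordinary arguments.

```python
class SliceMaker:
    """Returns whatever the bracket syntax produced: a slice or a tuple of slices."""

    def __getitem__(self, key):
        return key


s_ = SliceMaker()


def axes(*slices):
    """Hypothetical function that takes slice objects as ordinary arguments."""
    return [list(range(s.start or 0, s.stop, s.step or 1)) for s in slices]


# Inside brackets, 1:3 becomes a slice object.
one = s_[1:3]
print(one)

# As function arguments, slices must be spelled out or smuggled through brackets;
# axes(1:5, 1:5:2) is a syntax error today.
grid_axes = axes(slice(1, 5), s_[1:5:2])
print(grid_axes)
```

If slice literals were allowed outside brackets, the last call could be written directly as `axes(1:5, 1:5:2)` with no helper object.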

While I started by lamenting my lack of time to implement NumPy enhancements, I will leave the discussion of NumPy enhancements I'm dreaming about to another post.   I would be thrilled if somebody took up the charge to push any of these Python enhancements in Python 3.   If Python 3 ends up with any of them, it would be a huge motivation to me to migrate to Python 3 entirely.

6 comments:

  1. ISTR discussing a module in c.l.py that produced a CustomOperator class overloading the __ror__ and __or__ operators; given this (and the necessary “magic” inside the code), you could:

    from custom_operator import matmult

    a= b |matmult| c

    matmult being an instance of CustomOperator.

    If one overloads other special methods as well closer to the real meaning of the custom operator, it's possible to take advantage of operator precedence:

    from custom_operator import matmult, matpow

    a= b *matmult* c ^matpow^ d

    A hack, I know :)

  2. Yep, found some link: http://groups.google.com/group/comp.lang.python/msg/fde158f60015ccc6

  3. Thanks for the reminder about this idea. I thought it was clever when I first saw it proposed.

    With Python 3's allowing of unicode literals, you might even convince me that using something like this with a special unicode literal (i.e. that looks like a dot) would be acceptable.

    Hmm... Now, we just need someone to propose an appropriate Unicode character to make a variable out of...

  4. You mean Unicode identifiers :)

    I would rather create a custom import mechanism that converts operator-enhanced Python (or whatever) to plain Python before importing a module. There would be drawbacks (e.g. inability to use custom operators directly in the standard Python interpreter unless a custom shell is created) and I am not sure whether this is yet/soon-will-be possible (I believe Brett Cannon would be the one to ask about this).

  5. Wouldn't it be very hard to disambiguate slice notation from dictionary creation?

    For example, is the following a set of slices or a dictionary?

    {1:3, 3:4}

    I have no opinion on the additional operators, but will take your word for it (and I have no objection). Overloadable boolean operators I would have found useful on occasion.

  6. Overloading boolean operators sounds like a bad idea. That's what & and | are for.

    But it would be awesome if you could just overload a < b < c. But I guess there's no reasonable way to do this without really changing the language semantics. Perhaps if there were a straightforward way to tell and and or to not short-circuit.

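For readers curious about the infix-operator hack mentioned in the first comment, here is a rough sketch of how such a class might work. It is a reconstruction under assumptions, not the code from the linked thread: the `CustomOperator` internals and the `_matmul` helper are made up for this illustration, and plain lists of rows stand in for arrays.

```python
class CustomOperator:
    """Wraps a two-argument function so it can be written as  x |op| y."""

    def __init__(self, func, left=None):
        self.func = func
        self.left = left

    def __ror__(self, left):
        # Handles "x | op": remember the left operand in a new wrapper.
        return CustomOperator(self.func, left)

    def __or__(self, right):
        # Handles "(x | op) | y": apply the function to both operands.
        return self.func(self.left, right)


def _matmul(a, b):
    """Matrix product of two lists of rows (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


matmult = CustomOperator(_matmul)

b = [[1, 2], [3, 4]]
c = [[0, 1], [1, 0]]

a = b |matmult| c
print(a)

# Chaining works because | is left-associative.
chained = b |matmult| c |matmult| c
print(chained)
```

One caveat: this relies on the left operand not handling `|` itself. A type that defines its own `__or__` may intercept the expression before `__ror__` is ever tried.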