Tuesday, November 30, 2010

Zen of NumPy

While I was on-site working for a client, one of the developers I worked with would begin each day with a brief discussion of one of the tenets from the "Zen of Python." For those who are not familiar with this little pearl of Python goodness, you can find the "Zen of Python" as an Easter egg inside a Python distribution:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The Zen of Python is often quoted from one Python user to another to communicate something of the essence of what makes programming in Python different. While we were discussing one of the points, one of my co-workers suggested that there should be a "Zen of NumPy". This isn't the first time I've heard that suggestion. Actually, David Morrill (author of Traits) was the first person who suggested there should be a book about the "Zen of NumPy." I totally agree with him. The only problem is that everybody involved with NumPy has apparently been too busy to write one :-)

With this idea in mind, when it came time to give a talk on NumPy at the New York Python Meetup group in Manhattan, I decided to create a first draft of the Zen of NumPy. The phrases are included on one slide in the deck shared here.

I'm interested in feedback on these before proposing them for placement as numpy.this

Here is my attempt at a "Zen of NumPy":

Strided is better than scattered
Contiguous is better than strided
Descriptive is better than imperative (use data-types)
Array-oriented is often better than object-oriented
Broadcasting is a great idea -- use where possible
Vectorized is better than an explicit loop
Unless it’s complicated --- then use numexpr, weave, or Cython
Think in higher dimensions
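
As a rough illustration of a few of these lines (a sketch only, not taken from the slide), the snippet below touches on contiguous versus strided memory, describing data with a data-type, broadcasting, and replacing an explicit loop with a vectorized call. The array contents and the names `point_dtype`, `sum_of_squares_loop`, and `sum_of_squares_vectorized` are made up for the example.

```python
import numpy as np

# Contiguous vs. strided: a freshly created array is C-contiguous, while
# a stepped slice is a strided view onto the same memory (no copy is made).
a = np.arange(12, dtype=np.float64).reshape(3, 4)
every_other_col = a[:, ::2]
a_is_contiguous = a.flags['C_CONTIGUOUS']
view_is_contiguous = every_other_col.flags['C_CONTIGUOUS']
view_shares_memory = np.shares_memory(a, every_other_col)

# Descriptive (use data-types): a structured dtype describes the record
# layout once, instead of building one Python object per point.
# `point_dtype` is a hypothetical layout chosen for this example.
point_dtype = np.dtype([('x', np.float64), ('y', np.float64)])
points = np.zeros(3, dtype=point_dtype)
points['x'] = [1.0, 2.0, 3.0]

# Broadcasting: the column means have shape (4,) and are stretched
# across the three rows of `a` without writing a loop.
centered = a - a.mean(axis=0)

# Vectorized vs. an explicit loop: both functions compute the same sum
# of squares, but the second hands the loop to NumPy.
def sum_of_squares_loop(values):
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(values):
    return float(np.dot(values, values))
```

On small arrays the two `sum_of_squares_*` functions are interchangeable; the vectorized form tends to pay off as the array grows, which is where the later line about numexpr, weave, or Cython comes in for cases that do not vectorize cleanly.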

I think there are useful edits as well as more statements that could be added to this list. Your feedback is welcome.

6 comments:

  1. "Vectorized is better than an explicit loop"

    This is definitely worth inclusion. I first picked it up when I was learning MATLAB, and grokking it is really what made me more effective there and elsewhere.

    1. This is also true in other languages like R :)

      I also like "Array-oriented is often better than object-oriented": I learned that one the hard way during my Master's thesis, (not that) few (any more) years ago!

  2. Would you be willing to give this as a lightning talk at PyCon?

  3. Good idea for a lightning talk. Yes, I will try and do that at PyCon. The lightning talks fill up fast --- and there is a chance I may not be able to come this year.

  4. I feel that the best thing should come first:
    Contiguous is better than strided
    Strided is better than scattered

    so the best is first. Similar to
    Simple is better than complex.
    Complex is better than complicated.

  5. I would add:

    Take only what is needed
