Jeff Barr's Blog

Things I Like..

Building Sage (Open Source Math) on Amazon EC2

A quarter or two ago my son Andy took an unusual course at the University of Washington. In Math 480b: Programming for the Working Mathematician, Andy learned about a number of important topics, including the Unix command line and Python programming (classes, exceptions, and decorators). In the second half of the quarter the class learned about the Sage open source math system.

The course ended by teaching the students how to make a genuine contribution to Sage. They were asked to find an open bug, figure out how to fix it, fix it, and then create and submit a patch. In essence, they learned a very practical skill that is taught all too rarely in school: how to be a contributor to an open source project. This is pretty significant. Despite the presence of the word “open”, I have come to learn that many people don’t understand how the process actually works. Walking the students through it, and having them make an actual contribution, ensures that they leave school with this knowledge under their belts. With any luck it will be easier for them to find jobs, and they’ll be more useful and more productive once they start.

Exposing the students to the source code also conveys the message that they have the ability to modify (and hopefully improve) their tools. In the COM 546 class that I am taking this quarter, we discussed the fact that a lot of modern hardware cannot even be opened up (apparently, the newest iPhones are held together using screws that have a proprietary head). While this is certainly fine if you want to treat your phone as an appliance, the closed model doesn’t encourage customization or hacking. It certainly is not in accord with the Maker’s Bill of Rights.

I watched Andy put Sage through its paces and thought that it looked kind of cool. While I am no mathematician, I could definitely appreciate what it was able to do. Andy pointed me at a blog post written by his professor, and I was intrigued. After finding himself “fundamentally dependent on a closed source non-free program in order to continue my own research,” his professor (William Stein) decided (by coincidence, on the same day that he learned about the GPL) to design the language that would eventually become Sage. The full story is in the aforementioned blog post and is interesting (but far too long to summarize here).

Andy spent about four days getting the Sage code to build on his Acer netbook. At the time I didn’t even realize what a feat this was. For his final project, Andy and his team-mates used Sage to solve something called the Ham Sandwich problem. They documented their work here.

Sage is a very rich and very powerful tool. It includes a large number of powerful packages in the standard distribution, along with a smaller set of optional packages. It has a nice Ajax-powered notebook user interface (you can also access it from a shell prompt). Sage also runs a number of other math packages as child processes and presents a unified interface to them.

Earlier this month I was asked to teach a cloud computing workshop at Stanford University. I was told that the students would be from the Applied Math and Computer Science departments. Even though I knew that they used MATLAB, I thought that it would be fun to build Sage and to show them what an open source math tool looked like. Binary versions of Sage are readily available, but I wanted to see what it would take to build it from scratch.

I launched one of the more powerful EC2 instances (a High Memory Quadruple Extra Large) with the 64-bit Amazon Linux AMI. I logged in to the instance and installed a few packages based on the information in the Sage Installation Guide. I installed gcc, gcc-gfortran, gcc-c++, make, m4, perl, tar, readline, and readline-devel.

Then I downloaded the source distribution and captured some pre-build facts. The source tree contained 150 files and consumed about 350 MB of disk space. I also captured the system process count from /proc/stat.
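Pre-build facts like these can be gathered with a few lines of Python. What follows is a minimal sketch, not the commands actually used for this post. The helper names `tree_stats` and `processes_forked` are invented for illustration. The sketch assumes a Linux system, where the `processes` line of /proc/stat holds the number of processes created since boot.

```python
import os


def tree_stats(root):
    """Return (file_count, total_bytes) for the regular files under root.

    total_bytes sums apparent file sizes, so it will differ somewhat
    from the block-based figure that a tool like du reports.
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so targets are not counted twice
            count += 1
            total += os.path.getsize(path)
    return count, total


def processes_forked(stat_text):
    """Extract the cumulative process count from the text of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")
```

Calling `processes_forked` on the contents of /proc/stat before and after the build, and subtracting the two values, gives the number of processes created in between (including any unrelated activity on the machine).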

I set the MAKE environment variable to “-j8” since the EC2 instance had 8 cores and typed “time make”. I watched the build tree run configure again and again, and saw it compiling a ton of code in rapid-fire fashion. The load average never went above 4; there was just not that much parallel building to be done.
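A load average of 4 on an 8-core instance means that roughly half of the cores were busy. Here is a minimal sketch of that comparison in Python, assuming a Unix-like system where `os.getloadavg()` is available. The function name `load_per_core` is invented for illustration.

```python
import os


def load_per_core():
    """Return (1-minute load average, core count, load per core)."""
    one_minute, _five, _fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return one_minute, cores, one_minute / cores
```

With the figures from this build (a load of 4 on 8 cores), the ratio works out to 0.5.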

75 minutes later, the build was complete. The tree had grown to a staggering 2.5 GB and now contained 84,802 files! Even more interesting, the build process had used 397,484 processes.

I launched Sage from my shell, and typed a command to start the notebook user interface:

sudo ./sage -n port=80 interface='' require_login=False open_viewer=False

And that was it. I copied the EC2 instance’s Public DNS name, pasted it into my browser’s address bar, and opened up the administrator page. Here’s what it looked like after I created a couple of notebooks:

I opened up my notebook:

Then I did some simple calculations (my math skills are so rusty it is a wonder that I don’t have Tetanus):

I also drew a cool graph using some of the Sage sample code:

I have this fantasy of actually using Sage to re-learn math, starting from the very basics (Calculus or Algebra II), but there’s no time for that right now.

All in all, I find Sage to be more than impressive. The build process was remarkably clean for such a huge package, especially one that included code from so many different projects.

Just for fun, I also built Sage on an EC2 “small” instance. With MAKE set to “-j2”, this build took 343 minutes, or nearly six hours.
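To put the two build times side by side, here is a quick bit of arithmetic in Python. It uses only the figures reported in this post; the variable names are just for illustration.

```python
large_minutes = 75   # High Memory Quadruple Extra Large, MAKE set to -j8
small_minutes = 343  # small instance, MAKE set to -j2

hours, minutes = divmod(small_minutes, 60)
slowdown = small_minutes / large_minutes

print(f"small instance: {hours} h {minutes} min")  # small instance: 5 h 43 min
print(f"slowdown: {slowdown:.1f}x")                # slowdown: 4.6x
```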

I know that members of the Sage team are interested in building it on EC2 and I hope that they find this post. If you are on the team and would like to pick up where I left off, let’s talk. You can get access to Amazon EC2 through the AWS in Education program.