Jeff Barr's Blog

Things I Like...

Web-Scaling Syndic8 – Introduction

One of the new terms we are using at Amazon is Web-Scale Computing. Using this model, businesses need not invest huge amounts of capital in infrastructure. Instead, they can use the Amazon Web Services on a pay-as-you-go basis. The business can start small, with minimal overhead. Perhaps it needs a little bit of storage (e.g. Amazon S3) and a server or two (e.g. Amazon EC2). As the business starts to take off, more storage and more servers are brought online when and as needed, and not a moment before. Instead of having to invest in anticipation of future needs that might or might not ever develop, costs rise in direct proportion to actual usage. With the right infrastructure in place, this can happen at a very fine-grained level. For example, I have talked to people who will run 5 EC2 instances during US business hours and then scale down to 1 or 2 on weekends and after hours.
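To make the "costs rise in direct proportion to actual usage" point concrete, here is a small back-of-the-envelope sketch that counts weekly instance-hours for the 5-instances-by-day, 1-or-2-after-hours pattern above. It assumes US business hours mean 9 hours a day, 5 days a week, and uses 2 as the off-peak count (the upper end of "1 or 2"). No prices are included; multiply by whatever per-instance-hour rate applies.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def weekly_instance_hours(peak: int, off_peak: int, business_hours: int) -> int:
    """Instance-hours used in one week when `peak` instances run during
    business hours and `off_peak` instances run the rest of the time."""
    if not 0 <= business_hours <= HOURS_PER_WEEK:
        raise ValueError("business_hours must be between 0 and 168")
    return peak * business_hours + off_peak * (HOURS_PER_WEEK - business_hours)

# Assumption (not from the post): business hours are 9 hours/day, 5 days/week.
BUSINESS_HOURS = 9 * 5  # 45

# Scaled schedule: 5 instances by day, 2 after hours and on weekends.
scaled = weekly_instance_hours(peak=5, off_peak=2, business_hours=BUSINESS_HOURS)
# Provisioning for the peak: 5 instances around the clock.
always_on = weekly_instance_hours(peak=5, off_peak=5, business_hours=BUSINESS_HOURS)

print(scaled, always_on)  # 471 840
print(f"{scaled / always_on:.0%} of the always-on instance-hours")  # 56%
```

Under these assumptions the scaled schedule uses a little over half the instance-hours of provisioning for the peak all week long; with only 1 instance off-peak the savings would be larger still.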

It's time for me to practice what I preach and to start web-scaling Syndic8. I don’t anticipate getting rid of any hardware in the near future, but I am pretty sure that I can use S3, EC2, and SQS to create some much-needed room to grow.

Although I am ostensibly doing some of this work as part of my job, it will be charged to my own Amazon account and I’m going to do a real economic analysis to make sure that I can do it on a cost-effective basis. I don’t have a funny-money account and I don’t have a rich uncle.

In the next couple of months I will be writing a series of blog posts to document the calculations and design decisions that I make as I modify the Syndic8 code and architecture to make it web-scale. I will take on the following aspects of the Syndic8 processing system:

  • Ping Processing – Syndic8 is a ping receiver; it handles 2 to 3 million pings per day and does some processing on each one.
  • Feed Polling – Syndic8 downloads the latest content from nearly 400,000 feeds every 24 hours.
  • XML Storage – Syndic8 stores two weeks of raw XML for each feed.
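The figures in this list translate into average rates and storage counts that are worth having in hand before any design work. The sketch below does that arithmetic. It assumes traffic is spread evenly over the day (real ping traffic is burstier, so peak rates will be higher), treats "nearly 400,000" as 400,000, and assumes each feed is polled once per day with every poll's XML retained for the full two weeks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day: float) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY

# Figures from the post.
PINGS_PER_DAY_LOW, PINGS_PER_DAY_HIGH = 2_000_000, 3_000_000
FEEDS = 400_000  # "nearly 400,000", so this is an upper bound
RETENTION_DAYS = 14  # two weeks of raw XML

# Assumption: each feed is polled once per day and every poll's XML is kept.
POLLS_PER_FEED_PER_DAY = 1

stored_documents = FEEDS * POLLS_PER_FEED_PER_DAY * RETENTION_DAYS

print(f"pings: {per_second(PINGS_PER_DAY_LOW):.1f}-{per_second(PINGS_PER_DAY_HIGH):.1f}/s")
print(f"polls: {per_second(FEEDS * POLLS_PER_FEED_PER_DAY):.1f}/s")
print(f"stored XML documents: {stored_documents:,}")
# pings: 23.1-34.7/s
# polls: 4.6/s
# stored XML documents: 5,600,000
```

Multiplying the document count by an average XML size (not given here) would give the total storage footprint for the XML Storage piece.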

I will build new, general-purpose code as needed for this project. I will, however, try to use existing PHP packages where possible.

Because I don’t have a test server, I will be making all of the code changes in a live, step-by-step fashion. I don’t anticipate the need to take Syndic8 offline and plan to rebuild the foundation code under the running system. This might seem tricky or even impossible, but it can be done with careful planning and sufficient attention to detail. In the past I have managed to change database schemas, file storage formats, and so forth without dumping and reloading huge amounts of data or running “alter table” commands on tables with millions of rows.

This is going to be fun!