I monitor Google's tech talks on Google Video and yesterday a very interesting one popped up: "Nooglers-and-the-pdb: Reactor". While I was watching the video I realised it was an internal Google talk for new Googlers that had been publicly posted by mistake, and it has now been withdrawn. Google Blogoscoped has a good summary of the talk. The blurb is:
Reactor is the backend that provides feed services for Google Reader, igoogle, and other applications. It provides access to the full history of feeds, with tagging and read state management. In this talk we will discuss the design of the reactor backend, including the recently-launched search feature.
The talk covers all sorts of internal Google projects and their code names. It includes the BigTable schema for Reactor, how the new search feature runs on a tree of 150 servers (holding 150 million documents) to spread network bandwidth, plus 40 machines serving 40 million fresh documents, and the fact that the team is three people on the backend and three people plus one intern on the frontend.
Reactor has two tables in BigTable: an items table and a streams table. Each row in the items table has an ID, which is a hash of the URL and other things, a further column, and a tag column family that holds the tags of each user who has read the item. The streams table has two kinds of streams: feed streams (keyed by the feed URL), each with a list of item IDs, and streams for users' read tags. The first page is expensive to generate:
"Any time you come to Reader it's doing tens if not hundreds of lookups of BigTable lookups in parallel to find all of your streams and the items in them".
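To make that concrete, here is a rough Python sketch of how I picture the two tables and the parallel fan-out for the first page. It is only an illustration under my own assumptions: plain dictionaries stand in for BigTable, and the key prefixes (`feed/`, `user/.../read`), the SHA-1 hash, and every function name are hypothetical, not anything shown in the talk.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

# Hypothetical in-memory stand-ins for the two tables; this is not the BigTable API.
items_table = {}    # item id -> {"url": ..., "tags": {user: set of tags}}
streams_table = {}  # stream key -> list of item ids


def item_id(url, extra=""):
    """Derive an item id by hashing the URL plus other data (hash choice assumed)."""
    return hashlib.sha1((url + extra).encode("utf-8")).hexdigest()


def add_item(feed_url, item_url):
    """Store an item and append its id to the feed stream keyed by the feed URL."""
    iid = item_id(item_url)
    items_table[iid] = {"url": item_url, "tags": {}}
    streams_table.setdefault("feed/" + feed_url, []).append(iid)
    return iid


def mark_read(user, iid):
    """Record a per-user 'read' tag on the item and in the user's read stream."""
    items_table[iid]["tags"].setdefault(user, set()).add("read")
    streams_table.setdefault("user/" + user + "/read", []).append(iid)


def first_page(user, feed_urls):
    """Look up every stream in parallel, then return the URLs of unread items."""
    keys = ["feed/" + u for u in feed_urls] + ["user/" + user + "/read"]
    with ThreadPoolExecutor(max_workers=8) as pool:
        # One lookup per stream; results come back in the same order as keys.
        results = list(pool.map(lambda k: streams_table.get(k, []), keys))
    read = set(results[-1])
    unread = [iid for ids in results[:-1] for iid in ids if iid not in read]
    return [items_table[iid]["url"] for iid in unread]
```

Even in this toy version, one visit costs a lookup per subscribed feed plus one for the read state, which is why a user with many subscriptions would trigger the tens or hundreds of parallel lookups mentioned in the quote.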
It was quite interesting. Google should make all these talks public ;-)