PHP and Java in "scalability" catfight

perrin on 2004-07-03T23:26:25

There is a bit of a flap, which you may have seen on Slashdot recently, over Friendster (aka "The Devil") moving from Java to PHP. The whole thing is summarized nicely by Chris Shiflett here.

The depressing thing about discussions like this is how much misinformation gets spread around. Here are a few examples:

  • The guy on Slashdot claiming that Apache 1 scales badly (that must be why nearly all of the largest websites in the world use it) and that a piddly little site like aceshardware.com proves the necessity of Apache 2 and a threaded model.
  • The guy who misread the abstract of an ACM article and is now posting all over the place that Java outperformed Perl and PHP by 8 times on a benchmark in this article. (Totally wrong. The article says that static pages were 8 times as fast as dynamic pages, and that Java sometimes outperformed the others and sometimes was slower.)
  • The PHP proponent who thinks that George Schlossnagle's article about PHP and Oracle, where he says to use the equivalent of the mod_perl reverse proxy setup, shows some kind of application-specific thinking on the part of PHP developers. (It's just a standard workaround that anyone who uses PHP or mod_perl would use to keep from wasting database connections in Apache processes that are serving images. Java servlet architectures do not normally have this problem, and thus don't need this workaround.)
  • The people claiming that Yahoo does everything in PHP. PHP just replaced some in-house proprietary templating systems at Yahoo. Yahoo still uses C/C++ and Perl for the bulk of their applications, with PHP just serving as the final delivery engine, like a fancy SSI.
  • Even Rasmus Lerdorf with his mantra about how PHP is a "shared nothing" architecture sounds a bit disingenuous. Any interesting dynamic application is going to share something, probably through a database, with possibly some additional sharing in files or shared memory or a database-like daemon such as memcached. The mechanisms for sharing this data on a cluster of PHP servers are essentially the same as the ones for doing it with Java servers.

There is one thing about Java which can be a problem for scalability, and that is the proprietary nature of many of the server environments people run it on. If you buy a commercial J2EE server, the vendor will promise you incredibly scalable session data storage and database caching, but they will typically not tell you how they do it. When you run into trouble, you have no knobs to twiddle and no alternate options because the whole thing is closed to you. This is why I would recommend that people wanting to use a Java solution should look at the open source tools instead, especially the new lightweight ones.

The most unfortunate thing about this PHP/Java discussion is that people still aren't talking about the things that actually do make a system scalable, like ways of limiting the resources that each active user on the site takes up, and ways of handling overload gracefully.
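To make "handling overload gracefully" a bit more concrete, here is a minimal sketch of one common approach: cap the number of requests being worked on at once and turn the rest away immediately rather than letting them pile up. It is written in Python only for brevity, and the names (LoadShedder, max_active, handle) are hypothetical, not taken from any of the sites or servers discussed above.

```python
import threading


class LoadShedder:
    """Caps the number of requests handled at once and rejects the rest."""

    def __init__(self, max_active):
        # max_active is an assumed tuning knob: how many requests may be
        # in progress at the same time in this process.
        self._slots = threading.BoundedSemaphore(max_active)

    def handle(self, work):
        # Non-blocking acquire: if every slot is busy, shed the request
        # right away instead of queueing it and tying up more resources.
        if not self._slots.acquire(blocking=False):
            return 503, "Server busy, please retry shortly"
        try:
            return 200, work()
        finally:
            # Always free the slot, even if work() raised an exception.
            self._slots.release()
```

With max_active set to the number of requests the back end can really serve at once, extra visitors get a quick "busy" answer instead of a hung connection, and the users already being served are not slowed down by the ones who are not.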

Okay, I guess I had a little more to say about this than I thought. I'm going to stop here for now. Maybe some of this will filter into my OSCON talk.


Open Source, but still heavyweight

dws on 2004-07-04T06:04:35

"This is why I would recommend that people wanting to use a Java solution should look at the open source tools instead, especially the new lightweight ones."

Friendster was using Tomcat, which is open source, but is hardly lightweight.

The first shoe dropped on J2EE when people began shying away from container-managed persistence in favor of rolling their own. I wonder if we're seeing the second shoe drop.

Re:Open Source, but still heavyweight

lachoy on 2004-07-04T13:26:45

I think Perrin is referring more to the lightweight containers like Spring and (to a lesser extent) PicoContainer.

Spring is much easier to work with than J2EE and replaces a good deal of its useful functionality -- declaring transactions and security, creating components and dependencies -- and allows you to use Spring within J2EE for the other stuff like remoting, or when your project shalt use WebSphere or the like.

It's also interesting to note that one of the main motivations behind the lightweight containers is testability. EJBs are very difficult to test because you have to run them in-container, which makes the startup time too onerous to actually run the tests as many times as you should. I think poor testability is a really useful code smell -- if something is hard to test, you should probably stay away :-)
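The testability point can be sketched in a few lines. This is a language-neutral illustration of constructor injection written in Python, not Spring's actual API; SignupService and InMemoryUserStore are made-up names. The service is handed its dependency rather than looking it up from a container, so a test can pass in a cheap stand-in and run without starting any server.

```python
class InMemoryUserStore:
    """Test double standing in for a real database-backed store."""

    def __init__(self):
        self._users = {}

    def save(self, name, email):
        self._users[name] = email

    def find(self, name):
        return self._users.get(name)


class SignupService:
    """Business logic that only knows its store through find() and save()."""

    def __init__(self, store):
        # The dependency is injected by whoever constructs the service:
        # a container in production, plain code in a test.
        self._store = store

    def signup(self, name, email):
        # Refuse duplicate names; otherwise record the new user.
        if self._store.find(name) is not None:
            return False
        self._store.save(name, email)
        return True
```

A test just builds SignupService(InMemoryUserStore()) and calls signup() directly, which takes milliseconds rather than a container startup.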