What Is Scalability?

shiflett on 2003-10-19T23:15:57

There is an interesting article on ONJava.com entitled The PHP Scalability Myth. The author describes scalability as follows:

There are a number of different aspects of scalability. It always starts with performance, which is what we will cover in this article. But it also covers issues such as code maintainability, fault tolerance, and the availability of programming staff.

Is this what scalability means? It's certainly not my definition. Do code maintainability, fault tolerance, and the availability of programming staff have something to do with scalability? They can if your definition of scalability takes human resources into account, which seems reasonable.

The definition I find on Dictionary.com describes scalability as:

How well a solution to some problem will work when the size of the problem increases.

This seems like a better definition. A textbook definition would be something to the effect of, "the ability to scale." This is probably a starting point that everyone can agree to. So why do some people argue that certain technologies (PHP, mod_perl, Java, etc.) don't scale? I have always assumed that these people define scalability as the ability for something to scale well and that they're using their own subjective opinions to define what scales well and what doesn't. This is where things go wrong. It also seems that more and more people use scalability as a measure of performance, when this is not the case either. Something that performs very poorly can still potentially scale very well. Scalability is a relative measurement.

Before I say more, I should describe what I think it means for something to scale well. Consider the following three figures:

http://shiflett.org/images/scalability_1.png
http://shiflett.org/images/scalability_2.png
http://shiflett.org/images/scalability_3.png

The first figure represents a case where the amount of required resources grows exponentially compared to the number of users. This is bad. In the second figure, the amount of required resources grows linearly. This is typical (the rate of growth can vary). In the third figure, the amount of required resources grows logarithmically. This is very nice. Of these three figures, my opinion is that both the second and third represent something that scales well. Because I am a Web developer, a growing number of users is typically how the "size of the problem increases" for me. The term "resources" refers to many things, but most people are concerned with cost. Things that cost money include hardware, software, human resources, and time.
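To make the three shapes concrete, here is a minimal Python sketch. The cost functions and their constants are purely illustrative assumptions, not measurements of any real technology. It asks one question of each curve: how many times more resources are needed when the number of users doubles?

```python
import math

# Hypothetical cost curves. The constants are arbitrary assumptions
# chosen only to show the shape of each kind of growth.
def exponential_cost(users):
    return 2 ** (users / 1000)

def linear_cost(users):
    return 0.01 * users

def logarithmic_cost(users):
    return 10 * math.log2(users)

def growth_factor(cost, users):
    """Ratio of resources needed at 2 * users to resources needed at users."""
    return cost(2 * users) / cost(users)

for name, cost in [
    ("exponential", exponential_cost),
    ("linear", linear_cost),
    ("logarithmic", logarithmic_cost),
]:
    print(f"{name}: doubling 10,000 users multiplies resources by "
          f"{growth_factor(cost, 10_000):.3f}")
```

With these assumed curves, doubling the users exactly doubles the linear cost, multiplies the exponential cost by far more than two, and raises the logarithmic cost only slightly. Note that the factor says nothing about the absolute cost at any given size, which is why something that performs poorly can still scale well.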

Lastly, let's look at an example. Consider two hypothetical technologies, Technology A and Technology B:

Resources required to build an application that supports 100,000 users a day:
Technology A: 10 servers, 5 developers, and 6 months of development time
Technology B: 40 servers, 10 developers, and 3 months of development time

Resources required to build an application that supports 250,000 users a day:
Technology A: 25 servers, 5 developers, and 9 months of development time
Technology B: 50 servers, 10 developers, and 6 months of development time

Which technology do you think scales better? Which appears to be the better choice when no more than 250,000 users a day need to be supported? Should things like maintainability and robustness be taken into consideration? How do you measure these things? If you are making decisions based on your assumptions about the scalability of certain technologies without asking these types of questions, you need to stop making such decisions.
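One rough way to start on the first question is to compare how each resource grows against the growth in users. The sketch below uses the figures from the example; the `Requirements` class, the `growth` helper, and the use of person-months (developers multiplied by months) as a measure of development effort are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    servers: int
    developers: int
    months: int

    @property
    def person_months(self):
        # Assumed effort measure: developers multiplied by months.
        return self.developers * self.months

# (requirements at 100,000 users a day, requirements at 250,000 users a day)
technologies = {
    "A": (Requirements(10, 5, 6), Requirements(25, 5, 9)),
    "B": (Requirements(40, 10, 3), Requirements(50, 10, 6)),
}

user_growth = 250_000 / 100_000

def growth(small, large):
    """Growth factor of each resource between the two sizes."""
    return {
        "servers": large.servers / small.servers,
        "person_months": large.person_months / small.person_months,
    }

print(f"users grow by a factor of {user_growth}")
for name, (small, large) in technologies.items():
    print(name, growth(small, large))
```

Under these assumptions, Technology A's servers grow in step with the users while Technology B's grow much more slowly, yet B's development effort grows faster than A's and B needs more servers at both sizes. Which one "scales better" depends on which resource you measure, and the answer still says nothing about which is the better choice.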


scalability

gav on 2003-10-20T00:48:42

I'm sure my feelings about PHP are the same as a lot of PHP programmers' about Perl: you can probably build a decent web app in PHP, but 95% of what I've seen is crap.

To build something scalable you need both people who know what they're doing and a toolset that doesn't hinder them. The first part of the equation is the most important one.

For example, earlier this year I "fixed" a Java web app by re-writing it in Perl. It ran on 3 dedicated servers - 2 app servers and a DB server. It took a whole bunch of DB queries to render each page, so I wrote something that output static pages (the data changes perhaps once or twice a month). Now it runs on a single shared webserver. This doesn't mean that Perl is more scalable than Java, just that if you don't design a scalable solution from the start you'll have to throw a lot of money at the problem.

Re:scalability

shiflett on 2003-10-20T02:43:49

Actually, I don't think the general opinion of Perl in the PHP community is as poor as the general opinion of PHP in the Perl community. Some of this is warranted, but of course, much of it is not. In general, the Perl community is more advanced as well as more mature (I'm talking about the age of the community, not personality). This, by itself, leads to better code. Yes, there is a lot of bad PHP code, probably a higher proportion than bad Perl code. However, the top PHP developers can solve the same problems as the top Perl developers, and the relative elegance of the solution is at least debatable. Also, with apache_hooks, PHP is able to do things that only mod_perl could do before. PHP doesn't suck as much as you might think. :-)

As far as scalability is concerned, I think my point is that it is only a measurement of one thing. And while that thing is vague (the relation between resources and the size of the problem), it is not more than that. The best solution isn't necessarily the one that scales the best, even when dealing with hypothetical situations where the scalability can be compared.

People Count!

chromatic on 2003-10-20T01:14:48

Things that cost money include hardware, software, human resources, and time.
Should things like maintainability and robustness be taken into consideration?

Yes. Smart people have been saying this for years. I develop better software faster in Perl than I do in Java. (Of course, I develop better software faster in Java than I do in C.)

Relative productivity of a language is important. I think Jack wasn't quite ready to say that, but it's important to realize that ops-per-second isn't the be-all end-all of scalability and value.

Re:People Count!

shiflett on 2003-10-20T02:50:11

Yes. Smart people have been saying this for years. I develop better software faster in Perl than I do in Java. (Of course, I develop better software faster in Java than I do in C.)

But when people attempt to compare the scalability of programming languages (which isn't a practice I am advocating), shouldn't they try to take an "all else being equal" perspective? I know that I should consider the skillset of my current (or at least potential) employees when choosing a solution, but if my goal is only to measure the scalability (whatever that is) of something, should I take such things into consideration?

Re:People Count!

chromatic on 2003-10-20T03:06:06

I don't see what value there'd be in such a comparison. I certainly don't work in an ideal world. The handwavy things that don't matter in purely academic comparisons have a way of actually mattering a great deal in practice.

Re:People Count!

shiflett on 2003-10-20T03:22:46

I don't see what value there'd be in such a comparison.

I said I wasn't advocating it. :-)

I certainly don't work in an ideal world. The handwavy things that don't matter in purely academic comparisons have a way of actually mattering a great deal in practice.

I agree completely, which I guess was part of my point. I generally dislike scalability being used as this paramount characteristic that determines whether a particular technology is "worthy."

Because the true meaning of scalability doesn't justify this practice, it seems like more and more people choose to warp the definition so that it does.

Imagine a manager asking, "Does foo scale?" What is the right answer? To me, this is no different than someone asking, "Do you have a temperature?" Well, of course I do.

linear is the best you're going to do

mary.poppins on 2003-10-31T11:00:28

I don't see how you're ever going to do better than linear, given that you have to do *something* for each user of your system, even if it's as simple as serving up a static page.