The Unix schedular and the virtual memory system use conflicting algorithms and fight with each other. The schedular favors interactive processes that require CPU only in bursts. The virtual memory system favors programs that demand lots of RAM. The schedular punishes long running CPU pigs. The virtual memory system punishes interactive programs that only run now and then.
As a result, a Unix system fights with itself
and the result is a loaded system with 256 megs of RAM has as bad of interactive performance as a 20 year DEC 3100 with 8 megs of RAM - running the same version number of the same operating system.
The virtual memory system and its page file is
critcal to performance a system with too little memory, such as
a heavily loaded server or a workstation running
the Mozilla browser which leaks memory faster
than any other single Unix application I've
ever seen in my entire life except for the
RealMedia Server which is infamous in its
leakiness.
Let me give you an example.
The other day, I was at
pantsfactory.org, a
computer security blog an old friend of mine
helps run.
They had a link with the text Dowd IM'd me this one as a heads up... hope you're not using Firefox just because IE sucks so bad!. Mindlessly, I click on the link. My HD starts to buzz and grind. I stare blankly at my system for a few moments, not having had my morning coffee yet. The grinding continues. Slowly, in my head, I start to add two and two: I had just clicked a link to an exploit for the browser I'm running and all of my files are being deleted. The system had been grinding for 30 seconds, non-responsive. I double click the x-term I'd been using 30 seconds ago, and the window manager draws its border but the xterm doesn't repaint itself (not a fancy xterm - just the little tiny xterm that comes with X11). More time pases. I think about pulling the plug and yanking out the battery but I have a load of unsaved work, too! Finally, the xterm draws in, and I killall Mozilla and start looking at my files (the system grinds for a solid minute after I kill Mozilla as Linux reclaims the memory). Nothing seems to be gone. All of my files seem to be there. I use Dillo to look at the page that had caused the mess: it's perfectly innocent. Mozilla had simply been running too long and had built up a half a gigabyte of allocated VM. Opening that first link of the day pushed out the RAM for most of X, all of the other programs I was running, the program pages for bash, xterm, and everything else - except for Mozilla, parts of which it was feveriously throwing around trying to get them all to fit into a memory a quarter the size of the crap the program had allocated.
Bloatware like Mozilla uses a large number
of memory pages and uses them frequently, so, as
long as smaller applications aren't actively
running, they quickly get swapped out to disc.
Once an application like xterm+bash is issued
a command or asked to redraw, it gets CPU
because it has a higher priority (actually, a low priority is a high priority, but I'm using the logical sounding terms rather than the correct ones). xterm+bash get more and more of their pages swapped in, slowly, over time, as pages get swapped back out to disc because they haven't used them frequently because they're waiting for other pages. This is like filling a leaky bucket. Then bash+xterm begin to respond - a full minute later. Comparing the performance of NetBSD/current on a reasonably well equiped machine made two years ago versus one made 20 years ago, it takes about the same amount of time for the system to become responsive to interactive input after a memory bound application was allowed to run for a minute. In both cases, the footprint of the software to run (parts of X, xterm, and bash) take the same amount of memory - less than 8 megs but it takes over a minute of having parts swapped in and then back out because a third part wasn't swapped in and xterm+bash waited too long for it to swap in. NetBSD has a similar VM strategy to Linux.
.
Favoring interactive processes by the schedular doesn't make long running applications take longer to run - it simply makes interactive applications more responsive, VM permitting. However, if the virtual memory system were to favor interactive processes, the system would have to do a larger total amount of paging and long running applications would take longer. However however, disc bound applications quite frequently are run in error and giving undue amounts of VM resources to them only wastes time as the programmer or admin tries to kill them.
While the Unix machine just wants to run its
task as quickly as possibily, admins and programmers often want the system to just
stop, damn it!. Other applications and users are starved for CPU and service while we struggle
to kill the run-away process.
In general, the system would run more slowly
but more efficiently if interactive
applications were favored on the CPU.
There are two topics here:
1. Conflict in design between the schedular and VM system causes pathological behavior that greatly and unnecessarily increases swapping.
2. Superficial solutions exist, but to truly solve the problem, the system must be optimized for either batch or interactive processes, not an indecisive mix, and I argue for favoring interactive performance.
Yes, I'm really suggesting that the pig Mozilla sit there and struggle with 8 fewer megs while
lean and mean xterm+bash move quickly.
As for implementation, using a most-frequently-used algorithm rather than most-recently-used would require a little more CPU but would radically speed both interactive and batch tasks up as swapping is done more intelligently.
A small, lean application like xterm uses only
a few hundred pages during normal use, but
it uses those pages extremely frequently.
A large application like Mozilla uses millions
of pages, and it uses hundreds of thousands
of those pages fairly constantly, but it
uses those far less frequently than xterm+
bash use their hot path code pages.
Using the MMU, Unix like systems can tell
whether a page has been read. They can't
tell how many times the pages have been
accesses - merely whether they've been
accessed since the last time the counters
were reset. On some platforms, this is implemented
in hardware as bits in the processes page
tables. On other platforms, it is emulated
by the OS by marking the pages protected and
then catching the page fault when a read
is attempted.
Right now, most platforms will do a few passes like this. Two passes is enough to divide
all of the pages into two categories:
those that have been accessed in the last
time period (in microseconds) and those that
haven't. This is far too crude to make
intelligent decisions about something
so expensive such as which pages to move to
disc.
(Note to self: this discussion should be
moved up, perhaps before the story.)
Three passes would better indicate which
pages tend, with very approximate statistic
accuracy, to be used more often than others.
Four passes would improve the accurancy.
Five passes would be even better.
Allowing more than one pass implies that
a byte or more of overhead per page (4k or 8k
of physical RAM,
usually) would have to be allocated non-swappable
to store information about the
frequency of use of the page.
At this point, a decaying average of usage frequency might as well be used, and then
usage information as far back as one hundred
ticks would influence the frequency usage
score for each page.
Then, when the VM system swaps a page out,
it is swapping out a page that is statistically
the least likely be used again in the near future.
Intelligent comments, please. I'd like to make
this an Advogato article.
-scott
I run a single-user Squid instance on my workstations and disable the Mozilla and Firefox memory caches. That seems to trim their memory growth. Perhaps it will help you.