My Web tests were working well, so I've been asked to run them and really pound our server. The tests take about 20 minutes to run, but I discovered that when I run them in a forked-off process, they take only about two and a half minutes. I suspect that's because suppressing the output to STDOUT is a performance boost, though I'm surprised that it's that much of a boost. I worry that I did something wrong.
Still, last night I ran my program and forked off 40 processes, each one logging in as a different user and pounding our site. With one test run taking two and a half minutes, 40 runs would take about 100 minutes if run sequentially. Since I'm forking off multiple processes, I expected it to take less time than that, but in reality it took 340 minutes. Since I've mostly done Web and database programming, I don't know much about forked code and I'm not comfortable with these results. It looks like I'm misunderstanding how things work. Time to do more research.
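To put a number on how far off that is, here is the arithmetic as a small Python sketch. The variable names are made up for illustration; the figures are the ones quoted above.

```python
# Figures taken from the post; the variable names are illustrative only.
single_run_minutes = 2.5      # one forked test run
process_count = 40            # forked processes
observed_minutes = 340        # wall-clock time for the whole 40-process run

# Naive estimate if the 40 runs happened one after another.
sequential_estimate = single_run_minutes * process_count

# How much slower the concurrent run was than that naive estimate.
slowdown = observed_minutes / sequential_estimate

print(f"sequential estimate: {sequential_estimate:.0f} minutes")
print(f"observed run was {slowdown:.1f}x the sequential estimate")
```

So the concurrent run took roughly 3.4 times as long as running the tests back to back would have been expected to take.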
Sometimes multiple processes can get more done. For example, if a single process spends a large portion of its time waiting for external events, then multiple processes can run at the same time without slowing each other down.
However, if the processes all compete for the same resource (e.g. they are CPU-bound, or they spend all of their time reading from different parts of the same disk), then they can interfere proportionally, so that n processes take about n times as long as one.
Finally, if their competition for the same resource is bad enough (for example, there are too many to fit into memory, or they cause sequential I/O to be turned into random I/O with lots of seeks), they can interfere more than proportionally and take longer to run together than it would take to run them all one at a time.
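The first case, where the work is mostly waiting, can be sketched in a few lines of Python. This is only an illustration under some assumptions: `time.sleep` stands in for waiting on an external event, threads stand in for forked processes, and the function names are hypothetical. It does not model the CPU-bound or thrashing cases.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_bound_task(seconds):
    """Stand-in for a test that spends its time waiting on something external."""
    time.sleep(seconds)


def run_sequentially(count, seconds):
    """Run the tasks one after another and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for _ in range(count):
        wait_bound_task(seconds)
    return time.perf_counter() - start


def run_concurrently(count, seconds):
    """Run all tasks at once and return the elapsed wall-clock time."""
    start = time.perf_counter()
    # Leaving the "with" block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=count) as pool:
        for _ in range(count):
            pool.submit(wait_bound_task, seconds)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequentially(5, 0.2):.2f}s")
    print(f"concurrent: {run_concurrently(5, 0.2):.2f}s")
```

If the tasks fought over a shared resource instead of sleeping, the concurrent time would move toward the sequential time, or past it.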
Re:multiple processes can be better or worse
iburrell on 2003-12-05T20:23:12
Another factor can be caching. If the processes are I/O-bound but are accessing the same data, then they can benefit from caching. The first process to access the data loads it into memory, and the others don't have to read it from disk. This happens with databases when the same queries are run against the same data.