Web Services and Inefficiencies

ziggy on 2003-03-31T16:16:37

One of the main themes of Paul Graham's keynote at PyCon DC last week was harnessing inefficiencies to make programs simpler and more elegant. The simple, inefficient solution will be easier to create than the (prematurely) hand-optimized solution, and increases in computing power will pick up the slack. Expect that computing power will increase, and you can take advantage of that to build simpler, more inefficient solutions sooner.

Graham's theory is that a good programming language will be able to span the expanse between problems that need highly efficient implementations and problems that can tolerate massively inefficient implementations yet still perform acceptably. (In Graham's idealized universe, these differences are handled within Lisp: the difference between the easy-to-write, massively inefficient version and the massively efficient version is the difference between a Lisp interpreter and a Lisp compiler with some optional type declarations.)

Web Services are a real world example of this principle. I tend to oscillate between loving and hating web services. Today, I'm on an uptick. Here's a passage from an article on why .NET/Web Services are not well-tuned for a specific network-based application:

We all know that Web services are a wonderful technology for allowing systems to exchange information at the object level. However, to support this functionality, those systems must have a common language that defines their interfaces (e.g., XML) and a common transmission carrier (e.g., SOAP). Unfortunately, what XML and SOAP give developers in ease of use, they take away in their verbosity. Moving even small amounts of data between two systems using Web services can result in large, wordy text files that must be serialised, transmitted, received, and deserialised. These processes are time-consuming, processor-intensive, and highly dependent upon the size of the pipe between the endpoints.

During our performance testing of the Web application, it became clear that replacing the existing simple serialisation mechanism with one based on Web services would generate a substantial performance hit. When running in a local network environment, the difference was not very noticeable; it was taking only two or three minutes per connection vs. one to two minutes before. However, remote clients -- Windows CE applications -- were required to access the application. Our two or three minutes over the wire became 10 to 20 minutes over a digital wireless phone connection. In the end, we decided to use ASP.NET and rewrite the communications protocol to be more efficient, while keeping the simple serialisation mechanism instead of moving it to Web services.
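To put a number on the article's complaint, here's a rough back-of-the-envelope sketch in Perl: the same three-field record written out as a hand-rolled, SOAP-style envelope and as a packed binary record. The method name and fields are invented for illustration, and real toolkits add even more namespace and typing noise, so treat the ratio as indicative only.

use strict;
use warnings;

my ($id, $qty, $price) = (1234, 7, 19.99);

# The record as a hand-written, SOAP-style envelope (fictional method and fields).
my $soapish = <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddLineItem xmlns="urn:example-orders">
      <ItemId>$id</ItemId>
      <Quantity>$qty</Quantity>
      <UnitPrice>$price</UnitPrice>
    </AddLineItem>
  </soap:Body>
</soap:Envelope>
XML

# The same record as a packed binary structure (4 + 2 + 4 = 10 bytes).
my $packed = pack 'N n f', $id, $qty, $price;

printf "SOAP-ish: %d bytes, packed: %d bytes\n", length($soapish), length($packed);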

None of these tradeoffs are new. In fact, some people at PyCon described Graham's keynote as "remarkably content free".

Nevertheless, I do think that Graham is on to something. In an idealized world, something like SOAP should be the first tool used to connect to a remote process. Ideally, SOAP should just work and interoperate (the fact that it doesn't is a serious limitation). Although SOAP is inefficient, it is a (somewhat) standardized way to connect any two processes together. Once a working solution exists, the RPC mechanism can be replaced with something better-performing and domain-specific.
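For what it's worth, the first-pass appeal is real at the level of lines of code. A minimal sketch with Perl's SOAP::Lite -- the namespace, the endpoint and the order_total() method are all made up for illustration:

use SOAP::Lite;

# uri() names the (fictional) service namespace, proxy() the (fictional) endpoint.
my $service = SOAP::Lite
    ->uri('urn:Example/Orders')
    ->proxy('http://www.example.com/soap/orders.cgi');

# The remote method is called as if it were local; result() returns the
# deserialised reply.
my $total = $service->order_total(1234)->result;
print "order total: $total\n";

Everything about marshalling, envelopes and HTTP hides behind those few lines, which is exactly what makes it a plausible first draft.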

The fact that SOAP frequently fails to be simple, easy to use and not noticeably inefficient shows that it does not live up to some of Graham's principles. This could be because (1) SOAP is ahead of the processing-power curve, (2) SOAP is inefficient in the "wrong" way (i.e. it is not transparent), (3) there is no simple way to replace an inefficient SOAP implementation with a more optimized one, or (4) SOAP is simply the wrong abstraction upon which to build a system.

In conclusion, SOAP purports to provide some of the expected properties of a good, easy-to-use but inefficient abstraction. Some of the inefficiencies described in this project are to be expected, and rewriting an RPC layer is not an unexpected optimization. The biggest problem with SOAP, however, is that it is sold as the last word in messaging and RPC, not as the easy-to-use, inefficient first draft.


Inefficiency is often not significant

jmm on 2003-03-31T21:30:26

The simple, inefficient solution will be easier to create than the (prematurely) hand-optimized solution, and increases in computing power will pick up the slack.

Tom Christiansen has long been saying that, as a first approximation, Perl code is slower than the same C code by a factor of e. Sometimes it is close to or below 1, like when a regexp did most of the work using features that aren't in commonly available C regexp libraries [especially before PCRE came along], sometimes it is much more than e. Overall these extremes tend to balance each other. Using e as the measure emphasizes that the metric is somewhat irrationally based, that is, ad hoc.

In a talk on efficiency that we gave back when OSCON was called The Perl Conference, I noted that it is rare for a program to be close to the speed limit. Usually it is orders of magnitude off - either faster or slower. In those cases, switching from Perl to C for "efficiency" was worse than pointless: it was a waste of time and money.

Anyhow, it is often the case that the "increases in computing power" that are relevant are the ones that occurred a few years or a decade or more in the past.

Re:Inefficiency is often not significant

ziggy on 2003-03-31T22:04:00

Tom Christiansen has long been saying that, as a first approximation, Perl code is slower than the same C code by a factor of e.

Surely, and that idea isn't original to Tom, just as it isn't original to Paul Graham. And we all know that programmer productivity is more important than machine efficiency (except when they are in conflict).

The industry in general seems to have forgotten two important points. On the one hand, it's important to find these meaningful abstractions that can add an order-of-magnitude increase in productivity for a little inefficiency. VMs are like that, as is garbage collection. On the other hand, some abstractions are at the wrong level -- SOAP offers a smidgeon of productivity but costs too much in terms of inefficiency. (You could say similar things about multithreaded programming in some situations.)

I think Paul's point is that there are good inefficiencies and bad ones. COBOL is an example of a bad one -- programmers don't need exceedingly verbose syntax to craft programs in an "English-like" manner; they need an expressive syntax that exposes a small number of powerful primitives (lists, hashes, etc.).

Perl is an excellent example of both good and bad tradeoffs -- you can do a lot with lists, hashes, scalars and regexes, but the overhead to make your own classes is a smidgeon too high.
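To be concrete about that last smidgeon, here's the ceremony a hand-rolled Perl 5 class involves -- a minimal sketch with invented names, next to which a plain my %counter looks awfully cheap:

use strict;
use warnings;

package Counter;

sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;    # a blessed hashref is the whole "object system"
}

sub increment { my $self = shift; return ++$self->{count} }
sub count     { my $self = shift; return $self->{count}   }

package main;

my $counter = Counter->new(start => 5);
$counter->increment;
print $counter->count, "\n";    # prints 6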

Anyhow, it is often the case that the "increases in computing power" that are relevant are the ones that occurred a few years or a decade or more in the past.

I'm not as sure about that point.

Looking backwards, we can see how inefficiencies we demand today were impractical 10 years ago. People were saying that exact same thing 10 years ago and 10 years before that. There's a good chance that we'll be able to say it 10 to 20 years hence, although we may not be able to fathom which inefficiencies will become practical in that timeframe, just as VMs, garbage collection, JIT and compiler algorithms that were impractical a decade ago are practical today.

Re:Inefficiency is often not significant

brev on 2003-04-01T18:28:58

You've been studying SOAP longer than I have, but my one experience with it (ppm3, in 2001) leads me to run screaming in the opposite direction.

I know that the hype around SOAP is still fairly high and of course MS will push it as long as they have to, but it just looks fundamentally wrong, in addition to merely being inefficient.

1 - fundamentally wrong:

a) it tries to place metadata and data in the same payload. It's the one-layer model of networking. Except that it doesn't actually perform networking. Many important aspects of the transmission are lost by the time it's accessible to the program. So you are forced to break out of the SOAP world to peek at transmission metadata.

You must adapt your entire system to this concept or perish. Kiss goodbye to security.

b) but do you get to work at some superior, more abstract paradigm for all this effort and risk? No, you get... pure function calls. SOAP: Dragging every language down to COBOL's level. The lack of any standard way to create objects, or pass around object references, seems to me a critical failure, if they want SOAP to enable all sorts of next-generation communication paradigms.

2 - implemented wrong:

a) I'm not against XML if the situation requires it, but the genericity is pretty shallow. It starts to pay off only when you have a many-to-many interaction model where one source does not control client implementations. A model that forces me to always work under worst-case-scenario assumptions does not seem like a good solution.

b) The SOAP format itself is mind-bogglingly silly. Debuggability and transparency of the SOAP format are largely a myth for anything complex. Granted, one day there will be SOAP message analyzers just like network traffic analyzers, but exactly what was the point of the switchover?

c) the problems with all the various SOAP libraries out there. One day that will finally settle down, but as Keynes said, in the long run we are all dead.

d) the numerous extensions that people propose to SOAP seem like a measure of its failure. Even if it's only the failure to identify one problem and solve it well.

Re:Inefficiency is often not significant

ziggy on 2003-04-01T19:02:42

Don't get me wrong; SOAP is not without its serious faults. But it does do one thing well: it turns machine-to-machine IPC into a library-level issue. And it does so in a neutral manner -- no more bit twiddling to get two processes to work together.

I think there's a modicum of strength to that approach. And that's where all of the benefit lies. SOAP is great for getting two black boxes to communicate (modulo interoperability problems that shouldn't exist in an interoperability protocol). It's a great first-pass solution. It would be a better first-pass solution if it were more simply specified and easier to implement or build upon. All of the issues you raise are proof that it's an exceedingly poor general solution.

At the end of the day, I agree with you that SOAP is a damnable solution, and it is doomed in the long term.

The Lisp analogy that Paul Graham used is that you do not absolutely need numeric primitives in Lisp; it is possible to represent the number N as a list of N elements. It's not an ideal representation, but it is transparent and workable, and in an ideal world, switching Lisp implementations should transparently provide a more efficient representation (native integers). In this respect, SOAP is slightly better than an N-element list. The core problem is that we don't have a better solution to judge SOAP against, so it's unclear precisely how sucky SOAP is, because there's no reasonably transparent replacement that works a thousand times better. Yet.
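For the curious, the analogy is easy to make concrete -- a sketch in Perl rather than Lisp, but the point survives translation. These "numbers" work; they're just absurdly wasteful, and nothing in the interface stops an implementation from quietly swapping in native integers:

use strict;
use warnings;

# The number N is represented as a reference to a list of N elements.
sub to_unary   { my ($n) = @_; return [ (1) x $n ] }
sub from_unary { my ($u) = @_; return scalar @$u   }
sub u_add      { my ($a, $b) = @_; return [ @$a, @$b ] }    # addition is concatenation

my $sum = u_add(to_unary(2), to_unary(3));
print from_unary($sum), "\n";    # prints 5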

SOAP successors

brev on 2003-04-02T07:08:28

I've often wondered what the non-sucky SOAP would look like.

Ideally:
- does not require any new libraries or code for your app. Not even HTTP networking.
- allows the full gamut of REST-ish operations. Including remote object creation. And it should make passing around object references trivial.
- does not specify data format.

Is it too bizarre to contemplate a RESTfs?

$ cat order.xml > /dev/rest/widgets.com/create-purchase-order

Unix seems to prove any sort of communication can be done over /dev. If I can talk to my scanner like this, why not remote servers?

I admit this might not be easy to do on Windows.
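In the meantime, here's roughly what that one redirect has to spell out in userland -- a sketch with LWP::UserAgent, reusing the hypothetical widgets.com endpoint from the example above:

use strict;
use warnings;
use LWP::UserAgent;

# Slurp the purchase order...
open my $fh, '<', 'order.xml' or die "order.xml: $!";
my $order = do { local $/; <$fh> };

# ...and POST it -- the moral equivalent of
#   cat order.xml > /dev/rest/widgets.com/create-purchase-order
my $ua  = LWP::UserAgent->new;
my $res = $ua->post(
    'http://widgets.com/create-purchase-order',
    'Content-Type' => 'text/xml',
    Content        => $order,
);
die $res->status_line unless $res->is_success;
print $res->content;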
