The New World Of Pipelining

2shortplanks on 2002-02-25T18:03:22

So, today a few of us from the #openframe channel decided to meet up for lunch to discuss Pipelining and All That. Dim Sum rocks my world. Justin, if you're reading this, thanks for letting me take a long lunch!

Anyway, between mouthfuls, many aspects of pipelining were discussed. What's Pipeline.pm all about, then? What's the difference between a segment (aka a slot) and a pipeline? In our various models, how are we going to link them together? How do we dispatch? What is the store and what can it hold?

muttley talked about Tanker. james talked about openframe and Pipeline.pm. acme complained about getting old (happy birthday tomorrow acme) and blech demanded more pastries.

pdcawley talked about problems with multiple entry points. It turns out that in some applications you simply need to enter the pipeline at a different point, or you need some other mechanism that floats around with the session (which sits at a higher level than the current store) and allows for dynamic dispatch. This is needed for situations where you have multiple communication channels (e.g. multiple web servers, different IRC channels, etc.). I'm sure he'll write a better journal entry discussing this.

Of particular interest to me is the complexity of a store in a pipeline model (the store is the name for the "thing" that flows through your pipeline from segment to segment, carrying the data with it). james neatly sidesteps the issue by having an abstract store. I, being an awkward so-and-so, actually want to know how this is going to work ;-)

The essential problem as I see it is that you should never throw data away. I always want to be able to access my data later on down the pipeline. The idea is simple: every time you do something you add more data rather than taking data away. In the standard ->get() and ->set() store this means setting a new key for each item in the pipeline. Once your nice segment has done something to "foo", it shoves the result into "bar". OpenFrame apparently does this at the moment by having one of each object type in the store, so that when you move from an Apache::Request to your broken-up request (having extracted the data you want) you have it under another key.
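
For example, a segment in such a store might do something like this (a quick sketch only; $store, the key names and decode_request() are all made up for illustration, not OpenFrame's actual API):

    # read the raw request but leave it in the store untouched,
    # then put the decoded form under a fresh key of its own
    my $raw     = $store->get('apache_request');
    my $request = decode_request($raw);
    $store->set('request', $request);

Nothing gets overwritten, so a later segment can still reach back and grab 'apache_request' if it wants the raw thing.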

The problem boils down to this: when I insert a segment into an existing pipeline I want to do two things:

  • Preserve the existing data, so that if something later on down the pipeline needs it, it's still accessible
  • Transparently alter data as it passes through, so that later segments can use this segment's data while staying naive of its presence

Of course, you can see that these ideas are mutually exclusive. In order to transparently replace data (so that you can later act on it and are none the wiser that a segment in the middle has altered it) you have to do just that - and then you can't access the original data anymore, because you've replaced it! Arrrgh.

My proposed solution to this was that whenever ->set() is called, the store files the data under both the key and the name of the segment that set it. This way you can either request the generic version with just the key (in which case you'll get the latest version back), or you can ask for the data a particular named segment left under that key. Of course, this isn't a perfect solution (if you do this too much you might want to insert something transparently in the pipeline between those elements, and now you need a third key to disambiguate the data, and so on and so on), but it's a relatively simple solution that might be good enough.
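
To make that concrete, here's roughly what I mean (a toy sketch; the class name, method signatures and key names are mine, not anything that exists in Pipeline.pm or OpenFrame):

    package TwoLevelStore;

    use strict;
    use warnings;

    sub new { return bless { latest => {}, by_segment => {} }, shift }

    # set() records the value as the latest version of the key, and
    # also remembers it under the name of the segment that set it.
    sub set {
        my ($self, $key, $value, $segment) = @_;
        $self->{latest}{$key} = $value;
        $self->{by_segment}{$key}{$segment} = $value if defined $segment;
        return $value;
    }

    # get() with just a key returns the latest version; get() with a
    # key and a segment name returns whatever that segment left.
    sub get {
        my ($self, $key, $segment) = @_;
        return defined $segment
             ? $self->{by_segment}{$key}{$segment}
             : $self->{latest}{$key};
    }

    1;

So a naive caller says $store->get('request') and gets whatever the most recent segment produced, while a suspicious one can say $store->get('request', 'Decoder') and see exactly what the (hypothetical) Decoder segment put there.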

Of course, if someone can come up with a better solution, let me know...


What I'd like to know...

Matts on 2002-02-25T20:11:43

Is how Pipeline/Tanker differ from SAX pipelines/machines. I mean, apart from the fact that James doesn't grok SAX pipelines :-)

Re:What I'd like to know...

2shortplanks on 2002-02-25T21:17:58

I'm only really interacting at the annoying-question level atm (I am the thorn in other people's sides). I'm not that near the code, so you might want to ask muttley or james, but...

Tanker (from what I understand) is a wrapper framework that wraps around pipelines to add concurrency and other framework-level support to them. It's pretty much pipeline agnostic, so you can (in theory - it's all 'in theory' code) wrap it round any pipeline - even existing SAX ones.

Pipeline.pm is just a simple pipeline concept that allows you to connect segments together. The dispatch method is designed to be overridden in order to implement different dispatch techniques (plain Perl, POE, maybe even via SAX).
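
In spirit it's something like this (a toy sketch of my own, with made-up package and method names rather than the real Pipeline.pm interface):

    package Toy::Pipeline;

    use strict;
    use warnings;

    sub new {
        my ($class, @segments) = @_;
        return bless { segments => \@segments }, $class;
    }

    # the simplest possible dispatch: call each segment in turn,
    # handing the store along.  A POE- or SAX-driven pipeline would
    # override this method with its own dispatch strategy.
    sub dispatch {
        my ($self, $store) = @_;
        $_->dispatch($store) for @{ $self->{segments} };
        return $store;
    }

    1;

The point being that the segments don't care how they get called, so swapping the dispatch strategy shouldn't disturb them.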

The current idea is to steal coding concepts from this, that, and the other. The XML stuff is interesting. You're probably right that we don't fully grok SAX pipelines properly. Got any good references?

Re:What I'd like to know...

Matts on 2002-02-25T21:53:32

Kip Hampton's articles on XML.com are the best place to start. Just try and read them with an open mind, thinking all the time: "This stuff has applications outside of XML". Honest guv.

Unfortunately it's hard to think about SAX pipelines outside the context of XML, but really they are highly generic, and a very interesting solution space for a wide class of problems.

It's certainly fun to see people starting to really rave about pipelining solutions, when this really was one of the major bases for AxKit all those years ago :-)

Re:What I'd like to know...

2shortplanks on 2002-02-26T01:25:05

Of course it was the basis for AxKit all those years ago. I still remember the talk you gave at London.pm. Don't think people won't try and steal ideas from you too...

I'm sure you'll agree that it's probably not a good idea to shoehorn AxKit into being an IRC bot, for example, though ;-)

Re:What I'd like to know...

Matts on 2002-02-26T08:11:48

No, AxKit wouldn't make a good IRC bot, but AxKitNG would (the thing I've been calling AxKitB2B elsewhere).