OPML

gnat on 2002-04-15T22:38:33

OPML is Dave Winer's outline markup language. Mark Pilgrim is sub-impressed.

Aaron Straup Cope has created OTLML to address some of OPML's deficiencies. He's written some SAX toys (and again for OPML). (props to Sam Ruby for the pointers)

I still haven't worked out why outlines are da bomb. I should get off my butt and install Radio and play with outlines there, but somehow real work intervenes. I just haven't seen any description of what you can do with OPML that inspires me yet.

--Nat


OTLML sax filters

aaron of montreal on 2002-04-15T23:14:21

It is true that v 0.1 of the filter is hard-coded to munge OPML files, but the goal has always been to load a "sub" filter on the fly based on the /xref/@content-type attribute. Something along the lines of...
if (my $ctype = $data->{Attributes}->{'{}content-type'}{'Value'}) {

    my $pkg = join("::",__PACKAGE__,(split(/\//,$ctype)));

    eval "require $pkg";
    if ($@) { ... }

    my $filter = $pkg->new();
    # Do stuff here....
}
If I manage to find a pair of dastardly POD errors in Net::Blogger before the day is over, I plan to add the changes and upload v 0.2 to the CPAN.

Re:OTLML sax filters

ziggy on 2002-04-15T23:50:06

That code makes me rather uneasy from a security standpoint, since you're getting the module name by parsing the content-type. What happens when the value of content-type is something you don't expect to see, like application/x-malcode-worm or ../../Malcode/BadWorm (thus overriding your initial setting of __PACKAGE__)?

That code may not have a new method, but the BEGIN block and import code may be sufficient to do something you don't expect.

For safety's sake, you'd be better off checking whether $pkg is one of the modules you expect to find in your namespace before blindly require'ing it dynamically through eval. There are ways to check what's available dynamically on the system before loading that module.

This kind of technique isn't as easy to exploit as the recent SOAP::Lite server issue, but that's no reason to blindly require modules that are determined exclusively from your input.
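A minimal sketch of that kind of check, assuming a fixed table of supported content types. The content types and the My::Filter::* module names are hypothetical, made up for illustration, and are not taken from the actual filter:

```perl
use strict;
use warnings;

# Hypothetical allowlist: the content types this filter is willing to
# handle, mapped to the module that handles each one.
my %FILTER_FOR = (
    'text/x-opml'  => 'My::Filter::OPML',
    'text/x-otlml' => 'My::Filter::OTLML',
);

# Return the module name for a content type, or undef if it is not one
# we explicitly expect. The input is only ever used as a hash key.
sub filter_class_for {
    my ($ctype) = @_;
    return undef unless defined $ctype;
    return $FILTER_FOR{ lc $ctype };
}

# Load and instantiate the handler only after the lookup succeeds.
sub load_filter {
    my ($ctype) = @_;
    my $pkg = filter_class_for($ctype)
        or die "Unsupported content-type\n";
    (my $file = "$pkg.pm") =~ s{::}{/}g;
    require $file;    # file name comes from the table, never from the input
    return $pkg->new();
}
```

Because the module name is looked up rather than built from the attribute value, something like ../../Malcode/BadWorm never reaches require at all; it simply fails the lookup.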

Re:OTLML sax filters

aaron of montreal on 2002-04-16T00:00:48

Yup. Yup. And yup again.

The code I included was of the "thinking out loud" variety, but I will make sure to try and add appropriate hooks to check for bad-ness. Thanks!

Outliners

ziggy on 2002-04-16T00:05:30

I still haven't worked out why outlines are da bomb.

Reason #1: Dave has been writing outliners for 20 years. If outliners weren't da bomb, why would he have wasted so much of his life writing them (and made so much money selling them)?

Reason #2: Microsoft grokked Outliners. Look at Word. Look at their market cap and their market share. Apple didn't grok Outliners. Look at MacWrite II. Look at their market cap and their market share. Any questions? (aside from the obvious one: Why won't Apple listen to reason and common sense!)

Reason #3: Most structures are hierarchical, can be made to fit into a hierarchical format, or be gently eased into one. A list, for example, is a flat hierarchy.

Reason #4: Outlines, like XML, can encode just about everything, or come as close as any generalized data structure can to encoding everything.

I still think that lists (check lists, todo lists, contact lists, RSS summaries, etc.) are really where it's at. That's one reason why Palm Pilots have succeeded so very well, even though the stock software doesn't include an outliner. You could argue that an outline is nothing more than a hierarchical grouping of lists, but that makes the metaphor so much more complex; I like cycling through a series of lists much more than I like digging deeply into an outline. Lists also feel much more malleable and filterable than an outline does.

But that's just me. Dave probably works much better with an outline than a series of lists, and that's fine.

Re:Outliners

autarch on 2002-04-16T00:16:22

Reason #3: Most structures are hierarchical, can be made to fit into a hierarchical format, or be gently eased into one. A list, for example, is a flat hierarchy.

This is obviously it. Look at the huge success enjoyed by hierarchical databases like H-Oracle, Hibase, and MyHQL. Sure, it seemed threatened by that crazy "relational model" stuff a while back, but the hierarchical model, which maps so well onto most real world processes and data, was never really in danger.

Re:Outliners

ziggy on 2002-04-16T00:38:37

Look at the huge success enjoyed by hierarchical databases like H-Oracle, Hibase, and MyHQL.

Once you start talking about FOREIGN KEYS and entity-relationship diagrams, the hierarchical nature of data in a relational database starts to show through.

It's just that the hierarchical model wasn't really easy to describe, generalize or optimize 30 years ago. Today the story is vastly different: witness the awe and power of native XML databases like...ah...er...um...well...never mind. :-)

Seriously though, it's a SMOP to convert between the flat and hierarchical. (Some vendors would look at their development budgets and argue on the "simple" part. :-) What really matters now is: which technique is more practical, and which technique is more natural?

Re:Outliners

autarch on 2002-04-16T04:53:02

Flat => Hierarchical is definitely not a big deal.

Hierarchical => Relational is also relatively easy.

Relational => Hierarchical is asking for a mess.
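A rough sketch of the two easy directions, assuming an outline stored as nested hashes and a relational form with one [id, parent_id, label] row per node. The data and function names are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical outline: each node is { label => ..., children => [...] }.
my $outline = {
    label    => 'Projects',
    children => [
        { label => 'Perl', children => [ { label => 'SAX filters', children => [] } ] },
        { label => 'Docs', children => [] },
    ],
};

# Hierarchical => relational: one row per node, [id, parent_id, label].
# Ids are assigned in depth-first order; the root has an undef parent_id.
sub tree_to_rows {
    my ($node, $parent_id, $rows) = @_;
    $rows ||= [];
    my $id = @$rows + 1;
    push @$rows, [ $id, $parent_id, $node->{label} ];
    tree_to_rows($_, $id, $rows) for @{ $node->{children} };
    return $rows;
}

# Flat rows => hierarchical: rebuild the nesting from parent_id.
sub rows_to_tree {
    my ($rows) = @_;
    my (%node, $root);
    $node{ $_->[0] } = { label => $_->[2], children => [] } for @$rows;
    for my $row (@$rows) {
        my ($id, $parent_id) = @$row;
        if (defined $parent_id) { push @{ $node{$parent_id}{children} }, $node{$id} }
        else                    { $root = $node{$id} }
    }
    return $root;
}

my $rows = tree_to_rows($outline);
my $tree = rows_to_tree($rows);
```

The round trip only works because every row has exactly one parent. Once a row can belong to several parents at the same time, there is no single tree to rebuild, which is where the mess starts.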

Re:Outliners

darobin on 2002-04-16T01:09:48

The part that you're not taking into account is that this is mostly for human data. Most people do not want their data to be relational (well, at least not normalized), because it would put the burden on them to look things up. Imagine trying to look up an address for Bob Foo and getting back "42, StreetID=2573, CityID=09708, CountryID=25". It wouldn't be very useful.

Relational databases are made to be customized by people that create schemata *. Hierarchical is just one approach that happens to work well with people and well for computers with manageable data sizes. It's good for what it does, which is why those outlines have been so successful (though I agree with Nat, I tried them and I'm deeply unimpressed because I need much more. But then, it's Dave Winer stuff after all).

As for the hierarchical model (at least within the realm of semi-predictable hierarchies such as XML) I wouldn't speak too fast nowadays. Echoes I've had from people on the XQuery WG would seem to show that computer science has progressed sufficiently in the past decades that today we can be extremely efficient with hierarchies. At least, the Oracle people seem to think that XQuery can beat SQL... and with the space XML has taken in recent times...

* hehehehe

Re:Outliners

autarch on 2002-04-16T04:51:22

I'm not concerned with the _speed_ of hierarchical databases, I'm concerned with the flexibility.

It is a fact that an hierarchical structure can be expressed in the relational model. I do not think that the reverse is true.

I fail to see the purpose of an XML database.

I defy you to come up with a useful XML database schema (whatever that would be! a DTD?) that can easily handle, for the sake of argument, information about films.

This would be quite difficult given the fact that this sort of data is in no way hierarchical!

People work as cast and/or crew on films, doing certain jobs, sometimes performing a role. Companies may distribute and/or film and/or produce a film.

Now find me all the films made between 1970 and 1974 where John Doe worked as an actor or director.

With a relational database this is _relatively_ easy to express, even using something as lame as SQL.

For the record, this would look something like this:

 
  SELECT M.title
  FROM Movie AS M, Person AS P, Credit AS C, Job AS J
  WHERE M.year BETWEEN 1970 AND 1974
      AND M.movie_id = C.movie_id
      AND C.person_id = P.person_id
      AND P.name = 'John Doe'
      AND C.job_id = J.job_id
      AND J.job_id IN (7, 13)  -- assuming 7 and 13 are the actor and director job ids


Now generate the same query using XQuery.

People don't want normalized data? What are you talking about? Are you suggesting that people want error-ridden, redundant data instead? I doubt that's what they want.

And I don't see why normalized data means results like "42, StreetID=2573, CityID=09708, CountryID=25"! Are you suggesting that getting a street name from a table is particularly difficult? I don't think so.

It's up to the application writer to put a nice face on things. I don't think that for most types of data a hierarchical database will make putting a nice face on things easier.

For those few areas where the data really is hierarchical, it may be easier, given the crapulence of today's relational tools, to use an XML database. But that's not the fault of the relational model, but rather the fault of the implementations.

Re:Outliners

darobin on 2002-04-16T12:39:16

You're stubbornly refusing to separate the users -- those who made outlines successful -- from the developers -- who may or may not like the relational model, but either way know how to make it work (something which users don't. And as it is, they're the ones creating the data repositories).

Keep in mind that the thing that makes outlines tick is that they are for people that don't know about data models. Those people don't care about normalized data (at least, if it means they have to learn something complex they certainly don't). That's what makes outlines successful, that's the first question I was answering, and I fail to see where you make a strong case against it.

Secondly, you fail to see that XML is not a hierarchical storage model. It is at the simple single vanilla document level, but otherwise it's not. Throw in a DTD with ID/IDREF(S) and you've already got a cyclic structure. XQuery goes much farther and works on forests. Yes, quite obviously, there are places that are hierarchical. And there can also be many places that are not. That's what XML and XQuery are about.

It's just as flexible as a relational model, if not more so. And it would seem that graph/forest walking, these days, is also believed to be more efficient/optimizable than relational access is.

Re:Outliners

autarch on 2002-04-16T15:15:59

XML is hierarchical. You clearly do not know what that model is, however.

XML with ids is the hierarchical model. In the hierarchical model ids are called pointers. That's what the hierarchical model is all about!

But I'll let the experts speak a bit now because there's no way I can explain these things better than they can.

Chris Date talks a bit about pointers in that one.

And a whole bunch of articles by Fabian Pascal: 1, 2, 3, 4, 5, and 6.

Re:Outliners

darobin on 2002-04-16T16:57:32

IDs are just one single, simple case which I used for demonstrative purposes. If you want something a lot more potent, you can go for XPointers. Those can do a lot more.

I do know what the hierarchical model is. I just don't see the point in spitting on something because it is hierarchical or looks like something hierarchical (although there are cases taken into account by the XML model that don't map well, if at all, to hierarchical modelling). XQuery is showing very serious promise, and that's reporting information from people that would be considered relational purists (or just purely relational, should they prefer that). New avenues open up, and different situations occur. Just because (to speak loosely, as you seem intent on jumping on any detail) OO didn't work all that well in DBs doesn't mean that it doesn't work well in application development. When it comes to docbase forests, I can see XML solutions working very well.

As for the articles, the first sounds more like a long rant than like a technical article. The following ones are more interesting, but he tends to attack the problem in a way in which I wouldn't. I wouldn't try to use XML as a database, at least not for the kind of thing that I use databases for today, no more than I'd be likely to use XML Schema typing and such. I agree with him that that's probably a wrong approach, or at least a potentially suboptimal one (I tend to speak cautiously here, because I've seen some surprising results).

However for the kind of document base that I have in mind, where family is more important than typing, where metadata tends to apply recursively and in transversal ways, where documents and data intermesh all over the map, I would definitely think XML. If only because it takes into full account the semi-structured nature of its content while still remaining consistent.

Re:Outliners

autarch on 2002-04-16T17:42:31

I was only spitting on the hierarchical model as a general-purpose solution for data storage.

Of course, since the relational model works perfectly well for hierarchical data, as well as other types of data, I don't see a reason to embrace special-purpose hierarchical solutions.

What is "semi-structured"? Can you define it? Sounds like a buzzword with little meaning.

How can data be semi-structured? Either it has meaning (structure) or it doesn't.

Perhaps you mean a complex data type? In which case, I'd agree with you that today's SQL-based DBMS's do not handle them very well, if at all.

But as Pascal points out repeatedly, SQL DBMS's are not really relational, so to criticize the relational model on the basis of the failures of SQL DBMS's is foolish.

Re:Outliners

darobin on 2002-04-16T18:08:16

I was only spitting on the hierarchical model as a general-purpose solution for data storage.

Ah, in that case we agree (though it's a point on which I'm ready to be proven wrong).

Of course, since the relational model works perfectly well for hierarchical data, as well as other types of data, I don't see a reason to embrace special-purpose hierarchical solutions.

Would you be happier entering your use.perl comments using something that would map to a relational database, or using (X|HT)ML? It's a silly question/example, but in some cases just because you can use an approach doesn't mean that another isn't simpler.

What is "semi-structured"? Can you define it? Sounds like a buzzword with little meaning.

I've never seen it used as a buzzword, and I pretty much doubt that it has been. It's normally used in one of two senses (I meant mainly the latter in this case, though in fact they amount to the same): 1) something that has some (implied: computer-readable) structure in one part, and none in another. An example of this is email: it has structured headers, and unstructured (from the computer's pov) text; 2) something whose structure is only partial from the point of view of whatever reads it.
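The email case can be sketched in a few lines: the headers parse into name/value pairs, while the body stays an opaque string as far as the program is concerned. This is a simplified illustration with a hypothetical split_message helper; it ignores folded (continued) header lines and is not a real mail parser:

```perl
use strict;
use warnings;

# Split a raw message into structured headers and an unstructured body.
# Simplified: folded (continued) header lines are not handled.
sub split_message {
    my ($raw) = @_;
    # Headers end at the first blank line; everything after it is body.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    my %header;
    for my $line (split /\r?\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $header{ lc $name } = $value;
    }
    return (\%header, defined $body ? $body : '');
}

my ($headers, $text) = split_message(
    "From: nat\nSubject: OPML\n\nI still haven't worked out why outlines are da bomb."
);
```

Here $headers is something a program can query by key, while $text is just a blob it can store and report on without understanding it.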

You bring up (twice) the word meaning, and it is precisely what I'm concerned with here (I'll avoid the s-word of course ;). In a docbase, for a given application looking inside, some things may have meaning and others may not (e.g. metadata vs. content, multiplied by the variety of content and metadata). It is even possible that an understood fragment would contain something that isn't understandable for the given app, but that is not a problem. A totally different app may use that to know that something of importance to it is attached to that larger block, of which it understands little but which it is nevertheless able to report on. In XML, this would most naturally be implemented using namespaces.

So, in other words, it is about data types that leave the door open to free-form addition and inclusion, so long as you can guarantee that no one will get confused (and you can). It's a powerful way of representing a fair number of things, especially when dealing with a great variety of data (produced by humans), as is the case with content management and document bases.

I haven't mulled over how that maps to complex data types, but it could very well map indeed.

SQL DBMS's are not really relational, so to criticize the relational model on the basis of the failures of SQL DBMS's is foolish.

If my opinion of the relational model was the same as my opinion of SQL DBMSs, I wouldn't even bother discussing it ;)

Re:Outliners

ziggy on 2002-04-16T14:14:50

It is a fact that an hierarchical structure can be expressed in the relational model. I do not think that the reverse is true.

Cough

Re:Outliners

autarch on 2002-04-16T14:32:47

No, that expresses the result of a query as XML, which is not at all the same thing.

I suppose you could come up with a query that joined all your tables together and then dump it as XML. That'll be real useful, no doubt.

But it's true, I shouldn't have said that the hierarchical model cannot handle relational data. I should have said that it cannot handle most relational data in any useful and meaningful way.

When, Oh When ...

pudge on 2002-04-16T15:53:55

... will we start looking for problems in need of solutions?