Kontent Days 3-4: Storage

brentdax on 2005-07-06T06:07:54

Yesterday I started on the basic SQL store, WWW::Kontent::Store::NarrowDBI, and today I finished the reading component of it. Writing will be somewhat harder, because I have three requirements:

In the absence of transactions, a write error must not render a page unreadable; if a page is unwritable, it must be trivial to repair.
In the presence of transactions, a write error must not render a page unreadable or unwritable.
It has to work across most database engines; it can't use engine-specific features, or standard features which aren't widely supported.

One helpful point is that I don't actually want concurrent edits to work; I'd rather have the edit that starts second fail. If Alice and Bob are both revising the same page at the same time, and Alice saves first, Bob should have to examine Alice's edits instead of blindly reverting them.
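As a rough sketch of that "second edit fails" rule (in Python purely for illustration; the Page class, start_edit, save and StaleEditError below are hypothetical names, not Kontent's API), a draft can remember which revision it was based on, and saving can refuse any draft whose base is no longer the current revision:

```python
class StaleEditError(Exception):
    """Raised when a draft was based on a revision that is no longer current."""


class Page:
    """Hypothetical in-memory page; the list index doubles as the revision number."""

    def __init__(self, content):
        self.revisions = [content]

    def current_revno(self):
        return len(self.revisions) - 1

    def start_edit(self):
        # The draft records the revision it started from.
        return {"base": self.current_revno(), "content": self.revisions[-1]}

    def save(self, draft):
        # Refuse the save if someone else has committed since the draft began.
        if draft["base"] != self.current_revno():
            raise StaleEditError(
                "draft is based on revision %d, but the page is now at revision %d"
                % (draft["base"], self.current_revno())
            )
        self.revisions.append(draft["content"])
        return self.current_revno()
```

If Alice and Bob both call start_edit() and Alice saves first, Bob's save raises StaleEditError and Alice's text stays the current revision, so Bob has to look at her changes before trying again.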

The name of the store might strike people as a bit odd. NarrowDBI is "narrow" because the attrs table with all of the actual data has only three columns—revid, name and value. (Alternatives I've imagined are "wide", which has a table with a column for each possible attribute, and "deep", which uses several different tables.)

Basically, each revision has a bunch of attributes, like "content" and "title" and so on. Precisely which attributes a revision has will depend on its class (annoyingly, one of its attributes); the three SQL stores reflect three different ways to organize the attributes.
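To make the "narrow" layout concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for a DBI-backed store. Only the table name attrs and the three column names come from the description above; the column types, the primary key and the helper functions are assumptions made for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory database holding a three-column attrs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE attrs (
            revid INTEGER NOT NULL,   -- which revision the attribute belongs to
            name  TEXT    NOT NULL,   -- attribute name, e.g. 'title' or 'content'
            value TEXT,               -- attribute value, stored as text
            PRIMARY KEY (revid, name) -- assumed: one value per name per revision
        )
        """
    )
    return conn


def write_attrs(conn, revid, attrs):
    """Store every attribute of one revision as its own row."""
    conn.executemany(
        "INSERT INTO attrs (revid, name, value) VALUES (?, ?, ?)",
        [(revid, name, value) for name, value in attrs.items()],
    )
    conn.commit()


def read_attrs(conn, revid):
    """Return a revision's attributes as a dict of name -> value."""
    rows = conn.execute(
        "SELECT name, value FROM attrs WHERE revid = ?", (revid,)
    )
    return dict(rows.fetchall())
```

Because attribute names are just data in the name column, adding or renaming an attribute needs no change to the table definition.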

To save time, I'll only be implementing NarrowDBI for the Summer of Code project. I chose this store for three reasons:

It's the simplest conceptually, given the way the Revision class is defined.
It's the easiest to implement.
It's the only one that allows attribute names to change without altering the tables.

That last point is important, because they are changing; for example, I realized about half an hour ago that DeepDBI—and general good design—will require some way to separate different classes' attributes, so I'll be adding prefixes to indicate which part of the system owns the attribute in question. (For example, the revision-wide attributes that were previously called "author" and "date" will now be "kontent:author" and "kontent:date"—or maybe "rev:author" and "rev:date", I'll need to think about it.)

Tomorrow, the Revision::revise method, the Page::create method, and the Draft class to go with them.

sensitive handling of concurrent edit attempts

wickline on 2005-07-13T01:01:25

Please ignore if you've already moved on past this issue. If you've still got this one on the table, maybe the following notes will be of some use...

If CVS (or whatever) is an option, you can check for conflicts and do a blind merge if no conflicts arise. If conflicts would occur, you can echo them back to the user and let them resolve the conflicts before committing.

That approach may not work for one or more reasons. CVS (or whatever) may not be a prerequisite you want to have for Kontent installation. Merge skills may not be a prerequisite you want to have for content authors using Kontent-driven sites. Even if you could do the above, blind merges may not be appropriate for certain types of content. If Alice adds a parenthetical reference to "marsupial reproduction" early on in the content of the "about Australia" document, and commits that change just prior to Bob committing the change to remove that section in the interests of brevity, then the blind merge will result in an orphaned reference.

In the past I had to deal with a similar circumstance. Multiple medical staff were expected to be editing one on-line form. Certain nurses tended to edit these areas, the pharmacy tended to edit those areas, the attending physician tended to edit other areas, but everyone needed to read and edit all parts of the form. Usage was generally expected to be concurrent, and concurrent edits needed to never result in blind changes to the form.

Each field was saved with field content, date of last modification, and user who last modified the field. Clients got the field content and a token which represented the last modification date, user who modified, and content (hashed with a server-side key to prevent tampering).

When the server got an edit request, it would first verify that the token submitted with the edit still matched the content currently in the database, meaning nobody else had changed the field in the meantime. If it matched, the server would apply the edit. If it did not, the server would send a note back to the user showing their (unapplied) revised content next to the current content in the database, along with a note about who made the most recent change and when.
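A minimal sketch of that check, in Python for illustration only. The original system's details aren't given here, so treat everything in it as an assumption: HMAC-SHA256 as the keyed hash, the field layout, and the names SERVER_KEY, make_token and apply_edit are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; clients never see it.
SERVER_KEY = b"hypothetical-server-side-secret"


def make_token(content, modified_at, modified_by):
    """Keyed hash over the field's content and last-modification metadata."""
    message = "\x1f".join([modified_at, modified_by, content]).encode("utf-8")
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def apply_edit(field, token, new_content, user, now):
    """Apply the edit only if the token matches the field's current state."""
    current = make_token(field["content"], field["modified_at"], field["modified_by"])
    if not hmac.compare_digest(current, token):
        # Stale token: leave the field alone and report both versions.
        return {
            "applied": False,
            "yours": new_content,
            "current": field["content"],
            "modified_by": field["modified_by"],
            "modified_at": field["modified_at"],
        }
    field.update(content=new_content, modified_at=now, modified_by=user)
    # Hand back a fresh token so the same client can keep editing.
    return {"applied": True, "token": make_token(new_content, now, user)}
```

An edit submitted with the token the client originally received succeeds only while the field is unchanged; once anyone else saves, the same token stops matching, and the response carries both versions plus who changed the field and when.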

The user could then edit the text again to incorporate their changes into the updated copy from the database (using the copy of their failed update for reference and/or copy/paste). If they had questions about conflicting intentions, then they knew who to talk to and could have them paged if they weren't nearby to ask.

This approach seemed to work for that application. In this case, every field on the form was small. The largest fields were perhaps a small paragraph in length. For larger data, it might be awkward to show the user's whole failed edit right next to the updated original, but perhaps a diff-like representation of the change could be shown instead.

-matt