Lore #3: Types of Lore

Mark Leighton Fisher on 2006-11-03T17:13:10

Facts

Much of lore is simple facts. For example, in VB6, it is lore that you must test both BOF and EOF to be sure that a recordset (the set of records retrieved from a query) is empty. The VB6 documentation would lead you to believe that on a freshly-retrieved recordset, you only need to test EOF to find out whether the recordset is empty. What helps to make this fact a piece of lore is that testing only EOF works *most of the time*. However, there are times when testing only EOF will lead your code to try to work with non-existent records in the recordset. Another example (for my taste) is the Perl5 lvalue sub, which from the docs you would think is a dandy way to implement data validation. Not so, my friend: you cannot examine an assignment value on the way in, nor can you verify the return value on the way out. (This is fixed in Perl6.)

Facts about undocumented program features are a rich source of lore, so much so that hidden graphical features have their own nickname: Easter Eggs. However, I suspect that non-graphical hidden program features, like secret administrator passwords, cryptic database schemas, and clandestine command-line switches, form the bulk of undocumented program features.

Another source of lore-facts is performance issues, such as those around BLOBs and DB fragmentation. If I recall correctly, it has been more than 10 years since database vendors added support for BLOBs to relational databases, but relational database BLOBs still suffer from performance problems. The tingling of my architect sense tells me to look at what is different about BLOBs compared to other DB datatypes. The difference is that BLOBs do not fit on a database page, while other datatypes do. Databases are precisely tuned for high speed via sophisticated page-caching mechanisms, and many table schemas lead to entire rows that fit within one page. BLOBs break this rule almost by definition, as a BLOB is expected to span multiple database pages.

This makes a BLOB much more like a file in a filesystem than a field in a database row, which may account for the number of systems that I have seen out in the field where the BLOB-like data is held separately in a filesystem, with some kind of link to the BLOB data as a field in the database. From what I have seen (or failed to see), reading the DB vendor documentation would lead you to think that you should always use in-DB BLOBs for large pieces of data, while developers seem to continue placing their BLOB data outside of the DB with only a link to the BLOB data inside the DB. This is a prime example of lore (IMHO).
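For the curious, here is a minimal sketch of that "file outside, link inside" arrangement. It is only an illustration, not how any particular system does it: the `documents` table, its column names, the helper functions, and the choice to name each file after its SHA-256 checksum are all assumptions of mine, and SQLite stands in for whatever database you actually run.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_blob(conn, blob_dir, name, data):
    """Write data to a file; record only its path and checksum in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_dir / digest  # file named after its checksum (an assumption)
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO documents (name, blob_path, sha256) VALUES (?, ?, ?)",
        (name, str(path), digest),
    )
    conn.commit()
    return cur.lastrowid


def load_blob(conn, doc_id):
    """Follow the link stored in the row and read the bytes from the file."""
    row = conn.execute(
        "SELECT blob_path, sha256 FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    blob_path, expected = row
    data = Path(blob_path).read_bytes()
    # The DB cannot keep the file consistent for us, so verify it ourselves.
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError("blob file does not match the checksum in the database")
    return data


def demo():
    """Round-trip a 100,000-byte payload through a temp dir and an in-memory DB."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE documents ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
            "blob_path TEXT NOT NULL, sha256 TEXT NOT NULL)"
        )
        payload = b"x" * 100_000
        doc_id = store_blob(conn, Path(tmp), "report.bin", payload)
        ok = load_blob(conn, doc_id) == payload
        conn.close()
        return ok
```

The row itself stays small enough to sit comfortably on a page, which is the point of the exercise. The price is that the database no longer guards the data: a file can be moved or deleted behind its back, which is why the sketch stores a checksum and verifies it on read.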

Processes

Processes that are lore usually seem to be (IMHO) processes inside organizations, as the immediate requirements of business often appear to preclude getting the processes written down, even the processes that the business depends upon. One advantage of ISO 9001 certification is that, by definition, it requires you to write down the steps in your business processes. The steps don't necessarily have to make sense (a weakness of ISO 9001 used by itself), but writing the steps down is a start towards understanding and possibly improving the processes.

Documents

Undocumented documents? What are *those*? Again, documents that are lore are usually seen inside an organization. Documents that are lore are often what people have written down, but not seen fit to put under version control or even back up. (Source code that is not under version control or even backed up is in the same ZIP/postal code as lore-documents.) Whether the documents are test results, descriptions of build steps, development environment configurations, etc., if the document is at all important, it should be under version control and backed up, else it is at most one hard drive crash away from being lore.

Limited-Realm Documents

Limited-realm documents are a special type of lore-documents, as limited-realm documents should be shared across an organization but aren't. One personal example (I won't name the organization) is that of separate IT and Engineering Software groups, which should have been sharing all of their processes, design guidelines, post-mortems, etc. Instead, it was as if the two groups existed in separate companies that shared no information between them. Again IMHO, this is wasteful and counterproductive.