SQLite and typelessness

iburrell on 2003-05-06T03:38:59

I have been looking at the SQLite database and the DBD::SQLite module for accessing. SQLite is a database with some nice features: embeddable library, small and fast, single file, transactions, and pretty complete SQL92 support. This makes it real useful for places where a full database like PostgreSQL is overkill but you want DBI and SQL and the fun of relations.

It has one big problem: it is typeless. You would think that after programming in Perl for a long time, I would like typelessness. In a dynamic language, it works pretty well. In a database, it is pure evil. The reasons have nothing to do with performance or encoding. Or the historical usage of limited sizes and types in databases. The problem is with the logical definition of types, domains and operators.

Types have a domain, the set of allowed values. Boolean has a simple domain of two values, true or false. There ares the sets of positive integers, rational numbers, and irrational numbers. All the strings shorter than 2 GB. The domain is completely separate from the encoding. The integer 3 can be represented in many different ways, binary, octal, decimal, but all logically the same.

Tying is important for databases because it restricts the values that can be entered in a column. If I define a column to hold a price, I make it numeric. It shouldn't be able to hold any other values. I don't want anyone to insert their dog's name or their manifesto for world peace into that column. I expect it to always be a number that can be added or subtracted. I think strong typing is more important for databases because they are about rigidly specifying the permanent storage of data. Perl variables are fleeting, databases are permanent and logical.

Types also defined a set of operators that can operate on values of the type. In Perl, the numeric and string operators are distinct, > and gt, == and eq. This is because the operations are different. Comparing integers is different from comparing the same integers as strings: 10000 > 1001 but "10000" lt "1001". Adding two numbers is different than concatenating two strings even though some languages use the same operator.

SQL doesn't have the distinction of separate operators; it uses types to figure out which operator to use. The way SQLite tries to figure out which operator to use is unintuitive. It tries to see if it can parse the values as a number and use the numeric comparison, otherwise is uses string compare. Instead, it should use the type of the column. I found a web site talking about SQLite that got confused about how it works. This is the kind of confusion that makes bugs. If I am sorting part numbers, I want them compared as strings even though some of them happen to look like numbers. Similarly, if I am adding a column of integers, I don't want any fractional values or Fido to mess up the results. I want to be able to add the interval of '1 day' to a date and get a proper date as the result.

A big advantage of databases is that there are many different ways to access the data. It is possible (and wise) to check all values in the Perl code. But programmer's make mistakes and any code that doesn't could corrupt the database. This is especially bad since everyone expects SQL and databases to be strongly typed and writes SQL code that will break otherwise.

I think it is great that SQLite wants to free us from the encoding of values and size limits. Why should a varchar be limited to 255 or 8000 characters? Well, when it is a credit card number with at most 16 digits. Also, SQLite has some

A good comparison to SQLite is PostgreSQL. PostgreSQL is an object-relational database with a rich type system. It has extensible and user-defined types including writing operators and functions in multiple languages including Perl. It has a standard syntax from SQL99 for specifying type conversions.