Static and Dynamic Types: A Definition

Ovid on 2008-07-29T13:20:51

Note that the following was hastily written on my lunch break. It doubtless contains some serious errors. Feel free to correct any of my misconceptions.

I've read a fascinating online article about type systems and it's worth reading a few times. While I'm a far cry from an expert in type theory, I'm quite impressed by the article. The author is clearly well-versed in this topic and has clarified quite a bit about the debate, though I'm disappointed that references are generally not cited (though examples abound).

First, here's the author's definition of static and dynamic types:

A static type system is a mechanism by which a compiler examines source code and assigns labels (called "types") to pieces of the syntax, and then uses them to infer something about the program's behavior. A dynamic type system is a mechanism by which a compiler generates code to keep track of the sort of data (coincidentally, also called its "type") used by the program.

In other words, static typing is not about the kinds of data we have, but about the behavior of a system. Of course, this is often tightly related to the kinds of data. For example, asking for the factorial of an IP address doesn't make a lot of sense and for a proper program, won't be allowed in a static system, but it's trivial to generate a failure like this in a dynamic system. Of course, merely because we have safety (often compile-time) about certain failure modes, this does not guarantee the correctness of the program, but certain kinds of bugs are less common.

Dynamic typing, on the other hand, is primary concerned with tracking the kinds of data used and behavior is up to the programmer. It's not about the syntax and the kinds of data generally cannot be statically inferred. This has an interesting consequence that your type system cannot reason up front about the behavior of any given piece of code. For example, consider this pseudo-code:

total = price + tax;

If $tax is the string "fifteen percent", the above code will not perform as expected in a dynamically typed system but the type system has no way of knowing that this condition will not occur and thus cannot warn you about this failure. However, depending about how the dynamic typing was implemented, that code might continue to run or it might crash.

As a more insidious example, imagine trying to do type inferencing in Perl. Does that array reference contain only integers? You can toss a hashref in there and things will crash badly, but an inferencing system probably can't know this will happen. As a result (and as the author points out), testing tends to be more widely adopted in dynamic languages. Many types of tests are useless in solid static languages, but merely because there is some behavioral safety does not guarantee the correctness of the programs.

Moving on, consider three kinds of programs:

  1. Software is correct and runs without failure.
  2. Software is incorrect and crashes.
  3. Software is incorrect and doesn't crash.

Clearly correct software is preferable, but incorrect software which crashes is far preferable to incorrect software which doesn't crash. This perfectly exemplifies a terrible failure mode in SQL:

SELECT name
FROM   customer
WHERE  age > id

Not only is that completely useless, but it's likely to not crash. Thus, we have reasoning about type systems forced onto the developer when it should be handled by the type system itself. What we want to do is define the kinds of data we handle, the operations allowed with them and the results of those operations. This could allow the database to properly reason about a statement, something they frequently don't do. Thus, since SQL without user-defined types is more concerned about the kinds of data rather than the uses of the data, we can consider this is a very poorly implemented type system (but I've not had enough experience with user-defined types in SQL to know how solid they are).

Another interesting distinction was structural versus nominal typing. In structural typing, we assume a type whenever it is safe to do this (e.g., does it provide what our code needs?). Nominal typing requires this to be explicit. This appears to be very similar to one distinction between mixins and roles (traits). The distinction in question is that a role is named and mixins merely shove methods into your class. As a result, mixins rely on duck typing and hope that the "bark" method really refers to a dog and not a tree. With traits, you can explicitly check the name to have a better sense of safety that the method in question does what you want (it's not perfect, but it's better than mixins). Note that this makes traits similar to static typing, but it's still pushed on the developer, but Perl 6 might alleviate some of that.

Another interesting bit is the discussion of testing. Testing, as we know, doesn't prove the absence of bugs. It merely asserts that under a given set of conditions, certain behaviors hold.

A proper type system, however, can prove that certain types of bugs don't exist (but a given proof can merely show that the conclusion follows from the premises, not that the conclusion is desirable or that the premises are correct). Mark Jason Dominus once pointed out how a type system caught an infinite loop in his program. This is not a well-known feature of good type systems, but it's not unexpected. However, the author goes on to cite examples of type systems used to prove databases are normalized or that a program contains no "index out of bounds" exceptions on arrays, things which most programmers generally don't consider to be part of type systems.

At its core, the article generally defines a type system as a set of syntactic rules which verifies the absence of certain behaviors in a program (meditate about taint checking for a while). The complexity really comes in when we acknowledge that the needed "absent behaviors" vary greatly from program to program. Needless to say, dynamically typed languages leave much of this up to the programmer.

This paper has taught me a lot and reinforced many of my suspicions about type systems. What I think I ultimately want is a sound, implicitly typed language with nominal types, but what I want even more is a paycheck, so I'll stick with Perl for the time being.

It's also worth noting that taint checking and pragrams such as fatal and strict also alter how Perl's type system behaves. Interesting food for thought.


Missing One

chromatic on 2008-07-29T17:11:18

Perl's static type system cares more about containers than values. Ponder that for a while, if you really want to confuse the debate.

Re:Missing One

Ovid on 2008-07-29T17:38:23

That's actually a point explicitly raised in the article. Dynamic systems focus tracking what kinds of data are used in a system, not on syntactic evaluation of expressions. See the "Problems with Common Definitions" section which begins with this:

One common saying, quoted often in an attempt to reconcile static and dynamic typing, goes something like this: Statically typed languages assign types to variables, while dynamically typed languages assign types to values. Of course, this doesn't actually define types, but it is already clearly and obviously wrong. One could fix it, to some extent, by saying "statically typed languages assign types to expressions, ..." Even so, the implication that these types are fundamentally the same thing as the dynamic version is quite misleading.

Of course, that's only part of the thesis he presents, not his arguments in favor of it. I highly recommend reading them.

Re:Missing One

chromatic on 2008-07-29T19:58:16

That's not quite the same. Perl 5's built-in static types are scalar, array, and hash (and Maybe reference, if you pardon the Haskellish pun). You do get compile-time checks on using them appropriately.

Re:Missing One

Ovid on 2008-07-29T20:43:30

The important point is that Perl's type system is still primarily dynamic in nature, though as the article points out, static languages all have dynamic elements and vice versa. Perl's compiler is incapable of arbitrary reasoning about the adding two scalars together. They might be overloaded. They might be overloaded at runtime. They might have strings, they might have numbers, etc.

Perl's extremely limited type system in fact, must be explicitly requested:

#!/usr/bin/perl

print "foo ($test[1]) bar";
__END__
foo () bar

If if you do ask for extra compile time checks, they're easy to subvert:

#!/usr/bin/perl

use strict;
my $test = {};
print "foo ($test->{foo}[1]{baz}) bar";
__END__
foo () bar

What? I don't have a hashref of arrayref of hashrefs! Or we can get the less common but more obscure error messages:

<ecode>#!/usr/bin/perl

use strict;
my $test = {};
print "foo ($test->{foo}[bar]{baz}) bar";
__END__
Bareword "bar" not allowed while "strict subs" in use at test.pl line 5.

I know why that error is thrown, but that's only because I now know how Perl parses that.

Another example of having good type systems is exemplified in Tom Moertel's article on using types to prevent injection attacks. With Perl 6's grammars, we'll be able to achieve similar goals, but Perl 5 just ain't there because while Perl 5 can do the same thing, it cannot do the same thing easily because static typing is heavily dependent on static evaluation of expressions. Many of my favorite Perl 5 modules are ones I can't use in production because the best they can do is fake things like this with source filters.

In short, as the article points out, static typing is really based on having extra (and potentially mutable) domain-specific information embedded directly into the syntax of a language (and I'm not referring to explicit typing. That's a holdover from days when compilers needed that). Perl 5 doesn't have the arbitrary flexibility to do that. Perl 6 will. I don't think this means that Perl 6 will necessarily be static, but we have the potential.

Re:Missing One

btilly on 2008-07-30T00:12:23

While those three are what most people are aware of using, Perl 5 has more static types than that. For example typeglobs. Perl 5 also has a parallel static type system in its attribute system.

That said, the article did not draw the usual simplistic divisions. And made a point of saying that many languages have both static and dynamic systems, but usually one is more fully developed. Meaning that there are classes of problems that can be solved either through a dynamic type system or a static type system, so once you've solved it one way in a language there is little point in solving it the other way as well.

This seems to be an accurate description of Perl 5. Perl has both static and dynamic typing. In fact it has at least 2 static type systems, and 2 dynamic type systems! (The second being the taint mechanism.) However the complex things that you'd want from a type system are mostly done with dynamic typing in Perl.

It will be interesting to see how Perl 6 turns out. My bet is that people will choose to use the dynamic typing heavily, and the static typing features will be more of an afterthought.

Re:Missing One

Aristotle on 2008-07-30T04:20:47

Inside interface boundaries, quite likely, but I suspect that the use of static types will be quite common for published APIs at least in certain kinds of domains. At the point where client code might lob parameters at you and you can’t be certain that you’ll be called correctly, you need checks on the values anyway, and it’s much easier to do that by declaring signatures than by hand-rolling validation code (or even writing validation profiles for Params::Validate or whatever). Even so I expect to see few people go whole hog and take a Haskell-level gonzo approach to types – though some will.

It’ll be immensely interesting to watch how C6AN plays out.

Re:Missing One

Ovid on 2008-07-30T05:20:27

However the complex things that you'd want from a type system are mostly done with dynamic typing in Perl.

Wouldn't it be more fair to that that the things we think we want from a type system are done with dynamic typing in Perl? Since, as the author points out, you can capture and verify domain specific knowledge via a static type system (see the paper on using static types in Haskell to verify that a database is normalized), it appears that what people are typically doing with static types is rather generic, but their advanced expressiveness (such as catching infinite loops) is not possible with Perl 5, Python, Ruby, etc.

Re:Missing One

btilly on 2008-07-30T13:05:41

I didn't say all, I said mostly.

It is true that an advanced static typing system can do things you simply can't do within Perl 5. On the other hand duck typing allows things that can't be easily done within a static system. Think Test::MockObject and Test::MockObject::Extends.

However the majority of the day to day utility that most people want from a type system can be delivered either way. It is possible that if more people knew what was possible then our wants could change and that would no longer be true. But I happen to personally doubt it.

For example take the normalized database logic done in the type system. I like the idea of normalized databases as much as the next person. But my job is handling reporting. For reporting purposes it is standard to build deliberately denormalized aggregate tables for faster querying. No matter how clever it may be, I do not want to fight a type system that has decided to give me unwanted and unnecessary guarantees on how well normalized my deliberately denormalized data is.

Besides, I know SQL pretty well, and I regularly use advanced SQL features that most people don't. (You don't need to know that much to do more than most people. It is amazing what you can accomplish in Postgres by creating a series of temp tables and liberally using CASE statements where appropriate.) As a result the kind of query generators that I suspect this Haskell hack requires (I would read the paper except it looks like it is in Spanish) tend to get in my way. Besides even if Haskell's type system doesn't allow it, SQL allows casting, and casting is sometimes very important to do.

Re:Missing One

Ovid on 2008-07-30T15:07:28

Actually, the paper appears to start with Portuguese for the introduction, but the paper itself is in English. That threw me off, too. Aside from that, your points are well taken.