Little languages

brian_d_foy on 2003-01-19T23:10:32

For a couple of years I have been thinking about writing a little language to script web transactions. I frequently write specialized web agents to perform various tasks, and the Perl code looks mostly the same. My little language would simply describe an equivalent Perl program (i.e. just specify the steps), and something else would interpret it.

A little language script should be short and terse, and it should cover most of the things that I need to do; it does not need to support everything in the world. I translated a 50-line Perl program into this short script:

AGENT Mozilla 4.5
COOKIES on
REDIRECT auto
TIMEOUT 20

PARAM foo bar
PARAM bar baz
CONTENT_TYPE application/x-www-form-urlencoded
URL http://www.example.com/script.cgi
POST

DUMP
SAVE script_output.html

CLEAR

CONTENT_TYPE application/x-multipart-mixed
URL http://www.example.com/script2.cgi
MESSAGE_BODY
POST


Some directives set global conditions.

  • AGENT -- the user agent string to use
  • COOKIES -- "on" or "off"
  • REDIRECT -- follow "Location" or "Refresh" headers
  • TIMEOUT -- how long to wait before giving up on a page
  • STICKY -- CGI parameters are sticky
  • VERBOSE -- show a lot of output
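
One way to hold these global conditions is a single settings hash with a small validator per directive. The sketch below is only an illustration, not the actual program: %settings, %validate, and set_global are hypothetical names, the default values are made up, and the validation rules (for example, that TIMEOUT is a whole number) are assumptions.

```perl
use strict;
use warnings;

# Hypothetical defaults; the real program's defaults are not known.
my %settings = (
    AGENT    => 'little-language/0.1',
    COOKIES  => 'off',
    REDIRECT => 'auto',
    TIMEOUT  => 30,
    STICKY   => 'on',
    VERBOSE  => 'off',
);

# One validator per global directive (all of these rules are assumed).
my $on_off   = sub { $_[0] =~ /\A(?:on|off)\z/i };
my %validate = (
    AGENT    => sub { length $_[0] },
    COOKIES  => $on_off,
    REDIRECT => sub { length $_[0] },
    TIMEOUT  => sub { $_[0] =~ /\A\d+\z/ },
    STICKY   => $on_off,
    VERBOSE  => $on_off,
);

# Store a global setting, dying on an unknown name or a bad value.
sub set_global {
    my( $name, @args ) = @_;
    my $value = join ' ', @args;    # "AGENT Mozilla 4.5" arrives as two words
    die "Unknown setting [$name]\n" unless exists $validate{$name};
    die "Bad value [$value] for $name\n" unless $validate{$name}->( $value );
    $settings{$name} = $value;
}

set_global( AGENT   => 'Mozilla', '4.5' );
set_global( COOKIES => 'on' );
set_global( TIMEOUT => 20 );

print "$_ = $settings{$_}\n" for sort keys %settings;
```

Keeping the validators in their own hash means a new global directive needs one default and one rule, and nothing else changes.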


Other directives do something with the current state. There is no stack or queue or anything like that---just virtual registers.

  • PARAM name value -- set a CGI param, which is sticky and multi-valued
  • CONTENT_TYPE type -- the MIME type for requests with a message body
  • URL -- the current URL to work with (the previous URL becomes the REFERER by default)
  • REFERER -- the string to send with the Referer header, which is the previous value of URL by default
  • DUMP -- pretty print the state to the screen
  • SAVE [file] -- save the last response to a file
  • SAVE_STATE [file] -- save the state to a file
  • LOAD_STATE [file] -- load the state from a file
  • MESSAGE_BODY [file] -- get the message body from a file
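
The "virtual registers" idea can be sketched as one hash that each directive overwrites or appends to. This is an illustration under stated assumptions: %state, set_param, and set_url are hypothetical names, and the layout of the hash is a guess rather than the real program's data structure.

```perl
use strict;
use warnings;

# All state lives in one hash of named "registers" (hypothetical layout).
my %state = (
    PARAM        => {},      # name => [ values ], so params can be multi-valued
    URL          => undef,
    REFERER      => undef,
    CONTENT_TYPE => undef,
);

# PARAM name value: append, so repeating a name gives several values.
sub set_param {
    my( $name, $value ) = @_;
    push @{ $state{PARAM}{$name} }, $value;
}

# URL: the previous URL becomes the REFERER by default.
sub set_url {
    my( $url ) = @_;
    $state{REFERER} = $state{URL} if defined $state{URL};
    $state{URL}     = $url;
}

set_param( foo => 'bar' );
set_param( foo => 'baz' );
set_url( 'http://www.example.com/script.cgi' );
set_url( 'http://www.example.com/script2.cgi' );

print "URL:     $state{URL}\n";
print "Referer: $state{REFERER}\n";
print "foo:     @{ $state{PARAM}{foo} }\n";
```

Because nothing is pushed onto a stack, each directive only has to know which register it touches.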


Directives with the names of HTTP methods make a request.

  • GET
  • POST
  • HEAD
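
A method directive has to turn the current state into a request. The sketch below shows one way that step might look; it only assembles the pieces and sends nothing. build_request and the shape of %state are hypothetical, HTTP::Tiny is used here purely for its form-encoding helper (the post does not say which HTTP library the real program uses), and MESSAGE_BODY is left out to keep the example short.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Turn the current registers into the pieces of a request (hypothetical helper).
sub build_request {
    my( $method, $state ) = @_;

    my %request = (
        method  => $method,
        url     => $state->{URL},
        headers => {},
    );
    $request{headers}{Referer} = $state->{REFERER} if defined $state->{REFERER};

    # Encode name => [ values ] pairs; an empty set of params adds nothing.
    my $params = $state->{PARAM} || {};
    my $query  = %$params ? HTTP::Tiny->www_form_urlencode( $params ) : '';
    return \%request unless length $query;

    if( $method eq 'POST' ) {
        # Form data travels in the message body.
        $request{headers}{'Content-Type'} =
            $state->{CONTENT_TYPE} || 'application/x-www-form-urlencoded';
        $request{content} = $query;
    }
    else {
        # GET and HEAD carry the parameters in the query string.
        $request{url} .= ( $request{url} =~ /\?/ ? '&' : '?' ) . $query;
    }

    return \%request;
}

my %state = (
    URL          => 'http://www.example.com/script.cgi',
    CONTENT_TYPE => 'application/x-www-form-urlencoded',
    PARAM        => { foo => ['bar'], bar => ['baz'] },
);

my $post = build_request( POST => \%state );
print "$post->{method} $post->{url}\n$post->{content}\n";
```

With this split, GET, POST, and HEAD can share one builder and differ only in the method name they pass in.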


The program to interpret this little language is simple. It reads the source line by line and uses the directive name to call a subroutine. The directive names (in all uppercase) are the keys in a hash. Each hash value is an anonymous array which describes the directive and includes a subroutine reference that the program executes when it encounters that directive.

use Text::ParseWords;

print "Waiting for commands on standard input\n$prompt" unless @ARGV;

while( <> ) {
    chomp;

    # the first word is the directive, the rest of the line is its arguments
    my( $directive, $string ) = split /\s+/, $_, 2;
    $directive = uc $directive;
    my @arguments = Text::ParseWords::quotewords( '\s+', 0, $string );

    # look up the directive and call its subroutine
    eval {
        die "Undefined subroutine" unless exists $HASH{$directive};
        $HASH{$directive}[1](@arguments);
    };
    warn "Not a valid directive: [$directive] at $ARGV line $.\n"
        if $@ =~ m/Undefined subroutine/;

    print $prompt if $ARGV eq '-';
}
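
The loop relies on a %HASH table that the excerpt does not show. A minimal sketch of its shape, based only on the description above (a description plus a subroutine reference per directive), might look like this. The two entries, the %state hash, and the run_line helper are made up for illustration and are not the real program's code.

```perl
use strict;
use warnings;
use Text::ParseWords ();

my %state;

# DIRECTIVE => [ description, code reference ]
my %HASH = (
    TIMEOUT => [ 'give up on a page', sub { $state{TIMEOUT} = $_[0] } ],
    URL     => [ 'the current URL',   sub { $state{URL}     = $_[0] } ],
);

# Dispatch one line of source; returns false for an unknown directive.
sub run_line {
    my( $line ) = @_;
    my( $directive, $string ) = split /\s+/, $line, 2;
    return 1 unless defined $directive and length $directive;   # blank line

    $directive = uc $directive;
    return 0 unless exists $HASH{$directive};

    my @arguments = Text::ParseWords::quotewords( '\s+', 0, $string // '' );
    $HASH{$directive}[1]->( @arguments );
    return 1;
}

run_line( $_ ) for 'timeout 20', 'URL http://www.example.com/script.cgi';
print "$_ = $state{$_}\n" for sort keys %state;
```

Checking the hash with exists before calling keeps an unknown directive from being confused with an error raised inside a directive's own subroutine.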


Since everything the language can do is in a hash, extending the language is easy---just add to the hash. Similarly, I can change what something does by changing the hash.
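
As a small illustration of that point, adding or changing a directive is a single assignment. The ECHO directive and the stand-in table below are hypothetical; they only show the idea and are not part of the real program.

```perl
use strict;
use warnings;

my @output;

# A stand-in dispatch table with the same [ description, code ] layout.
my %HASH = (
    VERBOSE => [ 'show a lot of output', sub { push @output, "verbose: $_[0]" } ],
);

# Extend the language: add a hypothetical ECHO directive.
$HASH{ECHO} = [ 'repeat the arguments', sub { push @output, "@_" } ];

# Change the language: VERBOSE now upper-cases its argument.
$HASH{VERBOSE}[1] = sub { push @output, 'verbose: ' . uc( $_[0] ) };

$HASH{ECHO}[1]->( 'hello', 'world' );
$HASH{VERBOSE}[1]->( 'on' );
print "$_\n" for @output;
```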

As I was writing this today, I realized that I could abstract this design so that the way the interpreter works and the description of the language would not be coupled. Before I let anyone else see the rest of the code, I need to tear apart the proof-of-concept script I wrote and devise some simple scheme to describe the language. However, I want to do something else tonight---anything else.