Little languages

brian_d_foy on 2003-01-19T23:10:32

For a couple of years I have been thinking about writing a little language to script web transactions. I frequently write specialized web agents to perform various tasks, and the Perl code looks mostly the same. My little language would simply describe an equivalent Perl program (i.e. just specify the steps), and something else would interpret it.

A script in the little language should be short and terse. It only has to cover most of the things that I need to do, not everything in the world. I translated a 50-line Perl program into this short script, with example values:

AGENT Mozilla 4.5
COOKIES on
REDIRECT auto
TIMEOUT 20

PARAM foo bar
PARAM bar baz
CONTENT_TYPE application/x-www-form-urlencoded
URL http://www.example.com/script.cgi
POST 

DUMP
SAVE script_output.html

CLEAR

CONTENT_TYPE application/x-multipart-mixed
URL http://www.example.com/script2.cgi
MESSAGE_BODY 
POST

Some directives set global conditions.

AGENT -- the user agent string to use
COOKIES -- "on" or "off"
REDIRECT -- follow "Location" or "Refresh" headers
TIMEOUT -- give up on a page
STICKY -- CGI parameters are sticky
VERBOSE -- show a lot of output

Other directives do something with the current state. There is no stack or queue or anything like that---just virtual registers.

PARAM name value -- set a CGI param, which is sticky and multi-valued
CONTENT_TYPE type -- the MIME type for requests with a message body
URL -- the current URL to work with (the previous URL becomes the REFERER by default)
REFERER -- the string to send with the Referer header, which is the previous value of URL by default
DUMP -- pretty print the state to the screen
SAVE [file] -- save the last response to a file
SAVE_STATE [file] -- save the state to a file
LOAD_STATE [file] -- load the state from a file
MESSAGE_BODY [file] -- get the message body from a file
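
One way to picture the "virtual registers" is a single hash that every directive reads or writes. The sketch below is only a guess at how that could look. The names %state, set_param, and set_url are invented for illustration and are not taken from the real program.

```perl
use strict;
use warnings;

# Hypothetical state: one slot ("register") per directive that stores a value.
my %state = (
	params  => {},       # name => [ values ], because PARAM is multi-valued
	url     => undef,
	referer => undef,
	);

# PARAM name value -- keep every value given for a name
sub set_param
	{
	my( $name, $value ) = @_;
	push @{ $state{params}{$name} }, $value;
	}

# URL -- the old URL becomes the REFERER by default
sub set_url
	{
	my( $url ) = @_;
	$state{referer} = $state{url} if defined $state{url};
	$state{url}     = $url;
	}

set_param( foo => 'bar' );
set_param( foo => 'baz' );
set_url( 'http://www.example.com/script.cgi' );
set_url( 'http://www.example.com/script2.cgi' );

print "foo: @{ $state{params}{foo} }\n";
print "referer: $state{referer}\n";
```

Because there is no stack, each directive simply overwrites or appends to its own slot, and a request directive can read whatever is there at the time it runs.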

Directives with the names of HTTP methods make a request.

GET
POST
HEAD
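
Since GET, POST, and HEAD differ only in the method name, one routine could serve all three. This sketch assumes a hypothetical %state hash and a made-up make_request() helper. It only builds a description of the request and sends nothing; a real agent would pass that description to an HTTP client.

```perl
use strict;
use warnings;

# Hypothetical registers, as if URL and REFERER directives had already run.
my %state = (
	url     => 'http://www.example.com/script.cgi',
	referer => 'http://www.example.com/',
	);

# Describe the request the current state implies. Nothing goes over the network.
sub make_request
	{
	my( $method ) = @_;
	return {
		method  => $method,
		url     => $state{url},
		referer => $state{referer},
		};
	}

# One handler per HTTP method, all built from the same routine.
my %method_handlers = map {
	my $method = $_;
	( $method => sub { make_request( $method ) } )
	} qw( GET POST HEAD );

my $request = $method_handlers{POST}->();
print "$request->{method} $request->{url}\n";
```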

The program to interpret this little language is simple. It reads the source line by line and uses the directive name to call a subroutine. The directive names (in all uppercase) are the keys of a hash. Each hash value is an anonymous array that describes the directive and includes a reference to the subroutine that the program runs when it encounters that directive.

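use strict;
use warnings;
use Text::ParseWords ();

our %HASH;            # the directive table, filled in elsewhere in the program
my $prompt = '> ';    # assumed value; the prompt string is not shown here

print "Waiting for commands on standard input\n$prompt" unless @ARGV;

while( <> )
	{
	chomp;
	next unless /\S/;    # skip the blank lines between groups of directives

	# split ' ' ignores leading whitespace; the limit keeps the arguments together
	my( $directive, $string ) = split ' ', $_, 2;
	$directive = uc $directive;

	# a directive such as POST has no arguments, so $string may be undefined
	my @arguments = defined $string
		? Text::ParseWords::quotewords( '\s+', 0, $string )
		: ();

	eval {
		die "Undefined subroutine" unless exists $HASH{$directive};
		$HASH{$directive}[1](@arguments);
		};

	if( $@ =~ m/Undefined subroutine/ )
		{
		warn "Not a valid directive: [$directive] at $ARGV line $.\n";
		}
	elsif( $@ )    # report other failures instead of dropping them
		{
		warn "Directive [$directive] failed at $ARGV line $.: $@";
		}
	}
continue
	{
	print $prompt if $ARGV eq '-';
	}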
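
The loop assumes a table in the shape described earlier. A minimal sketch of such a table follows. The %state hash and the handler bodies are placeholders for illustration, not the real program's code.

```perl
use strict;
use warnings;

my %state;    # hypothetical place to keep the current settings

# Each value is [ description, code reference ], so the code lives at index 1.
my %HASH = (
	AGENT   => [ 'the user agent string to use',
		sub { $state{agent} = join ' ', @_ } ],
	TIMEOUT => [ 'give up on a page',
		sub { $state{timeout} = $_[0] } ],
	);

# Dispatch two directives the same way the loop does.
$HASH{AGENT}[1]( 'Mozilla', '4.5' );
$HASH{TIMEOUT}[1]( 20 );

print "agent: $state{agent}, timeout: $state{timeout}\n";
```

Keeping the description next to the code reference means the same table could also drive a help listing, although that is a guess about how the description might be used.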

Since everything the language can do is in a hash, extending the language is easy---just add to the hash. Similarly, I can change what something does by changing the hash.
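
For instance, adding a directive or replacing one can be a single assignment. The ECHO directive, the @log array, and both handler bodies here are invented for the example.

```perl
use strict;
use warnings;

my @log;    # collects output so the effect is easy to see

# A tiny stand-in table in the same [ description, code reference ] shape.
my %HASH = (
	DUMP => [ 'pretty print the state', sub { push @log, 'dump: original' } ],
	);

# Extend the language: ECHO is a made-up directive.
$HASH{ECHO} = [ 'repeat the arguments', sub { push @log, "echo: @_" } ];

# Change what an existing directive does by replacing its code reference.
$HASH{DUMP}[1] = sub { push @log, 'dump: replaced' };

$HASH{ECHO}[1]( 'hello', 'world' );
$HASH{DUMP}[1]();

print "$_\n" for @log;
```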

As I was writing this today, I realized that I could abstract this design so that the way the interpreter works and the description of the language would not be coupled. Before I let anyone else see the rest of the code, I need to tear apart the proof-of-concept script I wrote and devise some simple scheme to describe the language. However, I want to do something else tonight---anything else.