For a couple of years I have been thinking about writing a little language to script web transactions. I frequently write specialized web agents to perform various tasks, and the Perl code looks mostly the same. My little language would simply describe an equivalent Perl program (i.e. just specify the steps), and something else would interpret it.
A little language script should be short and terse and cover most of the things that I need to do; it does not need to support everything in the world. I translated a 50-line Perl program into this short script, with placeholder values:
AGENT Mozilla 4.5
COOKIES on
REDIRECT auto
TIMEOUT 20
PARAM foo bar
PARAM bar baz
CONTENT_TYPE application/x-www-form-urlencoded
URL http://www.example.com/script.cgi
POST
DUMP
SAVE script_output.html
CLEAR
CONTENT_TYPE application/x-multipart-mixed
URL http://www.example.com/script2.cgi
MESSAGE_BODY
POST
Some directives set global conditions.
- AGENT -- the user agent string to use
- COOKIES -- "on" or "off"
- REDIRECT -- follow "Location" or "Refresh" headers
- TIMEOUT -- give up on a page
- STICKY -- CGI parameters are sticky
- VERBOSE -- show a lot of output
Other directives do something with the current state. There is no stack or queue or anything like that, just virtual registers.
- PARAM name value -- set a CGI param, which is sticky and multi-valued
- CONTENT_TYPE type -- the MIME type for requests with a message body
- URL -- the current URL to work with (the previous URL becomes the REFERER by default)
- REFERER -- the string to send with the Referer header, which is the previous value of URL by default
- DUMP -- pretty print the state to the screen
- SAVE [file] -- save the last response to a file
- SAVE_STATE [file] -- save the state to a file
- LOAD_STATE [file] -- load the state from a file
- MESSAGE_BODY [file] -- get the message body from a file
Directives with the names of HTTP methods make a request.
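To make the "virtual registers" idea concrete, here is a minimal sketch of how the state might be held in a single hash. The names %state, set_param, and set_url are hypothetical and are not taken from the actual program; the sketch only shows the multi-valued PARAM behavior and the way the previous URL becomes the default Referer.

```perl
use strict;
use warnings;

# Hypothetical state: one hash acting as the "virtual registers".
my %state = ( url => undef, referer => undef, params => {} );

# PARAM name value: params are multi-valued, so each name maps to a list.
sub set_param {
    my( $name, $value ) = @_;
    push @{ $state{params}{$name} }, $value;
}

# URL: the previous URL becomes the default Referer.
sub set_url {
    my( $url ) = @_;
    $state{referer} = $state{url};
    $state{url}     = $url;
}

set_param( foo => 'bar' );
set_param( foo => 'baz' );
set_url( 'http://www.example.com/script.cgi' );
set_url( 'http://www.example.com/script2.cgi' );

print "Referer: $state{referer}\n";       # the first URL
print "foo: @{ $state{params}{foo} }\n";  # both values of foo
```

Because nothing is pushed or popped, each directive only has to overwrite or append to one slot, and a request directive reads whatever is in the slots at that moment.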
The program to interpret this little language is simple. It reads the source line by line and uses the directive name to call a subroutine. The directive names (in all uppercase) are the keys in a hash. Each hash value is an anonymous array that describes the directive and includes a reference to the subroutine that the program executes when it encounters that directive.
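A rough sketch of what such a table could look like follows. It assumes the description sits at index 0 and the subroutine reference at index 1; the handlers, the %state hash, and the run_directive helper are illustrative guesses, not the real code.

```perl
use strict;
use warnings;

my %state;   # hypothetical registers shared by the handlers

# Key: directive name in uppercase.
# Value: [ description, code reference ], so the code is at index 1.
my %HASH = (
    AGENT   => [ 'the user agent string to use',
                 sub { $state{agent} = join ' ', @_ } ],
    TIMEOUT => [ 'give up on a page',
                 sub { $state{timeout} = $_[0] } ],
    URL     => [ 'the current URL to work with',
                 sub { @state{ qw(referer url) } = ( $state{url}, $_[0] ) } ],
);

# Look up a directive (case-insensitively) and run its handler.
sub run_directive {
    my( $directive, @arguments ) = @_;
    my $entry = $HASH{ uc $directive }
        or die "Not a valid directive: [$directive]\n";
    return $entry->[1]->( @arguments );
}

run_directive( 'AGENT', 'Mozilla', '4.5' );
run_directive( 'timeout', 20 );

# The descriptions double as built-in help text.
print "$_ -- $HASH{$_}[0]\n" for sort keys %HASH;
```

Keeping the description next to the code means a help listing can be generated from the same table that drives the interpreter.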
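use Text::ParseWords ();

# $prompt and %HASH are set up earlier in the program
print "Waiting for commands on standard input\n$prompt" unless @ARGV;

while( <> )
    {
    chomp;
    # split ' ' skips leading whitespace; the limit keeps the arguments together
    my( $directive, $string ) = split ' ', $_, 2;
    next unless defined $directive;    # blank line
    $directive = uc $directive;
    my @arguments = defined $string
        ? Text::ParseWords::quotewords( '\s+', 0, $string ) : ();

    unless( exists $HASH{$directive} )
        {
        warn "Not a valid directive: [$directive] at $ARGV line $.\n";
        next;
        }

    # run the handler, and report its errors instead of discarding them
    eval { $HASH{$directive}[1]->( @arguments ); 1 }
        or warn "[$directive] failed at $ARGV line $.: $@";
    }
continue
    {
    print $prompt if $ARGV eq '-';     # reading from standard input
    }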
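The parsing step can be tried on its own. The sketch below wraps the split and quotewords calls in a hypothetical parse_line helper (a name introduced here for illustration) to show how quoting lets a single argument contain spaces.

```perl
use strict;
use warnings;
use Text::ParseWords ();

# Split one script line into an uppercase directive and its arguments.
sub parse_line {
    my( $line ) = @_;
    chomp $line;
    my( $directive, $string ) = split /\s+/, $line, 2;
    # quotewords with a false second argument strips the quotes
    my @arguments = defined $string
        ? Text::ParseWords::quotewords( '\s+', 0, $string )
        : ();
    return ( uc $directive, @arguments );
}

my @parsed = parse_line( qq{param name "a value with spaces"\n} );
print join( '|', @parsed ), "\n";   # prints PARAM|name|a value with spaces
```

A directive with no arguments, such as POST, comes back as a one-element list.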
Since everything the language can do is in a hash, extending the language is easy---just add to the hash. Similarly, I can change what something does by changing the hash.
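As an illustration of both kinds of change, this sketch adds a directive and then replaces the behavior of an existing one. ECHO is a made-up directive, and the stricter TIMEOUT check is an assumption for the example rather than something the real program does.

```perl
use strict;
use warnings;

my %state;
my %HASH = (
    TIMEOUT => [ 'give up on a page', sub { $state{timeout} = $_[0] } ],
);

# Extend the language: a hypothetical ECHO directive.
$HASH{ECHO} = [ 'print the arguments', sub { print "@_\n" } ];

# Change an existing directive: TIMEOUT now insists on a whole number.
$HASH{TIMEOUT}[1] = sub {
    die "TIMEOUT needs a whole number\n" unless $_[0] =~ /\A\d+\z/;
    $state{timeout} = $_[0];
};

$HASH{ECHO}[1]->( 'hello', 'world' );
$HASH{TIMEOUT}[1]->( 20 );
```

The interpreter loop does not change in either case, since it only ever looks up a key and calls whatever code reference it finds there.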
As I was writing this today, I realized that I could abstract this design so that the way the interpreter works and the description of the language would not be coupled. Before I let anyone else see the rest of the code, I need to tear apart the proof-of-concept script I wrote and devise some simple scheme to describe the language. However, I want to do something else tonight---anything else.