Data::AsObject Released - Data Structures Made Easy

pshangov on 2009-08-11T14:36:25

Cross-posted from http://mechanicalrevolution.com.

Perl is notorious for its punctuation-ridden syntax, and nowhere is this more obvious than when working with data structures. While I myself can see the beauty behind the line noise and have nothing against the syntax per se, it sometimes feels like there are just too many characters to type. In particular, I have recently had to do a lot of work with XML data represented as Perl hashes, via XML::TreePP and XML::Compile. Working with the data structures generated by these modules can quickly become pretty painful.

Enter Data::AsObject. It allows you to work with hash and array references as if they were objects. For example, I often have to process XLIFF files, which are used in the translation industry. Using XML::Compile, I can get my XLIFF files parsed into a hash and use it as follows (you don't need to know the details of the XLIFF format to see the point of the example):

# $xliff holds the parsed XML
# get the source language of the first file
my $source_lang = $xliff->{'seq_any'}->[0]->{'file'}->{'source-language'};

# get all the translation units in the first file
my @trans_units = @{ $xliff->{'seq_any'}->[0]->{'file'}->{'body'}->{'cho_group'}->[0]->{'trans-unit'} };

# for each translation unit, add an alternative translation with a source and a target
foreach my $tu (@trans_units) {
    # NOTE: $source is not defined in this snippet; presumably it holds the unit's source element
    my @matches = get_matches($source->textContent);

    my $id = 0;
    foreach my $match (@matches) {
        $tu->{'cho_context-group'}->[$id]->{'alt-trans'}->{'source'}->{'_'} = $match->source;
        $tu->{'cho_context-group'}->[$id]->{'alt-trans'}->{'target'}->{'_'} = $match->target;
        $id++;
    }
}

The same example with Data::AsObject (for this to work, hooks need to be added to XML::Compile to automatically convert “source-language”, “trans-unit” and other elements with dashes to “source_language”, “trans_unit” etc.):

# Data::AsObject::dao converts a hashref or an arrayref to a 
# Data::AsObject::Hash or a Data::AsObject::Array object
dao $xliff;

my $source_lang = $xliff->seq_any(0)->file->source_language;
my @trans_units = $xliff->seq_any(0)->file->body->cho_group(0)->trans_unit;

foreach my $tu (@trans_units) { 
    my @matches = get_matches($source->textContent);

    my $id = 0;     
    foreach my $match (@matches) {
        $tu->cho_context_group($id)->alt_trans->source->{'_'} = $match->source;
        $tu->cho_context_group($id)->alt_trans->target->{'_'} = $match->target;
        $id++;
    }
}
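
If adding hooks to the parser is not convenient, the dash-to-underscore renaming could also be done as a separate pass over the parsed data. The following is only a sketch of that idea: normalize_keys is a hypothetical helper, not part of XML::Compile or Data::AsObject, and it assumes the structure contains only plain hashes, arrays and scalars.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a deep copy of a data structure in which
# every dash in a hash key has been replaced by an underscore.
sub normalize_keys {
    my ($data) = @_;

    if (ref $data eq 'HASH') {
        my %copy;
        for my $key (keys %$data) {
            (my $clean = $key) =~ tr/-/_/;    # 'trans-unit' becomes 'trans_unit'
            $copy{$clean} = normalize_keys($data->{$key});
        }
        return \%copy;
    }

    if (ref $data eq 'ARRAY') {
        return [ map { normalize_keys($_) } @$data ];
    }

    return $data;    # scalars and other references are returned unchanged
}

my $file = normalize_keys({
    'source-language' => 'en',
    'body'            => { 'trans-unit' => [ { 'alt-trans' => {} } ] },
});

print join(', ', sort keys %$file), "\n";    # body, source_language
```

Note that two keys differing only in dash versus underscore (say 'a-b' and 'a_b') would collide after this pass, so it is only safe when the input vocabulary is known.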


The XLIFF example above is almost real-life code, and you can easily see how much more readable Data::AsObject makes it. Of course there are many caveats, the primary one being that you need to be able to control your input and guarantee that hash keys will only contain alphanumeric characters and underscores. Go check out the docs for more usage details.
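
To see where the restriction on hash keys comes from, here is a minimal sketch of the general technique of resolving method calls to hash keys through AUTOLOAD. My::HashAsObject is a made-up class for illustration only, and this is not necessarily how Data::AsObject itself is implemented.

```perl
use strict;
use warnings;

package My::HashAsObject;    # hypothetical class, for illustration only

our $AUTOLOAD;

sub new {
    my ($class, $hashref) = @_;
    return bless $hashref, $class;    # blesses the given hash in place
}

# Called for every method that is not defined in this package
sub AUTOLOAD {
    my ($self, $index) = @_;

    (my $key = $AUTOLOAD) =~ s/.*:://;    # strip the package name
    return if $key eq 'DESTROY';          # object destruction is not a lookup
    die "No such key: $key\n" unless exists $self->{$key};

    my $value = $self->{$key};

    # An optional argument picks an array element, as in ->seq_any(0)
    $value = $value->[$index] if ref $value eq 'ARRAY' && defined $index;

    # Wrap nested plain hashes so that calls can be chained
    return ref $value eq 'HASH' ? My::HashAsObject->new($value) : $value;
}

package main;

my $xliff = My::HashAsObject->new({
    seq_any => [ { file => { source_language => 'en' } } ],
});

print $xliff->seq_any(0)->file->source_language, "\n";    # en
```

Because the key has to be written as a method name, it must be a valid Perl identifier: something like $file->source-language would be parsed as a subtraction rather than as a lookup of 'source-language'. Keys that happen to match real methods (new, can, isa) would be shadowed in a naive approach like this one as well.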


Now THAT kicks ass

xsawyerx on 2009-08-12T08:20:34

Very interesting. I like it!

I prefer XPath

grantm on 2009-08-13T01:17:11

For that type of problem, I find that using XML::LibXML and XPath leads to code with even less syntax.

Re:I prefer XPath

pshangov on 2009-08-20T12:01:18

Sometimes, however, XML::LibXML is not an option. In the example above, I use XML::Compile (which BTW uses XML::LibXML internally), which is currently the best option if you want to work with schema-compliant XML documents (especially if you want to create and modify them). The other such module I often use is XML::TreePP, which is a pure-Perl solution and is available in environments where XML::LibXML isn't.