Padre::Task 2.0 - Making Wx + Perl threading suck faster

Alias on 2010-06-03T03:37:29

Some time Real Soon Now, I'll be landing the shiny new second-generation Padre Task API (which is used for background tasks and threading control) onto trunk.

A few people have asked me to give a quick high level overview on how it works, so this will be a quick bottom to top pass over how Padre will be doing threading.

The first-generation Padre::Task API was true frontier programming. While some of us knew Wx and others knew Perl's threads a bit, nobody really had any idea how the two interact. Indeed, the proliferation of bugs we found suggests that Padre has really been the first major application to use both at the same time.

Steffen Müller boldly led the Padre team into this wilderness, putting together a solid threading core loosely inspired by the Process API. He nursed a ton of bug reports and tracked upstream fixes in Wx.pm.

Some of the code was a bit ugly in places, and the threads burned a ton of memory, but it worked and worked well. It worked well enough that Andrew Bramble was later able to extend it to add Service support for long running background tasks with bidirectional communication.

But while we've reduced some of the worst of the memory problems with the "slave driver" methodology, the API has been at the end of its natural life for a while now. It's hard to write new background tasks without knowing both Wx and threads, which limited access to only three or four people.

The goals for the new Padre::Task 2.0 are threefold.

First, to allow the creation of the three distinct background jobs we need in Padre, in a way that doesn't abuse either Wx or Perl's threads mechanism. These three background job types are Task (Request, Wait, Response), Stream (Request, Output, ..., Response) and Service (Request, Input, Output, ..., Response).

Second, to allow the implementation to have (or have in the future) the theoretically smallest possible memory consumption beyond the minimum overheads imposed by Perl's threading model.

Third, to allow the creation of Wx + threads tasks without the need to learn either Wx or threads. This should open up background tasks to the main body of Padre contributors, beyond the elites, and spur great improvements in Padre's ability to take full advantage of multi-core developer machines.

A fourth bonus goal is to allow us to migrate Padre's backgrounding implementation to something other than threads in the future, without having to change any of the code in the actual tasks next time around. This should also allow the people that don't like Perl threads and want us to use something else to move their arguments from bikeshedding to actual proof-by-code.

After several months of experimenting, I've gone with a somewhat unusual implementation, but one that is completely workable and meets the criteria.

The core of this implementation is a communications loop that allows messages from the parent thread to a child thread, and back again.

The odd thing about this particular communications loop is that the two halves of the loop are done using utterly different underlying mechanisms.

The parent --> child link is done using Perl threads shared variables, in particular Thread::Queue.

Each Padre thread is created via a Padre::TaskThread parent abstraction, which governs the creation of the real threads.pm thread, but also provides a Thread::Queue "inbox" for each thread. This is inspired by the Erlang micro-threading model for message passing, but is way way heavier.

In this manner, the top level task manager can keep hold of just the queue object if it wants, feeding messages into the queue to be extracted at some unknown place in another thread it has no control over.

Once spawned, each worker thread immediately goes into a run-loop waiting on messages from its message queue. Messages are simply RPC invocations, with the message name being a method name, and the message payload becoming method parameters.
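
As a rough illustration of that run-loop (a minimal sketch, not the actual Padre::TaskThread code), the following uses only the core threads and Thread::Queue modules. The Demo::Worker class and its note and stop methods are hypothetical names invented for the example.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

package Demo::Worker;    # hypothetical class, for illustration only

sub new {
    my $class = shift;
    return bless {
        inbox   => Thread::Queue->new,    # shared, visible to every thread
        running => 1,
        seen    => [],
    }, $class;
}

# Child side: block on the inbox and treat each message as a method call
sub run_loop {
    my $self = shift;
    while ( $self->{running} ) {
        my ( $method, @params ) = @{ $self->{inbox}->dequeue };
        my $code = $self->can($method) or next;    # ignore unknown messages
        $self->$code(@params);
    }
    return join ',', @{ $self->{seen} };
}

sub note { my $self = shift; push @{ $self->{seen} }, @_; return }
sub stop { $_[0]->{running} = 0; return }

package main;

my $worker = Demo::Worker->new;
my $thread = threads->create( sub { $worker->run_loop } );

# Parent side: all we need to keep hold of is the queue
$worker->{inbox}->enqueue( [ note => 'first' ] );
$worker->{inbox}->enqueue( [ note => 'second' ] );
$worker->{inbox}->enqueue( ['stop'] );

my $seen = $thread->join;
print "Worker saw: $seen\n";    # prints "Worker saw: first,second"
```

Only the queue is shared here. The rest of the worker object is copied into the child when the thread is created, so the child's running flag and seen list are private to it.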

Every thread has the ability to clone itself if passed an empty Padre::TaskThread object to host it. This gets a little weird (you end up passing shared Thread::Queue objects as a payload inside of other shared Thread::Queue objects), but the end result is that you don't have to spawn Perl threads from the parent thread. You can spawn them from child threads and still retain the parent's ability to send messages to them, regardless of where on the thread spawn graph they are actually running.

The end result of this trickery is that we can replicate the slave driver trick from the first-generation API. By spawning off an initial pristine thread very early in the process, when the memory cost is small, we can create new threads later by spawning them off this "master" thread and retain the original low per-thread memory cost.

And while we don't do it yet, we can be even tricksier if we want. If we have a thread that has had to load 5, 10, or 20 meg of extra modules, we don't need to load them again. Instead, we could choose to clone that child directly and have a new thread with all the same task modules pre-loaded for us.
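
The queue-inside-a-queue trick can be sketched in a few lines. This is an assumption-laden toy rather than the Padre implementation: worker_loop, master_loop, the spawn and stop message names, and the $outbox reply queue are all made up for the example (the real up-channel goes through Wx, not a queue).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Generic worker: handle messages until told to stop
sub worker_loop {
    my ( $inbox, $outbox ) = @_;
    while (1) {
        my ( $name, @params ) = @{ $inbox->dequeue };
        last if $name eq 'stop';
        $outbox->enqueue( 'worker ' . threads->tid . " handled $name: @params" );
    }
    return;
}

# Master: spawned early while the process is small, spawns workers on request
sub master_loop {
    my ( $inbox, $outbox ) = @_;
    my @children;
    while (1) {
        my ( $name, @params ) = @{ $inbox->dequeue };
        last if $name eq 'stop';
        next unless $name eq 'spawn';
        my ($child_inbox) = @params;    # a Thread::Queue passed through a Thread::Queue
        push @children, threads->create( \&worker_loop, $child_inbox, $outbox );
    }
    $_->join for @children;
    return;
}

my $outbox       = Thread::Queue->new;    # stand-in for the up-channel
my $master_inbox = Thread::Queue->new;
my $master       = threads->create( \&master_loop, $master_inbox, $outbox );

# Later: ask the master to spawn a worker, but keep the worker's inbox ourselves
my $inbox = Thread::Queue->new;
$master_inbox->enqueue( [ spawn => $inbox ] );
$inbox->enqueue( [ compile => 'Foo.pm' ] );

# Shut everything down
$inbox->enqueue(        ['stop'] );
$master_inbox->enqueue( ['stop'] );
$master->join;

my $reply = $outbox->dequeue;
print "$reply\n";
```

The worker is created by the master thread, so it starts life as a copy of the master's interpreter rather than the parent's, yet the parent can still talk to it directly through $inbox.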

The second half of this communications loop is the up-channel, which is done using a totally different Wx mechanism. For communicating messages up to the parent, the Wx documentation recommends the creation of a different message type for each message, and then the use of a thread event.

This care and feeding of Wx turns out to be difficult in practice for non-elites, because you end up registering a ton of different Wx event types, all of which need to be stored in Perl thread-shared variables. And each message needs to be pushed through some target object, and the choice of these can be difficult.

What we do instead is register a single event type and a single global event "conduit". As each message is received by the conduit, it filters down to just the appropriately enveloped events and passes only those along to the task manager. The task manager removes the destination header and routes the message to the correct task handle in the parent.

Again, messages are done as RPC style calls, with a message type being the method to call, and the payload being the params.

This lets you do a reasonably simple form of cross-thread message passing (at least from the child to the parent anyway).

sub in_child {
    my $self = shift;
    $self->message('in_parent', 'Some status value');
}

sub in_parent {
    my $self   = shift;
    my $status = shift;
}
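
On the parent side, the filter-and-route step might look roughly like the sketch below. It deliberately leaves out the Wx event delivery itself and only shows what happens once an envelope arrives. Demo::Manager, Demo::Handle, on_signal and the [ id, method, params ] envelope layout are assumptions made for the example.

```perl
use strict;
use warnings;

package Demo::Handle;    # hypothetical parent-side handle for one task

sub new {
    my ( $class, $id ) = @_;
    return bless { id => $id, status => undef }, $class;
}

sub in_parent {
    my ( $self, $status ) = @_;
    $self->{status} = $status;
    return;
}

package Demo::Manager;    # hypothetical task manager

sub new { return bless { handles => {} }, shift }

sub add {
    my ( $self, $handle ) = @_;
    $self->{handles}{ $handle->{id} } = $handle;
    return $handle;
}

# Called by the conduit for every event that arrives on the single event type
sub on_signal {
    my ( $self, $envelope ) = @_;

    # Filter: only accept properly enveloped messages
    return 0 unless ref $envelope eq 'ARRAY' and @$envelope >= 2;

    # Strip the destination header, then route by handle id and method name
    my ( $id, $method, @params ) = @$envelope;
    my $handle = $self->{handles}{$id} or return 0;
    my $code   = $handle->can($method) or return 0;
    $handle->$code(@params);
    return 1;
}

package main;

my $manager = Demo::Manager->new;
my $handle  = $manager->add( Demo::Handle->new(42) );

my $routed  = $manager->on_signal( [ 42, 'in_parent', 'Some status value' ] );
my $dropped = $manager->on_signal('random noise');

print "routed=$routed dropped=$dropped status=$handle->{status}\n";
# prints "routed=1 dropped=0 status=Some status value"
```

Because everything funnels through one callback, only one event type and one target object ever have to be registered with Wx.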

We've also put a Padre::ThreadHandle layer over the task itself to do eval'ing and other cleanliness work, in the same way we put a Padre::PluginHandle over every plugin object to protect us against them.

This handle adds some extra value as well, automatically notifying the task manager when a task has started and stopped, and generally keeping everyone sane.

Within this communications loop live the actual tasks.

The Task API forces work into a very rigid structure. The rigid rules on behaviour are the cost of allowing all the threading magic to happen without the task having to know how it happens.

The basic API looks something like this.

package Some::Task;

use strict;
use base 'Padre::Task';

# Constructor, happens in the parent at an arbitrary time before the job is run.
sub new {
    my $class = shift;
    my $self  = bless { @_ }, $class;
    return $self;
}

# Get ready to serialize, happens in the parent immediately before being sent
# to the thread for execution.
# Returns true to continue the execution.
# Returns false to abort the execution.
sub prepare {
    return 1;
}

# Execute the task, happens in the child thread and is allowed to block.
# Any output data should be stored in the task.
sub run {
    my $self = shift;
    require Big::Module;
    $self->{output} = Big::Module::something($self->{input});
    return 1;
}

# Tell the application to handle the output, runs in the parent thread after
# the execution is completed.
sub finish {
    my $self = shift;
    Padre::Current->main->widget->refresh_response( $self->{output} );
    return 1;
}

1;

To fire off this task, in the widget which commissioned this background work you just do something like this.

sub refresh {
    my $self = shift;

    require Some::Task;
    Some::Task->new(
        input => 'some value',
    )->schedule;
}

sub refresh_response {
    my $self   = shift;
    my $output = shift;

    $self->wx_thingy->SetValue($output);
}

As you can see here, at no point do you need to care about threads, or event handling, or that kind of thing. You wrap a task class around the blocking part of your app, have the finish push the answer through to your wx component, and then handle it in the wx component.
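
To make the order of the calls concrete, here is a single-threaded sketch of the lifecycle. It assumes the task object is serialized on its way to the worker and back again, and it uses Storable as the serializer purely as an assumption for the example. Demo::Task and run_task are hypothetical names.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

package Demo::Task;    # hypothetical task following the four-method shape

sub new     { my $class = shift; return bless {@_}, $class }
sub prepare { return 1 }

sub run {
    my $self = shift;
    $self->{output} = length $self->{input};
    return 1;
}

sub finish {
    my $self = shift;
    print "Output: $self->{output}\n";
    return 1;
}

package main;

# Walk one task through its lifecycle, copying it at each thread boundary
sub run_task {
    my $task = shift;
    $task->prepare or return;                # parent: last chance to abort
    my $in_child = thaw( freeze($task) );    # parent -> child copy
    $in_child->run;                          # child: allowed to block
    my $done = thaw( freeze($in_child) );    # child -> parent copy
    $done->finish;                           # parent: hand over the result
    return $done;
}

my $task = Demo::Task->new( input => 'some value' );
my $done = run_task($task);    # prints "Output: 10"
```

Under that assumption, the object that finish sees is a copy of the one that ran in the child, which is why any output has to be stored in the task itself rather than in a global or a closure.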

Of course, this is still a little fidgety. So I'm currently in the process of adding a secondary layer for the common case where the task is created primarily for the purpose of a single parent owner.

The new API layer simplifies things even more. The task takes an "owner" param that represents the Wx component that commissioned it. It does some weaken/refaddr based indexing magic to map the task to the owner object, and then makes sure there is a default finish method to route the answer automatically back to the owner.

The neat part about this is that it takes care of synchronisation problems automatically. If the task finish is called at a time when the Wx component has been destroyed, the answer is automatically dropped and ignored.

The owner can also explicitly declare that the answers from any tasks currently in flight are no longer relevant, and those will be dropped as well.
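
A minimal sketch of how that weaken/refaddr indexing and answer dropping could work is shown below. It is not the Padre code: the %OWNER and %REVISION registries, the revision counter and the task_finish function are assumptions, and the task is reduced to a plain hash.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr weaken);

my %OWNER;       # refaddr => weak reference to the owner object
my %REVISION;    # refaddr => revision counter, bumped by task_reset

# Create a task record tagged with the owner's address and current revision
sub task_request {
    my ( $owner, %param ) = @_;
    my $id = refaddr($owner);
    $OWNER{$id} = $owner;
    weaken( $OWNER{$id} );    # the registry must not keep the owner alive
    $REVISION{$id} ||= 1;
    return { %param, owner => $id, revision => $REVISION{$id} };
}

# Declare all tasks currently in flight for this owner irrelevant
sub task_reset {
    my $owner = shift;
    $REVISION{ refaddr($owner) }++;
    return;
}

# Default finish: deliver to the owner, or silently drop the answer
sub task_finish {
    my $task  = shift;
    my $owner = $OWNER{ $task->{owner} } or return 0;    # owner destroyed
    return 0 unless $task->{revision} == $REVISION{ $task->{owner} };
    $owner->task_response($task);
    return 1;
}

package Demo::Panel;    # hypothetical stand-in for a Wx component

sub new { return bless { value => undef }, shift }

sub task_response {
    my ( $self, $task ) = @_;
    $self->{value} = $task->{output};
    return;
}

package main;

my $panel = Demo::Panel->new;
my $stale = task_request( $panel, task => 'Some::Task' );
task_reset($panel);
my $fresh = task_request( $panel, task => 'Some::Task' );
$_->{output} = "result for revision $_->{revision}" for $stale, $fresh;

my @delivered = map { task_finish($_) } $stale, $fresh;    # (0, 1)
my $value     = $panel->{value};    # "result for revision 2"

undef $panel;                       # the component goes away
my $late = task_finish($fresh);     # 0, the answer is dropped
```

The weak reference is what lets a destroyed component disappear from the registry on its own, without the task system keeping it alive.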

With the new helper code, your task is shrunk to the following.

package Some::Task;

use strict;
use base 'Padre::Task';

sub run {
    my $self = shift;
    require Big::Module;
    $self->{output} = Big::Module::something($self->{input});
    return 1;
}

1;

And the code in the owner component is just this...

sub refresh {
    my $self = shift;

    # Ignore any existing in-flight tasks
    $self->task_reset;

    # Kick off the new task
    $self->task_request(
        task  => 'Some::Task',
        input => 'some value',
    );
}

sub task_response {
    my $self   = shift;
    my $task   = shift;
    my $output = $task->{output} or return;
    $self->wx_thingy->SetValue($output);
}

This is not the 100% final API look and feel, but it demonstrates the volume of code that you will need to write to do something in the background in the Task 2.0 API.

There are also some interesting opportunities to make it smaller again for very simple cases, by using the generic eval task to inline the execution code.

sub refresh {
    my $self = shift;

    # Ignore any existing in-flight tasks
    $self->task_reset;

    # Kick off the new task
    $self->task_request(
        task  => 'Padre::Task::Eval',
        input => 'some value',
        run   => <<'END_TASK',
            my $self = shift;
            require Big::Module;
            $self->{output} = Big::Module::something($self->{input});
            return 1;
END_TASK
    );
}

sub task_response {
    my $self   = shift;
    my $task   = shift;
    my $output = $task->{output} or return;
    $self->wx_thingy->SetValue($output);
}

This would mean we don't need a task class at all, which could be useful in cases where we want to avoid generating lots of tiny 10 line task classes, or where we want to generate the background code on the fly at run-time.
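
For what it's worth, a generic eval task needs very little code. The sketch below is a guess at the shape of such a class, not the real Padre::Task::Eval. Demo::Task::Eval is a hypothetical name, and the task is run directly rather than through a worker thread.

```perl
use strict;
use warnings;

package Demo::Task::Eval;    # hypothetical, for illustration only

sub new {
    my $class = shift;
    return bless {@_}, $class;
}

# Compile the source held in the "run" parameter into an anonymous sub,
# then call it with the task as its first argument.
sub run {
    my $self = shift;
    my $code = eval "sub {\n$self->{run}\n}";
    die "Failed to compile task code: $@" unless $code;
    return $code->($self);
}

package main;

my $task = Demo::Task::Eval->new(
    input => 'some value',
    run   => <<'END_TASK',
        my $self = shift;
        $self->{output} = uc $self->{input};
        return 1;
END_TASK
);

$task->run;
print "$task->{output}\n";    # prints "SOME VALUE"
```

Since the source travels as an ordinary string parameter, it can be serialized along with the rest of the task, which is what makes generating it at run-time possible.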

The downside would be that because the work is bunched inside a single task class, we lose the chance to do worker thread specialisation (where the workers track which tasks they have loaded into memory, so further tasks of the same type can be assigned to them in preference to other threads).
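
To show what worker thread specialisation means in practice, here is a toy scheduler. The worker records, the best_worker function and the loaded/busy fields are all hypothetical and only illustrate the selection rule.

```perl
use strict;
use warnings;

# Pick an idle worker, preferring one that has already loaded the task class
sub best_worker {
    my ( $task_class, @workers ) = @_;
    my @idle = grep { !$_->{busy} } @workers;
    return unless @idle;

    my ($warm) = grep { $_->{loaded}{$task_class} } @idle;
    my $chosen = $warm || $idle[0];

    # Remember what this worker now has in memory
    $chosen->{loaded}{$task_class} = 1;
    $chosen->{busy} = 1;
    return $chosen;
}

my @workers = (
    { id => 1, busy => 0, loaded => {} },
    { id => 2, busy => 0, loaded => { 'Some::Task' => 1 } },
);

my $first  = best_worker( 'Some::Task',  @workers );    # worker 2, already warm
my $second = best_worker( 'Other::Task', @workers );    # worker 1, the only idle one
print "Some::Task -> $first->{id}, Other::Task -> $second->{id}\n";
# prints "Some::Task -> 2, Other::Task -> 1"
```

If every job arrives as the same generic eval class, the task class tells the scheduler nothing about which modules a worker has loaded, so this preference has nothing to work with.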

Perhaps we support all the different ways, so that we can pick and choose which option is best on a case by case basis and change them over time. That would certainly fit the Padre rule of "something is better than nothing, don't worry too much about having to break it later".

The downside of this approach, of course, is the breakage. The new Task API will basically kill any plugin currently doing its own background tasks.

The upside, of course, is that once they are fixed these plugins can do those background tasks much more efficiently than before.

The new Task 2.0 API will land in Padre 0.65, which should be out in about 2 weeks.