Some time Real Soon Now, I'll be landing the shiny new second-generation Padre Task API (which is used for background tasks and threading control) onto trunk.
A few people have asked me to give a quick high-level overview of how it works, so this will be a quick bottom-to-top pass over how Padre will be doing threading.
The first-generation Padre::Task API was true frontier programming. While some of us knew Wx and others knew Perl's threads a bit, nobody really had any idea how the two interact. Indeed, the proliferation of bugs we found suggests that Padre has really been the first major application to use both at the same time.
Steffen Müller boldly led the Padre team into this wilderness, putting together a solid threading core loosely inspired by the Process API. He nursed a ton of bug reports and tracked upstream fixes in Wx.pm.
Some of the code was a bit ugly in places, and the threads burned a ton of memory, but it worked and worked well. It worked well enough that Andrew Bramble was later able to extend it to add Service support for long running background tasks with bidirectional communication.
But while we've reduced some of the worst of the memory problems with the "slave driver" methodology, the API has been at the end of its natural life for a while now. It's hard to write new background tasks without knowing both Wx and threads, which limited access to only three or four people.
The goals for the new Padre::Task 2.0 are threefold.
First, to allow the creation of the three distinct background jobs we need in Padre, in a way that doesn't abuse either Wx or Perl's threads mechanism. These three background job types are Task (Request, Wait, Response), Stream (Request, Output, ..., Response) and Service (Request, Input, Output, ..., Response).
Second, to allow the implementation to have (or have in the future) the theoretically smallest possible memory consumption beyond the minimum overheads imposed by Perl's threading model.
Third, to allow the creation of Wx + threads tasks without the need to learn either Wx or threads. This should open up background tasks to the main body of Padre contributors, beyond the elites, and spur great improvements in Padre's ability to take full advantage of multi-core developer machines.
A fourth bonus goal is to allow us to migrate Padre's backgrounding implementation to something other than threads in the future, without having to change any of the code in the actual tasks next time around. This should also allow the people that don't like Perl threads and want us to use something else to move their arguments from bikeshedding to actual proof-by-code.
After several months of experimenting, I've gone with a somewhat unusual implementation, but one that is completely workable and meets the criteria.
The core of this implementation is a communications loop that allows messages from the parent thread to a child thread, and back again.
The odd thing about this particular communications loop is that the two halves of the loop are done using utterly different underlying mechanisms.
The parent --> child link is done using Perl threads shared variables, in particular Thread::Queue.
Each Padre thread is created via a Padre::TaskThread parent abstraction, which governs the creation of the real threads.pm thread, but also provides a Thread::Queue "inbox" for each thread. This is inspired by the Erlang micro-threading model for message passing, but is way, way heavier.
In this manner, the top level task manager can keep hold of just the queue object if it wants, feeding messages into the queue to be extracted at some unknown place in another thread it has no control over.
Once spawned, each worker thread immediately goes into a run-loop waiting on messages from its message queue. Messages are simply RPC invocations, with the message name being a method name, and the message payload becoming method parameters.
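As a rough illustration of the down-channel and run-loop described above, here is a minimal sketch using only threads and Thread::Queue. The My::Worker class, its hello and add methods, the 'shutdown' message name and the run_demo helper are all hypothetical names invented for this example, not part of the Padre API. Handing results back on join is just a shortcut so the demo has something to show, since the real up-channel goes through Wx.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

package My::Worker;    # hypothetical worker, not a Padre class

sub new {
    my $class = shift;
    return bless {}, $class;
}

# Two example methods that messages can invoke by name
sub hello {
    my ($self, $name) = @_;
    return "hello $name";
}

sub add {
    my ($self, @numbers) = @_;
    my $sum = 0;
    $sum += $_ for @numbers;
    return $sum;
}

# Run-loop: block on the inbox and treat each message as [ method, @params ]
sub run {
    my ($self, $inbox) = @_;
    my @results;
    while ( defined( my $message = $inbox->dequeue ) ) {
        my ($method, @params) = @$message;
        last if $method eq 'shutdown';
        next unless $self->can($method);    # silently skip unknown messages
        push @results, scalar $self->$method(@params);
    }
    return \@results;
}

package main;

sub run_demo {
    my $inbox  = Thread::Queue->new;
    my $thread = threads->create( sub { My::Worker->new->run($inbox) } );

    # The parent only needs the queue to talk to the worker
    $inbox->enqueue( [ 'hello', 'Padre' ] );
    $inbox->enqueue( [ 'add', 1, 2, 3 ] );
    $inbox->enqueue( ['shutdown'] );

    # Returning data on join is a demo shortcut, not the real up-channel
    return $thread->join;
}
```

Calling run_demo() should hand back the two method results in the order the messages were queued. Note that the parent never touches the worker object itself, only the queue.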
Every thread has the ability to clone itself if passed an empty Padre::TaskThread object to host it. This gets a little weird (you end up passing shared Thread::Queue objects as a payload inside of other shared Thread::Queue objects), but the end result is that you don't have to spawn Perl threads from the parent thread. You can spawn them from child threads and still retain the parent's ability to send messages to them, regardless of where on the thread spawn graph they are actually running.
The end result of this trickery is that we can replicate the slave driver trick from the first-generation API. By spawning off an initial pristine thread very early in the process, when the memory cost is small, we can create new threads later by spawning them off this "master" thread and retain the original low per-thread memory cost.
And while we don't do it yet, we can be even tricksier if we want. If we have a thread that has had to load 5, 10, or 20 meg of extra modules, we don't need to load them again. Instead, we could choose to clone that child directly and have a new thread with all the same task modules pre-loaded for us.
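The "queue inside a queue" trick is easier to see in code. This is only a sketch under my own assumptions: worker_loop, master_loop, spawn_demo and the 'spawn', 'run' and 'shutdown' message names are hypothetical and not taken from Padre. It relies on Thread::Queue passing an already-shared queue object through unchanged when it is enqueued as a payload.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Worker run-loop: remember every message until told to stop
sub worker_loop {
    my ($inbox) = @_;
    my @seen;
    while ( defined( my $message = $inbox->dequeue ) ) {
        my ($name, @params) = @$message;
        last if $name eq 'shutdown';
        push @seen, join( ' ', $name, @params );
    }
    return join( ',', @seen );
}

# Master run-loop: does no work itself, it only spawns workers on request
sub master_loop {
    my ($inbox) = @_;
    my @children;
    while ( defined( my $message = $inbox->dequeue ) ) {
        my ($name, @params) = @$message;
        last if $name eq 'shutdown';
        if ( $name eq 'spawn' ) {
            # The payload is the new worker's inbox, itself a shared queue
            my ($child_inbox) = @params;
            push @children, threads->create( \&worker_loop, $child_inbox );
        }
    }

    # Demo shortcut: gather the workers' return values before exiting
    return join( ';', map { $_->join } @children );
}

sub spawn_demo {
    # Start the master early, while the process is still small
    my $master_inbox = Thread::Queue->new;
    my $master       = threads->create( \&master_loop, $master_inbox );

    # Ask the master to spawn a worker, but keep hold of the worker's inbox
    my $worker_inbox = Thread::Queue->new;
    $master_inbox->enqueue( [ 'spawn', $worker_inbox ] );

    # The parent talks to the worker directly, wherever it was spawned
    $worker_inbox->enqueue( [ 'run', 'Some::Task' ] );
    $worker_inbox->enqueue( ['shutdown'] );
    $master_inbox->enqueue( ['shutdown'] );

    return $master->join;
}
```

The parent never calls threads->create for the worker, yet it can still feed the worker messages, because it kept the inbox it handed to the master.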
The second half of this communications loop is the up-channel, which is done using a totally different Wx mechanism. For communicating messages up to the parent, the Wx documentation recommends the creation of a different message type for each message, and then the use of a thread event.
This care and feeding of Wx turns out to be difficult in practice for non-elites, because you end up registering a ton of different Wx event types, all of which need to be stored in Perl thread-shared variables. And each message needs to be pushed through some target object, and the choice of these can be difficult.
What we do instead is to register a single event type and a single global event "conduit". As each message is received by the conduit, it filters down to just the appropriately enveloped events and passes only those along to the task manager. The task manager removes the destination header and routes the message to the correct task handle in the parent.
Again, messages are done as RPC style calls, with a message type being the method to call, and the payload being the params.
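To make the envelope idea concrete, here is a small pure-Perl sketch of the filtering and routing step, with Wx left out entirely. My::Manager, My::Handle, the register method and the [ id, method, @params ] envelope layout are all assumptions made up for illustration, and the real Padre classes may look quite different.

```perl
use strict;
use warnings;

package My::Handle;    # hypothetical parent-side task handle

sub new {
    my ($class, $id) = @_;
    return bless { id => $id, seen => [] }, $class;
}

# Example message handler, named after the in_parent example in the text
sub in_parent {
    my ($self, $status) = @_;
    push @{ $self->{seen} }, $status;
    return 1;
}

package My::Manager;    # hypothetical stand-in for the task manager

sub new {
    my $class = shift;
    return bless { handles => {} }, $class;
}

sub register {
    my ($self, $handle) = @_;
    $self->{handles}{ $handle->{id} } = $handle;
    return $handle;
}

# Conduit: drop anything that is not an [ id, method, @params ] envelope
sub conduit {
    my ($self, $event) = @_;
    return 0 unless ref($event) eq 'ARRAY';
    return 0 unless @$event >= 2;
    return 0 if grep { !defined($_) || ref($_) } @{$event}[ 0, 1 ];
    return $self->route(@$event);
}

# Manager: strip the destination header and make the RPC-style call
sub route {
    my ($self, $id, $method, @params) = @_;
    my $handle = $self->{handles}{$id} or return 0;
    return 0 unless $handle->can($method);
    $handle->$method(@params);
    return 1;
}

package main;
```

With a handle registered under ID 7, passing [ 7, 'in_parent', 'Some status value' ] to conduit ends up calling in_parent on that handle, while stray events and unknown IDs are quietly dropped.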
This lets you do a reasonably simple form of cross-thread message passing (at least from the child to the parent anyway).

    sub in_child {
        my $self = shift;
        $self->message('in_parent', 'Some status value');
    }

    sub in_parent {
        my $self   = shift;
        my $status = shift;
    }

We've also put a Padre::ThreadHandle layer over the task itself to do eval'ing and other cleanliness work, in the same way we put a Padre::PluginHandle over every plugin object to protect us against them.

    package Some::Task;

    use strict;
    use base 'Padre::Task';

    # Constructor, happens in the parent at an arbitrary time before the job is run.
    sub new {
        my $class = shift;
        my $self  = bless { @_ }, $class;
        return $self;
    }

    # Get ready to serialize, happens in the parent immediately before being sent
    # to the thread for execution.
    # Returns true to continue the execution.
    # Returns false to abort the execution.
    sub prepare {
        return 1;
    }

    # Execute the task, happens in the child thread and is allowed to block.
    # Any output data should be stored in the task.
    sub run {
        my $self = shift;
        require Big::Module;
        $self->{output} = Big::Module::something($self->{input});
        return 1;
    }

    # Tell the application to handle the output, runs in the parent thread after
    # the execution is completed.
    sub finish {
        my $self = shift;
        Padre::Current->main->widget->refresh_response( $self->{output} );
        return 1;
    }

To fire off this task, in the widget which commissioned this background work you just do something like this.

    sub refresh {
        my $self = shift;
        require Some::Task;
        Some::Task->new(
            input => 'some value',
        )->schedule;
    }

    sub refresh_response {
        my $self   = shift;
        my $output = shift;
        $self->wx_thingy->SetValue($output);
    }

As you can see here, at no point do you need to care about threads, or event handling, or that kind of thing. You wrap a task class around the blocking part of your app, have the finish push the answer through to your wx component, and then handle it in the wx component.

    package Some::Task;

    use strict;
    use base 'Padre::Task';

    sub run {
        my $self = shift;
        require Big::Module;
        $self->{output} = Big::Module::something($self->{input});
        return 1;
    }

    1;

And the code in the owner component is just this...

    sub refresh {
        my $self = shift;

        # Ignore any existing in-flight tasks
        $self->task_reset;

        # Kick off the new task
        $self->task_request(
            task  => 'Some::Task',
            input => 'some value',
        );
    }

    sub task_response {
        my $self   = shift;
        my $task   = shift;
        my $output = $task->{output} or return;
        $self->wx_thingy->SetValue($output);
    }

This is not the 100% final API look and feel, but it demonstrates the volume of code that you will need to write to do something in the background in the Task 2.0 API.

    sub refresh {
        my $self = shift;

        # Ignore any existing in-flight tasks
        $self->task_reset;

        # Kick off the new task
        $self->task_request(
            task  => 'Padre::Task::Eval',
            input => 'some value',
            run   => <<'END_TASK',
                my $self = shift;
                require Big::Module;
                $self->{output} = Big::Module::something($self->{input});
                return 1;
    END_TASK
        );
    }

    sub task_response {
        my $self   = shift;
        my $task   = shift;
        my $output = $task->{output} or return;
        $self->wx_thingy->SetValue($output);
    }

This would mean we don't need a task class at all, which could be useful in cases where we want to avoid generating lots of tiny 10 line task classes, or where we want to generate the background code on the fly at run-time.