Test::POE::Stopping is finally completed

Alias on 2009-10-08T02:18:14

One of the things about POE that annoys me the most is it's shutdown process. In POE, a POE::Session remains running unless all resources are removed and all events related to it in any way are completed, and the POE kernel won't stop unless all sessions have stopped.

I understand why this is necesary, but it still becomes a fairly large source of bugs. In a large complex application, any tiny resource bug prevents shutdown unless forced with a raw kernel ->stop call. POE applications have a tendency to basically hang while the kernel waits on some "leaked" event handlers that will never fire.

In test scripts, which fail all the time during development, this is a particularly annoying problem.

Over the last year or two, as I've dabbled with POE, I've been building up a solution to this problem (for testing).

Test::POE::Stopping provides a single function to your test script, poe_stopping().

This function expects to be called as the final line in the final event of the final session. It examines your POE kernel for the many different things that could prevent elegant kernel shutdown.

If poe_stopping() is happy that nothing other than the current event is preventing the kernel stopping it passes the test and continues, allowing the kernel to stop naturally.

If poe_stopping() finds evidence of other activity that might prevent the kernel shutting down it will fail the test, hard-stop the kernel, and then print out a big diagnostics dump of the properties of the remaining sessions (and why they might be kept alive).

Currently this dump is just a YAML serialisation of the internal tracking tracking structures, and looks like the following.

not ok 16 - POE appears to be stopping cleanly

# Failed test 'POE appears to be stopping cleanly' # at t/10_server.t line 112. # --- # alias: 0 # children: 0 # current: 0 # extra: 0 # handles: 0 # id: 5 # queue: # distinct: 0 # from: 0 # to: 0 # refs: 0 # signals: 0 # --- # alias: 0 # children: 1 # current: 1 # extra: 0 # handles: 0 # id: 2 # queue: # distinct: 0 # from: 0 # to: 0 # refs: 1 # signals: 0 # --- # alias: 0 # children: 1 # current: 0 # extra: 0 # handles: 0 # id: 3 # queue: # distinct: 0 # from: 0 # to: 0 # refs: 1 # signals: 0


It's a little cryptic still (I plan to improve the diagnostics formatting in future releases) but looking at the list of sessions the first session is in an unusual state with nothing apparently preventing destruction (so either I'm doing something wrong with POE or there's a bug), and the other two sessions both have child sessions holding them alive (which is something I know for my particular program should not be happening).

So this clues me in that my session parent/child relationships are broken, and more importantly that the problem is NOT to do with aliases, extra refs, handles, events, alarms or signals.

This kind of knowledge is critically important once you start attempting to test large application stacks with perhaps a dozen sessions. When the whole ball of wax fails to shut down, you need to able to relatively easily work out what you aren't doing.