Get a stack trace from your running perl

jjore on 2009-07-20T05:05:49

When you've got a process that's spinning and don't know what's going on, it's useful to get a perl level stack trace. There's a "simple" gdb function to get your perl to write its call stack to STDERR. It's not entirely without risk to corrupting your process. I've used it just fine against my mod_perl in prod when all our other normal methods of inspection failed.

Step 1: attach with gdb.

jjore$ ps -x
  PID   TT  STAT      TIME COMMAND
...
  765 s001  S      0:00.01 perl -e while(1){sleep 1}

jjore$ gdb --quiet -p 765 Attaching to process 780. Reading symbols for shared libraries . done Reading symbols for shared libraries warning: Could not find object file "..." - no debug information available for "...c". ... ..... done 0x9005e46e in __semwait_signal () (gdb)


Step 2: Get a C level backtrace to see if that's informative enough. In this case, I can see the program is current in some sleep functions. In prod, I've seen things like bugs in libxslt show up nicely here - I could see the stack extending down into the libxml/libxslt libraries which eventually lead me to solve the problem by upgrade my libxml library.

(gdb) bt
#0  0x9005e46e in __semwait_signal ()
#1  0x9005e2ef in nanosleep$UNIX2003 ()
#2  0x900b3e71 in sleep$UNIX2003 ()
#3  0x0015a054 in Perl_pp_sleep ()
#4  0x00081bb7 in Perl_runops_debug ()
#5  0x000b6791 in perl_run ()
#6  0x0000247e in main ()


Step 3: Get your thread context pointer. For threaded perl, you'll need to pass in a thread context pointer. If this step returns 0x0, you don't have threaded perl and shouldn't pass this argument in. If you get something other than 0x0, you'll need to use this value.

(gdb) print $thread_context = (void*)Perl_get_context()
$1 = (void *) 0x0


OR

(gdb) print $thread_context = (void*)Perl_get_context()
$1 = (void *) 0x800000


Step 4: Use Carp::cluck() from inside gdb. If you've got a $thread_context, use it.

(gdb) call (void*)Perl_eval_pv($thread_context, "require Carp;Carp::cluck('Hi')", 0)


OR

(gdb) call (void*)Perl_eval_pv("require Carp; Carp::cluck('Hi')", 0)


Step 5: Profit.

Hi at (eval 1) line 1
	eval 'require Carp; Carp::cluck(\'Hi\')
;' called at -e line 1


sighandler

srezic on 2009-07-22T09:40:15

If I know beforehand that a script can run a long time or something unexpected is likely to happen, then I install a sighandler to give me a stack trace:

        use sigtrap "handler" => sub { require Carp; Carp::carp() }, 'USR1', 'INFO';

That way I can do "kill -USR1 " and see what's going on.

This is especially nice on BSD systems, where SIGINFO exists. SIGINFO is typically bound to CTRL-T, so I don't have to find the pid (if the script runs in the terminal). The default output of CTRL-T provides also some numbers about memory and CPU usage and the current process state.

Impressive!

bart on 2009-07-23T12:59:10

I've said it before, but... wow. You never cease to amaze me.