I've been trying to profile Memcached::libmemcached for a while.
Before I start, I'm not going to claim that I'm a C expert, so take my writings with a grain of salt. Ok?
I've run a few benchmarks on Cache::Memcached::libmemcached/ Memcached::libmemcached against Cache::Memcached::Fast, and have found that get_multi() seems to be consistently slower for Memcached::libmemcached. At this point I'm inclined to think that the bottleneck is within the underlying libmemcached library.
As the rules of optimization says, I wasn't going to do anything before I got a profile output, so I started digging around gprof.
Now, gprof looked much like my beloved Devel::DProf, so I was thinking this would be an easy task.... turns out not.
So the first thing to note: to enable profiling, you must enable -pg on both compilation and linking. in a nutshell, this is what you do:
shell> gcc -pg foo.c shell> ./a.out # executing ./a.out (and a successful call to exit()) # creates a gmon.out file shell> gprof
So first I created foo.c.
#include#include memcached_return fetch(memcached_st *ptr, memcached_result_st *result, void *context) { fprintf(stderr, "%s => %s\n", memcached_result_key_value(result), memcached_result_value(result) ); } int main(int argc, char **argv) { int i; memcached_st *ptr; char *keys[] = { "a", "b", "c", "d", "e", "f" }; size_t key_length[] = { 1, 1, 1, 1, 1, 1 }; uint32_t flags = 0; memcached_return error = 0; memcached_return (*callbacks[])(memcached_st *ptr, memcached_result_st *result, void *context) = { fetch }; ptr = memcached_create(NULL); memcached_server_add(ptr, "localhost", 11211); for(i = 0; i < 6; i++) { memcached_set(ptr, keys[i], key_length[i], "1", 1, (time_t) NULL, (uint32_t) NULL); } for(i = 0; i < 100000; i++) { memcached_mget(ptr, keys, key_length, 6); memcached_fetch_execute(ptr, callbacks, NULL, 1); } memcached_free(ptr); }
k. now, for my needs, I need to enable this on both libmemcached and my sample test file. So this is what I do:
shell> ./configure --enable-pic CFLAGS=-pg LDFLAGS=-pg shell> make shell> make install
and then I compile/link with my foo.c
shell> gcc -pg -lmemcached foo.c
and now gprof:
shell> gprof -b Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls Ts/call Ts/call name 57.14 0.04 0.04 fetch 42.86 0.07 0.03 main 0.00 0.07 0.00 800009 0.00 0.00 data_start Call graph granularity: each sample hit covers 4 byte(s) for 14.29% of 0.07 seconds index % time self children called name[1] 57.1 0.04 0.00 fetch [1] 0.00 0.00 600000/800009 data_start [3] ----------------------------------------------- [2] 42.9 0.03 0.00 main [2] 0.00 0.00 200009/800009 data_start [3] ----------------------------------------------- 0.00 0.00 200009/800009 main [2] 0.00 0.00 600000/800009 fetch [1] [3] 0.0 0.00 0.00 800009 data_start [3] ----------------------------------------------- Index by function name [3] data_start [1] fetch [2] main
Uh, what? where are my calls to memcached_* functions? Naturally, I try googling for gprof with dynamic linking, etc. But this didn't turn up much.
I fought with this for a while (actually, for like 2 months, on and off), and this morning I just had the thought to just link the objects statically. god, why didn't I think of this before?
So I've now moved back to where my .o files are (the "libmemcahced" directory) and moved my foo.c there. this time I do
shell> gcc -pg foo.c *.o shell> ./a.out shell> gprof -b
And guess what, I finally got my gprof!!! aaaaahhh bliss. I can finally start digging. At first flance, it looks like there are too many reads for the 100K mget() with 6 keys I'm doing (the profiling output reads 8.9 million reads). I haven't done any work just yet. will see what I can find.
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 37.02 0.48 0.48 8900049 0.05 0.05 memcached_io_read 17.56 0.71 0.23 700006 0.33 1.24 memcached_response 10.69 0.85 0.14 1400019 0.10 0.12 memcached_io_write 8.40 0.96 0.11 600000 0.18 0.31 value_fetch 4.58 1.02 0.06 100000 0.60 2.50 memcached_mget_by_key 4.58 1.08 0.06 __i686.get_pc_thunk.bx 3.82 1.14 0.05 600000 0.08 0.08 fetch 3.05 1.18 0.04 100000 0.40 9.60 memcached_fetch_execute 2.29 1.21 0.03 600001 0.05 0.05 memcached_string_check 2.29 1.24 0.03 100007 0.30 0.30 io_flush 1.53 1.25 0.02 600006 0.03 0.03 memcached_generate_hash 1.53 1.27 0.02 hash_crc32 1.15 1.29 0.01 600000 0.03 0.03 memcached_result_reset 0.76 1.30 0.01 main 0.38 1.30 0.01 io_wait 0.38 1.31 0.01 memcached_result_set_value 0.00 1.31 0.00 600000 0.00 0.00 memcached_result_value 0.00 1.31 0.00 600000 0.00 0.00 memcached_string_reset 0.00 1.31 0.00 100007 0.00 0.00 memcached_connect 0.00 1.31 0.00 100007 0.00 0.00 network_connect 0.00 1.31 0.00 100000 0.00 2.50 memcached_mget 0.00 1.31 0.00 7 0.00 0.12 memcached_do 0.00 1.31 0.00 6 0.00 1.64 memcached_send 0.00 1.31 0.00 6 0.00 1.64 memcached_set 0.00 1.31 0.00 6 0.00 0.00 storage_op_string 0.00 1.31 0.00 1 0.00 0.00 host_reset 0.00 1.31 0.00 1 0.00 0.05 memcached_create 0.00 1.31 0.00 1 0.00 0.18 memcached_free 0.00 1.31 0.00 1 0.00 0.00 memcached_io_close 0.00 1.31 0.00 1 0.00 0.18 memcached_quit 0.00 1.31 0.00 1 0.00 0.18 memcached_quit_server 0.00 1.31 0.00 1 0.00 0.05 memcached_result_create 0.00 1.31 0.00 1 0.00 0.00 memcached_result_free 0.00 1.31 0.00 1 0.00 0.00 memcached_server_add 0.00 1.31 0.00 1 0.00 0.05 memcached_string_create 0.00 1.31 0.00 1 0.00 0.00 memcached_string_free 0.00 1.31 0.00 1 0.00 0.00 rebalance_wheel 0.00 1.31 0.00 1 0.00 0.00 server_add 0.00 1.31 0.00 1 0.00 0.00 server_list_free 0.00 1.31 0.00 1 0.00 0.00 set_hostinfo 0.00 1.31 0.00 1 0.00 0.00 set_socket_options Call graph granularity: each sample hit covers 4 byte(s) for 0.76% of 1.31 seconds index % time self children called name[1] 93.1 0.01 1.21 main [1] 0.04 0.92 100000/100000 memcached_fetch_execute [2] 0.00 0.25 100000/100000 memcached_mget [5] 0.00 0.00 6/6 memcached_set [19] 0.00 0.00 1/1 memcached_free [21] 0.00 0.00 1/1 memcached_create [24] 0.00 0.00 1/1 memcached_server_add [35] ----------------------------------------------- 0.04 0.92 100000/100000 main [1] [2] 73.3 0.04 0.92 100000 memcached_fetch_execute [2] 0.23 0.64 700000/700006 memcached_response [3] 0.05 0.00 600000/600000 fetch [10] ----------------------------------------------- 0.00 0.00 6/700006 memcached_send [18] 0.23 0.64 700000/700006 memcached_fetch_execute [2] [3] 66.4 0.23 0.64 700006 memcached_response [3] 0.45 0.00 8300048/8900049 memcached_io_read [4] 0.11 0.08 600000/600000 value_fetch [7] ----------------------------------------------- 0.00 0.00 1/8900049 memcached_quit_server [23] 0.03 0.00 600000/8900049 value_fetch [7] 0.45 0.00 8300048/8900049 memcached_response [3] [4] 37.0 0.48 0.00 8900049 memcached_io_read [4] ----------------------------------------------- 0.00 0.25 100000/100000 main [1] [5] 19.1 0.00 0.25 100000 memcached_mget [5] 0.06 0.19 100000/100000 memcached_mget_by_key [6] ----------------------------------------------- 0.06 0.19 100000/100000 memcached_mget [5] [6] 19.1 0.06 0.19 100000 memcached_mget_by_key [6] 0.14 0.03 1400000/1400019 memcached_io_write [8] 0.02 0.00 600000/600006 memcached_generate_hash [13] 0.00 0.00 100000/100007 memcached_connect [29] ----------------------------------------------- 0.11 0.08 600000/600000 memcached_response [3] [7] 14.3 0.11 0.08 600000 value_fetch [7] 0.03 0.00 600000/8900049 memcached_io_read [4] 0.03 0.00 600000/600001 memcached_string_check [11] 0.01 0.00 600000/600000 memcached_result_reset [15] ----------------------------------------------- 0.00 0.00 7/1400019 memcached_do [20] 0.00 0.00 12/1400019 memcached_send [18] 0.14 0.03 1400000/1400019 memcached_mget_by_key [6] [8] 13.0 0.14 0.03 1400019 memcached_io_write [8] 0.03 0.00 100007/100007 io_flush [12] ----------------------------------------------- [9] 4.6 0.06 0.00 __i686.get_pc_thunk.bx [9] ----------------------------------------------- 0.05 0.00 600000/600000 memcached_fetch_execute [2] [10] 3.8 0.05 0.00 600000 fetch [10] 0.00 0.00 600000/600000 memcached_result_value [27] ----------------------------------------------- 0.00 0.00 1/600001 memcached_string_create [26] 0.03 0.00 600000/600001 value_fetch [7] [11] 2.3 0.03 0.00 600001 memcached_string_check [11] ----------------------------------------------- 0.03 0.00 100007/100007 memcached_io_write [8] [12] 2.3 0.03 0.00 100007 io_flush [12] ----------------------------------------------- 0.00 0.00 6/600006 memcached_send [18] 0.02 0.00 600000/600006 memcached_mget_by_key [6] [13] 1.5 0.02 0.00 600006 memcached_generate_hash [13] ----------------------------------------------- [14] 1.5 0.02 0.00 hash_crc32 [14] ----------------------------------------------- 0.01 0.00 600000/600000 value_fetch [7] [15] 1.1 0.01 0.00 600000 memcached_result_reset [15] 0.00 0.00 600000/600000 memcached_string_reset [28] ----------------------------------------------- [16] 0.4 0.01 0.00 io_wait [16] ----------------------------------------------- [17] 0.4 0.01 0.00 memcached_result_set_value [17] ----------------------------------------------- 0.00 0.00 6/6 memcached_set [19] [18] 0.0 0.00 0.00 6 memcached_send [18] 0.00 0.00 6/700006 memcached_response [3] 0.00 0.00 12/1400019 memcached_io_write [8] 0.00 0.00 6/7 memcached_do [20] 0.00 0.00 6/600006 memcached_generate_hash [13] 0.00 0.00 6/6 storage_op_string [31] ----------------------------------------------- 0.00 0.00 6/6 main [1] [19] 0.0 0.00 0.00 6 memcached_set [19] 0.00 0.00 6/6 memcached_send [18] ----------------------------------------------- 0.00 0.00 1/7 memcached_quit_server [23] 0.00 0.00 6/7 memcached_send [18] [20] 0.0 0.00 0.00 7 memcached_do [20] 0.00 0.00 7/1400019 memcached_io_write [8] 0.00 0.00 7/100007 memcached_connect [29] ----------------------------------------------- 0.00 0.00 1/1 main [1] [21] 0.0 0.00 0.00 1 memcached_free [21] 0.00 0.00 1/1 memcached_quit [22] 0.00 0.00 1/1 server_list_free [39] 0.00 0.00 1/1 memcached_result_free [34] ----------------------------------------------- 0.00 0.00 1/1 memcached_free [21] [22] 0.0 0.00 0.00 1 memcached_quit [22] 0.00 0.00 1/1 memcached_quit_server [23] ----------------------------------------------- 0.00 0.00 1/1 memcached_quit [22] [23] 0.0 0.00 0.00 1 memcached_quit_server [23] 0.00 0.00 1/7 memcached_do [20] 0.00 0.00 1/8900049 memcached_io_read [4] 0.00 0.00 1/1 memcached_io_close [33] ----------------------------------------------- 0.00 0.00 1/1 main [1] [24] 0.0 0.00 0.00 1 memcached_create [24] 0.00 0.00 1/1 memcached_result_create [25] ----------------------------------------------- 0.00 0.00 1/1 memcached_create [24] [25] 0.0 0.00 0.00 1 memcached_result_create [25] 0.00 0.00 1/1 memcached_string_create [26] ----------------------------------------------- 0.00 0.00 1/1 memcached_result_create [25] [26] 0.0 0.00 0.00 1 memcached_string_create [26] 0.00 0.00 1/600001 memcached_string_check [11] ----------------------------------------------- 0.00 0.00 600000/600000 fetch [10] [27] 0.0 0.00 0.00 600000 memcached_result_value [27] ----------------------------------------------- 0.00 0.00 600000/600000 memcached_result_reset [15] [28] 0.0 0.00 0.00 600000 memcached_string_reset [28] ----------------------------------------------- 0.00 0.00 7/100007 memcached_do [20] 0.00 0.00 100000/100007 memcached_mget_by_key [6] [29] 0.0 0.00 0.00 100007 memcached_connect [29] 0.00 0.00 100007/100007 network_connect [30] ----------------------------------------------- 0.00 0.00 100007/100007 memcached_connect [29] [30] 0.0 0.00 0.00 100007 network_connect [30] 0.00 0.00 1/1 set_hostinfo [40] 0.00 0.00 1/1 set_socket_options [41] ----------------------------------------------- 0.00 0.00 6/6 memcached_send [18] [31] 0.0 0.00 0.00 6 storage_op_string [31] ----------------------------------------------- 0.00 0.00 1/1 server_add [38] [32] 0.0 0.00 0.00 1 host_reset [32] ----------------------------------------------- 0.00 0.00 1/1 memcached_quit_server [23] [33] 0.0 0.00 0.00 1 memcached_io_close [33] ----------------------------------------------- 0.00 0.00 1/1 memcached_free [21] [34] 0.0 0.00 0.00 1 memcached_result_free [34] 0.00 0.00 1/1 memcached_string_free [36] ----------------------------------------------- 0.00 0.00 1/1 main [1] [35] 0.0 0.00 0.00 1 memcached_server_add [35] 0.00 0.00 1/1 server_add [38] ----------------------------------------------- 0.00 0.00 1/1 memcached_result_free [34] [36] 0.0 0.00 0.00 1 memcached_string_free [36] ----------------------------------------------- 0.00 0.00 1/1 server_add [38] [37] 0.0 0.00 0.00 1 rebalance_wheel [37] ----------------------------------------------- 0.00 0.00 1/1 memcached_server_add [35] [38] 0.0 0.00 0.00 1 server_add [38] 0.00 0.00 1/1 host_reset [32] 0.00 0.00 1/1 rebalance_wheel [37] ----------------------------------------------- 0.00 0.00 1/1 memcached_free [21] [39] 0.0 0.00 0.00 1 server_list_free [39] ----------------------------------------------- 0.00 0.00 1/1 network_connect [30] [40] 0.0 0.00 0.00 1 set_hostinfo [40] ----------------------------------------------- 0.00 0.00 1/1 network_connect [30] [41] 0.0 0.00 0.00 1 set_socket_options [41] ----------------------------------------------- Index by function name [9] __i686.get_pc_thunk.bx [4] memcached_io_read [19] memcached_set [10] fetch [8] memcached_io_write [11] memcached_string_check [14] hash_crc32 [5] memcached_mget [26] memcached_string_create [32] host_reset [6] memcached_mget_by_key [36] memcached_string_free [12] io_flush [22] memcached_quit [28] memcached_string_reset [16] io_wait [23] memcached_quit_server [30] network_connect [1] main [3] memcached_response [37] rebalance_wheel [29] memcached_connect [25] memcached_result_create [38] server_add [24] memcached_create [34] memcached_result_free [39] server_list_free [20] memcached_do [15] memcached_result_reset [40] set_hostinfo [2] memcached_fetch_execute [17] memcached_result_set_value [41] set_socket_options [21] memcached_free [27] memcached_result_value [31] storage_op_string [13] memcached_generate_hash [18] memcached_send [7] value_fetch [33] memcached_io_close [35] memcached_server_add
If you're happy with gprof, that's good. I found it a lot of work. Even though there's a substantial performance hit, Callgrind has been invaluable for me.
Re:Cachegrind
lestrrat on 2008-05-02T06:10:08
Well, I was happy until I tried callgrind upon your comment. I guess I'm not going back to gprof unless I need to. Thanks!