I've been trying to profile Memcached::libmemcached for a while.
Before I start, I'm not going to claim that I'm a C expert, so take my writings with a grain of salt. Ok?
I've run a few benchmarks on Cache::Memcached::libmemcached/ Memcached::libmemcached against Cache::Memcached::Fast, and have found that get_multi() seems to be consistently slower for Memcached::libmemcached. At this point I'm inclined to think that the bottleneck is within the underlying libmemcached library.
As the rules of optimization says, I wasn't going to do anything before I got a profile output, so I started digging around gprof.
Now, gprof looked much like my beloved Devel::DProf, so I was thinking this would be an easy task.... turns out not.
So the first thing to note: to enable profiling, you must enable -pg on both compilation and linking. in a nutshell, this is what you do:
shell> gcc -pg foo.c shell> ./a.out # executing ./a.out (and a successful call to exit()) # creates a gmon.out file shell> gprof
So first I created foo.c.
#include#include memcached_return fetch(memcached_st *ptr, memcached_result_st *result, void *context) { fprintf(stderr, "%s => %s\n", memcached_result_key_value(result), memcached_result_value(result) ); } int main(int argc, char **argv) { int i; memcached_st *ptr; char *keys[] = { "a", "b", "c", "d", "e", "f" }; size_t key_length[] = { 1, 1, 1, 1, 1, 1 }; uint32_t flags = 0; memcached_return error = 0; memcached_return (*callbacks[])(memcached_st *ptr, memcached_result_st *result, void *context) = { fetch }; ptr = memcached_create(NULL); memcached_server_add(ptr, "localhost", 11211); for(i = 0; i < 6; i++) { memcached_set(ptr, keys[i], key_length[i], "1", 1, (time_t) NULL, (uint32_t) NULL); } for(i = 0; i < 100000; i++) { memcached_mget(ptr, keys, key_length, 6); memcached_fetch_execute(ptr, callbacks, NULL, 1); } memcached_free(ptr); }
k. now, for my needs, I need to enable this on both libmemcached and my sample test file. So this is what I do:
shell> ./configure --enable-pic CFLAGS=-pg LDFLAGS=-pg shell> make shell> make install
and then I compile/link with my foo.c
shell> gcc -pg -lmemcached foo.c
and now gprof:
shell> gprof -b
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
57.14 0.04 0.04 fetch
42.86 0.07 0.03 main
0.00 0.07 0.00 800009 0.00 0.00 data_start
Call graph
granularity: each sample hit covers 4 byte(s) for 14.29% of 0.07 seconds
index % time self children called name
[1] 57.1 0.04 0.00 fetch [1]
0.00 0.00 600000/800009 data_start [3]
-----------------------------------------------
[2] 42.9 0.03 0.00 main [2]
0.00 0.00 200009/800009 data_start [3]
-----------------------------------------------
0.00 0.00 200009/800009 main [2]
0.00 0.00 600000/800009 fetch [1]
[3] 0.0 0.00 0.00 800009 data_start [3]
-----------------------------------------------
Index by function name
[3] data_start [1] fetch [2] main
Uh, what? where are my calls to memcached_* functions? Naturally, I try googling for gprof with dynamic linking, etc. But this didn't turn up much.
I fought with this for a while (actually, for like 2 months, on and off), and this morning I just had the thought to just link the objects statically. god, why didn't I think of this before?
So I've now moved back to where my .o files are (the "libmemcahced" directory) and moved my foo.c there. this time I do
shell> gcc -pg foo.c *.o shell> ./a.out shell> gprof -b
And guess what, I finally got my gprof!!! aaaaahhh bliss. I can finally start digging. At first flance, it looks like there are too many reads for the 100K mget() with 6 keys I'm doing (the profiling output reads 8.9 million reads). I haven't done any work just yet. will see what I can find.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
37.02 0.48 0.48 8900049 0.05 0.05 memcached_io_read
17.56 0.71 0.23 700006 0.33 1.24 memcached_response
10.69 0.85 0.14 1400019 0.10 0.12 memcached_io_write
8.40 0.96 0.11 600000 0.18 0.31 value_fetch
4.58 1.02 0.06 100000 0.60 2.50 memcached_mget_by_key
4.58 1.08 0.06 __i686.get_pc_thunk.bx
3.82 1.14 0.05 600000 0.08 0.08 fetch
3.05 1.18 0.04 100000 0.40 9.60 memcached_fetch_execute
2.29 1.21 0.03 600001 0.05 0.05 memcached_string_check
2.29 1.24 0.03 100007 0.30 0.30 io_flush
1.53 1.25 0.02 600006 0.03 0.03 memcached_generate_hash
1.53 1.27 0.02 hash_crc32
1.15 1.29 0.01 600000 0.03 0.03 memcached_result_reset
0.76 1.30 0.01 main
0.38 1.30 0.01 io_wait
0.38 1.31 0.01 memcached_result_set_value
0.00 1.31 0.00 600000 0.00 0.00 memcached_result_value
0.00 1.31 0.00 600000 0.00 0.00 memcached_string_reset
0.00 1.31 0.00 100007 0.00 0.00 memcached_connect
0.00 1.31 0.00 100007 0.00 0.00 network_connect
0.00 1.31 0.00 100000 0.00 2.50 memcached_mget
0.00 1.31 0.00 7 0.00 0.12 memcached_do
0.00 1.31 0.00 6 0.00 1.64 memcached_send
0.00 1.31 0.00 6 0.00 1.64 memcached_set
0.00 1.31 0.00 6 0.00 0.00 storage_op_string
0.00 1.31 0.00 1 0.00 0.00 host_reset
0.00 1.31 0.00 1 0.00 0.05 memcached_create
0.00 1.31 0.00 1 0.00 0.18 memcached_free
0.00 1.31 0.00 1 0.00 0.00 memcached_io_close
0.00 1.31 0.00 1 0.00 0.18 memcached_quit
0.00 1.31 0.00 1 0.00 0.18 memcached_quit_server
0.00 1.31 0.00 1 0.00 0.05 memcached_result_create
0.00 1.31 0.00 1 0.00 0.00 memcached_result_free
0.00 1.31 0.00 1 0.00 0.00 memcached_server_add
0.00 1.31 0.00 1 0.00 0.05 memcached_string_create
0.00 1.31 0.00 1 0.00 0.00 memcached_string_free
0.00 1.31 0.00 1 0.00 0.00 rebalance_wheel
0.00 1.31 0.00 1 0.00 0.00 server_add
0.00 1.31 0.00 1 0.00 0.00 server_list_free
0.00 1.31 0.00 1 0.00 0.00 set_hostinfo
0.00 1.31 0.00 1 0.00 0.00 set_socket_options
Call graph
granularity: each sample hit covers 4 byte(s) for 0.76% of 1.31 seconds
index % time self children called name
[1] 93.1 0.01 1.21 main [1]
0.04 0.92 100000/100000 memcached_fetch_execute [2]
0.00 0.25 100000/100000 memcached_mget [5]
0.00 0.00 6/6 memcached_set [19]
0.00 0.00 1/1 memcached_free [21]
0.00 0.00 1/1 memcached_create [24]
0.00 0.00 1/1 memcached_server_add [35]
-----------------------------------------------
0.04 0.92 100000/100000 main [1]
[2] 73.3 0.04 0.92 100000 memcached_fetch_execute [2]
0.23 0.64 700000/700006 memcached_response [3]
0.05 0.00 600000/600000 fetch [10]
-----------------------------------------------
0.00 0.00 6/700006 memcached_send [18]
0.23 0.64 700000/700006 memcached_fetch_execute [2]
[3] 66.4 0.23 0.64 700006 memcached_response [3]
0.45 0.00 8300048/8900049 memcached_io_read [4]
0.11 0.08 600000/600000 value_fetch [7]
-----------------------------------------------
0.00 0.00 1/8900049 memcached_quit_server [23]
0.03 0.00 600000/8900049 value_fetch [7]
0.45 0.00 8300048/8900049 memcached_response [3]
[4] 37.0 0.48 0.00 8900049 memcached_io_read [4]
-----------------------------------------------
0.00 0.25 100000/100000 main [1]
[5] 19.1 0.00 0.25 100000 memcached_mget [5]
0.06 0.19 100000/100000 memcached_mget_by_key [6]
-----------------------------------------------
0.06 0.19 100000/100000 memcached_mget [5]
[6] 19.1 0.06 0.19 100000 memcached_mget_by_key [6]
0.14 0.03 1400000/1400019 memcached_io_write [8]
0.02 0.00 600000/600006 memcached_generate_hash [13]
0.00 0.00 100000/100007 memcached_connect [29]
-----------------------------------------------
0.11 0.08 600000/600000 memcached_response [3]
[7] 14.3 0.11 0.08 600000 value_fetch [7]
0.03 0.00 600000/8900049 memcached_io_read [4]
0.03 0.00 600000/600001 memcached_string_check [11]
0.01 0.00 600000/600000 memcached_result_reset [15]
-----------------------------------------------
0.00 0.00 7/1400019 memcached_do [20]
0.00 0.00 12/1400019 memcached_send [18]
0.14 0.03 1400000/1400019 memcached_mget_by_key [6]
[8] 13.0 0.14 0.03 1400019 memcached_io_write [8]
0.03 0.00 100007/100007 io_flush [12]
-----------------------------------------------
[9] 4.6 0.06 0.00 __i686.get_pc_thunk.bx [9]
-----------------------------------------------
0.05 0.00 600000/600000 memcached_fetch_execute [2]
[10] 3.8 0.05 0.00 600000 fetch [10]
0.00 0.00 600000/600000 memcached_result_value [27]
-----------------------------------------------
0.00 0.00 1/600001 memcached_string_create [26]
0.03 0.00 600000/600001 value_fetch [7]
[11] 2.3 0.03 0.00 600001 memcached_string_check [11]
-----------------------------------------------
0.03 0.00 100007/100007 memcached_io_write [8]
[12] 2.3 0.03 0.00 100007 io_flush [12]
-----------------------------------------------
0.00 0.00 6/600006 memcached_send [18]
0.02 0.00 600000/600006 memcached_mget_by_key [6]
[13] 1.5 0.02 0.00 600006 memcached_generate_hash [13]
-----------------------------------------------
[14] 1.5 0.02 0.00 hash_crc32 [14]
-----------------------------------------------
0.01 0.00 600000/600000 value_fetch [7]
[15] 1.1 0.01 0.00 600000 memcached_result_reset [15]
0.00 0.00 600000/600000 memcached_string_reset [28]
-----------------------------------------------
[16] 0.4 0.01 0.00 io_wait [16]
-----------------------------------------------
[17] 0.4 0.01 0.00 memcached_result_set_value [17]
-----------------------------------------------
0.00 0.00 6/6 memcached_set [19]
[18] 0.0 0.00 0.00 6 memcached_send [18]
0.00 0.00 6/700006 memcached_response [3]
0.00 0.00 12/1400019 memcached_io_write [8]
0.00 0.00 6/7 memcached_do [20]
0.00 0.00 6/600006 memcached_generate_hash [13]
0.00 0.00 6/6 storage_op_string [31]
-----------------------------------------------
0.00 0.00 6/6 main [1]
[19] 0.0 0.00 0.00 6 memcached_set [19]
0.00 0.00 6/6 memcached_send [18]
-----------------------------------------------
0.00 0.00 1/7 memcached_quit_server [23]
0.00 0.00 6/7 memcached_send [18]
[20] 0.0 0.00 0.00 7 memcached_do [20]
0.00 0.00 7/1400019 memcached_io_write [8]
0.00 0.00 7/100007 memcached_connect [29]
-----------------------------------------------
0.00 0.00 1/1 main [1]
[21] 0.0 0.00 0.00 1 memcached_free [21]
0.00 0.00 1/1 memcached_quit [22]
0.00 0.00 1/1 server_list_free [39]
0.00 0.00 1/1 memcached_result_free [34]
-----------------------------------------------
0.00 0.00 1/1 memcached_free [21]
[22] 0.0 0.00 0.00 1 memcached_quit [22]
0.00 0.00 1/1 memcached_quit_server [23]
-----------------------------------------------
0.00 0.00 1/1 memcached_quit [22]
[23] 0.0 0.00 0.00 1 memcached_quit_server [23]
0.00 0.00 1/7 memcached_do [20]
0.00 0.00 1/8900049 memcached_io_read [4]
0.00 0.00 1/1 memcached_io_close [33]
-----------------------------------------------
0.00 0.00 1/1 main [1]
[24] 0.0 0.00 0.00 1 memcached_create [24]
0.00 0.00 1/1 memcached_result_create [25]
-----------------------------------------------
0.00 0.00 1/1 memcached_create [24]
[25] 0.0 0.00 0.00 1 memcached_result_create [25]
0.00 0.00 1/1 memcached_string_create [26]
-----------------------------------------------
0.00 0.00 1/1 memcached_result_create [25]
[26] 0.0 0.00 0.00 1 memcached_string_create [26]
0.00 0.00 1/600001 memcached_string_check [11]
-----------------------------------------------
0.00 0.00 600000/600000 fetch [10]
[27] 0.0 0.00 0.00 600000 memcached_result_value [27]
-----------------------------------------------
0.00 0.00 600000/600000 memcached_result_reset [15]
[28] 0.0 0.00 0.00 600000 memcached_string_reset [28]
-----------------------------------------------
0.00 0.00 7/100007 memcached_do [20]
0.00 0.00 100000/100007 memcached_mget_by_key [6]
[29] 0.0 0.00 0.00 100007 memcached_connect [29]
0.00 0.00 100007/100007 network_connect [30]
-----------------------------------------------
0.00 0.00 100007/100007 memcached_connect [29]
[30] 0.0 0.00 0.00 100007 network_connect [30]
0.00 0.00 1/1 set_hostinfo [40]
0.00 0.00 1/1 set_socket_options [41]
-----------------------------------------------
0.00 0.00 6/6 memcached_send [18]
[31] 0.0 0.00 0.00 6 storage_op_string [31]
-----------------------------------------------
0.00 0.00 1/1 server_add [38]
[32] 0.0 0.00 0.00 1 host_reset [32]
-----------------------------------------------
0.00 0.00 1/1 memcached_quit_server [23]
[33] 0.0 0.00 0.00 1 memcached_io_close [33]
-----------------------------------------------
0.00 0.00 1/1 memcached_free [21]
[34] 0.0 0.00 0.00 1 memcached_result_free [34]
0.00 0.00 1/1 memcached_string_free [36]
-----------------------------------------------
0.00 0.00 1/1 main [1]
[35] 0.0 0.00 0.00 1 memcached_server_add [35]
0.00 0.00 1/1 server_add [38]
-----------------------------------------------
0.00 0.00 1/1 memcached_result_free [34]
[36] 0.0 0.00 0.00 1 memcached_string_free [36]
-----------------------------------------------
0.00 0.00 1/1 server_add [38]
[37] 0.0 0.00 0.00 1 rebalance_wheel [37]
-----------------------------------------------
0.00 0.00 1/1 memcached_server_add [35]
[38] 0.0 0.00 0.00 1 server_add [38]
0.00 0.00 1/1 host_reset [32]
0.00 0.00 1/1 rebalance_wheel [37]
-----------------------------------------------
0.00 0.00 1/1 memcached_free [21]
[39] 0.0 0.00 0.00 1 server_list_free [39]
-----------------------------------------------
0.00 0.00 1/1 network_connect [30]
[40] 0.0 0.00 0.00 1 set_hostinfo [40]
-----------------------------------------------
0.00 0.00 1/1 network_connect [30]
[41] 0.0 0.00 0.00 1 set_socket_options [41]
-----------------------------------------------
Index by function name
[9] __i686.get_pc_thunk.bx [4] memcached_io_read [19] memcached_set
[10] fetch [8] memcached_io_write [11] memcached_string_check
[14] hash_crc32 [5] memcached_mget [26] memcached_string_create
[32] host_reset [6] memcached_mget_by_key [36] memcached_string_free
[12] io_flush [22] memcached_quit [28] memcached_string_reset
[16] io_wait [23] memcached_quit_server [30] network_connect
[1] main [3] memcached_response [37] rebalance_wheel
[29] memcached_connect [25] memcached_result_create [38] server_add
[24] memcached_create [34] memcached_result_free [39] server_list_free
[20] memcached_do [15] memcached_result_reset [40] set_hostinfo
[2] memcached_fetch_execute [17] memcached_result_set_value [41] set_socket_options
[21] memcached_free [27] memcached_result_value [31] storage_op_string
[13] memcached_generate_hash [18] memcached_send [7] value_fetch
[33] memcached_io_close [35] memcached_server_add
If you're happy with gprof, that's good. I found it a lot of work. Even though there's a substantial performance hit, Callgrind has been invaluable for me.
Re:Cachegrind
lestrrat on 2008-05-02T06:10:08
Well, I was happy until I tried callgrind upon your comment. I guess I'm not going back to gprof unless I need to. Thanks!