Linux Kernel "perf" tools
For a long while I have used the oprofile tick-counting profiler to understand where the bottlenecks really are in production code. It works pretty well, although it is sometimes a little cumbersome to use.
Recently I have switched to the perf tools, which are now developed as part of the Linux kernel source tree. To get going you can simply type perf top (you may need to run it with sudo) and you immediately get a live listing of the top 50 or so functions that are consuming processor cycles. There are many more commands and options; perf help lists them, and perf help followed by a command name shows the documentation for that command. I hope to write about them more in the near-ish future.
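If you would rather script that first step than type it each time, something like the following rough Python sketch might do. It only wraps the plain perf top invocation mentioned above and prefixes sudo when you are not already root. The helper names perf_command and run_perf_top are my own inventions for illustration and are not part of perf, and using sudo is an assumption about how your machine grants the extra privileges.

```python
import os
import shutil
import subprocess


def perf_command(subcommand, is_root):
    """Build the argument list for a perf subcommand (hypothetical helper)."""
    command = ["perf", subcommand]
    if not is_root:
        # Assumption: sudo is how this machine grants the extra privileges.
        command = ["sudo"] + command
    return command


def run_perf_top():
    """Launch perf top interactively and return its exit status."""
    if shutil.which("perf") is None:
        raise FileNotFoundError("perf was not found on PATH")
    is_root = os.geteuid() == 0
    return subprocess.run(perf_command("top", is_root)).returncode
```

Calling run_perf_top() from a terminal session hands the screen over to perf top until you quit it; perf_command("help", True) builds the matching perf help invocation.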