[CALUG] Questions on Linux kernel coding

Fri Mar 11 18:25:28 EST 2011

On 03/11/2011 10:35 AM, Jim Sansing wrote:
> I am working on a Linux kernel (version 2.6.24+) module that uses
> Netfilter to intercept packets and modify them.  This is working, but
> inconsistently.  The main problem is that when packets (i.e. the skbuff
> structures) are returned to the stack, sometimes they do not get
> processed.  I am having a hard time figuring out where the skb goes when
> I call the 'okfn' (callback function) that Netfilter provides.

If your hook returns NF_ACCEPT, Netfilter calls the okfn function for 
you once the remaining hooks have run, so you do not need to call it 
yourself.

> My debugging is currently the use of printk statements, using dmesg to
> view their output.  These are lost if the system reboots unless the
> crash dump facility works, which allows me to capture the kernel log.
> Unfortunately, I don't always get a crash dump.

If you have not done so, run:

echo "7 7 7 7" > /proc/sys/kernel/printk
echo 0 > /proc/sys/kernel/printk_ratelimit

and use a serial console.  Then the serial console will still have all 
the messages even if the box resets on you.

> Also, I have not been able to use the gdb debugger on the crash dump
> because it requires a kernel build with the debugging symbols, which is
> over 100Meg and must be built specifically for each kernel.  I have not
> been able to find instructions for how to do this correctly.
>
> My specific questions are:
>
> Q1: How can I consistently get crash dumps?

I haven't done much with crash dump, but I have used kdb.  It usually 
works ok, but as it doesn't understand structures, decoding things can 
be a pain.  But you can use global symbols (even in modules) and of 
course addresses.  It can dump out a backtrace for each CPU and lots of 
other things.

I can't recall ever failing to get at least some output when the system 
crashed with kdb enabled, although a few times it has had trouble 
convincing all the other CPUs to also enter kdb.  That may have been on 
a really ancient system though.

> Q2: What is the correct procedure for building the kernel symbols file
> to use a debugger on crash dumps?
>
> Q3: Is there a way to identify a function by its address in the live
> code?  (I have seen stack traces in the kernel log, is there a way I can
> get access to that?)

Oopses are able to display that info, so it ought to be possible.  You 
could dig down into that code and see what it is doing.  Or just use a 
debugger.  Or manually look it up in System.map.
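
The manual System.map lookup can be scripted.  What follows is only a 
rough sketch: the function name addr2sym is made up for illustration, 
and it assumes the map file has System.map's "address type name" layout 
with fixed-width lowercase hex addresses.  /proc/kallsyms uses the same 
layout and also lists module symbols, so it is used as the default 
here; that default is an assumption, and it only works if your kernel 
exposes real addresses in that file.

```shell
# addr2sym ADDRESS [MAPFILE]
# Print the symbol whose start address is the closest one at or below
# ADDRESS (hex, optional 0x prefix).  Returns non-zero if nothing matches.
# Addresses are compared as zero-padded strings, so 64-bit values do not
# overflow shell or awk arithmetic.
addr2sym() {
    LC_ALL=C awk -v a="$1" '
        BEGIN   { a = tolower(a); sub(/^0x/, "", a); base = "" }
        # Pad the query to the width used by the map file.
        NR == 1 { while (length(a) < length($1)) a = "0" a }
        # Concatenating with "" forces a string comparison.
                { s = "" $1 }
        s <= a && s >= base { base = s; best = $3 }
        END     { if (best == "") exit 1
                  print best " (starts at 0x" base ")" }
    ' "${2:-/proc/kallsyms}"
}
```

For example, "addr2sym 0xc0100410 /boot/System.map-$(uname -r)" prints 
the name of the function containing that address, assuming a map file 
exists at that path on your system.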

> Q4: If you have experience with Netfilter, may I pick your brain?

I have looked at it and used it a little bit.  You can always ask.

> Q5: Have you had any experience using the kernel debugger for doing
> interactive kernel debugging?  The procedures for setting this up look
> very complicated and I would like to know how useful this is before
> going down that path.

I haven't tried remote gdb over serial port yet.

kdb is fairly straightforward to set up and use.  It is just not the 
friendliest thing to use.  Support needs to be enabled in the kernel via 
one of the config options if that hasn't already been done in your 
kernel.  Then either boot with kdb=on or turn it on via a sysctl.  I 
think there is also a build-time option to always enable it at startup.

> Q6: What general comments on debugging kernel module code can you offer
> based on personal experience?

Keep most code out of the kernel :-)

You might want to try:
echo 1 > /proc/sys/kernel/panic_on_oops
that way when something bad happens, you find out right away without 
making more of a mess.

If you have an /etc/sysctl.conf file on your system, put all the sysctls 
mentioned above in there and they'll automatically be set on reboot.
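
As a sketch of what that looks like: each /proc/sys path maps to a 
dotted name in sysctl.conf.  The helper name debug_sysctls below is 
hypothetical; it only prints the three settings from this thread so you 
can review them before appending them to the file.  The kdb sysctl is 
left out because its exact name depends on your kernel.

```shell
# Print the debugging sysctls from this thread in sysctl.conf syntax.
debug_sysctls() {
    cat <<'EOF'
# Same as: echo "7 7 7 7" > /proc/sys/kernel/printk
# Note: level-7 (KERN_DEBUG) messages only reach the console if the
# first value is raised to 8.
kernel.printk = 7 7 7 7
# Same as: echo 0 > /proc/sys/kernel/printk_ratelimit
kernel.printk_ratelimit = 0
# Same as: echo 1 > /proc/sys/kernel/panic_on_oops
kernel.panic_on_oops = 1
EOF
}
```

As root, "debug_sysctls >> /etc/sysctl.conf" followed by "sysctl -p" 
would apply the settings now and on every later boot.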

You get much faster reboots from a virtual machine (e.g. VMware).

Be careful of too many printk messages affecting timing.

Test with both SMP and single CPU.