/**************************************************************************/
/* Document: A very short and quick orientation on debugging utilities,   */
/*           like dbx and gdb.                                            */
/* File    : debuggers.txt                                                */ 
/* version : 0.1                                                          */
/* Purpose : Some orientation on unix debugging for a DBA.                */
/* Date    : 14/08/2009                                                   */
/* By      : Albert van der Sel                                           */
/**************************************************************************/


This very simple note is for people who would like a quick orientation
on a few debugging utilities like dbx and gdb.
You will not learn much about the debuggers themselves, except for some
basic commands. The main purpose is to sketch the environment, what to expect,
and in which circumstances they are used.


1. Introduction:
================


Debugging has obviously "something" to do with troubleshooting.

When talking about general troubleshooting of an application on the unix platform, four things immediately
come to mind:


==> Inspect the relevant logfiles.

    This is of course a bit trivial and obvious.


==> Take a look at the environment and system.

    - Most programs need a certain set of environment variables, and may also need certain kernel
      parameter settings to be in effect. The program may also depend on other
      environments like perl, java etc. (probably with very strict requirements
      about versions).

    - There might be other requirements, like "so and so" much free space in /tmp (or other filesystems),
      or specific memory requirements.

    - There might be very strict requirements on additional filesets
      (or packages, or whatever they are called on your system) that need to be installed.

    - "Left over stuff" that remains in memory. If an application crashes, ipc related objects
      like semaphores, message queues and shared memory segments may remain in memory. That can
      prevent an application restart, and it may also hinder troubleshooting.
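
    The checks above can be scripted. The following is only a minimal sketch: the function
    names are made up for this note, and the variable names and thresholds you pass in
    depend entirely on your application.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (names are illustrative, not from any standard tool).

# Succeeds if the named environment variable is set and non-empty.
require_var() {
    eval "val=\${$1:-}"
    [ -n "$val" ]
}

# Prints the free space, in KB, of the filesystem that holds the given directory.
# "df -Pk" gives one POSIX-format line per filesystem; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if the given directory has at least the given number of KB free.
has_free_kb() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

    Usage could look like "require_var HOME || echo HOME is not set" or
    "has_free_kb /tmp 102400 || echo less than 100 MB free in /tmp".
    For the left over ipc objects, the "ipcs" command lists them and "ipcrm" removes them.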


==> tracing:

    This generally "shows" what a process "is doing" at runtime, that is, which system calls
    it makes, what ipc is established etc.

    This is somewhat more geared towards a System Administrator, although other folks
    like developers use it as well.

    --> Most tracing is done using a command, optionally with some number of parameters,
        and the executable listed on that same commandline.
        Typically, you would see the syscalls on your terminal, or in a logfile which you
        can inspect afterwards.
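
    As an illustration only: which tracing command you have depends on the platform.
    "strace" (Linux) and "truss" (AIX, Solaris) are common ones, and both accept "-o file"
    to write the trace to a logfile. The function names below are made up for this note.

```shell
#!/bin/sh
# Print the first system-call tracer found on PATH, or "none".
pick_tracer() {
    for t in strace truss; do
        if command -v "$t" >/dev/null 2>&1; then
            echo "$t"
            return 0
        fi
    done
    echo none
    return 1
}

# Run a command under the tracer and write the system calls to a logfile.
# Usage: trace_to_log /tmp/trace.log myprogram arg1 arg2
trace_to_log() {
    log=$1
    shift
    tracer=$(pick_tracer) || { echo "no tracer found" >&2; return 1; }
    "$tracer" -o "$log" "$@"
}
```

    Afterwards you can inspect the logfile with your usual tools (more, grep etc.).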


==> debugging:

    For example, you have created a program, and it does not behave as expected, or it
    might even crash. You want to find out where it goes wrong, and you would like to
    step through the program, or manipulate and inspect variables, or inspect core dumps.

    This is somewhat more geared towards a Developer or software engineer, although other folks like
    a System Administrator may use it as well.

    --> Most debugging is done by starting an "environment", loading the program
        and/or core dump, and interactively using specialized commands.
        Typically, you can run a program from the debugger, "step" through it, and
        inspect variables etc.


When looking at tracing and debugging: it's of course not a "black and white" situation.
When you are "tracing", it looks a lot as if you are "debugging" something.
In a sense, that's true. You probably only want to trace something if there is an error
condition of some sort.
But when you are really looking for troublesome code in a program, that is what most people
would call "debugging".


There are a number of "standard" debugging utilities (for the unix platform) around.
Do not confuse them with third party tools (for example, delivered with some development environment).

The standard tools are: dbx, gdb, adb, sdb, and ddd (a graphical front end to debuggers like gdb and dbx).
A few others exist as well.

If you want a manual or tutorial for one of these debuggers right now, you can easily
"google" for the tools listed above.

Which ones are available depends on your unix platform.
However, dbx and gdb are quite common. gdb is possibly the most common one, because
it is part of the GNU project and has been ported to most platforms.
 

2. Limitations and scope:
=========================


2.1 Tracing from the Application might be much better:
------------------------------------------------------


The above mentioned tools are powerful. But sometimes they are not the best choice for investigating
application problems, unless the OS created a core dump (in which case you can try to inspect that dump).

What I mean is this: If you investigate some "established" standard application,
it's better to use the tracing facility offered within that application.
For example, you suspect a memory leak in your app. How to proceed?

Many applications can be "put" in some form of "verbose mode", which may give you much more
logging (and clues) about what is going on than a unix debugging tool would ever show you.

For example, if you want to investigate a WebSphere application, you can "turn on debugging"
from within WebSphere. If you need extra information on the Java Garbage Collector (GC), then switch on the verbose
logging option of the GC (for example, the JVM option "-verbose:gc").

So be advised: always check if your application can be put in some "tracing" or "verbose" mode,
which might prove to be much more effective.


2.2 Stripped executables:
-------------------------

The following is quite important.
Many production executables are stripped of "symbol" information. This makes using a debugger
much less useful.

If "file program" shows the word "stripped" (and not "not stripped"), or "nm program" reports no symbols,
then the executable is likely stripped of symbolic information.

You should know that a program can be compiled in a special way, so that the object contains extra information
that a debugger can read when using a core dump, or when running the program from the debugger.
This extra information is added to the object on purpose, for debugging.

Obviously, this represents some overhead, and that's why many objects are "stripped" of that information.
Especially program objects which are considered to be "ready" might get stripped.

It should be noted however that in most situations, you can tell the debugger to use another file
with that special information.

Suppose you are building a new program (for example, using C or C++). In general, if you think you want
to debug your newly built program at some phase, you should compile it with the "-g" compiler option.


  Note: the file and nm commands

  "file" and "nm" are just unix commands.
  With the "file" command you can determine the type of a file, and the "nm" command
  lists the symbols from object files, if present.

  Examples:

  jimmy@starboss:/apps/mns/bin $ file myprg

  myprg: executable or object module not stripped

  jimmy@starboss:/home/jimmy $ file test.txt

  test.txt: ASCII text

  jimmy@starboss:/apps/mns/bin $ nm myprg        # where we assume myprg is an executable

  This returns a list of symbols (if present).
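
  If you want to automate the check, something like the sketch below could work. It only looks
  at the text that "file" prints, so it assumes that your platform's "file" uses the words
  "stripped" / "not stripped" as in the example above. The function names are made up for this note.

```shell
#!/bin/sh
# Classify one line of "file" output, passed as the first argument.
# "not stripped" must be tested first, because it also contains "stripped".
classify_file_output() {
    case "$1" in
        *"not stripped"*) echo "has symbols" ;;
        *stripped*)       echo "stripped" ;;
        *)                echo "unknown" ;;
    esac
}

# Report on a real file (needs the "file" command to be installed).
# Note: a filename that itself contains "stripped" would confuse this simple match.
symbol_status() {
    classify_file_output "$(file "$1")"
}
```

  For example, "symbol_status /apps/mns/bin/myprg" would print "has symbols" for the
  executable shown above, and "unknown" for a plain text file.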


3. A few remarks on the debuggers and context:
==============================================


3.1 General information:
------------------------


We might say that there are actually two "types" of debuggers, namely "instruction level" debuggers,
and "source level" debuggers.

The first type works at the level of machine instructions and registers, and can usually be regarded
as off limits for regular sysadmins and developers.

The second type works at a higher level, namely in terms of source code, compilers, interfaces and libraries.
This is the category that most people talk about. Popular examples are dbx and gdb.


A program might generate a fatal error when it attempts an operation that the Operating System
does not allow, like dividing by zero, or accessing memory that it does not own.
Or, more often, the developer used a pointer in the wrong way in his or her code etc..

When a program fails in this manner, UNIX takes a snapshot of the program in memory,
and dumps the result into a core file.

In many situations, the core file is created in the working directory of the process
being core dumped. But on many platforms, kernel parameters or environment variables
make it possible to have the core written to a dedicated location, like "/var/core".


3.2 Getting information on which program created the core:
----------------------------------------------------------

Suppose you create a listing of the directory where core dumps are stored. Suppose you see
some core file lying around, but you have no clue as to what caused it.
Then you might try the "file" command on that corefile, like so
(no guarantee that it works):

harry@starboss:/var/core $ ls -al
total 9864
drwxrwxrwt    3 bin      bin            4096 Aug 10 06:01 .
drwxr-xr-x   31 bin      bin            4096 Aug 10 06:50 ..
-rw-rw-r--    1 harryg   ontw        5040724 Aug 10 06:01 core.868574.10040145
drwxr-xr-x    2 root     system          256 Apr 04 2007  lost+found

harry@starboss:/var/core $ file core.868574.10040145

core.868574.10040145: AIX core file fulldump 32-bit, myprogram

At least we know now, that the program called "myprogram" was responsible for that core.


3.3 How core dumps are named:
-----------------------------

Dumps of user processes traditionally get created as "*core*".
Remember that core dumps get created in the working directory of the application,
or in a specific directory (or filesystem) dictated by an environment variable
or kernel parameter, like "/var/core" or "/var/coredumps" etc..

As an example of such a kernel parameter setting, take a look at the following statement
(this one is in the BSD style, where %U expands to the user ID and %N to the program name):

sysctl kern.corefile="/var/coredumps/%U/%N.core"

So, different platforms may name dumps differently, like

- core.PID            # PID is the Process ID
- core.PID.ddhhmmss   # PID is the Process ID, followed by day of month, hour, minutes, seconds
- UID.core            # UID is the User ID

Anyway, the good news is that the string "core" is practically always part of the name.
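
Because of that, a simple name search is often enough to locate dumps. The following is just
a sketch (the function name is made up for this note). Be aware that it matches on the name only,
so unrelated files with "core" in their name will show up as well.

```shell
#!/bin/sh
# List regular files whose name contains "core", below the given directory
# (default: the current directory).
find_cores() {
    find "${1:-.}" -type f -name '*core*' -print
}
```

You could then run the "file" command on each hit, as shown in section 3.2, to see whether
it really is a core dump and which program produced it.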


3.4 Other stuff to be aware of:
-------------------------------

Generally, the following is true on most platforms:

- ulimit might play a role.

If a user process gets into such an error condition that a core dump results,
the "ulimit" settings might play a role. They define (among others) the maximum file size
that a process may create ("ulimit -f") and the maximum size of a core file ("ulimit -c").
Either limit can prevent a full core dump from being written.
Normally, this will not be a problem, as long as the dump is smaller than those limits.

If the process which is being core-dumped is multi-threaded and the current core size ulimit is less than
what is required to dump the data section, then only the stack area of the faulting thread
is dumped from the data section.
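
To see what applies in your shell, you could use something like the sketch below.
The function names are made up for this note; the units that "ulimit -c" reports
(blocks of 512 or 1024 bytes) differ per shell and platform, so check your manual page.

```shell
#!/bin/sh
# Show the soft and hard limits on core file size ("unlimited" or a number of blocks).
show_core_limits() {
    echo "soft: $(ulimit -S -c)"
    echo "hard: $(ulimit -H -c)"
}

# Raise the soft core limit up to the hard limit, for this shell
# and for the processes started from it.
raise_core_limit() {
    ulimit -S -c "$(ulimit -H -c)"
}
```

Note that the new limit only affects programs started from that same shell afterwards,
not processes that are already running.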


- special environment variables:

If you go "into depth" in the world of debuggers on some platform,
then you will find that some specialized environment variables may play
an important role.

For example, on some platforms, a variable like CORE_NOSHM determines
whether you get shared memory information in the dump as well.

Normally the "default values" of those variables are ok, but be aware that in some
very special circumstances, you might need to investigate them as well.


3.5 Inspecting core dumps:
--------------------------

Some often used ways to start gdb or dbx are the following:

$ gdb program
$ gdb program core
$ gdb program pid

$ dbx program
$ dbx program core
$ dbx program pid

$ gdb -c core
$ gdb -p pid


As you see, you can just call dbx, with a reference to the program and/or core dump.

Example:

harry@starboss:/var/core $ dbx /apps/test/bin/myprg

In the command above, you could also have put a reference to the core dump, but if a file
named "core" is present in the current directory, dbx will use it automatically.

dbx ~

From the dbx prompt, you can enter the "where" command, which shows you the call stack
(the chain of function calls) at the point where the program failed.
In gdb, "where" or "bt" (backtrace) does the same.
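
If you only want the stack trace and no interactive session, gdb can be run in batch mode.
The wrapper below is just a sketch (the function name is made up for this note); it relies
on the gdb options "-batch" and "-ex", and of course on gdb being installed.

```shell
#!/bin/sh
# Print a stack trace from a core dump without an interactive session.
# Usage: core_backtrace /path/to/program /path/to/core
core_backtrace() {
    if [ $# -ne 2 ]; then
        echo "usage: core_backtrace program core" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found on PATH" >&2
        return 127
    fi
    # -batch exits after the commands have run; -ex bt runs the "bt" command.
    gdb -batch -ex bt "$1" "$2"
}
```

The output is most useful if the program was not stripped (see section 2.2).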


Now, take a look at the following URLs, which show some typical uses.
They are really instructive.

-- dbx:

http://www.glue.umd.edu/afs/glue.umd.edu/system/info/olh/Programming/C_Programming_Tools_on_Glue/A_dbx_Tutorial/dbx_n_core

-- multiple debuggers:

http://repettas.wordpress.com/2007/10/13/getting-a-stack-trace-from-a-core-file/


That's it!
Well, I told you this was a short note. Still, I hope it was of some use!