/**************************************************************************/
/* Document: A very short and quick orientation on debugging utilities,   */
/*           like dbx and gdb.                                            */
/* File    : debuggers.txt                                                */
/* version : 0.1                                                          */
/* Purpose : Some orientation on unix debugging for a DBA.                */
/* Date    : 14/08/2009                                                   */
/* By      : Albert van der Sel                                           */
/**************************************************************************/

This very simplistic note is for people who would like a quick orientation
on a few debugging utilities like dbx and gdb.
You will not learn much about the debuggers themselves, except for some basic
commands. The main purpose is to sketch the environment, what to expect, and
in which circumstances they are used.


1. Introduction:
================

Debugging obviously has "something" to do with troubleshooting.

When talking about general troubleshooting of an application on the unix
platform, four things immediately come to mind:

==> inspect the relevant logfiles.

    This is of course a bit trivial and obvious.

==> Take a look at the environment and system.

    - Most programs need a certain set of environment variables, and may
      also need certain kernel parameter settings to be in effect.
      Also, the program may depend on other environments like perl, java
      etc.. (and there are probably very strict requirements about versions).

    - There might be other requirements, like "so and so" much free space
      in /tmp (or other filesystems), or specific memory requirements etc..

    - There might be very strict requirements on additional filesets
      (or packages, or whatever they are called on your system) that need
      to be installed on your system.

    - "left over stuff" that remains in memory. If an application crashes,
      ipc related objects like semaphores and queues may remain in memory,
      thereby possibly preventing an application restart. They may also
      hinder troubleshooting.
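
The leftover ipc objects mentioned in the last point can usually be listed
with the standard "ipcs" command and removed with "ipcrm". What follows is
only a sketch: the column layout of the ipcs output differs between
platforms, so the helper below (owned_by is a made-up name, and the user
"oracle" is just an assumed example) simply keeps every line in which some
column equals the given user name.

```shell
# Sketch only: filter "ipcs" output for objects that belong to one user.
# owned_by is a hypothetical helper; it reads ipcs-style text on stdin and
# prints each line that has a column exactly equal to the given user name.
owned_by() {
    awk -v user="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i == user) { print; break }
    }'
}

# Typical use (not run here; "oracle" is an assumed user name):
#   ipcs -s | owned_by oracle     # semaphore sets still owned by "oracle"
#   ipcrm -s <semid>              # remove one leftover semaphore set by id
```

Always verify that the owning application is really down before removing
anything with ipcrm.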

==> tracing:

    This generally "shows" what a process "is doing" at runtime, that is,
    which system calls it makes, what ipc is established etc..
    This is somewhat more geared towards a System Administrator, although
    other folks like developers use it as well.

    --> Most tracing is done using a command, optionally with a number of
        parameters, and the executable listed on that same commandline.
        Typically, you would see the syscalls on your terminal, or in a
        logfile which you can inspect afterwards.

==> debugging:

    For example, you have created a program, and it does not behave as
    expected, or it might even crash. You want to find out where it goes
    wrong, and you would like to step through the program, or manipulate
    and inspect variables, or inspect core dumps.
    This is somewhat more geared towards a Developer or software engineer,
    although other folks, like a System Administrator, may use it as well.

    --> Most debugging is done by starting an "environment", loading the
        program and/or core dump, and interactively using specialized
        commands. Typically, you can run a program from the debugger,
        "step" through it, and inspect variables etc..

When looking at tracing and debugging: it's of course not a "black and white"
situation. When you are "tracing", it looks a lot as if you are "debugging"
something. In a sense, that's true. You probably only want to trace something
if there is an error condition of some sort.
But when you are really looking for troublesome code in a program, that is
what most people would call "debugging".

There are a number of "standard" debugging utilities (for the unix platform)
around. Do not confuse them with third party tools (for example, delivered
with some development environment).

The standard tools are: dbx, gdb, ddd, adb, sdb. But a few others exist as well.

If you want a manual or tutorial of some debugger right now, you can easily
"google" on some of the tools listed above.
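
Whether any of the tools listed above is installed can be checked from the
shell. The following is a small sketch using the standard "command -v"
lookup; available_tools is a made-up helper name, and a tool that lives
outside your PATH will of course not be found this way.

```shell
# Sketch: report which of the named tools can be found in the current PATH.
# available_tools is a hypothetical helper name.
available_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Typical use:
#   available_tools dbx gdb ddd adb sdb
```
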
It depends on your unix platform which ones are available on your system.
However, dbx and gdb are quite common. Possibly gdb is the most common one,
because it is part of the GNU project.


2. Limitations and scope:
=========================

2.1 Tracing from the Application might be much better:
------------------------------------------------------

The above mentioned tools are powerful. But sometimes they are not really
the best choice for investigating application problems, unless the OS created
a core dump (in which case you can try to inspect that dump).

What I mean is this: if you investigate some "established" standard
application, it's better to use the tracing facility offered within that
application.

For example, you suspect a memory leak in your app. How to proceed?
Many applications can be "put" in some form of "verbose mode", which may give
you much more logging (and clues) about what is going on than a unix debugging
tool would ever show you.

For example, if you want to investigate a Websphere application, you can
"turn on debugging" from within Websphere. If you need extra information on
the Java Garbage Collector (GC), then switch on the verbose logging option
of the GC process.

So be advised: always check if your application can be put in some "tracing"
or "verbose" mode, which might prove to be much more effective.


2.2 Stripped executables:
-------------------------

The following is quite important. Many production executables are stripped of
"symbol" information. This really makes using a debugger much less useful.

If "file program" shows the word "stripped" (and not "not stripped"), or
"nm program" shows no symbols, then it is likely that the executable is
stripped of symbolic information.

You should know that a program can be compiled in a special way, so that the
object contains extra information that a debugger can read, when using a core
dump, or when running the program from the debugger.
This extra information is added to the object on purpose, for debugging.
Obviously, this represents some overhead, and that's why many objects are
"stripped" of that information. Especially program objects which are
considered to be "ready" might get stripped.

It should be noted however that in most situations, you can tell the debugger
to use another file with that special information.

Suppose you are building a new program (for example, using C or C++).
In general, if you think you want to debug your newly built program at some
phase, you might use the "-g" compiler option.

Note: "file" and "nm" are just standard unix commands.
      With the "file" command, you can determine the type of a file, and
      the "nm" command lists the symbols from object files, if present.

Examples:

  jimmy@starboss:/apps/mns/bin $ file myprg
  /apps/mns/bin/myprg: executable or object module not stripped

  jimmy@starboss:/home/jimmy $ file test.txt
  /home/jimmy/test.txt: ASCII text

  jimmy@starboss:/apps/mns/bin $ nm myprg    # where we assume myprg is an executable

  This returns a list of symbols (if present).


3. A few remarks on the debuggers and context:
==============================================

3.1 General information:
------------------------

We might say that there are actually two "types" of debuggers, namely
"instruction level" debuggers, and "source level" debuggers.

The first one works at the level of machine instructions, and can usually be
regarded as off limits for regular sysadmins and developers.

The second one works at a higher level, namely in terms of source code,
common compilers, interfaces and libraries. This is the category most people
mean when they talk about debuggers. Popular examples are dbx and gdb.

A program might generate a fatal error when it attempts an operation that the
Operating System does not allow, like a divide by zero, or an access to memory
that it does not own. Or, more often, the developer declared or used a pointer
in the wrong way in his or her code etc..
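
From the shell, a fatal error of this kind usually shows up in the exit status
of the program. As a sketch, and assuming the common convention (bash, dash)
that a process killed by a signal is reported as 128 plus the signal number,
the made-up helper below recovers the signal number. For example, a status of
139 corresponds to signal 11, which is SIGSEGV on most platforms.

```shell
# Sketch: derive the signal number from a shell exit status.
# Assumes the common "128 + signal number" convention (bash, dash);
# signal_from_status is a hypothetical helper name.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0        # 0 here means: not terminated by a signal
    fi
}

# Typical use (not run here; ./myprg is the example program of this note):
#   ./myprg; sig=$(signal_from_status $?)
#   [ "$sig" -ne 0 ] && kill -l "$sig"    # prints the signal name
```

Some shells use a different offset, so check the documentation of your shell
before relying on the number.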

When a program errors in this manner, UNIX takes a snapshot of the program in
memory, and dumps the result into a core file.

In many situations, the core file is created in the working directory of the
process being core dumped.
On many platforms, however, kernel parameters or environment variables make
it possible to have the core written to a dedicated location, like "/var/core".


3.2 Getting information on which program created the core:
----------------------------------------------------------

Suppose you create a listing of the directory where core dumps are stored.
Suppose you see some core file lying around, but you have no clue as to what
caused it. Then you might try the "file" command on that core file, like so
(no guarantee that it works):

  harry@starboss:/var/core $ ls -al
  total 9864
  drwxrwxrwt    3 bin      bin         4096 Aug 10 06:01 .
  drwxr-xr-x   31 bin      bin         4096 Aug 10 06:50 ..
  -rw-rw-r--    1 harryg   ontw     5040724 Aug 10 06:01 core.868574.10040145
  drwxr-xr-x    2 root     system       256 Apr 04 2007  lost+found

  harry@starboss:/var/core $ file core.868574.10040145
  core.868574.10040145: AIX core file fulldump 32-bit, myprogram

At least we now know that the program called "myprogram" was responsible for
that core.


3.3 How core dumps are named:
-----------------------------

Dumps of user processes traditionally get created as "*core*".

Remember that a core dump gets created in the working directory of the
application, or in a specific directory (or filesystem) dictated by an
environment variable or kernel parameter, like "/var/core" or "/var/coredumps"
etc..
As an example of such a kernel parameter setting, take a look at the following
statement (this syntax is found on BSD-derived systems, where %U stands for
the user ID and %N for the name of the process):

  sysctl kern.corefile="/var/coredumps/%U/%N.core"

So, different platforms may name dumps differently, like

 - core.PID             # PID is the Process ID
 - core.PID.ddhhmmss    # the Process ID, followed by day of month, hour,
                        # minutes, seconds
 - UID.core             # UID means User ID

Anyway, the good news is that the string "core" is practically always part of
the name.


3.4 Other stuff to be aware of:
-------------------------------

Generally, the following is true on most platforms:

- ulimit might play a role.

  If a user process gets into an error condition that results in a core dump,
  the "ulimit" settings might play a role. The ulimit defines (among other
  things) the maximum filesize that a process may create. This could prevent
  a full core dump from being written. Normally, this will not be a problem,
  because user-space dumps are smaller than the maximum filesize.

  If the process which is being core-dumped is multi-threaded, and the current
  core size ulimit is less than what is required to dump the data section,
  then only the faulting thread's stack area is dumped from the data section.

- special environment variables:

  If you would go "into depth" in the world of debuggers on some platform,
  then you would find that some specialized environment variables may play
  an important role. For example, on some platforms, a variable like
  CORE_NOSHM determines if you get shared memory information in the dump
  as well.
  So, normally the "default values" of those variables are ok, but be aware
  that in some very special circumstances, you might need to investigate them
  as well.


3.5 Inspecting core dumps:
--------------------------

Some often used initiations (I mean startup commands) of gdb or dbx are the
following:

  $ gdb program
  $ gdb program core
  $ gdb program pid

  $ dbx program
  $ dbx program core
  $ dbx program pid

  $ gdb -c core
  $ gdb -p pid

As you see, you can just call the debugger with a reference to the program
and/or core dump.

Example:

  harry@starboss:/var/core $ dbx /apps/test/bin/myprg

In the command above, you might also have put a reference to the core dump,
but if one is present in the current directory, dbx will use it automatically.

  dbx ~

From the dbx prompt, you can enter the "where" command, which shows the call
stack at the point where the program failed.

Now, take a look at the following URLs, which will show you some typical uses.
It's really instructive.

-- dbx:
   http://www.glue.umd.edu/afs/glue.umd.edu/system/info/olh/Programming/C_Programming_Tools_on_Glue/A_dbx_Tutorial/dbx_n_core

-- multiple debuggers:
   http://repettas.wordpress.com/2007/10/13/getting-a-stack-trace-from-a-core-file/


That's it! Well, I told you this was a short note. Still, I hope it was of
some use!