/******************************************************************************/ /* Document : Some simple and Generic Linux commands, to get info quickly. */ /* Version : 39 */ /* File : linux.txt */ /* Purpose : Simple listing of common commands for a quick start in Linux. */ /* Date : 10-10-2012 */ /* Compiled by: Albert van der Sel */ /* Note : This file is especially meant to find info on Linux systems. */ /* */ /* */ /* */ /******************************************************************************/ Hopefully, it can be of use in some circumstances. Contents: 1. Disk & Filesystem Info 2. CPU Info 3. Memory Info 4. Version - Release Info 5. Kernel modules Info 6. Process info 7. How local disks / partitions are named 8. Netcard and Network Info 9. A few notes on how to change your IP parameters 10. Some Monitoring commands 11. The shortest possible "vi" or "vim" survivalkit 12. Some remarks about booting Linux (bare metal) 13. Some remarks about Package Management 14. Some notes on Linux log files 15. A few words on cron, the default scheduler 16. A few words on User Accounts 17. The standard Linux filesystems 18. Some remarks about Linux as a Virtual Machine (VM) 19. A few words on Linux VM's under Xen and XenServer 20. A few words on Linux VM's under VMWare ESX(i) 21. A few words on Volumes and filesystems 22. Some remarks on how to autostart daemons on boottime 23. Some remarks on how to restart daemons on a running system 24. Some SAN and SCSI talk 25. Some notes on Disaster Recovery 26. Recovering the root password 27. A few notes on installing the driver or modules of SAN HBA cards 28. Some special filesystems. 29. A few typical examples of partitioning and creating filesystems 30. A few notes on implementing multipath IO to a SAN ============================================================================ 1. Disk & Filesystem Info: ============================================================================ # Only listings of filesystems/partitions/disks here. -- disk / partition / device info: # blkid # shows information about available block devices # blkid /dev/sda # shows information about sda only # cat /proc/partitions # fdisk -l # fdisk -l /dev/sda # lshw -class disk # lshw -C disk # ls /dev/disk/by-id # sfdisk -l # sfdisk -l /dev/sda # lsscsi # lsblk -f # if available on your system, it shows a # tree of partitions and filesystem types This might work too: # smartctl -i /dev/sda # maybe you need to install the package first # hwinfo --disk # maybe you need to install the package first -- filesystems free/used, and where it's mounted on: # df # show filesystem info # df -k | grep tmp # only show tmp (grep it on tmp) # df -h # human readable output # df -m # in Megabytes # df -h /tmp # only show tmp # df -T # shows filesystem type too (like ext2, ext3, ntfs etc..) # df /dev/hda3 # cat /etc/fstab # list the fstab file to view the standard mounts -- scsi / lun related: # ls -al /sys/class/scsi_host # shows HBA's # ls -al /sys/class/fc_host/ # shows FC HBA's # cat /proc/scsi/scsi # shows devices and LUNs # ls -al /sys/class/scsi_disk # Might show you luns in the form # of paths [host#:bus#:target#:lun#] # lsscsi -c -- Mounts: # mount # cat /etc/fstab -- usb: # lsusb # lsusb -v # verbose output -- swap info: # swapon -s # cat /proc/swaps # cat /proc/meminfo -- list raw partitions: # raw -qa # ls -lR /dev/raw* ============================================================================ 2. 
CPU Info:
============================================================================

# cat /proc/cpuinfo
# dmesg | grep -i cpu

This might work too:

# lscpu
# lshw -class cpu
# lshw -class cpu -short    # limited output
# dmidecode --type 4        # reads DMI table

cpu Usage:

# top
# mpstat
# mpstat -P ALL

============================================================================
3. Memory Info:
============================================================================

# cat /proc/meminfo
# dmesg | grep -i memory
# free
# free -m                   # in MB

This might work too (showing some advanced properties):

# dmidecode --type 17

-> Note: What exactly is that "/proc" stuff? And "sysfs"?

The pseudo or virtual "/proc" filesystem on a running system can be seen as a sort of "window"
to view kernel data structures. Here, subdirectories exist for all running processes, as well as
for system resources, that is, the values of swap, memory, disks, cpu etc..
In most cases, consider it to be "read only". However, in some cases you can use it to send
information to the kernel as well.
Also, whenever you hear of a "virtual filesystem", it means that it's memory based, built when
the system boots, and maintained during runtime.

In a sense, a newer, more structured version of proc is available (since kernel 2.6), which is
called "sysfs". This too is a virtual filesystem, and it sort of exports the "device tree", and
system information, through the use of such a virtual filesystem. You can see it by browsing
through "/sys". You might say that "/proc" is more focused on processes, while "/sys" is a new
way to obtain device- and system information.

============================================================================
4. Version - Release Info
============================================================================

# cat /proc/version
# uname -r
# uname -a
# cat /etc/redhat-release   # Specific for Redhat
# cat /etc/SuSE-release     # Specific for SuSE

This might work too:

# lsb_release -a
# cat /etc/*issue
# cat /etc/*release

-- kernel locations:

The running one is most often loaded from /boot/vmlinuz*
You might find several links here. However, in general, a Linux kernel image might be located
in either / or /boot. Use "uname -a" to show the kernel version.

-- 32 bit or 64 bit system?

# uname -m

============================================================================
5. Kernel modules Info:
============================================================================

# lsmod        # Lists all the currently loaded kernel modules
# rmmod        # Unloads modules, Ex: rmmod ftape
# depmod       # Creates a dependency file, "modules.dep", later used by
               # modprobe to automatically load the relevant modules.
# modprobe     # Used to load a module or set of modules. Loads all
               # modules specified in the file "modules.dep".
# modinfo      # Shows module information

modprobe adds or removes a module from the Linux kernel. You might say that "modprobe"
supersedes commands like the more basic "insmod" and "rmmod" utilities.
Linux maintains the /lib/modules/$(uname -r) directory for modules and their configuration
files (except /etc/modprobe.conf and /etc/modprobe.d).

- modprobe.conf: modprobe checks /etc/modprobe.conf.
- modules.dep  : List of module dependencies, and will be checked too.
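As a hedged illustration (the file name "local.conf", the module names and the option value
below are just examples, not something your system necessarily has), a configuration file under
/etc/modprobe.d/ typically contains simple one-line directives:

# cat /etc/modprobe.d/local.conf
# pass a parameter to the e1000 module whenever it gets loaded:
options e1000 InterruptThrottleRate=3000
# prevent the pcspkr module from being loaded automatically:
blacklist pcspkr
# let the name "eth0" refer to the e1000 driver (older style aliasing):
alias eth0 e1000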
# modprobe -l              # display all available modules
# modprobe -l abc*         # list all abc* modules
# lsmod                    # displays all loaded modules
# modprobe thismodule      # loads the module
# modprobe -r thismodule   # removes the module from the kernel

============================================================================
6. Process info and control:
============================================================================

-- Show processes:

# w                        # w command: list of who is logged on
# w -h                     # without header
# who                      # who is logged on
# users                    # who is logged on
# ps -A                    # show all processes
# ps -ef                   # show all processes. This is the common unix
                           # usage of ps, to show all, including pid, path.
# ps aux | less            # show all processes, one screen at the time
                           # (due to the pipe to less)
# ps -A | grep -i WhatEver # show processes, but filtered on WhatEver
# pgrep WhatEver           # show processes, but filtered on WhatEver
# ps -u john               # show processes of john
# top                      # well-known utility showing processes
                           # and many properties like mem usage, cpu usage
# mpstat                   # top and mpstat can show cpu% usage of pid's
# mpstat -P ALL
# htop                     # like an improved "top", but usually
                           # it needs to be installed.
# pstree                   # show processes in tree format
# pmap -d pid              # shows the memory map of a process (pid)

-- Kill a process

# kill -9 pid              # pid is the process id found with "ps -A"
# killall whatever         # kill a process by its name
# pkill whatever           # kill a process by its name
# xkill                    # a way to kill a graphical x program

-- set priority of running process

# renice 20 123            # set prio of pid 123 to 20

-- start a program in the background, so that the prompt returns to your terminal

# myprg &                  # using "&" places it in the background
# jobs                     # view your running jobs

-- detach a program from your terminal, so that it keeps running

# nohup myprg &            # the "no hangup" nohup command
                           # does the magic

============================================================================
7. How local disks / partitions are named:
============================================================================

See section 1 on how to list disks and partitions. This section is only about device naming.
Here you find information on *local* disk devices. For more info on SAN LUN's, please see
Chapter 27.

=> Entire local harddisks are listed as devices without numbers, such as "/dev/hda" or
"/dev/sda" or "/dev/sga" etc...

The "standard" situation looks like this:

/dev/sda - first SCSI disk (address-wise)
/dev/sdb - second SCSI disk (address-wise)
/dev/hda - master disk on IDE primary controller
/dev/hdb - slave disk on IDE primary controller
/dev/hdc - master disk on secondary controller
/dev/hdd - slave disk on secondary controller

Note:
-----
There are some "deviations" for internal disks, especially with older hardware.
With some distributions, using some specific older disk Array hardware, you might see the
standard disk devices denoted in a different way, like for example:

/dev/cciss/c0d0     Controller 0, disk 0, whole device
/dev/cciss/c0d0p1   Controller 0, disk 0, partition 1
/dev/cciss/c0d0p2   Controller 0, disk 0, partition 2
/dev/cciss/c0d0p3   Controller 0, disk 0, partition 3
/dev/cciss/c1d1     Controller 1, disk 1, whole device
/dev/cciss/c1d1p1   Controller 1, disk 1, partition 1
/dev/cciss/c1d1p2   Controller 1, disk 1, partition 2
/dev/cciss/c1d1p3   Controller 1, disk 1, partition 3

Often, these "cciss devices" are associated with HP Smart Array block drivers. This is locally
attached hardware, where Volumes are presented as the standard disks.
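To see how these names show up on your own machine, you might compare the output of a few of
the listing commands from section 1 (just a sketch; the sd*/hd* names that appear will of
course differ per system):

# cat /proc/partitions                  # the kernel's view, e.g. sda, sda1, sda2 ...
# lsblk                                 # if available: a tree of each disk and its partitions
# ls -l /dev/sd* /dev/hd* 2>/dev/null   # the device files themselves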
=> CDROM / DVD devices:

- To get information on CD/DVD drives use:

# cdrecord -scanbus

It displays information about your CD-R or CD-RW drive. You might see output like:

scsibus0:
0,0,0 0) 'SONY ' 'CD-RW' 'NM56' 'Removable CD-RW'

The first three numbers (for each item) refer to the SCSI bus, device ID, and LUN
(Logical Unit Number), respectively. The fourth number (just before the parenthesis) is simply
the device ID repeated. If you want "to burn" a CD/DVD, those 3 numbers are what the "cdrecord"
command wants to know for the device address.

- Device files:

The device file name of CDROM-like drives is either /dev/cdrom, /dev/sr, /dev/scd, or /dev/dvd.
But, sometimes the drive has a "disklike" devicefile like "/dev/hdb", especially if your system
has only one internal disk, and the CD/DVD is the second device on the IDE primary controller.

So, you might see:

/dev/cdrom             - first CDROM device
/dev/scd0, or /dev/sr0 - first SCSI CDROM/DVD
/dev/hdc               - CDROM/DVD on IDE, or
/dev/hdb               - CDROM/DVD on IDE
/dev/dvd               - DVD

If you have an IDE/ATAPI drive, but SCSI emulation has taken over, the device file changes
from something like /dev/cdrom0, or /dev/hdc, to /dev/scd0.
You can also inspect the output of the following command:

# dmesg | grep '^hd.:'

Note: On systems with an IDE/ATAPI CD-Rom, often scd0 is linked to /dev/cdrom (scsi emulation).

# ln -s /dev/scd0 /dev/cdrom

=> USB:

Often a USB device is recognized as /dev/sdb1.

=> Partitions:

Partitions on a disk are referred to with a number such as:

/dev/hda1
/dev/sda1

So, for example, you could use fdisk to partition /dev/sda as

Device Boot    Start   End    Blocks     Id  System
/dev/sda1      1       255    2048256    83  Linux
/dev/sda2      256     511    2056320    82  Swap
/dev/sda3      512     5721   41849325   83  Linux

=> Naming in GRUB (at the start of Linux boot):

The bootloader GRUB (which replaced the older LILO) uses a naming like:

(hd0,0) : meaning first harddisk, first partition
(hd0,1) : meaning first harddisk, second partition
(hd1,5) : meaning second harddisk, sixth partition

This is a universal naming too. Usually, "/boot/grub/device.map" associates those entries
to device files like /dev/sda

============================================================================
8. Netcard and Network Info:
============================================================================

-- Listing network interfaces and IP parameters:

# lshw -class network
# ifconfig
# lspci                    # list all pci devices
# lspci | grep -i eth      # as above, but now grepped
                           # (filtered) on "eth"
# lspci | egrep -i --color 'wifi|wlan|wireless'
# dmesg | grep eth

You can also check some files (using "cat file_name") to find info on netcard devices.
But it depends a bit on your distribution which files you should inspect. You might try to
take a look in the following directories (or files) (if they exist on your system):

/etc/network                            (directory)
/etc/network/interfaces                 (as a file)
/etc/sysconfig/network                  (as a file)
/etc/sysconfig/network-scripts/ifcfg-   (as files)

Here is an example on RedHat:

[root@linRH507 /etc/sysconfig/network-scripts]# ls -al
total 56
drwxr-xr-x 2 root root 4096 Sep 30 2008 .
drwxr-xr-x 4 root root 4096 Sep 29 2008 ..
-rw-r--r-- 3 root root 164 Sep 30 2008 ifcfg-bond0
-rw-r--r-- 3 root root 143 Sep 30 2008 ifcfg-bond1
-rw-r--r-- 3 root root 172 Sep 30 2008 ifcfg-eth0
-rw-r--r-- 3 root root 172 Sep 30 2008 ifcfg-eth1
-rw-r--r-- 3 root root 172 Sep 30 2008 ifcfg-eth2
-rw-r--r-- 3 root root 172 Sep 30 2008 ifcfg-eth3

[root@linRH507 /etc/sysconfig/network-scripts]# cat ifcfg-bond0
# Bonding of eth0 and eth1 : public interface
DEVICE=bond0
BOOTPROTO=none
IPADDR=10.132.68.11
NETMASK=255.255.254.0
ONBOOT=yes
GATEWAY=10.132.69.254
TYPE=Ethernet

In the example above, I used "cat ifcfg-bond0" to find the IP parameters on a "teamed
interface", called "bond0", which uses eth0 and eth1 together as one "team". That makes no
difference: bond0 just "acts" as one interface.

You can also use some "network stats" commands, which are designed to show you network
traffic/stats info. But often they list the interfaces too. For example:

# netstat -nr
# netstat -i

In many cases, the netcards are listed as the "eth0", "eth1" devices, and others like "lo"
(loopback). Certainly, other device names are possible too. It just depends on your system.

-- Display Ethernet Card Settings (supposing you have the interface "eth0").

# dmesg | grep eth0
# ethtool eth0
# grep eth0 /etc/modules.conf

============================================================================
9. A few notes on how to change your IP parameters:
============================================================================

On most systems, you can edit network configuration files, in order to change network- or IP
related parameters (like hostname, IP address, mask etc..)
Using the commandline (like ifconfig) is another option.

9.1 Editing files:
------------------

=> For example in RedHat, editing config files:

The configurations for each network device you have, are located in the
"/etc/sysconfig/network-scripts/" directory. These configfiles have names like ifcfg-eth0,
ifcfg-eth1 etc..

Here is an example ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
IPADDR=10.10.10.11
NETMASK=255.255.255.0
ONBOOT=yes
GATEWAY=10.10.10.254
TYPE=Ethernet

With "vi" or "vim" you can edit the file, and change the address and mask, and optionally
other parameters. See Chapter 11 for an extremely short intro on "vim".
Editing the configuration files will make the change permanent. When done, you need to
restart your network services, like so:

# service network restart

=> For example, on Ubuntu:

Check out the "/etc/network/interfaces" file. If you vi that file, you will see records like:

iface eth0 inet static
address 10.10.10.11
netmask 255.255.255.0
network 10.10.10.0
broadcast 10.10.10.255
gateway 10.10.10.254

Make changes as necessary, and save that file. Next, restart your network services, like so:

# /etc/init.d/networking restart

9.2 Using the "ifconfig" command:
---------------------------------

# ifconfig eth0                  # show all parameters of eth0
# ifconfig eth0 down             # stop networking on eth0
# ifconfig eth0 192.168.99.14 netmask 255.255.255.0 up
                                 # configure the IP address and netmask,
                                 # and bring the interface up

In most cases, it's not suited for making "permanent" configurations. Edit the appropriate
config file to make permanent changes.

============================================================================
10. Some Monitoring commands:
============================================================================

On "monitoring" your system(s), you might think of two ways to do so:

- real time monitoring, that is, interactively looking at what processes are running and what
  resources they use, or looking at general disk IO, or memory- and cpu usage.
- performance stats gathering, which lets you view reports based on a certain time period
  (per day, this last week etc..)

Here, we touch on "using commands" which often means that you take (a real time) look at how
your system is performing now (or for some short duration).

Here are some well-known tools. Just listing the names (as I do right now) will not do much
good of course. You SHOULD REALLY try them.

# top              # this shows dynamically all processes and what resources
                   # (like %cpu) they use.
                   # It shows a graphical screen, but in text mode.
# htop             # Many view it as the successor of "top".
# mpstat           # the three on the left primarily focus on cpu usage
# mpstat -P ALL
# sysstat
# iostat           # this tool focuses on disk IO, and shows you several statistics.
                   # basic usage: "iostat interval count",
                   # like "iostat 4 5" which will give statistics data at
                   # 4 seconds intervals, for 5 times.
# vmstat           # this tool focuses on cpu- and virtual memory usage.
                   # Usually it's used with a "delay" option like "vmstat 5"
                   # which shows you statistics every 5 seconds. Also, it's often
                   # used with a "delay" and "count" like "vmstat 5 3" which shows
                   # you stats every 5 seconds, for 3 times. Then vmstat exits.
# sar              # Usually used for reporting statistics over a certain period (like last 24h).
                   # Of course, "something" must be scheduled to collect data,
                   # so that you can use the reporting tool sar to view reports over a period.
                   # Indeed, if you install the sysstat utilities, it can be arranged
                   # that sa1 and sa2 are periodically fired from the scheduler cron,
                   # to build historical data. Then, you can view reports using sar.
                   # Sar is very extensive, and deserves a manual of its own.

Usually you will get some very nice graphical tools too, to be used from an Xwin console.
It also depends a bit on your distribution, but if you have a Workstation with Linux
configured, it's very likely that it boots into a graphical Xwin environment.

Some commands from section 6 can be viewed to fall into the "monitoring" category of commands
as well. For example, "ps -A" shows you all processes with some interesting attributes.
(Try it !). Please see section 6 for those other "monitoring" commands.

============================================================================
11. The shortest possible "vi" or "vim" survivalkit in the Galaxy:
============================================================================

Sometimes you just need to "edit" some (ascii) textfile, like some configuration file, or a
shell script, or whatever ascii file. There are some graphical editors on Linux too, but here
we touch on the traditional textmode "vi" editor or "vim" editor (to be used from a terminal).
You might say that the vim editor is an enhanced version of vi, and it's very likely that vim
is available on your system. Here, we treat vi and vim as being "the same". If you try vim,
just use that name in all examples below.

If you don't like vi or vim, you can choose an alternative editor like "nano" (might even be
better). However, one advantage of vi is that you can use it on any unix system as well
(solaris, aix, hpux etc..).
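If you are not sure which of these editors are actually installed, a quick check from the
shell (just a sketch) might be:

$ which vi vim nano         # prints the full path of each editor that is found
$ vim --version | head -1   # shows the vim version, if vim is present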
--------------------------------------------------------------------
Note: if you only want to view the contents on your screen, you can also simply use
"cat" and "more", like:

$ cat filename | more   # types the content to your screen (while "more"
                        # sees to it that it is not dumped all at once.)
$ more filename         # allows you to "walk" through the text file.

If you are not sure if a certain file is just ascii text, or binary, use the "file" command
first, like so:

$ file filename         # tells you the file type. The output should refer
                        # "in some way" to ascii or text, if it's ascii.
--------------------------------------------------------------------

Now, I hope that you have an innocent, harmless, text file somewhere. Suppose you have found
the file "readme" in some directory. In reality, you may have found some other text file, but
in this example, I will pretend that we use some "readme" textfile.
First, copy the file to the "/tmp" filesystem. Then, switch to "/tmp".

$ cp readme /tmp
$ cd /tmp
$ vi readme             # enter "vi readme" to start vi, and open the textfile.

=> Using the Esc key: you are not in edit mode / you are in command mode

If you press the "Esc" key (at any time) you can "safely" walk through the text using the
arrow keys (you are not in "edit" mode). Just play around with the cursor keys (the arrow
keys). If you indeed have some substantial body of text, try:

Ctrl-D : to go down half a screen. Try that a few times.
Ctrl-U : to go up half a screen. Try that a few times.

Indeed, those two keystrokes are quite handy to move quickly in your document.
Now, just "Play" around a bit, using your arrow keys (cursor keys) and Ctrl-U/D.

=> Entering Insert mode, using the "a" key: now you can add or edit text:

Now, just as a test, place the cursor right in front of the second word, on the second line
(it's just an example). Press the "a" key and you will enter insert mode. Now type the word
"help" (or whatever other word). Press Esc again.

So, alternating "Esc" and "a" means this:

-> If you were in a situation where you had pressed Esc before, and if you THEN press "a"
   (or "i"), you can enter text from the position where your cursor was.
-> If you want to quit entering text, press Esc, and you can again safely walk through the
   file, while not being in the "edit" mode.
-> Deleting some characters or words: Press Esc, navigate to the character or word you need
   to remove, and press the "x" key one or more times to delete characters (or type "dw" to
   delete a word, or "dd" to delete a whole line). Hopefully, you see that the text is getting
   removed. If you press "a", you can add text. Then, press Esc again.

Saving changes, and/or quitting "vi":
------------------------------------

If you press Esc, you leave the "editing" mode, and you go to "command mode".
If you now press ":" (the colon key), and type:

q!  (and press Enter): you quit vi, and you will NOT save any changes (q from "quit")
wq! (and press Enter): you quit vi, but now you WILL save your changes to the document
    (q from "quit" and w from "write")

Ok, this paragraph was ridiculously simple. It's not for nothing that quite some extensive
tutorials exist on vi. On the most basic level, the above pointers should help you out a bit.

============================================================================
12.
Some remarks about booting Linux: ============================================================================ 12.1 Bootsequence in general: ------------------------------ There are some differences between the boot of Linux on bare metal, or if it would boot as a Virtual Machine (VM) on some type of hypervisor like vmware, Xen, Z etc... But, a lot is surprisingly the same. Traditionally, it goes a bit like this: -> BIOS: <- The BIOS will try to find the bootloader. Nowadays, the BIOS can check several devices like CD/DVD, hardisk, netboot etc.. in a certain preferred order. Traditionally, it would load the MBR (Master Boot Record) which is cylinder 0, head 0 and sector 1 of the first harddisk. -> MBR: <- The MBR contains the Partition Table for the disk, and a small amount of executable code and some error messages. This executable code examines the Partition Table, and identifies the System Partition (or Actice Partition). This is the partition that's used to boot the Operating System, and it contains the "Partition Bootsector". -> PARTITION BOOTSECTOR: <- The Partition Bootsector points to some essential loader of the Operating System, like for example "ntldr" of WinNT. From then on, control is passed to ntldr and the bootsequence of NT would start. -> GRUB: <- Once Linux is installed, the above sequence has been changed. This time, The MBR now (usually) contains "GRUB stage 1", which is a bootloader. Once that first stage is loaded, several paths could be followed. However, usually, some additional sectors are read, which also contain "file system drivers". Then, GRUB will load GRUB "stages 1.5 and 2", and a configuration file from "/boot/grub". The exact details will be left out here. -> GRUB AND MULTIBOOT: <- When GRUB is fully loaded, it will present a simple menu of booting to any Operating System, as is listed in "/boot/grub/grub.conf". Since GRUB is so smart, it thus could boot the system to XP, or Win7, if that would be installed too. But usually you would go for Linux. An example of a grub.conf could look like this: # cat /boot/grub/grub.conf # grub.conf generated by anaconda # # Note that you do not have to rerun grub after making changes to this file # NOTICE: You have a /boot partition. This means that # all kernel and initrd paths are relative to /boot/, eg. # root (hd0,1) # kernel /vmlinuz-version ro root=/dev/sda3 # initrd /initrd-version.img #boot=/dev/sda default=0 timeout=33 splashimage=(hd0,0)/grub/dark_sun.xpm.gz hiddenmenu title CentOS 5.2 x86_64 2.6.18-92.1.22.el5 root (hd0,0) kernel /vmlinuz-2.6.18-92.1.22.el5 ro root=LABEL=/ initrd /initrd-2.6.18-92.1.22.el5.img title Windows XP SP3 rootnoverify (hd0,1) chainloader +1 You might notice that (hd0,0) is a way to point to the first partion on the first harddisk. Also, (hd0,1) then points to the second partition on the first harddisk. This is why GRUB is able to let you choose to start to Linux or a system like Windows XP or Win7. Example of a grub.conf with just one option: # cat /boot/grub/grub.conf # grub.conf generated by anaconda # # Note that you do not have to rerun grub after making changes to this file # NOTICE: You have a /boot partition. This means that # all kernel and initrd paths are relative to /boot/, eg. 
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/myvg/rootvol
# initrd /initrd-version.img
#boot=/dev/cciss/c0d0
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-308.13.1.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-308.13.1.el5 ro root=/dev/mapper/myvg/rootvol
initrd /initrd-2.6.18-308.13.1.el5.img

About grub's naming of disks/partitions: It's something like an OS-neutral naming convention
for referring to disk devices. Here, hard disks are always called hdN (N=0,1,..), floppy disks
are fdN, and partitions are referred to as (hdN,M) (N,M in 0,1,..).

-> INITRD (or INITRAMFS on some systems), AND KERNEL LOAD: <-

The initial RAM disk ("initrd" or "initramfs") phase provides for an initial root file system
that is mounted prior to when the "real root file system" can be mounted. The initrd is bound
to the kernel and loaded as part of the kernel boot procedure. Thanks to this intermediate
mount, modules can be loaded into the kernel and as a result, the kernel is able to make the
"real file systems" available and get access to the real root file system.

-> INIT AND RUNLEVELS: <-

The kernel will execute the "init" process. The init process starts all other processes.
The /etc/inittab file contains instructions for init. It contains directions for init on what
programs and scripts to run when entering a specific runlevel.
As of init, there might be slight variations between the different Linux distributions on how
scripts will be executed and from which locations.

A (partial) inittab file might look a bit like this:

# Default runlevel. The runlevels used by RHS are:
# 0 - halt (Do NOT set initdefault to this)
# 1 - Single user mode
# 2 - Multiuser, without NFS (The same as 3, if you do not have networking)
# 3 - Full multiuser mode
# 4 - unused
# 5 - X11
# 6 - reboot (Do NOT set initdefault to this)
#
id:3:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

So, suppose in inittab, the default "runlevel" is set at "3" (id:3:initdefault), the scripts
as specified by "/etc/rc.d/rc 3" will be executed. On some systems, it may mean that all
scripts (or symlinks to /etc/init.d) in "/etc/rc3.d" (or "/etc/rc.d/rc3.d"), with a name that
starts with "S" (as from Start), will be executed. So, it might be possible to find, say, an
"S99Oracle" script, that boots Oracle.
Note however, that there might be some variations on how exactly the "rc" scripts are found
between the different distributions. See also Chapter 22.

Note:
-----
Before GRUB, a common bootloader was "lilo" (LILO). That bootloader was somewhat more limited
in capabilities. It used "/etc/lilo.conf" as its configuration file.

12.2 Creating a Boot CD/DVD:
----------------------------

Here, just a few general remarks are presented.

Option 1: using a downloaded .iso file, to create a CD/DVD:
-----------------------------------------------------------

Here, the general outline would be:
- First you find, or download, the .iso file.
- Then, you burn that .iso file to CD/DVD, using "cdrecord" or a graphical burning tool.
- Done.

1. Download a suitable "filename.iso" file, from the internet, or other location.

2.
Once you have downloaded the right .iso file, optionally check it using: # md5sum filename.iso # check if the downloaded file has the same # "checksum" as you saw on the site 3. Next, you need to have CD/DVD burning software, or use the "cdrecord" command. You must burn the .iso file to the writable CD, or DVD "as an image". An .iso needs to be "burned" in a 'specific way" that expands/extracts the image, so that you end up with usable files on your disc. The burnprocess will create a bootable media automatically. Thus, the "bootable info" on the resulting bootable DVD, is just part of the .iso file. Typically, here a graphical burning tool in Xwin would be ideal. Ofcourse, using the commandline is possible too. Here, most often the "cdrecord" command is used. This command expands the .iso file to CDR/DVD. Hopefully, your OS has support for the drive, without needing to install anything. If the command "cdrecord -scanbus" shows a drive (or more drives), you are good. Using cdrecord: --------------- If you would go into a commandline session with "cdrecord", it would esemble something like this: # cdrecord -scanbus # in order to find the dev address (if needed) # See also Chapter 7. # cdrecord -v speed= dev= /path_to_iso like: # cdrecord -v speed=8 dev=0,0,0 /isofiles/example.iso # cdrecord -v dev=0,4,0 example.iso As you can see, once you have a .iso file, it is not too hard to create a bootable CD/DVD containing the extraction of that .iso file. Or, you simply burn it from Windows, using a graphical utility. Option 2: Create an .iso file, then burn it to CD/DVD: ------------------------------------------------------ Here, the general outline would be: - First you create the .iso file yourself, using "mkisofs". - Then, you burn the .iso file to CD/DVD, for example using graphical burning software or the "cdrecord"command we saw above. - Done. This procedure is just a tiny bit harder to perform. First, "mkisofs" has a "not so easy" to comprehend commandline syntax, using quite a few parameters. There are many ways to proceed further, also depending on your specific distribution. As 'handy' information for reading articles on this subject, we must realize that when booting from CD-ROM, there are couple of different "modes", among which exists: - "SYSLINUX like", where the boot information from a "bootable floppy" is stored in an image file on the CD. So, if you boot from that CD/DVD, that image is then loaded from the CD. It behaves like it was a "virtual floppy". It's also called "Floppy emulation mode". - "ISOLINUX like", which is "no emulation mode". The boot information is stored directly on the CD, and no floppy image is needed. Later isolinux makes its possible to store harddisk MBR bootinfo on the CD/DVD media. - EXTLINUX like", which is a general-purpose bootloader, like GRUB. Nowadays, it's integrated with isolinux. So, now it's best to find a good link describing the procedure for your distribution. If your purpose is to have a good Disaster Recovery procedure for Linux on Bare Metal (using a physical machine, instead of a using a virtualized environment), you might want to see what tools like "Mondorescue" can do for you. See also Chapter 25. Note: if for example using VMWare ESX(i), it's really easy to copy the systemstate of a Linux VM to another place, since it's only a .vmdk file. ============================================================================ 13. 
Some remarks about Package Management: ============================================================================ This is about software management. Proper Package management allows you to install, update, and delete software, and to query the present state of packages installed. Software and applications on Linux systems are usually organized in the form of "packages" that contain and describes all the relevant parts of an application (for example, binaries, configuration files, and libraries). This software needs to be installed in the correct way, at the right locations, and garding all dependencies. Indeed, that is the main responsibility of a packager. However, a few large software vendors, still use their own "installer". - Originally from RedHat, the Redhat Package Manager (RPM) is found, or can be used, on many Linux environments. You might say that RPM is a bit of a standard. - Another popular package manager, is "YUM" (Yellowdog Updater Modified). It's more like a frond-end to RPM. - Yast (yast2) is found on SuSE, but is also RPM compatible. - There are quite a few others like apt-get, dpkg etc.. Some rpm examples: ------------------ # rpm -ivh packagename # installing the package, using -i # rpm -Uvh packagename # Upgrading a package, using -U, # is similar to installing one, # but RPM automatically un-installs existing # versions of the package before # installing the new one # rpm -e packagename # remove (erase) a package, using -e # rpm -qa # querying on all packages. It uses the # "/var/lib/rpm" repository/database # rpm -q packagename # querying on a specific package # rpm -qa | grep whatever # querying all, but filtering on 'whatever' # rpm -qf /usr/bin/whateverfile # querying to what package a certain file # belongs, using -f # rpm -qlp whatever.rpm # showing all files in the package # rpm -Va # validates all packages Some yum examples: ------------------ # yum install package # installing # yum -y install package # like above, but without prompting for confirmation # yum remove package # removing the package # yum update package # updating the package # yum search mypackage # searching for a package As you can see, it's not too hard working with some package management tool. However, some operations might take some time, while it seems that you do not get response back in a timely fashion. It's probably wise "to give it some time", because in a few cases, if you interrupt the process, it might leave the repository in an indeterminate state. ============================================================================ 14. A few words on Linux logs: ============================================================================ Some main Linux logfiles, and how to view them. Remark 1: --------- The "syslog" subsystem / "syslogd" daemon, logs all sorts of system messages, from informational- to critical messages. The "/etc/rsyslog.conf" configurationfile determines (partly) what events are going to be logged and where. Remark 2: --------- Some of the logfiles maybe under the control of logrotate, meaning that multiple files might be present using similar names (but ending in .1, .2 etc.., or other extension, and possibly compressed too). See "/etc/logrotate.conf", or "/etc/logrotate.d/syslog" (or other file) for configuration of "logrotate". 
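As a hedged illustration (the log file name and the values below are just examples, not
necessarily what your distribution ships), an entry in "/etc/logrotate.d/" typically looks
like this:

/var/log/messages {
    # rotate once a week, and keep 4 old (compressed) copies
    weekly
    rotate 4
    compress
    # don't complain if the file is missing, and skip it if it is empty
    missingok
    notifempty
}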
Usually, there exists a daily cron job, like for example scheduled like so: 05 6 * * * /usr/sbin/logrotate /etc/logrotate.conf Here, logrotate runs one a day at 06:05, taking its settings from /etc/logrotate.conf => The "/var/log/messages" file is one important one. It contains global (ongoing) system messages. # cat /var/log/messages | more # view the messages logfile # more /var/log/messages # view the messages logfile # tail -f /var/log/messages # View New Log Entries as they are # happening in real-time (using tail -f). # Use Ctrl-C to stop viewing => The "/var/log/dmesg" file contains (among other stuff) messages the system issued at boottime, and some kernel output. # dmesg # view the dmesg logfile # cat /var/log/dmesg | more # view the dmesg logfile => The "/var/log/lastlog" file, "/var/log/faillog" file, and "/var/log/auth.log" file. "/var/log/lastlog" - contains the recent login information for all the users. Use the "lastlog" command to view this log. # lastlog "/var/log/faillog" - contains user failed login attemps. Use the faillog command to view this log. # faillog "/var/log/auth.log" - contains logged auth events, like logins, the use of sudo, remote connections etc.. # grep sshd /var/log/auth.log | less # Here you only are interested # in sshd events, thats why # you grep on that string. # cat /var/log/auth.log | more # View all entries => "/var/log/daemon.log" - contains messages and events from system and application daemons. Note that there usually are multiple files like /var/log/daemon.log.1 and compressed ones after ".1". Just use one of our familiar commands to view the uncompressed files (like "cat", or the "more" command) => The "/var/log/kern.log" (or kernel or kernel.log) - contains kernel messages. ============================================================================ 15. A few words on cron: ============================================================================ Cron is the default scheduler in Unix and Linux. It uses the socalled "crontab" files which define which jobs are called, and the schedules of those jobs. You can think of all types of jobs, like backup jobs, certain print jobs, whatever.. Often, shell scripts are scheduled, but true programs can be scheduled too. Usually, root and some other admin accounts have a number of scheduled tasks. Here you might think of backup jobs, and all sorts of "housekeeping" tasks, like archiving old logfiles etc.. But any account, if authorized, can have it's own crontab. => Using cron For example, suppose you use the account "oracle", and you have logged on to the system, then: - If you want to see your schedule tasks, use the command "crontab -l": # crontab -l # view the scheduled tasks - If you want to edit the schedules, or add/remove jobs, then use "crontab -e": # crontab -e # edit your scheduled tasks When you edit your crontab file, using "crontab -e", vi (or another editor) will start and you are able to alter date/times of schedules, add or remove jobs etc.. If you just list, or edit your crontab file, then typically you will see records like the following example: minute hour day_of_month month weekday command 15 4 * * * /home/harry/bin/maintenance.sh From "left" to "right", the first 5 field simply define the schedule of the command. You see the following fields: -minute (from 0 to 59) -hour (from 0 to 23) -day of month (from 1 to 31) -month (from 1 to 12) -day of week (from 0 to 6) (0=Sunday) A "*" means "all" or "indifferent". 
So in the example above, the shell script "/home/harry/bin/maintenance.sh" is scheduled to
run once, every day, at 04:15 h.

minute hour day_of_month month weekday command
15     4    *            *     1       /root/archive.sh

In the example above, the script "/root/archive.sh" only runs on Monday (day of week=1,
that's Monday), at 04:15h.
Be careful not to use "* * * * *", because that would mean that a job is scheduled to run
every minute, every hour, every day (unless you really meant it to be that way).

See, using cron is really easy. However, you need to be a bit handy with your editor
(usually "vi") in order to add/remove or modify records.

Here is another example:

minute hour day_of_month month weekday command
30     18   7            *     *       /root/maint.sh >> /log/maint.log 2>&1

Here, the script "/root/maint.sh" only executes once, on the 7th day of each month, at 18:30h.
Notice the ">> /log/maint.log 2>&1". It means that standard error (2) is redirected along with
standard output (1), in this case both appended to the /log/maint.log logfile.

=> Starting and stopping cron:

On many distro's, root can use:

# /etc/init.d/crond start   # start crond daemon
# /etc/init.d/crond stop    # stop crond daemon

Usually, there is no need for this since crond will start at boottime, and only in some rare
cases do you need to restart crond.
In some environments, the following can be used to start/stop cron:

# service crond start
# service crond stop

=> Allow a login to use cron:

If you are root, and an account needs to use the scheduler, then add the name to the
cron.allow file. Usually, there exist the "/etc/cron.allow" and "/etc/cron.deny" files.
On some other systems, take a look in /var/spool/cron to locate the cron.allow file.

============================================================================
16. A few words on User Accounts:
============================================================================

On any distribution, graphical tools are available for creating users and groups if you want
to use Xwin environments. Usually, authorisations are granted to groups, where each member of
such a group will inherit these permissions.
Anyway, from the cli, you can use the "adduser" or "useradd" commands in order to create a new
user. You can use "usermod" to alter user properties. The exact mechanics will vary somewhat
between distributions. A few examples:

=> "useradd" and "adduser" to create new (local) users:

# useradd harry             # adds the account with defaults
# useradd -s /bin/bash -m -d /home/harry -c "King Harry" -g root harry

Usually:
-s : Login shell for the user.
-m : Create the user's home directory if it does not exist.
-d : Home directory of the user.
-g : Group name or number of the user.
UserName : Login id of the user.

The "adduser" command will interact with you, so that the system will ask you to provide
values.

# adduser harry
.. informational messages + questions asked...

=> "usermod" to modify an existing account:

# usermod -d /home2/albert albert   # alter homedir
# usermod -e 2012-10-20 albert      # disable account as of 2012-10-20

Many more options are available. When a user is created, it will be known to the system by
its UID. Although you are also able to change a user's UID with "usermod", you should be VERY
reluctant to do so, since all objects that the user created in the past are known to the
system by its UID (user ID) and its GID (group ID).

=> "addgroup" and "groupadd" to add Groups:

Similar to the adduser and useradd commands, these can be used to create Groups.
# groupadd dba => Showing your UID or that of another user: # id # id harry => the "/etc/passwd" and "/etc/group" files Local Users are described in the "/etc/passwd" file. Local Groups are described in the "/etc/group" file. The /etc/passwd is an ascii file, containg a list of the accounts, giving for each account information like the user ID, group ID, home directory, which shell the account uses. Passwords are not stored in /etc/passwd, but traditionally in "/etc/shadow". You can always view the contents of "/etc/passwd", or search for a string: # cat /etc/passwd # cat /etc/passwd | grep -i Albert # grep Albert /etc/passwd Ofcourse, Linux can function as an LDAP Server for central authentication and management of accounts, as well as that it can be integrated in an existing Directory Service. ============================================================================ 17. The Linux standard filesystems: ============================================================================ 17.1 Logical layout of the root "/" filesystem: ----------------------------------------------- When a Linux distribution is installed, some standard filesystems and mountpoints will be created. One of them is the "root" / fileystem. When you take a look at it, using some graphical utility, or using just the "ls" or"ll"command, logically it looks like this. Fig. 1. |-/bin (user binaries) |-/sbin (system binaries) |-/etc (configuration files, rc scripts) |-/dev (device files) |-/var (variable stuff, like logs. Sometimes apps are installed there too) / -|-/usr (user programs) |-/proc (process information, system information) |-/boot (files needed at boot) |-/home (user home directories) |-/root (home dir of root) |-/tmp (place for temp files) |-/lib (system libraries) |-/opt (optional location for programs) Most directories contain a lot of subdirectories as well, so actually it's a whole "tree". A "path" to a certain file, say the file "messages" in "/var/log/", starts from the root "/", then the directory "var", then we need to go to "log". 17.2 Is it a directory in the "/" filesystem, or a seperate filesystem?: ------------------------------------------------------------------------ Note that you cannot know beforehand (by browsing the tree), if "/home" is just simply a directory in "/" (root), or a seperate filesystem mounted on "/home". The same is true, for example for "/var". Well, the question is simply solved if you just use the "mount" command, or the "df -T" command, or simply take a look at the "/etc/fstab" file. All three methods will cleary show you the different filesystems, and where it's mounted on. For users, it's not an interresting question. For admins it is. So, what's the difference anyway? An explanation now follows. If you already know this stuff, it might be boring, and you might skip to the next Chapter. If you want to hear it: read on. Case 1: ------- Let's consider a traditional bare metal install of Linux on a PC. Suppose you have installed a second IDE harddisk, known to Linux as /dev/hdb. If you partition it using fdisk, # fdisk /dev/hdb Then fdisk asks you a couple of questions, and you end up with a partition "/dev/hdb1". Then you create, a filesystem on that partition, in order to make it usable: # mkfs.ext3 -b 4096 /dev/hdb1 Please note that any distribution has it's own tools, so in your case maybe another command must be used. Anyway "mkfs" or "crfs" will allmost always work. Ok, now we have a formatted filesystem (of type "ext3"). It's nearly usable. But not yet. 
Now we make it "alive" by "mounting" it to a "mountpoint".
Suppose I create a directory "/data" (which will be my new mountpoint).
Now let's mount the filesystem: I edit the "/etc/fstab" file (which registers all mounts)
and place a new record in it, like:

/dev/hdb1    /data    ext3    defaults    1 1

(there are more records in fstab, similar to this one).
Now I can simply mount the filesystem, by using:

# mount /data

And it's available. So what happened here? /data looks like a simple directory, which it is,
but it's actually also a mountpoint for a completely separate filesystem.

Case 2:
-------

I know that the following is ridiculously simple, but take a look at this scenario:
I could create a directory "oracle" in "/", so now we have "/oracle". In /oracle, I could
place all sorts of subdirectories and files. I could do something similar in "/data" of
case 1. So what's the difference?

"/data" corresponds to a whole separate filesystem (in this case, on a second harddisk).
"/oracle" is just a directory, within the "/" filesystem.

There is a difference here. You agree?

So, what about /usr, /bin etc..? Are those all separate filesystems on separate partitions,
or are they simply directories within "/"?
The answer is: "/" root corresponds to a filesystem, and with many distributions, almost all
of the directories listed in Figure 1 above, are indeed directories, and not mountpoints
(on which "separate" filesystems are mounted).
Usually, only a few "directories", like "/home", are associated with separate filesystems.
However, with some installations, you might see that /var, /usr, /opt, /home, (and possibly
others), are separate filesystems, using these mountpoints.

Actually, there is nothing really special about it. Most unixes do it "their own way".
For example, in AIX, Solaris and others, "/usr", "/var" etc.. sit in their own filesystems,
each associated with its own partition.

17.3 The "/etc/fstab" file - repository of filesystems and mountpoints:
-----------------------------------------------------------------------

In Linux, the standard mounts are defined in the "/etc/fstab" file. If you have created a new
filesystem, and you want Linux to mount it at boottime, you need to enter a record in that
file. Here are 3 examples:

Example 1:

LABEL=/          /              ext3    defaults              1 1
LABEL=/boot      /boot          ext3    defaults              1 2
devpts           /dev/pts       devpts  gid=5,mode=620        0 0
tmpfs            /dev/shm       tmpfs   defaults              0 0
proc             /proc          proc    defaults              0 0
sysfs            /sys           sysfs   defaults              0 0
LABEL=SWAP-sda3  swap           swap    defaults              0 0

Example 2:

/dev/hda2        /              ext2    defaults              1 1
/dev/hdb1        /home          ext2    defaults              1 2
/dev/cdrom       /media/cdrom   auto    ro,noauto,user,exec   0 0
/dev/fd0         /media/floppy  auto    rw,noauto,user,sync   0 0
proc             /proc          proc    defaults              0 0
/dev/hda1        swap           swap    pri=42                0 0

Example 3:

# tmpfs          /tmp           tmpfs   nodev,nosuid          0 0
/dev/sda1        /              ext4    defaults,noatime      0 1
/dev/sda2        none           swap    defaults              0 0
/dev/sda3        /home          ext4    defaults,noatime      0 2

Note that you usually do not see separate filesystems for /usr, /var etc..
But, usually "/", "/boot", "/home", and swap do "have" their own partitions.

Note:
Here is an example of the standard filesystems in AIX. AIX does not use an "/etc/fstab" file,
but the mounts and filesystems are registered in "/etc/filesystems" which has a similar
function. Note that if we take a look at hdisk0 of rootvg, we can see that /usr and /var have
their own partitions/filesystems (Logical Volumes), and are not simply part of "/". They are
separate filesystems, "simply" mounted on /usr and /var.
# lspv -p hdisk0 hdisk0: PP RANGE STATE REGION LV NAME TYPE MOUNT POINT 1-1 used outer edge hd5 boot N/A 2-48 free outer edge 49-51 used outer edge hd9var jfs /var 52-52 used outer edge hd2 jfs /usr 53-108 used outer edge hd6 paging N/A 109-116 used outer middle hd6 paging N/A 117-215 used outer middel hd2 jfs /usr 216-216 used center hd8 jfslog N/A 217-217 used center hd4 jfs / 218-222 used center hd2 jfs /usr 223-320 used center hd4 jfs / .. ============================================================================ 18. Some remarks about using Linux as a Virtual Machine: ============================================================================ There are quite a few options on running Linux as a Virtual Machine (VM). If we would think in a "practical way", you might devide those options in "Desktop" solutions and "more Business-like" solutions. - An example of a Desktop solution might be a XP or Win7 PC (or Win Server) where a product like "VMWare Player" is installed (or VMWare Server), which makes it possible to run a Linux VM, within Windows. - More business-like solution would for example be an ESXi Host (VMWare) where possibly a large number of Windows- and Linux VM's are running (with more options like centralized Administration of ESX Hosts, and "Live Migration" of a VM to another ESXi Host). Often, these products can be considered to use a real "hypervisor". Some would say that the distinction would be better described with: - A "hosted" environment where a Host OS (Windows or Linux) can "host" one or a few Guest OS'ses. - A true specialized hypervisor kernel on a physical machine (like ESXi or Xen), which supports running many VM's. ---------------------------------------------------------------- For a general description of Virtualization: see the note below. ---------------------------------------------------------------- When you consider a Desktop solution, like a Windows Host OS, it might add a lot of value to your PC: . For example, if you have a Win7 PC or so, and you just want to try or test some Linux distro, you can run such a "Guest OS" without (relevant) modifications to your Host OS. . As another example, maybe your standard Desktop is Windows, but suppose you must write a lot of linux shell scripts, then loading a Linux VM might well be a solution. I agree that it sounds more "right" to run a "Windows VM" from Linux, but a fact is that many folks do it the other way around. -> A few examples of "Desktop Virtualization" Products (for Windows) which can run a Linux VM: 1. VMWare Server : You can run one, or a few Linux VM's from Windows or Linux. However, support has ended. 2. VMWare Workstation : Might be viewed as the succcessor of VMWare Server. 3. VMWare Player : Simple way to run one or a few VM's. However, some advance options are not present, like connecting to vSphere, upload VM's, snapshots etc.. -> A few examples of "Business Virtualization" Products which can run Linux VM's: 4. VMWare ESX(i) : One of the most implemented solutions in business infrastructures. 5. Xen Server : Another popular solution in business infrastructures. 6. Red Hat RHEV : Yet another popular solution in business infrastructures. 7. IBM Power and Z : IBM Z mainframes, and Power Systems (like system p), can run many Linux VM's (lpars). This is list is far from complete ! And please also realize that it's only focused on what virtualization products support Linux VM's. There are many more virtualization platforms, like hpux npars, solaris containers etc.. etc.. 
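As a side note: if you are logged on to a Linux system and wonder whether it actually is a VM
(and on which platform it runs), the commands below might give a hint. This is only a sketch:
the exact strings in the output differ per hypervisor, and "dmidecode" usually requires root.

# dmidecode -s system-product-name   # may show something like "VMware Virtual Platform"
# lscpu | grep -i hypervisor         # newer lscpu versions print a "Hypervisor vendor" line
# dmesg | grep -i hypervisor         # the kernel may have logged "Hypervisor detected: ..."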
When you consider the popular VMWare products, among other files, typically you would see these files: - .vmx file : This one stores configurations with respect to the VM. - .vmdk file(s): This is a virtual disk file, which stores the contents of the virtual machine's hard disk drive. There are other files as well, like a .lck (lock file) and .log files and others. For some Linux distributions, you can download a vmx file and vmdk file from the internet, and instantly create a VM in a Test/Play configuration, using "VMWare Player" or "VMWare Workstation". Note: Types of Virtualization: ------------------------------ To describe the virtualization techniques in general, most people use the following classification to describe the business solutions: => Full Virtualization: An unmodified Operating system, like Linux, can be installed in a VM. The OS is "hypervisor unaware". In short: it "thinks" that it runs on a true physical machine. The hypervisor needs to intercept driver actions and all calls to hardware. Also, the hypervisor emulates the whole bunch (like the BIOS, all hardware). Often a process called "binary translation" takes place by the hypervisor. In general it might be percieved that there is a lot of overhead at the hypervisor level. => Para virtualization. A modified OS (with respect to kernel and drivers), like Linux, can be installed in a VM. The OS now is "hypervisor aware" and makes use of the API of the hypervisor. In short: the guest OS "knows" that it's running in a VM. The overhead on the hypervisor is less when compared to (traditional) full virtualization. - For Linux VM's, the socalled paravirt-ops code ( pv-ops) was included in the Linux kernel as of the 2.6.23 version. - Nowadays, for Windows VM's, Windows either need the socalled "enlightened" drivers, or the "Virtualization Vendor" wants you to install it's own paravirtualized drivers in Windows (after Windows was installed), or Windows runs under full virtualization (HVM, see below). => Hardware assisted virtualization (often called "HVM"): Instead of binary translation of calls of the VM, like in use with the traditional "full virtualization", the hypervisor (or VMM) now hands off the effort to hardware capable to simulate a complete hardware environment. The OS may be unmodified, just like with traditional full virtualization. This type of virtualization is often considered to be "a special case" of full virtualization. Hardware-assisted virtualization requires explicit support in the host CPU, and in the case of Intel, the socalled "VT" (VT-d, VT-x) capabilities needs to be present. Often Hardware assisted virtualization is percieved as the fastest virtualization technology, however many tests seem to point to the fact that the gains are not spectecualr over software based emulation (for now). ============================================================================ 19. A few words on Linux VM's under Xen and XenServer : ============================================================================ This section only serves to provide for a very, very, light-weight impression of Xen/XenServer. - Xen: Xen is the fundamental (and original) hypervisor. It main purpose is to support VM's and to provide for isolation between them. Xen is open source (GPLv2) and is managed by Xen.org. Later, Citrix maintained their own "evolution", called XenServer. - Citrix XenServer: Citrix XenServer comes as a free edition, and as a commercial one with support. XenServer still includes "Xen", the fundamental hypervisor. 
A Xen host will run VMs, which are called "Domains". The first one that boots, is called "Dom0" (Domain 0") and enables you to control the other VM's. Any other VM is unprivileged, and are known as a "domU" or "guest". Dom0 is a Linux machine, to which you can logon using the "console" or using a remote connection. Many Linux distributions are suited for Dom0 like Debian, Fedora, OpenSuse. From Dom0, you can use specific commands to create, start, stop, list VM's, as well as other commands. Grapical tools are available as well. The other VM's usually runs Linux distributions or Windows Server editions. There were some efforts done to split "Dom0" into Dom0 and "driver domain" which are is unprivileged Xen domains that has been given responsibility to a specific piece of hardware. We will not discuss that further in this simple note. Traditionally, the Xen or XenServer architecture resembles the following figure: Fig. 2. --------- ------------------------------------------ |Console| | |Dom0 | |DomU | |DomU | | | |---- |---|Linux | | | | | | --------- | | | |Guest OS| |Guest OS| | | |XAPI | |VM1 | |VM2 | | | |Xend |----| | | | | | | | | | | | | | |-------| |--------| |--------| | | |drivers| |Virt Drv| |Virt Drv| | |----------------------------------------- | XEN hypervisor | ------------------------------------------ | HARDWARE | ------------------------------------------ From Dom0, the commandline can be used to control the other Domains. Formerly, from Xen, the "xm" commandset was available. However, as of XenServer on, you should use the "xe" commandset. Usually, Volume groups are created from phyical volumes (disks). Once the Volume Groups (VGs) exists, Logical Volumes (LVs) are created, which will form the filesystems for the Virtual Machines. => Some examples of (older) the xm commandset or "Xen commandline user interface" # xm [OPTIONS] Where a few examples of the "subcommand" is listed below (like shutdown), and to specify on which domain the command is in effect, the "domain-id", or the "domain name" should be used. # xm list # list all VM's (Domains) on this Host # xm console domain-id # connect to a DomU # xm create testdom [-c] # creates a DomU (with per default file in /etc/xen) # xm create Fedora4 # create a VM # xm shutdown Fedora4 # shutdown a VM # xm suspend 166 # suspends the domain with id 166 # xm resume 166 # resumes the domain with id 166 # xm destroy Suse5 # remove a VM The list command for example can show you a list similar to: Name ID Mem(MiB) VCPUs State Time(s) Domain-0 0 98 1 r----- 5068.6 Fedora3 164 128 1 r----- 3.2 Fedora4 165 128 1 ------ 0.6 Fedora5 166 128 1 -b---- 3.6 Suse10 168 100 1 ------ 1.8 => Some examples of (newer) the xe commandset or "XenServer commandline user interface" The syntax of XenServer xe CLI commands is: # xe command-name argument=value argument=value ... A few examples: # xe vm-list # list all VM's (Domains) on this Host # xe vm-start vm=name # start this VM # xe vm-destroy uuid=UUID # remove the VM ============================================================================ 20. A few words on Linux VM's under VMWare : ============================================================================ This section only serves to provide for a very, very, light-weight impression of VMWare ESX/ESXi and Infrastructure. 
20.1 Birds-Eye view on Host OS'ses, and "Management" infrastructures:
---------------------------------------------------------------------

=> Host Operating system: VMWare ESX or ESXi

ESX(i) refers to the Host Operating System, which is the hypervisor and all supporting
software on a physical machine, that makes it possible to run VM's on this Host.
Multiple ESX(i) Hosts can form a Cluster.
There are a "few" differences between the ESX and ESXi Operating systems, which will be
addressed below.

=> Management frameworks for ESX and ESXi

ESX:
In a larger ESX environment, which may have many ESX Hosts, you will often find that the
"VirtualCenter" Management Server is implemented, which is a Win2K3 Server with a SQL Server
based repository. Windows AD authentication is possible, which determines your permissions
in the infrastructure.
VMWare Admins may use a Windows based "VI client" to connect to the VirtualCenter and
create VM's, stop/start VM's, change resources and the like.
The Admins can also ssh to any ESX Host, and use a command line interface to manage VM's.

ESXi:
Quite similar to the above, but the Management framework is improved and renamed to "vSphere",
using the "vCenter Management Server".
VMWare Admins may use a Windows based "vSphere client" to connect to the vCenter Management
Server and create VM's, stop/start VM's, change resources and the like.
The Admins can also ssh to any ESXi Host, and use a command line interface to manage VM's.

20.2 ESX and ESXi:
------------------

- ESX

ESX (on a bare metal Host) actually uses a slightly modified Linux OS (for boot), which is
called the "Service Console" or "Console Operating System (COS)".
This console uses the Linux kernel, device drivers, and other usual software like init.d
and shells. This Service Console (a full Linux machine) can be seen as the primary environment
for one ESX Host. You can for example just ssh to it, and (for example) also find the usual
user accounts such as root and other typical Linux accounts.

During boot of the Service Console (Linux), the VMWare "vmkernel" starts from "initrd" and it
starts "to live" for itself, while Linux boots further on.
The vmkernel with all supporting modules can be seen as the "Hypervisor" which is used for
"virtualization". It's important to note that vmkernel is not "just" a Linux kernel module.
After a full boot sequence, the state of affairs is that the vmkernel may be viewed as *the*
kernel, while the (Linux based) Service Console may be seen as the first Virtual Machine on
that ESX Host. The Service Console then can be viewed as the "management environment" for
that ESX Host.
It's indeed obvious that a VMWare Admin can ssh to the Service Console, and perform CLI
management of that ESX Host.

- ESXi

ESXi can be viewed as the successor of ESX. ESXi is a much smaller, and faster loading, system.
First of all, it does not use a Linux boot. So the (Linux) "Service Console" does not exist
anymore. Many people view ESXi as almost just firmware, and the change from ESX is indeed
quite large.
ESXi is essentially a vmkernel microkernel with loaded modules for supporting services, and
there is no binding with a "Console Operating System" or "Service Console" as was the case
with ESX.
This "lighter" (smaller footprint) environment is considered to be faster and more secure
(smaller attack surface).
It's still possible for a VMWare Admin to ssh to the VMkernel environment, and perform CLI
management of that ESXi Host (a few example commands are shown below).
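Just to get a feel for such CLI management on an ESXi Host, a few example commands are shown
here. This is only a sketch: the available commands differ per ESX(i) version (see also
section 20.3), and the vmid "42" is just a fictitious example.

# vim-cmd vmsvc/getallvms            # list the registered VM's, with their vmid's
# vim-cmd vmsvc/power.getstate 42    # show the power state of the VM with vmid 42
# vim-cmd vmsvc/power.on 42          # power on that VM
# esxcli system version get          # (ESXi 5.x) show the ESXi version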
Release: Year: Management Frmwrk VMware ESX Server 1.x 2001 VMware ESX Server 2.x 2003 VMware ESX Server 3.x 2006 VirtualCenter VMware ESX Server 4.x 2009 vSphere VMware ESXi 3.5 2008 vSphere VMware ESXi 4.0 2009 vSphere VMware ESXi 5.0 2011 vSphere The following dummy figure might help in understanding the ESX architecture. Fig. 3. ----------------------------------------- | ESX Host | -- ssh connection | | to Linux Console | ------ ------ | | agents | VM | | VM | | -- Agents communicate | ---------------------- | | | | | with VirtualCenter | |Linux | ------ ------ | or vCentre | |System Console | | | | |------------------ | |-kernel | VMKERNEL | | |-syslog | vmklinux | | |-accounts like root | | | |-some typical | | | | filesystems like | | | | /etc, /var/log |--------- | | |-drivers | | | |--------------------------------------| | CPU | | HARDWARE Memory | |----------------------------------------| 20.3 Command Line interfaces: ----------------------------- It can be quite confusing to understand and have a good overview of all CL interfaces as they emerged (and depreciated) with each new ESX or ESXi version. Here are a few examples A few typical cli's for an ESX v3 environment are: vmkfstools : used mainly for disk/ file system management vmware-cmd : used mainly for VM operations esxcfg-* : used mainly used to configure ESX (or use the graphical VI client or vSphere client) A few typical cli's for an ESX v4/v5 environment are: vim-cmd : used mainly for VM operations esxcli : used mainly used to configure ESXi If using a modern vSphere infrastructure, a VMWare Admin might also use "Powershell" for management of ESXi and Virtual Machines. Presently, CmdLets are available for that purpose. ============================================================================ 21. A few words on LVM, SAN, and filesystems: ============================================================================ 21.1 Get current info of diskdevices: ------------------------------------- From section 4, we have seen some commands to retrieve disk information from the system. Here a few commands are listed again: # fdisk -l # or use it like "fdisk -l /dev/sda" # lshw -class disk # sfdisk -l # cat /proc/scsi/scsi # lsscsi -c # has some interesting switches like # -l (long), -d (show magic numbers) # ls /sys/class/scsi_host # you might see output like # host0 host1 host2 host3 # lsblk -f # if available on your system, # it shows a tree of partitions # and filesystem types Especially "lsscsi" might be usefull, since it shows information of devices as ATA, SCSI, Fibre channel (FC), iSCSI and the like. (you might need to install it first) 21.2 Filesystems: ----------------- Once a new local disk is reckognized by the system, or what's more likely, a LUN on a SAN was made available, you then need to create a "filesystem" on that device before you can "mount" that filesystem and make it ready for use. You know that a "filesystem" is associated with any sort of storage device, like a physical disk. When you create a filesystem on a disk, the OS will organize it in allocation units, create specific areas for metadata (that is: sort of "bookkeeping" data structures), it does some sort of integrety check, and ultimately, makes it available for storage of files. There are many types of filesytems, where most have similar properties. But the newer ones have often much more extended support (for example for storing very large files, and supporting large partitions). 
Although all filesystems are quite similar in their basic functionality, some filesystems
are better at some specific characteristic.

- For example, a certain filesystem might be better at "journaling" (a sort of logging)
  than others.
- Or, for storing certain databases, you might have a preference for some filesystem
  over others.
- As another example, there are also filesystems which are better equipped for "multi-node"
  access (clustering) than just the "normal" filesystems.

So, in most Operating Systems, you have a lot of choice.
Using the mount command, or df -T, you can quickly view which filesystem types are mounted
on your system.

# df -T
# mount

The most popular filesystems found in Linux are:

ext2, ext3 and ext4
XFS
JFS
ReiserFS

And in certain cases, you might want to use special filesystems from certain Vendors, like:

OCFS      - Oracle Clustered Filesystem, might be preferred when using Oracle clusters
OCFS2
GPFS      - General Parallel File System from IBM, to be used in clusters.
VxFS      - from Veritas
GFS       - from RedHat
FAT,NTFS  - Microsoft

21.3 Create Filesystems:
------------------------

=> Local disk:

So, in general, suppose you get a new local disk. Then what?
Suppose you have "/dev/sda" as a new local SCSI disk. Then you would follow an approach like:

# fdisk /dev/sda             # use fdisk to create partitions
                             # like sda1, sda2

Then create a filesystem on, for example, sda1.
If you want ext2, and mke2fs is available, you could do this:

# mke2fs /dev/sda1 2048256

or like so:

# mkfs -t ext2 -b 4096 /dev/sda1

or, if you want ext3, you could do this:

# fdisk /dev/sda
# mkfs.ext3 /dev/sda1

Then you mount the new filesystem on a suitable mount point.

# mount /dev/sda1 /data

=> LUN from a SAN:

So, in general, suppose you want to use a new LUN from a SAN. Then what?
Suppose it's an FC SAN. You'll need to install an FC card in your Linux box and load the
appropriate driver. Then you'll need to configure the SAN gear to export LUNs, that is:
create the LUN, do the zoning and mapping (usually done by a Storage Admin).
Once that's done, your Linux box should see the LUNs as Linux /dev/sdXY devices
(just like a SCSI disk) that you can make filesystems on and mount as usual.

21.4 Block and character devices:
---------------------------------

If you take a look in the "/dev" directory, and make a listing of the special files over there,
you might notice the "b" or "c" in front of the filemode or permissions, like "brw-rw-rw-"
or "crw-rw-rw-".
One thing to understand is that these files are NOT the drivers for the devices.
They are more like "pointers" to where the driver code can be found in the kernel.

For example:

brw-rw-rw-  2 bin  bin  2,64  Dec 8 20:41  fd0

Here the "magic numbers" 2,64 tell you at which address the driver for "fd0" can be accessed.

But what does the "b" or "c" tell us then? It shows us whether the device is a "block" device
or a "character" device.
A character device (file) is something that just gives a stream of characters that you read
from or write to. A block device (file) is something that uses whole blocks for reading and
writing (via the cache), and that's why it is used for disks.

21.5 Detecting new disk devices, and bus-scanning:
--------------------------------------------------

If, for example, a new LUN is made available, how do you make Linux detect the device?
A reboot usually scans all busses, but that's often not an option.
Listing disks by using for example "fdisk -l" often does not help.
The following might help:

(1):

On some distributions, you might find a script that will try to scan for new devices.
For example, on RedHat, the following script exists:

rescan-scsi-bus.sh

On your specific system, a similar script might exist too.
In general however, there usually are some limitations in using such scripts, so you need
to check the documentation of your system.

(2):

To initiate a SCSI bus rescan, type either:

echo "1" > /sys/class/fc_host/hostX/issue_lip

where X stands for the (FC) host you want to scan. The command is essentially a bus reset,
so do not use it on a busy system.

or use:

echo "- - -" > /sys/class/scsi_host/hostX/scan

where again, X stands for the host (adapter) you want to scan. The command is essentially
a full rescan of devices on that host.

Depending on driver version, and kernel version, you might just need one command,
or maybe even both.
Usually, the scan commands are followed by a setup utility from the SAN manufacturer,
like for example "powermt config".

In many older situations, the /proc filesystem was used to communicate with the kernel
and drivers, like in for example:

echo "scsi-qlascan" > /proc/scsi/qla2xxx/X

where X again is the host number.

Whichever is appropriate for your specific system is of course a bit hard to tell.
I am afraid you need to spend some time investigating.

============================================================================
22. Some remarks on how to autostart daemons on boottime:
============================================================================

When considering the question of how you can enable or disable the start of daemons (services)
on Linux, a whole bunch of answers exists, which is also dependent on your particular
distribution.

Two "classical" solutions are:

1. Use the rc scripts:
----------------------

In general, the following action should do the trick:
To start a program automatically, go to the directory "/etc/init.d" and create a script that
starts the program. Then make a link in "/etc/rc.d/rc3.d" that points to that script.

So, suppose that you want, say, "apache" to start at boot time.
Suppose you created a script "httpd" in "/etc/rc.d/init.d".
Usually, such a script can take one of two parameters, namely "start" and "stop".
Now, create a symbolic link in the /etc/rc.d/rc3.d folder, with a name like for example
"S80httpd", that is linked to /etc/rc.d/init.d/httpd

#cd /etc/rc.d/rc3.d
#ln -s ../init.d/httpd S80httpd

Of course keep in mind that scripts need to be executable, so set the filemode as appropriate.

Note that I created a link in "/etc/rc.d/rc3.d", because the default runlevel often is "3".
But some distributions use a default of 2, and some even treat 2-5 as being the same.

2. Use the "rc.local" file:
---------------------------

Most Linux distributions use the "rc.local" file, whose content will be executed at the end
of the system initialization. This file is a bit like the "last minute" multi-user startup file.
In "rc.local" you can put the desired startup commands.
Usually, rc.local can be found in "/etc", or otherwise try "/etc/rc.d". If you don't have it,
you might try:

# touch /etc/rc.local
# chmod 700 /etc/rc.local

Then put the command lines to start daemons (or services) in /etc/rc.local.
For example, you could place the following record in rc.local:

/etc/rc.d/init.d/httpd start

The "rc.sysinit" file:
----------------------

You should not use the "rc.sysinit" file.
It's more oriented for setting system parameters and settings, like network parameters, possibly swap, and many more... When init awakes it will parse the following: init -> reads the inittab (or init) file -> runs /etc/rc.d/rc.sysinit -> runs the rest of /etc/inittab -> inittab contains default runlevel: init runs all processes for that runlevel /etc/rc.d/rcN.d/ , -> runs /etc/rc.d/rc.local Usually, from inittab, init executes /etc/rc.d/rc.sysinit in new subshell. ============================================================================ 23. Some remarks on how to restart daemons on a running system: ============================================================================ In order see the status of all scripts and services in your system, use # chkconfig # chkconfig --list # service --status-all or to see the status of just one: # chkconfig --list # chkconfig | grep # ps -ef | grep -i # service --status-all | grep If you need to stop, or start, or restart a service or daemon on Linux, the following examples might help. Examples: => Red Hat and friends: # service nfs restart # service nfs start # service nfs stop Or, this works too: # service network restart # command similar to above #/etc/init.d/network restart # equivalent => On many other Linux distros: # /etc/init.d/nfs start # /etc/init.d/nfs stop # /etc/init.d/nfs restart # /etc/init.d/networking restart # /etc/init.d/networking start # /etc/init.d/networking stop ============================================================================ 24. Some SAN and SCSI talk: ============================================================================ 24.1 Some SCSI terminology, and how it's called in Linux: --------------------------------------------------------- It does not matter too much whether scsi commands are "encapsulated" in frames like for example used with an FC infastructure (using switches/directors), or with FCoE or iSCSI SAN, compared to the traditional local SCSI adapter: much terminology is the same. Lets take a look at a "traditional" locally installed SCSI adapter (HBA). - An adapter might have one or more "channels" or SCSI busses. So, in the case of multiple channels, each channel is it's own individual SCSI bus. See figure below. - Each SCSI bus can have multiple SCSI devices connected to it. - In narrow SCSI, we can have up to 8 SCSI devices, each identified by it's unique SCSI ID (0-7), where traditionally, the HBA takes ID 7. - In SCSI wide, we can have up to 16 SCSI devices, again each identified by it's unique SCSI ID (0-15). To illustrate this a bit, see the figure below. ---------------- scsi id7 ADAPTER or | channel/bus CONTROLLER ||----------------------------|--------------------|---------- | --------- --------- | [scsi id 2] [scsi id 3] | --------- --------- | |--lun0 |--lun0 | channel/bus |--lun1 |--lun1 ||------------- |--lun3 | ---------------- Now, suppose we have a SCSI device on the bus, for example with SCSI ID 2, which happens to be a CD Tower, using multiple CD Drives. Luckely, subadressing exists, so that the individual drives of this tower can be accessed. The devices, which reside under a certain SCSI ID, are called "Logical Units", identified by their "Logical Unit Number" or "LUN". Now, in this example we used a CD Tower, but it also can be some diskarray. To get to some LUN, the following "path" or full adress must be used. The list below shows the standard SCSI talk, and Linux talk. 
SCSI:                   Linux terminology:

SCSI adapter number     [host]
channel number          [bus]
scsi id number          [target]
lun                     [lun]

=> So, in SCSI language, a "path" to a LUN would be:

scsi_adapter, channel, scsi id, lun

=> Using the naming conventions of Linux, this becomes:

host, bus, target, lun

Later, when we use commands like "lsscsi", or take a look in "/proc/scsi/scsi", we will see
output like [4:0:1:0], which is exactly the same as [host,bus,target,lun], or in SCSI terms,
[scsi adapter#, channel#, scsi id, lun#].

Some remarks about "Initiator" and "Target":

Suppose we have a PC using a SCSI card, where on one bus some SCSI disks are present.
Now, suppose some application does a system call for some "file open". The OS will handle that,
and ultimately, a driver takes care of the details.
Anyway, the SCSI card gets a request from the driver. When we then consider the processes on
the SCSI bus, the SCSI card acts as a controller called an "initiator". This one usually starts
"the conversation" using SCSI commands.
The "target" then, is one of the storage devices (like a SCSI disk) on the bus.

When communication is performed between controllers and targets (and thus involving disks),
typically the elements in transfer are "block address spaces" and "datablocks".
That's why people often talk about "block I/O services" when discussing SANs.

24.2 LUNs on SANs:
------------------

A so-called traditional LUN on a SAN might come into being like this:
The Storage Admin selects a couple of disks and creates a RAID volume from those disks.
At this point, a "Logical Unit" might be created.
Then, the managing software for that SAN will associate a "number" with that "Logical Unit",
called a LUN. Please note that the "Logical Unit" might be seen as usable diskspace, once a
"client" is able to "see" it. The LUN identifies this LU in the storage system.

After some additional actions and zoning, the LUN in principle can be "seen" from the
authorized client/initiator on the channel.
However, on the client side, often some re-enumeration has to be done in order to see the
new LUN as a disk.

- Sometimes it's a facility of the OS, which can be called at any time. People then often say
  that "scanning the busses" needs to be performed.
- Sometimes a special signal needs to be sent to some module in the driver software stack.
- Unfortunately, in some rare cases, only a reboot will help, since in some cases, only then
  will NVRAM or BIOS routines rescan the busses.

24.3 Initiatives to (try to) unify driver stacks to access SANs:
----------------------------------------------------------------

Multiple SAN Vendors exist. Although the protocols and interfaces are quite well defined,
the supporting software (like drivers) for Linux is another matter.
The risk exists that the Linux community would face a jungle of drivers, ways to implement
software, and numerous Vendor-tied issues.
Initiatives were taken to try to structure the ways to set up and maintain the needed software.
Two main initiatives are:

=> STGT/TGT: Linux SCSI target framework

TGT tries to simplify various SCSI target driver creation and maintenance, using iSCSI,
Fibre Channel, SRP, and others. The framework encompasses kernel-space and user-space code.
The idea is that newer Linux kernels (as of 2.6.20) would/should be equipped with the
supporting kernel code, so that only userspace code needs to be installed.

=> LIO Target

LIO Target is another multiprotocol SCSI target for Linux.
This one too supports all modern protocols like iSCSI, Fibre Channel, InfiniBand (SRP),
and a few other architectures. The same idea as with TGT applies here too.
LIO went "upstream" into Linux with kernel version 2.6.38, and has become the standard
unified block storage target in Linux.

So, it seems that the Linux Community has favored the LIO framework a bit, thereby not
excluding any other target framework of course. However, TGT went upstream as of 2.6.20.
So, users can choose which framework serves them best for a certain situation.
However, it's not all "whiskey and sunshine" here. Some distributions seem to have their
own worries. Anyway, there are packages available for LIO, STGT and others like IET.

============================================================================
25. Some notes on Backup/Restore:
============================================================================

A few notes on Disaster Recovery (DR) for Linux. In this context, we mean a proper way
to recover the OS.

25.1 "Simple" backups are easy, but a DR solution is not:
---------------------------------------------------------

Of course, a number of "archiving" or backup tools are available as standard in Linux,
like tar, cpio, dd and a number of others.
So, using these tools, you can "backup" a file, or a number of files, or a directory,
or a whole directory tree, to another location.

So, suppose you have a lot of stuff in, say, "/apps"; it is possible to create a tar file
(containing the whole of /apps) on a backup location, for example some NFS mount, or tape,
or just another filesystem on your machine.

Example:

# tar -cvf /backups/backup_apps.tar /apps

Here, we create the tar (backup) file "backup_apps.tar" on the nfs mount "/backups",
and it contains the whole of "/apps".

Or, a bit smarter, to compress the backup file as well:

# tar czf /backups/backup_apps.tar.gz /apps

Note: If active database files are alive in some subdirectory of /apps, then they will not
be backed up in a usable state. Creating backups using the standard tools (tar, cpio etc..)
expects static or "cold" files.

With respect to using the standard tools like tar, cpio, etc.., they will NOT enable you to
create a proper "Disaster Recovery" solution for recovery of your whole OS environment.
However, if you are familiar with "dd", you can.
Although you can backup directories, or whole filesystems (and raw partitions), as a part of
disaster recovery you need to backup the root filesystem as well, and have a way to boot from
media, so you need "something" that holds your MBR and other boot areas as well.
Also, even large third-party suites like TSM and many others will help in creating good,
up-to-date backups of filesystems, but usually will not be of help in creating a product
that is immediately usable when the boot disk is corrupt.

But these really are "bare metal" problems. If you run VM's under Xen or VMWare,
DR solutions are around the corner.

25.2 Using VM's under a Virtualization Product (like ESX): DR is relatively easy:
---------------------------------------------------------------------------------

If you run Linux (or Windows) Virtual Machines under ESX, it's relatively easy to backup such
a VM (the whole system). In an ESX Infrastructure, you can create a "snapshot" of a VM, which
means you get the system disk in a file, which you can easily import again in case the "live"
system goes bad.
Also, since a VM is basically a .vmdk file in some datastore, it is easy to copy it to a
backup location.
Here you have all the needed filesystems like "/" and "/boot", just stored "magically" in that
vmdk file, while the ESX Host environment takes care of the conditions under which the VM
will boot.
So, usually a good DR compliant solution is available under virtualization.

25.3 Using Bare Metal:
----------------------

On a standard Linux distribution on a physical machine (bare metal), you can create backups
of files, directories, and filesystems, but you cannot easily create a "single component"
DR solution with the standard tools alone.
Usually, you can easily backup data directories (like /apps), but making a good image of the
Operating System is often not simple.
Here we mean: you have a physical machine. Then, your boot disk goes bad. Now, you want to
apply a "one component" solution, by which we mean restoring MBR, grub, filesystems etc..
In short: the whole lot.

There exist commercial and Open Source tools which can help you out.
In the Open Source realm, you might take a look at the features of:

- Mondorescue
- Rear: Relax and Recover

Both are able to backup a complete bootable system to local media like CD-R, DVD and the like,
or to a network mount from NFS.
Often, the commandline operation of those tools is quite complicated, but I think it's worth
the effort if your IT operations use important bare metal machines.
The charm of these tools is that you can create usable OS images on NFS mounts. So if you have
tens of bare metal machines, these tools are certainly of interest.

If you are only interested in backing up the OS of your PC, or a few machines, to CD-R/DVD
or USB stick, and to be able to boot from these media, much simpler solutions exist.
Thousands of good articles can be found on the Internet.

25.4 Document your Servers:
---------------------------

Whether you use a virtualized environment or not, having technical documentation about each
server is very important.
For your important servers, why don't you save the output of some important commands to a
.txt file, like for example:

cat /proc/scsi/scsi         >> /home/admin/serverdoc.txt
ls -al /sys/class/scsi_host >> /home/admin/serverdoc.txt
df -h                       >> /home/admin/serverdoc.txt
cat /etc/fstab              >> /home/admin/serverdoc.txt
cat /etc/exports            >> /home/admin/serverdoc.txt
raw -qa                     >> /home/admin/serverdoc.txt
fdisk -l                    >> /home/admin/serverdoc.txt
cat /proc/partitions        >> /home/admin/serverdoc.txt
cat /etc/inittab            >> /home/admin/serverdoc.txt
ls -al /boot                >> /home/admin/serverdoc.txt
cat /boot/grub/grub.conf    >> /home/admin/serverdoc.txt
cat /etc/*release           >> /home/admin/serverdoc.txt
uname -a                    >> /home/admin/serverdoc.txt
cat /etc/group              >> /home/admin/serverdoc.txt
cat /etc/passwd             >> /home/admin/serverdoc.txt
ifconfig                    >> /home/admin/serverdoc.txt
cat /etc/resolv.conf        >> /home/admin/serverdoc.txt
crontab -l                  >> /home/admin/serverdoc.txt
lsmod                       >> /home/admin/serverdoc.txt
etc..

I am sure you can "improve" this example easily!
This way, you have important config info of a server in just one simple txt file.
Then you could also script and schedule it. Then, from all your servers, you can collect those
files into some repository. This then means you have pretty good technical documentation of
all your Linux machines.

============================================================================
26. Recovering the root password:
============================================================================

Suppose you don't know, or forgot, the root password of your Linux installation.
Maybe the following will help:
1. Trivial option: maybe by using sudo?
---------------------------------------

If sudo is implemented, and your account is authorized, you may try:

$ sudo su -

Sudo then asks for YOUR password. Next, you enter root status, where you can change the
root password.

# passwd

OK, it's very unlikely that this works, but it's worth a try if you knew that sudo was
implemented.

2. Booting to single user mode:
-------------------------------

As you might recall from Chapter 11, the "inittab" file defines the possible "runlevels"
the system can enter at boot time.
Usually, runlevel 3 is the default, meaning regular multi-user access. Runlevel 1 is the
so-called "single user" mode, especially meant for specific maintenance purposes, where user
access is not desired. "Single user mode" is sometimes also denoted by "emergency mode".
When the system boots to single user mode, it should boot to the "#" prompt, from where you
can change the password.

On some systems, Grub will present "single user mode" as one of its menu options.
On other systems, you need to perform some actions when Grub has presented its boot menu.
Usually, it goes like this. However, there is no guarantee that it works for your specific
system.

When Grub presents its menu:

1. go to the entry you want to modify (the Linux option)
2. use the "e" key to edit this entry
3. find the kernel line
4. at the end, add: single   or   single init=/bin/bash  (see note)
5. use Esc to go back
6. press "b" to boot to single user mode
7. after a short time, hopefully a root shell appears.
8. Here you can use the passwd command

Note:
If the shell wants you to log in, you might try to append "single init=/bin/bash" on the
kernel line at step 4, instead of just "single".

3. Using a Rescue or Live CD/DVD:
---------------------------------

Boot up from CD/DVD. Suppose that your harddisk based Linux has the root filesystem on
/dev/sda2. Suppose that the booted live DVD has a tmpfs on /tmp.

1. Mount the (harddisk) root partition in a directory, for example:

# cd /tmp
# mkdir mnt
# mount /dev/sda2 /tmp/mnt

2. Bind the current /dev with the would-be root:

# mount --bind /dev /tmp/mnt/dev     # either 2 or 1 "-"

3. chroot into the harddisk root filesystem:

# chroot /tmp/mnt /bin/bash

Now you can use:

# passwd root

And reboot the system, this time from harddisk.

============================================================================
27. A few notes on checking/installing driver or modules of SAN HBA cards:
============================================================================

* IMPORTANT:                                                               *
* Here you will find some general information only.                       *
* It does not contain exact instructions on how to connect your system to *
* SAN LUNs and how to configure them in a correct way.                    *

In the former chapters, we have seen some commands to see which kernel modules are present
(lsmod, modprobe), and what hardware is installed (lspci, lshw, lsscsi).
Of course you can use those commands to check out which SAN FC HBA card(s) are in your system,
which kernel modules are loaded, and what device info can be found in "sysfs" (/sys)
and "procfs" (/proc).

Note that for access to SANs, three main techniques are used:

- (Traditional) Fiber Channel infrastructure, where fiber is used, and switches/directors
  connect hosts to SAN storage.
- Fiber channel over Ethernet (FCoE), where FC is encapsulated in network packets.
  In effect, a network is used.
- iSCSI, which is essentially SCSI over a true IP network infrastructure
  (see the short iSCSI example right after this list).
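Purely as an illustrative sketch of the iSCSI case (the package names and the target IP address
192.168.1.50 are just examples, and details differ per distribution): with the open-iscsi
initiator tools, discovering and logging in to an iSCSI target might look like this.

# yum install iscsi-initiator-utils                      # or: apt-get install open-iscsi
# iscsiadm -m discovery -t sendtargets -p 192.168.1.50   # discover targets on that portal
# iscsiadm -m node --login                               # log in to the discovered target(s)
# cat /proc/scsi/scsi                                    # the LUNs should now show up as
                                                         # ordinary scsi disks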
Most of what you will see implemented are the traditional FC fiber infrastructures and
iSCSI networks.
There is a difference between FCoE and iSCSI, although in both cases a network is used.
With FCoE, the true FC protocol stack architecture is carried by the Layer 2 network (Ethernet).
With iSCSI, just SCSI commands are encapsulated in an IP network.

With a (Traditional) Fiber Channel infrastructure, each HBA has a unique "World Wide Name"
(WWN), which is similar to an Ethernet MAC address, so that this card can be uniquely
identified. WWNs are 8-byte strings. There are two types of WWNs on a HBA: a node WWN (WWNN),
which can be shared by all ports of a device, and a port WWN (WWPN), which is necessary to
uniquely identify each port.

27.1 Check to see if a Fibre FC card and driver is installed:
-------------------------------------------------------------

--> Check for kernel modules and hardware

# lsmod | grep -i lpfc       # Check if the Emulex adapter driver module is loaded
# lsmod | grep -i qla        # or Qlogic
# lsmod | grep -i scsi       # or just see what SCSI related modules are there

The adapter-type is often either qlaxxxx for QLogic adapters, or lpfc for Emulex adapters.
But many others are used as well.

# lspci | grep -i fibre      # grepping on keyword "fibre"
# lspci | grep -i emulex     # since Qlogic/Emulex is often used,
# lspci | grep -i qla        # you might try that as well

# dmesg | grep -i fibre      # recall that dmesg (among other things)
# dmesg | grep -i emulex     # can be used to view kernel boot messages;
                             # also, after boot, diagnostic messages can be viewed

--> using /proc (procfs):

# ls -al /proc/scsi/
# cat /proc/scsi/scsi        # To display the SCSI devices currently attached
                             # (and recognized) by the SCSI subsystem
# ls -al /proc/scsi/lpfc

--> using /sys (sysfs)

Recall from chapter 24, how the hierarchy of hosts, busses, targets and luns is used.

SCSI:                   Linux terminology:

SCSI adapter number     [host]
channel number          [bus]
scsi id number          [target]
lun                     [lun]

# ls -al /sys/class/scsi_disk    Might show you luns in the form of paths [host#:bus#:target#:lun#]
# ls -al /sys/class/fc_host/     Might show you dirs like host0, host1 etc..
# ls -al /sys/class/scsi_host    Will show you hostN's too (N=0,1 etc..)

=> Why is that? Why two trees?

An FC port can be based on the "true" physical port. But the newer FC protocols also allow for
so-called virtual ports. Using the so-called "N_Port ID Virtualization" (NPIV) mechanism, a
point-to-point connection to a "Fabric" can be assigned more than 1 N_Port_ID.
So, usually, the driver will create a new scsi_host instance on the vport, resulting in a
unique namespace for the vport.
The result is that in all cases, whether an FC port is based on a physical port OR on a virtual
port, each will appear as a unique "scsi_host" with its own target and lun space.

# ls -al /sys/class/fc_transport  Might show you dirs like target1:0:0, where 1 is the host,
                                  0 is the bus, and 0 is the target id.

Often, the WWN can be found using:

(1):

# cat /sys/class/scsi_host/host1/device/fc_host:host1/port_name

Here, in this example, we used "host1".

(2):

Or look in the "/proc/scsi/adapter_type/n" directory, where adapter_type is the host adapter
type, and n is the host adapter number for your card.

--> Finding the WWID of a disk device, for example of "sdc":

# scsi_id -g -u -s /block/sdc

Note:
-----

-> For 2.4 kernels, you can find the driver version in the "/proc/scsi/lpfc/n" directory,
   where n is the host adapter port that is represented as a SCSI host bus in Linux.
-> For 2.6 kernels and higher, because of the move to sysfs, the driver version might not be
   available in the /proc/scsi/lpfc/n directory. If so, go to the /proc/scsi/lpfc directory
   and inspect the values. Use "cat /sys/class/scsi_host/hostn/lpfc_drvr_version", where n is
   each of the values recorded from the /proc/scsi/lpfc directory.

27.2 Installing a Fibre FC card:
--------------------------------

The following is of course not an installation manual for FC drivers, but it serves to give us
an idea of a typical setup.

Install the adapter card. If the card is not connected directly to the storage unit, or to a
fabric switch port, install a "loop-back connector". This "loop-back connector" might be
supplied with the adapter card. Next, reboot the server.

Now, maybe you need to install the driver software... or maybe not!
Recall from section 24.3, that uniform driver stacks might already be present on your system.
However, maybe you do indeed still need to install a driver kit.

Say that we are installing an Emulex lpfc compatible driver. A typical session might go
like this:

- Download the driver kit from the Emulex Web site, or copy it to the system from the
  installation CD.
- Unpack the tarball with the following command:

  tar xzf lpfc_2.6_driver_kit-.tar.gz

- Change to the directory that is extracted:

  cd lpfc_2.6_driver_kit-

- Execute the 'lpfc-install' script (with no switches) to install the new driver kit. Use:

  # ./lpfc-install

Of course, every type of HBA will have its own installation method. So the above is just an
example. Such a script might compile .ko modules, put the stuff in the right directories, and
then load the driver using "modprobe".

Now, if the SAN "exports" LUN's on a channel connected to your HBA, you should be able to see
the devices. However, often a "re-scan" of the bus is necessary. See also section 21.5.

27.3 The mapping of device names to scsi devices:
-------------------------------------------------

So, we have the scsi disk device files like "/dev/sdb", and we can see stuff in
"/proc/scsi/scsi", but how do those relate to each other?
I mean, if you use "fdisk" or a similar program, you think in terms of "/dev/sdb", and so does
the kernel. So, there has to be some sort of "mapping" between what the kernel thinks, and what
the "scsi subsystem" thinks is the state of affairs.

=> If we use a command like this:

# lsscsi -l

we might see records like:

[4:0:1:0]  disk  IBM  2145  0000  /dev/sdb
  state=running queue_depth=32 scsi_level=5 type=0 device_blocked=0 timeout=30
..
..

=> If we now look in /proc/scsi/scsi, we see:

# cat /proc/scsi/scsi
..
Host: scsi4 Channel: 00 Id: 01 Lun: 00
  Vendor: IBM  Model: 2145  Rev: 0000
..

Then we see more or less the same info. From the "lsscsi" command alone, we can see that the
path to the LUN, [4:0:1:0], actually corresponds to "/dev/sdb".
The SCSI subsystem likes to think in fully qualified addresses, like [4:0:1:0], which actually
defines a "path" (so to speak) to a LUN, so that it can easily address that LUN.

Alright, we have been able to relate the usual disk device names to the identifiers we can see
in /proc/scsi and in several places in /sys.
Note that there is no relationship between SCSI devices and partitions. So, your system may get
a LUN from a SAN, like [a:b:c:d], which gets a device name like sdXY.

27.4 Modifying the initial ram-disk ("initrd" or "initramfs"):
--------------------------------------------------------------

In Chapter 12, we have spent a few words on the Linux boot process.
At a certain stage, "initrd" exposes a mini filesystem so that the kernel can obtain all needed
modules and mount all "real" filesystems.

If you have a Linux driver that does not automatically configure any LUNs other than LUN 0,
then we need to let the system detect the LUNs automatically when the system is started, by
modifying the initial ram-disk (initrd). If only LUN 0's are detected, we need to perform the
steps described later in this section.

Note: Other reasons for initrd (or initramfs):
We cannot expect the kernel to know about all the possible hardware and disk devices in the
world. So, if filesystems are on specific disk devices, it needs to load the necessary kernel
modules first. That's why, in general:

- Either you use a kernel with pre-enabled support for all devices connected to your system.
  So, this is "compiled-in" support for a certain hardware driver.
- Or you use an initrd preliminary root file system image, where the kernel can find what it
  needs. Here the kernel uses loadable modules for supporting all devices.

So, if you want the kernel to use specific hardware (e.g. a SCSI HBA), you can either "put"
(compile) the driver into the kernel, or you make it easy for the kernel to find the
appropriate modules.

So, if not all SCSI devices are available at boot, and we want the kernel to find everything
using an initrd ram disk:

1. Configure the SCSI mid-layer:

Usually, we should start with instructions for the SCSI mid-layer driver (that controls how
many LUNs are scanned during a SCSI bus scan), telling it to scan for more.

- Open the /etc/modules.conf file.
- For Linux 2.4 kernels, add the following line:

  options scsi_mod max_scsi_luns=n

- For Linux 2.6 kernels, add the following line:

  options scsi_mod max_luns=n

n is the total number of LUNs to probe, like for example 64, 128.

2. Rebuild the ram-disk for the current configuration:

You probably do not need this if you build your scsi drivers right into the kernel, instead of
into modules. Otherwise, we need to update the ram disk.
To rebuild the ram-disk associated with the current kernel, use the appropriate command for
your specific operating system. For example, on RedHat:

# cd /boot
# mkinitrd -v initrd-kernel.img kernel

where "kernel" is the string you see using "uname -r".

So, on RedHat and similar systems, we can use the "mkinitrd" command. On other distributions,
similar commands are available, like "mkinitramfs", which uses a slightly different approach
(using a cpio archive).

>> IMPORTANT ! <<
If you want to use mkinitrd, you need more information, and you should try it on a test
system first (!!!)

============================================================================
28. Some special filesystems:
============================================================================

28.1 The "tmpfs" filesystem, and "/dev/shm":
--------------------------------------------

The "tmpfs" filesystem is actually backed by "virtual memory", and often you can see a mount
on "/dev/shm" of type tmpfs. However, on your particular system, do not be very surprised if
you do not see "/dev/shm" as a mountpoint.
Since it's memory based, it is cleared after a reboot.
It was designed for programs to communicate using shared memory, and for increasing performance
when programs store temporary files in this filesystem.
Some folks say that the "tmpfs" filesystem type, when mounted on a certain mountpoint, is just
like a "ram disk".

There can be some confusion if you compare it to the familiar "/tmp" mount.
Usually, "/tmp" is diskbased, but sometimes it's memory based as well For "/tmp", there can be several implementations: - it can be of type "tmpfs" as well, so it's memory based. - it can be just a directory within "/", so it's disk based. - it can be a seperate partition, mounted on "/tmp", so it's disk based as well. Use "df -h" and take a look at fstab ("cat /etc/fstab") to find out how it is implemented on your system. So, in general, - there exists a filesystem "type" tmpfs (memory based). - you might have a mountpoint "/dev/shm", which is memory based. - you have a /tmp mountpoint, which might be diskbased, or memory based. Really, it's not "vaque" or something. Just keep in mind that not all distributions and versions take the same approach. 28.2 The /proc and /sys pseudo filesystems: ------------------------------------------- The pseudo or virtual "/proc" filesystem on a running system, can be seen as a sort of "window" to view kernel data structures. Here, subdirectories exists for all running processes, as well as for system resources, that is, the values of swap, memory, disks, cpu etc.. In most cases, consider it to be as "read only". However, in some cases you can use it to send information to the kernel as well. Also, whenever you hear of a "virtual filesystem", it means that it's memory based, build when the system boots, and maintained during runtime. In a sense, a newer, more structured version of proc is available (since kernel 2.6), which is called "sysfs". This too is a virtual filesystem, and it sort of exports the "device tree", and system information, through the use of such a virtual filesystem. You can see it by browsing through "/sys". You might say that "/proc" is more focussed on processes, while "/sys" is a new way to obtain device- and system information. 28.3 The "/dev/mapper" device mapper : -------------------------------------- In general, your system might have disk access through: - Directly Attached Storage (DAS), which could be some internal disk, or even a disk array, directly attached on a local SCSI HBA card. - SAN, for which a couple of variations exists like FC, iSCSI - NAS, meaning using real "file based IO" (instead of block IO), like a Network Attached Storage device. When you consider DAS, or SAN LUNs, you can treat the storage in the "traditional" way, or you make use of a LVM. => Not using Logical Volume Management (LVM): Traditionally, local harddisks are divided into partitions. Next, a "filesystem", like ext3, is written directly on a partition. This is how you typically would use Linux, on some simple PC system. - Now, with the traditional methods, you cannot, for example, add two disks "together" to form some sort of larger contiguous volume, which you logically can use a one "disk". - Also, using the traditional methods, you cannot create redudant information for high availability purposes, like RAID 1 (mirrorring of a disk or partition), or RAID 5 and other RAID implementations. => Using an LVM: In Linux, LVM is often implemented using "LVM2", or EVMS, or Veritas LVM, or some other LVM. In LVM terminology, physical disks are called PV's (Physical Volumes). The key point is that you create one or more "Volume Groups" (VG) from the available PV's. Each VG is thus made up of a pool of Physical Volumes (PVs). You can extend (or reduce) the size of a VG, by adding or removing a PV, as desired. Once a VG is in place, you carve out (using LVM commands) a Logical Volume (LV) on which you place a filesystem. 
Note that an LV can span multiple disks (PV's). An LVM also provides means for redundancy, like
creating a mirrored LV, which makes the system much more robust.
As another plus, it's easy to increase the size of an existing LV, so if a filesystem like
/apps is (under the hood) actually some LV, it's easy to increase the size of /apps, as long
as the Volume Group has space available for that purpose.

Here is a very simple session. Suppose you have three extra disks, like /dev/sda, /dev/sdb,
and /dev/sdc.
Don't forget: below are generic commands. On your system, they might take a slightly different
form. But it should give you a reasonable idea of how we act in a typical LVM environment.

-> Step 1: formally add the disks to the LVM as usable PV's:

# pvcreate /dev/sda /dev/sdb /dev/sdc

So, now the LVM "knows" these disks are available as PV's.

-> Step 2: create a Volume Group from the PV's:

# vgcreate datavg /dev/sda /dev/sdb /dev/sdc

Here we have called the VG "datavg" and used all 3 disks.

-> Step 3: create one or more LV's in "datavg":

# lvcreate -L 500G -n oraclelv datavg

Here we have called the Logical Volume "oraclelv".

Next, we can create a filesystem on "oraclelv", using known methods:

# mkfs -t ext3 -v /dev/datavg/oraclelv

So where does the "device mapper" come in?
The Device-mapper is a standard component of the 2.6 (or higher) Linux kernel, which supports
logical volume management in a "more natural way". It keeps track of the mapping between the
physical devices and the "logical entities" used in LVM, like logical volumes.
Also, it manages the relation between the physical device files (like /dev/sdb) and the
entities found in "/proc/scsi/scsi" and "/sys/class/scsi_disk".

Note: With multipath SAN connections, the device mapper is even more "notable".

Here is an example of the "df -h" output of a machine using SAN luns.
Note the "/dev/mapper/" part in the devicenames.

[root@zigzag tmp]# df -h
Filesystem                  Size  Used  Avail  Use%  Mounted on
/dev/cciss/c0d0p7           3.9G  715M   3.0G   19%  /
/dev/cciss/c0d0p6           7.8G  1.4G   6.0G   19%  /usr
/dev/cciss/c0d0p5           7.8G  3.8G   3.7G   51%  /home
/dev/cciss/c0d0p3           7.8G  788M   6.6G   11%  /var
/dev/cciss/c0d0p8            85G   32G    49G   40%  /apps
/dev/cciss/c0d0p1           494M   17M   452M    4%  /boot
tmpfs                        31G  176M    31G    1%  /dev/shm
/dev/mapper/ocfs2backupp1   200G   44G   157G   22%  /ocfs2_backup
/dev/mapper/ocfs2appp1       30G  2.5G    28G    9%  /ocfs2_app

We will see some more of this in section 30, where we touch on the implementation of
"multipath" SAN storage connections, but on RedHat specifically.

============================================================================
29. A few typical examples of partitioning and creating filesystems:
============================================================================

On earlier occasions in this note, we have already seen how to use (for example) fdisk to
partition a disk, and next how to create a filesystem on that new partition.
Let's again walk through a few simple examples.

Example 1. A simple session on a simple local disk (not using LVM):
-------------------------------------------------------------------

Suppose you have a new local disk, with the device file "/dev/sda".
A simple session looks like this:

# fdisk /dev/sda             # first partition the disk. Fdisk will ask
                             # a few questions like whether you want a Primary
                             # or Extended partition, the ending track etc..
# mkfs.ext3 -b 4096 /dev/sda1    # create a filesystem of type ext3 on sda1
# mkdir /data                    # create a dir that will serve as a mountpoint
# mount -t ext3 /dev/sda1 /data  # mount it, so that it becomes available

If you want the mount to be available after the next boot, then edit /etc/fstab and add a
record resembling this:

/dev/sda1    /data    ext3    defaults    1 1

On your distribution, you might have several "mkfs-like" commands available, which basically
all do the same thing. It's just that some are shells "over" other commands, just to make it
easier for us. So, it's likely that your distribution has a script named "/sbin/mkfs.ext3"
(which we used above). Above, we could also have just used the "mkfs -t ext3" command.

Here are a few other examples:

# mke2fs /dev/sda1           # make an ext2 filesystem on sda1
# mke2fs -j /dev/sda1        # make an ext3 filesystem,
                             # due to the -j (journaling) switch
# mke2fs -t ext4 /dev/sda1   # make an ext4 filesystem

Exercise:
---------

If you are not too well aware of the different capabilities and properties of the ext2, ext3,
and ext4 filesystems, then use a search engine and find more information.
Especially, take notice of the max file sizes, max partition size, journaling options, and
whatever else you might find interesting.

Example 2. Creating a filesystem on a Logical Volume (using an LVM):
--------------------------------------------------------------------

So, how do we go about it if using an LVM?
Suppose you have a SCSI controller, which controls a disk array on your system.
So, here we say that we have DAS, or Directly Attached Storage. But what we will see below
holds for SAN Storage too. However, in case of SAN storage, it's wise to get more details about
the implementation of LUNs in the SAN, and whether additional features are in use, like
"multipath IO".

In section 28.3, we outlined the creation of a Volume Group (VG), from Physical Volumes (PVs).
Please take a look at that section again. Indeed, there we already saw an example of creating
a filesystem on a Logical Volume (LV).

Here is another example, using mke2fs for creating the filesystem:

# vgcreate datavg /dev/sda1 /dev/sdb1 /dev/sdc1
# lvcreate -L 500G -n oraclelv datavg   # oraclelv is the LV
                                        # within the datavg Volume Group
# mkdir /oracle                         # create the directory/mountpoint
# mke2fs -j /dev/datavg/oraclelv        # create an ext3 filesystem
# mount /dev/datavg/oraclelv /oracle    # mount it to make it usable

Note that "partitioning" is implemented in the creation of Logical Volumes.
So, here we do not use a tool like "fdisk" or "parted" anymore.

Note:
-----

Of course, the well-known "fdisk" utility does its work very well. However, if your system has
block devices that are greater than 2 TB, you should use the "parted" command to create
(or remove) partitions.
Parted is quite an extensive utility, with a number of subcommands like "cp", "mkpart" etc..

Example session:

# parted
GNU Parted 2.3
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) select /dev/sda
(parted) print
 ... shows all partitions of the selected disk sda
(parted) select /dev/sdc                # let's partition /dev/sdc
(parted) unit TB
(parted) mkpart primary 0.00TB 5.00TB
(parted) print
 ... shows all partitions of the selected disk sdc

It really deserves to have a manual for itself, and indeed it has. You might like to take a
look at:

http://www.gnu.org/software/parted/manual/html_mono/parted.html

============================================================================
30.
A few notes on implementing multipath IO to a FC SAN: ============================================================================ This section relies on all the theory, and command usage, as we have seen already in earlier chapters. 30.1 See your FC cards and drivers: ----------------------------------- -> FC Cards: For example, to see your (FC) HBA cards, we can use: # ls -al /sys/class/scsi_host or specifically for FC: # ls -al /sys/class/fc_host/ drwxr-xr-x 3 root root 0 Jan 13 2012 host0 drwxr-xr-x 3 root root 0 Jan 13 2012 host1 So, in this example we have two FC Cards, or HBA's, which are connected to a SAN Fabric or switch. -> Drivers: Use commands like: # lsmod | grep -i scsi # lsmod | grep -i lpfc # Emulex # lsmod | grep -i qla # Qlogic Now suppose we have Qlogic HBA cards, then we might see output like: qla2xxx 1133797 32 scsi_transport_fc 73800 1 qla2xxx scsi_mod 196697 7 scsi_dh,sg,qla2xxx,scsi_transport_fc,libata, cciss,sd_mod Indeed, the kernel modules are thus active. 30.2 Path to a SAN LUN: ----------------------- Also, recall that a "path" to an exposed LUN (disk) in the SAN, is expressed like: => In SCSI language: scsi_adapter#, channel#, scsi id, lun# => Using the naming conventions of Linux, this becomes: host#, bus#, target#, lun# Many commands will show LUN paths like for example [2:0:1:0] in which we can reckognize the [host,bus,target,lun] notation. 30.3 Multipath if using 2 FC cards: ----------------------------------- For HA reasons, a Server often has two FC cards. This is called "multipath". There are two main modes: - failover: one FC card is active, the other one is idle. In case of a problem, the SAN connection will be taken over by the former idle card. - aggregated: both cards are active at the same time, and possibly some loadbalancing procedure is in place. In it's most basic form, the setup resembles this: ----------------------- | Server | hba1= host0 | | hba2= host2 | [hba1] [hba2] | -------|---------|----- | | | | ----------------------- | | | | SWITCH | ----------- | | | | | ----------------------- | | [|] [|] ----------------------- SAN| | | | | | | | | |-------[lun] | -----------------|----- You see the 2 (or 4) paths to that LUN? (depending on the exact implementation). 30.4 How many physical devices are shown: ----------------------------------------- Suppose the SAN Admin has done all actions needed, to expose a LUN to our Server. In the setup above, at least two paths to the same LUN exists, but usually in such a setup, 4 possible paths exists. Now, suppose we formerly had only internal disks like hba and hbb. So, when Linux has performed it's scanning of all busses, we might see the following new devices: # ls –al /dev/sd* brw-r----- 1 root disk 8, 0 Oct 3 17:21 sda brw-r----- 1 root disk 8, 16 Oct 3 17:21 sdb brw-r----- 1 root disk 8, 32 Oct 3 17:23 sdc brw-r----- 1 root disk 8, 48 Oct 3 17:23 sdd This is what Linux "thinks" is going on. It can access a diskdevice along 4 paths, so it created 4 device files. We know that in reality, it is just the same diskdevice. Now, let's take a look at this: #ls -al /sys/class/scsi_disk total 0 drwxr-xr-x 6 root root 0 Oct 3 17:23 . drwxr-xr-x 42 root root 0 Oct 3 17:21 .. drwxr-xr-x 2 root root 0 Oct 3 17:21 0:0:0:0 drwxr-xr-x 2 root root 0 Oct 3 17:21 0:0:1:0 drwxr-xr-x 2 root root 0 Oct 3 17:23 1:0:0:0 drwxr-xr-x 2 root root 0 Oct 3 17:23 1:0:1:0 Here, 4 "LUNs" are shown, due to the fact that 4 paths are available. 
If you "read" those paths, you can see that LUN0 can be reached via host0, that is, "[0:", and via host1, that is "[1:". Next, we will discuss what we need to do using a specific distribution, namely RedHat. With other distributions, a similar approach is followed. 30.5 Installing DM-Multipath software: -------------------------------------- Having drivers and connections, is not enough. We need a specific "multipath kernel module", and a "service" which monitors the HBA's and all paths. In case of failure, the kernelmodule will switch IO to the idle card. Installing and configuring the software means the following: - install: # rpm -ivh device-mapper-multipath.rpm - configure: Edit the configuration file "/etc/multipath.conf" to configure which devices with a certain WWID (see the next section) will fall under multipath, and which device to ignore (blacklisting). - starting the daemon: # service multipathd start # chkconfig multipathd on As you can see, the daemon is "multipathd", which should show up using the "ps -ef" command. However, you should not start the daemon, before "/etc/multipath.conf" is configured correctly. 30.6 How it works, and "/etc/multipath.conf": --------------------------------------------- To edit the configuration file "/etc/multipath.conf" in the correct way is crucial. If we take a look again at our four devices, let's determine the WWID, which is supposed to be a unique identifier for each device: # scsi_id -g -u -s /block/sda 3600601601310310095182d72710de211 # scsi_id -g -u -s /block/sdb 3600601601310310095182d72710de211 # scsi_id -g -u -s /block/sdc 3600601601310310095182d72710de211 # scsi_id -g -u -s /block/sdd 3600601601310310095182d72710de211 You see? They all have the same identifier, that is, the same WWID ! The trick now really is, to place an entry in /etc/multipath.conf, specifying the WWID which should fall under multipath, and specifying a friendly name. So, let's edit "/etc/multipath.conf": root@zigzag /etc# vi multipath.conf devnode_blacklist { devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*" devnode "^hd[a-z]" devnode "^cciss!c[0-9]d[0-9]*" } multipaths { multipath { wwid 3600601601310310095182d72710de211 alias ocfs2ora path_grouping_policy failover } } There are 2 essential "parts" here. The "devnode_blacklist" section tells the software which devices to ignore. Here it means that all devices like raw,loop,fd,md,hd etc.., must be ignored. The "multipaths" section, tells the daemon which devices have the same WWID, and thus are the same, and it is exactly that device which falls under multipath. Also, an alias is specified, which means the "device mapper" will create the friendly name "/dev/mapper/ocfs2ora". So, if we now want to use "fdisk", or "parted", or if we want to create a Physical Volume (in LVM), we *should* now use this friendly name. When the software is installed, you also have the "multipath" command available with which you can check which paths and devices are actually the stuff under the "alias". In our example: # multipath -l ocfs2ora (3600601601310310095182d72710de211) dm-2 HP,HSV210 [size=300G][features=1 queue_if_no_path][hwhandler=0][rw] \_ round-robin 0 [prio=0][active] \_ 0:0:0:0 sda 8:32 [active][undef] \_ round-robin 0 [prio=0][enabled] \_ 0:0:1:0 sdb 8:96 [active][undef] \_ round-robin 0 [prio=0][enabled] \_ 1:0:0:0 sdc 8:160 [active][undef] \_ round-robin 0 [prio=0][enabled] \_ 1:0:1:0 sdd 8:224 [active][undef]