10 essential tricks for admins


The best systems administrators are set apart by their efficiency. And if an efficient systems administrator can do a task in 10 minutes that would take another mortal two hours to complete, then the efficient systems administrator should be rewarded (paid more) because the company is saving time, and time is money, right?
The trick is to prove your efficiency to management. While I won't attempt to cover that trick in this article, I will give you 10 essential gems from the lazy admin's bag of tricks. These tips will save you time—and even if you don't get paid more money to be more efficient, you'll at least have more time to play Halo.
The newbie states that when he pushes the Eject button on the DVD drive of a server running a certain Redmond-based operating system, it will eject immediately. He then complains that, in most enterprise Linux servers, if a process is running in that directory, then the ejection won't happen. For too long as a Linux administrator, I would reboot the machine and get my disk on the bounce if I couldn't figure out what was running and why it wouldn't release the DVD drive. But this is ineffective.
Here's how you find the process that holds your DVD drive and eject it to your heart's content: First, simulate it. Stick a disk in your DVD drive, open up a terminal, and mount the DVD drive:
# mount /media/cdrom
# cd /media/cdrom
# while [ 1 ]; do echo "All your drives are belong to us!"; sleep 30; done
Now open up a second terminal and try to eject the DVD drive:
# eject
You'll get a message like:
umount: /media/cdrom: device is busy
Before you free it, let's find out who is using it.
# fuser /media/cdrom
You see the process was running and, indeed, it is our fault we can not eject the disk.
Now, if you are root, you can exercise your godlike powers and kill processes:
# fuser -k /media/cdrom
Boom! Just like that, freedom. Now solemnly unmount the drive:
# eject
fuser is good.
Try this:
# cat /bin/cat
Behold! Your terminal looks like garbage. Everything you type looks like you're looking into the Matrix. What do you do?
You type reset. But wait you say, typing reset is too close to typing reboot or shutdown. Your palms start to sweat—especially if you are doing this on a production machine.
Rest assured: You can do it with the confidence that no machine will be rebooted. Go ahead, do it:
# reset
Now your screen is back to normal. This is much better than closing the window and then logging in again, especially if you just went through five machines to SSH to this machine.
David, the high-maintenance user from product engineering, calls: "I need you to help me understand why I can't compile supercode.c on these new machines you deployed."
"Fine," you say. "What machine are you on?"
David responds: " Posh." (Yes, this fictional company has named its five production servers in honor of the Spice Girls.) OK, you say. You exercise your godlike root powers and on another machine become David:
# su - david
Then you go over to posh:
# ssh posh
Once you are there, you run:
# screen -S foo
Then you holler at David:
"Hey David, run the following command on your terminal: # screen -x foo."
This will cause your and David's sessions to be joined together in the holy Linux shell. You can type or he can type, but you'll both see what the other is doing. This saves you from walking to the other floor and lets you both have equal control. The benefit is that David can watch your troubleshooting skills and see exactly how you solve problems.
At last you both see what the problem is: David's compile script hard-coded an old directory that does not exist on this new server. You mount it, recompile, solve the problem, and David goes back to work. You then go back to whatever lazy activity you were doing before.
The one caveat to this trick is that you both need to be logged in as the same user. Other cool things you can do with the screencommand include having multiple windows and split screens. Read the man pages for more on that.
But I'll give you one last tip while you're in your screen session. To detach from it and leave it open, type: Ctrl-A D . (I mean, hold down the Ctrl key and strike the A key. Then push the D key.)
You can then reattach by running the screen -x foo command again.
You forgot your root password. Nice work. Now you'll just have to reinstall the entire machine. Sadly enough, I've seen more than a few people do this. But it's surprisingly easy to get on the machine and change the password. This doesn't work in all cases (like if you made a GRUB password and forgot that too), but here's how you do it in a normal case with a Cent OS Linux example.
First reboot the system. When it reboots you'll come to the GRUB screen as shown in Figure 1. Move the arrow key so that you stay on this screen instead of proceeding all the way to a normal boot.

Figure 1. GRUB screen after reboot
GRUB screen after reboot
Next, select the kernel that will boot with the arrow keys, and type E to edit the kernel line. You'll then see something like Figure 2:

Figure 2. Ready to edit the kernel line
Ready to edit the kernel line
Use the arrow key again to highlight the line that begins with kernel, and press E to edit the kernel parameters. When you get to the screen shown in Figure 3, simply append the number 1 to the arguments as shown in Figure 3:

Figure 3. Append the argument with the number 1
Append the argument with the number 1
Then press EnterB, and the kernel will boot up to single-user mode. Once here you can run the passwd command, changing password for user root:
sh-3.00# passwd
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully
Now you can reboot, and the machine will boot up with your new password.
Many times I'll be at a site where I need remote support from someone who is blocked on the outside by a company firewall. Few people realize that if you can get out to the world through a firewall, then it is relatively easy to open a hole so that the world can come into you.
In its crudest form, this is called "poking a hole in the firewall." I'll call it an SSH back door. To use it, you'll need a machine on the Internet that you can use as an intermediary.
In our example, we'll call our machine blackbox.example.com. The machine behind the company firewall is called ginger. Finally, the machine that technical support is on will be called tech. Figure 4 explains how this is set up.

Figure 4. Poking a hole in the firewall
Poking a hole in the firewall
Here's how to proceed:
  1. Check that what you're doing is allowed, but make sure you ask the right people. Most people will cringe that you're opening the firewall, but what they don't understand is that it is completely encrypted. Furthermore, someone would need to hack your outside machine before getting into your company. Instead, you may belong to the school of "ask-for-forgiveness-instead-of-permission." Either way, use your judgment and don't blame me if this doesn't go your way.
  2. SSH from ginger to blackbox.example.com with the -R flag. I'll assume that you're the root user on ginger and that tech will need the root user ID to help you with the system. With the -R flag, you'll forward instructions of port 2222 on blackbox to port 22 on ginger. This is how you set up an SSH tunnel. Note that only SSH traffic can come into ginger: You're not putting ginger out on the Internet naked.
    You can do this with the following syntax:
    ~# ssh -R 2222:localhost:22 thedude@blackbox.example.com
    Once you are into blackbox, you just need to stay logged in. I usually enter a command like:
    thedude@blackbox:~$ while [ 1 ]; do date; sleep 300; done
    to keep the machine busy. And minimize the window.
  3. Now instruct your friends at tech to SSH as thedude into blackbox without using any special SSH flags. You'll have to give them your password:
    root@tech:~# ssh thedude@blackbox.example.com .
  4. Once tech is on the blackbox, they can SSH to ginger using the following command:
    thedude@blackbox:~$: ssh -p 2222 root@localhost
  5. Tech will then be prompted for a password. They should enter the root password of ginger.
  6. Now you and support from tech can work together and solve the problem. You may even want to use screen together! (See Trick 4.)
VNC or virtual network computing has been around a long time. I typically find myself needing to use it when the remote server has some type of graphical program that is only available on that server.
For example, suppose in Trick 5, ginger is a storage server. Many storage devices come with a GUI program to manage the storage controllers. Often these GUI management tools need a direct connection to the storage through a network that is at times kept in a private subnet. Therefore, the only way to access this GUI is to do it from ginger.
You can try SSH'ing to ginger with the -X option and launch it that way, but many times the bandwidth required is too much and you'll get frustrated waiting. VNC is a much more network-friendly tool and is readily available for nearly all operating systems.
Let's assume that the setup is the same as in Trick 5, but you want tech to be able to get VNC access instead of SSH. In this case, you'll do something similar but forward VNC ports instead. Here's what you do:
  1. Start a VNC server session on ginger. This is done by running something like:
    root@ginger:~# vncserver -geometry 1024x768 -depth 24 :99
    The options tell the VNC server to start up with a resolution of 1024x768 and a pixel depth of 24 bits per pixel. If you are using a really slow connection setting, 8 may be a better option. Using :99 specifies the port the VNC server will be accessible from. The VNC protocol starts at 5900 so specifying :99 means the server is accessible from port 5999.
    When you start the session, you'll be asked to specify a password. The user ID will be the same user that you launched the VNC server from. (In our case, this is root.)
  2. SSH from ginger to blackbox.example.com forwarding the port 5999 on blackbox to ginger. This is done from ginger by running the command:
    root@ginger:~# ssh -R 5999:localhost:5999 thedude@blackbox.example.com
    Once you run this command, you'll need to keep this SSH session open in order to keep the port forwarded to ginger. At this point if you were on blackbox, you could now access the VNC session on ginger by just running:
    thedude@blackbox:~$ vncviewer localhost:99
    That would forward the port through SSH to ginger. But we're interested in letting tech get VNC access to ginger. To accomplish this, you'll need another tunnel.
  3. From tech, you open a tunnel via SSH to forward your port 5999 to port 5999 on blackbox. This would be done by running:
    root@tech:~# ssh -L 5999:localhost:5999 thedude@blackbox.example.com
    This time the SSH flag we used was -L, which instead of pushing 5999 to blackbox, pulled from it. Once you are in on blackbox, you'll need to leave this session open. Now you're ready to VNC from tech!
  4. From tech, VNC to ginger by running the command:
    root@tech:~# vncviewer localhost:99 .
    Tech will now have a VNC session directly to ginger.
While the effort might seem like a bit much to set up, it beats flying across the country to fix the storage arrays. Also, if you practice this a few times, it becomes quite easy.
Let me add a trick to this trick: If tech was running the Windows® operating system and didn't have a command-line SSH client, then tech can run Putty. Putty can be set to forward SSH ports by looking in the options in the sidebar. If the port were 5902 instead of our example of 5999, then you would enter something like in Figure 5.

Figure 5. Putty can forward SSH ports for tunneling
Putty can forward SSH ports for tunneling
If this were set up, then tech could VNC to localhost:2 just as if tech were running the Linux operating system.
Imagine this: Company A has a storage server named ginger and it is being NFS-mounted by a client node named beckham. Company A has decided they really want to get more bandwidth out of ginger because they have lots of nodes they want to have NFS mount ginger's shared filesystem.
The most common and cheapest way to do this is to bond two Gigabit ethernet NICs together. This is cheapest because usually you have an extra on-board NIC and an extra port on your switch somewhere.
So they do this. But now the question is: How much bandwidth do they really have?
Gigabit Ethernet has a theoretical limit of 128MBps. Where does that number come from? Well,
1Gb = 1024Mb1024Mb/8 = 128MB; "b" = "bits," "B" = "bytes"
But what is it that we actually see, and what is a good way to measure it? One tool I suggest is iperf. You can grab iperf like this:
# wget http://dast.nlanr.net/Projects/Iperf2.0/iperf-2.0.2.tar.gz
You'll need to install it on a shared filesystem that both ginger and beckham can see. or compile and install on both nodes. I'll compile it in the home directory of the bob user that is viewable on both nodes:
tar zxvf iperf*gz
cd iperf-2.0.2
./configure -prefix=/home/bob/perf
make
make install
On ginger, run:
# /home/bob/perf/bin/iperf -s -f M
This machine will act as the server and print out performance speeds in MBps.
On the beckham node, run:
# /home/bob/perf/bin/iperf -c ginger -P 4 -f M -w 256k -t 60
You'll see output in both screens telling you what the speed is. On a normal server with a Gigabit Ethernet adapter, you will probably see about 112MBps. This is normal as bandwidth is lost in the TCP stack and physical cables. By connecting two servers back-to-back, each with two bonded Ethernet cards, I got about 220MBps.
In reality, what you see with NFS on bonded networks is around 150-160MBps. Still, this gives you a good indication that your bandwidth is going to be about what you'd expect. If you see something much less, then you should check for a problem.
I recently ran into a case in which the bonding driver was used to bond two NICs that used different drivers. The performance was extremely poor, leading to about 20MBps in bandwidth, less than they would have gotten had they not bonded the Ethernet cards together!
A Linux systems administrator becomes more efficient by using command-line scripting with authority. This includes crafting loops and knowing how to parse data using utilities like awkgrep, and sed. There are many cases where doing so takes fewer keystrokes and lessens the likelihood of user errors.
For example, suppose you need to generate a new /etc/hosts file for a Linux cluster that you are about to install. The long way would be to add IP addresses in vi or your favorite text editor. However, it can be done by taking the already existing /etc/hosts file and appending the following to it by running this on the command line:
# P=1; for i in $(seq -w 200); do echo "192.168.99.$P n$i"; P=$(expr $P + 1);
done >>/etc/hosts
Two hundred host names, n001 through n200, will then be created with IP addresses 192.168.99.1 through 192.168.99.200. Populating a file like this by hand runs the risk of inadvertently creating duplicate IP addresses or host names, so this is a good example of using the built-in command line to eliminate user errors. Please note that this is done in the bash shell, the default in most Linux distributions.
As another example, let's suppose you want to check that the memory size is the same in each of the compute nodes in the Linux cluster. In most cases of this sort, having a distributed or parallel shell would be the best practice, but for the sake of illustration, here's a way to do this using SSH.
Assume the SSH is set up to authenticate without a password. Then run:
# for num in $(seq -w 200); do ssh n$num free -tm | grep Mem | awk '{print $2}';
done | sort | uniq
A command line like this looks pretty terse. (It can be worse if you put regular expressions in it.) Let's pick it apart and uncover the mystery.
First you're doing a loop through 001-200. This padding with 0s in the front is done with the -w option to the seq command. Then you substitute the num variable to create the host you're going to SSH to. Once you have the target host, give the command to it. In this case, it's:
free -m | grep Mem | awk '{print $2}'
That command says to:
  • Use the free command to get the memory size in megabytes.
  • Take the output of that command and use grep to get the line that has the string Mem in it.
  • Take that line and use awk to print the second field, which is the total memory in the node.
This operation is performed on every node.
Once you have performed the command on every node, the entire output of all 200 nodes is piped (|d) to the sort command so that all the memory values are sorted.
Finally, you eliminate duplicates with the uniq command. This command will result in one of the following cases:
  • If all the nodes, n001-n200, have the same memory size, then only one number will be displayed. This is the size of memory as seen by each operating system.
  • If node memory size is different, you will see several memory size values.
  • Finally, if the SSH failed on a certain node, then you may see some error messages.
This command isn't perfect. If you find that a value of memory is different than what you expect, you won't know on which node it was or how many nodes there were. Another command may need to be issued for that.
What this trick does give you, though, is a fast way to check for something and quickly learn if something is wrong. This is it's real value: Speed to do a quick-and-dirty check.
Some software prints error messages to the console that may not necessarily show up on your SSH session. Using the vcs devices can let you examine these. From within an SSH session, run the following command on a remote server: # cat /dev/vcs1. This will show you what is on the first console. You can also look at the other virtual terminals using 2, 3, etc. If a user is typing on the remote system, you'll be able to see what he typed.
In most data farms, using a remote terminal server, KVM, or even Serial Over LAN is the best way to view this information; it also provides the additional benefit of out-of-band viewing capabilities. Using the vcs device provides a fast in-band method that may be able to save you some time from going to the machine room and looking at the console.
In Trick 8, you saw an example of using the command line to get information about the total memory in the system. In this trick, I'll offer up a few other methods to collect important information from the system you may need to verify, troubleshoot, or give to remote support.
First, let's gather information about the processor. This is easily done as follows:
# cat /proc/cpuinfo .
This command gives you information on the processor speed, quantity, and model. Using grep in many cases can give you the desired value.
A check that I do quite often is to ascertain the quantity of processors on the system. So, if I have purchased a dual processor quad-core server, I can run:
# cat /proc/cpuinfo | grep processor | wc -l .
I would then expect to see 8 as the value. If I don't, I call up the vendor and tell them to send me another processor.
Another piece of information I may require is disk information. This can be gotten with the df command. I usually add the -h flag so that I can see the output in gigabytes or megabytes. # df -h also shows how the disk was partitioned.
And to end the list, here's a way to look at the firmware of your system—a method to get the BIOS level and the firmware on the NIC.
To check the BIOS version, you can run the dmidecode command. Unfortunately, you can't easily grep for the information, so piping it is a less efficient way to do this. On my Lenovo T61 laptop, the output looks like this:
#dmidecode | less
...
BIOS Information
Vendor: LENOVO
Version: 7LET52WW (1.22 )
Release Date: 08/27/2007
...
This is much more efficient than rebooting your machine and looking at the POST output.
To examine the driver and firmware versions of your Ethernet adapter, run ethtool:
# ethtool -i eth0
driver: e1000
version: 7.3.20-k2-NAPI
firmware-version: 0.3-0

1 comment: