Sysadmin's Notepad

Friday, September 4, 2015

Simple process monitoring script with email alerting

If you don't have or don't want to install additional software for system/application monitoring (like Nagios, Zabbix, Munin, Big Brother, etc.) you may use this simple script:

#!/bin/bash

MAIL="your@email.address"
PROGRAM="httpd"
HOST=$(uname -n)
DATE=$(date)
TMPFILE=/var/tmp/monitor-$(PROGRAM).lock

OUTPUT=$(ps -ef | grep -c "$PROGRAM")
if [ $OUTPUT -eq 1 ]; then
if [ -f $TMPFILE ]; then
    echo "Lock file exists"
else
    echo "$DATE $HOST program \"$PROGRAM\" is not running" | mailx -s "\"$PROGRAM\" is not running on $HOST" $MAIL
    touch $TMPFILE
fi
fi

In PROGRAM variable put the name of the process that you expect to be running, make sure that the monitoring script name will not contain the same string.
Basically, if the program is running "ps -ef | grep program" will return 2 or more rows (one with the program itself and the second one with "grep program").
Otherwise it will only return one row ("grep program") which will trigger the alert and you will get an email.
By creating TMPFILE script will avoid bothering you again and again about the same issue.
Make sure to remove that file after you restart monitored process.

Once the script is ready save it and add to cron, i.e.:
$ crontab -e
* * * * * /path/to/the/script > /dev/null 2>&1

Friday, August 28, 2015

can't start kdump service on virtual machine

# service kdump start
No kdump initial ramdisk found. [WARNING]
Rebuilding /boot/initrd-2.6.32-504.23.4.el6.x86_64kdump.img
No module vmmemctl found for kernel 2.6.32-504.23.4.el6.x86_64, aborting.
Failed to run mkdumprd
# lsmod | grep vmmemctl
vmmemctl 13966 0

Fixing VMMEMCTL module issue:

You can disable this module by editing /etc/vmware-tools/locations and changing answer VMMEMCTL_CONFED from yes to no.

More general approach:

More general way to handle missing modules is to ignore the ones which can not be found:
Edit /etc/sysconfig/kdump and set MKDUMPRD_ARGS="--allow-missing"
# service kdump start
WARNING: No module vmmemctl found for kernel 2.6.32-504.23.4.el6.x86_64, continuing anyway

Tuesday, August 25, 2015

Difference between du and df outputs

Sometimes people say they performed cleanup but filesystem is still (almost) full and df is giving different results than du:

$ df -h /tmp
Filesystem      Size Used Avail Use% Mounted on
/dev/sda3        20G   19G     0 100% /tmp

$ du -sm /tmp
1       /tmp

To find the missing bit you need to check if the deleted files are still in use (in other words those files might be still open):
# lsof | grep deleted
mysqld    2456     mysql    5u   REG       0,19        0 2025554220 (deleted) /tmp/iboy1WVS
mysqld    2456     mysql    6u   REG       0,19        0 2025554284 (deleted) /tmp/ibwlUTGy
mysqld    2456     mysql    7u   REG       0,19        0 2025554322 (deleted) /tmp/ibecOavf
[..]

To reclaim the space you need to bounce the process which is still using those files.
If you can't or don't want to kill running proceses you can try to truncate those "deleted" files:
cat /dev/null > /proc/2456/fd/5
cat /dev/null > /proc/2456/fd/6
cat /dev/null > /proc/2456/fd/7

Sunday, August 23, 2015

How to check if disk is failing or failed on Solaris

Failed disk:

1. It shows "disk not responding to selection" in /var/adm/messages
2. It only shows increased transport errors
3. it's not visible under format command ("disk not available")

Failing disk:

1. It shows read/write errors in /var/adm/messages
2. Soft/Hard error counters are increasing
3. Disk is available under format command

On old Sun Fire V440 it looks like this:

/var/adm/messages contain:
Aug 21 13:48:57 servername scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1):
Aug 21 13:48:57 servername     disk not responding to selection

iostat -En shows only transport errors:
c1t0d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 1
Vendor: FUJITSU Product: MAW3073NCSUN72G Revision: 1703 Serial No: XXX
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

Under format disk is no longer available:
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <drive not available>
          /pci@1f,700000/scsi@2/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@1,0

metastat output:
# metastat d6
d6: Mirror
    Submirror 0: d16
      State: Okay
    Submirror 1: d26
      State: Needs maintenance
...
d26: Submirror of d6
    State: Needs maintenance
    Invoke: metareplace d6 c1t0d0s2 <new device>

Thursday, August 20, 2015

changing ILO settings from OS using hponcfg

hponcfg is quite useful tool if you're going to automate changing ILO settings on multiple machines.
It works the same way on various versions of ILO.

To get the current settings into file:

hponcfg -w current.xml

Note: I've noticed that it won't drop the "whole" config but only the most important things (i.e. secondary and tertiary dns server won't be included even if it's defined).

To set some new settings described in the xml file:

hponcfg -f update.xml

Note: you don't need to put whole config, you can change one parameter if needed.

If you're lucky and ILO driver works properly you should see something like this:

hponcfg -w current.xml
HP Lights-Out Online Configuration utility
Version 4.0.1 Date 09/24/2012 (c) Hewlett-Packard Company, 2012
Firmware Revision = 1.16 Device type = iLO 3 Driver name =
Management Processor configuration is successfully written to file

If you can't connect to ILO from OS:

HPONCFG RILOE-II/iLO setup and configuration utility
Version 4.0.1 Date 09/24/2012 (c) Hewlett-Packard Company, 2012

ERROR: Unable to establish communication with iLO/RILOE-II.

Try to restart hp-snmp-agents and usually it will resolve the problem.

/etc/init.d/hp-snmp-agents stop
/etc/init.d/hp-snmp-agents start

Saturday, June 20, 2015

Solaris gzip and tar one-liners

As the Solaris tar do not handle compression you can use following one-liners to get the files archived and compress in one line:

1. Archive and compress folder:

tar cf - folder_name | gzip -c > filename.tar.gz

2. Decompress and unpack:

gzcat filename.tar.gz | tar -xpf -

Tuesday, December 16, 2014

Securing ssh server from automated bot hacks

SSH server running on some quite popular hosting is exposed to many automated attacks these days.
There are more and more bots scanning the whole Internet and especially targeting popular hosting/VPS providers.

1. Changing the default sshd port number

In contrast to other well known services (ftp/mail/www) ssh server does not need to listen on the default port. It's being used by a fewer people, sometimes only you and changing the default port is yet another step in increasing your server security.
I have been using fail2ban software for some time. It scans the logs for failed login attempts and bans the attacker's IP if he fails to login too many times.
However, after a few days my firewall was full of banned IPs (good day - 5/10 IPs, bad day - more than 20).
From now on my ssh server is always running on a non-standard IP.
The number of failed login attempts has greatly decreased and the automated attacks are no longer spamming my logs.

Some useful settings from /etc/ssh/sshd_config file:
[root@server ~]# egrep "^Port|^PermitRootLogin|^MaxAuth" /etc/ssh/sshd_config
Port 4321
PermitRootLogin no
MaxAuthTries 3

2. Logging incoming connections

Even with ssh daemon running on a different port you may find it useful to log all the connection attempts to it. You can catch it via following iptables rule:
iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 4321 -j LOG --log-prefix "iptables: "
Which should be inserted before accepting the connection on ssh port.
Next if you're using rsyslogd you can filter all the messages starting with "iptables :" and put them into a separate file:
[root@server ~]# cat /etc/rsyslog.d/iptables.conf
:msg, startswith, "iptables: " -/var/log/iptables.log
& ~
[root@server ~]# service rsyslog restart

[root@server ~]# cat /var/log/iptables.log
Dec 16 18:39:41 server kernel: iptables: IN=eth0 OUT= MAC=00:1c:14:01:30:de:00:16:83:76:07:29:08:00 SRC=10.23.189.14 DST=20.40.50.101 LEN=60 TOS=0x00 PREC=0x00 TTL=54 ID=31528 DF PROTO=TCP SPT=44074 DPT=4321 WINDOW=29200 RES=0x00 SYN URGP=0