Friday, August 28, 2015

can't start kdump service on virtual machine

# service kdump start
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-2.6.32-504.23.4.el6.x86_64kdump.img
No module vmmemctl found for kernel 2.6.32-504.23.4.el6.x86_64, aborting.
Failed to run mkdumprd

# lsmod | grep vmmemctl
vmmemctl        13966 0

Fixing VMMEMCTL module issue:

You can disable this module by editing /etc/vmware-tools/locations and changing the answer VMMEMCTL_CONFED from yes to no.

More general approach:

A more general way to handle missing modules is to ignore the ones that cannot be found:
Edit /etc/sysconfig/kdump and set MKDUMPRD_ARGS="--allow-missing"
# service kdump start
WARNING: No module vmmemctl found for kernel 2.6.32-504.23.4.el6.x86_64, continuing anyway
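
The config change can also be scripted. This is only a minimal sketch: the function name set_mkdumprd_args is made up for illustration, it assumes GNU sed (for -i), and it overwrites whatever MKDUMPRD_ARGS already contained, so check the file first if you have other arguments set there.

```shell
#!/bin/sh
# Hypothetical helper: set MKDUMPRD_ARGS="--allow-missing" in a kdump config file.
set_mkdumprd_args() {
    conf="$1"
    if grep -q '^MKDUMPRD_ARGS=' "$conf"; then
        # Replace the existing line (this discards any arguments already set there).
        sed -i 's|^MKDUMPRD_ARGS=.*|MKDUMPRD_ARGS="--allow-missing"|' "$conf"
    else
        # No such line yet, so append one.
        echo 'MKDUMPRD_ARGS="--allow-missing"' >> "$conf"
    fi
}

# Usage (as root):
#   set_mkdumprd_args /etc/sysconfig/kdump && service kdump start
```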

Tuesday, August 25, 2015

Difference between du and df outputs

Sometimes people say they cleaned up a filesystem, yet it is still (almost) full and df reports very different usage than du:

$ df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        20G   19G     0 100% /tmp

$ du -sm /tmp
1       /tmp

To find the missing space, check whether deleted files are still in use (in other words, still held open by a process):
# lsof | grep deleted
mysqld    2456     mysql    5u   REG       0,19        0  2025554220 (deleted) /tmp/iboy1WVS
mysqld    2456     mysql    6u   REG       0,19        0  2025554284 (deleted) /tmp/ibwlUTGy
mysqld    2456     mysql    7u   REG       0,19        0  2025554322 (deleted) /tmp/ibecOavf

To reclaim the space you need to restart the process that is still holding those files open.
If you can't or don't want to kill running processes, you can try truncating those "deleted" files through /proc:
cat /dev/null > /proc/2456/fd/5
cat /dev/null > /proc/2456/fd/6
cat /dev/null > /proc/2456/fd/7
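
With many open descriptors it is easier to let the shell find them. A rough sketch, assuming Linux with /proc mounted; deleted_fds is a made-up name, and it matches anything whose link target ends in " (deleted)", so look at the list before truncating anything.

```shell
#!/bin/sh
# Hypothetical helper: print the /proc fd paths of a process that point to deleted files.
deleted_fds() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # Skip entries that vanish or cannot be read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") echo "$fd" ;;
        esac
    done
}

# Usage (as root), first list, then truncate once you are sure:
#   deleted_fds 2456
#   for fd in $(deleted_fds 2456); do : > "$fd"; done
```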

Sunday, August 23, 2015

How to check if disk is failing or failed on Solaris

Failed disk:

1. It shows "disk not responding to selection" in /var/adm/messages
2. Only the transport error counter increases in iostat -En
3. It's not visible under format command ("drive not available")

Failing disk:

1. It shows read/write errors in /var/adm/messages
2. Soft/Hard error counters are increasing
3. Disk is available under format command

On an old Sun Fire V440 a failed disk looks like this:

/var/adm/messages contains:
Aug 21 13:48:57 servername scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1):
Aug 21 13:48:57 servername     disk not responding to selection

iostat -En shows only transport errors:
c1t0d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 1
Vendor: FUJITSU  Product: MAW3073NCSUN72G Revision: 1703 Serial No: XXX
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
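
To check the counters on many disks at once, the first line of each iostat -En record can be filtered with awk. A small sketch, assuming the header line layout shown above (device, then Soft/Hard/Transport Errors); disks_with_errors is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -En` output on stdin and print every
# device that has a non-zero soft, hard or transport error counter.
disks_with_errors() {
    awk '$2 == "Soft" && $3 == "Errors:" {
        # Fields: $1 device, $4 soft, $7 hard, $10 transport
        if ($4 + $7 + $10 > 0)
            printf "%s soft=%s hard=%s transport=%s\n", $1, $4, $7, $10
    }'
}

# Usage on Solaris:
#   iostat -En | disks_with_errors
```

Only transport errors on a disk points towards "failed", while growing soft/hard counters point towards "failing", as described in the lists above.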

Under format the disk is no longer available:
       0. c1t0d0 <drive not available>
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>

metastat output:
# metastat d6
d6: Mirror
    Submirror 0: d16
      State: Okay       
    Submirror 1: d26
      State: Needs maintenance

d26: Submirror of d6
    State: Needs maintenance
    Invoke: metareplace d6 c1t0d0s2 <new device>

Thursday, August 20, 2015

changing ILO settings from OS using hponcfg

hponcfg is quite a useful tool if you want to automate changing ILO settings on multiple machines.
It works the same way on various versions of ILO.

To dump the current settings into a file:

hponcfg -w current.xml 

Note: I've noticed that it doesn't dump the whole config, only the most important settings (e.g. secondary and tertiary DNS servers aren't included even if they are defined).

To apply new settings described in an XML file:

hponcfg -f update.xml

Note: you don't need to include the whole config; you can change a single parameter if needed.

If you're lucky and the ILO driver works properly, you should see something like this:

hponcfg -w current.xml
HP Lights-Out Online Configuration utility
Version 4.0.1 Date 09/24/2012 (c) Hewlett-Packard Company, 2012
Firmware Revision = 1.16 Device type = iLO 3 Driver name =
Management Processor configuration is successfully written to file

If hponcfg can't connect to ILO from the OS, you will see this instead:

HPONCFG RILOE-II/iLO setup and configuration utility
Version 4.0.1
Date 09/24/2012 (c) Hewlett-Packard Company, 2012

ERROR: Unable to establish communication with iLO/RILOE-II.

Restarting hp-snmp-agents usually resolves the problem:

/etc/init.d/hp-snmp-agents stop
/etc/init.d/hp-snmp-agents start
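
When running this across many machines, the restart-and-retry step can be wrapped in a small function. This is only a sketch: dump_ilo_config is a made-up name, and it assumes hponcfg exits with a non-zero status when it cannot talk to ILO, which I have not verified on every version. The HPONCFG and AGENTS variables exist only so the paths can be overridden.

```shell
#!/bin/sh
# Hypothetical wrapper: dump the ILO config, restarting hp-snmp-agents once
# if the first attempt fails (assumes hponcfg returns non-zero on failure).
HPONCFG="${HPONCFG:-hponcfg}"
AGENTS="${AGENTS:-/etc/init.d/hp-snmp-agents}"

dump_ilo_config() {
    out="$1"
    # First attempt; done if it works.
    "$HPONCFG" -w "$out" && return 0
    # Otherwise bounce the agents and try exactly once more.
    "$AGENTS" stop
    "$AGENTS" start
    "$HPONCFG" -w "$out"
}

# Usage (as root):
#   dump_ilo_config current.xml || echo "still can't reach ILO" >&2
```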