Thursday, May 30, 2013

Configuring kdump in RHEL

kdump is very useful if you're experiencing kernel crashes and want to analyze what is really happening (or pass the dump on to support).
First install the kdump package:
# yum -y install kexec-tools
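Then edit /etc/grub.conf and append "crashkernel=128M@16M" to the kernel line of the current kernel, after "root=LABEL=/". This reserves 128 MB of memory, starting at the 16 MB offset, for the capture kernel.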

It's necessary to reboot the box to make the new settings active:
# /sbin/init 6

After the reboot make sure that the kdump service is enabled and running:
# chkconfig kdump on
# service kdump start
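
Before relying on the service it may be worth confirming that the kernel really booted with the crashkernel parameter. A minimal sketch, assuming a helper called has_crashkernel (the name is made up for this example) that inspects a kernel command line string such as the contents of /proc/cmdline:

```shell
# has_crashkernel: succeed if the given kernel command line contains a
# crashkernel= parameter. The function name is hypothetical.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a live system:
#   has_crashkernel "$(cat /proc/cmdline)" && echo "crashkernel is set"
```

If the check fails, the grub.conf change was probably made on the wrong kernel entry.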

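You should also test your setup. Be aware that this crashes the running kernel immediately, so do it only on a box you can afford to take down. Open a console and enter the following command to trigger the dump:
# echo "c" > /proc/sysrq-trigger
By default the resulting vmcore is written under /var/crash.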

Wednesday, May 29, 2013

How to add virtual IP in AIX

Adding a virtual IP address (alias) to an existing network interface is pretty easy.

You can do it with smitty - I've found two ways of getting to the required screen:
smitty tcpip -> Further configuration -> Network interface -> Network interface selection -> Configure alias -> Add an IPV4 Network Alias
Select the interface and press Enter. On the next screen enter the IP address and its subnet mask, then press Enter.

Second way:
smitty netinterface -> Network Interface Selection -> Configure Aliases

You can also do it directly from the command line:
chdev -l en0 -a alias4=192.168.0.120,255.255.255.0

and remove this alias using:
chdev -l en0 -a delalias4=192.168.0.120,255.255.255.0

Once configured, the alias persists across reboots.
You can add multiple aliases to one network interface.
To verify that the alias was set up correctly run:
# ifconfig -a
en0: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
        inet 10.0.0.10 netmask 0xffffff00 broadcast 10.0.0.255
        inet 192.168.0.120 netmask 0xffffff00 broadcast 192.168.0.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0

         tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1

How to scan and detect new luns in Linux (Veritas and EMC)

When new LUNs become available to the server they are not picked up automatically.
The simplest way is to reboot the box.
A more elegant way is to scan for new LUNs and add them to the existing configuration online.

Most distributions ship a rescan-scsi-bus.sh script (on RHEL it comes with the sg3_utils package) which triggers the rescan.
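
If the script is not installed, a commonly used alternative is to write "- - -" (wildcards for channel, target and LUN) to the scan file of each SCSI host adapter in sysfs. A minimal sketch; the function name and the optional base-directory argument are assumptions made for this example:

```shell
# rescan_scsi_hosts: ask every SCSI host adapter to rescan all channels,
# targets and LUNs by writing "- - -" to its sysfs scan file.
# The function name and the optional base directory are hypothetical;
# the default is the usual sysfs location.
rescan_scsi_hosts() {
    local base="${1:-/sys/class/scsi_host}" scan_file
    for scan_file in "$base"/host*/scan; do
        [ -e "$scan_file" ] || continue   # glob matched no adapter
        echo "- - -" > "$scan_file"
    done
}

# Example use (as root):
#   rescan_scsi_hosts
```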

Next, verify the current configuration and check how many LUNs are currently configured:
# powermt display
Symmetrix logical device count=121
CLARiiON logical device count=0
HP xp logical device count=0
Ess logical device count=0
Invista logical device count=0
=====================================================================
---- Host Bus Adapters ---- ---- I/O Paths -----  ------ Stats -----
###  HW Path               Summary   Total   Dead  IO/Sec Q-IOs Errors
=====================================================================
   2 lpfc                  optimal     242      0       -     0      0

   3 lpfc                  optimal     242      0       -     1      0

For more detailed output you may run:
# powermt display dev=all

To detect new luns and add them to the configuration run:
# powermt config
# powermt save
# powermt display
You should now see more LUNs in the output.

Additionally, if you're using Veritas Storage Foundation you may scan for new disk devices by running:
# vxdisk scandisks new
The new devices will be visible in the "error" state (you will need to initialize them later):
# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
cciss/c0d0   auto:none       -            -            online invalid
emcpowera    auto:cdsdisk    -            (datadg)     online
emcpowerb    auto:cdsdisk    -            (datadg)     online
emcpowerc    auto:cdsdisk    -            (datadg)     online
emcpowerd    auto:cdsdisk    -            (datadg)     online
emcpowere    auto:cdsdisk    -            (datadg)     online
emcpowerf    auto            -            -            error
emcpowerg    auto            -            -            error
emcpowerh    auto            -            -            error
emcpoweri    auto            -            -            error

Finding HMC IP address from AIX

To check the IP address of the HMC simply run:
# lsrsrc IBM.ManagementServer

In newer versions of AIX (6.1 TL6 or 7.1) this command returns only the title and no information.
In that case try:
# lsrsrc IBM.MCP HMCIPAddr
Resource Persistent Attributes for IBM.MCP
resource 1:
        HMCIPAddr = "192.168.1.10"
resource 2:
        HMCIPAddr = "192.168.1.20"

To find out the HMC's public IP addresses run:
# lsrsrc IBM.MCP IPAddresses

Saturday, May 25, 2013

Mysql server installation on CentOS/RHEL

Quick installation for website hosting:

1. Install mysql server package:

# yum install mysql-server

2. Enable the mysql server at boot and start the service now:
# chkconfig mysqld on
# service mysqld start

3. Set the MySQL root password:
/usr/bin/mysqladmin -u root password 'new-password'
/usr/bin/mysqladmin -u root -h hostname.domain.com password 'new-password'

4. Test if you can connect using new password:
$ mysql -u root -p
Enter password:
...
mysql>

Create a new database:
mysql> CREATE DATABASE database_name;
Query OK, 1 row affected (0,01 sec)

Set up a user account and grant it access to this database:
mysql> CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON database_name.* TO 'username'@'localhost'
    -> WITH GRANT OPTION;
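
If you create databases for several sites, the statements above can be generated from a small helper. A minimal sketch: make_db_sql is a made-up name, the arguments are assumed to contain no quote characters, and it leaves out WITH GRANT OPTION, which an ordinary website account usually does not need.

```shell
# make_db_sql: print the SQL statements for a new database and a local
# user with full rights on it. Function name and argument order are
# hypothetical; the values are inserted without any escaping.
make_db_sql() {
    local db="$1" user="$2" pass="$3"
    cat <<EOF
CREATE DATABASE ${db};
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'localhost';
EOF
}

# Example use:
#   make_db_sql blog bloguser 'secret' | mysql -u root -p
```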

How to install relocatable packages on AIX

Install package into a custom path on AIX in three simple steps:

1. Create relocatable path:
mkusil -R /your/path/here
2. In the directory containing your .bff file run:
inutoc_r .
3. Install package with the command below:
installp_r -R /your/path/here -ad. packagename.rte

Note that you need to provide the package name without the version and the .bff extension.

Wednesday, May 22, 2013

Custom openvpn port with SElinux enabled

Today I had to set up an openvpn client connecting to the server on a non-standard port.
Having SELinux enabled always makes such things more "interesting".
The openvpn service started up but in the logs I found the following alerts:

May 22 11:26:30 rhs333 setroubleshoot: SELinux is preventing the openvpn (openvpn_t) from connecting to port 5517. For complete SELinux messages. run sealert -l afe57636-f6d8-4df9-b9c3-1f2d25784025

You can find currently allowed ports by running:

# semanage port -l | grep openvpn_port_t
openvpn_port_t                 tcp       1194, 9997
openvpn_port_t                 udp      1194


Adding a non-standard port is easy:
# semanage port -a -t openvpn_port_t -p tcp  5517
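
To check from a script whether a port is already labeled, you can parse the semanage port -l output. A minimal sketch, assuming the output format shown above; the function name is made up and port ranges such as 5000-5010 are not handled:

```shell
# port_is_labeled: read "semanage port -l" style output on stdin and
# succeed if the given type/protocol pair already lists the port.
# The function name is hypothetical; port ranges are not handled.
port_is_labeled() {
    local type="$1" proto="$2" port="$3"
    awk -v t="$type" -v p="$proto" -v n="$port" '
        $1 == t && $2 == p {
            for (i = 3; i <= NF; i++) {
                gsub(/,/, "", $i)          # strip list separators
                if ($i == n) found = 1
            }
        }
        END { exit !found }
    '
}

# Example use:
#   semanage port -l | port_is_labeled openvpn_port_t tcp 5517 \
#       || semanage port -a -t openvpn_port_t -p tcp 5517
```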

Finally restart the openvpn service and it should work fine. If your tunnel runs over UDP, add the port with -p udp instead.

Tuesday, May 21, 2013

Investigating frame errors in ifconfig output

Today I was alerted about network issues on one of the Linux servers. Fortunately the Geneos monitoring tools analyze ifconfig output (most other tools don't).
We found excessive frame errors in the ifconfig output and a lot of rx_crc_errors in the ethtool output.
What made things more interesting is that it was observed on two different servers connected to the same switch (which gave us a clue that there might be something wrong with the switch itself).

Layer 1 issues

Most people advise checking cables or hardware (NIC/switch), as rx_crc_errors indicate layer 1 issues. That may be the case if only one host is affected, but seeing the same issue on different hosts in the same subnet made the switch the prime suspect from the very beginning.

ifconfig and ethtool outputs

In the ifconfig output the errors and frame counters were rising:

eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx  
          ...
          RX packets:277593775 errors:12013 dropped:0 overruns:0 frame:11763

I've started monitoring this using:
# for i in `seq 1 100`; do ifconfig eth0 | grep frame; sleep 1; done
RX packets:277593775 errors:12128 dropped:0 overruns:0 frame:11877
RX packets:277593775 errors:12135 dropped:0 overruns:0 frame:11884
RX packets:277593775 errors:12143 dropped:0 overruns:0 frame:11892
(...)

In the ethtool -S eth0 output rx_crc_errors was rising at the same rate.
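
To see the rate instead of the raw counter you can extract the frame value and subtract two samples. A minimal sketch, assuming the older ifconfig format shown above ("frame:NNN"); both function names are made up for this example:

```shell
# frame_errors: extract the "frame:" counter from ifconfig RX lines
# read on stdin (old net-tools output format, as in this post).
frame_errors() {
    sed -n 's/.*frame:\([0-9][0-9]*\).*/\1/p'
}

# frame_delta: difference between an earlier and a later counter sample.
frame_delta() {
    echo $(( $2 - $1 ))
}

# Example use, sampling eth0 one second apart:
#   a=$(ifconfig eth0 | frame_errors); sleep 1
#   b=$(ifconfig eth0 | frame_errors)
#   echo "frame errors in the last second: $(frame_delta "$a" "$b")"
```

With the first two samples from the loop output above (11877 and 11884) the delta is 7 frame errors per second.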

Verify NIC settings

Run ethtool eth0 to see the current speed and duplex:
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes

You may also check dmesg to find out if there were any changes for eth0:
# dmesg | grep eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
8021q: adding VLAN 0 to HW filter on device eth0
e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX

Check switch settings

Finally I asked the network guys to verify the switch setup.
It turned out that they had recently replaced the old switch with a new one that had slightly different settings. All the ports on the switch were set to auto:auto (speed & duplex).
They found out that the switch had somehow negotiated 100Mbit half duplex instead of 100Mbit full duplex for all the servers' connections.

We fixed this issue by setting 100Mbit full duplex on all the required ports on the switch.

How to reset ifconfig counters?

After this issue we had a lot of errors logged on the interfaces. Unfortunately these counters can be reset in only two ways:
  1. reload the NIC driver module (modprobe -r module; modprobe module) - this takes the interface down for a moment
  2. reboot the box
If you're not sure which module to unload, check the ethtool -i eth0 output:
# ethtool -i eth0
driver: e1000
version: 7.3.21-k4-3-NAPI


Saturday, May 18, 2013

Logical Volume Manager on Linux

I can say that LVM became popular in my company when RHEL4 was introduced. We still have some RHEL3 and RHEL4 boxes built on plain partitions but it's a rare thing now.
LVM is cool and easy to use. It lacks a few features of JFS or VxFS, but let's remember that it's free.
Best features are:
  • flexibility of structures you may create
  • simple commands that can show or modify things
  • dynamic volume resizing
  • snapshot creation

How does LVM work?

There are three layers of LVM:
  1. Physical volumes - local or SAN disks that you need to initialize using:
    pvcreate /dev/physical/volume/path
  2. Volume groups - a set created from one or more physical volumes
  3. Logical volumes - slices created in one volume group on which filesystems may reside
You may check the list of volume groups using:
vgs
Similarly you may verify logical volumes list:
lvs
and physical volumes list:
pvs

How to create a volume group?

First you need to initialize physical devices on top of which we will build our volume group:
pvcreate /dev/sdb
Then you can create a volume group using this device:
vgcreate newvg /dev/sdb

How to create a logical volume?

Once the volume group is ready you may verify its size (it should be equal to the physical volume size, or to the sum of them if you added multiple devices):
# vgs
  VG    #PV #LV #SN Attr   VSize VFree
  newvg   1   0   0 wz--n- 7.88G 7.88G

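Now you can create a logical volume using:
lvcreate -L 5G -n lv_home newvg
This example creates a 5GB volume named lv_home in the newvg volume group. The requested size has to fit into the free space reported by vgs (VFree), so a 9GB volume could not be created in this 7.88GB group.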

Finally you may create a filesystem on this logical volume (the same device is also available as /dev/newvg/lv_home):
mkfs.ext3 /dev/mapper/newvg-lv_home
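
A note on device names: device-mapper joins the volume group and logical volume names with a single hyphen and doubles any hyphen that is part of either name. A minimal sketch of that rule; the function name dm_path is made up for this example:

```shell
# dm_path: build the /dev/mapper path for a volume group and a logical
# volume. Hyphens inside a name are doubled, then the two names are
# joined with a single hyphen. The function name is hypothetical.
dm_path() {
    local vg lv
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    echo "/dev/mapper/${vg}-${lv}"
}

# Example use:
#   dm_path newvg lv_home     # prints /dev/mapper/newvg-lv_home
#   dm_path data-vg lv-home   # prints /dev/mapper/data--vg-lv--home
```

If in doubt, the /dev/vgname/lvname form avoids the hyphen rule altogether.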

How to extend logical volume?

As long as you have free space in the volume group you may extend a logical volume by running:
lvresize -L +1G newvg/lv_home
You may either specify how much space to add (as above) or state exactly how big the logical volume should be (without the plus sign).
Once done, remember to resize the filesystem residing on that logical volume - for ext3/4 it may be done online:
resize2fs /dev/mapper/newvg-lv_home
From what I remember online resizing only works on RHEL5/CentOS5 or higher. On older distros you need to perform such changes on unmounted filesystems only (which often means booting from a rescue CD if you need to extend the root filesystem).

How to shrink/reduce logical volume?

This operation is a little more complicated.
Growing is easy but reducing might corrupt the underlying filesystem.
Other volume managers (Veritas Storage Foundation or LVM on AIX) can shrink volumes/filesystems online. Either way you should always take a backup before shrinking. It rarely fails, but when it does you may be in big trouble.
For LVM on Linux you need to reduce the size of the underlying filesystem first.
Let's assume you have a 10GB filesystem on a 10GB logical volume and you want to reduce it to 5GB.
All operations need to be done on an unmounted and checked filesystem:
umount /dev/mapper/newvg-lv_home
e2fsck -f /dev/mapper/newvg-lv_home
Now reduce the filesystem to something a little smaller than 5GB:
resize2fs /dev/mapper/newvg-lv_home 4G
Why not 5GB? To make sure the logical volume never ends up smaller than the filesystem, which would cut off its end (better safe than sorry).
Then reduce the logical volume to 5GB:
lvreduce -L 5G /dev/mapper/newvg-lv_home
Finally run resize2fs again to grow the filesystem to the size of the underlying logical volume:
resize2fs /dev/mapper/newvg-lv_home
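
Because the order of these steps matters, it may help to print the whole plan and review it before touching anything. A minimal sketch that only echoes the commands and runs nothing; the function name and argument order are assumptions made for this example:

```shell
# shrink_plan: print, without running anything, the commands for
# shrinking an ext3/ext4 filesystem and its logical volume.
#   $1 device path, $2 temporary filesystem size, $3 final volume size
# The function name and argument order are hypothetical.
shrink_plan() {
    local dev="$1" fs_size="$2" lv_size="$3"
    echo "umount $dev"
    echo "e2fsck -f $dev"
    echo "resize2fs $dev $fs_size"
    echo "lvreduce -L $lv_size $dev"
    echo "resize2fs $dev"
}

# Example use:
#   shrink_plan /dev/newvg/lv_home 4G 5G
```

The temporary filesystem size must always be smaller than the final volume size.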

Thursday, May 16, 2013

Creating virtual network interfaces in RHEL/CentOS

Setting up an additional IP address for a webserver or any other service/reason is easy on RHEL/CentOS.

Go to /etc/sysconfig/network-scripts and copy the existing script:
cp ifcfg-eth0 ifcfg-eth0:0
If you need more IP addresses, just increase the number after the colon in the file name and, of course, in the DEVICE field.

Now edit ifcfg-eth0:0 and put the new IP address there. It should include at least the following entries:

DEVICE=eth0:0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.0.182
NETMASK=255.255.255.0
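
When several aliases are needed, the file body can be generated instead of edited by hand. A minimal sketch that prints the same entries as above; the function name and argument order are made up for this example:

```shell
# make_alias_cfg: print an ifcfg file body for an alias interface.
#   $1 parent device, $2 alias number, $3 IP address, $4 netmask
# The function name and argument order are hypothetical.
make_alias_cfg() {
    cat <<EOF
DEVICE=$1:$2
BOOTPROTO=static
ONBOOT=yes
IPADDR=$3
NETMASK=$4
EOF
}

# Example use (as root, in /etc/sysconfig/network-scripts):
#   make_alias_cfg eth0 1 192.168.0.183 255.255.255.0 > ifcfg-eth0:1
```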

Finally restart the network service:
service network restart

You should be able to see the new virtual interface up and running:
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 08:00:27:18:08:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.181/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.182/24 brd 192.168.0.255 scope global secondary eth0:0

Wednesday, May 15, 2013

How to flush nscd cache in Linux

After adding a user to a new group in LDAP I noticed that the change was picked up on almost all hosts except one. At first I thought the LDAP configuration on that host might be broken, but everything else worked fine apart from that one user showing the wrong group membership.
Uncle Google told me to restart the nscd service - it didn't help.
However, stopping the nscd daemon completely seemed to fix the issue: when I checked the user again (groups user or id user) it showed the correct, updated group membership.
The nscd manual describes a valuable option which helped me resolve this issue. You need to use -i (invalidate) and provide the name of the cached table to get it refreshed.
After running:
nscd -i group
the caching daemon started responding with up-to-date data.

In a similar way you may flush the hosts cache to get rid of stale DNS responses:
nscd -i hosts

Btw: if you're interested you may check what's in the current nscd database in the /var/db/nscd folder. The tables stored there are in binary format, so consider running strings on each table you want to review.

Too many open files error

Today I got a ticket about hitting a resource limit on one of my Linux boxes.
A user was running a java process which started failing with a "too many open files" error.
The default limit on RHEL6 is 1024 open files per process, which may not be enough for bigger applications and databases.
You can verify the current soft and hard limits by running:
ulimit -n
ulimit -Hn
and modify them in /etc/security/limits.conf (nofile entries). The change becomes visible at the next login.

If you're not sure, you may check the current list of open files using:
lsof -u username
You may notice that lsof sometimes reports more open files than the limit allows (according to ulimit). It lists the files of all the user's processes together and includes entries, such as memory-mapped libraries, that are not counted as open file descriptors. To get a more reasonable value, filter those entries out:
lsof -u username | grep / | egrep -v "mem|DIR|DEL|COMMAND" | sort  -k9 -u
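
On Linux you can also count the descriptors of a single process directly, since /proc/PID/fd holds one entry per open descriptor. A minimal sketch; the function name and the optional second argument (an alternative proc root) are assumptions made for this example:

```shell
# count_open_fds: count the entries in /proc/PID/fd, one per open file
# descriptor of that process.
#   $1 process ID, $2 optional proc root (defaults to /proc)
# The function name and the second argument are hypothetical.
count_open_fds() {
    local fd_dir="${2:-/proc}/$1/fd" n=0 entry
    for entry in "$fd_dir"/*; do
        # entries are symlinks whose targets (sockets, pipes) may not
        # exist as files, so accept dangling links as well
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

# Example use (as root or as the owner of the process):
#   count_open_fds 1234
```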

When raising the allowed number of open file descriptors it's worth checking the system-wide limit:
cat /proc/sys/fs/file-max
You may change it in /etc/sysctl.conf by adding:
fs.file-max = 123000
and running sysctl -p to re-read /etc/sysctl.conf and apply new settings.

Tuesday, May 14, 2013

How to recover (reset) root password on Linux

Reset password before reboot

First of all, it's good practice to reset root's password before you reboot a host after making changes, or when the system has been running for a long time without a reboot.
If it fails to boot you might be asked for the root password by fsck, or even when you try to boot into single user mode.


Reset root password in single user mode

If you're lucky and the system uses default settings, you may try to boot into single user mode (hit ESC when you see the grub menu, choose the kernel you want to boot, hit 'e' and add 'single' at the end of the line which starts with 'kernel').
You should get a root shell without being asked for a password.
Then type:
passwd
and reboot the box.

Reset root password using init=/bin/bash

You may find that your system is configured more securely and asks for a password even in single user mode. In that case reboot the box and add init=/bin/bash to the line starting with 'kernel'.
The system will boot and launch a bash session for you.
Now remount the root filesystem in read-write mode:
mount -o remount,rw /
and reset root's password:
passwd
Finally reboot the box and you should be able to log in using the new password. As no init process is running in this mode, you may need to run sync followed by reboot -f.

How to setup single user mode to ask for password

If you would like to prevent bad guys from logging on in single user mode without a password, it is easily fixed.

In RHEL5/CentOS5:

Edit /etc/inittab and add the following entry at the end of the file:
~~:S:wait:/sbin/sulogin

In RHEL6/CentOS6:

Edit the /etc/sysconfig/init file and change the SINGLE variable to:
SINGLE=/sbin/sulogin

Sunday, May 12, 2013

How to rename VCS resource or service group name

Recently I had to rename some service groups in my Veritas Cluster without restarting the services running under its control.
I was a little nervous because a huge database import was in progress and I couldn't interrupt it.
Neither hagrp nor hares has built-in functionality to rename a group/resource.
However there is an easy way of changing names without impacting cluster resources.

First you need to stop the cluster using:
hastop -all -force
The important part is the "-force" option, which leaves cluster-controlled resources online.
Now you can modify the /etc/VRTSvcs/conf/config/main.cf file and change the name of the group/resource.
Make sure it's changed everywhere (if you have dependencies or a huge main.cf file it's best to search for every occurrence of the original group/resource name).
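
A quick way to review every occurrence is a whole-word grep with line numbers, run before and after the edit. A minimal sketch; the function name is made up, and "whole word" means that a name like websg does not match websg_ip:

```shell
# list_occurrences: show, with line numbers, every line of a file that
# mentions the given name as a whole word.
#   $1 group or resource name, $2 file to search
# The function name is hypothetical.
list_occurrences() {
    grep -n -w -- "$1" "$2"
}

# Example use:
#   list_occurrences old_group_name /etc/VRTSvcs/conf/config/main.cf
```

After the rename the same call with the old name should print nothing.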

Check that the modified config is still valid:
hacf -verify /etc/VRTSvcs/conf/config

If it's fine, distribute the new main.cf file to the other cluster nodes and start the cluster (run this on every node):
hastart

Finally verify that the new names are visible in the running cluster:
hastatus -summary

Saturday, May 11, 2013

Setting up SSH public/private keys


This will be a short introduction "for dummies" describing how to generate and set up ssh keys between two computers.
I will be using OpenSSH, but the process is similar for commercial distributions like SSH Tectia or VShell.
Be aware, however, that connectivity between products from two different vendors may require some additional work (like converting an OpenSSH public key to Tectia format before using it).


Generating SSH Keys

Linux/Unix:

On the source machine (host from which you will be initiating the connection) run:
[mike@client ~]$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/mike/.ssh/id_dsa):
Created directory '/home/mike/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/mike/.ssh/id_dsa.
Your public key has been saved in /home/mike/.ssh/id_dsa.pub.
The key fingerprint is:
df:95:5e:3a:1b:fc:30:81:2a:15:54:72:7e:1a:9d:2c mike@client.example.com
The key's randomart image is:
+--[ DSA 1024]----+
|          o.o    |
|         . + o . |
|          . E =  |
|           . * . |
|        S . o + .|
|         o o + + |
|        . o . O  |
|         .     B |
|              . .|
+-----------------+
[mike@client ~]$

You may accept the default path and filenames (~/.ssh/id_dsa - private key, ~/.ssh/id_dsa.pub - public key) or provide your own. Note that recent OpenSSH releases disable DSA keys by default; there use -t rsa or -t ed25519 instead and expect the file names to change accordingly.
When prompted for a passphrase you may provide one (you will then be asked for it every time you use this key pair) or leave it empty to allow passwordless connectivity.

Remember to keep correct permissions on the .ssh folder and its contents - some SSH versions refuse to work if they are not set properly:
[mike@client ~]$ chmod 700 .ssh
The private key should always be readable by its owner only - this file stays on the source server and should never be shared with anyone:
[mike@client ~]$ chmod 400 .ssh/id_dsa
The public key should be writable by its owner only but may be readable by others - this file may be copied to other servers to allow passwordless connectivity:
[mike@client ~]$ chmod 644 .ssh/id_dsa.pub

Now transfer the public key (id_dsa.pub) to the destination machine:
[mike@client ~]$ scp ~/.ssh/id_dsa.pub mike@server.example.com:/home/mike/mike@client.example.com.pub

On the destination machine append the copied public key to the authorized_keys2 file (authorized_keys is the standard name in current OpenSSH, authorized_keys2 is the legacy one):
[mike@server ~]$ mkdir -p .ssh
[mike@server ~]$ cat mike@client.example.com.pub >> .ssh/authorized_keys2
[mike@server ~]$ chmod 700 .ssh
[mike@server ~]$ chmod 644 .ssh/authorized_keys2

Finally you may test passwordless connectivity between two hosts:
[mike@client ~]$ ssh server
Last login: Sat May 11 18:23:34 2013 from client.example.com
[mike@server ~]$ 



Tricks:

Limit public key to selected hosts only

This adds another layer of security - you may allow only selected machines to log on using a given public key.
If you know that mike will always connect from one host you may specify its IP address or DNS name. You may also provide a subnet or a domain, or combine these options.
Edit ~/.ssh/authorized_keys2 and add the from= directive in front of the key.

Examples:
from="*.example.com" ssh-dss AAAAB3NzaC1kc3MAA...yfinIwP9ZSoi0LMUM6g== mike@client.example.com
This setup allows hosts from the example.com domain to log on using mike's public key.

from="*.example.com,192.168.0.100" ssh-dss AAAAB3NzaC1kc3MAA...yfinIwP9ZSoi0LMUM6g== mike@client.example.com
This allows hosts from the example.com domain and the host 192.168.0.100.

Debugging:
Check /var/log/secure or /var/log/authlog for entries related to the sshd daemon.
A successful logon looks like:
May 11 18:42:46 server sshd[3064]: Accepted publickey for mike from 192.168.0.190 port 46816 ssh2
May 11 18:42:46 server sshd[3064]: pam_unix(sshd:session): session opened for user mike by (uid=0)
A rejected logon:
May 11 18:42:06 server sshd[3005]: Authentication tried for mike with correct key but not from a permitted host (host=192.168.0.100, ip=192.168.0.100).



Limit public key to allow execution of one command only:

If you don't want to give somebody full shell access you may limit their access to one command only.
With the command= directive only the chosen command is executed when somebody connects using that particular key.
Any other command specified at logon is ignored.

Example:
command="echo Bye Bye" ssh-dss AAAAB3NzaC1kc3MAA...yfinIwP9ZSoi0LMUM6g== mike@client.example.com
This setup executes the "echo Bye Bye" command once the user passes authentication.
You may put something more useful here, like svnserve or a backup command.
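
Both directives can be combined on one key by separating them with a comma (no spaces between options). A minimal sketch that builds such a line; the function name and argument order are made up for this example, and the values are assumed to contain no double quotes:

```shell
# restricted_key_entry: print an authorized_keys line that limits a key
# to given source hosts and to a single forced command.
#   $1 from= pattern list, $2 forced command, $3 public key line
# The function name and argument order are hypothetical.
restricted_key_entry() {
    local from="$1" cmd="$2" key="$3"
    printf 'from="%s",command="%s" %s\n' "$from" "$cmd" "$key"
}

# Example use on the destination server:
#   restricted_key_entry '*.example.com' 'echo Bye Bye' \
#       "$(cat mike@client.example.com.pub)" >> ~/.ssh/authorized_keys2
```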



Windows:

I prefer to use PuTTY and PuTTYgen utility.

To generate an ssh key pair start PuTTYgen, select the key type (rsa/dsa) and its strength (number of bits, 1024 or more) and click "Generate".
Then save the public and private keys in some safe location (let's call them id_dsa.ppk and id_dsa.pub).

Copy the public key to the destination server using the pscp command, or copy & paste the contents of that file through a PuTTY session into a local file on the destination server.
Once the public key is on the destination server you first need to convert it from the SSH2 format saved by PuTTYgen to the OpenSSH format:
[mike@server ~]$ ssh-keygen -i -f id_dsa.pub > id_dsa_converted.pub
Then add the contents of id_dsa_converted.pub to the ~/.ssh/authorized_keys2 file:
[mike@server ~]$ cat id_dsa_converted.pub >> .ssh/authorized_keys2

Now start PuTTY, go to the Connection->Data settings and fill in the "Auto-login username" field (the username that will be used on the destination server).
Then go to Connection->SSH->Auth and select the "Private key file for authentication" (id_dsa.ppk).
Go back to the Session category, provide the destination server name or IP address and save this connection under a chosen name.

Connections from the Windows-based client should work fine now.