Sunday, August 23, 2015

How to check if disk is failing or failed on Solaris

Failed disk:

1. /var/adm/messages shows "disk not responding to selection"
2. iostat -En shows only an increased transport error count
3. The disk is not visible under the format command ("drive not available")

Failing disk:

1. /var/adm/messages shows read/write errors
2. Soft/hard error counters in iostat -En are increasing
3. The disk is still available under the format command

On an old Sun Fire V440, a failed disk looks like this:


/var/adm/messages contains:
Aug 21 13:48:57 servername scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1):
Aug 21 13:48:57 servername     disk not responding to selection
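
To avoid reading the log by eye, the pair of lines above can be picked out with a small filter. This is only a sketch: the function name unresponsive_disks is made up for illustration, and it assumes the warning always spans two consecutive lines exactly as in the sample above.

```shell
# Hypothetical helper: reads syslog-style text on stdin and prints the
# device path and instance from each WARNING line that is immediately
# followed by "disk not responding to selection".
unresponsive_disks() {
    awk '
        /WARNING:/ { dev = $(NF-1) " " $NF; next }
        /disk not responding to selection/ { if (dev != "") print dev }
        { dev = "" }
    '
}

# Possible usage on the server:
#   unresponsive_disks < /var/adm/messages | sort | uniq -c
```

Piping the result through sort and uniq -c gives a rough count of how often each disk stopped responding.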


iostat -En shows only transport errors:
c1t0d0          Soft Errors: 0 Hard Errors: 0 Transport Errors: 1
Vendor: FUJITSU  Product: MAW3073NCSUN72G Revision: 1703 Serial No: XXX
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
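
The counters on the first line of each device block are what matter here, so they can be filtered with awk. A minimal sketch, assuming the summary line keeps the field layout shown above (device name, then the Soft, Hard and Transport counters); disks_with_errors is a hypothetical name, and on Solaris nawk may be a safer choice than the legacy /usr/bin/awk.

```shell
# Hypothetical helper: reads `iostat -En` style text on stdin and prints
# every device whose soft, hard or transport error counter is non-zero.
disks_with_errors() {
    awk '$2 == "Soft" && $3 == "Errors:" {
        if (($4 + $7 + $10) > 0)
            print $1, "soft=" $4, "hard=" $7, "transport=" $10
    }'
}

# Possible usage on the server:
#   iostat -En | disks_with_errors
```

Running it twice some time apart and comparing the numbers shows whether the counters are still increasing.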


Under format, the disk is no longer available:
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <drive not available>
          /pci@1f,700000/scsi@2/sd@0,0
       1. c1t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1f,700000/scsi@2/sd@1,0
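
The same check can be scripted by filtering the disk list for the "<drive not available>" marker. This is a sketch only: unavailable_disks is a hypothetical name, and the usage line assumes that format prints its disk list and then exits when it gets no further input, which is worth verifying on your release before relying on it.

```shell
# Hypothetical helper: reads the disk list printed by format on stdin and
# prints the name of every disk marked "<drive not available>".
unavailable_disks() {
    awk '/<drive not available>/ { print $2 }'
}

# Possible usage on the server (assumption: format exits at end of input):
#   echo | format 2>/dev/null | unavailable_disks
```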


metastat output:
# metastat d6
d6: Mirror
    Submirror 0: d16
      State: Okay       
    Submirror 1: d26
      State: Needs maintenance

...
d26: Submirror of d6
    State: Needs maintenance
    Invoke: metareplace d6 c1t0d0s2 <new device>
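
If there are many mirrors, the submirrors that need attention can be listed with a short filter. A minimal sketch, assuming each "Submirror N: dX" line is directly followed by its "State:" line as in the output above; broken_submirrors is a hypothetical name.

```shell
# Hypothetical helper: reads metastat output on stdin and prints the name
# of every submirror whose state is "Needs maintenance".
broken_submirrors() {
    awk '
        /^[ \t]*Submirror [0-9]+:/ { name = $3; next }
        /State: Needs maintenance/ { if (name != "") print name }
        { name = "" }
    '
}

# Possible usage on the server:
#   metastat | broken_submirrors
```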

