SCSI Transport Errors and How to Identify Them
Note: This is only a quick reference on SCSI errors.
SCSI errors can be more involved than what we can detail
in our knowledge center. If you need more information,
contact IntraServe sales, and they can direct you to the
appropriate engineer.
Four pieces of information help identify SCSI transport
errors on a sun4u machine:
1. The kind of system it is and the OS it is running
2. The latest /var/adm/messages
3. A copy of the output of /usr/platform/sun4u/sbin/prtdiag -v
4. A copy of the output of showrev -p
showrev -p tells you the revision of each installed patch.
/var/adm/messages gives you hints on what to diagnose.
The OS version tells you which patches are available for
that system.
prtdiag gives you a breakdown of the hardware.
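The four items above can be gathered in one pass. The following is a minimal Python sketch, not a supported tool: it assumes Python 3 is available on the machine, it uses "uname -a" as one assumed way to get the system type and OS release, and the names COMMANDS, run_command, read_file, and collect are made up for illustration.

```python
import subprocess

# Commands taken from the list above. "uname -a" is an assumption: it is one
# common way to report the system type and OS release.
COMMANDS = {
    "system": ["uname", "-a"],
    "prtdiag": ["/usr/platform/sun4u/sbin/prtdiag", "-v"],
    "showrev": ["showrev", "-p"],
}
MESSAGES_FILE = "/var/adm/messages"


def run_command(argv):
    """Run one command and return its combined output as text."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        return result.stdout
    except OSError as exc:
        # The command is missing or not executable on this machine.
        return "could not run {}: {}".format(argv[0], exc)


def read_file(path):
    """Return the contents of a text file, or a note if it cannot be read."""
    try:
        with open(path, errors="replace") as handle:
            return handle.read()
    except OSError as exc:
        return "could not read {}: {}".format(path, exc)


def collect(runner=run_command, reader=read_file):
    """Gather the four items into a dict keyed by a short name."""
    report = {name: runner(argv) for name, argv in COMMANDS.items()}
    report["messages"] = reader(MESSAGES_FILE)
    return report
```

Calling collect() with no arguments runs the real commands. The runner and reader parameters exist only so the sketch can be tried on a machine that is not a sun4u system.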
If the SCSI errors look like this:
Aug 3 14:02:57 IntraServe unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 IntraServe unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 IntraServe unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Aug 3 14:02:57 Intraserve unix: SCSI transport failed: reason 'reset': retrying command
*Check the revision of the glm patch they have installed.
If it is much lower than the latest revision, have them
update to the latest patch level.
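The patch check can be sketched in Python. This is only a sketch under stated assumptions: it assumes showrev -p prints one line per patch beginning with "Patch: NNNNNN-RR", the function names are made up for illustration, and 999999 is a placeholder, not the real glm patch ID (which depends on the Solaris release).

```python
import re

# Assumed line format from `showrev -p`: "Patch: NNNNNN-RR Obsoletes: ..."
PATCH_LINE = re.compile(r"^Patch:\s*(\d{6})-(\d{2})\b")


def installed_revisions(showrev_output):
    """Map each patch base ID to the highest revision found in the output."""
    revisions = {}
    for line in showrev_output.splitlines():
        match = PATCH_LINE.match(line.strip())
        if match:
            base, rev = match.group(1), int(match.group(2))
            revisions[base] = max(rev, revisions.get(base, -1))
    return revisions


def needs_update(showrev_output, patch_base, latest_rev):
    """Return True if the patch is missing or older than latest_rev."""
    installed = installed_revisions(showrev_output).get(patch_base)
    return installed is None or installed < latest_rev


# Placeholder data: 999999 and 999998 are NOT real patch IDs.
SAMPLE = """\
Patch: 999999-02 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
Patch: 999998-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWexample
"""

if __name__ == "__main__":
    # Compare the installed revision against a latest revision you looked up.
    print(needs_update(SAMPLE, "999999", 9))
```

The latest revision is passed in by the caller because it has to come from the current patch listing for that OS release; the sketch does not look it up.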
If the messages look like this:
Mar 24 06:58:10 IntraServe unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 IntraServe unix: SCSI bus DATA IN phase parity error
Mar 24 06:58:10 IntraServe unix: warning: ID[SUNWpd.glm.parity_check.6010]
Mar 24 06:58:10 IntraServe unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 IntraServe unix: Target 0 reducing sync. transfer rate
Mar 24 06:58:10 IntraServe unix: warning: ID[SUNWpd.glm.sync_wide_backoff.6014]
Mar 24 06:58:10 IntraServe unix: warning: /pci@6,4000/scsi@4/sd@0,0 (sd60):
Mar 24 06:58:10 IntraServe unix: SCSI transport failed: reason 'tran_err': retrying command
*Check the termination, and check the cables for bent pins.
With SCSI bus DATA phase parity errors, the cause is usually
the cable and/or the termination. Also check for patches
for glm and for newer disk firmware.
If the errors look like this:
Jun 6 19:16:34 IntraServe unix: ID[SUNWssa.socal.link.5010] socal1: port 0:Fibre Channel is OFFLINE
Jun 6 19:16:34 IntraServe unix: ID[SUNWssa.socal.link.6010] socal1: port 0:Fibre Channel Loop is ONLINE
Jun 6 19:17:49 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):
Jun 6 19:17:49 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:19:14 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):
Jun 6 19:19:14 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:20:44 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):
Jun 6 19:20:44 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command
*These are typical error messages from an A5x00 array.
You can see the Fibre Channel loop on socal1 port 0 going
offline and then back online.
*If the errors are on more than one disk, the first thing
an engineer should check is the A5x00 patch matrix for the
latest firmware for the array and disks.
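Whether the errors are on more than one disk can be checked by collecting the distinct disk instances that report a transport failure. The following is a minimal Python sketch, assuming each log entry has been rejoined onto a single line and that a failure line belongs to the most recent WARNING line above it; the function name disks_with_failures is made up for illustration.

```python
import re

# Disk instance in parentheses at the end of a WARNING line, e.g. "(ssd16):".
INSTANCE = re.compile(r"\((s?sd\d+)\):\s*$")
# Transport failure line with its quoted reason, e.g. reason 'timeout'.
REASON = re.compile(r"SCSI transport failed: reason '([^']+)'")


def disks_with_failures(lines):
    """Return the set of sd/ssd instances followed by a transport failure."""
    disks = set()
    current = None
    for line in lines:
        instance = INSTANCE.search(line)
        if instance:
            # Remember which disk the following message lines refer to.
            current = instance.group(1)
        elif REASON.search(line) and current is not None:
            disks.add(current)
    return disks


# Lines taken from the excerpt above, one log entry per line.
SAMPLE = [
    "Jun 6 19:17:49 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):",
    "Jun 6 19:17:49 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command",
    "Jun 6 19:19:14 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):",
    "Jun 6 19:19:14 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command",
    "Jun 6 19:20:44 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):",
    "Jun 6 19:20:44 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command",
]

if __name__ == "__main__":
    failed = disks_with_failures(SAMPLE)
    print(sorted(failed), "more than one disk:", len(failed) > 1)
```

A set is used so that a disk that retries many times still counts once; what matters here is how many different disks are affected, not how many messages were logged.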
If you see errors like this:
Aug 6 07:49:59 IntraServe unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 IntraServe unix: Error for Command: write(10) ErrorLevel: Fatal
Aug 6 07:49:59 IntraServe unix: Requested Block: 6429986 ErrorBlock: 6429986
Aug 6 07:49:59 IntraServe unix: Vendor: SEAGATE Serial Number: NG031399
Aug 6 07:49:59 IntraServe unix: Sense Key: Not Ready
Aug 6 07:49:59 IntraServe unix: ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1,FRU: 0x2
Aug 6 07:49:59 IntraServe unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 IntraServe unix: Error for Command: write ErrorLevel: Fatal
*This tells you there is a problem with the disk at sd@2,0
(sd stands for SCSI disk): target 2, LUN 0, on controller 0
(onboard).
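Reading the target out of the device path can be sketched as follows. This is a minimal Python sketch, assuming the last node has the form sd@target,lun with both numbers written in hexadecimal, as is usual in these device paths; the function name decode_sd_path is made up for illustration.

```python
import re

# Last node of the path: "/sd@<target>,<lun>". The unit address is assumed
# to be hexadecimal, so target 10 would appear as "sd@a,0".
LEAF = re.compile(r"/sd@([0-9a-f]+),([0-9a-f]+)$")


def decode_sd_path(path):
    """Split a device path into its controller path, target, and LUN."""
    match = LEAF.search(path)
    if match is None:
        return None  # not an sd@target,lun path
    return {
        "controller_path": path[:match.start()],
        "target": int(match.group(1), 16),
        "lun": int(match.group(2), 16),
    }


if __name__ == "__main__":
    # Path taken from the WARNING line in the excerpt above.
    print(decode_sd_path("/pci@1f,0/pci@1/scsi@2/sd@2,0"))
```

The sketch returns None for paths it does not understand, such as the ssd@w... paths used by Fibre Channel disks, instead of guessing.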
For A1000 and D1000 SCSI errors:
Oct 7 16:30:00 IntraServe unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd16):
Oct 7 16:30:00 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:00 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:01 IntraServe unix:
Oct 7 16:30:01 IntraServe unix:
Oct 7 16:30:11 IntraServe unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd12):
Oct 7 16:30:11 IntraServe unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd18):
Oct 7 16:30:11 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:11 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
*This tells you that they are using a QLogic differential
card (QLGC,isp in the device path) and that this card is
having the problem.
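Spotting the QLogic card comes down to looking for the QLGC,isp node in the device path. The following is a minimal Python sketch of that check; the function names node_names and uses_qlogic_isp are made up for illustration, and the path is expected without the trailing "(sdNN):" instance.

```python
def node_names(device_path):
    """Return the node names in a device path, dropping each '@address'."""
    nodes = device_path.strip("/").split("/")
    return [node.split("@", 1)[0] for node in nodes]


def uses_qlogic_isp(device_path):
    """Return True if the path goes through a QLGC,isp node."""
    return "QLGC,isp" in node_names(device_path)


if __name__ == "__main__":
    # Paths taken from the excerpts in this document.
    print(uses_qlogic_isp("/sbus@1f,0/QLGC,isp@3,10000/sd@1,0"))
    print(uses_qlogic_isp("/pci@1f,4000/scsi@3/sd@0,0"))
```

Comparing whole node names, and not searching the raw string, avoids a false match if the text happens to appear inside a unit address.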