Toll Free 1-866-786-7278
 

SCSI Transport Errors and How to Identify Them

Note: This is only a quick reference on SCSI errors. SCSI errors can be more in-depth than what we can detail in our knowledge center. If you need more information, contact IntraServe sales and they can direct you to the appropriate engineer.

Four pieces of information help identify SCSI transport errors on a sun4u machine:

    1. The kind of system it is and which OS it is running
    2. The latest /var/adm/messages
    3. A copy of the output of /usr/platform/sun4u/sbin/prtdiag -v
    4. A copy of the output of showrev -p

showrev -p tells you the revision of each installed patch.
/var/adm/messages gives you hints on what to diagnose.
The OS version tells you which patches are available for that system.
prtdiag gives you a breakdown of the hardware.
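
The four items above can be gathered in one pass. The following is a minimal sketch, assuming Python 3.7 or later is available on the host. The function names are illustrative, and `uname -a` is an assumed way to get the OS release (the article does not name a command for that step).

```python
import subprocess

# prtdiag and showrev are the commands named in this article.
# `uname -a` is an assumption for "which OS it is running".
COMMANDS = {
    "os": ["uname", "-a"],
    "prtdiag": ["/usr/platform/sun4u/sbin/prtdiag", "-v"],
    "patches": ["showrev", "-p"],
}
MESSAGES_FILE = "/var/adm/messages"


def run_command(argv):
    """Run one command and return its stdout as text (empty string if it cannot run)."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


def collect_diagnostics(runner=run_command, messages_path=MESSAGES_FILE):
    """Return a dict with the OS, prtdiag, patch list, and messages file contents."""
    report = {name: runner(argv) for name, argv in COMMANDS.items()}
    try:
        with open(messages_path, errors="replace") as handle:
            report["messages"] = handle.read()
    except OSError:
        report["messages"] = ""
    return report
```

The runner is passed in as a parameter so the collection logic can be exercised on a machine that does not have the Solaris commands installed.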

If the SCSI errors look like this:

Aug 3 14:02:57 IntraServe unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 IntraServe unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Aug 3 14:02:57 IntraServe unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Aug 3 14:02:57 IntraServe unix: SCSI transport failed: reason 'reset': retrying command

*Then check the level of the glm patch they have installed. If it is much lower than the latest revision, have them update the patch.
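
Comparing the installed patch level with the latest revision can be scripted against the showrev -p output. This is a rough sketch, assuming each installed patch appears on a line beginning with "Patch: NNNNNN-RR". The function names are illustrative. The patch ID and latest revision have to be looked up for the OS release in question and supplied by the caller, since none are given in this article.

```python
import re

# Assumed line shape: "Patch: 123456-07 Obsoletes: ... Packages: ..."
PATCH_LINE = re.compile(r"^Patch:\s+(\d+)-(\d+)")


def installed_revisions(showrev_output):
    """Map each patch ID to the highest revision number found in the output."""
    revisions = {}
    for line in showrev_output.splitlines():
        match = PATCH_LINE.match(line)
        if match:
            patch_id, rev = match.group(1), int(match.group(2))
            revisions[patch_id] = max(rev, revisions.get(patch_id, 0))
    return revisions


def is_behind(showrev_output, patch_id, latest_rev):
    """True if the patch is missing or installed below latest_rev."""
    installed = installed_revisions(showrev_output).get(patch_id)
    return installed is None or installed < latest_rev
```

For example, `is_behind(output, patch_id, latest_rev)` returns True when the glm patch is either not installed or older than the revision you pass in.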

If the messages look like this:

Mar 24 06:58:10 IntraServe unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 IntraServe unix: SCSI bus DATA IN phase parity error
Mar 24 06:58:10 IntraServe unix: warning: ID[SUNWpd.glm.parity_check.6010]
Mar 24 06:58:10 IntraServe unix: warning: /pci@6,4000/scsi@4 (glm4):
Mar 24 06:58:10 IntraServe unix: Target 0 reducing sync. transfer rate
Mar 24 06:58:10 IntraServe unix: warning: ID[SUNWpd.glm.sync_wide_backoff.6014]
Mar 24 06:58:10 IntraServe unix: warning: /pci@6,4000/scsi@4/sd@0,0 (sd60):
Mar 24 06:58:10 IntraServe unix: SCSI transport failed: reason 'tran_err': retrying command

*Check the termination, and check the cables for bent pins. SCSI bus DATA phase parity errors are usually caused by the cable and/or the termination. Also check for patches for glm and for disk firmware updates.
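
To see which controller the parity errors belong to, the messages can be grouped by the driver instance named on the preceding warning line. This is a sketch only, assuming the layout shown in the excerpt above, where the device line ending in "(glmN):" comes directly before the detail line. The function name is illustrative.

```python
import re
from collections import Counter

# A device line ends with the driver instance in parentheses, e.g. "(glm4):".
DEVICE = re.compile(r"warning:.*\((\w+)\):\s*$", re.IGNORECASE)
PARITY = re.compile(r"SCSI bus (.+?) phase parity error")


def parity_errors_by_device(lines):
    """Count parity errors per driver instance named on the preceding device line."""
    counts = Counter()
    current = None
    for line in lines:
        device = DEVICE.search(line)
        if device:
            current = device.group(1)
            continue
        if PARITY.search(line) and current:
            counts[current] += 1
    return counts
```

A controller instance that shows up repeatedly is the bus whose cabling and termination should be inspected first.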

If the errors look like this:

Jun 6 19:16:34 IntraServe unix: ID[SUNWssa.socal.link.5010] socal1: port 0:Fibre Channel is OFFLINE
Jun 6 19:16:34 IntraServe unix: ID[SUNWssa.socal.link.6010] socal1: port 0:Fibre Channel Loop is ONLINE
Jun 6 19:17:49 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370ed7ff,0 (ssd16):
Jun 6 19:17:49 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:19:14 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd68,0 (ssd12):
Jun 6 19:19:14 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command
Jun 6 19:20:44 IntraServe unix: WARNING: /sbus@2,0/SUNW,socal@2,0/sf@0,0/ssd@w22000020370edd69,0 (ssd13):
Jun 6 19:20:44 IntraServe unix: SCSI transport failed: reason 'timeout':retrying command
*These are typical error messages from an A5x00 array. You can see that the Fibre Channel loop on socal1 port 0 is going offline and coming back online.

*If the errors are on more than one disk, the first thing an engineer should check is the A5x00 patch matrix for the latest firmware for the array and the disks.
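
Whether the timeouts are on more than one disk can be checked by collecting the distinct disk instances that logged them. The sketch below assumes each "SCSI transport failed" line belongs to the device line immediately before it, as in the excerpt above. The function name is illustrative.

```python
import re

# A device line ends with the driver instance in parentheses, e.g. "(ssd16):".
DEVICE = re.compile(r"warning:.*\((\w+)\):\s*$", re.IGNORECASE)
FAILURE = re.compile(r"SCSI transport failed: reason '(\w+)'")


def disks_with_reason(lines, reason):
    """Return the set of driver instances that logged the given failure reason."""
    affected = set()
    current = None
    for line in lines:
        device = DEVICE.search(line)
        if device:
            current = device.group(1)
            continue
        failure = FAILURE.search(line)
        if failure and failure.group(1) == reason and current:
            affected.add(current)
    return affected
```

If `len(disks_with_reason(lines, "timeout"))` is greater than one, the errors span several disks. Because a failure line is attributed only to the most recent device line, logs where several device lines are printed back to back before their failure lines will be undercounted.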

If you see errors like this:

Aug 6 07:49:59 IntraServe unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 IntraServe unix: Error for Command: write(10) ErrorLevel: Fatal
Aug 6 07:49:59 IntraServe unix: Requested Block: 6429986 ErrorBlock: 6429986
Aug 6 07:49:59 IntraServe unix: Vendor: SEAGATE Serial Number: NG031399
Aug 6 07:49:59 IntraServe unix: Sense Key: Not Ready
Aug 6 07:49:59 IntraServe unix: ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1,FRU: 0x2
Aug 6 07:49:59 IntraServe unix: WARNING: /pci@1f,0/pci@1/scsi@2/sd@2,0 (sd2):
Aug 6 07:49:59 IntraServe unix: Error for Command: write ErrorLevel: Fatal

*This tells you there is a problem with the disk at sd@2,0 (sd stands for SCSI disk), which is target 2, LUN 0, on controller 0 (onboard).
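
Reading the disk address out of the device path can be automated. This is a small sketch, assuming the usual convention that the sd unit address is written as target,lun in hexadecimal. The function name and the returned field names are illustrative.

```python
import re

# Matches a trailing sd node such as ".../sd@2,0" (target,lun in hex).
SD_UNIT = re.compile(r"/sd@([0-9a-fA-F]+),([0-9a-fA-F]+)$")


def parse_sd_unit(device_path):
    """Split an sd device path into its controller path, target, and LUN."""
    match = SD_UNIT.search(device_path)
    if not match:
        return None
    return {
        "controller_path": device_path[:match.start()],
        "target": int(match.group(1), 16),
        "lun": int(match.group(2), 16),
    }
```

For the path in the messages above, this yields target 2 and LUN 0 under /pci@1f,0/pci@1/scsi@2. Paths that do not end in an sd node, such as the ssd paths from an A5x00, return None.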

For A1000 and D1000 SCSI errors:

Oct 7 16:30:00 IntraServe unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd16):
Oct 7 16:30:00 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:00 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:01 IntraServe unix:
Oct 7 16:30:01 IntraServe unix:
Oct 7 16:30:11 IntraServe unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd12):
Oct 7 16:30:11 IntraServe unix: WARNING: /sbus@1f,0/QLGC,isp@3,10000/sd@1,0(sd18):
Oct 7 16:30:11 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command
Oct 7 16:30:11 IntraServe unix: SCSI transport failed: reason 'incomplete':retrying command

*The QLGC,isp node in the device path tells you that they are using a QLogic differential card, and that this card is the one having the problem.
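
The examples in this article pair each transport failure reason with a first thing to check. The sketch below collects those pairings into a lookup. It is only a summary of the cases shown here, not a complete diagnostic, and the names FIRST_CHECKS and first_checks are illustrative.

```python
import re

FAILURE = re.compile(r"SCSI transport failed: reason '(\w+)'")

# Pairings taken from the examples in this article only.
FIRST_CHECKS = {
    "reset": "Check the installed glm patch level against the latest revision.",
    "tran_err": "Check termination and cables for bent pins, then glm patches and disk firmware.",
    "timeout": "On an A5x00, check the patch matrix for array and disk firmware.",
    "incomplete": "On an A1000 or D1000, look at the QLogic differential card.",
}
NO_EXAMPLE = "No example for this reason in this article."


def first_checks(lines):
    """Return {reason: suggested first check} for each failure reason in the log lines."""
    found = {}
    for line in lines:
        match = FAILURE.search(line)
        if match:
            reason = match.group(1)
            found[reason] = FIRST_CHECKS.get(reason, NO_EXAMPLE)
    return found
```

Feeding it the lines of /var/adm/messages gives one suggestion per distinct reason, which is a starting point before looking at the device paths and hardware details.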

 
     