Symptoms
- The server reports a hard drive warning in POST (Power On Self Test)
- Virtual machines cannot power on due to VMFS corruption on local hard drives
- Very poor performance on local hard drives
Purpose
This article provides steps to:
- Help diagnose a local hard drive fault.
- Read the S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) status of a hard drive.
Resolution
In ESXi 5.1, VMware added S.M.A.R.T. functionality to monitor hard drive health. The S.M.A.R.T. feature records various operation parameters from physical hard drives attached to a local controller. The feature is part of the firmware on the circuit board of a physical hard disk (HDD and SSD).
To read the current data from a disk:
- Open a console or SSH session to the ESXi host. For more information, see Using ESXi Shell in ESXi 5.x (2004746).
- Determine the device parameter to use by running the command:
# esxcli storage core device list
The expected output is a list of all SCSI devices seen by the ESXi host. For example:
t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520
- Read the data from the device:
# esxcli storage core device smart get -d device
Where device is a value found in the previous step.
Note: External FC/iSCSI LUNs or virtual disks from a RAID controller might not report a S.M.A.R.T. status.
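On a host with many devices, the output of the esxcli storage core device list command can be long. The following is a minimal Python sketch for pulling out only the device identifiers. It assumes a layout in which each device identifier starts in the first column and its properties are indented beneath it; the function name device_ids and the abbreviated sample text are illustrative and are not part of any VMware tool.

```python
def device_ids(listing):
    """Return device identifiers from `esxcli storage core device list` output.

    Assumption: identifier lines are not indented, property lines are.
    """
    ids = []
    for line in listing.splitlines():
        # Skip blank lines and indented property lines.
        if line.strip() and not line[0].isspace():
            ids.append(line.strip())
    return ids


# Abbreviated, illustrative sample; real output lists many more properties.
SAMPLE_LISTING = """\
t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520
   Is Local: true
   Is SSD: false
"""

if __name__ == "__main__":
    for dev in device_ids(SAMPLE_LISTING):
        # Print the S.M.A.R.T. query command for each device found.
        print("esxcli storage core device smart get -d " + dev)
```

If the layout on your host differs, adjust the indentation test accordingly before relying on the result.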
This table breaks down some example output:
Parameter | Value | Threshold | Worst |
Health Status | OK | N/A | N/A |
Media Wearout Indicator | 0 | 0 | 0 |
Write Error Count | N/A | N/A | N/A |
Read Error Count | 118 | 50 | 118 |
Power-on Hours | 0 | 0 | 0 |
Power Cycle Count | 100 | 0 | 100 |
Reallocated Sector Count | 100 | 3 | 100 |
Raw Read Error Rate | 118 | 50 | 118 |
Drive Temperature | 27 | 0 | 34 |
Driver Rated Max Temperature | N/A | N/A | N/A |
Write Sectors TOT Count | N/A | N/A | N/A |
Read Sectors TOT Count | N/A | N/A | N/A |
Initial Bad Block Count | N/A | N/A | N/A |
Note: A physical hard drive can have up to 30 different attributes (the example above supports only 13). For more information, see How does S.M.A.R.T. function of hard disks Work?
Note: The preceding link was correct as of September 2, 2014. If you find the link is broken, provide feedback and a VMware employee will update the link.
A raw value can have two possible results:
- A number between 0-253
- A word (for example, N/A or OK)
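Because a value is either a number or a word, any script that processes the output has to handle both cases. The following is a minimal Python sketch that assumes the rows are pipe-delimited exactly as printed in the table in this article; output copied directly from a host may be laid out differently and would need a different split. The function names parse_cell and parse_smart_table are hypothetical.

```python
def parse_cell(cell):
    """Convert a table cell to int when numeric, otherwise keep the word."""
    cell = cell.strip()
    return int(cell) if cell.isdigit() else cell


def parse_smart_table(text):
    """Parse pipe-delimited rows into {parameter: {column: value}}."""
    rows = [
        [c.strip() for c in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip()
    ]
    header, body = rows[0], rows[1:]
    return {
        row[0]: {col: parse_cell(val) for col, val in zip(header[1:], row[1:])}
        for row in body
    }


# A few rows taken from the example table in this article.
SAMPLE_TABLE = """\
Parameter | Value | Threshold | Worst |
Health Status | OK | N/A | N/A |
Read Error Count | 118 | 50 | 118 |
Drive Temperature | 27 | 0 | 34 |
"""

if __name__ == "__main__":
    for name, columns in parse_smart_table(SAMPLE_TABLE).items():
        print(name, columns)
```

Numeric cells become integers and words such as N/A or OK are left as strings, so later comparisons can check the type before doing arithmetic.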
Column descriptions
Note: The values returned and their meaning for each of these columns can vary by manufacturer. For more information, please consult your hardware supplier.
- Parameter: This is a translation from the attribute ID to human-readable text. For example:
hex 0xE7 = decimal 231 = "Drive Temperature"
For more information, see the Known ATA S.M.A.R.T. attributes section of the S.M.A.R.T. Wikipedia article.
Note: The preceding link was correct as of September 2, 2014. If you find the link is broken, provide feedback and a VMware employee will update the link.
- Value: This is the raw value reported by the disk. To illustrate a simple Value using the example above, the Drive Temperature is reported as 27, which means 27 degrees Celsius. A Value can either be a number (0-253) or a word (for example, N/A or OK).
- Threshold: The (failure) limit for the attribute.
- Worst: The highest Value ever recorded for the parameter.
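The translation from attribute ID to parameter name can be illustrated with a short Python sketch. The lookup table below contains only the single mapping stated in this article; the names KNOWN_ATTRIBUTES and attribute_name are hypothetical, and a real table would be longer and can vary by manufacturer.

```python
# Only the mapping given in this article; not a complete attribute list.
KNOWN_ATTRIBUTES = {231: "Drive Temperature"}


def attribute_name(attr_id_hex):
    """Convert a hexadecimal attribute ID to (decimal ID, readable name)."""
    attr_id = int(attr_id_hex, 16)  # accepts an optional "0x" prefix
    return attr_id, KNOWN_ATTRIBUTES.get(attr_id, "Unknown")


if __name__ == "__main__":
    print(attribute_name("0xE7"))
```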
smartd daemon
ESXi 5.1 also has the /sbin/smartd daemon installed in the DCUI. This tool does not have any command line switches or interaction with the console. If you run the command in the shell, a S.M.A.R.T. status is reported in the /var/log/syslog.log file.
For example:
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: below MEDIA WEAROUT threshold (0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: above TEMPERATURE threshold (27 > 0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520: above TEMPERATURE threshold (113 > 0)
Notes:
- You can stop the daemon by typing Ctrl+c.
- Logged events should be viewed with caution. As can be seen in the example, all three warnings are irrelevant. The output can vary greatly between manufacturers and disk models.
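To review smartd entries per device rather than line by line, the log lines can be split into fields. The following is a minimal Python sketch; the pattern is inferred only from the three example lines above, so other smartd messages may not match it. The names LINE_RE, parse_smartd_lines, and warnings_per_device are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the example lines only; other messages may differ.
LINE_RE = re.compile(
    r"^(?P<time>\S+) smartd: \[(?P<level>\w+)\] (?P<device>\S+): (?P<message>.+)$"
)

# The three example lines from this article.
SAMPLE_LOG = """\
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: below MEDIA WEAROUT threshold (0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____SanDisk_SDSSDX120GG25___________________120506403552________: above TEMPERATURE threshold (27 > 0)
XXXX-XX-28T10:15:12Z smartd: [warn] t10.ATA_____WDC_WD2502ABYS2D18B7A0________________________WD2DWCAT1H751520: above TEMPERATURE threshold (113 > 0)
"""


def parse_smartd_lines(text):
    """Return one dict (time, level, device, message) per matching line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def warnings_per_device(entries):
    """Count [warn] entries for each device identifier."""
    return Counter(e["device"] for e in entries if e["level"] == "warn")


if __name__ == "__main__":
    counts = warnings_per_device(parse_smartd_lines(SAMPLE_LOG))
    for device, count in counts.items():
        print(count, device)
```

A count alone does not indicate a failing disk; as noted above, the messages still need to be judged against the disk model and manufacturer.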
Additional Information
The vm-support bundle also captures S.M.A.R.T. details in the smartinfo.sh.txt file. The file can be found in the commands/ directory.