Terminology:
Hot-Swappable Devices: The compute/database node HDDs are hot-swappable and are configured as RAID 5.
DEGRADED --> A hard drive in the RAID set is corrupted or damaged; the system continues to function, but with reduced performance.
FAILED --> There is a serious problem in the RAID.
A) Pre-Replacement
1) Create a Service Request (SR) with Oracle Support for the failed disk, for example:
SR 3-12345678: HALRT-02007: Database node hard disk failure.
Once the SR is created, Oracle creates an internal task and assigns an Oracle Field Engineer to replace the disk.
Schedule the visit with the Oracle Field Engineer and ask them to bring the replacement part.
2) Check the current cache policy. In the output, Current Cache Policy should be WriteBack, not WriteThrough:
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy: Disabled
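The cache policy check can also be scripted against the captured output. The snippet below is a minimal sketch, assuming the output has the "Key: value, value" layout shown above; the function names are hypothetical and it only parses text, it does not run any RAID utility.

```python
def current_cache_policy(output: str) -> list:
    """Return the flags listed on the 'Current Cache Policy' line, or []."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "current cache policy":
            return [flag.strip() for flag in value.split(",")]
    return []


def is_write_back(output: str) -> bool:
    """True only when the current (not the default) policy is WriteBack."""
    return "WriteBack" in current_cache_policy(output)


# Sample text copied from the output in this step.
SAMPLE = """\
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy: Disabled
"""

print(is_write_back(SAMPLE))
```

Matching on the "Current" line matters because the default policy can still read WriteBack while the controller has fallen back to WriteThrough.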
3) Check the MegaRAID card to get the enclosure ID
Device ID: 252
4) Check for the failed disk
In this example, the failed disk is in physical slot 2 (Firmware state: Failed).
[root@servername ~]# /opt/MegaRAID/storcli/storcli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Slot Number: 2
Firmware state: Failed
Device Firmware Level: ORAB
Slot Number: 3
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
[root@servername ~]# /opt/MegaRAID/storcli/storcli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Failed
Device Firmware Level: ORAB
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
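Picking the failed slot out of the listing by eye is easy to get wrong, because each "Firmware state" line belongs to the "Slot Number" line printed before it. The following is a rough sketch of that pairing, assuming the grep output layout shown above; the function names and the sample text are illustrative only.

```python
def firmware_state_by_slot(output: str) -> dict:
    """Map each slot number to the firmware state printed after it."""
    states = {}
    slot = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # prompt lines and blanks carry no "key: value" pair
        key = key.strip().lower()
        value = value.strip()
        if key == "slot number":
            slot = int(value)
        elif key == "firmware state" and slot is not None:
            states[slot] = value
    return states


def failed_slots(output: str) -> list:
    """Return the sorted slot numbers whose firmware state starts with 'Failed'."""
    states = firmware_state_by_slot(output)
    return sorted(s for s, st in states.items() if st.lower().startswith("failed"))


# Sample in the same layout as the listing in this step (four disks, slot 2 failed).
SAMPLE = """\
Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Slot Number: 2
Firmware state: Failed
Device Firmware Level: ORAB
Slot Number: 3
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
"""

print(failed_slots(SAMPLE))
```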
5) Verify that the RAID state is Optimal or Degraded and that the good disk(s) are online before hot-swapping out the failed disk
Virtual Drive: 0 (Target Id: 0)
State: Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Failed
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
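The rule in this step (virtual drive Optimal or Degraded, every other disk online) can be written down as a small check. This is only a sketch under the assumption that the output looks like the listing above; `safe_to_pull` is a made-up helper name, and a human should still confirm the result before pulling a disk.

```python
def parse_vd_and_slots(output: str):
    """Return (virtual drive state, {slot: firmware state}) from the listing."""
    vd_state = None
    slots = {}
    slot = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == "state":  # exact match, so "Firmware state" is not caught here
            vd_state = value
        elif key == "slot number":
            slot = int(value)
        elif key == "firmware state" and slot is not None:
            slots[slot] = value
    return vd_state, slots


def safe_to_pull(output: str, failed_slot: int) -> bool:
    """True when the array is Optimal/Degraded and all other disks are online."""
    vd_state, slots = parse_vd_and_slots(output)
    if vd_state not in ("Optimal", "Degraded") or failed_slot not in slots:
        return False
    others = [st for s, st in slots.items() if s != failed_slot]
    return bool(others) and all(st.startswith("Online") for st in others)


# Sample copied from the output in this step.
SAMPLE = """\
Virtual Drive: 0 (Target Id: 0)
State: Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Failed
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
"""

print(safe_to_pull(SAMPLE, 2))
```

Asking about the wrong slot returns False, since the remaining set would then include the failed disk.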
6) If the disk's LED is not turned on, use the locate command to identify the failed disk:
# /opt/MegaRAID/storcli/storcli64 -PdLocate -start -physdrv[E#:S#] -a0
where E# is the enclosure ID number identified in step 3, and S# is the slot number of the failed disk identified in step 4.
In the example above, the command would be:
# /opt/MegaRAID/storcli/storcli64 -PdLocate -start -physdrv[252:2] -a0
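To avoid typing the wrong enclosure or slot, the locate command can be assembled from the values gathered earlier. The sketch below only builds the argument list and never executes it; the helper names are hypothetical, and the flags are exactly the ones used in the command above.

```python
STORCLI = "/opt/MegaRAID/storcli/storcli64"  # path as used in this post


def enclosure_id(output: str) -> int:
    """Extract the number from a 'Device ID: 252' line (enclosure ID step)."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "device id":
            return int(value.strip())
    raise ValueError("no 'Device ID' line found")


def locate_command(enclosure: int, slot: int, adapter: int = 0) -> list:
    """Build the argument list for the locate command; nothing is run here."""
    if min(enclosure, slot, adapter) < 0:
        raise ValueError("enclosure, slot and adapter must be non-negative")
    return [
        STORCLI,
        "-PdLocate",
        "-start",
        f"-physdrv[{enclosure}:{slot}]",
        f"-a{adapter}",
    ]


cmd = locate_command(enclosure_id("Device ID: 252"), 2)
print(" ".join(cmd))
```

Keeping the command as a list also means the square brackets are passed through literally if it is later handed to something like `subprocess.run`, rather than being treated as a shell glob pattern.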
B) Replace Disk
The Oracle Field Engineer (FE) visits the data center (DC) and replaces the disk.
C) Post-Replacement
1) Get disk status after it is physically replaced
Virtual Drive: 0 (Target Id: 0)
State: Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Rebuild
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
[root@servername ~]# /opt/MegaRAID/storcli/storcli64 -pdlist -a0 | grep -iE "slot|firmware|target|state|predictive"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Rebuild
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
2) Check all disk info
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: 0
Device Id: 4
WWN: 5111E567D934A118
Sequence Number: 3
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 1.090 TB [0x8bba0cb0 Sectors]
Non Coerced Size: 1.090 TB [0x8baa0cb0 Sectors]
Coerced Size: 1.089 TB [0x8b94f800 Sectors]
Logical Sector Size: 512
Physical Sector Size: 512
Firmware state: Rebuild
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: ORAB
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5111E567d934a119
SAS Address(1): 0x0
Connected Port Number: 9(path0)
Inquiry Data: SEAGATE ST1307IN9SUN1.2TORAB2215LC2M5A
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 12.0Gb/s
Link Speed: 12.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :30C (86.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: 12.0Gb/s
Drive has flagged a S.M.A.R.T alert : No
Exit Code: 0x00
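With this many fields it helps to pull out just the ones that matter after a replacement. The snippet below is a minimal sketch that turns "Key: value" lines into a dictionary and reports any non-zero error counter; the choice of counters and the function names are assumptions made for illustration.

```python
COUNTERS = ("Media Error Count", "Other Error Count", "Predictive Failure Count")


def parse_fields(output: str) -> dict:
    """Split each 'Key: value' line on its first colon; later keys overwrite earlier ones."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def nonzero_counters(fields: dict) -> dict:
    """Return only the error counters that are present and greater than zero."""
    return {
        name: int(fields[name])
        for name in COUNTERS
        if name in fields and int(fields[name]) != 0
    }


# A few lines copied from the disk information above.
SAMPLE = """\
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Firmware state: Rebuild
Drive Temperature :30C (86.00 F)
Drive has flagged a S.M.A.R.T alert : No
"""

fields = parse_fields(SAMPLE)
print(fields["Firmware state"], nonzero_counters(fields))
```

Because repeated keys such as "Port status" overwrite each other, this flat dictionary suits one-off checks on a single disk rather than a full parse of the listing.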
3) Monitor disk rebuild progress
Rebuild Progress on Device at Enclosure 252, Slot 2 Completed 21% in 2 Minutes.
Estimated time left is 2 Hours 38 Minutes.
Exit Code: 0x00
4) Output after completion of disk rebuild
Device(Encl-252 Slot-2) is not in rebuild process
Exit Code: 0x00
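The two rebuild messages (in progress and not in rebuild) can be told apart in a script, for example when polling during the rebuild. This is a sketch based only on the message wording shown in this post; the status labels are made up, and other tool versions may phrase the messages differently.

```python
import re

# Matches the percentage in "... Completed 21% in 2 Minutes."
PROGRESS_RE = re.compile(r"Completed\s+(\d+)%")


def rebuild_status(output: str):
    """Return ('rebuilding', percent), ('not rebuilding', None) or ('unknown', None)."""
    match = PROGRESS_RE.search(output)
    if match:
        return ("rebuilding", int(match.group(1)))
    if "is not in rebuild process" in output:
        return ("not rebuilding", None)
    return ("unknown", None)


IN_PROGRESS = (
    "Rebuild Progress on Device at Enclosure 252, Slot 2 Completed 21% in 2 Minutes.\n"
    "Estimated time left is 2 Hours 38 Minutes.\n"
    "Exit Code: 0x00\n"
)
FINISHED = "Device(Encl-252 Slot-2) is not in rebuild process\nExit Code: 0x00\n"

print(rebuild_status(IN_PROGRESS))
print(rebuild_status(FINISHED))
```

"Not in rebuild process" only says that no rebuild is running, so the firmware state of the disk still has to be checked to confirm the rebuild actually completed.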
5) Check the disk status. Every disk should be "Online" and the virtual drive state should be Optimal.
Virtual Drive: 0 (Target Id: 0)
State: Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
[root@servername ~]# /opt/MegaRAID/storcli/storcli64 -pdlist -a0 | grep -iE "slot|firmware|target|state|predictive"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
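As a final sign-off, the same listing can be scanned for anything that is not clean: a disk that is not online, or a predictive failure count above zero. The following is a rough sketch assuming the field order shown above (slot number first, then the fields for that disk); `slot_problems` is a hypothetical helper.

```python
def slot_problems(output: str) -> dict:
    """Return {slot: [problem, ...]} for disks that are not online or show predictive failures."""
    problems = {}
    slot = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip().lower()
        value = value.strip()
        if key == "slot number":
            slot = int(value)
        elif slot is None:
            continue  # ignore fields that appear before any slot line
        elif key == "predictive failure count" and int(value) > 0:
            problems.setdefault(slot, []).append(f"predictive failure count {value}")
        elif key == "firmware state" and not value.startswith("Online"):
            problems.setdefault(slot, []).append(f"firmware state {value}")
    return problems


# Two healthy disks in the same layout as the listing in this step.
SAMPLE = """\
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: ORAB
Foreign State: None
"""

print(slot_problems(SAMPLE))
```

An empty result means every listed disk is online with no predictive failures recorded.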
Useful Oracle MOS Doc IDs:
NOTE:1967510.1 - How to Replace an Exadata X4-8/X5-2 (or later) Compute Node Server HDD (Predictive or Hard Failure)
NOTE:1360343.1 - INTERNAL Exadata Database Machine Hardware Current Product Issues
NOTE:1360360.1 - INTERNAL Exadata Database Machine Hardware Troubleshooting
NOTE:1416303.1 - How to identify which Exadata disk FRU part number to order, based on image, vendor and mixed disk support status
NOTE:1113034.1 - HALRT-02007: Database node hard disk failure
NOTE:1113014.1 - HALRT-02008: Database node hard disk predictive failure
NOTE:1084360.1 - Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment
NOTE:1071220.1 - Oracle Sun Database Machine V2 Diagnosability and Troubleshooting Best Practices
NOTE:1452325.1 - Determining when Disks should be replaced on Oracle Exadata Database Machine
NOTE:1274324.1 - Oracle Sun Database Machine X2-2/X2-8 Diagnosability and Troubleshooting Best Practices