If your Dell database servers suddenly slow down and I/O seems sluggish, do yourself a favor and check whether the RAID battery is going through its 'relearn' cycle. If it is, the controller has switched the cache policy from Write-Back to Write-Through, and as a result writes become much slower than in normal operation.
Details:
This turns out to be a fairly well known problem with RAID controllers in Dell servers, specifically LSI controllers. The default mode of operation for the RAID battery is to periodically go through a so-called 'relearn cycle', where it discharges, then charges and recalibrates itself by finding the current charge. In this timeframe, as I mentioned, Write-Back is disabled and Write-Through is enabled.
For our MySQL servers, we have innodb_flush_log_at_trx_commit set to 1, which means that every commit is flushed to disk. As a consequence, Write-Through mode severely impacts the performance of writes to the database. The symptoms are high CPU I/O wait and a sluggish database. Pain all around.
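To put a number on the "high CPU I/O wait" symptom without extra tools, here is a minimal sketch that compares two samples of the aggregate `cpu` line from /proc/stat. The function name `iowait_pct` is made up for this example, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Sketch: percentage of CPU time spent in iowait between two samples.
# iowait_pct is a hypothetical helper name, not part of any tool.
iowait_pct() {
    # $1, $2: two "cpu ..." summary lines from /proc/stat, taken a few seconds apart
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9); guest fields are already counted in user
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            iow = $6    # iowait is the 5th counter, i.e. awk field 6
        }
        NR == 1 { t0 = total; w0 = iow }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%d\n", (iow - w0) * 100 / dt
            else print 0
        }
    '
}
```

One way to use it: `a=$(head -n1 /proc/stat); sleep 5; b=$(head -n1 /proc/stat); iowait_pct "$a" "$b"`. Tools like vmstat or iostat report the same thing if they are installed.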
We started to experience this database slowness on 3 database servers at almost the same time. Two of them were configured as slaves, and one as master. The symptoms included high CPU I/O wait, slow queries on the master, and replication lag on the slaves. Nothing pointed to something specific to MySQL. We opened an emergency ticket with Percona and were fortunate to be assigned to Aurimas Mikalauskas, a Percona principal consultant and a MySQL/RAID hardware guru. It took him less than a minute to correctly diagnose the issue based on these symptoms. Now that we knew what the issue was, some Google searches turned up other articles and blog posts talking about it. It turns out one of the most frequently cited posts belongs to Robin Bowes, my ex-coworker from RIS Technology/Reliam! It also turns out Percona engineers blogged about this issue extensively (see this post which references other posts).
In any case, for future reference, here is what we did on all the servers that have the LSI MegaRaid controller (these servers are Dell C2100s in our case):
1) Install MegaCli utilities
I had a hard time finding these utilities, since the LSI support site doesn't seem to have them anymore. I found this blog post talking about a zip file containing the tools, then I googled the zip filename and I found an updated version on this Gentoo-related site. Then I followed the steps in the blog post above to extract the statically-linked binaries:
# apt-get install rpm2cpio
# mkdir megacli; cd megacli
# wget http://download.gocept.com/gentoo/mirror/distfiles/4.00.11_Linux_MegaCLI.zip
# unzip 4.00.11_Linux_MegaCLI.zip
# unzip MegaCliLin.zip
# rpm2cpio MegaCli-4.00.11-1.i386.rpm| cpio -idmv
At this point I had 2 statically-linked binaries called MegaCli and MegaCli64 in megacli/opt/MegaRAID/MegaCli.
2) Inspect event log for RAID controller to figure out what has been going on in that subsystem (this command was actually run by Aurimas during the troubleshooting he did):
# ./MegaCli64 -AdpEventLog -GetSinceReboot -f events.log -aALL
# cat events.log
...
Event Description: Time established as 09/09/11 15:27:18; (48 seconds since power on)
--
Time: Fri Sep 9 16:27:36 2011
Event Description: Battery temperature is normal
--
Time: Fri Sep 9 16:56:51 2011
Event Description: Battery started charging
--
Time: Fri Sep 9 17:08:46 2011
Event Description: Battery charge complete
--
Time: Sat Sep 10 19:54:16 2011
Event Description: Battery relearn will start in 4 days
--
Time: Mon Sep 12 19:53:46 2011
Event Description: Battery relearn will start in 2 day
--
Time: Tue Sep 13 19:54:36 2011
Event Description: Battery relearn will start in 1 day
--
Time: Wed Sep 14 14:54:16 2011
Event Description: Battery relearn will start in 5 hours
--
Time: Wed Sep 14 19:55:26 2011
Event Description: Battery relearn pending: Battery is under charge
--
Time: Wed Sep 14 19:55:26 2011
Event Description: Battery relearn started
--
Time: Wed Sep 14 19:55:29 2011
Event Description: BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected
--
Time: Wed Sep 14 19:55:29 2011
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=00,ap=0,dc=0] from [ID=00,dcp=01,ccp=01,ap=0,dc=0]
Previous LD Properties
Current Cache Policy: 1
Default Cache Policy: 1
New LD Properties
Current Cache Policy: 0
Default Cache Policy: 1
--
Time: Wed Sep 14 19:56:31 2011
Event Description: Battery is discharging
--
Time: Wed Sep 14 19:56:31 2011
Event Description: Battery relearn in progress
--
Time: Wed Sep 14 22:43:21 2011
Event Description: Battery relearn completed
--
Time: Wed Sep 14 22:44:26 2011
Event Description: Battery started charging
--
Time: Wed Sep 14 23:53:46 2011
Event Description: BBU enabled; changing WT virtual disks to WB
--
Time: Wed Sep 14 23:53:46 2011
Event Description: Policy change on VD 00/0 to [ID=00,dcp=01,ccp=01,ap=0,dc=0] from [ID=00,dcp=01,ccp=00,ap=0,dc=0]
Previous LD Properties
Current Cache Policy: 0
Default Cache Policy: 1
New LD Properties
Current Cache Policy: 1
Default Cache Policy: 1
So as you can see, the battery relearn started at 19:55:26, then 3 seconds later the cache policy was changed from Write-Back to Write-Through, and it stayed like this until 23:53:46, when it was changed back to Write-Back. This shows that I/O was impacted for about 4 hours. Luckily for us it was outside of our high traffic period for the day, but it was still painful.
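The same arithmetic can be scripted. Below is a rough sketch that pairs each "BBU disabled" event with the next "BBU enabled" event and prints the length of the Write-Through window in seconds. It assumes the log looks like the excerpt above (a "Time:" line followed by an "Event Description:" line) and that GNU date is available to parse the timestamps. The function name `wt_window_seconds` is made up for this example.

```shell
# Sketch: seconds spent in Write-Through mode, per relearn, from events.log.
# Assumes "Time: ..." / "Event Description: ..." lines as in the excerpt above,
# and GNU date for timestamp parsing. wt_window_seconds is a hypothetical name.
wt_window_seconds() {
    # $1: path to the event log
    awk '
        /^Time: /                          { sub(/^Time: /, ""); t = $0 }
        /^Event Description: BBU disabled/ { start = t }
        /^Event Description: BBU enabled/  {
            if (start != "") { print start "|" t; start = "" }
        }
    ' "$1" |
    while IFS='|' read -r start end; do
        # -u avoids DST surprises; only the difference matters
        s=$(date -u -d "$start" +%s)
        e=$(date -u -d "$end" +%s)
        echo $((e - s))
    done
}
```

Run as `wt_window_seconds events.log`. If the full log on your controller carries extra fields or indentation, the awk patterns will need adjusting.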
3) Disable autoLearnMode for the RAID battery
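This is so we don't have this type of surprise in the future. The autoLearnMode variable is ON by default. You can see its current setting if you run this command:
# ./MegaCli64 -AdpBbuCmd -GetBbuStatus -a0
To disable it, write the new value to a properties file and apply it to the adapter (somewhat counterintuitively, autoLearnMode=1 means automatic relearn is disabled):
# echo "autoLearnMode=1" > tmp.txt
# ./MegaCli64 -AdpBbuCmd -SetBbuProperties -f tmp.txt -a0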
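To get a feel for how often the automatic relearn would have fired, the BBU properties can be inspected with `-AdpBbuCmd -GetBbuProperties`. The sketch below converts the reported learn period from seconds to days. The "Auto Learn Period: N Sec" line format is an assumption based on output I have seen from MegaCli, and it may differ between versions. `learn_period_days` is a made-up helper name.

```shell
# Sketch: convert the BBU auto learn period from seconds to whole days.
# Reads `MegaCli64 -AdpBbuCmd -GetBbuProperties -a0` output on stdin.
# ASSUMPTION: the output contains a line like "Auto Learn Period: 2592000 Sec".
learn_period_days() {
    awk -F': *' '/Auto Learn Period/ {
        # $2 looks like "2592000 Sec"; awk uses the leading number
        print int($2 / 86400)
    }'
}
```

Usage would be `./MegaCli64 -AdpBbuCmd -GetBbuProperties -a0 | learn_period_days`. If nothing is printed, the line format on your controller is different from what this sketch expects.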
4) Force battery relearn cycle
It is still recommended to run the battery relearn cycle manually periodically, so we did it on all servers that are not yet in production. For the rest of the servers we'll do it at night, during a time frame when traffic is lowest. In the future, we'll take maintenance windows every N months (where N is probably 6 or 12) and force the relearn cycle.
Here's the command to force the relearn:
# ./MegaCli64 -AdpBbuCmd -BbuLearn -a0
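While a forced relearn runs, the virtual disks sit in Write-Through mode, so it helps to know when they are back in Write-Back before putting load on the server again. Here is a small sketch that counts virtual disks currently in Write-Through, based on the output of `-LDInfo -Lall -aALL`. The exact "Current Cache Policy: WriteThrough, ..." wording is an assumption based on typical MegaCli output, and `count_write_through` is a made-up name.

```shell
# Sketch: count virtual disks currently running in Write-Through mode.
# Reads `MegaCli64 -LDInfo -Lall -aALL` output on stdin.
# ASSUMPTION: each virtual disk has a line such as
#   "Current Cache Policy: WriteThrough, ReadAheadNone, Direct, ..."
count_write_through() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status
    grep -c 'Current Cache Policy *: *WriteThrough' || true
}
```

For example, `./MegaCli64 -LDInfo -Lall -aALL | count_write_through` should print 0 once the battery has recharged and the controller has switched back to Write-Back.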
For reference, LSI has good documentation for the MegaCli utilities on one of their KB sites. Another good reference is this Dell PERC cheatsheet.
I hope this will be a good troubleshooting guide for people faced with mysterious I/O slowness. Thanks again to Aurimas from Percona for his help. These guys are awesome!
6 comments:
read/write caching cheatsheet:
http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS
deb packages:
---
# cat > /etc/apt/sources.list.d/hwraid.list
deb http://hwraid.le-vert.net/debian squeeze main
^D
# apt-get install megacli megaclisas-status megactl
and unified nagios plugin:
http://ftp1.pld-linux.org/people/glen/nagios-plugin-check_raid-2.1.1.97.tar.gz
Thanks bk! These are very good resources.
The servers were very sluggish today but we never thought of checking the battery. Our heads are already painful from the headache (and from scratching). Google didn't help much.
I guess, checking the battery will be added to the checklist.
I'm glad you found time for the blog too, with so much on your plate :-)
Very useful blog! Thanks for that, I'd never have thought to look at the battery!
By the way, I found the latest version of megacli for windows on the LSI site :
http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/8.00.46_Windows_MegaCLI.zip
Robin Bowes's blog post is at https://blog.yo61.com/dell-drac-bbu-auto-learn-tests-kill-disk-performance/