Wednesday, November 09, 2011

Troubleshooting memory allocation errors in Elastic MapReduce

Yesterday we ran into an issue with some Hive scripts running within an Amazon Elastic MapReduce cluster. Here's the error we got:

Caused by: Spill failed
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(
 at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(
 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(
 ... 11 more
Caused by: Cannot run program "bash": error=12, Cannot allocate memory
 at java.lang.ProcessBuilder.start(
 at org.apache.hadoop.util.Shell.runCommand(
 at org.apache.hadoop.fs.DF.getAvailable(
 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(
 at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(
 at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(
 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$
Caused by: error=12, Cannot allocate memory
 at java.lang.UNIXProcess.(
 at java.lang.ProcessImpl.start(
 at java.lang.ProcessBuilder.start(

Searching for "error=12, Cannot allocate memory" shows that this is a common problem. See this AWS Developer Forums thread, this Hadoop core-user mailing list thread, and this explanation by Ken Krugler from Bixo Labs.

Basically, it boils down to the fact that when Java forks a new process (in this case a bash shell, which the stack trace shows Hadoop launching to check available disk space before writing a spill file), Linux must be able to reserve as much memory for the child as the parent Java process is using, even though the child will never need most of it. If that much memory cannot be reserved, the fork fails with error=12 (ENOMEM). There are several workarounds (read in particular the AWS Forum thread), but a solution that worked for us was to simply add swap space to the Elastic MapReduce slave nodes.
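To see how much headroom a node has, it can help to compare what the kernel has already promised to processes (Committed_AS) with the physical memory and swap on the box. Here is a minimal sketch, assuming a Linux node that exposes /proc/meminfo. The helper name meminfo_kb is made up for this illustration, and its optional second argument exists only so the function can be pointed at a saved copy of the file.

```shell
#!/bin/bash
# Print the value (in kB) of one field from /proc/meminfo.
# meminfo_kb is a hypothetical helper, not part of Hadoop or EMR.
meminfo_kb() {
    local field="$1" file="${2:-/proc/meminfo}"
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

if [ -r /proc/meminfo ]; then
    for field in MemTotal MemFree SwapTotal SwapFree Committed_AS; do
        printf '%-14s %12s kB\n' "$field" "$(meminfo_kb "$field")"
    done
fi

# 0 = heuristic overcommit (the usual default), 1 = always allow, 2 = strict accounting
if [ -r /proc/sys/vm/overcommit_memory ]; then
    echo "overcommit_memory: $(cat /proc/sys/vm/overcommit_memory)"
fi
```

A SwapTotal of 0 combined with a Committed_AS close to MemTotal would be consistent with the failure described above, although the exact threshold depends on the kernel's overcommit mode.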

You can ssh into a slave node from the EMR master node by using the same private key you used when launching the EMR cluster, and by targeting the internal IP address of the slave node. In our case, the slaves are m1.xlarge instances, and they have 4 local disks (/dev/sdb through /dev/sde) mounted as /mnt, /mnt1, /mnt2 and /mnt3, with 414 GB available on each file system. I ran this simple script via sudo on each slave to add 4 swap files of 1 GB each, one on each of the 4 local disks.

$ cat 

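# SWAPFILE must be set to the path of the swap file before these commands run
dd if=/dev/zero of="$SWAPFILE" bs=1024 count=1048576   # 1048576 blocks of 1024 bytes = 1 GB
mkswap "$SWAPFILE"                                      # format the file as swap space
swapon "$SWAPFILE"                                      # start using it immediately
echo "$SWAPFILE swap swap defaults 0 0" >> /etc/fstab   # re-enable it after a reboot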
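One way to apply the commands above to all four disks in a single pass is to loop over the mount points. This is only a sketch under stated assumptions, not the exact script from that day: the file name swapfile, the add_swap and run functions, and the DRY_RUN switch are all invented for this illustration. By default it only prints what it would do. Run it as root with DRY_RUN=0 to actually make the changes.

```shell
#!/bin/bash
# Sketch: add one 1 GB swap file on each local disk of an m1.xlarge slave.
# add_swap, run, the "swapfile" name and DRY_RUN are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Create, format and enable a swap file under the given mount point.
add_swap() {
    local swapfile="$1/swapfile"
    run dd if=/dev/zero of="$swapfile" bs=1024 count=1048576   # 1 GB of zeros
    run chmod 600 "$swapfile"                                   # keep swap contents private
    run mkswap "$swapfile"
    run swapon "$swapfile"
    # Register the swap file so it is re-enabled after a reboot.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would append to /etc/fstab: $swapfile swap swap defaults 0 0"
    else
        echo "$swapfile swap swap defaults 0 0" >> /etc/fstab
    fi
}

for mount in /mnt /mnt1 /mnt2 /mnt3; do
    add_swap "$mount"
done
```

Afterwards, swapon -s or free -m should list the new swap space on the node.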

This solved our issue. No more failed Map tasks, no more failed Reduce tasks. Maybe this will be of use to some other frantic admins out there (like I was yesterday) who are not sure how to troubleshoot the intimidating Hadoop errors they're facing.
