I/O Problem Solved

OK, this, I suppose, is a teardown of how I messed up diagnosing the problem I was talking about yesterday. As I wrote, I saw a lot of I/O happening on the EBS volume attached to my EC2 host in AWS. The steps I took to diagnose it were:

  • Check iotop to see what's causing the problem
  • Notice that httpd (Apache) was the cause
  • Run strace to check what's being written to
  • Verify that the PHP APC caching was using /tmp
  • Verify that /tmp was served out of shared memory

Well, I messed up on at least one of these. On top of that, I wasn't 100% right on another.

So, iotop wasn't lying at all. Apache was doing writes. This part was correct.

Where I started down the path of screwing up was with strace. I saw that httpd was writing somewhere, but I didn't see anything being written in large batches. Sure, I saw bits going to the access log, but not megs of writes. This wasn't adding up.

I looked around to see what it might be. One thing I came across was someone talking about how APC might have been set up wonky. Sure enough, it was pointing to /tmp. I was able to verify this using lsof (LiSt Open Files). I wasn't seeing any writes to it, though.

I checked the mounts:

/dev/xvda1 on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
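Each line of that output reads "source on mount point type filesystem (options)". Here is a minimal Python sketch (not part of the original diagnosis) that asks which mount actually holds a given path by picking the longest matching mount point. The names parse_mounts and filesystem_for are made up for illustration, and the sample text is just the output above.

```python
import re

# One line of `mount` output: "<source> on <target> type <fstype> (<options>)"
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")

# The mount output from this post, copied verbatim.
MOUNT_OUTPUT = """\
/dev/xvda1 on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
"""


def parse_mounts(text):
    """Return a list of (source, target, fstype) tuples from mount output."""
    mounts = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            source, target, fstype, _options = match.groups()
            mounts.append((source, target, fstype))
    return mounts


def filesystem_for(path, mounts):
    """Return the mount entry holding `path` (longest matching mount point)."""
    best = None
    for source, target, fstype in mounts:
        prefix = target.rstrip("/") + "/"
        if path == target or path.startswith(prefix):
            if best is None or len(target) > len(best[1]):
                best = (source, target, fstype)
    return best


if __name__ == "__main__":
    mounts = parse_mounts(MOUNT_OUTPUT)
    print(filesystem_for("/tmp", mounts))
    print(filesystem_for("/dev/shm", mounts))
```

With the mounts above, /tmp resolves to the ext4 root filesystem and only /dev/shm resolves to tmpfs.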

tmpfs is on /dev/shm

That, my friends, is the big fail. I read that line as saying that /tmp is on the shared memory device. Actually, I read it backwards: tmpfs is the source and it is mounted on /dev/shm. There is no separate mount for /tmp, so it lives on the ext4 root filesystem.

The reason I wasn't seeing this in strace is that these writes weren't going through the normal system calls that strace watches. No, this was through a memory-mapped file. My diagnostics weren't doing what I thought they were doing.

All I did was change one line in my /etc/php.d/apc.ini file:

apc.mmap_file_mask=/apc.shm.XXXXXX
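To illustrate why strace came up empty before this change, here is a minimal Python sketch of writing through a memory-mapped file. It is an illustration only, not APC's actual code, and the helper names and the file name cache.bin are hypothetical. The payload is stored with a plain memory assignment, so no write() call carries the data. The kernel writes the dirty pages to the backing file on its own schedule, or when flush() (msync) is called.

```python
import mmap
import os
import tempfile


def write_via_mmap(path, payload, size=4096):
    """Store `payload` in the file at `path` through a shared memory mapping."""
    # Size the file first: the mapping cannot extend past the end of the file.
    with open(path, "wb") as f:
        f.truncate(size)

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as mapping:
            # A plain memory store: no write() system call is issued here.
            mapping[:len(payload)] = payload
            # msync(): ask the kernel to push the dirty pages to the file now.
            mapping.flush()

    # Read the file back the ordinary way to show the data reached it.
    with open(path, "rb") as f:
        return f.read(len(payload))


def demo():
    """Run the sketch against a throwaway file (hypothetical name cache.bin)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.bin")
        return write_via_mmap(path, b"cached opcode data")


if __name__ == "__main__":
    print(demo())
```

Running something like this under strace should show the open, mmap and msync calls, but no write() containing the payload, which matches what I saw with httpd.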

Yeah, this is a dumb set of mistakes.

Now this is what happened to the disk graph: the line fell to the floor. The way it should be.

Lesson to be learned: just because you checked your assumptions doesn't mean you did it right. Check again after taking a break. Look at it with a fresh set of eyes the next day.
