We are working on the site stability issue
by Thomas Hruska on Jan. 17, 2011, under Site changes, Technical Glitches
Hello from your local computer nerd here at TNI. As you can tell, we’ve mostly returned to normal now that the amount of traffic to the site has also decreased a bit. However, we are still working to reduce the load on this poor server, which had to endure all sorts of problems this past week. Bloggers feel the most pain whenever the server goes down because they can’t get to the site to post. Mark is unhappy whenever that happens too, because it means you can’t view the site and his page views go down. We’re all a bit frustrated about the situation, but the reality is: WordPress is a resource hog.
Here is a picture showing the server load during the past week:
Simply put, as soon as the word was out about the shooting, this site went down. A friend told me about the shooting and, literally 30 seconds later, I got a phone call from Mark saying the site was inaccessible. A couple months ago we discovered that any time the box hits 100% CPU, it generally can’t keep up with the requests and ends up falling all over itself – even after the connections to the server have long been terminated. The only “solution” so far is to reboot the machine. We went down at least 60 times the first day alone. Having more users than a server can handle is supposedly a good problem to have. But not if someone has to camp out next to the restart button. And that someone is usually Mark. He gets grumpy when he has to do that. Understandable because going into work on a weekend is no fun.
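Camping out next to the restart button can at least be partly automated. A minimal sketch of the kind of watchdog check we have in mind, in Python — the threshold and the core count are illustrative numbers of our choosing, not what actually runs on the live box:

```python
import os

# Number of CPU cores on the box (4 in our case).
CORES = 4

# Hypothetical threshold: a 1-minute load average above twice the
# core count means the machine can't keep up with requests and is
# "falling all over itself".
LOAD_LIMIT = CORES * 2

def needs_restart():
    """Return True if the 1-minute load average exceeds the limit."""
    one_minute, _, _ = os.getloadavg()
    return one_minute > LOAD_LIMIT
```

A cron job could run a check like this every minute and page someone (or, as a last resort, trigger the reboot) instead of making Mark drive in on a weekend.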
Obviously, rebooting the machine is not a solution. Rebooting only temporarily covers up the real problem. At the end of last week, we finally got another server set up to run tests on.
The machine the Citizen website currently runs on is no slouch. It has two dual-core Intel Xeons @ 2.66 GHz (4 cores total) with 4MB of L2 cache per processor (8MB total), 10GB of RAM, a 1.33 GHz bus, and a little under 1TB of hard drive space in some RAID configuration. The test machine is slightly less impressive in some ways, more impressive in others.
At any rate, the site runs at 60% CPU utilization – on average. For the moment. The goal over the next couple of weeks is to get that number down. Mark wants to grow the site, but the current setup clearly can’t handle much more growth. The first thing we did after getting the test server up was to install a better WordPress caching plugin on it, make sure it worked, push it to the live server, and then deal with the minor fallout from that. Nothing appears to have changed on the CPU utilization front, which is a disappointment, but the plugin probably helps significantly whenever there is a large influx of visitors. We have noticed a drop in the number of connections to MySQL, which does help things a bit.
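The idea behind a page-caching plugin is simple: render a page once, store the finished HTML, and serve the stored copy instead of hitting PHP and MySQL again on every request. Here is a rough sketch of that concept in Python — the function names and the five-minute lifetime are ours for illustration, not anything from the actual plugin:

```python
import time

page_cache = {}   # url -> (rendered_html, timestamp)
CACHE_TTL = 300   # hypothetical: keep a rendered page for 5 minutes

def render_from_database(url):
    """Stand-in for the expensive PHP/MySQL work WordPress does."""
    return "<html>content for %s</html>" % url

def get_page(url):
    """Serve the cached copy when fresh; rebuild only when stale."""
    cached = page_cache.get(url)
    if cached and time.time() - cached[1] < CACHE_TTL:
        return cached[0]   # cache hit: no database work at all
    html = render_from_database(url)
    page_cache[url] = (html, time.time())
    return html
```

That cache-hit path is why the MySQL connection count dropped: repeat visitors to the same page never touch the database within the cache lifetime.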
APC, eAccelerator, or another PHP opcode cache is now at the top of our list to get CPU utilization down. We’ve held off on trying this because we didn’t have a test server to use. Opcode caches for PHP are invasive bits of software that have to be custom-built but supposedly increase PHP’s performance significantly.
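What an opcode cache does for PHP is roughly what the snippet below does for a string of Python source: pay the parse-and-compile cost once, then reuse the compiled bytecode on every later request instead of recompiling the same file over and over. This is only an analogy to explain the concept, not how APC or eAccelerator is actually wired into PHP:

```python
bytecode_cache = {}          # script name -> compiled code object

SOURCE = "result = 2 + 2"    # stands in for a PHP script on disk

def run_script(name, source):
    """Compile once, then execute cached bytecode on later calls."""
    code = bytecode_cache.get(name)
    if code is None:
        # The expensive step an opcode cache lets you skip.
        code = compile(source, name, "exec")
        bytecode_cache[name] = code
    namespace = {}
    exec(code, namespace)
    return namespace["result"]
```

Since every page view on a WordPress site re-runs the same PHP files, skipping the compile step on all but the first request is where the claimed performance gain comes from.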
After that, who knows what we’ll try. For better or worse, we’re stuck with WordPress for various reasons, so it boils down to tuning the system itself to better handle the load. Now that we’ve got another box, we can finally try some of the things we’ve been wanting to try. To summarize: we’re trying to make pages load faster for everyone.