Tuesday, 31 July 2012

Be careful with memcached keys

Using invalid characters in a memcached key can cause disastrous problems on your server, but first, the details...
Memcached's text protocol uses the space character as its delimiter. For example, the following command:

set mykey 0 60 9\r\nsomevalue\r\n

will set the key "mykey" to the 9-byte value "somevalue", with flags 0 and an expiry of 60 seconds.

If you know that the protocol uses the space character as a delimiter, it is obvious that keys should never contain spaces, as a space will confuse the memcached parser. However, if you use the PHP PECL extension to access the memcached server, you don't necessarily know the protocol, as it is hidden from you, and nowhere in the PHP documentation is it mentioned that you should not use spaces in your keys.
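
To make this concrete, here is a minimal sketch of how a text-protocol set command is laid out and what a space in the key does to it. The helper names buildSetCommand and commandTokens are my own, made up for illustration, and are not part of any memcached client. The layout itself (key, flags, expiry, byte count, then the data on its own line) is the standard text protocol.

```php
<?php
// Hypothetical helper: builds a text-protocol "set" command as a string.
// Layout: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
function buildSetCommand(string $key, string $value, int $flags = 0, int $exptime = 60): string
{
    return sprintf("set %s %d %d %d\r\n%s\r\n", $key, $flags, $exptime, strlen($value), $value);
}

// Hypothetical helper: splits only the command line (the part before the
// first \r\n) into its space-delimited tokens, as a parser would see them.
function commandTokens(string $command): array
{
    $commandLine = strstr($command, "\r\n", true);
    return explode(' ', $commandLine);
}

echo count(commandTokens(buildSetCommand('mykey', 'somevalue'))), "\n";      // 5
echo count(commandTokens(buildSetCommand('spaced key', 'Some data'))), "\n"; // 6
```

With a clean key the command line has five tokens. With 'spaced key' it has six, so the fields no longer sit in the positions the server expects.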

 

So what is the big deal? Surely getting or setting data with an invalid key will simply fail?


Sadly, this is not the case. In fact, what will happen is that your PHP instance/script will halt completely and hold the Apache thread forever, causing Apache to start new threads/workers until your server runs out of resources and completely falls over.

 

How we discovered the problem.

We have an application that was never supposed to send keys containing spaces in the first place. Sadly, out of the millions of requests received every few hours, some keys did contain spaces and managed to slip past our checks (thanks mainly to a crappy regex check). I noticed that our server would simply start running out of resources while the number of Apache processes kept climbing and climbing.

Debugging the problem took two full days, as I didn't know what was causing the issue and at first couldn't replicate it on my test server. What I did know was that one of the many features we had added to our application used memcached for some additional caching we required, and in the back of my mind I had a feeling that memcached had something to do with the problem. I just couldn't replicate it, not knowing that spaces in keys caused the problem or that there were keys with spaces in them.

Through a process of elimination I eventually found the problem, and the example below shows how you can replicate it yourself. Don't run this on a live server, but you can easily test it on your development machine. This simple PHP script will hang the PHP instance/Apache thread, and is one of the very few things you can do to cause such a serious problem:
<?php
    $memcached = new Memcached();
    $memcached->addServer("localhost", 11211);
    echo "Setting key (and hanging instance)<br>";
    $memcached->set('spaced key', 'Some data', 60);
   
    echo "Your script will never reach this point and will never time out either<br>";
    echo $memcached->get('spaced key');
?>

To recover from the problem simply restart your Apache server:
/etc/init.d/apache2 restart

 

How you can avoid the problem.

Avoiding the problem is very simple. One method is to convert all keys to valid keys before getting or setting values. As an example, you can use the function below:
function memcachedKey($key) {
    return preg_replace("/[^A-Za-z0-9]/", "", $key);
}


You can then use the function like this:
$memcached->set(memcachedKey('spaced key'), 'Some data', 60);
$memcached->get(memcachedKey('spaced key'));

Another method (which I personally don't like) is to MD5 your keys before setting and getting them. The reason I don't like it is that an infinite number of possible key names map onto a finite number of hashes, so different keys may end up with the same hashed name. Use MD5 hashing only when you know that you won't be dealing with an unbounded number of different strings as keys.
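
For completeness, here is a rough sketch of what the MD5 approach could look like. The function name hashedMemcachedKey and the 'app:' prefix are assumptions of mine, not something from our application. The only facts relied on are that md5() returns 32 lowercase hexadecimal characters, which are always safe to use in a key.

```php
<?php
// Hypothetical helper: turns any string into a fixed-length, space-free key.
// The prefix is optional and only there to keep keys readable per application.
function hashedMemcachedKey(string $key, string $prefix = 'app:'): string
{
    return $prefix . md5($key);
}

// Unlike stripping characters, 'spaced key' and 'spacedkey' stay distinct here.
echo hashedMemcachedKey('spaced key'), "\n";
echo hashedMemcachedKey('spacedkey'), "\n";
```

The trade-off is that the stored keys are no longer human-readable, which makes debugging on the memcached side harder.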

 

Other things to check.

Make sure your key is never empty either as it will produce the same problem.
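
A minimal sketch of a check you could run before every get or set, assuming the limits described in the memcached protocol documentation (keys of at most 250 bytes, with no whitespace or control characters). The function name isValidMemcachedKey is hypothetical. Keep in mind that the memcachedKey() function shown earlier returns an empty string when the input contains no letters or digits at all, so the cleaned key is worth checking too.

```php
<?php
// Hypothetical validator: returns true only for keys that are safe to send.
function isValidMemcachedKey($key): bool
{
    // Empty keys (and non-strings) are rejected outright.
    if (!is_string($key) || $key === '') {
        return false;
    }
    // Assumed limit from the memcached protocol documentation: 250 bytes.
    if (strlen($key) > 250) {
        return false;
    }
    // Reject spaces and control characters (bytes 0x00-0x20 and 0x7F).
    return preg_match('/[\x00-\x20\x7f]/', $key) === 0;
}

var_dump(isValidMemcachedKey('mykey'));      // bool(true)
var_dump(isValidMemcachedKey('spaced key')); // bool(false)
var_dump(isValidMemcachedKey(''));           // bool(false)
```

If the check fails you can log the offending key and skip the cache, rather than sending the request to memcached at all.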

 

Conclusion

I sincerely hope that others who run into the same problem don't have to spend so much time trying to fix it, and that they come upon this blog post before pulling their hair out. Google, now go forth and do your job. :)

Wednesday, 28 March 2012

PHP arrays and Copy-on-write

What is Copy-on-write and why is it important to understand?

To start with this topic, let's see what happens when we assign one array to another and then change the first element of the first array:

$a = array("apples", "oranges", "peaches");
$b = $a;
$a[0] = "grapes";
print_r($a);
print_r($b);
 

Results:
Array ( [0] => grapes [1] => oranges [2] => peaches )
Array ( [0] => apples [1] => oranges [2] => peaches )
As with most simple variables, arrays are passed and assigned by value. In other words, when we did $b = $a we seemingly created a duplicate of the original array. We know this because we changed the first entry of the first array and the change wasn't reflected in the second array.

If you are used to other dynamic languages, however, you might expect that assigning an array would perform an assign-by-reference so that the array is not duplicated in memory. You would then assume that changing one array would also change the other, since they are basically the same array. Duplicating arrays as PHP seems to be doing here would cause concern for anyone who cares about memory usage and who uses a lot of arrays to pass data around. You would also rightfully be concerned about the speed impact of having PHP duplicate arrays all the time.

But this... this is madness!?

Luckily for us, there is method behind all this madness. PHP uses a technique called copy-on-write. What this means is that after the assignment both variables actually share the same array data, and a copy of the array is only made if one of the arrays is changed later on. When we did $b = $a there was still only one copy of the array in memory, up to the point when we changed one of them.
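
You can observe this yourself with memory_get_usage(). The sketch below is only an illustration: the function name measureCopyOnWrite and the array size are my own choices, and the exact byte counts depend on your PHP version and build, so treat the numbers as indicative rather than exact.

```php
<?php
// Measures memory around an assignment and around the first write afterwards.
function measureCopyOnWrite(int $size = 100000): array
{
    $a = range(1, $size);

    $before = memory_get_usage();
    $b = $a;                          // assignment: both variables share one array
    $afterAssign = memory_get_usage();
    $b[0] = -1;                       // first write: PHP now makes the real copy
    $afterWrite = memory_get_usage();

    return array(
        'assign_cost' => $afterAssign - $before,     // expected to be tiny
        'write_cost'  => $afterWrite - $afterAssign, // roughly one array's worth
        'a0'          => $a[0],                      // original is untouched
        'b0'          => $b[0],
    );
}

print_r(measureCopyOnWrite());
```

The assignment itself should cost next to nothing, while the first write accounts for the memory of a second array.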

There is a question though... If you really don't want PHP to duplicate arrays, ever, should you always explicitly pass/assign arrays by reference, e.g.:
$b =& $a;


Well, yes, you could if you really wanted to, but be careful: passing an array to a function by reference, e.g. function test(&$parameter) {}, can actually be slower than just passing the array the usual way, e.g. function my_function($parameter) {}, since PHP needs to do extra work behind the scenes.
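
Speed aside, the two styles also behave differently, which is another reason not to reach for references by default. The short sketch below does not measure performance. It only shows the difference in behaviour, and the function names appendByValue and appendByReference are made up for the example.

```php
<?php
// By value: the function works on its own (copy-on-write) copy.
function appendByValue(array $items): int
{
    $items[] = 'grapes';
    return count($items);
}

// By reference: the function modifies the caller's array directly.
function appendByReference(array &$items): int
{
    $items[] = 'grapes';
    return count($items);
}

$fruit = array('apples', 'oranges', 'peaches');

appendByValue($fruit);
echo count($fruit), "\n";   // 3, the caller's array is unchanged

appendByReference($fruit);
echo count($fruit), "\n";   // 4, the caller's array was modified
```

A function that only reads its array parameter never triggers a copy, so passing by value costs nothing extra in that case.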

Some links regarding the topic:
Research paper on Copy-On-Write in PHP (PDF)
http://php.net/manual/en/functions.arguments.php
http://www.php.net/manual/en/features.gc.refcounting-basics.php
http://php.net/manual/en/internals2.variables.intro.php

Friday, 3 February 2012

Daily Link Round-up

In this two-part blog post, John McCutchan writes about implementing WebSockets in game servers as a means of providing remote, web-based administration of those servers.

As the title says, Eliot talks about trying out Twisted Matrix's WebSocket functionality. Twisted Matrix itself is a great Python networking framework. As a big fan myself, I do suggest you give them a visit. While I am on WebSockets, also check out Autobahn Websockets RPC/PubSub.

From the website: "When debugging a web page, the last thing one needs is to have the browser crash under the memory-hogging ability of a plug-in. All web developers have been there with Firebug and its propensity to make a web page either incredibly slow or take the browser down with it.
Firebug is still my web debugger of choice, but Firefox has taken steps towards closing the gap with its new Firefox 10 release. The video below shows off the new features:"

"Dive Into HTML5 seeks to elaborate on a hand-picked Selection of features from the HTML5 specification and other fine Standards. The final manuscript has been published on paper by O’Reilly, under the Google Press imprint. Buy the printed Work — artfully titled “HTML5: Up & Running” — and be the first in your Community to receive it. Your kind and sincere Feedback is always welcome. The Work shall remain online under the CC-BY-3.0 License."