Friday, 3 February 2012

Daily Link Round-up

In this two-part blog post, John McCutchan writes about integrating WebSockets into game servers as a means of providing remote, web-based administration.

As the title says, Eliot talks about trying out Twisted Matrix's WebSocket functionality. Twisted Matrix itself is a great Python networking framework, and as a big fan myself I do suggest you give them a visit. While I am on the subject of WebSockets, also check out Autobahn WebSockets RPC/PubSub.

From the website: "When debugging a web page, the last thing one needs is to have the browser crash under the memory-hogging ability of a plug-in. All web developers have been there with Firebug and its propensity to make a web page either incredibly slow or take the browser down with it.
Firebug is still my web debugger of choice, but Firefox has taken steps towards closing the gap with its new Firefox 10 release. The video below shows off the new features:"

"Dive Into HTML5 seeks to elaborate on a hand-picked Selection of features from the HTML5 specification and other fine Standards. The final manuscript has been published on paper by O’Reilly, under the Google Press imprint. Buy the printed Work — artfully titled “HTML5: Up & Running” — and be the first in your Community to receive it. Your kind and sincere Feedback is always welcome. The Work shall remain online under the CC-BY-3.0 License."

MySQL's (crappy) Spatial Extensions



Introduction
In the world of mapping it is often necessary to determine whether a certain point on a map falls within pre-defined zones or areas, or the opposite: to determine which points fall within a particular zone or area. These zones are defined by shapes such as polygons, rectangles and circles/ovals, and are often referred to as geospatial data or objects. Luckily for database designers, many of the popular database engines now support the storage and indexing of these geospatial objects for quick searching.

Since geospatial technology is a rather vast and relatively complex subject in its own right, I'm not going to discuss it in great depth here. I do need to mention, though, that most database engines support the OGC's OpenGIS specification for spatial data. You can read more about it here.
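To give a rough feel for what these geospatial objects look like, the spec defines a plain-text notation called WKT (well-known text), and MySQL's GeomFromText() and AsText() functions convert between WKT and MySQL's internal binary format. A trivial, purely illustrative round-trip from the command line (the coordinates are made up):

$ mysql -e "SELECT AsText(GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')) AS zone"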

MySQL's Spatial Implementation and its Catches
Let me first say that although MySQL is an immensely popular database engine, it does have its drawbacks in some areas. Unfortunately, its geospatial implementation is one of them. Let's have a quick overview of the issues:

Problem 1: Even in the latest and greatest MySQL v5.6, InnoDB has no support for spatial indexing. Although it supports the storage of spatial data and some functionality around it, any lookup is going to be terribly slow. This means that if you are an InnoDB user you'll have to revert to MyISAM for fast, spatial-index-based lookups.
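To make the MyISAM requirement concrete, here is a minimal sketch (the database name, table name, column names and coordinates are all made up for illustration) of the kind of table and lookup a spatial index is meant to speed up. Note that a SPATIAL INDEX requires the geometry column to be declared NOT NULL:

$ mysql yourdb <<'SQL'
CREATE TABLE zones (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  area POLYGON NOT NULL,
  SPATIAL INDEX (area)
) ENGINE=MyISAM;

-- Find every zone whose bounding rectangle contains the given point.
SELECT id, name
FROM zones
WHERE MBRContains(area, GeomFromText('POINT(28.05 -26.20)'));
SQL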

Problem 2: Geospatial data cannot be dumped or exported properly. That's right: MySQL stores geospatial data in a binary format that is completely ignored by mysqldump's --hex-blob and --compatible=target parameters. What you get instead is a whole lot of binary garbage in your dumped text file that you will be unable to use for imports in the future. What this comes down to is that you have to skip the table when you do a mysqldump (e.g. --ignore-table=yourdb.yourgeotable) and write your own tool to parse and dump the geospatial data. This issue is absolutely terrible in my book.
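A rough sketch of that workaround (the database, table and column names are hypothetical): exclude the spatial table from the normal dump, then export the geometry column yourself as WKT with AsText(), which can be re-imported later via GeomFromText():

# Dump everything except the spatial table
$ mysqldump --ignore-table=yourdb.yourgeotable yourdb > yourdb.sql
# Export the geometry column as tab-separated WKT text
$ mysql -e "SELECT id, AsText(geom) FROM yourgeotable" yourdb > yourgeotable.tsv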

Problem 3: Although it has since been fixed, MySQL 5.0.16 had a bug where trying to use geospatial functions on InnoDB tables would literally crash the server instead of just generating SQL errors. If you are still dealing with an older, unpatched MySQL 5.0.16, be aware.

Problem 4: Since you have to use MyISAM for your geospatial indexes, you have no support for transactions.

Problem 5: MySQL implements only a limited subset of the OGC (OpenGIS) functions, which can make it really hard or impossible to use MySQL for complex geospatial work. For example, relation functions such as Contains() and Intersects() have historically been implemented using minimum bounding rectangles (MBRs) rather than the exact shapes.

Conclusion
Using MySQL's spatial implementation is OK if you are willing to use MyISAM for your table structure and if you only need fairly basic indexing of geospatial features. It is not absolutely worthless, but I really hope that Oracle will allow MySQL to catch up to other database engines in this regard in the near future.

Thursday, 2 February 2012


Daily Link Round-up
The Evil Unit Test (Blog Post)
In this blog post, Alberto Gutierrez complains about the fact that programmers tend to apply unit tests religiously, without regard to their usefulness and practicality in every situation.




In this nearly two-hour presentation, Randal Schwartz gives an "Introduction to Git", presented on January 5th, 2012 at the monthly UUASC-LA meeting.



From the SourceForge page: "Alenka is a modern analytical database engine written to take advantage of vector based processing and high bandwidth of modern GPUs:
  • Vector-based processing: CUDA programming model allows a single operation to be applied to an entire set of data at once. 
  • Self optimizing compression: Ultra fast compression and decompression performed directly inside GPU.
  • Column-based storage: Minimize disk I/O by only accessing the relevant data.
  • Fast database loads: Data load times measured in minutes, not in hours.
  • Open source and free."
So far this is still in very early development and lacks many serious features when compared to current database solutions, but it is definitely showing potential in terms of performance and, who knows, one day it might just lead to something big.

In the above linked .pdf file from 2003, an email discussion goes on about certain issues on the microsoft.com website. Things get interesting when Bill Gates starts ranting (from page 3) about the trouble users have to go through to download software from microsoft.com. Things have certainly changed a lot for Microsoft since then, but I have no doubt things got pushed along significantly when the big boss (who is a geek himself, after all) voiced his utter dissatisfaction in cases such as these, together with some competition from other tech companies. This is the very reason a large company such as Microsoft needs clear direction from the top boss. Of course, the opposite is often true as well: a great deal of businesses have been pushed over cliff edges due to poor management decisions and a lack of drive.

A quick look at ApacheBench

Introduction
Benchmarking tools usually perform stress tests on a piece of software to see how it behaves under pressure or heavy load. You can then optimize your source code or server configuration based on the results of those tests. There are a number of HTTP server benchmarking tools available, including ApacheBench, Apache JMeter, curl-loader, OpenSTA, HttTest and httperf. Today I will be posting about ApacheBench, known simply as ab.

ApacheBench is a rather simple and basic tool, but that also makes it quite easy to use. I installed it on my Ubuntu setup using apt-get:
$ sudo apt-get install apache2-utils

Let's take a look at a basic test and its result:
$ ab -n 100 http://localhost/mysite/index.php

The above test simply makes the same request to the specified URL 100 times. How long the test takes obviously depends on the number of requests you specified, the rendering speed and output size of your site, and the speed of your server/PC and connection. The site I tested is a PHP-based site hosted on my own PC, and the results showed the following:

Server Software:        Apache/2.2.14
Server Hostname:        localhost
Server Port:            80

Document Path:          /mysite/index.php
Document Length:        6937 bytes

Concurrency Level:      1
Time taken for tests:   6.492 seconds
Complete requests:      100
Failed requests:        98
   (Connect: 0, Receive: 0, Length: 98, Exceptions: 0)
Write errors:           0
Total transferred:      737279 bytes
HTML transferred:       693479 bytes
Requests per second:    15.40 [#/sec] (mean)
Time per request:       64.915 [ms] (mean)
Time per request:       64.915 [ms] (mean, across all concurrent requests)
Transfer rate:          110.91 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    10   65 318.0     20    2895
Waiting:       10   65 318.0     20    2895
Total:         10   65 318.0     20    2895

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     22
  75%     24
  80%     27
  90%     41
  95%     61
  98%   1407
  99%   2895
 100%   2895 (longest request)

The most useful results here are the "Time taken for tests", "Requests per second" and "Time per request" readings. I can, for example, see that the complete run took 6.492 seconds and that, on average, each request took 64.915 ms (milliseconds).

However, notice that the result also shows that 98 of the 100 tests were failed requests. Does this really mean that almost all our requests failed? Luckily for us it doesn't; the Failed Requests result is a little misleading at first. If you look closely, it actually tells us that all 98 failures were Length errors: (Connect: 0, Receive: 0, Length: 98, Exceptions: 0)

All this means is that, compared to the initial HTTP response, subsequent responses contained differently sized HTML documents. Since the site I was testing generates dynamic content this is bound to happen, so there is no reason to worry about this reading; the actual test results are still valid.
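If you want to convince yourself that this is what is happening, one crude check (assuming you have curl installed) is to fetch the page twice and compare the byte counts; if the two numbers differ, the Length "failures" are simply your dynamic content changing between requests:

$ curl -s http://localhost/mysite/index.php | wc -c
$ curl -s http://localhost/mysite/index.php | wc -c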

Concurrent Testing
Unless you only ever have one user visiting your site (kind of like my blog), it is not very meaningful to test your site this way. We need to add some concurrency to simulate multiple users accessing the site at the same time. To do this we use the -c flag and specify the number of concurrent connections, for example:
$ ab -c 10 -n 100 http://localhost/mysite/index.php

This still means we will only be performing 100 test requests, but we will be making 10 requests at a time instead of 1.

Using KeepAlive
There are some catches you need to look out for when using ab; one of them is that the KeepAlive option is turned off by default. This means that every request sent to the server is made over a new connection, which is terribly slow and affects your test results. If your own site is configured to handle multiple requests over the same connection using KeepAlive, then it makes sense to turn on KeepAlive for your benchmarking tests as well. This can be done using the -k flag. Example:
$ ab -kc 10 -n 100 http://localhost/mysite/index.php
or
$ ab -k -c 10 -n 100 http://localhost/mysite/index.php

Other useful options
Let's take a look at a few more of them:
  • -A auth-username:password - Supply BASIC Authentication credentials to the server. The username and password are separated by a single : and sent on the wire base64 encoded. The string is sent regardless of whether the server needs it (i.e., has sent a 401 authentication needed).
  • -e csv-file - Write a Comma separated value (CSV) file which contains for each percentage (from 1% to 100%) the time (in milliseconds) it took to serve that percentage of the requests.
  • -p POST-file - File containing data to POST.
  • -t timelimit - Maximum number of seconds to spend for benchmarking. This implies a -n 50000 internally. Use this to benchmark the server within a fixed total amount of time. By default there is no timelimit.
  • -w - Print out results in HTML tables. Default table is two columns wide, with a white background. 
Feel free to look at the official ab man pages (manual) here.
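To tie a few of these together, here are two sketch commands (the file names and URLs are made up). The first writes the percentile breakdown to a CSV file while keeping connections alive; the second POSTs the contents of a file, where ab's -T flag (not listed above) sets the Content-Type header that normally accompanies -p:

$ ab -n 500 -c 20 -k -e results.csv http://localhost/mysite/index.php
$ ab -n 200 -c 10 -p login.txt -T 'application/x-www-form-urlencoded' http://localhost/mysite/login.php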

Happy Benchmarking!