Posts Tagged ‘mongodb’

The long dark tea-time of the node

Thursday, November 4th, 2010

Looking back I would’ve never guessed this project would have had so many twists and turns. I set out to build a game that was about synchronizing with computer generated rhythms. Ie. Make the box move at the same rhythm as a sine wave. I really liked the idea of something so simple requiring intense concentration. It’s immersive but really really simple. Anyways, I set out to craft this thing with some jQuery, plain js, and recursive timeouts. At some point I thought it might be fun to track the score and even make an easy way to share your score on Twitter or Facebook. I figured I’d make some type of nodejs app to handle the sharing and the rest could be static html pages.

What followed was a journey through two different types of oauth implementations, several nodejs modules and many greps through codebases to figure out how and why this worked or didn’t. Since much of this will be useful to some other poor sap building with the same tools, I’ll outline here what I did and how it all works together.

Javascript, front to back

Animation
Building the game itself took some time but was pretty straight-forward. jQuery is extremely useful for binding events and such, but I tried to use bare js inside the loops for efficiency. Since I’m not using canvas here, just manipulating positions of dom objects, it’s fairly processor intensive. With that said Chrome and Safari handle it with ease. Firefox will make your processor fan spin up.

The core of the animation stuff is variations on this simple recursive timeout:

block = $('#block');
move_timeout = null;        
(function send_it_down() {
  move_timeout = setTimeout(function() {
    block.css('top',block.css('top')+1);
    send_it_down();
  },30);
})();
// to stop animation
// clearTimeout(move_timeout);

jQuery has some nice animation functionality but it’s all based on a fixed time of execution in which each step’s timeout is based on the amount of steps to execute and the total time for the animation to run. Rolling my own animation code allowed me to control an infinitely executing constant speed animation.

The jQuery api docs and w3schools js ref got me through anything tricky and Wolframs Mathworld was the key to the color game (intially it was a spirograph pattern, then a simpler nephroid, finally a circle).

OAuth
First thing’s first. What is OAuth and how do I use it? In order to make sharing your score easy I had to write an app that interfaced with both Twitter and Facebook’s apis. I heard Twitter was simpler and their docs were better so I started there. A link in the twitter docs to this Oauth Guide made for an easy introduction. Twitter is currently using an OAuth 1.0 implementation that is pretty true to what you would expect after reading the guide. The only difference was that Twitter seems to store your callback url for you on your request token call so you don’t need to supply it on your auth token call. Facebook has an OAuth 2.0 implementation that requires ssl and has fewer steps. The only catch on Facebook is they’re sticklers for your callback url. It must be present in both your authorize call as well as your auth token call and it must match exactly.

Now that I know what OAuth is and have a basic understanding of how to interface with Twitter and Facebook’s apis, I searched out an OAuth module for nodejs.

Nodejs
I found some buzz around Ciaranj’s node-oauth and found it to a pretty good wrapper for the functionality I would need. It’s lacking a way to make POST requrests in the OAuth2 code, but writing my own wasn’t too tough. I had my eye on Expressjs as a framework for building http handlers for this app. It’s a sweet little implementation with a very simple premise and good docs. For my datastore I decided to take another crack at MongoDB. I’ve built PHP apps that interfaced with Mongo but nothing with nodejs yet.

So far Nodejs has been a great tool for ad hoc api clients and prototypes. This is the first server app I’ve written with nodejs. A few things I noticed that were unexpected. One, nesting callbacks can become tedious. This was especially true with the MongoDB module. I don’t see any way to get around it, but it’s ugly. The other thing was that error handling with callbacks is much different than exception handling within a single thread of execution. Wrap anything with a callback in a try/catch and you’ll see it’s useless. So becoming expectant on an onError() method for all of the connections is the norm.

Other challenges with nodejs included: since my nodejs app is listening on a different port than my html pages are being served from, I had to do some extra redirection at the end of my OAuth process so that my final landing page of my popup would be able to call back to the main window and trigger a success event. And any exception that I didn’t catch takes down my app completely.

With that aside I was able to craft a sweet little app in less than a couple hundred lines that is fast and easy. I can manage the sharing process with a couple ajax calls and a bind to my custom success event.

nDistro
Installing nodejs modules can be done a few different ways. Initially I was using npm. It worked ok but since it installs to a central npm repository, and nodejs and it’s modules version so quickly, it made more sense to use nDistro. It allows you to manage your nodejs versions as well as your module versions from a simple config file. This is super useful since I can literally break my app with the wrong version.

I found one oddity using nDistro to install MongoDB’s node module. If you compile the C version of the BSON parser and intend to use it with your code, it will not be available. The reason for this is that nDistro symlinks each module to a common lib/node directory. There is a relative path that is broken in one of mongodb’s require statements for including the compiled BSON parser. I was able to fix this by setting up a symlink in the lib directory. Another issue with the MongoDB module is that in order to compile the native BSON parser you need to have a full nodejs install with the node-waf binary in an available path. The node binary installed by nDistro is not sufficient. After all of this I had errors compiling BSON on my vps that were not on my mac.. The js parser will work fine given the size of my records.

MongoDB
MongoDB showed up on my radar over a year ago and since then it’s gotten easier to use and more reliable. The 10gen developers have put a lot of effort into building good libraries in many languages to interface with MongoDB. With the postmortem after FourSquare’s recent outage, I got a little window into the scale that mongo seems to be handling well. FTA the outage was due to overactivity in one of the shards that was running too close to memory capacity. Given that FourSquare is running multiple 66GB RAM EC2 mongodbs and they just hit their 200 millionth checkin I am confident that my 2 dozen friend’s tweet/post counts will be handled with grace and speed. :)

Installing MongoDB is pretty much a matter of unzipping a tgz and running mongod from the command line.

Putting it all together

This post has been short on code snippets cus I’m hosting the entire codebase at the Github. This is my first run at a nodejs app with mongo running live and I am curious to see how stable everything is. My new linode server is purring like a kitten with anything I throw at it. Hopefully the tunr app continues the trend.

Oh and you can play tunr here.

MongoDB and Symfony. Yes? :) Part 2: Simple Queries

Sunday, August 23rd, 2009

Ok welcome to the second part of this matchup between MongoDB and MySQL 5.4 via Doctrine ORM on Symfony. In our previous test MongoDB blew away MySQL’s MyISAM and InnoDB engines on raw insert speed. We were unable to use Doctrine for the inserts test as 2.5 million records quickly runs out of memory in Doctrine. So we opted for straight sql through PHP 5.3 commands with the MySQL native drivers. It wasn’t even close.

So now let’s see how MongoDB holds up against MySQL in the simple query arena.

Simple Queries

For our purposes here we’ll only use the fastest setup and we’ll vary the query parameters for different results. When I set up this test system I actually used a yaml schema so that we could test Doctrine. I had to abandon that in the insert tests, but here it should work fine.

Total Record Count
Mongo:

echo $mongo->selectDb('objects')->selectCollection('users')->count();

The response time was 25ms (only 1ms over Symfony’s overhead).

MyISAM with Doctrine ORM:

echo Doctrine_Query::create()->from('User u')->count();

The response time was 30ms (only 5ms over Symfony’s overhead).

InnoDB with Doctrine ORM:

echo Doctrine_Query::create()->from('User u')->count();

The response time was 916ms..

Unindexed query

Mongo:

$query = array(
  'pet' => 'sugar glider',
);
echo $users->find($query)->count();

This returns a count of 500k and after several tests the average response time was 950ms.

MyISAM with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.pet = "sugar glider"');
echo $users->count();

This returns a count of 500k and after several tests the average response time was 870ms.

InnoDB with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.pet = "sugar glider"');
echo $users->count();

This returns a count of 500k and after several tests the average response time was 1300ms.

Indexed query
Mongo:

$query = array(
  'number' => 1005,
);
echo $users->find($query)->count();

This returns a count of 250 in 25ms.

MyISAM with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber = 1005');
echo $users->count();

This returns a count of 250 in 30ms.

InnoDB with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber = 1005');
echo $users->count();

This returns a count of 250 in 30ms.

Indexed range query
Mongo:

$query = array(
  'number' => array('$gt'=>1000,'$lt'=>1005),
);
echo $users->find($query)->count();

This returns a count of 1,000 in 26ms.

MyISAM with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber between 1001 and 1004');
echo $users->count();

This returns a count of 1,000 in 32ms.

InnoDB with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber between 1001 and 1004');
echo $users->count();

This returns a count of 1,000 in 31ms.

Indexed range query (large set)
Mongo:

$query = array(
  'number' => array('$gt'=>1000,'$lt'=>9005),
);
echo $users->find($query)->count();

This returns a count of 2,001,000 in 2.5 seconds.

MyISAM with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber between 1001 and 9004');
echo $users->count();

This returns a count of 2,001,000 in 1.5 seconds.

InnoDB with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber between 1001 and 9004');
echo $users->count();

This returns a count of 2,001,000 in 760ms.

Indexed range query, ordered, limit 10, with output
Mongo:

$query = array(
  'number' => array('$gt'=>1000,'$lt'=>9005),
);
$sort = array(
  'number' => -1,
);
foreach ($users->find($query)->sort($sort)->limit(10) as $user)
{
  echo $user['_id'],':',$user['number'],'
';
}

This returns in 26ms.

MyISAM with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber between 1001 and 9004')
   ->orderBy('u.onumber desc')
   ->limit(10);
foreach ($users->execute() as $user)
{
  echo $user->getId(),':',$user->getOnumber(),'
';
}

This returns in 35ms.

InnoDB with Doctrine ORM:

$users = Doctrine_Query::create()
   ->from('User u')
   ->where('u.onumber between 1001 and 9004')
   ->orderBy('u.onumber desc')
   ->limit(10);
foreach ($users->execute() as $user)
{
  echo $user->getId(),':',$user->getOnumber(),'
';
}

This returns in 34ms.

Query MongoDB MyISAM InnoDB
Total Record Count 25ms 30ms 916ms
Unindexed query 950ms 870ms 1300ms
Indexed query 25ms 30ms 30ms
Indexed range query 26ms 32ms 31ms
Indexed range query (large set) 2.5s 1.5s 760ms
Indexed range query, ordered, limit 10, with output 26ms 35ms 34ms

Well we’ve got a lot of interesting numbers here. Seems as though MongoDB has a little trouble with large set range searches, and unindexed queries, but kicks butt in the other tests. And InnoDB seems to handle the large ranged queries the best with a modest showing in the rest, except for total record count.. Wth happened there? And MyISAM seems to be a pretty speedy running mate for everything, with an edge on the unindexed queries.

Real world value of these tests? I would have liked to run even more extensive queries here, but wanted to make sure I saved time for the next section which covers relations. But a decent real world example would be that last test. And MongoDB seems to have almost a 10ms edge on MySQL when it comes to this.

I realize that these results are not accounting for the overhead of using Doctrine. And I’m ok with that because I’m interested in real development environment stuff, not so much what is the bare bones potential. And so far MongoDB is doing well, although, the features of Doctrine would largely be missing from a MongoDB setup until we get a good framework that can sit on top of a MongoDB (which might even out the scores in many of these queries).

In the next section we’ll try to see how we can use MongoDB to persist relations between objects using their MongoDBRef class and a little forethought.. :)

System Setup
Some VPS setup with 1and1.com with a gig of dedicated ram
Ubuntu 8.04.1 64bit Hardy
Symfony 1.2.9 dev updated repository
MongoDB Linux 64 0.9.9 (mongodb-linux-x86_64-0.9.9.tgz)
Doctrine 1.0.10 (bundled with Symfony)
Apache 2.2.8
PHP 5.3.0 with mysqlnd
MySQL 5.4 with the my.cnf for medium systems
PECL MongoDB package 0.9.4

MongoDB and Symfony. Yes? :) Part 1: Inserts

Sunday, August 23rd, 2009

MongoDB

What is it? That was my first thought. A quick review of their site gave me a taste of something that sounded good. I’ve been working with Symfony and Doctrine for almost a year now and really loving it. Started writing a custom orm for my current employer and found that it really brought to light some issues with relational modeling. For instance, free-form relations between objects. In a RDBMS you’ve got to setup your long table with key/val pairs that reference other tables and ids and then there’s the whole issue of efficiently joining on that. I found that trying to translate that structure into an orm style development tool was challenging to say the least. I wondered if a DODB would offer a better way. Ideally the speed of development AND the speed of the data manipulation would increase. MongoDB has a nice little DBRef class that allows you to reference your way around your collections, effectively hardcoding your joins at the data level. Would it be a nice way to handle creating relationships between objects without premeditation?

Setup

Ok so I wanted to set up some simple benchmarks just to see what we were looking at as far as ease of use and speed for basic dumb things. I setup a basic Symfony/Doctrine project and pulled down the latest version of MongoDB for Linux 64.

I started my mongod server without the web interface since I won’t need it for this:

mongod --dbpath=/var/www/sites/mongobjects/web/data/mongodb/ --nohttpinterface --quiet &
Sun Aug 23 04:09:56 Mongo DB : starting : pid = 11534 port = 27017 dbpath = /var/www/sites/mongobjects/web/data/mongodb/ master = 0 slave = 0  64-bit
Sun Aug 23 04:09:56 db version v0.9.9, pdfile version 4.4
Sun Aug 23 04:09:56 git version: cd0bf23e74ef8ded0684f74996ee762889b93294
Sun Aug 23 04:09:56 sys info: Linux ofc-n1.10gen.com 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008 x86_64
Sun Aug 23 04:09:56 waiting for connections on port 27017

And we’re up and running.

Filling our Databases

These tests will be run via a page load in Symfony. So all the code takes place within an actions class method that is triggered by a url and tracked via symfony’s developer bar. A simple benchmarking tool that produces a little fluctuation in the numbers, but fine for what we’re doing here. It’s worth noting that on this system Symfony takes ~24ms to render a page without anything else happening. So all our tests will have an extra ~24ms from Symfony execution time.

Mongo

I wanted a simple single table test setup so I decided to build a user set of 500,000 users. But then I realized toward the end of this article that I my 5 types of pets actually made all my tests insert 2.5 million records. So that’s what we’re using. Loading data with MongoDB was simple and quick. Just select the db and collection you want and if it’s not already in existence it will be when you start writing to it.

$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$animals = array('cat','dog','sugar glider','pigeon','dove');

for ($j=0; $j < 50; $j++)
{
  for ($i=0; $i < 10000; $i++)
  {
    foreach ($animals as $animal)
    {
      $users->save(array(
        'number' => $i,
        'pet' => $animal,
        'name' => 'Jason Mooberry',
        'email' => 'jason@barkingsnake.com',
      ));
    }
  }
}

Mongo ran through this like it was bored. Occasionally it had to create a new datafile but this was only apparent in the log.

Sun Aug 23 04:23:22 newExtent: objects.users file 0 full, adding a new file
Sun Aug 23 04:23:22 allocating new datafile /var/www/sites/mongobjects/web/data/mongodb/objects.1, filling with zeroes...
Sun Aug 23 04:23:22 done allocating datafile /var/www/sites/mongobjects/web/data/mongodb/objects.1, size: 134217728, took 0.243 secs

MongoDB uses preallocated data files to store it BSON data. Data files are created in increasingly larger sizes to handle fragmentation more efficiently.

The total insert time on 2.5 million records was 55 seconds. Let’s see how MySQL holds up.

MySQL

OK let’s see what happens when we create a simple table in MySQL and load the same 2.5 million user set. I created a simple table in Navicat:

CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`onumber` int(11) DEFAULT NULL,
`pet` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I chose InnoDB since MySQL 5.4 is boasting some big improvements in the InnoDB engine. I realize much of that is about large scale systems with complex joins, but let’s just see how it does. :)

Running an insert set of this size is an impossibility with this version of Doctrine. It quickly runs out of memory. I think this might be solved by a better garbage collection method in Doctrine 2.0 or perhaps even 1.1, but not in 1.0. So we do it the old fashioned way:

$db = mysql_connect('localhost', 'mongobjects_user', 'm0ng0');
mysql_select_db('mongobjects',$db);
$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j=0; $j < 50; $j++)
{
  for ($i=0; $i < 10000; $i++)
  {
    foreach ($animals as $animal)
    {
      $qs = "insert into `user` (`onumber`,`pet`,`name`,`email`) values ($i,'$animal','Jason Mooberry','jason@barkingsnake.com')";
      mysql_query($qs);
    }
  }
}

OK so I hit the url and wait… tick tock.. tick tock.. TICK TOCKKKKK… After several minutes I peek at the insert log and see that MySQL is still diligently inserting rows. So I continue to wait and wonder what my max execution time is set at..

After having a conversation with my girlfriend about stuff, MySQL finally crosses the finish line in 18.23 minutes… I realize I have the query log on and that is a bit of a hog, so let’s rerun it with the log turned off. bin-log appears to be on as well and since we’re not replicating here, I’ll turn that off too.

I restart MySQL, truncate the table, and try again.

While I wait I’d like to pontificate on the merits of a fast file system. ReiserFS is the default system on my vps but my recent experiences with XFS have led me to believe it to be a better option… [10 mins go by] … and so that is why I think Hans Reiser is a jerk! OH um I think we’ve finished.

MySQL without the logs comes in at 9.96 minutes for inserting 2.5 million records. That’s a big difference. Maybe it’s worth checking the speed with MyISAM. I change the table to MyISAM, truncate and repeat.

Ok MyISAM puts MySQL back in the running with an insert of 2.5 million records in 2.58 minutes. I guess it’s probably a better comparison with MyISAM since MongoDB isn’t transactional and we’re not dealing with foreign key constraints. Still I was hoping for a much stronger showing from InnoDB..

EDIT: After rummaging through this article today about auto-commit transaction handling in InnoDB and PHP, I decided to give InnoDB another chance to prove it’s worth here. So I reran the insert tests inside a single transaction:

$animals = array('cat','dog','sugar glider','pigeon','dove');
$db = mysql_connect('localhost', 'mongobjects_user', 'm0ng0');
mysql_select_db('mongobjects',$db);
mysql_query('START TRANSACTION',$db);
for ($j=0; $j < 50; $j++)
{
  for ($i=0; $i < 10000; $i++)
  {
    foreach ($animals as $animal)
    {
      $qs = "insert into `user` (`onumber`,`pet`,`name`,`email`) values ($i,'$animal','Jason Mooberry','jason@barkingsnake.com')";
      mysql_query($qs,$db);
    }
  }
}
mysql_query('COMMIT',$db);

And InnoDB is back in this thing… sort of. 2.5 million rows inserted inside a single transaction: 2.45 minutes. I also reran this with the Index specified below and it ran in 3.28 minutes. Our use case may not actually involve this type of scenario so it’s of mixed value.

Indexes

Our insert tests were run without an index to slow it down. The inserted records were done in 50 batches of 10,000 batches of 5 records with a number field storing the record number within it’s batch. The end result is that the number field will match in 250 records for each number. So a nice ranged query on that column will be a good way to test our search speed. So lets add an index to our number column and rerun our inserts.

Mongo

Adding an index to a collection in MongoDB consists of a php command:

$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users')->ensureIndex(array('number'=>1));

Seeting the key ‘number’ to 1 creates an ascending index.

I’ve gone ahead and rm’d all the mongo db files so that we get an accurate test.

$mongo = new Mongo;
$users = $mongo->selectDb('objects')->selectCollection('users');
$users->ensureIndex(array('number'=>1));
$animals = array('cat','dog','sugar glider','pigeon','dove');
for ($j=0; $j < 50; $j++)
{
  for ($i=0; $i < 10000; $i++)
  {
    foreach ($animals as $animal)
    {
      $users->save(array(
        'number' => $i,
        'pet' => $animal,
        'name' => 'Jason Mooberry',
        'email' => 'jason@barkingsnake.com',
      ));
    }
  }
}

With the index on the number column Mongo inserts 2.5 million records in 1.32 minutes. Not bad I guess. Let’s see how MySQL does.

MySQL (MyISAM)

I’m sticking with MyISAM for the rest of these tests since I don’t care to wait all day for my tests to run and the feature sets match a little better between MongoDB and MyISAM. So I’m just gonna truncate the table and add an index and rerun.

CREATE TABLE `user` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
 `onumber` int(11) DEFAULT NULL,
 `pet` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
 `email` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `user_number_idx` (`onumber`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

PHP did not change for this test.

MySQL handles the insert with roughly the same increase at 3.02 minutes.

Insert Test Breakdown

Here’s the breakdown for our insert tests with the various flavors of MySQL and MongoDB. It’s a pretty impressive difference on massive inserts.

Type Time
MongoDb without index 55 seconds
MongoDB with single index 1.32 minutes
MySQL InnoDB with logging 18.23 minutes
MySQL InnoDB 9.96 minutes
MySQL InnoDB single transaction 2.45 minutes
MySQL InnoDB single transaction, with single index 3.28 minutes
MySQL MyISAM without index 2.58 minutes
MySQL MyISAM with single index 3.02 minutes

The winner is clear here on inserts. But let’s see how MongoDB holds up with Simple Queries, and Relations…

System Setup

  • Some VPS setup with 1and1.com with a gig of dedicated ram
  • Ubuntu 8.04.1 64bit Hardy
  • Symfony 1.2.9 dev updated repository
  • MongoDB Linux 64 0.9.9 (mongodb-linux-x86_64-0.9.9.tgz)
  • Doctrine 1.0.10 (bundled with Symfony)
  • Apache 2.2.8
  • PHP 5.3.0 with mysqlnd
  • MySQL 5.4 with the my.cnf for medium systems
  • PECL MongoDB package 0.9.4