CHI::Driver::FastMmap and Parallel::ForkManager make for happy babies on multicore boxes

This is an issue I’ve dealt with at work, and today I must admit I am finally happy with a solution. Let’s say that you have the following pattern:
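Here’s a minimal sketch of that serial pattern in Perl. `retrieve_value_from_stuff` is the name used in this post, but its body here is a hypothetical stub. `transform`, `push_to_endpoint` and the work and endpoint lists are hypothetical stand-ins too. The real ones would hit your database, web services, FTP server and so on. The sketch also assumes the back end is fetched once per unit of work per endpoint.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the real work list, back end and endpoints.
my @units_of_work = (1 .. 5);
my @endpoints     = qw(ftp webservice db);
my $backend_calls = 0;
my @delivered;

sub retrieve_value_from_stuff {    # the slow back-end fetch (web service, DB query, ...)
    my ($unit) = @_;
    $backend_calls++;
    return "value-for-$unit";
}
sub transform        { my ($value) = @_; return uc $value }    # "stuff happens"
sub push_to_endpoint {
    my ($endpoint, $data) = @_;
    push @delivered, "$endpoint:$data";    # pretend delivery
    return 1;
}

# One process does everything, one unit and one endpoint at a time.
for my $unit_of_work (@units_of_work) {
    for my $endpoint (@endpoints) {
        my $value = retrieve_value_from_stuff($unit_of_work);
        push_to_endpoint($endpoint, transform($value));
    }
}
```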

To summarize, you have a list of work (from a database query, a CSV file, the ether, lord only knows). This list of work is used to fetch data from back-end systems (web services, database queries, whatever), stuff happens to the retrieved data, and then it’s pushed onto some endpoint (an FTP server, another web service, a DB, whatever). This is all fine and dandy, except that it’s single-threaded, and the fact is that you need more performance. So, being lucky enough to have a multicore, multiprocessor server, you bust out the awesome Parallel::ForkManager. And your code might look like this:
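Here’s a minimal sketch of that first parallel version. It reuses hypothetical stubs like the ones in the serial sketch, and `$PROCESSES = 4` is an arbitrary choice. It also assumes a Parallel::ForkManager recent enough to pass a data structure back through `finish`. Each child reports that it hit the back end, and the parent tallies those reports in its `run_on_finish` callback.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $PROCESSES = 4;    # max concurrent children; tune to your core count

# Hypothetical stand-ins for the real work list, back end and endpoints.
my @units_of_work = (1 .. 5);
my @endpoints     = qw(ftp webservice db);

sub retrieve_value_from_stuff { my ($unit) = @_; return "value-for-$unit" }    # slow back-end fetch
sub transform                 { my ($value) = @_; return uc $value }
sub push_to_endpoint          { my ($endpoint, $data) = @_; return 1 }          # pretend delivery succeeded

my $pm = Parallel::ForkManager->new($PROCESSES);

# Runs in the parent as each child exits: tally outcomes and back-end hits.
my ($finished, $failed, $backend_calls) = (0, 0, 0);
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $finished++;
    $failed++ if $exit_code;
    $backend_calls += $data->{fetched} if $data;
});

for my $unit_of_work (@units_of_work) {
    for my $endpoint (@endpoints) {
        $pm->start and next;    # parent: fork a child, move on to the next endpoint

        # Child: every child fetches from the back end on its own.
        my $value = retrieve_value_from_stuff($unit_of_work);
        my $ok    = push_to_endpoint($endpoint, transform($value));
        $pm->finish($ok ? 0 : 1, { fetched => 1 });    # child exits here
    }
}
$pm->wait_all_children;
```

With five units of work and three endpoints, the parent ends up counting 15 back-end fetches, one per child.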

This is a pretty good approach. You’re parallelizing immediately after every single piece of work is doled out, so you’re spreading the load out among all your cores. However, you may now be hammering your back end: you call retrieve_value_from_stuff once for every combination of $unit_of_work and endpoint. If that back end is, for example, an Oracle database that is already loaded down, this isn’t going to make you any friends, especially if $PROCESSES is some large number. So, your next approach is going to be something along the lines of:
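One way that next approach might look, with the same kind of hypothetical stubs: the parent calls `retrieve_value_from_stuff` once per unit of work, and the forked children inherit `$value` through `fork`. Because the fetch now happens in the parent, a plain counter is enough to confirm the back end sees only one call per unit.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $PROCESSES = 4;    # max concurrent children; tune to your core count

# Hypothetical stand-ins for the real work list, back end and endpoints.
my @units_of_work = (1 .. 5);
my @endpoints     = qw(ftp webservice db);
my $backend_calls = 0;    # only ever incremented in the parent here

sub retrieve_value_from_stuff { my ($unit) = @_; $backend_calls++; return "value-for-$unit" }
sub transform                 { my ($value) = @_; return uc $value }
sub push_to_endpoint          { my ($endpoint, $data) = @_; return 1 }    # pretend delivery succeeded

my $pm = Parallel::ForkManager->new($PROCESSES);

my ($finished, $failed) = (0, 0);
$pm->run_on_finish(sub {
    my ($pid, $exit_code) = @_;
    $finished++;
    $failed++ if $exit_code;
});

for my $unit_of_work (@units_of_work) {
    # Parent: one back-end hit per unit of work, done serially before any
    # child for this unit is forked.
    my $value = retrieve_value_from_stuff($unit_of_work);

    for my $endpoint (@endpoints) {
        $pm->start and next;

        # Child: already has $value from the parent; just transform and push.
        my $ok = push_to_endpoint($endpoint, transform($value));
        $pm->finish($ok ? 0 : 1);
    }
}
$pm->wait_all_children;
```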

You’re now only calling retrieve_value_from_stuff once per $unit_of_work. So this is great! Except that now your entire process is bottlenecked on retrieve_value_from_stuff, because each fetch runs serially in the parent before any children for that unit are forked. That’s not good. We want to get rid of this bottleneck.

So you can’t just factor the fetch out to the top, either. Sad face. What would be really nice is if you could cache the return value of retrieve_value_from_stuff. You can’t just stick it in a global hash or keep it in a scalar, because these processes are forked: each child gets its own copy of the parent’s memory, so a value one child stores is never seen by its siblings or by the parent. What to do, what to do…
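A quick core-Perl experiment shows the problem with a plain hash: a child process writes into a file-scoped hash, and after `waitpid` the parent’s copy is still empty. The `%cache` name and the `answer` key are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %cache;    # an ordinary, file-scoped hash

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # Child: this writes only to the child's own copy of %cache.
    $cache{answer} = 42;
    exit 0;
}
waitpid($pid, 0);

# Parent: the child's write never made it back here.
print exists $cache{answer} ? "shared\n" : "not shared\n";    # prints "not shared"
```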

Enter CHI, and more specifically CHI::Driver::FastMmap! From the docs:

This cache driver uses Cache::FastMmap to store data in an mmap’ed file. It is very fast, and can be used to share data between processes on a single host, though not between hosts.
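Here’s a minimal fork experiment with a CHI FastMmap cache. A child stores a value, and after `waitpid` the parent reads it back, because both processes use the same mmap’ed file. This sketch assumes CHI and Cache::FastMmap are installed from CPAN. It uses a throwaway `File::Temp` directory as `root_dir`, and the `answer` key is just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CHI;
use File::Temp qw(tempdir);

# root_dir holds the mmap'ed share file; a throwaway temp dir here.
my $cache = CHI->new(
    driver   => 'FastMmap',
    root_dir => tempdir(CLEANUP => 1),
);

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # Child: the write lands in the shared file, not just in the child's memory.
    $cache->set(answer => 42);
    exit 0;
}
waitpid($pid, 0);

# Parent: reads the value the child stored.
my $answer = $cache->get('answer');
print defined $answer ? "shared: $answer\n" : "not shared\n";
```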

So, in this case, we change our code to look like this:
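Here’s a sketch of the cached version. It uses the same kind of hypothetical stubs and a temporary `root_dir`, and it assumes CHI, Cache::FastMmap and Parallel::ForkManager are installed. The cache is created once in the parent, before any fork. Each child checks it with `get` and falls back to `retrieve_value_from_stuff` plus `set` only on a miss. The `'10 minutes'` expiry and the `'1m'` cache size are arbitrary choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CHI;
use File::Temp qw(tempdir);
use Parallel::ForkManager;

my $PROCESSES = 4;    # max concurrent children; tune to your core count

# Hypothetical stand-ins for the real work list, back end and endpoints.
my @units_of_work = (1 .. 5);
my @endpoints     = qw(ftp webservice db);

sub retrieve_value_from_stuff { my ($unit) = @_; return "value-for-$unit" }    # slow back-end fetch
sub transform                 { my ($value) = @_; return uc $value }
sub push_to_endpoint          { my ($endpoint, $data) = @_; return 1 }          # pretend delivery succeeded

# Created once, before forking; every child inherits it and shares the file.
my $cache = CHI->new(
    driver     => 'FastMmap',
    root_dir   => tempdir(CLEANUP => 1),
    cache_size => '1m',
);

my $pm = Parallel::ForkManager->new($PROCESSES);

my ($finished, $failed, $backend_calls) = (0, 0, 0);
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $finished++;
    $failed++ if $exit_code;
    $backend_calls += $data->{fetched} if $data;
});

for my $unit_of_work (@units_of_work) {
    for my $endpoint (@endpoints) {
        $pm->start and next;

        # Child: only a cache miss reaches the back end.
        my $fetched = 0;
        my $value   = $cache->get($unit_of_work);
        unless (defined $value) {
            $value = retrieve_value_from_stuff($unit_of_work);
            $cache->set($unit_of_work, $value, '10 minutes');
            $fetched = 1;
        }

        my $ok = push_to_endpoint($endpoint, transform($value));
        $pm->finish($ok ? 0 : 1, { fetched => $fetched });
    }
}
$pm->wait_all_children;
```

Note that `get` followed by `set` isn’t atomic. Two children that miss on the same key at the same moment will both hit the back end. The fetch count therefore lands somewhere between one per unit of work and one per child, rather than always one per child.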

We now get the full value of our parallelization code (through P::FM), we are no longer bottlenecked on retrieve_value_from_stuff, and the retrieved value is shared across all the processes using this cache. Fundamentally, all it required was six additional lines of code: four to instantiate the cache object, one closing curly brace, and one to actually use the cache.

This is a pretty powerful technique; I have seen it speed up my code by a factor of 100 with minimal effort. Mind you, my sample code was oversimplified, but hopefully it gives you some fun ideas.
