More on constants in perl

So, the thing that always happens happened. When you post something about which modules you use, or how they behave, perl folks come out of the woodwork to tell you all about how you got it wrong, some new module you should be using, or how awesome some other module is.

Mind you, this is one of the greatest things about the perl community in general. Y’all sure are a vocal bunch. So, in particular I was told:

  1. I don’t understand constant folding. Which, whether true or not, isn’t really relevant to my original point, which was “golly, this behavior is unexpected and surprising.”
  2. I should be using Const::Fast, because it is better than Readonly.

So, here’s some code I wrote up:
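Something along these lines, with illustrative sub names and trivial arithmetic standing in for real work:

#!/usr/bin/env perl
use strict;
use warnings;

use constant CONST_PI => 3.14159;
use Readonly;
use Const::Fast;
use B::Deparse;
use Data::Dumper;
use Dumbbench;

Readonly my $READONLY_PI   => 3.14159;
const my $CONST_FAST_PI    => 3.14159;

# One sub per approach, all doing the same trivial math.
sub with_constant   { return CONST_PI * 2 }
sub with_readonly   { return $READONLY_PI * 2 }
sub with_const_fast { return $CONST_FAST_PI * 2 }

# Dump the values, and Deparse each sub to see what folding happened.
print Dumper( CONST_PI, $READONLY_PI, $CONST_FAST_PI );
my $deparse = B::Deparse->new;
print $deparse->coderef2text($_), "\n"
    for \&with_constant, \&with_readonly, \&with_const_fast;

# Benchmark all three with Dumbbench.
my $bench = Dumbbench->new( target_rel_precision => 0.005, initial_runs => 20 );
$bench->add_instances(
    Dumbbench::Instance::PerlSub->new( name => 'constant',    code => \&with_constant ),
    Dumbbench::Instance::PerlSub->new( name => 'Readonly',    code => \&with_readonly ),
    Dumbbench::Instance::PerlSub->new( name => 'Const::Fast', code => \&with_const_fast ),
);
$bench->run;
$bench->report;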

It basically has three subs: one for use constant, one for Readonly, and one for Const::Fast. It Dumper’s and Deparses the code, so we can see what awesome constant folding is happening (and it is awesome). It then uses the lovely Dumbbench module to benchmark them and see the actual difference. Here are the results:

So, I can’t really see a massive performance difference, or any other compelling difference. Mind you, there *is* a performance difference, but we’re talking microseconds here, right? I understand TIMTOWTDI. I ❤ that about perl. This time, I simply don’t get the benefits.

UPDATE: Holy crap, some nice commenter fellow sent me a link to a page in which someone compared 21 ways of doing constants in perl on CPAN. 21. There are 21 different solutions on how to do constants on CPAN. What a grim, meathook future.


Homebrew Niceties

So, I’ve been using homebrew on my mac for quite a while now. So long that it had gotten into a state of disrepair. How do I know? It was nice enough to tell me.

~/play/random_shit $ brew doctor
Warning: Your Xcode (4.5.2) is outdated
Please install Xcode 4.6.
Warning: /Library/Frameworks/Mono.framework detected
This can be picked up by CMake's build system and likely cause the build to
fail. You may need to move this file out of the way to compile CMake.
Warning: You have a non-Homebrew 'pkg-config' in your PATH:
 /usr/bin/pkg-config => /Library/Frameworks/Mono.framework/Versions/2.10.11/bin/pkg-config
This was most likely created by the Mono installer. `./configure` may
have problems finding brew-installed packages using this other pkg-config.
Mono no longer installs this file as of 3.0.4. You should
`sudo rm /usr/bin/pkg-config` and upgrade to the latest version of Mono.

Wow! Look at all that text. This isn’t just a bunch of computer nerd nonsense. It’s straight up telling me what is going to break and how. It was smart enough to call out Xcode (and let me know that a new version was available). That by itself is pretty impressive. However, the next set of error messages is what blew me away.

I have just recently started playing around with Mono. I installed it per the instructions on the homepage, and thought nothing of it. It turns out, it does things that are going to conflict with my homebrew installation. Seriously, check out the text of that error message. It tells me: “hey, there is a problem with X, it was most likely installed by Mono, and it will fail when you are trying to do Y.”

Ok, impressive. But you know what really impressed me?

Mono no longer installs this file as of 3.0.4. You should
`sudo rm /usr/bin/pkg-config` and upgrade to the latest version of Mono.

Homebrew isn’t just diagnosing itself. Someone made homebrew smart enough to tell me that a newer version of Mono will no longer cause this problem. *This* is why I ended up settling on homebrew over MacPorts (and it’s possible MacPorts handles this equally well): it makes the process of being a developer on an OS X box *pleasant.*

Rather than sitting around and trying to figure out if maybe a new version of Mono would work, homebrew kindly and lovingly tells me that a new version will fix it, won’t conflict, and I will be able to get back to my actual life in which I do stuff other than sysadmin my toaster.

Thanks homebrew 🙂


use constant will surprise you with its evil.

Let’s say you have some perl code that seems nice and normal:
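Something like this (the constant name is just illustrative):

use strict;
use warnings;
use constant DEBUG_LEVEL => 3;

print "Debug level is ", DEBUG_LEVEL, "\n";  # works: prints 3
print "Debug level is DEBUG_LEVEL\n";        # no interpolation: prints the literal word

my %opts = ( DEBUG_LEVEL => 'surprise' );    # the => auto-quotes the bareword,
print $opts{DEBUG_LEVEL}, "\n";              # so the key is the string "DEBUG_LEVEL", not 3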

It defines a constant, and then you use it. It should work fine, right? No. In perl, use constant just makes a sub for that word that returns the value when evaluated; it’s not a “real constant” in the way other languages have constants. This will fail you whenever you try to interpolate it into a string, use it as a hash key, or do basically anything else where the bareword gets quoted out from under you.

I genuinely can’t imagine a scenario in which ‘use constant’ would be useful to me.

UPDATE: I’d like to add that the way I’d rather deal with this problem is:
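Namely, a real read-only scalar instead of a sub; something Readonly-shaped (again, the name is illustrative):

use strict;
use warnings;
use Readonly;

Readonly my $DEBUG_LEVEL => 3;

print "Debug level is $DEBUG_LEVEL\n";   # a real scalar, so it interpolates fine
my %opts = ( level => $DEBUG_LEVEL );    # and it behaves like any other value

# $DEBUG_LEVEL = 5;   # would die: "Modification of a read-only value attempted"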


Playing with Go

So I’m doing a little bit of playing around with Go. Currently I’m reimplementing the GNU coreutils in it. So far, so good. I’ve run into some interesting problems, but overall I’m quite pleased. If you want to read some real basic, non-parallel, non-interesting Go code, check out my github repo.

I may take a break from them soon; the guys running the golang meetup in LA challenged me to reimplement netcat in Go, and that sounds like a good time.


“Perl is too low level” and wackiness I’ve heard.

So, I’m working on a little side project. This side project is a website / mobile website that has to care about time. So, a human being wants to set a time and a place, for example, “The movie theatre at address X at 9:00 pm”.

Well, the fact of the matter is that 9:00 pm is a lie. There is no such thing as 9:00 pm. There is 9:00 pm in a certain time zone on a certain date. So, I was at a hackathon, worrying about how to implement this properly. You see, it’s possible that a person would have multiple locations within a block. It’s possible even that within a block, one of those location/time tuples may be in a different time zone than the next location/time tuple. It’s also possible that the person *entering* the time is themselves in a different time zone than the person who is supposed to fulfill the location/time tuple’s intent. It’s also possible that the server that fulfills the location/time tuple action is in a totally different timezone. And it’s further possible that the very next location/time tuple action is handled by a totally different server in a totally different timezone.

Real software problems. Hard software problems. Time is a challenging abstraction, and handling time properly is hard. Handling time properly with an understanding of timezone spanning and 3rd-party observers of time is even harder. So, I reach for CPAN and start assembling the modules I need to do things properly.

I bust out Time::Piece, DateTime, DBIx::Class::InflateColumn::DateTime, Date::Parse, and one or two other ones I’m thinking about. I step outside to clear my mind, and start talking to another hackathon participant. He asks me, “so what are you coding in?” and I tell him Perl. He mentions that I must be very old; it hurts my feelings. I invest about 45 seconds to let him know that Perl has experienced one heck of a renaissance! But he wasn’t even listening, and I wasn’t in the mood to waste my breath. He proceeds to tell me that I’m wasting my time, and that I should really be programming in Ruby.

“You see, there’s this thing called gems, man. And everyone is putting their code in these gems, and it’s incredible. You don’t have to do any of that shit any more. Perl is too low level, I can’t solve any real problems in it,” he said. Young. So young. More importantly, clearly someone who hasn’t actually developed software that has to run 24/7. Software that, if it fails, means someone may lose an arm. The kind of software that powers multi-million dollar enterprises. Not to say that people using his stack *couldn’t* do so… but this young kid clearly hadn’t done it.

So I ask him: “ok, so I have to track multiple sets of time/location tuples in a way which is completely timezone agnostic; messaging needs to happen to people in those timezones in real time; an external person in a different timezone may require completely different messages at different times, calculated off of their timezone offsets; it’s possible that these sets themselves span multiple timezones; and it’s further possible that the servers are themselves in multiple different timezones. How would you solve it?”

I expected him to tell me an answer that made sense, something like: “don’t store timezones, convert everything to UTC and make your tuple consumer logic do timezone transformations in real time” or something even more clever than I could have come up with. I waited, and took a deep breath of the crisp fresh air.

He replied: “Perl is just too low level. I’d find a ruby gem that handled that.”

Thanks kid, I appreciate it.
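For what it’s worth, the answer I was fishing for looks something like this in DateTime terms (the dates and zones are illustrative):

use strict;
use warnings;
use DateTime;

# The human enters "9:00 pm at the theatre" in their local zone...
my $when = DateTime->new(
    year => 2012, month => 11, day => 9,
    hour => 21, minute => 0,
    time_zone => 'America/Los_Angeles',
);

# ...you persist it as UTC...
$when->set_time_zone('UTC');

# ...and every consumer (server, fulfiller, observer) converts at the edge:
my $for_fulfiller = $when->clone->set_time_zone('America/New_York');
print $for_fulfiller->strftime('%Y-%m-%d %H:%M %Z'), "\n";   # 2012-11-10 00:00 EST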

UPDATE: If there is a ruby gem that handles this, I’d love to hear about it.


On re-evaluating old hammers. XML::Simple is crazy slow.

Everyone uses XML. Whether you want to or not, whether you think it’s a good thing or a bad thing. The fact of the matter is that somehow, somewhere, you are subjected to XML. You are subjected to it as a configuration file format, or you are subjected to it for data interchange, or lord knows, something worse. Maybe you’re doing something insane like updating an XML document in real time to treat it like a database. The fact of the matter is that XML is everywhere, and in 2012, we take it for granted.

At my day job, we deal with lots and lots of transactions, and lots and lots of data. Most of this data is binary blobs (PDFs), but often times this data comes pre-packaged with some sort of “configuration file.” Something that we take for granted. After all, it’s 2012. Most of our real hardware is running on SSDs, how expensive can it really be to parse a little bit of XML?

So, this blog post isn’t really about what we do every day at my job. At my job, we discovered that our old trusty XML “parsing” module, XML::Simple, was slow. That probably doesn’t come as a surprise to any of you seasoned perl developers out there. But what surprised me was how slow it was. XML::Simple is a pretty useful little module: it allows you to take an XML document and turn it into a perl data structure with very minimal effort. However, this comes at a larger than expected cost. I went out to the XMark project website, grabbed their “ready made document,” a monster 116 megabyte XML file, and wrote a simple perl script to parse it:
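In sketch form (the file name is illustrative):

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Simple;

# standard.xml stands in for XMark's 116 MB "ready made" document
my $ref = XMLin('standard.xml');
print "done\n";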

and ran it on my nice, relatively modern computer with an SSD. And I waited, and I waited. I’m not a particularly patient person when it comes to these things, so after about 2 minutes of waiting, I hit Ctrl-C. This test was no good, I decided; clearly the problem was that I was taking much too long in simply *reading* the file off the disk (even though, again, I have an SSD. It was late, what can I say?). So I wrote the following bit of code using File::Slurp:
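Roughly this, timing the two phases separately:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp;
use Time::HiRes qw(time);
use XML::Simple;

my $start = time;
my $xml   = read_file('standard.xml');
print 'Slurp: ', time - $start, "\n";

$start = time;
my $ref = XMLin($xml);
print 'Parse: ', time - $start, "\n";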

And ran it. I was certain I was going to find that reading this file was taking longer than I expected, only I was immediately greeted with:

{11:23:40} (Eduardos-Mac-Pro) ☹ [ 1 ]
<~/play/xml_vs_json> $ perl scripts/xmlsimple.pl 
Slurp: 0.345096111297607

and a whole bunch of waiting. So, it was only taking a third of a second to read 116 megabytes off of disk. I’d have to congratulate Uri on a fast little module. And so I let my mind wander and thought, well, clearly there has to be a faster way to parse XML from within perl. This led me to a perlmonks thread titled “Fastest XML Parser,” which seemed promising. The collective wisdom of the monks agreed that the fastest module was clearly XML::LibXML. This made good sense, as this module is perl bindings into the venerable libxml2 library. It didn’t do exactly what I wanted (instead of turning an XML document into a perl data structure, it gave me a DOM to play with), but maybe that was the best I could expect. Eventually, the XML::Simple solution finished, by the way:

{10:56:26} (Eduardos-Mac-Pro) ☺
<~/play/xml_vs_json> $ perl scripts/xmlin.pl 
Slurp: 0.346162080764771
Parse: 124.664380073547

124 seconds. Over two minutes. That was clearly not going to fly. So, I went off and created a solution with XML::LibXML. The code was, basically, identical for testing purposes:
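In the same sketch form, with only the parsing call swapped out:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp;
use Time::HiRes qw(time);
use XML::LibXML;

my $start = time;
my $xml   = read_file('standard.xml');
print 'Slurp: ', time - $start, "\n";

$start = time;
my $dom = XML::LibXML->load_xml( string => $xml );
print 'Parse: ', time - $start, "\n";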

Again, it didn’t do exactly what I wanted, but sometimes it takes a tough man to cook a tender chicken. The runtimes were considerably nicer too:

{11:31:40} (Eduardos-Mac-Pro) ☹ [ 1 ]
<~/play/xml_vs_json/scripts> $ perl xml-libxml.pl 
Slurp: 0.37142014503479
Parse: 3.30488514900208

So I want that to sink in. Switching XML parsing modules from XML::Simple to XML::LibXML gave me a 3700% performance increase: from 124 seconds to 3.3 seconds to parse. This was clearly valuable, but it wasn’t a fair comparison. After a little bit of digging I discovered that, as is so often the case in perl, I was not the first person to want a faster version of XML::Simple. Some kind soul had invested their time and effort and provided CPAN with XML::LibXML::Simple, a “re-implementation” of XML::Simple using libxml2. It doesn’t have all the features of XML::Simple (for example, it doesn’t provide an XMLout, just an XMLin), but for the problem at hand, it may have been the exact thing I wanted. So, I wrote the following snippet:
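Something along these lines:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp;
use Time::HiRes qw(time);
use XML::LibXML::Simple qw(XMLin);

my $start = time;
my $xml   = read_file('standard.xml');
print 'Slurp: ', time - $start, "\n";

$start = time;
my $ref = XMLin($xml);
print 'Parse: ', time - $start, "\n";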

It is basically the first slurp-based gist with the XML parsing modules swapped out. I ran it, and hoped for the best. The best, however, was not particularly great. File::Slurp was still as fast as ever; I had, however, greatly underestimated the complexity of taking a DOM document and turning it into a perl structure:

{11:35:46} (Eduardos-Mac-Pro) ☺
<~/play/xml_vs_json/scripts> $ perl xml-libxml-simple.pl 
Slurp: 0.362099885940552
Parse: 60.4428260326385

How were we back here? Mind you, simply changing from XML::Simple to XML::LibXML::Simple had still doubled the performance, but unmarshalling the XML into perl structures had once again made it slow. Much slower. Nearly 2000% slower than simply retrieving the DOM. I began to get sad. Then I began to get a crazy idea… Why does it have to be XML at all? Sure, sometimes we get stuck with XML, but oftentimes I just *chose* it because it is convenient and because I know that XML is a nice text format for me to marshall a data structure for IPC or persistence or god knows what. Was I always doomed to take this kind of hit coming in and out of perl hashes? So, I wrote the following bit of code:
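A sketch of the converter:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp;
use JSON::XS;
use XML::Simple;

# Yes, XML::Simple again; old habits die hard.
my $ref = XMLin( read_file('standard.xml') );
write_file( 'standard.json', encode_json($ref) );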

This little bit of code simply takes the XML data, turns it into JSON data, and writes it out. Mind you, the irony of the fact that I reached for XML::Simple even though I had already learned it was slow is not lost on me. So entrenched was my thinking when I wrote this code that I had not yet internalized the fact that there was a faster solution. With this code, I took my 116 megabyte XML file and turned it into a 99 megabyte JSON file. The contents are “similar.” I won’t call them identical, but for *my* purposes they are. And then I wrote some code to read this file and unmarshall it into perl data structures:
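The same slurp-and-time shape as before, with decode_json doing the unmarshalling:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp;
use JSON::XS;
use Time::HiRes qw(time);

my $start = time;
my $json  = read_file('standard.json');
print 'Slurp: ', time - $start, "\n";

$start = time;
my $ref = decode_json($json);
print 'Parse: ', time - $start, "\n";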

Such a silly little change, right? From XMLin to decode_json. It was still doing the same basic work: reading a file off of disk into a scalar, and then turning that scalar into a deeply nested perl data structure. The performance surprised me:

{11:44:39} (Eduardos-Mac-Pro) ☺
<~/play/xml_vs_json/scripts> $ perl json.pl 
Slurp: 0.315643787384033
Parse: 1.24264287948608

Preposterous. 2.5x faster than the XML::LibXML solution. Ridiculously faster. Preposterously faster. Annoyingly faster. All those clock cycles that I had wasted. All of those clock cycles in my code, when I was writing out a job file as an XML file, only to turn around in another process and XMLin it. XML had never failed me, and it had always seemed fast enough… but now, dealing with billions of transactions and petabytes of data… now it mattered.

I know JSON is not XML. I knew JSON was fast and lightweight and easy. I use JSON. I simply didn’t realize that for this use case, a use case I believe is probably quite common, the difference was so drastic. If you’re going to write out a job file, if you just need to marshall state, give JSON a try. In my use case, with zero added lines of code, it was 10,000% faster. That’s a lot fewer clock cycles on the cloud you have to pay for.


CHI::Driver::FastMmap and Parallel::ForkManager make for happy babies on multicore boxes

This is an issue I’ve dealt with at work, and today I must admit I am finally happy with a solution. Let’s say that you have the following pattern:
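In sketch form (the sub names are illustrative; retrieve_value_from_stuff is the call that matters):

for my $unit_of_work ( get_list_of_work() ) {
    for my $endpoint ( endpoints_for($unit_of_work) ) {
        # One back end hit per unit-of-work/endpoint pair, all serial.
        my $value = retrieve_value_from_stuff($unit_of_work);
        my $data  = do_stuff_to($value);
        push_to_endpoint( $endpoint, $data );
    }
}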

To summarize: you have a list of work (either from a database query, a CSV file, the ether, lord only knows). This list of work is used to fetch data from back end systems (web services, database queries, whatever), stuff happens to the data that is retrieved, and then it’s pushed on to some endpoint (an FTP server, another web service, a DB, whatever). This is all fine and dandy, except that it’s single threaded, and the fact is that you need more performance. So, you’re lucky enough to have a multicore, multiprocessor server, and you bust out the awesome Parallel::ForkManager. And your code might look like this:
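Something like this, using the same illustrative subs, forking on every unit-of-work/endpoint pair:

use Parallel::ForkManager;

my $PROCESSES = 16;   # illustrative
my $pm = Parallel::ForkManager->new($PROCESSES);

for my $unit_of_work ( get_list_of_work() ) {
    for my $endpoint ( endpoints_for($unit_of_work) ) {
        # Fork a child for every unit-of-work/endpoint pair.
        $pm->start and next;
        my $value = retrieve_value_from_stuff($unit_of_work);
        my $data  = do_stuff_to($value);
        push_to_endpoint( $endpoint, $data );
        $pm->finish;
    }
}
$pm->wait_all_children;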

This is a pretty good approach… You’re parallelizing immediately after every single piece of work is doled out, and you’re going to spread the load out among all your cores. However, you’re now potentially causing difficulty for your back end. For every $unit_of_work x number_of_endpoints, you are hitting the back end with retrieve_value_from_stuff. If this is, for example, an oracle database that is already loaded down, this isn’t going to make you any friends. Especially if you happen to be running with $PROCESSES set to some large number. So, your next approach is going to be something along the lines of:
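That is, hoisting the retrieval up into the parent:

for my $unit_of_work ( get_list_of_work() ) {
    # One back end hit per unit of work, in the parent...
    my $value = retrieve_value_from_stuff($unit_of_work);

    for my $endpoint ( endpoints_for($unit_of_work) ) {
        $pm->start and next;
        # ...and every forked child inherits $value for free.
        my $data = do_stuff_to($value);
        push_to_endpoint( $endpoint, $data );
        $pm->finish;
    }
}
$pm->wait_all_children;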

You’re now only hitting the retrieve_value_from_stuff once per $unit_of_work. So this is great! Except that now your entire process is bottlenecked on retrieve_value_from_stuff. That’s not good. We want to decouple this as a bottleneck.

So you can’t actually factor the call up and out of the loop. Sad face. What would really be nice would be if you could cache the return value of retrieve_value_from_stuff. You can’t just stick it in a global hash, and you can’t just keep it in a scalar, since these processes are forked. What to do, what to do…

Enter CHI, and more specifically CHI::Driver::FastMmap! From the docs:

This cache driver uses Cache::FastMmap to store data in an mmap’ed file. It is very fast, and can be used to share data between processes on a single host, though not between hosts.

So, in this case, we change our code to look like this:
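A sketch of that change (root_dir is illustrative, and this assumes $unit_of_work stringifies to a sensible cache key):

use CHI;

# Four lines to instantiate a cache shared by every forked child.
my $cache = CHI->new(
    driver   => 'FastMmap',
    root_dir => '/tmp/work_cache',
);

my $pm = Parallel::ForkManager->new($PROCESSES);

for my $unit_of_work ( get_list_of_work() ) {
    for my $endpoint ( endpoints_for($unit_of_work) ) {
        $pm->start and next;

        # The first child to miss pays for the back end call;
        # everyone else reads the mmap'ed copy.
        my $value = $cache->get($unit_of_work);
        unless ( defined $value ) {
            $value = retrieve_value_from_stuff($unit_of_work);
            $cache->set( $unit_of_work, $value );
        }

        my $data = do_stuff_to($value);
        push_to_endpoint( $endpoint, $data );
        $pm->finish;
    }
}
$pm->wait_all_children;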

We now get the full value of our parallelization code (through P::FM), we are no longer bottlenecked on retrieve_value_from_stuff, the value retrieved is shared across all the processes using this cache, and fundamentally all it required was the addition of six lines of code: four of them for instantiating the cache object, one of them a closing curly brace, and one of them to simply use the cache.

This is a pretty powerful technique that I have seen speed up my code by a factor of 100x with minimal effort. Mind you, my sample code was oversimplified, but hopefully it gives you some sharp fun ideas.


Being a github polyglot

One of the great presentations at YAPC::NA 2012 was Miyagawa’s Becoming a Polyglot. One of the ideas it brought up was that it’s OK to play around in other languages, contribute to said languages, and see what ideas you could steal from them or contribute to them. So, it’s been a pleasure to start trying to contribute more to perl projects on github. I think I’ve even found a niche of projects I like to contribute to (GitHub command-line wrappers). So, imagine my joy when I saw that today a new CLI for github had been released.

Octogit is a CLI for github. It’s pretty new, and it lets you do lots of neat basic things. It’s written in Python, a language I am absolutely not an expert in, but I can figure it out. So, I checked out the github repository, clicked on issues, and found one I thought I could attempt. The one that caught my eye was “Add command line gist support.” The fact is that this was a good issue for me to attempt for a number of reasons:

  1. It was similar to existing functionality in the project (listing issues, for example.)
  2. It had a nicely documented REST API (like all of github, really.)
  3. The complexity was low.

So, I forked the repo and went to town. The first thing I realized was that I simply didn’t even know how a python project (one that uses setup.py) is really structured. No matter; I simply edited the code and deployed it to my box using ‘python setup.py install’. I’m sure there is a better way to do this, but I figured that this was going to be the fastest way for me not to get trapped down a rabbit hole of learning, and to just solve a problem.

So, I started to figure it out. The cli.py file handled the command line parsing; it imported the functions it used at the very top from core.py, and I just needed to add my handling in the appropriate places. Fortunately, my instincts were right:

  1. Adding additional command line parsing was templated code from existing command line parsing
  2. Adding additional web service calls was cut/paste
  3. Displaying the additional content was cut/paste

The entire process took at most 30 minutes. It cracks me up, however, that in such a short period of time I learned so very much. I was trying to solve a real, actual problem, and I was able to move quickly to do so. The things that I learned, relearned, and experienced were:

  1. How indenting works in python
  2. How string interpolation works in python
  3. How throwing / managing exceptions works in python
  4. How the requests library works (it’s so very nice)
  5. How the clint.textui library works
    1. which is awesome
    2. seriously, columns and colored by themselves are amazing
  6. How to open up a browser on my client machine from python

There was so much to learn! And it was awesomely fun. So, I committed my code and sent a pull request. I don’t know that it’ll get accepted; it’s possible my code is garbage, and I don’t know… But I can now honestly say I have at least attempted to contribute to an open source python project! Github polyglot, here I come!


Unexpected awfulness.

I’m in the middle of “moving” some code from server A to server B. This has to happen because server A is old, has one power supply, isn’t really being backed up, and (believe it or not) a number of other reasons. Since this code is from before “the great perl re-awakening,” it doesn’t use perlbrew, it doesn’t use local::lib, it doesn’t use anything resembling a modern perl technique. It’s a script with a bunch of modules that it uses in a directory, and I hope to god it works for you.

So, I started porting it to a slightly more modern method. Nothing crazy, simply using perlbrew and installing the dependencies in a directory tree so the thing could be deployed intelligently without having to muck about with the box it ran on. I always forget how much everything is going to fail, and I sure did in this case. One of the modules that my codebase uses is POE::Component::Jabber. This is an old module, but not ancient (it was last updated in 2009, which isn’t *that* ancient). So, I go to install it when:

$ cpanm POE::Component::Jabber
--> Working on POE::Component::Jabber
Fetching http://lvmirrors.lightningsource.com/CPAN/authors/id/N/NP/NPEREZ/POE-Component-Jabber-3.00.tar.gz ... OK
Configuring POE-Component-Jabber-3.00 ... OK
==> Found dependencies: POE::Component::PubSub
! Finding POE::Component::PubSub on mirror http://lvmirrors.lightningsource.com/CPAN failed.
! Couldn't find module or a distribution POE::Component::PubSub
Installing the following dependencies failed:
==> POE::Component::PubSub

Huh, what’s that even mean? Couldn’t find a module or distribution for that? Weird… I’m sure I can find it on CPAN, right?

[Screenshot: searching for POE::Component::PubSub on CPAN turns up nothing]

Oh… I guess I can’t. Ok, so what does that even mean? I do some googling and find that it’s available on BackPAN. Oh, that’s good. Sadly, this means that I can’t use our automated method of installation with cpanm, and I can’t in any kind of reasonable faith work under the assumption that it’s going to work in a reliable fashion. After all, it was taken off of CPAN for a reason.

So, my little project of moving something from box A to box B just became a larger project of porting from POE::Component::Jabber to Net::Jabber and rearchitecting the way the thing works. No good deed goes unpunished, I reckon.


I always forget how much everything is going to fail.

I don’t know why it is that way; it just seems to be. I always have this idea that I’m going to sit down and write some code, and the hard part is going to be solving the problem and coming up with an elegant algorithm. 20+ years of being a professional (kind-of) programmer, and that hasn’t been the problem yet. Not once, when I’ve sat down to write some code and accomplish something technologically, has the challenge been the code or the algorithm. It’s always the toolchain.

I love my profession, mind you. I enjoy the problem solving process, and I enjoy deploying code and having people use it. But my god, the toolchain. In my day job, my toolchain has gotten so much better than it was even just a few years ago. Where I had to deal with CVS and tar files, I now have git and puppet. Where I had to deal with CPAN, I now have cpanm. Where I had to deal with perl 5.8, I now have perlbrew. My entire professional life has gotten better in nearly every way, and the reborn perl community should really be proud of themselves for the changes they’ve enacted in recent years.

In this little world that I’ve carved out for myself, I forget that the entire world isn’t cpanm and SSDs… it’s gross out there. I went to a cool presentation today by @cfjedimaster, hosted by the Interactive Developers of Nashville group, and I was really excited to go home and try out some http://build.phonegap.com (I mean Cordova) action. Because HTML5 + CSS + JS is really not an awful toolchain. And quite frankly, I had seen what he had to do to build an app, and I was pretty sure that the phonegap stuff was just going to work. I forgot, however, that I needed to get a new certificate, since I had recently replaced my iphone. Oh, how simple minded I was:

[Screenshot: an unknown error from Apple’s certificate process]

Touché, Apple, touché. I wasn’t expecting an unknown error that kicked me completely out of the process in a completely unrepairable fashion. You’ve succeeded in making me feel wholly unempowered; that takes skill.
