On re-evaluating old hammers. XML::Simple is crazy slow.

Everyone uses XML, whether you want to or not, and whether you think it’s a good thing or a bad thing. Somehow, somewhere, you are subjected to it: as a configuration file format, for data interchange, or, lord knows, something worse. Maybe you’re doing something insane like updating an XML document in real time to treat it like a database. The fact of the matter is that XML is everywhere, and in 2012, we take it for granted.

At my day job, we deal with lots and lots of transactions, and lots and lots of data. Most of this data is binary blobs (PDFs), but oftentimes it comes pre-packaged with some sort of “configuration file,” something we take for granted. After all, it’s 2012. Most of our real hardware is running on SSDs; how expensive can it really be to parse a little bit of XML?

So, this blog post isn’t really about what we do every day at my job. At my job, we discovered that our old trusty XML “parsing” module, XML::Simple, was slow. That probably doesn’t come as a surprise to any of you seasoned perl developers out there. But what surprised me was just how slow it was. XML::Simple is a pretty useful little module: it allows you to take an XML document and turn it into a perl data structure with very minimal effort. However, this convenience comes at a larger-than-expected cost. I went out to the XMark project website and grabbed their “ready made document,” a monster 116 megabyte XML file. I wrote a simple perl script to parse this file:


#!/usr/bin/env perl
use warnings;
use strict;
use FindBin;
use XML::Simple;
use Time::HiRes;

my $infile = "$FindBin::Bin/../data/standard.xml";

# Parse the XML file straight into a perl data structure and time it.
my $start_time = Time::HiRes::time;
my $data_in = XMLin($infile);
print STDERR "Parse: ".(Time::HiRes::time - $start_time)."\n";

and ran it on my nice, relatively modern computer with an SSD. And I waited, and I waited. I’m not a particularly patient person when it comes to these things, so after about 2 minutes of waiting, I hit Ctrl-C. This test was no good, I decided; clearly the problem was that simply *reading* the file off the disk was taking much too long (even though, again, I have an SSD. It was late, what can I say). So I wrote the following bit of code using File::Slurp:


#!/usr/bin/env perl
use warnings;
use strict;
use FindBin;
use File::Slurp;
use XML::Simple;
use Time::HiRes;

my $infile = "$FindBin::Bin/../data/standard.xml";

# Time reading the whole file off disk into a scalar.
my $start_time = Time::HiRes::time;
my $data = read_file $infile;
print STDERR "Slurp: ".(Time::HiRes::time - $start_time)."\n";

# Time turning that scalar into a perl data structure.
$start_time = Time::HiRes::time;
my $data_in = XMLin($data);
print STDERR "Parse: ".(Time::HiRes::time - $start_time)."\n";

And ran it. I was certain I was going to find that reading this file was taking longer than I expected, but I was immediately greeted with:

{11:23:40} (Eduardos-Mac-Pro) ☹ [ 1 ]
<~/play/xml_vs_json> $ perl scripts/xmlsimple.pl 
Slurp: 0.345096111297607

and a whole bunch of waiting. So, it was only taking a third of a second to read 116 megabytes off of disk. I’d have to congratulate Uri on a fast little module. And so I let my mind wander and thought, well, clearly there has to be a faster way to parse XML from within perl. This led me to a perlmonks thread titled “Fastest XML Parser,” which seemed promising. The collective wisdom of the monks agreed that the fastest module was clearly XML::LibXML. This made good sense, as this module is perl bindings into the venerable libxml2 library. It didn’t do exactly what I wanted: instead of turning an XML document into a perl data structure, it gave me a DOM to play with. But maybe that was the best I could expect. Eventually, the XML::Simple solution finished, by the way:

{10:56:26} (Eduardos-Mac-Pro) ☺
<~/play/xml_vs_json> $ perl scripts/xmlin.pl 
Slurp: 0.346162080764771
Parse: 124.664380073547

124 seconds. Just over two minutes. That was clearly not going to fly. So, I went off and created a solution with XML::LibXML. The code was, basically, identical for testing purposes:


#!/usr/bin/env perl
use warnings;
use strict;
use FindBin;
use File::Slurp;
use XML::LibXML;
use Time::HiRes;

my $infile = "$FindBin::Bin/../data/standard.xml";

my $start_time = Time::HiRes::time;
my $data = read_file $infile;
print STDERR "Slurp: ".(Time::HiRes::time - $start_time)."\n";

# libxml2 builds the tree in C; we get back a DOM document object, not a hash.
$start_time = Time::HiRes::time;
my $dom = XML::LibXML->load_xml(string => $data);
print STDERR "Parse: ".(Time::HiRes::time - $start_time)."\n";

Again, it didn’t do exactly what I wanted, but sometimes it takes a tough man to cook a tender chicken. Instead of a ready-made perl hash, I now had a DOM to query.
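
For illustration, here is roughly what pulling values out of that DOM looks like with XPath. Treat this as a sketch, not part of the benchmark: the element names (item, name) are my assumption about the XMark auction document’s layout.

#!/usr/bin/env perl
use warnings;
use strict;
use XML::LibXML;

my $dom = XML::LibXML->load_xml(location => 'data/standard.xml');

# findnodes() evaluates an XPath expression and returns the matching nodes.
# NOTE: '//item' and './name' are assumptions about the XMark schema.
for my $item ($dom->findnodes('//item')) {
    my $name = $item->findvalue('./name');   # text content of the child node
    print "$name\n";
}

The runtimes were considerably nicer too: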

{11:31:40} (Eduardos-Mac-Pro) ☹ [ 1 ]
<~/play/xml_vs_json/scripts> $ perl xml-libxml.pl 
Slurp: 0.37142014503479
Parse: 3.30488514900208

I want that to sink in. Switching XML parsing modules from XML::Simple to XML::LibXML gave me a roughly 3700% performance increase: from 124 seconds to 3.3 seconds to parse. This was clearly valuable, but it wasn’t a fair comparison, since I wasn’t getting a perl data structure back. After a little bit of digging I discovered that, as is so often the case in perl, I was not the first person to want a faster version of XML::Simple. Some kind soul had invested their time and effort and provided CPAN with XML::LibXML::Simple, a “re-implementation” of XML::Simple using libxml2. It doesn’t have all the features of XML::Simple (for example, it provides an XMLin but no XMLout), but for the problem at hand, it may have been the exact thing I wanted. So, I wrote the following snippet:


#!/usr/bin/env perl
use warnings;
use strict;
use FindBin;
use File::Slurp;
use XML::LibXML::Simple qw(XMLin);
use Time::HiRes;

my $infile = "$FindBin::Bin/../data/standard.xml";

my $start_time = Time::HiRes::time;
my $data = read_file $infile;
print STDERR "Slurp: ".(Time::HiRes::time - $start_time)."\n";

# Same XMLin() interface as XML::Simple, but backed by libxml2.
$start_time = Time::HiRes::time;
my $data_in = XMLin($data);
print STDERR "Parse: ".(Time::HiRes::time - $start_time)."\n";

which is basically the first slurp-based gist with the XML parsing module swapped out. I ran it and hoped for the best. The best, however, was not particularly great. File::Slurp was still as fast as ever, but I had greatly underestimated the cost of turning a DOM document into a perl data structure:

{11:35:46} (Eduardos-Mac-Pro) ☺
<~/play/xml_vs_json/scripts> $ perl xml-libxml-simple.pl 
Slurp: 0.362099885940552
Parse: 60.4428260326385

How were we back here? Mind you, simply changing from XML::Simple to XML::LibXML::Simple had still doubled the performance, but unmarshalling the XML into perl structures had once again made it slow. Much slower. Nearly 2000% slower than simply retrieving the DOM. I began to get sad. Then I began to get a crazy idea… Why does it have to be XML at all? Sure, sometimes we get stuck with XML, but oftentimes I just *choose* it because it is convenient and because I know that XML is a nice text format for me to marshal a data structure for IPC or persistence or god knows what. Was I always doomed to take this kind of hit coming in and out of perl hashes? So, I wrote the following bit of code:


#!/usr/bin/env perl
use warnings;
use strict;
use FindBin;
use File::Slurp;
use XML::Simple;
use JSON;
use Time::HiRes;

my $infile = "$FindBin::Bin/../data/standard.xml";

my $start_time = Time::HiRes::time;
my $data = read_file $infile;
print STDERR "Slurp: ".(Time::HiRes::time - $start_time)."\n";

$start_time = Time::HiRes::time;
my $data_in = XMLin($data);
print STDERR "Parse: ".(Time::HiRes::time - $start_time)."\n";

# Re-marshal the same structure as JSON and write it to STDOUT.
$start_time = Time::HiRes::time;
my $encoded_json = encode_json $data_in;
print STDERR "JSON_Encode: ".(Time::HiRes::time - $start_time)."\n";
print $encoded_json;

This little bit of code simply takes the XML data, turns it into JSON, and writes it out. Mind you, the irony of the fact that I reached for XML::Simple even though I had already learned it was slow is not lost on me. So entrenched was my habit when I wrote this code that I had not yet internalized the fact that there was a faster option. With this code, I took my 116 megabyte XML document and turned it into a 99 megabyte JSON file. The contents are “similar.” I won’t call them identical, but for *my* purposes they are. And then I wrote some code to read this file and unmarshal it into perl data structures:


#!/usr/bin/env perl
use warnings;
use strict;
use FindBin;
use File::Slurp;
use JSON;
use Time::HiRes;

my $infile = "$FindBin::Bin/../data/standard.json";

my $start_time = Time::HiRes::time;
my $data = read_file $infile;
print STDERR "Slurp: ".(Time::HiRes::time - $start_time)."\n";

# Same shape of work as before, just decode_json instead of XMLin.
$start_time = Time::HiRes::time;
my $data_in = decode_json($data);
print STDERR "Parse: ".(Time::HiRes::time - $start_time)."\n";

Such a silly little change, right? From XMLin to decode_json. It was still doing the same basic work: reading a file off of disk into a scalar, and then turning that scalar into a deeply nested perl data structure. The performance surprised me:

{11:44:39} (Eduardos-Mac-Pro) ☺
<~/play/xml_vs_json/scripts> $ perl json.pl 
Slurp: 0.315643787384033
Parse: 1.24264287948608

Preposterous. More than 2.5x faster than the XML::LibXML solution. Ridiculously faster. Annoyingly faster. All those clock cycles I had wasted in my own code, writing out a job file as XML only to turn around in another process and XMLin it. XML had never failed me, and it had always seemed fast enough… but now, dealing with billions of transactions and petabytes of data… now it mattered.
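
If you, like me, have code that XMLouts a job file in one process and XMLins it in another, the swap is almost mechanical. Here’s a minimal sketch; the job structure and file name are hypothetical, purely for illustration:

#!/usr/bin/env perl
use warnings;
use strict;
use File::Slurp;
use JSON;

# A hypothetical job description of the kind you might XMLout to disk.
my $job = {
    id    => 42,
    files => [ 'a.pdf', 'b.pdf' ],
};

# Writer side: marshal the structure as JSON instead of XML.
write_file( 'job.json', encode_json($job) );

# Reader side (normally a different process): unmarshal it back.
my $job_in = decode_json( read_file('job.json') );
print 'Job ', $job_in->{id}, ' has ', scalar @{ $job_in->{files} }, " files\n";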

I know JSON is not XML. I knew JSON was fast and lightweight and easy. I use JSON. I simply didn’t realize that for this use case, a use case I believe is probably quite common, the difference was so drastic. If you’re going to write out a job file, if you just need to marshal state, give JSON a try. In my use case, with zero added lines of code, it was 10,000% faster. That’s a lot fewer clock cycles on the cloud you have to pay for.
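
And if you’d rather measure than take my word for it, core perl ships with the Benchmark module. A sketch, assuming you already have the same document in both formats (and a sanely sized sample; three full XMLin parses of a 116 megabyte file will take a while):

#!/usr/bin/env perl
use warnings;
use strict;
use Benchmark qw(cmpthese);
use File::Slurp;
use XML::LibXML::Simple qw(XMLin);
use JSON;

# Assumes standard.xml and standard.json contain equivalent data.
my $xml  = read_file('data/standard.xml');
my $json = read_file('data/standard.json');

# Run each parser three times and print a relative comparison table.
cmpthese( 3, {
    xml  => sub { XMLin($xml) },
    json => sub { decode_json($json) },
} );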


4 Responses to On re-evaluating old hammers. XML::Simple is crazy slow.

  1. japhattack says:

    Not that I particularly like XML::Simple, but a fair number of people don’t even bother to read the documentation. From XML::Simple::FAQ:

    Why is XML::Simple so slow?
    If you find that XML::Simple is very slow reading XML, the most likely
    reason is that you have XML::SAX installed but no additional SAX parser
    module. The XML::SAX distribution includes an XML parser written
    entirely in Perl. This is very portable but not very fast. For better
    performance install either XML::SAX::Expat or XML::LibXML.

    • earino says:

      Greetings! Thanks very much for your feedback. I do agree that most folks simply don’t read the FAQ, and I probably should have mentioned it. However, in my case, both of those modules were already installed, and the performance we saw was the performance we got. I did not, however, explicitly force a different XML parser using $XML::Simple::PREFERRED_PARSER, though that may very well be an interesting benchmark on its own. Again, thanks!
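
      For anyone who wants to run that experiment, the knob being referred to is a documented XML::Simple package variable. A sketch (picking XML::LibXML::SAX is my choice here, not something benchmarked above):

      use XML::Simple;

      # Tell XML::Simple which SAX parser to use instead of whatever
      # XML::SAX::ParserFactory would pick by default.
      $XML::Simple::PREFERRED_PARSER = 'XML::LibXML::SAX';

      my $data = XMLin('data/standard.xml');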

  2. How long did it take to s/xml/json/ for your test? If I understand correctly, your actual project can just output JSON directly … but for simple xml, maybe we can do a deliciously evil regex to convert to JSON and then parse. XML::Simple::EvilJSONHack here we come…

  3. As the author of XML::Simple, some thoughts …

    1. I’m not even slightly surprised that XML::Simple is slow. Any ‘design’ that went into the module was aimed at optimising for programmer convenience and not speed.

    2. I ran your program on my desktop machine with the linked file and it took 189 seconds when XML::Simple used the PurePerl parser and 34 seconds when it used the XML::SAX::ExpatXS parser. So either your machine is much slower than mine or your default setup is using the PurePerl parser.

    3. The main reason XML::LibXML is so fast is precisely because it doesn’t build Perl data structures – the libxml2 library uses C code to assemble C data structures into a DOM tree. Crossing the boundary from C space to Perl space (and back) is expensive – a SAX parser will cross that boundary several times *per element* whereas XML::LibXML only has to cross that boundary several times *per document* (during the parse phase).

    4. JSON parsers can be faster than XML parsers for one very simple reason – XML is *much* more complex to parse than JSON. I have worked with XML quite a lot and I have never grown to like it. JSON, on the other hand, I liked from day one.

    5. When I have to parse XML (or in fact HTML), I use XML::LibXML. It really is awesome and once you understand XPath, it’s easier than XML::Simple and much more consistent (ref: PerlMonks Node ID 490846). Make the switch – you won’t regret it 🙂

    Thanks for the write-up – anything that drives people to choose a better module (e.g. XML::LibXML) or a better data format (e.g. JSON) has got to be good.

    PS: Apparently your hateful blog software assumes I’m a spammer if I include a link.