So I’ve gotten the qrplop codebase to the point where I have to start writing web services. I’m enjoying doing this as a Dancer project (though I promise the next one, either the “lost dog” one or the TokBox one, will be Mojolicious). One of the nice things about using a modern application framework like Dancer is the cleanness of the code. It’s easy for me to keep my model layer testable and separate, and to plumb in REST actions on routes with minimal effort.
How do I know the “image” the client just uploaded to me is actually an image? When you ask online, the answers invariably fall into a number of camps:
- Look at the Content-Type header of the file upload. OK, this ignores the fact that a perfectly simple malicious vector is just to fake the Content-Type header. Seriously, that’s all you have to do to get around this check. Here we have an LWP script that manually sets the Content-Type to image/jpeg and then simply uploads some PHP. So, if all you are doing is trusting the Content-Type header to tell you what the uploaded content is, you’re in some trouble.
and here is the output from a Dancer script.
Congratulations, your PHP is now a jpeg and living nicely on the server’s file system.
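To make the point concrete without a network round trip, here is a minimal sketch in Python (standard library only). It is not the LWP script mentioned above, just an illustration of the same idea: the per-file Content-Type in a multipart upload is text the client types in. The boundary string, the field name `image`, and both function names are made up for the example.

```python
from email import message_from_bytes

BOUNDARY = "sketchboundary123"  # hypothetical boundary, chosen by the client


def build_upload(filename: str, declared_type: str, payload: bytes) -> bytes:
    """Build a multipart/form-data body the way any HTTP client could."""
    head = (
        f"--{BOUNDARY}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {declared_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{BOUNDARY}--\r\n".encode("ascii")
    return head + payload + tail


def declared_type_of_first_part(body: bytes) -> str:
    """Return the Content-Type the client wrote on the first uploaded part."""
    # Prepend the outer header so the stdlib MIME parser can split the parts.
    outer = f'Content-Type: multipart/form-data; boundary="{BOUNDARY}"\r\n\r\n'
    message = message_from_bytes(outer.encode("ascii") + body)
    first_part = message.get_payload()[0]
    return first_part.get_content_type()


if __name__ == "__main__":
    body = build_upload("cute-kitten.jpg", "image/jpeg", b'<?php echo "owned"; ?>')
    # The server-side view of the upload: whatever label the client chose.
    print(declared_type_of_first_part(body))
```

Nothing in the request ties the declared type to the bytes that follow it, so a check on that header only filters out clients that are being honest.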
So, usually some smart folks will say:
- Well, that’s because looking at the Content-Type is a clearly inferior way of verifying the type of an upload. Everyone knows that the proper thing to do is to use a module like File::MMagic to verify the contents of the file. Well, this had been banging around in my head for the last few days when I happened to stumble upon the very example I was worried about. You can see the original here. But I’ll reproduce it on my blog for funsies:
If you look at the source of that image, it is very much a jpeg. However, if you look at it in a browser, directly, it renders itself as an HTML page. Seriously, go ahead, look at the content of the page/image in all of its glory:
That renders in your browser like an HTML page. So, if someone wanted to put XSS attacks in there or lord knows what, they could. You know what the file command does to that file? (And therefore what the venerable File::MMagic module would do):
It would tell you, beyond a shadow of a doubt, that what you have there is a fine jpeg. And you would store it in your file system, and it would live there sinisterly, waiting for the day a misconfigured directive served it to a web browser, and blammo. You have someone else’s code being served from your website.
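Here is a rough sketch, in Python, of why signature sniffing says “jpeg” for a file like that. The `sniff_magic` function is a deliberately simplified stand-in for what `file` or File::MMagic do (they consult a much larger magic database), and `jpeg_with_comment` is a hypothetical helper that builds a tiny JPEG-shaped byte string. I’m not claiming the original image was built this way; it just shows the mechanism.

```python
def sniff_magic(data: bytes) -> str:
    """Guess a MIME type from leading signature bytes only (simplified)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    return "application/octet-stream"


def jpeg_with_comment(comment: bytes) -> bytes:
    """Assemble SOI + one COM segment + EOI: JPEG-shaped, but no pixel data."""
    # The COM segment length field counts its own two bytes plus the comment.
    length = (len(comment) + 2).to_bytes(2, "big")
    return b"\xff\xd8" + b"\xff\xfe" + length + comment + b"\xff\xd9"


if __name__ == "__main__":
    payload = b"<html><script>alert('hi')</script></html>"
    polyglot = jpeg_with_comment(payload)
    # The sniffer only sees the signature, not the markup riding along inside.
    print(sniff_magic(polyglot), b"<script>" in polyglot)
```

The check looks at the first few bytes and stops, so anything that starts like a JPEG passes, no matter what else is tucked inside it.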
There is a third option that I came up with, and I’m not so sure I like it. You can take the output of both the Content-Type and File::MMagic and use those to attempt to load that image using whatever image-specific library is out there (libjpeg, libpng, whatever) or through some intermediary, and then re-save that file to a different format. So, if it came across as a ‘jpeg’, save it to png and then back to jpeg. If it was a png, save it to jpeg and then back. Basically, the idea being “never serve the exact content that was uploaded to you as it was uploaded; always make sure to put your stamp on it.” The advantages and disadvantages break down as follows (as far as I can tell):
- Advantage: if you do it right, you can be pretty sure that image is actually an image. It might be a corrupted, stupid-looking image, but at least it won’t have “evil exploitable content.”
- Disadvantage: it requires a lot more CPU. Storing an image and then serving it doesn’t really ask a lot of the web server. Reading an image and re-saving it in a different format does require a bit.
- Disadvantage: a really smart evil hacker could find some exploit in libjpeg or libpng or whatever, know that you are re-encoding the image, and then target that particular library to exploit your server.
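A minimal sketch of the round trip, assuming Python with the third-party Pillow library standing in for “whatever image-specific library is out there”. The function name `reencode` and the `MAX_PIXELS` cap are my own inventions for the example, and it only handles JPEG and PNG.

```python
from io import BytesIO

from PIL import Image  # third-party Pillow; an assumption, not something the post uses

MAX_PIXELS = 4096 * 4096  # hypothetical cap to bound the decoding cost


def reencode(upload: bytes) -> bytes:
    """Decode an upload, detour through the other format, and re-save it."""
    with Image.open(BytesIO(upload)) as img:  # raises if it is not an image
        original_format = img.format
        if original_format not in ("JPEG", "PNG"):
            raise ValueError(f"unsupported format: {original_format}")
        if img.width * img.height > MAX_PIXELS:
            raise ValueError("image too large")
        pixels = img.convert("RGB")  # forces a full decode; drops any alpha channel

    # jpeg -> png -> jpeg, or png -> jpeg -> png.
    detour_format = "PNG" if original_format == "JPEG" else "JPEG"
    detour = BytesIO()
    pixels.save(detour, format=detour_format)
    detour.seek(0)

    final = BytesIO()
    with Image.open(detour) as again:
        again.convert("RGB").save(final, format=original_format)
    return final.getvalue()
```

Only decoded pixels make it to the other side, so bytes that were merely riding along in the file should not survive. Two caveats: some libraries copy metadata across when saving, so it is worth checking what yours writes out, and this does nothing about the last disadvantage above, since the decoder itself is now the thing being fed untrusted input.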
I haven’t decided how I feel about the problem at hand; I just know that it’s a much larger problem than I originally expected when I started thinking about it. On #plack on irc.perl.org we came upon a pretty simple solution:
It’s really easy to avoid this possible issue, simply don’t let your code ever talk to any other code ever under any circumstances.
Pretty sure that’s not going to be an option.