Validating an uploaded file is an image. Can it even be done?

So I’ve gotten the qrplop codebase to the point where I have to start writing web services. I’m enjoying doing this as a Dancer project (though I promise the next one, either the “lost dog” or the TokBox one, will be Mojolicious). One of the nice things about using a modern application framework like Dancer is the cleanness of the code. It’s easy for me to keep my model layer testable and separate, and to plumb in REST actions on routes with minimal effort.

Another nice thing is that frameworks like Dancer provide abstraction layers like Dancer::Request::Upload. I like that when an HTTP POST contains a file parameter, the object my handler receives is automagically special, and I don’t have to write any hokey file management logic by hand. Now I just need to find some nice HTML5 drag and drop file upload JavaScript compatible with Bootstrap to get me past the stone age of interfaces, where I live. However, focusing on file uploads, particularly on image uploads, has led me to start thinking about one major problem.

How do I know the “image” the client just uploaded to me is actually an image? When you ask online, the answers invariably fall into a number of camps:

  • Look at the Content-Type header of the file upload. OK, this ignores the fact that a perfectly simple malicious vector is just to fake the Content-Type header. Seriously, that’s all you have to do to get around this check. Here we have an LWP script that sets the Content-Type manually to jpeg, and then simply uploads some PHP. So, if all you are doing is checking the Content-Type header and trusting that it matches the uploaded content, you’re in some trouble.

and here is the output from a Dancer script.

Congratulations, your PHP is now a jpeg and living nicely on the server’s file system.
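
To make the point concrete without depending on any one framework, here is a minimal sketch in Python (standard library only, not the LWP script or Dancer handler mentioned above). The helper names `build_upload` and `declared_type` and the boundary string are made up for illustration. It builds a multipart/form-data body by hand, labels a PHP payload as image/jpeg, and then parses it back the way a server-side framework roughly would.

```python
from email.parser import BytesParser
from email.policy import default


def build_upload(filename, claimed_type, payload):
    """Build a multipart/form-data request body, as any client is free to do.

    The per-part Content-Type is just a string chosen by the sender.
    Returns (value for the request's Content-Type header, body bytes).
    """
    boundary = "hypotheticalboundary123"  # illustrative value, chosen by the client
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {claimed_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + payload + tail


def declared_type(content_type_header, body):
    """Parse the body and return (declared type, raw bytes) of the first part."""
    raw = (
        f"Content-Type: {content_type_header}\r\nMIME-Version: 1.0\r\n\r\n"
    ).encode("ascii") + body
    message = BytesParser(policy=default).parsebytes(raw)
    part = next(message.iter_parts())
    return part.get_content_type(), part.get_payload(decode=True)


if __name__ == "__main__":
    header, body = build_upload("cat.jpg", "image/jpeg", b"<?php echo 'not a jpeg'; ?>")
    kind, data = declared_type(header, body)
    print(kind, data[:5])
```

The declared type comes back as whatever the client typed, and the bytes underneath are still PHP. Nothing on the receiving side has looked at the content at all.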

So, usually some smart folks will say:

  • Well, that’s because looking at the Content-Type is a clearly inferior way of verifying the type of an upload. Everyone knows that the proper thing to do is to use a module like File::MMagic to verify the contents of the file. Well, this had been banging around in my head for the last few days when I happened to stumble upon the very example I was worried about. You can see the original here. But I’ll reproduce it on my blog for funsies:


If you look at the source of that image, it is very much a jpeg. However, if you look at it in a browser, directly, it renders itself as an HTML page.  Seriously, go ahead, look at the content of the page/image in all of its glory:


that renders in your browser like an HTML page. So, if someone wanted to put XSS attacks in there or lord knows what, they could. You know what the file command does to that file? (And therefore what the venerable File::MMagic module would do):


It would tell you, beyond a shadow of a doubt, that what you have there is a fine jpeg. And you would store it in your file system, and it would live there sinisterly waiting for the day a misconfigured directive served it to a web browser, and blammo. You have someone else’s code being served from your website.
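
Why does signature sniffing fall for this? Here is a rough sketch in Python, assuming a deliberately simplified checker that only looks at the leading bytes. This is my own stand-in, not the actual rule set of the file command or File::MMagic, and `sniff_magic` and `jpeg_with_comment` are hypothetical names. The JPEG format allows comment (COM) segments with arbitrary bytes in them, so markup can ride along behind a perfectly valid header.

```python
JPEG_SOI = b"\xff\xd8"               # start-of-image marker
JPEG_COM = b"\xff\xfe"               # comment segment marker
JPEG_EOI = b"\xff\xd9"               # end-of-image marker
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_magic(data):
    """Guess a type from the leading bytes only (simplified signature check)."""
    if data.startswith(JPEG_SOI + b"\xff"):
        return "image/jpeg"
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    return "application/octet-stream"


def jpeg_with_comment(comment):
    """Build a byte string with a JPEG header and one COM segment.

    This is not a decodable picture. It only carries enough structure to
    satisfy a check that inspects the first few bytes.
    """
    # The two-byte segment length counts itself plus the comment bytes.
    length = (len(comment) + 2).to_bytes(2, "big")
    return JPEG_SOI + JPEG_COM + length + comment + JPEG_EOI


if __name__ == "__main__":
    blob = jpeg_with_comment(b"<html><script>alert(1)</script></html>")
    print(sniff_magic(blob), b"<script>" in blob)
```

The checker answers “image/jpeg” and the script tag is sitting right there in the bytes. A real sniffer looks at more than three bytes, but the principle is the same: it identifies the container, it does not vouch for everything inside it.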

There is a third option, that I came up with, that I’m not so sure I like.  You can take the output of both the Content-Type and File::MMagic and use those to attempt to load that image using whatever image-specific library is out there (libjpeg, libpng, whatever) or through some intermediary… and then re-save that file to a different format. So, if it came across as a ‘jpeg’, save it to png and then back to jpeg. If it was a png, save it to jpeg and then back. Basically, the idea being “never serve the exact content that was uploaded to you as it was uploaded, always make sure to put your stamp on it.”  The advantages and disadvantages break down as follows (as far as I can tell):


Advantages:

  1. If you do it right, you can be pretty sure that image is actually an image. It might be a corrupted stupid looking image, but at least it won’t have “evil exploitable content.”

Disadvantages:

  1. Requires a lot more CPU.  Storing an image and then serving it doesn’t really ask a lot of the web server. Reading an image and re-saving it in a different format does require a bit.
  2. A really smart evil hacker could find some exploit in libjpeg or libpng or whatever, know that you are re-encoding the image, and then target that particular library to exploit your server.
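
To give the “put your stamp on it” idea some shape, here is a sketch in Python using only the standard library. Fair warning: this is not the jpeg-to-png-and-back round trip described above, which needs a real image library. It is a cheaper stand-in I’m substituting for illustration: parse a PNG chunk by chunk, verify every CRC, inflate and re-deflate the pixel stream, and write out a new file that keeps only a short allowlist of chunks. Text chunks and anything trailing after IEND get dropped. The names `iter_chunks`, `make_chunk`, `rebuild_png` and the `KEEP` allowlist are my own.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunks needed to render the picture; everything else is dropped.
KEEP = (b"IHDR", b"PLTE", b"tRNS", b"IDAT", b"IEND")


def iter_chunks(data):
    """Yield (type, body) for each chunk, checking the signature and CRCs."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        crc = data[pos + 8 + length:pos + 12 + length]
        if len(body) != length or len(crc) != 4:
            raise ValueError("truncated chunk")
        if struct.unpack(">I", crc)[0] != zlib.crc32(ctype + body):
            raise ValueError("bad CRC")
        yield ctype, body
        pos += 12 + length
        if ctype == b"IEND":
            return  # bytes after IEND are never read


def make_chunk(ctype, body):
    """Serialize one chunk: length, type, body, CRC over type and body."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def rebuild_png(data):
    """Return a freshly written PNG containing only allowlisted chunks."""
    chunks = list(iter_chunks(data))
    if not chunks or chunks[0][0] != b"IHDR" or chunks[-1][0] != b"IEND":
        raise ValueError("missing IHDR or IEND")
    packed = b"".join(body for ctype, body in chunks if ctype == b"IDAT")
    try:
        pixels = zlib.decompress(packed)  # fails if the pixel stream is bogus
    except zlib.error as exc:
        raise ValueError("bad image data") from exc
    out = [PNG_SIGNATURE]
    wrote_idat = False
    for ctype, body in chunks:
        if ctype == b"IDAT":
            if not wrote_idat:  # all IDAT chunks collapse into one new chunk
                out.append(make_chunk(b"IDAT", zlib.compress(pixels)))
                wrote_idat = True
        elif ctype in KEEP:
            out.append(make_chunk(ctype, body))
    return b"".join(out)
```

This is weaker than a true decode and re-encode: it never checks that the pixel data matches the dimensions in IHDR, and it trusts zlib the same way the full approach trusts libpng. It does show the shape of the trade, though. You spend CPU on every upload, and in exchange the bytes you store are ones you wrote, not ones the client handed you.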

I haven’t decided how I feel about the problem at hand; I just know that it’s a much larger problem than I originally expected when I started thinking about it.  On #plack we came upon a pretty simple solution:

It’s really easy to avoid this possible issue, simply don’t let your code ever talk to any other code ever under any circumstances.

Pretty sure that’s not going to be an option.


3 Responses to Validating an uploaded file is an image. Can it even be done?

  1. Converting an image is always the way to go. At least you resize it, so it fits naturally on a website.

  2. We have a section for job ads on one of our sites. We resize all uploads. I have several UX friends that tell me you should always run things like pngcrush on your website images. Doing that and a resize/resave should be safer. One more layer.

  3. Converting the image is definitely a must.
    Then copy the converted image onto a host for serving static images that has no PHP, Perl, etc., just a dummy nginx server, and you should be safe on your end at least. Then prevent that other server from XSSing, and you should be fairly safe?
