I ran into an interesting (though quite obvious) vulnerability in a site that I was looking at today. The idea is that you buy access to a file and then you go ahead and download it.

No problem — in fact it’s a pretty run-of-the-mill internet use case.

There are three problems with this scheme for this site:

  1. Everyone who buys a file gets the same file
  2. Downloading requires no authentication
  3. The filenames are guessable

The last one is the real killer for the site. If you’re hosting paid content anywhere (especially somewhere like Amazon’s S3, where you don’t get fine-grained access control), the name of the file is guessable, and the download isn’t authenticated to verify that you’ve purchased it, then anyone who’s a good enough guesser never has to pay for the content. Even if you give the file an unguessable name (think: GUID), someone can simply buy one copy and then share the URL with their friends (you know, all of them on the internet).

This can be solved by a few different strategies — and these differ based on what the limiting resource, constraint or capability is.

Let’s play out trying to store these files on S3. Let’s further assume that you do have a server that does some authentication. One strategy could be to make a copy of the resource for each purchase. This would combat the guessability aspect of the problem, but not the sharing. It’s better because at least one person needs to buy it, but the buyer can still share it easily. The other downside is that you’d have to pay for indefinite storage on S3 for every purchase. S3 is pretty cheap (and in fact is getting cheaper), but it’s still not free.
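To make the copy-per-purchase idea concrete, here is a minimal sketch. It assumes a plain dict as a stand-in for the bucket; the `copy_for_purchase` name and the `purchases/` key layout are made up for illustration and are not anything the site in question actually does.

```python
import secrets


def copy_for_purchase(bucket, master_key, purchase_id):
    """Copy the master object to a fresh, unguessable key for one purchase.

    `bucket` is a dict standing in for real object storage (an assumption).
    """
    # 32 random bytes, URL-safe encoded: infeasible to guess.
    new_key = f"purchases/{purchase_id}/{secrets.token_urlsafe(32)}"
    bucket[new_key] = bucket[master_key]
    return new_key


bucket = {"master/ebook.pdf": b"...file contents..."}
key_a = copy_for_purchase(bucket, "master/ebook.pdf", purchase_id=1)
key_b = copy_for_purchase(bucket, "master/ebook.pdf", purchase_id=2)
```

Every purchase adds another full copy to the bucket, which is exactly the storage cost described above, and nothing stops the buyer from passing their key along.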

What you can do instead is regularly (maybe every hour, or even every few minutes) rename the file to another unguessable name. Now when someone wants to download a file, all you have to do is vend the current URL to the authenticated user. This way you get the benefit of low storage cost, and you solve the other problems as well. The downside is that you need to maintain a database of item -> URL mappings and run a batch job that renames things regularly, but that cost is comparatively minimal. It doesn’t completely close the sharing hole, since a shared URL keeps working until the next rename, but it shrinks it greatly. People could still email the resources around, but it limits bulk infringement quite successfully.
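Here is a rough sketch of the rename-and-vend approach, again with in-memory dicts standing in for the bucket and the mapping database. The class name, method names, and the `files.example.com` host are all hypothetical. One thing to keep in mind: S3 has no rename operation as such, so against a real bucket the "rename" would be a copy to the new key followed by a delete of the old one.

```python
import secrets


class RotatingStore:
    """In-memory stand-in for a bucket plus the item -> name mapping table."""

    def __init__(self):
        self.objects = {}  # object name -> bytes (the "bucket")
        self.current = {}  # item id -> current object name (the "database")

    def add(self, item_id, data):
        name = secrets.token_urlsafe(32)
        self.objects[name] = data
        self.current[item_id] = name

    def rotate(self):
        """Batch job: move every item to a fresh random name."""
        for item_id, old_name in list(self.current.items()):
            new_name = secrets.token_urlsafe(32)
            # Stand-in for copy-to-new-key then delete-old-key.
            self.objects[new_name] = self.objects.pop(old_name)
            self.current[item_id] = new_name

    def vend_url(self, item_id, has_purchased):
        """Hand the current URL only to a user who has bought the item."""
        if not has_purchased:
            raise PermissionError("purchase required")
        return "https://files.example.com/" + self.current[item_id]
```

A URL handed out before `rotate()` runs stops resolving afterwards, so a shared link is only good until the next batch run.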

Another solution would have the server that handles the authentication serve the file itself. This would close the hole completely, but you’d need bigger servers to accommodate the added load. It’s not much load at all, but it’s something to consider nonetheless. The storage can still be in S3, but no longer as a public bucket. In the case of AWS, the transfer cost would wind up a wash if you configure the server to be in the same region as the S3 bucket, so that’s not something that would affect things at all. You would also have marginally higher latencies with the additional processing, but that’s likely not important to this use case.
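The serve-it-yourself option boils down to a check like the one below. This is only a sketch: `serve_download`, the `purchases` set of (user, item) pairs, and the dict used as the private store are assumptions for illustration, and a real handler would stream the bytes from the private bucket rather than hold them in memory.

```python
def serve_download(user_id, item_id, purchases, private_store):
    """Return (status, body) the way an authenticated download handler might.

    `purchases` is a set of (user_id, item_id) pairs and `private_store` is a
    dict standing in for a non-public bucket (both are assumptions).
    """
    if user_id is None:
        return 401, b""  # not logged in
    if (user_id, item_id) not in purchases:
        return 403, b""  # logged in, but never bought this item
    data = private_store.get(item_id)
    if data is None:
        return 404, b""  # bought, but the file is missing
    return 200, data


purchases = {("alice", "ebook")}
private_store = {"ebook": b"...file contents..."}
status, body = serve_download("alice", "ebook", purchases, private_store)
```

Since the file never has a public URL, there is nothing to guess and nothing to share except the bytes themselves.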

I’ll wrap it up there for now… I just thought it was an interesting thing to look at tonight.