WordPress on S3: how to prevent hotlinking

Now that you use Amazon S3 to store and serve media files in a scalable fashion, how do you prevent undesirable hotlinking to your files?

Hotlinking to cloud storage URLs

Hotlinking (also called direct linking or inline linking) is when other sites link to or embed an image (or another media file, e.g. a video) directly, without providing a link to the source page. Hotlinking often happens without any harmful intent: people just want to share a picture with their Facebook friends or post it to a forum they like. Regardless of intention, though, hotlinking may be harmful to your business: the media file is downloaded from your site and consumes bandwidth that you pay for, but the user never sees your page, which deprives you of ad revenue.

With the WordPress-to-Cloud solution, all media files are stored in cloud storage such as Amazon S3. Now it is the cloud storage (and not your web server) that serves the images and other media files. This approach reduces the load on your web server, so hotlinking is unlikely to cause scalability problems, but you are still paying for the bandwidth.

Configure access policies to cloud storage URLs

As the WordPress on S3 solution becomes more popular, we often get questions about access control to the site’s images and other media files that are stored and served by Amazon S3.

Amazon S3 supports bucket policies that can be used to implement flexible access restrictions for the objects stored in the bucket. Access can be restricted based on various factors, including the Referer header of the HTTP request that the web browser sends when it downloads the image. That is, if the image is downloaded from a web page, the referrer is the URL of that web page. So when the image is downloaded from your website, the referrer URL contains your website’s host name; but when the image is downloaded from somewhere else (e.g. Facebook), the referrer URL contains some other host name (e.g. www.facebook.com).
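To make the mechanism concrete, here is a minimal Python sketch of what a browser effectively does when it loads an embedded image: it attaches the URL of the embedding page as the Referer header. The image URL and the page URLs are hypothetical placeholders, and the requests are only constructed, never sent.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical object URL; substitute your own bucket and key.
IMAGE_URL = "http://mybucket.s3.amazonaws.com/myprefix/wblob/pool.jpg"


def image_request(page_url):
    """Build the request a browser would send for an image embedded in page_url."""
    return Request(IMAGE_URL, headers={"Referer": page_url})


def referrer_host(request):
    """Return the host name found in the request's Referer header, or None."""
    referer = request.get_header("Referer")
    return urlsplit(referer).hostname if referer else None


own_page = image_request("http://www.example.com/gallery/")
hotlink = image_request("http://www.facebook.com/some-post")

print(referrer_host(own_page))
print(referrer_host(hotlink))
```

Because the header is supplied by the client, any HTTP client can set it to an arbitrary value. A referrer check therefore discourages casual hotlinking rather than acting as strong access control.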

Here is an example of a bucket policy that only allows downloading the media files if they are referred from www.example.com or www.subdomain.example.com:

{
	"Version": "2008-10-17",
	"Id": "HTTP referrer policy",
	"Statement": [
		{
			"Sid": "1",
			"Effect": "Deny",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::mybucket/myprefix/wblob/*",
			"Condition": {
				"StringNotLike": {
					"aws:Referer": [
						"http://www.example.com/*",
						"http://www.subdomain.example.com/*"
					]
				}
			}
		}
	]
}

The policy is specified in JSON format, but it can largely be copied from the sample policy, substituting the essential parts: the bucket name and the (optional) prefix in the Resource element, and the list of referrer patterns in the aws:Referer condition.
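If you maintain several sites, the substitution can be scripted. The following is a small sketch, not part of any AWS tooling: the function name build_referer_policy and its parameters are made up for illustration, and it simply fills the bucket name, prefix and referrer patterns into the same policy structure as the sample.

```python
import json


def build_referer_policy(bucket, prefix, referers):
    """Return a deny-unless-referred bucket policy as a dict.

    bucket   -- bucket name, e.g. "mybucket"
    prefix   -- optional key prefix, e.g. "myprefix/wblob" ("" for the whole bucket)
    referers -- allowed referrer patterns, e.g. ["http://www.example.com/*"]
    """
    stripped = prefix.strip("/")
    path = f"{stripped}/*" if stripped else "*"
    return {
        "Version": "2008-10-17",
        "Id": "HTTP referrer policy",
        "Statement": [
            {
                "Sid": "1",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{path}",
                "Condition": {"StringNotLike": {"aws:Referer": list(referers)}},
            }
        ],
    }


policy = build_referer_policy(
    "mybucket",
    "myprefix/wblob",
    ["http://www.example.com/*", "http://www.subdomain.example.com/*"],
)
# Print JSON that can be pasted into the bucket policy editor.
print(json.dumps(policy, indent=4))
```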

Step-by-step instructions

Let’s launch a ready-to-run WordPress-on-S3 setup and walk through hotlinking prevention in practice. After following this step-by-step guide, the website is going to end up looking like this:

Now suppose you like the bottom-left picture with the pool. You copy the link and email it to your friend. The email is going to look like this:

Without the bucket policy in place, clicking on the link is going to do what you expect: the web browser is going to download the picture of the pool and show it.  So the link can be used anywhere: in a blog, on Facebook, on Twitter, etc.

Now let’s create a bucket policy that only allows using the image from the website. To do that, log in to the AWS Management Console, pick the bucket that stores the data for the website (in our example it’s oblaksoft-yapixx), choose Permissions and select ‘Add bucket policy’:

Enter the following policy:

{
	"Version": "2008-10-17",
	"Id": "HTTP referrer policy",
	"Statement": [
		{
			"Sid": "1",
			"Effect": "Deny",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::oblaksoft-yapixx/db0/wblob/*",
			"Condition": {
				"StringNotLike": {
					"aws:Referer": "http://www.yapixx.com/*"
				}
			}
		}
	]
}

The resource path and the referrer pattern correspond to this website’s data location (oblaksoft-yapixx/db0) and domain name (www.yapixx.com). You will need to supply the values that correspond to your own setup. Now the images still appear fine when they are embedded or referred from www.yapixx.com. But downloading the images from any other domain is going to result in a 403 Forbidden response, like this:

You can specify multiple referrers, if you wish.  For example, if you have multiple domains that are used as aliases for your web site, you can list all of them there.  You can even specify domain names that you don’t own.  For example, if you want to allow the image to be referred from web email clients, but not from social networks, you can add a policy like this:

{
	"Version": "2008-10-17",
	"Id": "HTTP referrer policy",
	"Statement": [
		{
			"Sid": "1",
			"Effect": "Deny",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::oblaksoft-yapixx/db0/wblob/*",
			"Condition": {
				"StringNotLike": {
					"aws:Referer": [
						"http://mail.google.com/*",
						"http://*.mail.yahoo.com/*",
						"http://www.yapixx.com/*"
					]
				}
			}
		}
	]
}

This policy allows the images to be referred from www.yapixx.com, Gmail and Yahoo! mail.
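The effect of the pattern list can be approximated offline. The sketch below is not how S3 is implemented; it assumes only the documented wildcard semantics of the StringLike family of operators (case-sensitive, * matches any run of characters, ? matches a single character) and checks a few sample referrer URLs against the list. The helper names and the sample URLs are made up for illustration.

```python
import re

ALLOWED = [
    "http://mail.google.com/*",
    "http://*.mail.yahoo.com/*",
    "http://www.yapixx.com/*",
]


def like(pattern, value):
    """Case-sensitive wildcard match: '*' is any run of characters, '?' is one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def is_allowed(referer):
    """True when the referrer matches at least one allowed pattern."""
    return any(like(p, referer) for p in ALLOWED)


for url in [
    "http://www.yapixx.com/2012/05/pool/",
    "http://us-mg6.mail.yahoo.com/neo/launch",
    "http://www.facebook.com/",
    "https://www.yapixx.com/2012/05/pool/",
]:
    print(url, "->", "allowed" if is_allowed(url) else "denied")
```

The last sample is denied because the patterns start with http://. If your pages are also served over HTTPS, you would list the https:// variants as well.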

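Sometimes you may want to allow requests that don’t specify a referrer at all. This happens, for example, when you paste the image URL directly into the address field of the web browser: the request is not referred from anywhere. The policies above deny such requests, because a negated condition such as StringNotLike evaluates to true when the key is missing from the request. To let them through, add a Null condition on the referrer key: with the value false it holds only when the header is present, and since all conditions in a statement must hold together, the Deny then applies only to requests that carry a non-matching referrer. For example: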

{
	"Version": "2008-10-17",
	"Id": "HTTP referrer policy",
	"Statement": [
		{
			"Sid": "1",
			"Effect": "Deny",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::oblaksoft-yapixx/db0/wblob/*",
			"Condition": {
				"StringNotLike": {
					"aws:Referer": "http://www.yapixx.com/*"
				},
				"Null": {
					"aws:Referer": false
				}
			}
		}
	]
}
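The interplay of the two conditions can be spelled out in a few lines of Python. This is a simplified model, assuming two documented rules: all condition operators in one statement must hold for the statement to apply, and a negated operator such as StringNotLike is treated as true when the key is absent from the request. The function names are illustrative, and fnmatchcase is used as a stand-in for the policy wildcard matching (it behaves the same for this pattern).

```python
from fnmatch import fnmatchcase

PATTERN = "http://www.yapixx.com/*"


def string_not_like(referer):
    """Negated operator: true when the key is missing or does not match."""
    return referer is None or not fnmatchcase(referer, PATTERN)


def null_is_false(referer):
    """The Null condition with value false holds only when the key is present."""
    return referer is not None


def denied(referer, with_null_condition):
    """Apply the Deny statement; all listed conditions must hold together."""
    conditions = [string_not_like(referer)]
    if with_null_condition:
        conditions.append(null_is_false(referer))
    return all(conditions)


# Columns: referrer, denied without the Null condition, denied with it.
for referer in ["http://www.yapixx.com/gallery/", "http://www.facebook.com/", None]:
    print(referer, denied(referer, False), denied(referer, True))
```

Only the last row changes between the two variants: a request without a referrer is denied by the earlier policy and passes once the Null condition is added. A request that is not denied is still served only if the object is otherwise readable, since the statement itself grants nothing.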

So, as you can see, Amazon S3 puts you in control of where your media files can be referred from.  As of the time of this writing, Google Cloud Storage doesn’t seem to have an equivalent access control mechanism, but we hope that they will implement this functionality soon.

Looking forward to hearing your thoughts on your direct linking prevention needs and practices.
