Security Headers on the Top 1,000,000 Websites

I would like to share with you all the results of my scan and review of the Alexa Top 1,000,000 Sites HTTP response headers as they relate to security. I was mostly curious about which sites were using Content Security Policy (CSP) but ended up becoming more interested in all of the various modern-day security headers that sites specify. The results were pretty impressive and I certainly learned a lot from the process.

The Test Environment

To gather a decently sized sample set, I decided to send HTTP and HTTPS requests to every site in the Alexa Top 1,000,000 Sites list. I followed redirects to their final destination but did not crawl any additional pages beyond the first non-redirect page returned. I decided to use gevent, Python's asynchronous networking library, along with Kyoto Cabinet as my data store. By splitting my request workers into three processes, each handling about 2,000 concurrent requests, I was able to scan and store the results of the entire Alexa list in three to four hours. Because each site was requested over both HTTP and HTTPS, the one million sites in the list yielded a total of 1,253,735 valid responses.
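
For illustration, here is a minimal sketch of the kind of gevent worker described above, assuming Python 2 and urllib2; the actual scanner also persisted results to Kyoto Cabinet and split the work across three processes, both of which are omitted here.

    import gevent
    from gevent import monkey
    monkey.patch_all()  # make the standard socket/urllib2 calls cooperative

    import urllib2

    # The two custom headers sent with every request (described below):
    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0',
        'Connection': 'close',
    }

    def fetch(url):
        try:
            request = urllib2.Request(url, headers=HEADERS)
            response = urllib2.urlopen(request, timeout=10)  # follows redirects
            return url, dict(response.info().items())
        except Exception:
            return None  # any error at all and the response is dropped

    sites = ['http://example.com/', 'https://example.com/']  # stand-in for the Alexa list
    jobs = [gevent.spawn(fetch, url) for url in sites]
    gevent.joinall(jobs)
    results = [job.value for job in jobs if job.value is not None]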

My handling of responses was deliberately unforgiving. If any error occurred, whether a connection error, a Unicode decoding error, or anything else that could cause problems, the response data was simply dropped. Only responses whose headers and content could be easily JSON-serialized were included in this data set.

Two custom headers were sent in every request: a User-Agent string corresponding to Firefox 16, and a Connection: close header. In the future I will conduct additional experiments using Google Chrome's User-Agent string. Due to time constraints, only the response headers were reviewed. While some of these security mechanisms can also be specified within meta elements in the HTML body, those were not included in the analysis.

General Findings

One thing that really surprised me was how many sites use either incorrect header names or invalid header values. In some cases the header names are completely wrong, such as http://www.openthegraph.com/, which returned a CORS header of "allow-access-control-origin" instead of "Access-Control-Allow-Origin". Other examples of incorrect header names included "serx-frame-options" (instead of "x-frame-options"), "access-control-allow-methodsaccess-control-allow-methods", "access-control-allow-methods'", "access-control-allow-headers'", and so on.

The Results

A total of 1,253,735 HTTP and HTTPS responses were analyzed. Out of the 1.25 million results there were 17,692 security-relevant headers on a total of 16,109 unique URLs. Some URLs responded with multiple security headers. Testing was done to determine the usage of the following security-relevant headers, each of which is covered in its own section below.

X-Frame-Options Results

The purpose of the X-Frame-Options header is to protect against clickjacking attacks by controlling which sites are allowed to frame the requested resource. This header is by far the most widely adopted, with a whopping 12,812 sites using it correctly to protect their resources. X-Frame-Options has three possible values (depending on the browser): SAMEORIGIN, DENY, and Allow-From. Currently, Firefox only implements the SAMEORIGIN and DENY values. Chrome and IE9 allow an origin list to be configured by including it in the Allow-From directive.
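
As a rough illustration of how values were classified, here is a small sketch assuming Python 2; the sample values are examples taken from the findings below, not the full data set.

    # Classify an X-Frame-Options value roughly the way the analysis did.
    def classify_xfo(value):
        token = value.strip().upper()
        if token in ('SAMEORIGIN', 'DENY'):
            return 'valid'
        if token.startswith('ALLOW-FROM '):
            return 'valid (origin list, Chrome/IE9)'
        # Anything else ("GOFORIT", "Allow From ...", mixed directives)
        # causes Chrome and IE9 to fail open and permit framing.
        return 'invalid (fails open)'

    for value in ('SAMEORIGIN', 'ALLOW-FROM https://example.com', 'GOFORIT'):
        print value, '->', classify_xfo(value)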

What surprised me the most about these results was not only the wild variation in the values developers chose, but how permissive Google Chrome and IE9 are in accepting them; they seem to almost fail open.

What I found absolutely amazing was that out of the 30 valid Allow-From values, 29 came from Craigslist's domains. Of the 217 invalid values, most were attempts to mix SAMEORIGIN with Allow-From, which causes Chrome and IE9 to simply fail open and allow any site to frame the resource, nullifying any security benefit of setting the header. A large number of sites simply set "GOFORIT" as a value, which again causes Chrome and IE9 to allow the resource to be framed. Other sites specified the value as "Allow From", with a space instead of a hyphen, once more causing the browsers to fail open and allow any site to frame the resource.

Cross Origin Resource Sharing (CORS) Results

The Cross Origin Resource Sharing (CORS) specification was developed to meet the growing need to safely allow third-party sites access to the response data of specific resources. CORS headers are defined per-resource and can limit when and how data is accessed by third-party origins. The primary header for allowing third-party origins is the Access-Control-Allow-Origin header. Out of 1.25 million responses, only 2,539 had this header defined in some way.

There appears to be a misconception about what is allowed in the Access-Control-Allow-Origin value. I noticed eight types of syntax being used when only two are actually valid (a small validator sketch follows the list):

  • (Valid) Wildcard value: *
  • (Valid) Single origin defined with scheme
  • (Invalid) Single origin defined without scheme (http:// or https://)
  • (Invalid) Multiple origins defined separated by a comma: ","
  • (Invalid) Multiple origins defined separated by a space: " "
  • (Invalid) Multiple origins defined separated by a comma and space: ", "
  • (Invalid) Wildcard subdomain origins defined such as: *.domain.com
  • (Invalid) Wildcard subdomain origins defined with scheme such as http://*.domain.com
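
The sketch below checks a value against the two valid forms, assuming Python 2; urlparse is used to require an explicit scheme.

    from urlparse import urlparse

    def is_valid_allow_origin(value):
        value = value.strip()
        if value == '*':
            return True   # the wildcard form
        if ',' in value or ' ' in value:
            return False  # multiple origins in one value are invalid
        if '*' in value:
            return False  # wildcard subdomains like *.domain.com are invalid
        parsed = urlparse(value)
        # A single origin must carry an explicit http:// or https:// scheme
        return parsed.scheme in ('http', 'https') and bool(parsed.netloc)

    for value in ('*', 'https://example.com', 'example.com', 'http://*.domain.com'):
        print value, '->', is_valid_allow_origin(value)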

A common use of CORS headers is to allow third-party sites not only to read responses, but to send cookies with the request so they may access protected resources as well. By default, CORS requests will not send cookies. For a request to be sent with cookies and the response to be readable by the browser, three things are required: the resource being requested must define a single, valid origin with a scheme; the resource must also respond with an Access-Control-Allow-Credentials header set to true; and the client XMLHttpRequest object must have its withCredentials property set to true. I noticed that both Firefox and Chrome would send cookies when withCredentials was set to true even in cases where the HTTP response was not readable.
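
On the server side, the first two requirements might look like the following sketch, written as a bare Python WSGI app; the trusted origin is a hypothetical example.

    # Illustrative only: echo back one trusted, schemed origin (never "*")
    # together with Access-Control-Allow-Credentials: true.
    ALLOWED_ORIGIN = 'https://app.example.com'  # hypothetical trusted origin

    def application(environ, start_response):
        headers = [('Content-Type', 'text/plain')]
        if environ.get('HTTP_ORIGIN') == ALLOWED_ORIGIN:
            headers.append(('Access-Control-Allow-Origin', ALLOWED_ORIGIN))
            headers.append(('Access-Control-Allow-Credentials', 'true'))
        start_response('200 OK', headers)
        # The third requirement lives on the client:
        # xhr.withCredentials = true before xhr.send()
        return ['protected resource body']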

Another common misconception is that you can use the Access-Control-Allow-Credentials header with a true value alongside a resource that returns an Access-Control-Allow-Origin: * wildcard value. While the browser will indeed make the request to the target page and possibly send cookies (if withCredentials is set), it will not be able to read the response. Chrome alerts the user to this with a console error.

In total there were 94 URLs out of the 2,076 wildcard values that also had Access-Control-Allow-Credentials set to true.

HTTP Strict Transport Security (HSTS) Results

The Strict-Transport-Security header is a relative newcomer to the field. A server providing this header instructs the browser to connect over HTTPS for any requests going forward. When a user moves to a potentially insecure network, HSTS ensures their connections are forced over HTTPS (provided the max-age value has not yet expired).

There are two directives that a site can specify in the Strict-Transport-Security response. The first is the max-age directive, which determines how long, in seconds, the browser should keep the target site in its known HSTS list. The second is the includeSubDomains directive, which tells the browser to include any subdomains in its HSTS list with the specified max-age value. Note that the max-age directive is required, whereas the includeSubDomains directive is not.
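
Here is a small sketch of parsing the header into those two directives, assuming Python 2; it is illustrative, not the exact code used in the analysis.

    def parse_hsts(value):
        max_age = None
        include_subdomains = False
        for directive in value.split(';'):
            directive = directive.strip().lower()
            if directive.startswith('max-age='):
                try:
                    max_age = int(directive.split('=', 1)[1].strip('"'))
                except ValueError:
                    return None  # malformed max-age, invalid header
            elif directive == 'includesubdomains':
                include_subdomains = True
        if max_age is None:
            return None  # max-age is required
        return max_age, include_subdomains

    print parse_hsts('max-age=15768000; includeSubDomains')  # (15768000, True)
    print parse_hsts('max-age=0')  # (0, False) -- tells the browser to forget the entry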

When reviewing the numbers I was shocked to see how many sites were setting the max-age to zero. This effectively tells the browser to remove the requested site from its list of HSTS sites. After looking at the URLs, it turned out that the majority were related to www.etsy.com. After asking our contact at Etsy why this was the case, we were told it was due to their SSL opt-in policy. If a member with a valid user account enables full-site SSL, they will get a longer, non-zero max-age value. Etsy took this approach as a fail-safe to ensure their services were not impacted while they continue to monitor usage and set larger values.

However, this ability for a sub-resource to set a domain-wide, and even subdomain-wide, policy is somewhat concerning. I re-read the specification a number of times and never saw the concern raised that a sub-resource can override the domain policy. To confirm, I once again asked my friend at Mozilla, who had the lead developer of HSTS check the relevant source and determine that this is indeed the case. One thing to keep in mind is that the HSTS specification is in draft form and may change.

A total of 980 URLs had a valid STS header and value. The invalid-value count below is not included in this total.

Keep in mind that only 9 sites actually have their max-age set to 0; the other 206 values came from www.etsy.com's sub-resources, because my crawler was accessing each of the individual shop URLs as an unauthenticated user.

There were, however, a number of unique sites setting their max-age to a rather low value. For this analysis I considered a short max-age to be anything less than 8,000 seconds. A lot of sites set their value to 500 seconds, which is quite useless in terms of protection. By the time you shut down your laptop, go to your local Starbucks, and connect to their Wi-Fi, the HSTS value will have expired and the browser will behave as if it had never been set.

Content Security Policy (CSP) Results

The moment we've all been waiting for; well, at least me, as this was the entire reason I conducted this analysis. I'm a pretty big fan of CSP and I can only see it getting more and more popular as time goes on. To me it is pretty much what the web world needs; we just need to make sure people implement it properly. If my results so far have shown you anything, it should be how often people use invalid values, even for headers that require only a single directive.

Content Security Policy was started at Mozilla and has since grown into its own W3C specification. It was designed to limit the ability and impact of cross-site scripting attacks. Much like the other security headers, the X-Content-Security-Policy-* and X-WebKit-CSP headers are defined per-resource and do not apply to the entire site. There is one caveat to the results presented here: all requests were made using a User-Agent header corresponding to Firefox 16. This may skew the results towards more X-Content-Security-Policy responses than X-WebKit-CSP responses if server administrators respond differently depending on the requesting User-Agent.

The specification is quite clear: if you wish to allow inline scripts or eval'd script code, you need to supply the unsafe-inline and unsafe-eval options, respectively. In my testing, not specifying anything did block inline scripts and eval automatically, as expected. However, adding unsafe-inline and unsafe-eval to the script-src directive did absolutely nothing. The only time I was allowed to execute either was when I explicitly set "options inline-script" or "options eval-script" as its own directive.
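
For reference, here is an illustration of the two forms discussed above; the policy strings are made-up examples, not recommendations.

    # W3C draft spelling of the directives, which Firefox 16 ignored in my testing:
    w3c_draft = "script-src 'self' 'unsafe-inline' 'unsafe-eval'"
    # The pre-specification Mozilla form that did take effect:
    mozilla_form = "options inline-script eval-script; script-src 'self'"

    response_headers = [
        ('X-Content-Security-Policy', mozilla_form),  # Firefox (pre-spec implementation)
        ('X-WebKit-CSP', "script-src 'self'"),        # WebKit browsers' prefixed name
    ]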

After speaking with a friend over at Mozilla he informed me that they were currently working on implementing the unsafe-eval and unsafe-inline directives. The reason “options eval-script” or “options inline-script” work is because Mozilla’s CSP implementation was created before the W3C specification.

It should be noted that any website which allows these directives significantly reduces the protection CSP provides against XSS attacks, since an attacker can just inject their script inline into the target resource. Since CSP is rather new, only 79 URLs were found to be using it, of which 32 included the inline-script or eval-script options.

Final Thoughts

Time and time again I was surprised at how many of the security headers were incorrectly specified. Making security decisions is never a task one should take lightly; depending on the organization, even getting a simple header added to a particular resource can be a time-consuming affair. To do it incorrectly is just a waste of that time, and at times it is potentially dangerous, such as when the Strict-Transport-Security max-age is set to 0.

I will continue to monitor these sites and see what improvements they make. If you’re curious about the security headers list, feel free to download it here. As for testing the various security headers, feel free to check out my test cases.

Al | November 6, 2012 2:20 pm

It would help if you summarize the baseline of settings that would provide security in the header….

    Isaac Dawson | November 6, 2012 8:43 pm

    Hello Al,
    For me I would say X-Frame-Options: DENY or SAMEORIGIN on login pages or pages that are high value targets to someone looking to exploit click-jacking. Strict-Transport-Security should be set with a long max age (3-6 months depending on your comfort level) for sites which use SSL. CORS headers need to be evaluated on a case by case basis and it really depends on your use case for what should be enabled. As for CSP, it really depends on the layout of your site. Moving all of your inline script to external JS files and then putting CSP in Report Only mode would probably be a great start.
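
    A hedged illustration of such a baseline (the values here are examples only; tune max-age to your own comfort level):

        # Illustrative baseline only -- adjust per site:
        baseline = {
            'X-Frame-Options': 'DENY',  # or SAMEORIGIN on high-value pages
            'Strict-Transport-Security': 'max-age=7776000',  # ~3 months, HTTPS sites only
            'X-Content-Security-Policy-Report-Only': "default-src 'self'",  # try it out first
        }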

Joe | November 6, 2012 8:02 pm

I tried using CSP headers before and what I've found is that the enforcement they provide varies across even sub-revisions of Firefox.

The practical upshot of this was that code which worked on one version of firefox suddenly didn’t work on the next, resulting in the website failing.

Specifically, it functions differently when using the report-only mechanism compared to when it's actually enforcing, which makes it near impossible to prototype and test on a site.

Inconsistencies in how reporting works when HTTP authentication is in use are another example. In short, enabling CSP is a good way to break a site randomly in the future. It isn't ready for prime time yet, and the mere 79 URLs using it would seem to reflect that opinion.

Not that I’m bitter of course…

    Isaac Dawson | November 6, 2012 8:45 pm

    Hello Joe,
    Hopefully once the W3C specification is actually adopted by the browser vendors these inconsistencies will be taken care of. The options for eval/inline script don't even show up in the documentation on Mozilla's site anymore, even though the implementation of unsafe-eval and unsafe-inline is not yet complete.

Andy Steingruebl | November 6, 2012 9:34 pm

Can you explain your concerns about HSTS and subdomains overriding the parent setting? Do you think this opens a security hole? If you have comments, it would be nice if you'd post them to the IETF websec mailing list, which is the best place for discussions of it.

Stomme poes | November 7, 2012 5:03 am

Honestly, a Headers-Lint program (maybe one exists) to check things like value typos and name-value mismatches would be awesome. Not only for those who know nothing (yes, I know, sec people hate that, but it is reality) but also for those who do know better but are human.

Odin Hørthe Omdal (odinho) | November 7, 2012 6:13 am

Nice work :-)

If you have any more conformance tests for CORS, the current ones reside at http://w3c-test.org/webappsec/tests/cors/submitted/ — opera/staging/ has some nice ones that I'll ask to have approved shortly.

CSP needs many more W3C conformance tests (they're rather easy to write!), so that the faults Joe talked about don't happen. BTW, the repository is dvcs.w3.org/hg/ — although we're moving some test repositories to github as well: http://github.com/w3c/

Another problem for CSP's reach is the reporting being flooded with useless reports, often because of user JS or extensions, etc.

    Isaac Dawson | November 7, 2012 11:40 am

    Hey Odin,
    Awesome stuff. I threw together my tests to confirm my suspicions of how CORS worked in the various browsers; good to see there are more formal ones in the W3C :). As for CSP, it seems like people are still waiting to see how the browsers end up implementing some of the directives.

Pete | November 7, 2012 10:30 am

Nice work. One of the things I believe will help in the debate over whether HTML5 increases or decreases risk is to get an understanding of how much of this stuff replaces workarounds.

For CORS, it looks like 80% of sites are increasing their risk by explicitly allowing anyone to access it. And 20% are defining a same-origin policy so the effect must be compared to their previous susceptibility to same-domain bypass (xss).

Do you know of any alternatives to CORS that could do this? Maybe some sort of server-side API?

WRT CSP, do you intend to follow up with deeper analysis of the 69 sites that use it? Since this is an existence test, I can't assess how the defined policy might compare with the existing environment.

Thanks,

Pete Lindstrom

    Isaac Dawson | November 7, 2012 11:53 am

    Hello Pete,
    Recently, I’ve changed my position to be not convinced one way or the other yet on if these new HTML specifications are more or less risky for the web. While any new addition in functionality can increase attack surface, some of these new security mechanisms clearly help and have little to no negative impact. (X-Frame-Options is a great example). For me — and as I think my results show — the biggest risk is people implementing things incorrectly and being falsely led to believe they are more safe. The amount of typos was quite astounding.

    For CORS, I can’t say “increasing risk” because it totally depends on the nature of the resource being requested. Keep in mind you can’t send credentials to a resource that is set to a wildcard origin and read the response. So that really limits what an attacker can do in terms of stealing sensitive or protected information. The problem all boils down to how it gets implemented. If the remote server is sending session identifiers in the URI then yes that site will most certainly have problems when combined with the power of CORS. In my opinion, the biggest “issue” with CORS is how easily it becomes to ex-filtrate data once you’ve found an XSS issue, or for how you can take advantage of sites which don’t properly sanitize user input for the url part of a call to the XMLHttpRequest object.

    Another way of "message passing" (which I think is what you mean?) would be the Web Messaging API; take a look at the W3C specification: http://www.w3.org/TR/webmessaging/.

    I think CSP is still going through its infant stages; it's best to watch the specification and how the browser vendors develop it further. I really love the idea of Report Only mode, so you can try it out and see what shakes loose.

Sid | November 8, 2012 12:13 pm

Possibly where the X-Frame-Options "GOFORIT" value originated:
http://stackoverflow.com/a/6767901

Briefly: when your web host is adding x-frame-options to the pages you’re creating, you can invalidate it by adding another, invalid token like “GOFORIT”.

Stefan Arentz | November 10, 2012 2:01 pm

This is great research. To see the HSTS results in better context, do you have numbers on how many of the sites you tested actually support https and how many redirect from http to https?

Krupa | November 20, 2012 8:45 am

The data gathered here must be pretty large. Were you gathering headers only, or was the body of the website (home page only) also saved along with it? Also, to complete the full download in 3-4 hours you'd need a very fast internet connection, even when gathering headers by sending concurrent requests. What was the speed of your data connection and average download speed?

Jaap | November 21, 2012 1:16 am

Great work, Isaac!

It would be awesome if you open-sourced the code you used to do this research. This research got me thinking about building a Burp plugin which automatically scans for incorrect security header declarations. It would be great to be able to re-use your code if you are at all interested.

Cheers.
