Security Headers on the Top 1,000,000 Websites: March 2013 Report
Back in November 2012 I published Veracode’s initial security headers report on the top 1 million websites from the Alexa list. My goal was to turn it into a series so it would be possible to track how these sites change over time with regard to security headers that are added, removed or changed. For this recent scan, only a single change was made to the original scripts: the tool now also sends a recent Chrome User-Agent to track whether sites respond with different headers depending on the supplied User-Agent.
Only the Firefox User-Agent data was used when comparing the results with the previous November 2012 data set. Out of the original 1.25 million requests gathered from the November scan, a total of 719,355 URLs matched this most recent run. As for the new data, we had roughly the same number of valid responses. There were a total of 1,256,787 responses for Firefox and 1,257,273 responses for Chrome. Both HTTP and HTTPS requests were sent to each site.
Changes, Additions and Removals
Each security header in the November 2012 and March 2013 data sets was analyzed to see if sites had modified its value, added new headers or removed headers. A total of 2,450 new headers were added to the 719,355 URLs that are common to the two scans.
Similar to last time we tracked the following security relevant headers:
Of these headers a total of 2,198 had been added, 75 had changed and 452 headers had been removed. The majority of those removed were X-Frame-Options (246) and Access-Control-Allow-Origin (166). To reiterate, only Firefox based User-Agent requests were used when comparing the two data sets.
The rate of change matches what we expect; more popular headers, such as X-Frame-Options and Access-Control-Allow-Origin, saw the highest rate of change.
Calculating sites that added headers is straightforward; we simply identified sites with security headers that did not have any during the November scan. Sites that changed their header values are a bit more difficult to characterize because a number of sites include the same header multiple times in the response. It is quite common when parsing response headers that multiple headers have their values ‘merged’ into a single value, separated by a comma and a space. This is a documented behavior which can be found in section 4.2 in RFC 2616. As an example shown below, we see a site returning the X-Frame-Options header twice.
For our purposes, we merge these headers into a single value of “X-Frame-Options: SAMEORIGIN, SAMEORIGIN”. In some cases the number of times a header is returned changes depending on when we make the request, which can skew the results when comparing the two data sets. This is most likely due to load-balanced deployments in which one or more servers are configured differently.
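The merging behavior described above can be sketched in a few lines of Python. This is only an illustration, not the script used for the scan; the function name `merge_headers` and the sample response are made up for the example.

```python
from collections import defaultdict


def merge_headers(raw_headers):
    """Fold repeated header fields into one value, joined by ", ".

    raw_headers is a list of (name, value) pairs in the order received.
    Header names are case-insensitive, so they are lowercased for the key.
    """
    merged = defaultdict(list)
    for name, value in raw_headers:
        merged[name.lower()].append(value.strip())
    return {name: ", ".join(values) for name, values in merged.items()}


# Hypothetical response that repeats X-Frame-Options.
response = [
    ("X-Frame-Options", "SAMEORIGIN"),
    ("Content-Type", "text/html"),
    ("X-Frame-Options", "SAMEORIGIN"),
]
print(merge_headers(response)["x-frame-options"])  # SAMEORIGIN, SAMEORIGIN
```

Because the merged string depends on how many copies a given server happens to return, two scans of the same site can yield different merged values even when the configured policy has not changed.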
One of the more interesting data points comes from sites that removed security headers. Of the 246 sites that removed the X-Frame-Options header, 226 had originally set the value to SAMEORIGIN. A similar pattern can be seen with Access-Control-Allow-Origin. Of the 166 sites that removed the header, 130 had originally set the value to *. Of the 18 sites that removed the Strict-Transport-Security header, most had previously set a very high timeout value.
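As a rough illustration of the added, changed and removed comparison discussed in this section, the sketch below diffs the merged headers of one URL across two scans. It works per header rather than per site, which is an assumption of the example; `diff_headers` and the sample values are hypothetical and not taken from the data sets.

```python
def diff_headers(old, new):
    """Compare two {header_name: merged_value} dicts for one URL.

    Returns three sets of header names: added, changed, removed.
    """
    old_names, new_names = set(old), set(new)
    added = new_names - old_names
    removed = old_names - new_names
    changed = {name for name in old_names & new_names if old[name] != new[name]}
    return added, changed, removed


# Hypothetical merged headers for the same URL in each scan.
november = {"x-frame-options": "SAMEORIGIN", "access-control-allow-origin": "*"}
march = {"x-frame-options": "DENY", "strict-transport-security": "max-age=31536000"}

added, changed, removed = diff_headers(november, march)
print(sorted(added))    # ['strict-transport-security']
print(sorted(changed))  # ['x-frame-options']
print(sorted(removed))  # ['access-control-allow-origin']
```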
March 2013 Results
The scan conducted on March 10, 2013 used the latest Alexa list available at that time. As with the previous scan, both HTTP and HTTPS connections were attempted. This time, a second set of HTTP and HTTPS requests was sent using the latest Google Chrome browser User-Agent. Unfortunately, not every site that responded to the Chrome User-Agent during the scan also responded to the Firefox User-Agent. While the next two charts show the varying result count between the two browsers, all detailed security header value results analyzed below will only be the distinct results of requests sent using both User-Agents.
For this scan, we analyzed a total of 1,256,787 responses for Firefox. We see roughly the same distribution of configured headers as before. X-Frame-Options is still the most popular, followed by Access-Control-Allow-Origin.
For Chrome, we see roughly the same as we do for Firefox. However, when using the Chrome User-Agent we see a total of 96 sites using the X-Webkit-CSP header, by far the largest variance between the two browsers. While 96 may seem like a lot, 83 of these sites are owned by Facebook, so only 13 of these results can really be considered unique.
This time the data was broken out a bit further than when we reported back in November. Invalid values were broken down further into sites that included conflicting headers. Conflicting headers means that a site returns two different header values for X-Frame-Options, such as a site returning DENY and SAMEORIGIN in the same response. Invalid values are simply that, for instance sites that configure Allow From (without a hyphen), or values such as ‘sameorigem’ (sic).
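To make the distinction between valid, conflicting and invalid values concrete, here is a minimal sketch of how a merged X-Frame-Options value might be bucketed. It is not the classifier used for this report. The function name and the rule that repeated identical values count as a single valid value are assumptions of the example, and nonstandard values such as GOFORIT simply fall into the invalid bucket here.

```python
def classify_xfo(merged_value):
    """Bucket a merged X-Frame-Options value as valid, conflicting or invalid."""
    # Split the merged value back into its distinct, case-normalized parts.
    tokens = {part.strip().upper() for part in merged_value.split(",") if part.strip()}
    if not tokens:
        return "invalid"

    def is_valid(token):
        if token in ("DENY", "SAMEORIGIN"):
            return True
        # ALLOW-FROM must be hyphenated and followed by exactly one origin.
        return token.startswith("ALLOW-FROM ") and len(token.split()) == 2

    if not all(is_valid(token) for token in tokens):
        return "invalid"
    # More than one distinct valid value in the same response is a conflict.
    return "conflicting" if len(tokens) > 1 else "valid"


print(classify_xfo("SAMEORIGIN, SAMEORIGIN"))          # valid
print(classify_xfo("DENY, SAMEORIGIN"))                # conflicting
print(classify_xfo("Allow From http://example.com"))   # invalid
print(classify_xfo("sameorigem"))                      # invalid
```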
Overall SAMEORIGIN is still by far the most common setting, followed by DENY. GOFORIT is still used quite a bit with 215 sites configured with this value. Only twelve sites bothered to configure X-Frame-Options with an Allow-From origin list.
Cross-Origin Resource Sharing (CORS) Headers
CORS continues to be a popular mechanism for sharing data between sites. As described in the previous post, the Access-Control-Allow-Origin header determines which origins may have a User-Agent send a cross-origin request on their behalf and read the response data. When configured with the wildcard value, any site can send requests and read the response data. However, if the request object attempts to send credentials to a site configured with a wildcard, the request will fail and no response data will be returned.
We still see the wildcard value being by far the most popular way of configuring Access-Control-Allow-Origin. Allowing a single origin is a very distant second, and people continue to configure the header with invalid values. Most of the invalid values continue to be hosts with wildcards, such as http://*.domain.com or simply *.domain.com without the scheme, or multiple hosts specified in various ways. For a list of valid values, please consult our previous post on this subject.
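The kinds of invalid Access-Control-Allow-Origin values described above can be caught with a simple check. The sketch below is an illustration under stated assumptions, not the logic used for the scan: it accepts only the wildcard, the literal `null`, or a single http/https origin with no path, and the function name `classify_acao` is made up for the example.

```python
from urllib.parse import urlsplit


def classify_acao(value):
    """Bucket an Access-Control-Allow-Origin value."""
    value = value.strip()
    if value == "*":
        return "wildcard"
    if value == "null":
        return "null"
    # Lists of hosts and partial wildcards are not valid single origins.
    if "," in value or " " in value or "*" in value:
        return "invalid"
    parts = urlsplit(value)
    has_origin_only = parts.path == "" and not parts.query and not parts.fragment
    # Assumption: only http and https origins are treated as valid here.
    if parts.scheme in ("http", "https") and parts.hostname and has_origin_only:
        return "single origin"
    return "invalid"


print(classify_acao("*"))                        # wildcard
print(classify_acao("http://example.com"))       # single origin
print(classify_acao("http://*.domain.com"))      # invalid
print(classify_acao("*.domain.com"))             # invalid
print(classify_acao("http://a.com, http://b.com"))  # invalid
```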
For Access-Control-Allow-Credentials, only 217 sites set the property to true, five set it to false and three had it set to an invalid value.
As last time, we break STS values into four broad categories: long max-age, which is greater than 8000 seconds; short max-age, which is less than 8000 seconds; 0, which basically tells the User-Agent that the host should be removed from the browser’s HSTS list; and finally invalid values.
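The four categories could be assigned along the following lines. This is a sketch rather than the script behind the report: the regular expression and the function name `classify_sts` are assumptions, and since the text does not say how a value of exactly 8000 seconds is treated, the example counts it as short.

```python
import re

# Matches a max-age directive anywhere in the header value (optionally quoted).
MAX_AGE_RE = re.compile(r'(?:^|;)\s*max-age\s*=\s*"?(\d+)"?\s*(?:;|$)', re.IGNORECASE)


def classify_sts(value, threshold=8000):
    """Bucket a Strict-Transport-Security value by its max-age."""
    match = MAX_AGE_RE.search(value)
    if match is None:
        return "invalid"
    max_age = int(match.group(1))
    if max_age == 0:
        return "zero"
    # Assumption: exactly `threshold` seconds is counted as short.
    return "long" if max_age > threshold else "short"


print(classify_sts("max-age=31536000; includeSubDomains"))  # long
print(classify_sts("max-age=500"))                          # short
print(classify_sts("max-age=0"))                            # zero
print(classify_sts("maxage=500"))                           # invalid
```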
While the number of invalid values was quite low, the number of responses with a max-age of 0 is still quite high. Looking at the sites that have max-age set to zero, the majority continue to come from www.etsy.com. As described last time, the reason for this can be attributed to their SSL opt-in policy.
There has only been a small change in sites using Content Security Policy. The biggest gain was in results for X-Webkit-CSP, where we now seemingly have more X-Webkit-CSP responses than X-Content-Security-Policy responses. This can be attributed completely to Facebook. When using the Chrome User-Agent, popular user pages from Facebook began to respond with the X-Webkit-CSP header. In fact, 83 out of the 96 sites that had X-Webkit-CSP came from domains owned by Facebook. Unfortunately, we are still seeing a high number of sites specifying the options “inline script” or “eval script”. For X-Webkit-CSP we have started to see the unsafe-eval value, which was also included in this count.
Overall it is good to see the number of sites adopting security headers trending upwards. It was a bit surprising to see that Content Security Policy still doesn’t have much adoption, but compared to other security headers it is far more complex to implement and has a higher chance of impacting how a site operates. In this sense, one can understand why it is taking longer than the other security headers to become mainstream. Invalid values specified in headers continue to be a problem. If you use any of these headers on your site, it would be worth double-checking the configured values against the relevant specification.
We feel this information could be useful to the community, so Veracode has decided to distribute the raw data used in this study. Both the November 2012 and March 2013 data sets are now available for download. To give a more accurate picture of our comparison, these archives contain the full list of web sites that were analyzed, whether or not they had security headers in their responses.