How to Prevent Web Attacks Using Input Sanitization

Three of the five most common website attacks, namely SQL injection, cross-site scripting (XSS), and remote file inclusion (RFI), share a common root cause: input sanitization. Or, to be more accurate, a lack thereof.

All three exploits rely on data that the end user sends to the Web server. When the end user is a good guy, the data he sends the server is relevant to his interaction with the website. But when the end user is a hacker, she can abuse this mechanism to send the Web server input that is deliberately crafted to escape its legitimate context and execute unauthorized actions.

Input sanitization describes cleansing and scrubbing user input to prevent it from jumping the fence and exploiting security holes. But thorough input sanitization is hard. While some vulnerable sites simply don’t sanitize at all, others do so incompletely, lending their owners a false sense of security.

Incoming Data Dangers

There are three roads data can take to get from a user’s browser to the Web server:

GET requests. These are parameters included in the URL, often (but not always) generated by form input on a Web page. The parameters in a GET request appear after the question mark in a URL as name=value pairs separated by ampersands, for example search.php?q=widgets&page=2.

Anyone can easily manipulate the data in a GET request simply by editing the URL.
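As a rough illustration, the PHP sketch below decodes two hypothetical URLs with parse_url() and parse_str(), which turn a query string into an array much as PHP does when it fills $_GET. The host, script name, and parameters are made up; the point is that whatever a visitor types into the address bar arrives as the parameter's value.

```php
<?php
// Sketch: a GET parameter holds whatever text is in the URL.
// The URLs and parameter names are hypothetical.
function query_params(string $url): array
{
    $params = [];
    // parse_url() extracts the part after "?"; parse_str() decodes the
    // name=value pairs (including %XX escapes) into an array.
    parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
    return $params;
}

// The link as the site generated it...
$original = query_params('http://www.example.com/search.php?q=widgets&page=2');
// ...and the same link after a visitor edited it by hand.
$edited = query_params('http://www.example.com/search.php?q=%3Cscript%3E&page=-1');

var_dump($original['q'], $edited['q'], $edited['page']);
```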

POST requests. These are parameters sent in the body of the HTTP request from the browser to the Web server. POST data does not appear in the URL, but hackers can manipulate it using browser plugins like Tamper Data for Firefox, or simply with custom code using a library like cURL.
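A sketch, assuming a hypothetical comment.php endpoint on www.example.com, of why keeping data out of the URL does not protect it: a POST body is plain application/x-www-form-urlencoded text, and a script can put anything it likes in it. The field names and values are illustrative only, and nothing is actually sent over the network.

```php
<?php
// Sketch: a hand-built POST request. The host, path, and field values are
// hypothetical; the request is only assembled and printed, not sent.
$fields = [
    'email'   => "user'@example.com",
    'comment' => '<script>alert(1)</script>',
];

// http_build_query() produces a body in the application/x-www-form-urlencoded
// format browsers use for forms, but here the client picks every value.
$body = http_build_query($fields);

$request = "POST /comment.php HTTP/1.1\r\n"
    . "Host: www.example.com\r\n"
    . "Content-Type: application/x-www-form-urlencoded\r\n"
    . 'Content-Length: ' . strlen($body) . "\r\n"
    . "\r\n"
    . $body;

echo $request, PHP_EOL;
```

Decoding that body on the server side yields exactly the values the client chose, quotes and script tags included.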

Cookies. Often overlooked when sanitizing input, cookies created by a website can contain exploitable data. Cookies are stored on the end user’s machine, where a hacker can easily modify them (or simply send forged cookie values with a tool like cURL) to manipulate input data sent to the server.
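One way to treat a cookie as untrusted input is to validate it against the narrow set of values you expect. The sketch below assumes a hypothetical items_per_page cookie that should hold an integer from 1 to 100, and uses PHP’s filter_var() with FILTER_VALIDATE_INT, falling back to a default of 20 (also an assumption) for anything else.

```php
<?php
// Sketch: validate a cookie instead of trusting it.
// The cookie name, the 1-100 range, and the default of 20 are hypothetical.
function items_per_page(array $cookies): int
{
    $raw = $cookies['items_per_page'] ?? '';

    // A cookie named items_per_page[] would reach PHP as an array, not a string.
    if (!is_string($raw)) {
        return 20;
    }

    // FILTER_VALIDATE_INT returns an int only for a whole number in range,
    // and false for anything else (letters, SQL fragments, 1000, ...).
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);

    return $value === false ? 20 : $value;
}

// In a real request you would pass $_COOKIE.
$perPage = items_per_page($_COOKIE ?? []);
```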

Example Attacks and Defenses

Exploiting input sanitization weaknesses can take many forms, but let’s look at some classic attacks. We are using PHP for these examples, but the same principles apply to other Web development platforms such as ASP and Ruby.

ATTACK: Returned form input. In this attack, the hacker exploits a Web page that returns an incomplete form by echoing back the user’s own input.

Imagine that the user submitted a Web form, but it was returned because of a validation error, perhaps because they failed to complete a required field. The form echoes back the user’s input so that the fields are already filled in with their prior input. But there is no input sanitization: the code simply repeats the exact input from the user. Now suppose the user typed into the “comment” field a string crafted to break out of the form’s HTML.

When the form is output, this “comment” does something sneaky: it closes the INPUT tag and inserts a JavaScript <script> tag that launches a pop-up window pointing to a website controlled by the hacker (which might serve spam or malware). Worse, if this comment is saved to a database and shown to other visitors to the website, their browsers will also be tricked into opening the pop-up to the hacker’s site.

DEFENSE: Any attack like this, where user input is echoed back to the Web page, requires that the data be sanitized before output. Because we are defending against Web page injection, you need to escape HTML special characters, including the tag brackets (< and >), the ampersand (&) that begins HTML entities, and quotation marks (" and '), so the browser displays them as literal text instead of interpreting them as markup.
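To make the attack and the defense concrete, here is a minimal PHP sketch. The payload and the attacker.example address are hypothetical stand-ins for the kind of comment described above. It uses htmlspecialchars() with ENT_QUOTES rather than filter_input(), because htmlspecialchars() works on any string and therefore runs outside a Web request.

```php
<?php
// Sketch: re-displaying a "comment" field in a returned form.
// The payload and the attacker.example URL are hypothetical.
$comment = '"><script>window.open("http://attacker.example/")</script>';

// Vulnerable: the raw value goes straight into the value attribute, so the
// leading "> closes the INPUT tag and the script element becomes live markup.
$unsafe = '<input type="text" name="comment" value="' . $comment . '">';

// Escaped: htmlspecialchars() with ENT_QUOTES encodes & < > " and ', so the
// browser shows the payload as plain text inside the attribute.
$escaped = htmlspecialchars($comment, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$safe = '<input type="text" name="comment" value="' . $escaped . '">';

echo $unsafe, PHP_EOL, $safe, PHP_EOL;
```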

PHP developers can use the function filter_input to do the heavy lifting:

$safe_data = filter_input(INPUT_GET, 'comment', FILTER_SANITIZE_SPECIAL_CHARS);

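Because filter_input works on only one parameter at a time, you might want to write a function that iterates through the GET parameters and builds a new, sanitized array (e.g. $SAFE_GET) in one go. PHP’s filter_input_array can also apply a single filter to every parameter, as in filter_input_array(INPUT_GET, FILTER_SANITIZE_SPECIAL_CHARS).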
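A minimal sketch of such a function, assuming you want every value escaped for HTML output. The names sanitize_for_html() and $SAFE_GET are hypothetical, not PHP built-ins, and a literal array stands in for $_GET so the example runs from the command line.

```php
<?php
// Sketch: build an HTML-escaped copy of a request array in one pass.
// sanitize_for_html() and $SAFE_GET are hypothetical names.
function sanitize_for_html(array $input): array
{
    $clean = [];
    foreach ($input as $key => $value) {
        // Parameters such as tags[]=a&tags[]=b arrive as nested arrays,
        // so recurse into them; everything else is escaped as a string.
        $clean[$key] = is_array($value)
            ? sanitize_for_html($value)
            : htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
    return $clean;
}

// In a real request you would pass $_GET; a literal array stands in here.
$SAFE_GET = sanitize_for_html([
    'comment' => '"><script>alert(1)</script>',
    'tags'    => ['<b>', 'plain'],
]);
```

This sketch escapes values but not array keys, and its output is safe only for HTML; the same strings would still need different treatment before going into SQL or JavaScript.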

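Alternatively, you can set a directive in the php.ini file so that all input is sanitized for HTML safety by default:

filter.default = special_chars

Note that the filter.default setting is deprecated as of PHP 8.1.
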
ATTACK: Unquoted attributes. HTML does not require attribute values to be quoted, and browsers will happily accept both of these forms:

<a href=details.php?id=<?php echo $userid; ?>>
<a href="details.php?id=<?php echo $userid; ?>">

Let’s assume that $userid has been properly sanitized. Problem solved, right? Not so fast.

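In the first syntax, without quotation marks around the href value, we can sneak in a new attribute that passes right through the sanitization filter. A space ends an unquoted attribute value, and HTML escaping leaves spaces, letters, and parentheses alone, so an id parameter such as 5 onclick=alert(document.cookie) (sent in the URL as id=5%20onclick=alert(document.cookie)) gets through untouched.

Given such a URL, the page with the above code produces HTML with a link now rigged to trigger the hacker’s own JavaScript code when clicked.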
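The sketch below reproduces the problem in PHP with a hypothetical $userid payload. It assumes the value has already been run through htmlspecialchars(), the kind of sanitization discussed above, and shows that the escaping changes nothing because the payload contains no characters it encodes. The quoted version also applies urlencode(), since the value sits inside a URL query string.

```php
<?php
// Sketch: why escaping alone does not protect an unquoted attribute.
// $userid is a hypothetical attacker-controlled value (e.g. from $_GET['id']).
$userid = '5 onclick=alert(document.cookie)';

// htmlspecialchars() encodes & < > " and ', but not spaces or "=",
// so this payload passes through unchanged.
$escaped = htmlspecialchars($userid, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Unquoted: the space ends the href value and onclick becomes a new attribute.
$unquoted = '<a href=details.php?id=' . $escaped . '>details</a>';

// Quoted: the whole value stays inside href="...". urlencode() also encodes
// it for its URL-query context (space becomes +, "=" becomes %3D).
$quoted = '<a href="details.php?id='
    . htmlspecialchars(urlencode($userid), ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
    . '">details</a>';

echo $unquoted, PHP_EOL, $quoted, PHP_EOL;
```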

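DEFENSE: Enclose all attribute values in quotes, and make sure your escaping also encodes quotation marks (in PHP, htmlspecialchars() with the ENT_QUOTES flag, or FILTER_SANITIZE_SPECIAL_CHARS) so the value cannot close the quote itself! Although simply stated, this defense speaks to the need for discipline and consistency in Web development as its own defense against input sanitization exploits. Where there are holes, rodents will sneak in.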

ATTACK: SQL injection. Attacks which try to exploit an underlying SQL database can use faulty input sanitization to their advantage. It is important to remember, though, that input sanitization alone is not a cure-all against SQL injection. More on that in a moment.

One of the first probes an attacker will make is to test whether your code sends SQL queries without any data sanitization. This is often done with a single quote. For example, the hacker will visit your login form and enter a single quote in their email address, like this:

user'@example.com

If your backend does not sanitize this input, the stray quote will cause a syntax error when the database interprets the SQL. If that error message is displayed on the Web page, the hacker now knows the input reaches the query unescaped and can construct more sophisticated inputs that add clauses to the SQL query, possibly dumping data from your database.
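The following sketch shows the vulnerable pattern as plain string building, without a database. The users table, the email column, and the build_login_query() helper are hypothetical. Printing the queries shows why the single-quote probe breaks the SQL and how a slightly longer input rewrites the WHERE clause.

```php
<?php
// Sketch of the vulnerable pattern: user input concatenated straight into SQL.
// The users table, email column, and this helper are hypothetical.
function build_login_query(string $email): string
{
    return "SELECT * FROM users WHERE email = '" . $email . "'";
}

// The probe: the quote in the input closes the string literal early and
// leaves the final quote unmatched, so the database reports a syntax error.
echo build_login_query("user'@example.com"), PHP_EOL;

// A follow-up attack: the injected OR clause makes the WHERE condition
// true for every row in the table.
echo build_login_query("x' OR '1'='1"), PHP_EOL;
```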

DEFENSE: You must escape all user input before including it in an SQL query. In PHP with MySQL, this is done with mysqli_real_escape_string() (or PDO::quote() when using PDO); the older mysql_real_escape_string() was removed in PHP 7.0. Never include any data in an SQL query without passing it through such a function first!
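The MySQL functions need a running server, so this sketch swaps in PDO::quote() against an in-memory SQLite database (it assumes the pdo_sqlite extension is available). The technique is the same: escape embedded quotes so the input stays inside its string literal. The users table and its row are hypothetical.

```php
<?php
// Sketch: escaping a value before placing it in an SQL string literal.
// PDO::quote() with in-memory SQLite stands in for mysqli_real_escape_string().
// The users table and its row are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (email TEXT)');
$db->exec("INSERT INTO users (email) VALUES ('alice@example.com')");

function count_users(PDO $db, string $email): int
{
    // quote() doubles any embedded single quotes AND adds the outer quotes.
    $sql = 'SELECT COUNT(*) FROM users WHERE email = ' . $db->quote($email);
    return (int) $db->query($sql)->fetchColumn();
}

echo count_users($db, 'alice@example.com'), PHP_EOL; // 1
echo count_users($db, "x' OR '1'='1"), PHP_EOL;      // 0: treated as an odd email
```

One difference worth noting: PDO::quote() adds the surrounding quotes itself, while mysqli_real_escape_string() does not, so with mysqli you must still wrap the escaped value in quotes in your SQL.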

The best defense against SQL injection attacks is not related to input sanitization at all. Although this goes beyond the scope of this article, ideally your Web application should never splice user input into the text of an SQL query. Rather, it should rely on prepared statements and parameter binding.
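For readers who want a starting point anyway, here is a minimal sketch of parameter binding with PDO, again using an in-memory SQLite database (pdo_sqlite assumed) and a hypothetical users table. The same prepare() and execute() calls work with PDO’s MySQL driver.

```php
<?php
// Sketch: a prepared statement with a bound parameter (PDO + in-memory SQLite).
// The users table and its row are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
$db->exec("INSERT INTO users (email) VALUES ('alice@example.com')");

function find_user_id(PDO $db, string $email): ?int
{
    // The SQL text is fixed; the email is supplied as a bound parameter
    // rather than spliced into the query, so quotes in it cannot change
    // the query's structure.
    $stmt = $db->prepare('SELECT id FROM users WHERE email = :email');
    $stmt->execute([':email' => $email]);
    $id = $stmt->fetchColumn();
    return $id === false ? null : (int) $id;
}

var_dump(find_user_id($db, 'alice@example.com')); // int(1)
var_dump(find_user_id($db, "x' OR '1'='1"));      // NULL
```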

Considering Context

Whenever you look at sanitizing user input, the key point to remember is context. In what context will this data be used? Consider the most common possibilities:

  • HTML output
  • HTML attributes
  • JavaScript
  • CSS
  • SQL

Each context has its own vulnerabilities. In this article we’ve looked at some examples of HTML output and attribute contexts, as well as SQL. But when it comes to input sanitization, one size does not fit all.
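As a sketch of “one size does not fit all”, the PHP below encodes the same untrusted string for the HTML and JavaScript contexts listed above, plus a URL query parameter (the situation in the unquoted-attribute example). The helper names are hypothetical; each wraps a real PHP function (htmlspecialchars(), json_encode(), rawurlencode()). CSS and SQL are left out: PHP has no built-in CSS escaper, and SQL is covered above.

```php
<?php
// Sketch: the same untrusted string needs a different encoder in each
// output context. The function names are hypothetical helpers.

// HTML element content and quoted HTML attribute values.
function for_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// A string literal inside an inline <script> block: json_encode() emits a
// quoted JavaScript string, and the JSON_HEX_* flags also encode < > & ' "
// so the value cannot close the string or the <script> element.
// JSON_THROW_ON_ERROR makes invalid UTF-8 throw instead of returning false.
function for_js(string $s): string
{
    return json_encode(
        $s,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// A single query-string parameter inside a URL.
function for_url_param(string $s): string
{
    return rawurlencode($s);
}

$input = '</script><b>"hi"</b>';
echo for_html($input), PHP_EOL;
echo for_js($input), PHP_EOL;
echo for_url_param($input), PHP_EOL;
```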

Aaron Weiss is a technology writer and frequent contributor to eSecurity Planet and Wi-Fi Planet.
