What is SQL Injection?
To answer this, let's look at a typical piece of code used in millions of web sites which attempts to validate a user who has tried to log in to a protected area of a web site:
protected void Button1_Click(object sender, EventArgs e) { string connect = "MyConnString"; string query = "Select Count(*) From Users Where Username = '" + UserName.Text + "' And Password = '" + Password.Text + "'"; int result = 0; using (var conn = new SqlConnection(connect)) { using (var cmd = new SqlCommand(query, conn)) { conn.Open(); result = (int)cmd.ExecuteScalar(); } } if (result > 0) { Response.Redirect("LoggedIn.aspx"); } else { Literal1.Text = "Invalid credentials"; }
This is a commonly found piece of code that runs as the result of a ButtonClick event. It connects to the database and executes some SQL against a SQL Server database that returns the number of rows where the username and password combination supplied by the user matches a row in the database. If the result is at least one matching row, the user is logged in. At runtime, the values entered by the user are merged dynamically with the SQL string, to create a valid SQL command which is then executed against the database:
The values supplied above were "Admin" for the user name, and "Let_Me_In" for the password. The image illustrates that the merging of those values with the core SQL has worked rather nicely. You will only get logged in if there is a matching row in the database. Now look at this:
This was achieved simply by entering ' or '1' = '1 into both the username textbox and the password textbox. If you study the SQL that has resulted from concatenating those user values with the core SQL, you will probably be able to see that it will always match at least one row. In fact, it will match all rows, so the variable result will be > 0. Sometimes, coders don't return a count. They return user's details so they can use them for allowing further permissions or similar. This SQL will return the first row that matches, which will be the first row in the table generally. Often, this is the admin account that you set up when developing the site, and has all privileges.
This is SQL Injection. Additional SQL syntax has been injected into the statement to change its behaviour. The single quotes are string delimiters as far as T-SQL is concerned, and if you allow users to enter these without managing them, you are asking for potential trouble. Quite often, I see well meaning people advise beginners to "escape" the quotes, using a string.Replace() method such as this:
var username = UserName.Text.Replace("'", "''"); var password = Password.Text.Replace("'", "''"); string query = "Select * From Users Where Username = '" + username + "' And Password = '" + password + "'";
And indeed that will have the desired effect:
The first OR clause will never be true. Job done. However, it does not protect against all avenues of attack. Consider the very common scenario where you are querying the database for an article, product or similar by ID. Typically, the ID is stored as a number - most of them are autogenerated by the database. The code will usually look like this:
string connect = "MyConnString"; string query = "Select * From Products Where ProductID = " + Request["ID"]; using (var conn = new SqlConnection(connect)) { using (var cmd = new SqlCommand(query, conn)) { conn.Open(); //Process results } }
Now, in this case, the value for Request["ID"] could come from a posted form, or a querystring value - perhaps from a hyperlink on a previous page. It's easy for a malicious user to amend a querystring value. In the example caught by the VS debugger below, I just put ;Drop Table Admin-- on the end of the querystring before requesting the page:
The result, again is a legitimate SQL statement that will be run against the database. And the result will be that my Admin table will be deleted. You might be wondering how a hacker will know the names of your tables. Chances are they don't. But think about how you name your database objects. They are bound to be common sense names that reflect their purpose. It doesn't take long to guess. And of course, if you are using ASP.NET Membership out-of-the-box, the aspnetdb.mdf schema is available to anyone. But before you start changing all your database table names to something really obscure, that is not the answer. Dictionary attacks (where random strings are generated) are common. Finally, it takes just one disgruntled person (a former colleague?) who knows the obscure names you have used to undo all your effort.
So far, the examples have shown attacks that will only really cause an inconvenience. Someone might break into your Content Management System and start defacing your site, or they might make parts of the database disappear. No big deal - a database restore from a backup will put things right (you DO backup your database, don't you?). Or they might help themselves to some free shipments, or discover secret information that your client would prefer their competitors didn't know. Or it could be a lot worse. Have a look at this document, which describes a certain type of penetration test.
Penetration tests are tests designed to identify security loopholes in applications, systems and networks. This particular test makes use of the SQL Server xp_cmdshell system stored procedure. xp_cmdshell is extraordinarily powerful, and assuming that the user has the right privileges, effectively gives them total control over the machine that the SQL Server is on, as well as potentially others in the network. Now, imagine being responsible for creating the loophole through which someone was able to create an FTP site on your web server's network, and use it to store material they would not like the police to know about. Or wiped the entire server. Some people think they don't need to worry about this type of thing because their application is intended only to run on a private Intranet. I have however seen servers hosting this type of application being affected.
The Prevention
OK. I'm sure that by now, you can see that the only sensible thing to do is to prevent any possibility of your application being subject to successful SQL Injection attacks. So now we will turn to preventing them. I have seen plenty of advice like this: http://forums.asp.net/t/1254125.aspx, where the suggestion is to create a Black List if all T-SQL keywords and punctuation, and screen user input for the presence of it. If it exists, throw an error, or similar. It's not a good approach in my view. There are two potential problems with this that I can see. The first is that there may be legitimate reasons why users need to post values included in the blacklist. How many users will become frustrated in their attempt to post a comment that includes "At the end of the day..."? Most T-SQL keywords are also in use in every day language. What if they aren't alowed to submit an @ symbol? Or a semi-colon? The second problem is what happens if Microsoft makes changes to the syntax of T-SQL? Do you go round all the applications you have built over the years and rebuild your black list functions? You might think that the chances of Microsoft making changes are so slim that you shouldn't worry about this. However, until the middle of last year, neither "var" nor "=>" would have done anything in C# except generate a compiler error. You certainly won't get any guarantees from Microsoft against making changes to T-SQL. The real way to prevent SQL Injection attacks is the use of Parameter Queries.
Parameter Queries
Parameters in queries are placeholders for values that are supplied to a SQL query at runtime, in very much the same way as parameters act as placeholders for values supplied to a C# method at runtime. And, just as C# parameters ensure type safety, SQL parameters do a similar thing. If you attempt to pass in a value that cannot be implicitly converted to a numeric where the database field expects one, exceptions are thrown. In a previous example where the ProductID value was tampered with to append a SQL command to DROP a table, this will now cause an error rather than get executed because the semicolon and text cannot be converted to a number.
The SqlCommand class represents a SQL query or stored procedure to be executed against the database. It has a Parameters property which is a collection of SqlParameter objects. For each parameter that appears in the SQL statement, you need to add a Parameter object to the collection. This is probably simpler to explain through code, so taking the ProductID example as a starting point, here's how to rewrite the code:
protected void Page_Load(object sender, EventArgs e) { var connect = ConfigurationManager.ConnectionStrings["NorthWind"].ToString(); var query = "Select * From Products Where ProductID = @ProductID"; using (var conn = new SqlConnection(connect)) { using (var cmd = new SqlCommand(query, conn)) { cmd.Parameters.Add("@ProductID", SqlDbType.Int); cmd.Parameters["@ProductID"].Value = Convert.ToInt32(Request["ProductID"]); conn.Open(); //Process results } } }
The connection string has been defined in the web.config file, and is obtained using the System.Configuration.ConfigurationManager class which provides access to items in the web.config file. In this case, it retrieves the value of the item in the connectionstrings area with the name "NorthWind". The SQL query is declared with a parameter: @ProductID. All parameters are prefixed with the @ sign. The connection object is declared next, with the connection string passed into the constructor. It's in a using block, which ensures that the connection is closed and disposed of without have to explicitly type code to manage that. The same is true of the SqlCommand object.
Adding the parameter to the SqlCommand.Parameters collection is relatively straightforward. there are two methods - the Add() method and the AddWithValue() method. The first of these has a number of overloads. I've used the Add(String, SqlDbType) option and then applied the value separately. It could be written all on one line like this:
cmd.Parameters.Add("@ProductID", SqlDbType.Int).Value = Convert.ToInt32(Request["ProductID"]);
Alternatively, I could use the AddWithValue(string, object) option like this:
protected void Page_Load(object sender, EventArgs e) { var connect = ConfigurationManager.ConnectionStrings["NorthWind"].ToString(); var query = "Select * From Products Where ProductID = @ProductID"; using (var conn = new SqlConnection(connect)) { using (var cmd = new SqlCommand(query, conn)) { cmd.Parameters.AddWithValue("@ProductID", Convert.ToInt32(Request["ProductID"]); conn.Open(); //Process results } } }
The choice is up to you, but most professionals prefer to use one of the Add() methods where the SQL data type (and length, where appropriate) is specified. This reduces the chance of sub-optimal conversion causing performance issues on the server. It also ensures that the value being passed is of the right type in the application, rather than getting SQL Server to have to deal with it and report back errors. Having said all that, most samples on this site use the AddWithValue() option for readability.
When ADO.NET parameterised queries are sent to SQL Server, they are executed via the system stored procedure sp_executesql:
exec sp_executesql N'Select * From Products Where ProductID = @ProductID',N'@ProductID int',@ProductID=13
This passes in the SQL statement, followed (in this case) by the data type, and finally with the value that the parameter in the SQL statement must use. If the value is a string, any SQL syntax it might contain is treated as part of the literal string, and not as part of the SQL statement, and this is how SQL injection is prevented.
If a string is provided where a numeric is expected, the application will throw an error. For this reason, you should be validating all input for type and range before even attempting to pass it to a parameter.
There is one other benefit to be had by using parameters, and that is one of performance. When SQL Server is presented with a SQL statement, it first checks its cache for an identical statement. If it finds one, it retrieves an optimised execution plan which will ensure that the statement is executed as efficiently as possible. If it cannot find an exact match, it goes through the process of creating a plan to cache prior to using that plan to execute the statement. You can see that the first part of the sp_executesql call contains the statement, and that it will always be the same. All subsequent uses of it will use the cached optimised plan. If this statement was dynamically generated using string concatenation, and the ProductID varied each time, an execution plan would need to be created and stored for every value of ProductID. "...WHERE ProductID = 13" is not the same as "...WHERE ProductID = 96".
Stored Procedures
It always interests me that whenever the subject of preventing SQL injection comes up in the www.asp.net forums, at least one person contributes the assertion that you must use stored procedures to make use of parameters. As I have demonstrated above, this is not true. However, if you do use stored procedures the code above can be used with just two amendments: you need to pass the name of the stored procedure instead of the SQL statement, and you must set the CommandType to CommandType.StoredProcedure. It's omitted at the moment because the default is CommandType.Text. Here's the revised code for a stored procedure which I shall call GetProductByID:
var connect = ConfigurationManager.ConnectionStrings["NorthWind"].ToString(); var query = "GetProductByID"; using (var conn = new SqlConnection(connect)) { using (var cmd = new SqlCommand(query, conn)) { cmd.CommandType = CommandType.StoredProcedure; cmd.Parameters.Add("@ProductID", SqlDbType.Int).Value = Convert.ToInt32(Request["ProductID"]); conn.Open(); //Process results } }
LINQ to SQL, Entity Framework, OleDb and ODBC
Both LINQ to SQL and the Entity Framework generate parameterised SQL commands out-of-the-box, providing protection against SQL Injection with no additional effort. This is indeed true of many other Object Relational Mappers (nHibernate etc). If you are using MS Access, SQL injection is not such a problem, as Access is very limited in what it allows. For example, you cannot batch statements so the DROP Table example will not work. Nevertheless, the login hack will work, so you should still use parameters. A much fuller description of how OleDb parameters work with Access is given here: http://www.mikesdotnetting.com/Article/26/Parameter-Queries-in-ASP.NET-with-MS-Access. It should be noted that both OleDb an ODBC parameters work based on position, whereas the examples in this article that use SqlClient all work on matching the name of the parameters, and position is not important.
How other database systems are affected varies, but it is always best to check their documentation. Nevertheless, using parameterised queries where they are supported by the database system is a sure-fire way to make your application SQL Injection proof. You really have no excuse.