Data Filtering with PHP | WebReference

Data Filtering with PHP

By J. Leidago Noabeb



In this article, we will explore some of the many data filters that PHP offers. The filter functions are enabled by default as of PHP 5.2, so no separate extension needs to be installed. We will also look at what data filtering is and why it should be used in web applications.

What is Data Filtering?

PHP is often characterized as a 'weak' programming language, mostly because it is known as an easy language to learn and is often used as a stepping stone into web programming. Much of this reputation comes from books and tutorials that concentrate only on how easy it is to write PHP programs that collect data and send it on by email or to a database, without ever mentioning data validation. Beginners then go on to write these 'easy' scripts and find themselves exposed to SQL injection and other attacks that are easily preventable. One reason data validation is left out is that validating user input is seen as too 'complicated' for beginners and does not fit the notion that PHP is supposed to be 'easy' to program with. In reality, it takes only a few simple steps to validate user input.

So what exactly do we mean by data validation, and why is it so important? Validating data becomes important as soon as your application starts to accept user input. The rule of thumb is not to trust any data that comes from outside your application, i.e. from forms or through the browser, whereas data that originates within your application is 'safe'. Any data that comes from outside needs to be 'sanitized' before it is accepted into your application. An example of 'safe' data is:

$myvar = "A safe variable";

The code above contains a variable that is defined within your application and can therefore be trusted. The following data, by contrast, cannot be trusted:

$user = $_POST['username'];
$ID = $_GET['id'];

The code above shows a variable called $user that comes from a form used to collect the user name. This data cannot be trusted since it comes from outside our application. On its own the variable is not harmful, but if it is used in a database query it could present a security problem (as we will demonstrate shortly). The second variable, $ID, is taken from a query string, which can easily be tampered with in the browser's address bar. For example, if your query string was generated like this:

delete.php?id=<?php echo $row['id']; ?>

then the browser will show the following URL when the delete script is run:

delete.php?id=2

An attacker can then simply change the number to a letter to make the application fail, and in many cases the resulting error message reveals further information about the application that is useful for an attack.

Let's take a practical look at one of the most common and serious attacks that take place when data validation is not implemented: SQL injection. Below is a sample of a login script. Assume that a form takes the user name and password and sends them to a processing script that grants the user access to the rest of the application if the login details are correct:
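A minimal sketch of such a processing script follows. The table name users, its column names, and the helper build_login_query() are assumptions made for illustration, and the sketch only builds the SQL string so that it can run without a database:

```php
<?php
// Hypothetical helper: builds the login query by dropping the submitted
// values straight into the SQL string, with no validation or escaping.
function build_login_query($user, $pass)
{
    // Table and column names are assumed for this example.
    return "SELECT * FROM users WHERE username = '$user' AND password = '$pass'";
}

// Values as they arrive from the login form (empty when nothing was posted).
$user = isset($_POST['username']) ? $_POST['username'] : '';
$pass = isset($_POST['password']) ? $_POST['password'] : '';

$sql = build_login_query($user, $pass);
echo $sql, PHP_EOL;
```

In a real script the query would then be sent to the database, and the user would be logged in if at least one row came back.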

The script looks harmless at first glance, but consider what happens if the variables contain the following values:

$user = "Dantago !Noabes";
$pass = "x' OR 'a'='a";

What does the above information do? The data contained in the $pass variable tries to fool MySQL into treating the user as authenticated, so that they gain access to the rest of the application. How does it do that? Take a look at what the MySQL query looks like with the above variables:
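The effect can be reproduced by interpolating the two values into a query string. This is a sketch only; the table and column names are assumptions, and nothing is sent to a database:

```php
<?php
$user = "Dantago !Noabes";
$pass = "x' OR 'a'='a";

// Assumed table and column names; the values are interpolated unescaped.
$sql = "SELECT * FROM users WHERE username = '$user' AND password = '$pass'";

echo $sql, PHP_EOL;
// SELECT * FROM users WHERE username = 'Dantago !Noabes' AND password = 'x' OR 'a'='a'
```

The single quote inside $pass closes the string literal early, so everything after it is read by MySQL as part of the WHERE clause rather than as part of the password.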

The condition password = 'x' OR 'a'='a' will always evaluate to true, since 'a' always equals 'a'. If the user's password was, for example, 'generic', it would be the same as saying password = 'x' OR 'generic' = 'generic', which is true. Because AND binds more tightly than OR in SQL, the whole WHERE clause is true for every row regardless of the user name and password, so the user is authenticated.

Using PHP Filters

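So how does PHP help to validate data? You can write your own filters or use the data filters that ship with PHP 5.2 and above. In addition, you can guard against SQL injection with PHP's mysql_real_escape_string() function, which escapes special characters such as quotes before a value is placed in a query. (This function belongs to the original mysql extension, which was removed in PHP 7; mysqli_real_escape_string() is its counterpart in the mysqli extension.) Let's look at how we can avoid SQL injection using the code from our previous example: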

Again, there is nothing wrong with the code above. It will now look like this in MySQL:
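A sketch of the resulting query string follows. Because mysql_real_escape_string() needs a live database connection, the sketch uses addslashes() purely as a stand-in to show the effect on this particular input. It is not a substitute for the real escaping function, and the table and column names are assumptions:

```php
<?php
$user = "Dantago !Noabes";
$pass = "x' OR 'a'='a";

// Stand-in for mysql_real_escape_string(): for this input both put a
// backslash in front of every single quote. Illustration only.
$safeUser = addslashes($user);
$safePass = addslashes($pass);

// Assumed table and column names.
$sql = "SELECT * FROM users WHERE username = '$safeUser' AND password = '$safePass'";

echo $sql, PHP_EOL;
// SELECT * FROM users WHERE username = 'Dantago !Noabes' AND password = 'x\' OR \'a\'=\'a'
```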

The difference now is that MySQL treats the entire value as a string literal: it tries to match the user name Dantago !Noabes with the literal password x' OR 'a'='a, and the login fails (unless that exact string happens to be the user's password).

Data validation is not only about SQL attacks. Other data can be validated too, such as checking that an email address or URL is written in the proper format, or ensuring that a particular value is of the right type. This is particularly useful for checking that a query string value passed to the application is what it is supposed to be. For example, a user ID is usually an integer and not a letter or string. This can be checked with is_numeric(), or the value can be converted with intval() or an (int) cast.
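As a rough illustration of the ID check, the sketch below combines is_numeric() with an integer cast. The helper name parse_user_id() is made up for this example:

```php
<?php
// Hypothetical helper: returns the ID as an integer, or null when the raw
// value is not a plain whole number.
function parse_user_id($raw)
{
    // is_numeric() also accepts values such as "1.5" or "1e3", so the extra
    // comparison makes sure the value survives a round trip through (int).
    if (is_numeric($raw) && (string) (int) $raw === (string) $raw) {
        return (int) $raw;
    }
    return null;
}

var_dump(parse_user_id('2'));    // int(2)
var_dump(parse_user_id('abc'));  // NULL
var_dump(parse_user_id('1.5'));  // NULL
```

Returning null rather than 0 for bad input lets the calling code tell a rejected value apart from a legitimate ID.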

PHP 5.2 and later come with the following filter functions (full list of functions available on the PHP website):

PHP Filter Functions:

Function                Description
filter_has_var()        Checks if a variable of a specified input type exists
filter_id()             Returns the ID number of a specified filter
filter_input()          Gets a variable from outside the script and filters it
filter_input_array()    Gets multiple variables from outside the script and filters them
filter_list()           Returns an array of all supported filters
filter_var_array()      Gets multiple variables and filters them
filter_var()            Gets a variable and filters it
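To give a feel for the helper functions in this list, here is a short sketch. The array keys 'age' and 'email' and their values are invented for the example:

```php
<?php
// filter_list() returns the names of every filter this PHP build supports.
$names = filter_list();

// filter_id() maps a filter name to its numeric ID, which matches the
// corresponding FILTER_* constant.
$intId = filter_id('int');
var_dump($intId === FILTER_VALIDATE_INT); // bool(true)

// filter_var_array() applies one filter per key to an ordinary array.
$clean = filter_var_array(
    array('age' => '42', 'email' => 'not-an-email'),
    array('age' => FILTER_VALIDATE_INT, 'email' => FILTER_VALIDATE_EMAIL)
);
var_dump($clean['age']);   // int(42)
var_dump($clean['email']); // bool(false)
```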

PHP Filters

Filter                           Description
FILTER_CALLBACK                  Call a user-defined function to filter data
FILTER_SANITIZE_STRING           Strip tags, optionally strip or encode special characters (deprecated as of PHP 8.1)
FILTER_SANITIZE_STRIPPED         Alias of "string" filter (deprecated as of PHP 8.1)
FILTER_SANITIZE_ENCODED          URL-encode string, optionally strip or encode special characters
FILTER_SANITIZE_SPECIAL_CHARS    HTML-escape '"<>& and characters with ASCII value less than 32
FILTER_SANITIZE_EMAIL            Remove all characters, except letters, digits and !#$%&'*+-/=?^_`{|}~@.[]
FILTER_SANITIZE_URL              Remove all characters, except letters, digits and $-_.+!*'(),{}|\^~[]`<>#%";/?:@&=
FILTER_SANITIZE_NUMBER_INT       Remove all characters, except digits and +-
FILTER_SANITIZE_NUMBER_FLOAT     Remove all characters, except digits, +- and optionally .,eE
FILTER_SANITIZE_MAGIC_QUOTES     Apply addslashes() (removed in PHP 8.0)
FILTER_UNSAFE_RAW                Do nothing, optionally strip or encode special characters
FILTER_VALIDATE_INT              Validate value as integer, optionally from the specified range
FILTER_VALIDATE_BOOLEAN          Return TRUE for "1", "true", "on" and "yes", FALSE otherwise; with the FILTER_NULL_ON_FAILURE flag, FALSE only for "0", "false", "off", "no" and "", and NULL for any other value
FILTER_VALIDATE_FLOAT            Validate value as float
FILTER_VALIDATE_REGEXP           Validate value against regexp, a Perl-compatible regular expression
FILTER_VALIDATE_URL              Validate value as URL, optionally with required components
FILTER_VALIDATE_EMAIL            Validate value as e-mail
FILTER_VALIDATE_IP               Validate value as IP address, optionally only IPv4 or IPv6 or not from private or reserved ranges
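The sketch below applies a few of these filters through filter_var(). The input values are made up; the point is only to show the difference between validating (accept or reject) and sanitizing (strip unwanted characters):

```php
<?php
// Validation filters return the converted value, or false on failure.
$ok  = filter_var('25', FILTER_VALIDATE_INT);
$bad = filter_var('25a', FILTER_VALIDATE_INT);
var_dump($ok);   // int(25)
var_dump($bad);  // bool(false)

// Options are passed in an array; here the integer must lie between 1 and 120.
$age = filter_var('150', FILTER_VALIDATE_INT, array(
    'options' => array('min_range' => 1, 'max_range' => 120),
));
var_dump($age);  // bool(false)

// E-mail validation returns the address unchanged when it is well formed.
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
var_dump($email); // string(16) "user@example.com"

// Sanitizing filters strip characters instead of rejecting the value.
$digits = filter_var('id=42abc', FILTER_SANITIZE_NUMBER_INT);
var_dump($digits); // string(2) "42"

// With FILTER_NULL_ON_FAILURE, a non-boolean value gives null rather than false.
$flag = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
var_dump($flag); // NULL
```

Because a validation filter can legitimately return 0 or an empty string, results are best compared with === false rather than tested loosely.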

If you are running PHP 5.2 or higher, you should have access to these functions. To verify that they actually exist, run a script that calls phpinfo() and scroll down to the section headed "filter". It looks something like this:

Figure 1
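The same check can be made from code instead of reading the phpinfo() page. A small sketch, using only PHP's built-in extension_loaded() and function_exists():

```php
<?php
// True when the filter extension is compiled in or loaded.
$hasFilter = extension_loaded('filter');

// True when the filter_var() function is actually callable.
$hasFilterVar = function_exists('filter_var');

echo $hasFilter && $hasFilterVar
    ? "The filter functions are available." . PHP_EOL
    : "The filter extension is missing." . PHP_EOL;
```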

If the filter section is missing, which can happen on PHP versions older than 5.2 where the extension was not yet bundled, and you are running your own Linux or Unix server, type the following on the command line:

pecl install filter

If you are running Windows, download php_filter.dll from http://pecl4win.php.net/ext.php/php_filter.dll and save it in your extensions folder. Make sure that the file matches the PHP version installed on your system, then restart the server so that PHP loads it.

So how do the functions work? You may have noticed that the function names are unwieldy and certainly a pain to type out. Another thing is that there are only seven filter functions, and of these only four actually do any filtering. filter_has_var(), filter_input(), and filter_input_array() all work with the data behind the superglobal arrays, such as $_GET and $_POST. Instead of passing the superglobal itself, you refer to it by its equivalent filter constant. Below is a list of some of the constants:

Table 3. Showing Constants.

Constant        Superglobal
INPUT_COOKIE    $_COOKIE variables
INPUT_GET       $_GET variables
INPUT_POST      $_POST variables
INPUT_SERVER    $_SERVER variables

This list is not complete, so consult your PHP manual or visit http://www.php.net for a full list of constants.
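As an illustration of how the constants are used, here is a sketch of a hypothetical delete.php that expects an integer id in the query string. The script name and the messages are assumptions for the example:

```php
<?php
// filter_has_var() reports whether the request contained the parameter at all.
$supplied = filter_has_var(INPUT_GET, 'id');

// filter_input() returns null when the parameter is absent, false when it
// fails the filter, and the filtered value otherwise.
$id = filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT);

if ($id === null) {
    $message = 'No id was supplied.';
} elseif ($id === false) {
    $message = 'The id must be an integer.';
} else {
    $message = "Deleting record $id";
}

echo $message, PHP_EOL;
```

Requested as delete.php?id=2 the script takes the last branch, while delete.php?id=abc is rejected by the filter. Run from the command line, where there is no query string, it reports that no id was supplied.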

