Query Filters and Ordering

Review of a Simple Query

While the SELECT statement is the only command used to retrieve data, it is a very flexible and powerful command that has a great many variations.

To review, we often use a simple statement such as

Simple SELECT Statement for People Table

Figure 1: Simple SELECT Query Statement

The asterisk (*) in this statement means "all columns, in all rows". This is a common way to quickly query the data for one or more tables. The query results will include all columns, in the order they exist in the table. As seen in the table structure in the Object Explorer panel, the peopleid column is a primary key (PK). By default, the rows will normally be in order of the primary key. Because of the way the table was created, the PK is also an auto-number column (called an "identity" column), and the rows are displayed in the ordered they were inserted into the table.

Filtering Columns

While the SELECT * query is handy, we normalize do NOT want to use SELECT *, other than cases of wanting to quickly looking at a table's contents.

The preferred approach - and a Best Practice - is to always list the specific columns to request. The statement below is an example.

SELECT Statement with Selected Columns

Figure 2: SELECT Statement with Selected Columns

As illustrated, the query results display only the columns we requested, in the order we requested them. (The numbers in the left-hand column are a counter for each record; these are not the values from the peopleid identity column.) The asterisk has been removed; it is not included when individual columns are requested. This is the preferred query syntax.

Sorting Results: The ORDER By Clause

To change the sequence of the records in the query results, we can use the ORDER BY clause. This clause is always the last part of a SELECT statement, if it is included.

The ORDER BY clause includes a list of columns that should be used to sort the query results. More than one column can be included, and the results will be sorted on the first column listed, then on each other column listed.

This statement will sort on gender, then fullname. The results are sorted first by gender, where all the F values are placed before all the M values. Then, within each sorted group for column 1 (gender), the results are sorted by column 2 (fullname).

SELECT Statement with ORDER BY Clause

Figure 3: SELECT Statement with ORDER BY Clause

Sorting in Reverse or Descending Order

For each column in the ORDER BY clause, it is also possible to specify a direction. The default sort order is ASC, for Ascending. ASC does not need to be included since it is the default. To specify a reverse order, DESC is used to stipulate a Descending (reverse) order.

For example, to request the same records, but specify that men should be listed first, the query would be as shown below.

SELECT Statement with ORDER BY Descending

Figure 4: SELECT Statement with ORDER BY Descending

In this example, gender was sorted in descending or reverse order, with M before F. Notice, however, that the records were still sorted in ascending order for fullname (column 2).

Using TOP Keyword Qualifier with SELECT

If we want to restrict the number of rows returned from a query, we can use the TOP qualifier. The "TOP" keyword is used with the number of rows we want included in the results.

For example:

SELECT top 3 * from dbo.people

would return three rows.

The TOP qualifier can be used with any other version of the SELECT statement, so

SELECT top 3 age, fullname from dbo.people ORDER BY age;

works the same way.

The TOP qualifier can be especially helpful on tables that have an identity column, because the identity column values are always in order. Finding the last record that was inserted is as simple as

select top 1 * from dbo.people order by peopleid desc;

No error is caused if there are fewer rows in the table than specified in the TOP qualifier. For example, a statement like:

SELECT top 99 age, fullname from dbo.people ORDER BY age;

would return all the table rows if there are fewer than 99 rows.

The graphic below illustrates using the TOP qualifier:

SELECT Statement with TOP Qualifier

Figure 5: SELECT Statement with TOP Qualifier

Filtering Rows: The WHERE Clause

When using the SELECT statement, we often want to filter our results. Filtering is different from limiting the number of rows. Filtering means only returning rows based on some criteria or conditions. Filters are defined using the WHERE clause. The WHERE clause uses comparison conditions that evaluate to true or false, and records are only included if the comparison evaluates to true.

The syntax of the WHERE clause is simple:

SELECT columns from schema.table WHERE columnname search condition value

The WHERE clause follows the table name in the SELECT statement.

The WHERE clause poses a comparison that is evaluated for each row. If the evaluation is true, the row is included in the results. If the evaluation is false, the row is skipped and not included in the results.

Simple search conditions are things like

= equals
> greater than
< less than
>= greater than OR equal
<= less than or equal
!= not equal
<>not equal

For example, suppose we want to return results only for the women in the people table. The SELECT statement using a WHERE clause looks like this:

select * from dbo.people WHERE gender='F';

The search condition is "=". The database engine will examine each row in the dbo.people table and compare the value with the character 'F'. Value refers to the content of that column in the table. Since the gender column is a char data type, we expect the value to be a string. A single character is also considered a string. The character 'F' that is being used to compare with each column value is called a literal. A literal is a number or string we type in the statement. If we intend for the literal to be a string, we must enclose it in single quotes. In a search comparison, we can only compare strings to strings and numbers to numbers.

Note that the literal '5' is also a string - because it is surrounded by single quotes. That would be acceptable, if that is what we intended.

This statement will produce the same results as the one above:

select * from dbo.people WHERE gender='f';

SQL Server is not normally case-sensitive when it evaluates data; it is "case-insensitive". (It can be configured to be case-sensitive, but rarely is).

In this example, the operator is the "=" sign. Notice that the comparison value must match the comparison column data type. In this case, gender is a char (string) column, so the comparison value must be a string (surrounded with single quotes).

The same type of filter could be defined for a numeric column, such as the following:

select * from dbo.people WHERE age=44;

Numeric values are NOT surrounded by single quotes. The WHERE clause can use many types of comparison operators, such as >, <, >=, and <=. For example, to see all records for people older than 44 would be:

select * from dbo.people WHERE age>44;

Note that this query would NOT include people that are the age of 44, because the condition is only for people that are older than 44.

SQL Server permits two types of operators for "not equal": != and <>. They work exactly the same way and either one is acceptable:

select * from dbo.people WHERE age!=44;

select * from dbo.people WHERE age<>44;

Using And & OR in the WHERE Clause

Comparison phrases can be combined in the WHERE clause using AND and OR. The AND keyword adds a comparison to the search condition. When AND is used, both comparisons must be true for a record to be included in the results. Consider this example:

select * from dbo.people WHERE age=44 AND gender='F' ;

In this case, each record is evaluated against both comparisons. If age=44, and also gender='F', the record will be included. If either comparison is false, the record is NOT included in the results.

The WHERE clause can also include an OR keyword. In this case, records are included if either comparison is true. For example:

select * from dbo.people WHERE age=44 OR gender='F' ;

Records evaluated in this query will be included if the age = 44. They will also be included if the gender = 'F' – regardless of the age. The results using AND and OR are normally very different.

It is also possible to use both AND and OR. This normally becomes a much more complicated query in terms of logic. It is recommended that anytime a query uses both AND and OR that parenthesis are used to make the intended logic clear.

Consider the following example:

select * from dbo.people WHERE age=44 AND gender='F' OR age< 44;

Should the records be for 44-year old females OR females less than 44 and anyone else that is 44? It is unclear. This query makes it clear:

select * from dbo.people WHERE (age=44 AND gender='F') OR age< 44;

There is no limit to the number of comparisons that can be added to a WHERE clause. If the comparisons are all AND or all OR, parenthesis are not needed. But if the comparisons are mixed, parenthesis should always be used.

Short Circuiting

The order of multiple search conditions in a query normally does not matter. If parenthesis are used so that complex sets of conditions are clear, SQL Server will evaluate all of the conditions. However, coders may want to consider the order of conditions to assist with the speed of the evaluations.

If multiple search conditions are only combined with the OR operator, then SQL Server must always evaluate all of the conditions. However, if one or more of multiple conditions include the AND operator, the conditions are evaluated in order. And if any comparison is false, the rest of the conditions do not need to be evaluated (with ALL, any false condition eliminates the row from being included in the results).

This is called "short-circuiting". It means that if the least likely true conditions are placed first in a set of multiple conditons, the database will evaulate and skip the false conditions more quickly.

Here is an example:

Assume a people table has 1 million records of people from all over the U.S., distributed in roughly the same way as the overall population in terms of state residence. Take the following queries:

select * from dbo.people WHERE age>18 and age<=65 AND statecode='IN';
select * from dbo.people WHERE statecode='IN' AND age>18 and age<=65;

There are three conditions. In the first statement, only a small number of records will be eliminated based on age. A much larger group (97% that are not Indiana) will be eliminated based on the state code.

In the second statement approximately (97%) of the records will be eliminated by the first comparison and the rest of the statement will be ignored. For any single state code, the majority of the records will be eliminated.

For a table of only 1 million records, the difference in performance between these two statements would be minimal. However, if the table contained 300+ million records, and if the list of conditions were much longer, there might be a considerable difference in how long the query takes to complete.

Being aware of how short-circuiting works can be helpful when dealing with very large datasets and very complex queries.