Subqueries

Introduction

A sub-query is a SELECT statement that it is executed within another statement. These are sometimes called an INNER query, and the main statement is called then OUTER query. The purpose of a sub-query is to retrieve data related to each row of the main query.

Simple Example

A simple case where a sub-query is useful, is obtaining an aggregate value of a related or child table. For example, given the following tables, how would one find the most expensive product in each department?

Sample hardware tables example

Figure 1: Sample Hardware Tables

This is a relatively complex question. There are at least two solutions. The first is to JOIN and GROUP the tables:

select d.departmentname, maxprice = max(p.price) from hdw.departments d left join hdw.products p on p.departmentid=d.departmentid group by d.departmentname order by d.departmentname

A second solution is to use a sub-query.

select d.departmentname, maxprice = (select max(price) from hdw.products where departmentid=d.departmentid) from hdw.departments d order by d.departmentname

Using a sub-query in this case is the simpler solution. As you can see from the example, the sub-query is a complete T-SQL SELECT statement. It is "nested" within the outer query. A sub-query can normally not stand on its own because it is related to each row from the outer query. In the case above, the departmentid of the hdw.departments table is used to define what data to query from hdw.products.

Both of the T-SQL statements above return the same results:

Subquery results example

Figure 2: Sub-query Results Example

Sub-Query Rules

There are several rules for using a sub-query:

  • It must be enclosed within parenthesis
  • It must only return a single value; a sub-query cannot return multiple values or a list
  • Cannot include an ORDER BY clause unless TOP is used

More information on the use and specific rules about sub-queries is available at this link: Subqueries (SQL Server)

Inserting Rows Using a Sub-Query

One of the most valuable uses of a sub-query can be using it to link to primary key values when inserting data into a table. Take the case of wanting to normalize data from an import source, where data a single table needs to split into multiple tables. For instance, assume a table like the following:

Table of countries to import

Figure 3: Sample Table of Countries to Import

This table is not normalized and should be split into Continents and Countries.

The basic T-SQL to create the new tables would look like this:

T-SQL statements to create tables

Figure 4: Creating Tables to Normalize Country Data

Populating the new Continents table would be easy. A single INSERT statement, using a DISTINCT clause in the SELECT statement will provide the values:

insert into xyz.continents (continentname) select DISTINCT continent from xyz.tempcountries

But how to populate the new Countries table? Each country needs to be associated with a continent using a foreign key value. Because the new Countries table enforces referential integrity with an FK constraint, each country that is inserted must be associated with a valid ContinentID value form the new Continents table.

Using a sub-query is the common solution to this problem. The sub-query can associate each country record in the import table with the appropriate continent in the new Continents table - by the continent name. The sub-query retrieves the associated ContinentID and uses that as the FK value for the new record in Countries.

Sub-query to insert related data

Figure 5: Sub-query to Insert Related Data

Here is a graphic representation of how this works. For each record in the import table, some data is inserted directly into the new table. If a column requires a foreign key value (such as ContinentID), the sub-query matches the appropriate string columns from the import table to the FK table, and retrieves the ID value needed for the new record.

Flow of a sub-query insert

Figure 6: Flow of a Sub-query Insert Statement

Conclusion

Sub-queries can be a valuable approach to retrieving data that might otherwise require a more complex solution.