Joining Tables

JOINING Tables Using the SELECT Statement

To this point, everything we have done with SELECT involved only a single table. But relational databases are focused on how tables relate to each other, so many queries involve multiple tables.

The JOIN keyword links one table to another through one or more common columns. The purpose of the JOIN command is to effectively make multiple tables appear to be a single table. In fact, we consider the results of a query to also be a “table”. This results table only exists in the way the data is retrieved, but we still consider it a table.

Consider two tables that are designed like this:

Two Tables to JOIN

Figure 1: Two Tables to JOIN

The departments table has a departmentid column (which is the primary key), and a departmentname column. We abbreviate primary key as PK.

The employees table has a fullname column and a departmentid column. The departmentid column is considered a “foreign key” or FK, because it provides the link to the departments table.

In relational database design, FK columns link or “point to” PK columns. This links or “relates” the records in the two tables. How does this actually work?

Using a SELECT statement, the JOIN keyword provides these links. Here is what a SELECT statement would look like for these two tables:

Query Results from JOIN Statement

Figure 2: Query Results from JOIN Statement

Notice the query results look like a single table that includes all the columns from both of the source tables (Employees and Departments). The employee table columns are listed first because employees was the first table listed in the SELECT statement. (And because we used * to list all columns).

Here is another example with individual columns listed:

Sample JOIN Results

Figure 3: Sample JOIN Results

The results are actually the same, but only the requested columns are returned.

The JOIN Statement in More Detail

Examining the JOIN statement in more detail, there are two parts that must be included. The JOIN keyword identifies which tables are being linked or joined. The order of these tables can be important, but we will analyze that issue later.

The ON keyword identifies which columns in each table are used to match records. The columns must match by data type, such as int to int. It is NOT a requirement that the columns are named the same. It is also not a requirement that the columns be a PK and FK. In a good database design, the columns would most often be PK and FK columns, but it is not required.

Another Example of JOIN Query

Figure 4: Another Example of JOIN Query

It is also possible to use multiple column pairs in the ON clause to further control how the records are joined. We will see examples of that later.

Table Aliases

One of your first reactions to the JOIN statement above might be “Wow, that statement is a lot more complicated!”. The example is not really more complicated, but the table names make it look so.

Fortunately, SQL Server has a feature called “table aliasing” that let’s us assign an abbreviation to each table, rather than always use the full schema and table name.

A table alias is one or more letters to make up a short abbreviation that represents that table. Aliases cannot be keywords, but can be almost any other word.

One option is to use a single letter to represent each table. For example:

Using Table Aliases

Figure 5: Using Table Aliases

In this example, the letter “e” has been used as the alias for the employees table and the letter “d” has been used for departments. Once an alias has been assigned, it is used everywhere else in the statement to represent the associated table. So the aliases are used to prefix the column names and as part of the ON clause. Table aliases make the statement much more readable. One thing to point out about JOIN statements in particular, is that it is a best practice to prefix all of your column names. Although SSMS will understand which table each column name belongs to when they are unique names, tables may change and the SELECT or JOIN statement will fail when SSMS cannot interpret things. Consider the following example. The table aliases have been left off each of the column names. The columns departmentname and fullname only belong to one table each, so SSMS can figure out the respective tables. But there is a departmentid column in both tables, so it is unclear to SSMS what is being asked for in this case. This statement will not run until the errors have been resolved and SSMS understands what table each departmentid column belongs to.