Creating a Table Using T-SQL
The most efficient way to create a table in SSMS is using T-SQL. Although tables can also be created using a visual wizard, that process is slower and harder to duplicate to make other tables.
The CREATE TABLE statement is a DDL (Data Definition Language) statement because it creates or modifies the structure of the database and of a table.
The main syntax of the CREATE TABLE statement is:
This is just the basic statement. A list of column definitions are required to design the full table.
Table Naming Rules
The basic table naming rules are simple: the name cannot start with a number or special character (other than an underscore), and the name should not be a keyword in SSMS.
Best practice in naming tables includes the following:
- No spaces in the table name
- Use natural words to describe the table. Use "BowlingScores" instead of "B234S"
- Keep naming consistent across all tables
Table names can be lowercase, camel case or some other combination. It is best NOT to use all uppercase names for tables.
The same basic rules apply to column names as to table names. Table definitions must also include a data type – at a minimum. Some data types require size or other information.
When defining string data types for columns, SSMS actually uses special names to more accurately describe the string type for each column. In all cases, the string data types must include the maximum length of that data that will be stored – or the word "max".
Here are the main SQL Server string data types:
- varchar(x): a column that can contain a variable-length string up the size of x
- char(x): a column that always contains a string of size x
- nvarchar: similar to varchar, but holds a Unicode string, including multi-byte strings
- nchar: similar to char, but holds a Unicode string, including multi-byte strings
Varchar is the most common string data type. A varchar column can contain a string up to the size specified.
So if a column for FullName is defined like this:
then any string up to and including 30 characters could be inserted in a record. A string of 31 or more characters would cause an error. The important thing about a varchar column, is that the actual length of the string value is the number of characters in the string. So if the value entered in the column is 'Sam Smith', the length of the data is 9.
This is different from a char column. A char column is always the number of characters equal to its length. Consider this statement:
If the string 'drm ' are inserted in the Initials column, the length of the data is 10. Any value of less than 10 characters would be padded with spaces. Because databases are more efficient when tables are "narrow" or smaller in size, using a char column is often not efficient. Char columns are excellent for things like Gender, where the data values are always the same size, but usually inefficient and not recommended for values where the size varies.
It is also possible to define varchar columns with a size of "max". This is available for cases where the size of the input may be very large and is not easily known. A good case is something like Comments or Descriptions. Max columns are meant to hold up to 8000 characters. But their use should be limited to cases where the data really may be very large. Names, for example, would not be a good case for using varchar(max). Varchar(max) should only be used when necessary.
The concept of a narrow table refers to a table that has a small number of rows and data that is relatively small. The meaning is not exact. For example, a narrow table might have 20 or fewer columns, where a wide table might have 50 columns. A narrow table would also tend to have data that is not overly large. A table that includes varchar(max) columns, or columns that store images, audio or other binary data, would not be narrow. Even if a table has only 3-4 columns, if the data it stores is large, it would not be considered a narrow table.
Why is narrowness important? Consider two tables:
- a table with a list of countries, that includes 3 columns, with the longest varchar column of 50 characters
- a table with a list of laws for each country, with 10+ varchar(max) columns, and one or more columns for images (binary data)
For a query involving countries, the database can load the entire table in memory and then sort and filter the data as needed. For a query involving country laws, the table might be too large to load entirely into memory. This means the database must manage loading the table in chunks, selecting records based on any filters, and then combining that with records from other sets of records (chunks). It is a much less efficient operation.
Narrow tables are more efficent for the database to process, and to link with other tables. There are many cases where it is not practical to control the number of columns and width of expected data to create a narrow table. But there are also cases, especially for tables that will see high usage, to consider narrowness in the design.
Database Design Details: Identity Columns
Before we finish defining this new table, we need to discuss two new topics: identity columns and primary keys.
Go back and run the original SELECT statement on the dbo.people table.
Look again at the results and you will see something we overlooked the first time. There is actually an additional column named "peopleid". Where did that come from? What is it for?
The peopleid column is a special column called an identity column. An identity column is a column that gets auto-numbered when a record is entered. There can be only one identity column per table.
We do not include the identity in an INSERT statement, because the database controls that column, and it automatically inserts the next number in the sequence.
Why do we use Identity columns? This is not a simple answer. The main reason we use an identity column is that we want a "key" value that identifies each row. We use the identity column to set the values for a PK column.
Why don’t we use something else for the key, like an SSN or a name or a code? The short answer is that EVERY other possible value might change. Even a social security number might be entered incorrectly, and need to be changed. Once we assign a primary key, we don’t ever want it to change.
The nice thing about identity columns is that the same value will only appear once in that table. Even if rows are deleted, the keys will never be used again.
The numbers assigned in the identity column are also NOT tied to the data - just the record. That is good when it comes to identity columns. We would rather have a unique, but random number as an identity value, instead of something that is tied to the data.
One more reason we like identity columns is that we do not have to manage the values. We don’t have to keep track of the last number we used for People and what the next number should be. The database takes care of it for us.
We will revisit this discussion again. But for now, remember that identity column are automatically filled in, and that we do NOT include them in the INSERT statement. If we try that, it results in an error.
More Table Design: Primary Keys
A primary key is the main way that the database organizes a table, fetches specific records, and links tables to each other. The primary key is an existing column that we also define as the primary key. The primary key must be a unique value. That way, when the database asks for the record with primary key 12345, there is only a single record that matches that condition.
If you remember the relational table example for Courses and Assignments, there was a column in both tables named "CourseKey". For the Courses table, CourseKey is the primary key. It provides a way for other tables, such as Assignments to link to one and only one record in the Courses table for the course it is assigned to.
Primary keys are meant to be short, which makes them fast to look up. There can only be one primary key per table.
They must be unique, and we would rather NOT have them tied to the data. This is called a "surrogate" or "synthetic" key. In other words, we usually don’t want to use one of the other data columns as a key.
Consider a people table. We might use an SSN number or a name as a key. Those are unique – right? No. Someone could enter an SSN number incorrectly. Then we need to add an SSN, that ends up already being in the database? Impossible. No.
What about names? Surely there could only be one "Thomas Haver Jonesy" we will ever come across. But what happens in the amazing situation that we come across two?
These types of keys are called "Natural Keys". They seem like a good idea and many people fall into the trap of using them.
But then they starting working on a really big, global project, and realize that Natural Keys are almost always a bad idea. They might work for one small table, but not for other huge tables? So why use them in the first place if we do not need to?
It turns out that we already have a great column to use for a primary key – the identity column!
By using an identity column, we end up with the perfect primary key.
But then someone will point out, but what if I don’t know the identity value for "Thomas Haver Jonesy", how am I going to get that record?
Well, when was the last time you had to know the VIN number of a car to look up a price? OR the product code of something you want to buy on Amazon?
Remember the diagram of the database layers, and the purpose of the UI and the report engine and the database?
As users, we normally do not use the primary key. It is there, in the web link we use to select a product. And the UI sends it to the report engine, which can quickly find the record you want and returns it the UI.
So, knowing the primary key is not import for us. It is important for the database and the report engine to organize and quickly access records. By following best practices and using a strong identity-primary key, we end up with the best database design.
Completing the Table Design
All the discussion about identity columns and primary keys was necessary for one reason: we ALWAYS want to include an identity column in every table, and we always want to make it the primary key. Normally, we want the first column to fit this requirement. That makes it easy to immediately see that there is a primary key and what it is. Fortunately, this is very easy to do.
Going back to the CREATE TABLE definition, let’s change it to look like this:
For now, we will also use the "int" datatype for the identity-primary key. In SQL Server, we can define a column as an identity column and a primary key, all on the same line, like above.
Should it really be named "TableID". No.
There are different standards and practices to use when naming columns. Some developers will always name they keys "ID" (which is generally a bad idea).
Let’s take a second to discuss naming.
We like to use simple, plain names in database design. If we have a table that is tracking pets, we can just name it Pets. Or a table holding courses can be named Courses. There is no need to overcomplicate things. Names should provide a good idea of what data the table contains. What does "XACCT462" contain? No idea. Bad name…
We normally want to avoid spaces, underscores, hyphens, numbers and any other special characters. There are rare cases where underscores are helpful.
Some people prefix tables with "tbl" or something similar. But prefixing every table with the same letters is the same as not prefixing any of them. This is an old standard that should not be used.
This same naming standard is usually applied to columns. Simple names are best.
Back to the identity column name. A good practice is to simply use a singular version of the table name and attach "id" or "ID" as a suffix. So, a Pets table might have a PetID as the identity column.
This points out that the table names are plural names. Pets, not Pet. Many developers name all their tables with singular names, supposedly because they represent a single object. A Pet. That sounds good, but in reality, a table represents a "collection" of things, not a single thing. We don’t say "give me the names of all the PET that need fed today." We say "pets". "Pets" is a more natural way to refer to a collection of pets. Plural table names are a more natural way to refer to collections of records.
That said, each record in a Pets table does represent a single Pet. So it makes sense to say "PetID", instead of "PetsID"
So, if we try to make our example more real-world, let’s switch to something concrete and design a Pets table. Now our CREATE TABLE example might look like this:
In addition to the identity column, I have changed the next column to "PetName" and made it varchar(30) – meaning it can contain any string up to 30 characters.
I also added the phrase "not null". Ok, so what is that?
NULL and NOT NULL
The term "null" means "nothing". If a column is NULL, it has no value at all; not 0 for a number or an empty string ("") for a string or varchar column.
If we define a column with NOT NULL we are saying that a value must always be provided for this column when a record is created. We cannot create a record and not provide some value for PetName.
This is actually a good idea. We don’t generally like columns that have null values. It is easier for everyone concerned (us as database developers, the UI developers) that all columns have some value.
But NOT NULL has another purpose. It means that the column is required. That makes sense in this case. Why would we want a Pet record without a Pet name? Let’s move on and we will see that not null gets more complicated.
How about some more columns:
|PetID||int||not null||identity primary key|
|,Weight||decimal (5,2)||not null||default(0)|
Okay. There are probably a lot of things we might track for pets – if we were building a database for pet store, for instance. So these are just some examples.
For Breed, the datta type is varchar(30). It could be there are larger breed names that would make us consider varchar(35) or varchar(50), etc.
The column definition also has NOT NULL, so we want to make sure there is some value in that column. But there are cases where it is possible we do not know the value that needs to be entered in a column, especially when we first create a record.
That is the purpose of the DEFAULT phrase. If a value is not provided when the record is inserted, the database will use the default value (the literal in the parenthesis). For MSSQL, we normally surround the default in parenthesis.
So what does this combination of NOT NULL and DEFAULT do? We said we wanted to have some value for very column, if possible. NOT NULL means we will not allow a null value, and DEFAULT provides one when we need it. And we don’t force the user to enter "Don’t know" or something else, just to make sure we have a value.
The Age column is defined as int, which makes sense if we only need to track years, for example. Defaults for number columns must be a number. The "not null", default combination accomplishes the same thing as Breed. If the user does not know the Age, the database will insert zero if nothing else is provided. That works.
Same thing for Weight, although this time we define the column as a decimal. Decimals are different from int columns in that they contain decimal places. The definition for Weight specifies (5,2) after the word "decimal". The 5,2 represents "precision" and "scale". Precision means the total number of digits. Scale means the number of decimal places. The decimal point is not included in these numbers. 5,2 means any number with up to 5 total digits, and 2 decimal places will be allowed. 999.99 qualifies. 99.999 does not (too many decimal places). 9999.9 does not qualify (the 2 decimals are a fixed requirement; there must always be room for 2 decimal places).
Again,NOT NULL and DEFAULT were specified. (We want ALL numbers to be not null. The exception will be keys – we’ll get to that later.)
Okay, the table definition above is complete. How do we actually create the table?
The entire CREATE TABLE statement is executed as one statement. So the entire statement should be selected (highlighted) (down to the ");" at the end) and executed as a single statement. That should create the table.
The table is now ready to have data inserted.
Recreating a Table
What if we made an error, need to add a column, or fix something else after the table was created? This is sometimes a difficult problem. In class, and in development environments in general, we can usually recreate the table and then repopulate any data we need.
The problem is that the table needs to be deleted, before it can be recreated. There is not a "create table and save data" command.
To delete a table, we have to DROP it. There is not a delete command. DROP is a DDL (Data Definition Language) statement because it affects the structure of the table.
This is a DROP statement:
The DROP statement cannot be undone. Once the table is gone, it is gone. Be very careful and double-check your statement and the table you are dropping is correct.
Once the table is dropped, you can execute a statement to create it again. Trying the run CREATE TABLE on an existing table results in an error. Nothing happens.
Trying to drop a table that does not exist is also an error.
The DIE Statement
Since SQL Server 2016, we can now use a DIE statement. That means "DROP IF EXISTS". The statement can be very handy. The syntax is:
This statement will check to see if a table exists before attempting to delete it. It avoids the error of dropping a table that does not exist. This statement is only available in version SQL Server 2016 and later.