Data Types

One of the key concepts about how databases are designed revolves around data types.

There are general and specific types of data. For instance, in general, we refer to letters and words as a "string". This is same concept used in all computer languages. Strings are different from numbers, which must not contain anything other than 0123456789+-. (decimal point).

Numeric data is separated into whole numbers (without decimal places) and decimals (with decimal places).

In SQL, there are several specific categories of strings, numbers and other types of data. These are called "data types". Each column in a table is defined with one, and only one, data type.

Strings

There are two main SQL data types for a string:

  • varchar (variable-length character string)
  • char (fixed-length character string)

A varchar is a string whose actual size, or length, is the number of characters it contains. For example, the string 'Canada' contains 6 letters. The length of this string is 6. We define varchar's with a maximum size, such as 40, which means the varchar can hold up to 40 characters (but no more).

A char is a string whose length is fixed. If we define a char with a maximum length of 4 characters, the length is always 4 characters. If we assign a char the string 'CAN', it is converted to 'CAN ' (with a trailing space), so that the value always has 4 characters (including spaces, as needed to fill out the size).

When we define a table column as a varchar or char, we also define the maximum width that the column can hold. So, if we might define a column as CountryName varchar(40) or CountryCode char(3) meaning each record can contain a string in the CountryName column up to 40 characters, and a string in the CountryCode column of 3 characters (it will be filled with spaces if we set it to less than 3 characters).

If we try to put more than 40 characters in the CountryName column OR more than 3 characters in the CountryCode column, there will be an error. The maximum size we can define for a varchar or char column is 8,000 characters.

In T-SQL, when we refer to the value for a string, we use single quotes. Some computer languages also allow the use of double quotes to define a string value, but T-SQL only uses single quotes. If you surround a set of numeric digits with single quotes, such as '911', then the value is a string – NOT a number.

The important concept here is value. The value of a column in a row is the actual data it contains. The column might be a string data type, but that does not describe the value. The value is the actual data. Here are examples of string values:

  • 'Steve'
  • 'bike'
  • 'pizza'
  • ' ' (a space)
  • '' (an empty string)

The data type for a column defines the permitted values.

Numbers

There are several main SQL data types for a number, although for now only two are important:

  • int (a whole number - no decimal places, positive or negative)
  • decimal (a number with decimal places, positive or negative )

Numbers in T-SQL are NOT enclosed in single quotes. Examples are:

int (integers)

  • 23
  • 14056
  • -55
  • 0

decimals

  • 23.45
  • 2045.2837
  • -99.88
  • 0.00

When defining numbers in T-SQL, parenthesis are used for decimals, but not integers:

Age int or Weight decimal(6,4)

For decimals, the parenthesis are required because the definition must also establish the total number of digits and the number of decimal places. In this example, the Weight column has 6 total places, and 4 decimal places. This will be covered in more detail later.

Other Data Types

T-SQL has many other data types, only a few of which will be used in this course. The main ones are

  • date: a valid date
  • datetime, smalldatetime: a valid date that also includes the time
  • bit: a true-false type that may only contain 1 (true) or 0 (false)
  • smallmoney: a number type that is really a decimal

Dates

Dates in T-SQL are defined using the data-type name, such as date and datetime:

BirthDate date or MeetingDateTime smalldatetime or RunStart datetime

Date types are treated like strings when we define their values, such as '2018-01-05'. The "year-month-day" or "yyyy-MM-dd" format is referred to as an ISO Date, and is the preferred format for defining date values. An ISO date always included leading zeroes, so that the value is '2018-01-05' and NOT '2018-1-5'.

The date type can only contain a date. The datetime and smalldatetime types also contain the time, in the format "hours:minutes:seconds:milliseconds" or "hh:mm:ss:nnn". The different between datetime and smalldatetime is that datetime will include milliseconds and smalldatetime only contains seconds.

If the time is not specified for a date value, it defaults to zeros, such as '2018-01-05 00:00:00'.

In this course, the time will never need to be set manually for any date-time columns. Automated functions will be used to set the date-time in some cases, but it will never need to be set or updated.

Later in the course, another date-time column will be introduced to store the date-time with time-zone information. This is called a datetimeoffset.

Bits: a Boolean Data Type

A Boolean is a true-false, yes-no data type that exists in all computer languages. In other languages, a Boolean sometimes contains a value of "true" or "false", or "Y" or "N". In T-SQL, the bit data type is the only Boolean data type, and it may only contain a 1 or 0.