Database Design Basics

Database Roles

There are three basic roles of people that interact with a database.

  • The DBA (database administrator) is primary concerned with maintaining the hardware (servers, network, connectivity) for the database, the security to access the database, and managing the processes of backing up and restoring the database. In some cases, this role may be responsible for managing multiple database servers and multiple databases.
  • A Database Developer is often the person who decides how to organize and store data, and how the data can be accessed through queries or programs. This person may or may not share responsibility with a DBA for security and other tasks to manage the database.
  • A User accesses the data using an API or a UI. The user may be part of the same organization that owns the database, or an external person (such as a shopper) that interacts in some fashion with the database information through other applications.

This course is primarily focused on the Database Developer role. Some topics are related to the work of a DBA, but they are not discussed at a level of detail needed to train a DBA.

Database Design

It has already been discussed that a database consists of one or more files. The internal, technical layout of those files vary by brand, and are beyond the scope of this course.

We are interested, however, in how we can conceptualize a database. What does a database "look" like?

In their simplest form, databases consist of "data sets"", collections of data that are organized and grouped around common topics. For example, we might have a set of index cards for recipes and a file drawer of folders about tax information.

These data sets may take various forms, such as documents or lists or spreadsheets.

Spreadsheets are a common way to maintain lists of data, and provides a simple way to organize the data in worksheets with rows and columns.

The most common way to conceptualize a data set in a database is in terms of rows and columns, very similar to a spreadsheet. This type of dataset is referred to as a database table. Tables are sometimes also referred to as "tuples" or "relations" (however, these are more academic terms, and you will rarely here teams members talk about "tuples"). The term table will be used throughout this content.

A database table consists of rows and columns. We can conceptualize it like a spreadsheet. An example might look like this:

Database Table Example

CountryName CountryCode Continent Population
Canada USA North America37279811
Mexico MEX North America132328035
United StatesCAN North America329093110
France FRA Europe 65480710
Japan JPN Asia 126854745
Brazil BRZ South America212392717

Just like a spreadsheet, each column represents a different piece of data, or data element. Each row represents a group of data, or a collection of data elements. In this example, each row represents a separate country. Rows are also called "records", and the two terms are interchangeable.

Each column represents the same type of information for each row. For example, the second column is a 3-letter country code. Each record in this table uses the second column to store the country code.

Columns are also referred to as "attributes" or "fields". (The term "field" should be avoided in this context, because application forms contain "fields". The two are not the same thing. So, using the term "column", instead of "field", helps avoid this confusion.)

This looks like a spreadsheet, right? And we can think of tables in the same way. But there is a a key difference in how a spreadsheet works and how a database works. In a spreadsheet, there are often very few rules about what can be entered into a column. For example, a spreadsheet example of the grid above might have entries like:

Spreadsheet Example

CountryName CountryCode Continent Population
Canada USA North America37279811
Mexico MEX North Americaunknown
United StatesCAN North America329093110
France FRA Europe (more than) 65480710
Japan JPN Asia 126854745
Brazil BRZ South America212392717

In the spreadsheet example above, the word "unknown" is not a number, but it has been entered in a column that is mostly numbers. The same case exists for the population of France – which looks like numbers, but includes parentheses and spaces.

These entries are not possible in a database table. Each column in a table must contain only one data type, such as a word or number.

(A quick note about the population numbers in the database table example. You might have expected the numbers to look like "37,279,811", but commas are not numbers; they are a way to display or present numbers. It would be up to the UI to display these numbers in a certain format. The database only stores the actual numeric data. )

The important concepts to remember about tables are

  1. a row contains a set of data for a single "instance" of the data, in other words, all the data describing a single country
  2. a column contains the same data for every row or record
  3. each column may only contain a single type of data, such as a word or number

Communicating With the Database

As noted above, we do not interact directly with a database. We work through a UI or an API (which works with the database engine).

This is similar to the way we dine out. We don't go to a local restaurant, make our way to the kitchen, and cook our own food.

We view the menu and tell the server what we want to order based on the menu. Someone in the kitchen prepares our meal and the server delivers it when it is ready. Perhaps we decide to order "pizza", and the server tells the cooks to prepare Margherita or Marinara or Pugliese, because the cooks need to know specifically what to make.

For a database, we use a UI or an API - which is our waiter or waitress. The UI or API takes our order, translates it into database code, submits our order, and delivers the results back to us.

For example, our request might be "What countries start with the letter C?". This must be translated into something the database engine can understand. The translation might be something like: select CountryName from countries table where firstLetter='C'

This is the type of request or query that many databases would understand. A query is a database request written in the syntax for the target database. For many databases, the query is written in a language called Structured Query Language, referred to as "SQL" – and also pronounced "Sequel". SQL is an open-source coding language to communicate with relational databases. It is also referred to as "ANSI SQL" to distinguish the standard, open-source version from customized versions from Microsoft and other companies.

See here for more information on SQL:

This course is mainly concerned with using SQL as part of the Microsoft SQL Server (MSSQL) database. While SQL is a generic database language, Microsoft uses its own version of SQL named T-SQL. T-SQL is Microsoft's database coding language, based on standard ANSI SQL, but also including customized functionality targeted at MSSQL. That is the coding language that will be discussed here.

An actual SQL query to retrieve a list of countries might look like: select CountryName from countries where LEFT(CountryName, 1)='C'

In this case, it is understood that the query is searching a table, and the correct syntax to check the first letter is LEFT(CountryName, 1)

The word syntax refers to the correct way to phrase a statement. Every computer language has its own syntax. T-SQL shares a syntax with ANSI SQL, and is used to writes queries and other statements for MSSQL. T-SQL syntax includes:

  1. the rules of how to write a query
  2. the keywords to use and how to use them
  3. the order in which to structure the keywords, operators and other parts of the query

Understanding the correct syntax to query the database is a large part of learning how to interact with the database, and learning how to code programs that can accomplish the same thing. Future articles will cover T-SQL syntax in detail.

Some important syntax rules for T-SQL:

  • keywords are not case sensitive; there is no difference between using "SELECT" and "select"
  • the order of keywords IS important
  • string literals must be enclosed in single quotes; double quotes are not allowed to define strings
  • does NOT require a semi-colon to end each statement - but it is recommended
  • coding statements MAY be placed on multiple lines: spacing is NOT important