Intro to Entity Analysis


Entity Analsysis is the next step in defining the database needs for a project. An entity is anything we think we need to keep track of. For example, in the case of Maria's Dog-Sitting Service, the initial information about the project suggested that there was a need to track customers, dogs and appointments. Each of these would be considered an entity.

There was also implied requirements that the application might need to track payments, and possibly locations. Each of these would be considered an entity as we explore and decide what the database needs to track.

In some ways, Entity Analysis requires taking the scenario analysis to another level, concentrating specifically on what types of data is needed for the application, and grouping that data in a way that makes sense for the application.

An Entity Analysis attempts to determine what types of entities need to be tracked in the database. The purpose is to answer the question:

What types of information do we need to track?

An Entity Analysis starts at a very high level, attempting just to identity very general entities. For example, if the client runs a pet store, it might be easy to quickly identity pets and customers as obvious entities.

Exploring Possible Entities

From this analysis, can we pick out possible entities? What does a database for Pawsit need to keep track of? The obvious entities are probably

dogs, appointments and customers

Some other possible entities might be

schedules, foods, medicines, special services, and payments

Finally, two special considerations might be

locations and pickup people

Let’s start with the obvious ones, which also seem to be the main ones: customers, dogs and appointments.

One technique that is helpful in identifying and exploring entities is to map them out visually. This can be done on paper or in an application like Word. There are also special applications, like Microsoft Visio, that provide special Entity diagrams. For now, we'll start with the simple diagram below. This could also be done on paper.

Draw Some Boxes for Simple Entities

Figure 1: Simple Entities for the Dog-Sitting Service

Understanding How Entities are Connected

Once we have started identifying the entities we might need, we also want to understand how they are linked.

For example, we are pretty sure that each dog belongs to a Customer. So, dogs and customers are linked.

We might need to know more details before we understand how certain entities are connected, but this one seems obvious. In fact, we can even describe or define the relationship. We can say Customers "own" Dogs.

Let's go ahead and define all the relationships for the entities we have:

  • Customers own Dogs
  • Customers make Appointments
  • Dogs go to Appointments

These are called relationships. In other words, how the entities relate to each other. You will notice that each outline item above uses a verb or action verb to define the relationship. This is a standard way to define relationships.

Now that we have suggested some relationships, let’s redraw our boxes and connect them:

Entities with simple relationships defined

Figure 2: Simple Entities with Relationships Defined

This is a great start. Not only do we have some strong entities, we have connected them with relationships. This is a basic Entity Relationship Diagram (ERD). An ERD is a diagram that displays entities, attributes and relationships, and illustrates how the entities are related. This can be used to establish the requirements for a database design, and is often used to define the database tables.

Taking a Second Look

Now, let’s reconsider our entities and their relationships. Our relationships are very strong. But they have one thing in common, which is that object A "does something" to object B. That is usually the first and easiest way to describe a relationship.

But when we are working with entities and relationships for a database, we often want to turn those relationships around. It works better to use a passive voice. The passive voice is a way to express a relationship where one object "is acted upon" by another object, such as Object B "is acted upon" by Object A. (instead of Object A "acts on" Object B).

A good example is

A customer makes an appointment

We can turn this around by saying

An appointment is made by a customer

The two statements mean basically the same thing, right?

But in an Entity Analysis (and in a database), it is helpful to look at it the second way.

If we turn around each of our relationships, we end up with:

  • Dogs are owned by Customers
  • Appointments are made by Customers
  • Appointments are made for Dogs
(Using passive voice is not an easy task. Don't get frustrated if it does not make sense at first. But practicing now will pay big dividends later when we actually start creating the tables. )

Here’s how the boxes look after we have renamed the relationships. Notice that besides just restating the relationship with a passive voice, we changed the direction of the arrows; that's the important part.

ERD with Passive Relationships

Figure 3: Entities with Passive Relationships Defined

Again, the relationships really mean the same thing. But we have changed the perspective of how the relationships work, and this changes the direction of the arrows. We’ll see next why this it helpful.

Adding Detail to the Entity Analysis

Although we identified several other possible entities for this project, let’s move forward with just our first three, to see how we work with them, and then eventually use entities to help design our database.

Once we feel we have identified strong entities for our project, and tried to understand the relationships between them, we want to add some details to these entities.

The first thing we can do is give each entity an "identifier". As we first conceived of each entity, we normally think of them in terms of "collections". When we say "Customers", we mean a collection of people that are customers.

As we start to think about the details of each Entity, we want to change our perspective and start thinking in terms of a single instance of that entity. So instead of Customers, we want to think about a single customer. What does a single customer look like?

Well, a single customer would probably have a first name and last name, or maybe just a full name; an address, a telephone number and email address, etc. We could use all of these to identify a single customer.

But when we are working with database tables, we normally want a simple key to identify each record. A key is a simple value - a number - that we can use to uniquely identify each record.

Why? Well, there might be two Bob Jones’s; some people don’t have a telephone and don’t have email. Overall, it is easier in the database to have a simple way to identity every record without having to worry about all these other issues. So, we normally create a numeric key for each record. Perhaps the ID value for Bob Jones from Muncie, IN will be 1001. So, when we want the record for Bob Jones from Muncie, IN will simply ask for the record with the ID of 1001.

We want an identifier as the main or primary way we will find each record. To do that, we will create an "ID" key for each entity, using the entity name as a prefix. So: dogid for Dogs, customerid for Customers and appointmentid for Appointments.

These ID’s are the first "attributes" we have defined for each entity. An attribute is a "characteristic" of an Entity, such as a name, color, weight, size, address, etc.; something that describes the entity in some way. The entities will eventually have other attributes like "First Name" and "Address". But for now, the ID attributes are a good place to start.

Let's add our ID's and flag them as a primary key (or PK) for each entity. A primary key (PK) is a single column in each table that we will use as the main value to access each record. Although it is possible to use something other than a number for a primary key, using a number is the most efficient data type for a primary key. The PK value for each record is unique for the entire table.

While we are at it, let's rename the entities to use a single name, since are focusing on how an individual instance will look. That will help us moving forward.

If we add these changes to our boxes, what do we end up with?

Figure 4: Entities with Primary Keys Defined

We can visually see how our entities relate to each other. But how will we connect each instance of one entity to an instance of another entity? If a dog named "Scoop" is owned by customer Bob Jones, how will be know?

The answer is that we will use another type of key - a Foreign Key (FK). A foreign key is a column in a table whose values will "point" at and link to a PK in another table. We already know that Dogs are owned by Customers. If we add another attribute to the Dog entity, we could link it to the dog owner, the Customer. To do this, we can add a field named "customerid" to the Dog entity.

How we decide on foreign keys is easy. If we have a relationship between entities (an arrow between them), then we need a foreign key. The FK is always on the side of the relationship without the arrow, and it always points to the entity with the PK. For now, the fields in the two entities will always be named the same. The FK represents a link (in terms of data) between two records.

We can now see the value of using the passive voice to define our relationships. It makes it easy to assign foreign keys to our relationships.

Figure 5: Entities with Primaryand Foreign Keys Defined

We are now ready to look at creating the actual database tables for our application. We will use the ERD to guide our design and to make sure we create and assign the right keys to our tables.