CRUD refers to
- Create (insert)
- Read (select)
- Update (update)
- Delete (delete)
These are the basic operations required of any application to maintain data in various application tables. Database developers must often create a large number of stored procedures to support the CRUD operations to support an application.
There are many best practices for coding, both general and specific. General practices cover conceptual concepts that apply to most languages. One key practice in the Single Responsibility Principle (SRP). This guideline emphasizes that methods (modular components of code) should focus on one operation. With respect to CRUD, it means that there should be separate stored procedures for insert, update, delete and select. In other words, creating a stored procedure that does multiple operations is discouraged.
While creating larger, more complicated procedures that could insert, update and delete - for example, might seem like it would be more efficient, it rarely is. It results in fewer procedures to manage, but increases the complexity of maintaining and testing those operations. For example, if a single procedure is used to insert-update-delete, then a change to the code requires at least three tests to make sure it is working correctly.
Create - Inserting Records
In T-SQL, the INSERT statement adds a new row or record to a table. Adding a new record is a common operation required in many applications. One key difference between inserting and updating is that inserts to not require a primary key column value (for tables with an IDENTITY column. The other list of parameters is often the same - but not always.
For example, consider an application that keeps a list of tree species. Assume the table has a standard IDENTITY column to set the primary key. The table columns include
- treeid - int (PK)
- commonname - varchar
- scientificname - varchar
- imageurl - varchar
The Create operation requires an insert procedure. It would need three parameters for common name, scientific name and image url. A parameter for treeid is NOT needed, because inserting the record will create the PK value (and attempting to set the PK is not allowed for an identity column).
In this case, an update procedure would need to include all four columns, because the PK would be used in the WHERE clause to specify the record to change.
Take another example of adding a record to a Customers table. Again, assume the table uses a standard primary key as an IDENTITY column. In this case, the application is used by multiple stores - each of which have their own set of customers. The table design includes the following rows:
- customerid - int (PK)
- storeid - int (FK)
- customername - varchar
- emailaddress - varchar
The Create operation in this case requires an insert procedure with three parameters (storeid, customername, emailaddress). A parameter for customerid is NOT needed. Adding a record defines which store the customer belongs to.
The Update operation will need a parameter for the customerid. However, it will probably NOT need a parameter for storeid, since such an application would not have a way for a store to assign a customer to another store. This is an example of a case where the set of parameters between inserts and updates may use different parameters (other than the PK).
Not all applications need (or should) delete records. Consider an application that deals with health or financial records. Without a way to document when and how a record was deleted (and possibly why), deleting records would probably not be a good idea. It may be illegal in some cases - especially as it relates to medical data.
Consider a health clinic. A patient comes in, receives treatment, and dies on the way out. Do we really want someone deleting the records about that visit, prescriptions, etc?
If a Delete operation is required, it often requires only a single parameter - the PK value for the specified record. On the other hand, it is sometimes preferred to include more than one parameter, for safety reasons.
Take the example of the Customers table above. While a single value could be used to delete the record for a customer, what if Store A has an error an submits the wrong value? It might delete a record for Store B (and how do we know what happened?). So,, in some cases, it might be preferable to include more than one value - such as storeid AND customerid, just to insure that the correct record is being deleted.
Alternatives to Deleting Records
One simple approach to avoiding the deletion of records is to use a column to indicate whether a record is "active" or not. If the record is not active, it could be excluded from all operations and would be essentially invisible to the application.
Many times, such an approach is required. For example, take the Customers table example above. Perhaps the person moves or dies, and it is no longer necessary to maintain that record. But what if there are orders connected to that customer? Deleting the customer record might make it impossible to tell who those orders belonged to. If there are Referential Integrity rules in place (between Orders and Customers), deleting a customer record may not be permitted by the database. In these cases, a column to identify active and not active records might be required anyway.
Another option for handling deletions is to offload the records to another table, such as Customers_Removed. This can be accomplished in a Delete procedure, by adding a record to such a table at the same time it is deleted in the main table. Such a strategy may work for selected tables, but attempting to do it for all tables becomes complex and cumbersome.
In MSSQL versions since 2016, it is possible to track ALL changes - including deletions - in a companion table called a "temporal table". These tables track all change made to every row - in real time. This provides an automated tracking system where all changes are visible by date.
Lastly, the best practice for all databases is to maintain daily, weekly and monthly backups of the database, in case data needs to be recovered in an emergency. This also allows data to be recovered if it is overwritten or deleted by mistake. However, applications should NOT rely on backups for tracking things such as deleted records. Again, take the case of the Customers table. Mr. Smith places an order approximately every six months. If his customer record was accidently deleted five months ago, he no longer has an account (and perhaps all his orders were deleted, too). None of the backups in the last five months will have his information, either. A big deal? Maybe. Perhaps Mr. Smith was your largest customer and always placed very large orders. It would have been prudent to prevent customer records from being deleted at all.
Read Operations - Select Queries
The Read part of CRUD operations often includes many different procedures to query tables. Unlike insert, update and delete operations, which may only involve a single table, select operations most often include multiple, joined tables.
When supporting an application, it is common that some queries are very complicated. Although some queries may be very simple (list all customers for a given Store), other queries may require a long list of possible parameters.
For instance, consider a case where the application allows the user to query for a list of products based on name, price, category and color. A procedure would require four or five parameters - many of which may not contain a value. How can this be handled?
Assume the application is using the Customers table from the examples above. So, the query will need to include the storeid, in addition to the other parameters. Let's start with the following parameters for a "search" procedure:
- storeid - int
- productname - varchar
- price - smallmoney
- categoryid - int
- color - varchar
We are assuming that the category is defined as an FK to some type of Categories table, and the value we receive from the application is actually an int (from a dropdown list, perhaps).
If we consider the data that we actually receive, it might look like this:
- storeid: 1234
- productname: 'shirt'
- price: 0
- categoryid: null
- color: 'gre'
How do we handle partial and missing values? storeid would be a required value to include only products for the specified store. In this case, productname and color only have partial values. Those can both be handled using LIKE clauses, such as
But what if the user did not specify a product name or color? The query has to consider that possibility as well. This can be accomplished by using more complicated OR clauses:
Notice that there are now a series of AND clauses, each of which includes the option that the parameter value is empty/blank/missing OR has a value.
Handling the additional parameters can be handled the same way:
Notice that in one case (price), the default (missing) value is 0 (zero) and in another case (categoryid) it is NULL. This is where standards are important. What types of values will the application submit to the database? It would be just as common for an application to return NULL for a missing price or a zero for a missing category. The application and database developers must be on the same page with respect to how data will be exchanged, and how these types of situations will be handled. Note that the last query statement above would NOT work if these procedure received NULL for the price parameter instead of zero. Worst case, the application sometimes sends zero and sometimes sends NULL for the same case (the user did not specify a value for price).