Data Security: Protecting Data

Introduction

Data security involves strategies to prevent data from being seen or retrieved by hackers or others who should not have access. Sometimes, this comes down to decisions about what data to maintain and do not store

In general, best practice is that applications should strive to maintain only data that is actually needed and actually used. In other words, applications should not collect or store unnecessary data. For example, unless there is a purpose to collect and store a user’s employer name, such a column should be removed from the database.

This can be especially relevant for two categories of information: Personally Identifiable Information (PII) and Protected Health Information (PHI):

PII is data that can identify someone, such as a social security number.

PII also includes data items such as an employee ID number, but may also include a set of data that can be combined to identify someone (address, phone number, job title, badge number, etc.).

While "John Smith" does not identify an individual, "John Smith" working at PTX Electronics may identify a single person. Knowing that John Smith lives in Oakland, IN and was born on 1982-10-03 might also help identify a single individual. Knowing that John Smith is the Vice-President of PTX Electronics also would identify a single person.

Applications must take special care when collecting PII and PHI. PHI data may be protected by HIPPA and other laws. While revealing some data is often avoidable (an employee's position, for example), limiting data where possible is a good strategy. Knowing that John Smith is a Vice-President probably won’t help anyone hack the company's HR database. Knowing employee badge numbers might help.

A third area where applications should be cautious is in collecting and maintaining account information such as credit card numbers. Retention of credit card data should be left to processors. Credit card data and transactions can be stored and referenced from the processor (PayPal, Stripe, Square) using transaction ID’s, and the actual credit card data does not need to be maintained.

When data from any of these sensitive areas are collected, consideration should be given to encrypt the data and limit who has access to that data (see below: Hiding Data in Individual Columns)

Learn more here:PHI and PIII Definitions.

Tracking User Activity

Tracking user logins and activity is a common database task. Such tracking, however, must adhere to local laws – which may differ from country to country. This type of activity should also adhere to company policies, and any stated policies such as web site privacy policies. Data maintained in activity logs should also account for PII, PHI and account data sensitivity.

International Considerations

Web applications related to users in the European Union (EU) are required to adhere to the General Data Protection Regulation (GDPR). This is a very specific set of requirements controlling how user data can be collected, stored and maintained, including how users must be informed about issues related to Privacy. The GDPR, for example, requires the ability for users to recover most types of data that are collected about them.

Web Applications

Web applications connect to a database through a “connection string”. This is a formatted set of credentials to allow the web site to connect to the database. It usually looks something like:

connectionString="data source=SQL5044.smarterasp.net;initial catalog=xf_7cgr2y5_mynewdb; persist security info=True; user id=xx_yyyyyy_zzzzzzzzzzzz; password=abcdefghijk12345*; MultipleActiveResultSets=True;App=EntityFramework"

Web applications connect to a database through a “connection string”. This is a formatted set of credentials to allow the web site to connect to the database. It usually looks something like: connectionString="data source=SQL5030.smarterasp.net;initial catalog=db_9fe8d9_mmecloudnew; persist security info=True; user id=xx_yyyyyy_zzzzzzzzzzzz; password=abcdefghijk12345*; MultipleActiveResultSets=True;App=EntityFramework"

In many cases, web sites use a single connection string to access the database. The user/login in this case is often a database owner – but not necessary a security admin role.

The web application often assigns users to roles within the application – not at the database level. Users and roles are allowed to execute certain functions, or denied the ability to execute functions. But this security is controlled in the application – not in the database.

For advanced hosting services, such as Microsoft Azure, the database connection strings are encrypted and can be hidden – even from the developers. This allows the DBA to control security without compromising the ability of developers to do their jobs.

Other Issues

Hiding Data in Individual Columns

SQL Server 2016 (newest version) includes a feature named “Always Encrypted”. This feature allows individual columns to be encrypted. For example, data such as SSN’s, account numbers, phone numbers, employee ID’s, patient numbers, etc., can be encrypted on a column-by-column basis. The security of this data is strong because the master keys to decrypt the data is not stored in the database – but in an encrypted key store. More on this feature can be found here.

The ability to view encrypted columns can then be granted to roles or individuals as other permissions are.

Data can also be “masked”, so that only part of the actual data is visible, such as revealing only the last four digits of a credit card number or social security number. Permissions can be set on masked columns so that some users can views the entire column values and other users only see the masked version – even when querying the data directly.

Why is masking useful?

Assume a company maintains sensitive data in the database such as birthdates and illnesses. (Note this is PHI information, and considered confidential). Many company staff members may have access to those records as they speak with patients. However, the company does not want other people to see the sensitive data. Consider the case where the company asks a third-party consulting company to help maintain and improve their web site. The contractor employees should not have access to sensitive data, even though they need to review the web site views and reports that contain sensitive data.

Masking allows the database administrator to hide or display only parts of a field, such as a Patient Number. The contractor might only see *******981 for patient number 6724981, keeping the actual data safe.

Masking is applied at the column level, meaning that even if a developer is using the Query Manager to query a table, the developer would only see the masked data unless he or she has elevated access.

Encrypting Data at Rest

What has been discussed so far involves digital security, controlling access to the data files, tables, procedures, and other features of the database. All of this is often under the control of the DBA.

Another aspect of security is physical security: access to the facility, server rooms, servers, network, etc. This is normally the responsibility of corporate or facility security.

Sometimes, an overlooked aspect of security is the physical security of the database media. In other words, the hard drives, SSD’s, or backup media used for the database. All the other security measures are useless if someone can steal and access a hard drive or DVD with an unencrypted copy of the database.

Recent SQL Server versions include a feature named Transparent Data Encryption (TDE). This feature encrypts the database files, log files and backups, so they are protected from unauthorized access in any form. This is referred to as Data at Rest. This means that even if these items are stolen or otherwise accessed outside of the database engine, they cannot be used without proper access.

More on this feature can be found here.