DataSphereChronicle

The Key to Order in your Database

Sep 09, 2023

All databases, structured or unstructured, require a standardized way to define a unique record. At the core of any data organization is defining an identifier that is tied to a combination of fields that stamps a record as unique. Databases refer to this identifier as a Primary Key. Primary keys are essential to ensuring a database is modernized, high performing, and easily maintainable.

What is it?

A primary key is a unique identifier that’s value is distinct for one record in a database – think of a row in a relational table-based database. What makes a primary key efficient is that uniqueness is documented in one value – not a combination of values on a row. A combination of keys used to identify a one record is called a compound key.

Take an example of a list of addresses – what would make an address unique? You could not just rely on the street address as it may not be unique across all cities. Therefore, you would have to look across all address fields – street address, City, State, and Zip Code – to identify a unique address – resulting in a compound key.

What’s wrong with compound keys?

When you have a long compound key it makes your data operations much more costly from a resource and speed perspective – any select statement must look across 4 columns for any basic look up or count. Additionally, linking this address information with other datasets becomes increasingly taxing as you would have to join on all 4+ columns. That is why in these cases, it is best to create a primary surrogate key that represents a unique address.

A good key should be alphanumeric and have a long enough length to ensure uniqueness for your use case. While this is obviously essential for relational databases, it is also important for semi-structured and unstructured data as well. No matter the format, knowing the one key that uniquely identifies an object is essential for designing processes around that object.

Value of a primary key

Ensuring all data objects have primary keys is not new guidance in the data industry, however it is an essential part of modernization. Making sure data objects are properly identified is important to support distributed data systems – such as a data fabric – and ensure easy portability for your data. By following this pattern, you will see the following benefits:

- Better performance
- Reduced Data Gravity
- Less maintenance
- Increased ease of modernization

Overall, by ensuring you always have a primary key for each object, your database can achieve immediate performance enhancements and ease of maintenance. As a bonus, your database will be more portable making it easier to adopt modernized platforms and distribute data faster.

By Ellie Najewicz