DataSphereChronicle

Rightsizing your Database

Jul 29, 2023

We are generating more data than we ever have before, and the industry expects this trend to continue. To keep ahead of uncontrolled data growth, we need to put data right-sizing practices in place to curb volume increases.

Cleaning your Data

The first place to start when decreasing your data volumes is to look for data that can be permanently removed.

Reference Data - Any data that your database is not the source of truth for, but you have replicated from another source for reference. Any older version of this data does not need to be maintained. If needed, you can pull from the trusted source. For example, if you reference currency rates for calculations, you only need to retain the current rates while older rates can be deleted.

Tables with Large Objects - Tables with CLOBs and BLOBs are good candidates for deletion because they usually hold large volumes that will save you more storage space. Additionally, these large objects can hold data that should not be maintained in relational databases such as - compressed files, logs, messages, json files, etc.

Tables with the most volume - Start where you have the most opportunity to reduce your volumes. You can find your largest schemas or tables in your database through investigative queries.

Data Archival

Archiving your database is an essential practice and important for modernized applications. It’s also essential to archive to then easily delete historic data that no longer has value. An archive is a subset of your database that is historic or less valuable and therefore can be stored in a compressed, more cost-efficient environment. This differs from backups in that it is not a complete replica of the database - but instead a portion of the database. Traditionally an archive is hosted on an object or file storage offering where storage costs are much cheaper.

If you have not started archiving, first begin by defining when objects should be archived. There should be a point in time where data stops providing value and could be removed. An archive policy defines when each object should be archived and ultimately deleted for a given database. An archive leverages a cost-effective storage solution - like S3 - to hold archived data. The most efficient systems automate the archive data movement so it is not a manual task.

See the Benefits

Decreased infrastructure costs - If a team has never archived their database - they can save 20% or more off of their infrastructure and operations costs by building an archive for 20% of their oldest data. This archival pattern applies to all databases - including SQL and NoSQL. Not only is this good for our budgets but also for our sustainability effort as this will reduce your application’s carbon footprint.

Performance Improvement - By removing fewer valuable data, the high value data is more accessible. Queries will run faster if searching through less information. This will also improve the effectiveness of other performance enhancements such as indexes and views. By lowering the amount of data retained, your database will also become more portable making it easier to migrate or replicate the data.

Reduces impact of data compromises - By storing data in an alternate location you are improving the data's resiliency. Therefore, fortifying the integrity and confidentiality of the data. For example, if your database was compromised by ransomware, without an archive you would lose a longer history of data as opposed to a much smaller portion if you were archiving.

Ease of automatic data deletion - By moving data to an archival database like Cloud Object Storage, automatic data deletion can be easily implemented. Archival set up usually inherits good data retention policies as well.

In conclusion, right-sizing your database is not a one-time task, but an ongoing commitment to efficiency, sustainability, and performance. By regularly cleaning out irrelevant data and archiving historic information, databases will only hold data that is highly valuable and can be better managed.

By Ellie Najewicz