DataSphereChronicle

To View or Not to View

Aug 12, 2023

Physical data models are not the most intuitive and retain the information in the exact format for easy user consumption. Over time relational databases have developed a concept called views.

Another Way to Cut the Pie

Views are essentially created by writing a query to combine physical tables together in a virtual table. Then this virtual table can be easily queried by an application or ETL process. All views are to support read-only operations, any updates cannot be done on a view and will have to update the source tables themselves.

Views are commonly used to abstract complex data structures as well as hide personal information. This way tables can be presented in a more logical format that developers will easily understand. An additional, common use case is for presenting aggregated or summarized data. Since it is a common principle to avoid storing calculated values, if you often need to display or access summations, averages, percent changes, etc. this can be tedious to build into the read query from the application. Therefore, we see a common use case for views to present these aggregate views to isolate the query complexity.

There are many benefits to using views - however everything has a cost and in the case of views the cost is performance. It is important to understand the considerations when implementing views to ensure they are applied correctly to your database.

Why do we need Views?

Query Simplicity - By having views display filtered data by default - the queries to read from the views should be very simple. The query complexity is removed from the API or service integration and instead stored in the database. This is beneficial for development velocity while the view structure is not prone to changes. If the view structure is expected to change, then ensure you have a plan to manage your dependency on your DBA or whoever is managing your database views.

Access Control - Access control can partially be managed through views. For example, you can hide PI, SPI, or other sensitive information from the view, so services can freely read the views and not the origin tables.

Note: Although this is a common use case for views, access control should be managed through a stronger Access Control method. Relying on views is a weak access control policy since the access control configuration is not easily documented, monitored, or managed in a view.

Hides Table Complexity - Physical tables can be complex and include auto-generated keys, ids, timestamps, and other data that might not make sense to someone reading the data. You can abstract a lot of this complexity in views by filtering out data that is not needed.

Database Technology Agnostic - Views can be applied on any database. Across SQL and even NoSQL databases we can apply views.

What Could Go Wrong

With views helping so much, why don’t we make views for all reads? Views of course have drawbacks. Consider the following before building one:

Performance - The biggest challenge views have is around performance. Since views are generated when they are queried - imagine if a view is read 100’s of times in an hour, it is the same performance impact of direct reads. If the source table for the view is part of many other views or being read directly, this could lead to conflicts and table locks. View creation may be worth the impact if the view structure is a complex query with large amounts of data transformation. However, for basic queries a view may not be worth the performance impact and you would be better off directly querying tables. A key method to counteracting performance issues is creating indexes on the source tables to tune them for the view.

It is also important to consider the impact views have on the infrastructure. Because views are virtual tables and never persisted to disk, they will consume lots of RAM. Be aware as you implement views that the storage cost that is removed from disk is moved to your RAM utilization.

Update Restrictions - While views are used for read operations, no writes or updates can be made to the views. Any writes must go directly to the source tables. If you have a set of tables that are taking on a heavy amount of reads and writes, then it may be too complex to use views for reads and direct queries for writes.

Disjointed documentation and change control - Views need to be managed with care as one of the large challenges is that the SQL query used for the view is not visible to the development team or managed with the rest of the code base. Views are essentially stored procedures in a database and are hidden code. While views can reduce complexity of a direct read, that complexity does not go away, but instead resides in the database and maintained by a DBA/Data Engineer. Your team will need to think about maintenance and change control as these database changes are not managed through a traditional dev-ops process.

If a view structure and query is expected to constantly change it may be better to leave that complexity in direct queries where the interactions can be managed through traditional change control with the reset of the codebase.

Final Thoughts?

All in all, views are a common tool in relational databases that can help teams boost their performance and organize their data. Although, views are frequently used as a band aid for poorly designed tables and can further complicate understanding the data. Use views with care and clear documentation on how they work and why they are needed.

By Ellie Najewicz