In database terminology, a view is a named query that typically aggregates data from multiple tables. When using views, it is important to remember that querying a view will evaluate the query that defines the view. Repeated evaluation of the view – say from within a nested query – may seriously impact or even kill the performance of your application.
One solution to this performance problem is to use a “precomputed view”. Unlike an ordinary view, a precomputed view is stored in a table rather than computed on demand. When data in one of the aggregated tables changes, the update operation also updates the precomputed view table.
A great thing about precomputed views is that they can be implemented fully in SQL. Any code that accesses the database sees a precomputed view as a regular table. Also, if you have an existing regular view, you can change it into a precomputed view without having to modify any code that queries the view.
To explain the precomputed view pattern, let’s look at an example loosely inspired by the reddit social news site. Users submit articles to reddit and the articles receive up and down votes from other users. User’s “karma” is computed as a sum of votes on positively-voted articles submitted by the user.
To store the reddit data, you can store users in one table, articles in another table, and votes (+1 or -1) from users on articles in a third table:
Now, say that we want to find the top 10 users with highest karma. To compute a user’s karma, we need to sum up the scores of all articles submitted by the user. And to compute each article score, we need to aggregate the votes for that article. It should be clear that this query is going to be very expensive, no matter how much you tune and optimize it.
Views could be used to factor the complex query into simpler pieces, but not to decrease its overall cost:
It is easy to find the top 10 users, simply by querying User_View. Unfortunately, if the database contains millions and millions of votes, the query will take a long time to run. The query will have to group all votes by article, group all articles by user, and then pick the top users. Imagine the impact on performance if you wanted to show the top users on every page of your web app!
However, by changing the views into precomputed views, we can make the query for top users cheap:
Let’s walk through the conversion of Article_View to a precomputed view. First, we’ll create a table for the precomputed view:
CREATE TABLE Article_PView( ArticleId int NOT NULL, Title varchar(40) NOT NULL, Url varchar(250) NOT NULL, VoteSum int NOT NULL, )
Second, we’ll need two triggers. This trigger inserts a row into Article_PView whenever a new row is inserted into the Article table:
CREATE TRIGGER TRG_ArticleInsert ON Article AFTER INSERT AS BEGIN SET NOCOUNT ON; INSERT INTO Article_PView (ArticleId, Title, Url, VoteSum) SELECT Id As ArticleId, Title, Url, 0 FROM Inserted END
And this trigger recomputes a row in Article_PView whenever a vote is inserted, updated or deleted:
CREATE TRIGGER TRG_Vote ON Vote AFTER INSERT, UPDATE, DELETE AS BEGIN SET NOCOUNT ON; WITH articleIds(ArticleId) As ( SELECT ArticleId From Inserted UNION SELECT ArticleId From Deleted ) UPDATE Article_PView SET VoteSum = ( SELECT SUM(Vote) FROM Vote WHERE Article_PView.ArticleId = articleIds.ArticleId) FROM articleIds WHERE Article_PView.ArticleId = articleIds.ArticleId END
And finally, we’ll populate the precomputed view with data that is already in the database:
INSERT INTO Article_PView (ArticleId, Title, Url, VoteSum) SELECT Article.Id, Title, Url, ISNULL((SELECT SUM(vote) FROM Vote WHERE ArticleId = Article.Id), 0) FROM Article
Article_PView precomputed view is now ready, and User_PView can be created in a similar fashion.
Remarks
Note that my example assumes articles never get removed or updated. Adding support for that functionality is straightforward: you’ll need to extend the TRG_ArticleInsert trigger to also handle updates and deletes. This will be very similar to what TRG_Vote does, but I left it out from the sample for simplicity.
There are several interesting variations of how Article_PView could be implemented. In the implementation above, the trigger aggregates the votes for an article each time someone votes on it. If you want to avoid this cost, you can change the trigger so that it only adjusts the article score instead of recomputing it. For example, if a vote was updated from -1 to +1, the trigger would add 2 to the score of the article. Avoiding concurrency issues may be tricky with that approach, though.
Another interesting variation is adding the VoteSum column to the Article table instead of creating a separate table. Choose the approach that better fits your table design.
Hope you find this pattern useful!
Tags: SQL
FYI Google Reader murdered your HTML and presented a big blog of black text with no formatting.
I’ve used this technique before with good success. Microsofts SQL Server also has the ability to create “indexed views” which are roughly equivalent and a good option if your view logic is a simple select query.
Hi Igor,
nice approach. Have You considered indexing the Article_View resp. User_View?
As the only thing needed is an aggregate over the votes (ArticleVote resp. Karma) I would suggest creating an index on both views. That way one places the burden of updating the aggregate column totally to the indexing engine.
Additional benefit comes from the fact, that SQL server is able to infer the index on the view, even when the view itself is not refrenced in the FROM clause. So even when querying like
SELECT
Sum(v.Vote) as Sum_Votes, v.ArticleID
FROM Vote v
GROUP BY v.ArticleID
the query analyzer is able to infer the index on the Article_View! So even existing queries benefit from the index.
Martin
Barry & Martin: I didn’t know about the indexed view feature. Very cool! I’ll definitely experiment with it the next time I am solving this kind of a problem, which is bound to be soon.
Barry: Thanks for letting me know about the formatting issue. For a couple minutes, the article was posted badly formatted, and Google Reader must have crawled and cached it at that time.
Also consider SQL server clustered views
Thanks for all the feedback. Love your site.
Would you mind juxtaposing precomputed views with materialized views?
[…] Precomputed view: A cool and useful SQL pattern. LD_AddCustomAttr("AdOpt", "1"); LD_AddCustomAttr("Origin", "other"); […]
Hi, I do think this is an excellent web site. I stumbledupon it I may return yet
again since I book-marked it. Money and freedom is the greatest way to change,
may you be rich and continue to guide others.