Tuning Date Range Queries -

SQL Server has been around for many years and does a pretty good job of optimising queries. While there are ways we can help enhance performance, there are also plenty to inadvertently degrade it. A common issue I often encounter occurs in queries across date ranges.

For example, using the StackOverflow 2010 database, imagine we’re tasked with finding the number of posts created in a specific year. To better understand the query’s efficiency, I’ll enable query statistics so we can examine details on data reads and CPU time.

				
					SET STATISTICS IO, TIME ON;

SELECT COUNT(*)
FROM dbo.Posts
WHERE YEAR(CreationDate) = 2009

On the Messages tab, the statistics output can be tricky to read. To make it more manageable, I recommend using StatisticsParser by Richie Rump. This free online tool allows you to paste in your stats text and instantly formats it into a more readable and organised layout.

The query returned 1,288,974 rows and executed in parallel across all 8 CPUs, with a total duration of approximately 16 seconds. It performed 802,760 logical page reads, which translates to over 6 GB of data (at 8 KB per page).

Examining the execution plan, we can see that the query performed a Clustered Index Scan, reading all 3,729,195 rows in the Posts table. This is far from optimal, and there’s definitely room for improvement.

Make life easier with an Index

I’m a big fan of indexes. When used correctly, they can significantly boost performance without requiring any code changes. Let’s add an index on the CreationDate column to speed things up a bit.

				
					CREATE INDEX IX_CreationDate ON dbo.Posts (CreationDate)

Then we’ll run the same SELECT query again.

				
					SELECT COUNT(*)
FROM dbo.Posts
WHERE YEAR(CreationDate) = 2009

In the updated execution plan, we can see that the query utilised our new index for an Index Scan. It still returned 1,288,974 rows as expected, but this time it didn’t require parallelism and only performed 8,336 logical page reads (about 65 MB of data). The query duration dropped significantly to just 445ms – a huge improvement. Yay for indexes!

So we’re done right? Well, not quite.

Despite the improvements, the execution plan shows that the query still read all 3,729,195 rows, even though it used the new index and reduced page reads. So, what’s happening here?

SQL Server always aims to minimise the number of pages it needs to read to retrieve the data. Since our SELECT query only involves the CreationDate column, and our index is based on that column, SQL Server opts for an Index Scan instead of a Clustered Index Scan, as it involves fewer pages. (Remember, a Clustered Index is essentially the table.)

So that’s great, we read less pages and got a quicker query, but why did it still read all the rows in the table? The key giveaway is the fact it did an Index Scan instead of an Index Seek, and the reason we didn’t get a seek, is due to the pesky little YEAR function in our WHERE clause.

Be very careful when using functions in a WHERE clause

When you wrap a function around a WHERE clause predicate, like we did with CreationDate, SQL Server can’t predict the function’s outcome before executing it. As a result, it has to scan every row in the table to evaluate whether the YEAR function returns 2009, which leads to unnecessary scans.

I see this approach often, and when I ask why, the typical response is that the code seems more efficient. I completely understand – I was a developer for a long time and used to do this myself. Developers often run queries in non-production environments (as they should), and when a SELECT query completes in less than a second with an index, it’s easy to be satisfied with the result.

However, in a production environment, where numerous other queries are running simultaneously, full table scans or even index scans should be avoided whenever possible to minimize resource usage.

We can optimise our query by removing the YEAR function and directly specifying the date range for 2009.

				
					SELECT COUNT(*)
FROM dbo.Posts
WHERE CreationDate >= '20090101 00:00:00' AND CreationDate < '20100101 00:00:00'

By utilising this method, we achieve an Index Seek that processes 1,288,974 rows across 2,885 data pages (22MB) in just 157ms. Unlike a scan, the seek operation precisely targets the exact rows we need, making the query highly efficient. As our Posts table grows, this approach will continue to scale effectively.

Summary

We’ve explored how indexes can significantly boost query performance, often providing a quick and effective optimisation. However, to truly maximise SQL Server’s efficiency, it’s crucial to understand its internal workings and write queries that align with its strengths. Being mindful of the impact of functions in WHERE clauses can further enhance performance, helping us achieve even greater optimisation.

Tuning Date Range Queries

Make life easier with an Index

Be very careful when using functions in a WHERE clause

Summary

Like this:

1 thought on “Tuning Date Range Queries”

Leave a ReplyCancel reply

Make life easier with an Index

Be very careful when using functions in a WHERE clause

Summary

Share this:

Like this:

1 thought on “Tuning Date Range Queries”

Leave a ReplyCancel reply

Discover more from