Improving performance of queries in reports with date filling2013-02-03 13:16:13 0 comments
I want's much surprised that stackoveflow once again turned out to be very helpful. Someone had a very similar problem with grouping a set of record by an hour and showing hours that don't have corresponding records in event table. Beside hours I needed also the rest of date so I couldn't just copy this query, but the most important was an idea of right join with set of records that would represent a time axis. So now needed a table with dates with period of one hour. A useful procedure that can generate records with dates I found in a article Using MySQL to generate daily sales reports with filled gaps. I only had to change the period from one day to one hour. Here is my code:
BEGIN DECLARE crt_date DATETIME; SET crt_date=start_date; WHILE crt_date < end_date DO INSERT INTO calendar VALUES(crt_date); SET crt_date = ADDTIME(crt_date, '01:00:00'); END WHILE; END
Finally I had all ingredients. I wrote a first query and I waited and waited and waited... It was so slow that there was now chance to use it.
SELECT date, saledate, COUNT( * ) as cnt FROM `sales` RIGHT JOIN ( SELECT * FROM calendar LIMIT 300 ) AS AllHours ON HOUR(saldate) = HOUR(date) AND DAY(saldate) = DAY(date) AND MONTH(saldate) = MONTH(date) AND YEAR(date) = YEAR(saldate) GROUP BY date ORDER BY 1 ASC
Obviously the problem was in join conditions. 4 comparation at a time each with functions parsing dates to get hours, days, months and years. Query was running for 2,870 sec. At my local machine with test data it was not a problem but if we remove the limit 300 it could run for hours. I decided to split date column in calendar table into 4 columns to separate year, month, day and hour. I updated table to fill new columns:
UPDATE calendar SET year = YEAR(date), month = MONTH(date), day = DAY(date), hour = HOUR(date)
and rewrite my report query:
SELECT date, saledate, COUNT( * ) as cnt FROM `sales` RIGHT JOIN ( SELECT * FROM calendar LIMIT 300 ) AS AllHours ON HOUR(saledate) = hour AND DAY(saledate) = day AND MONTH(saledate) = month AND YEAR(saledate) = year GROUP BY date ORDER BY 1 ASC
I was sure this query will be faster. I had ready to use data from separated columns instead of one date column that need to be parsed every time. Surprisingly parsing data wasn't problem because its execution time was almost identical with a previous one. The difference was very small (2,855 sec. instead of 2,870 sec.) so that definitely wasn't the improvement a was looking for.
I decided to add another column just with data I'm using in join condition and group clause. In my calendar table I added a char column of length 13 so that it fit the full date and hour (without minutes and seconds), I filled it with data:
UPDATE calendar SET subdate = SUBSTR(date, 1, 13)
and added a unique index on that column. Once again I changed my query...
SELECT date, saledate, COUNT( * ) as cnt FROM `sales` RIGHT JOIN ( SELECT * FROM calendar LIMIT 300 ) AS AllHours ON SUBSTR(saledate, 1, 13) = subdate GROUP BY date ORDER BY 1 ASC
... and once again I got a small improvement. Query time was 2,730 sec. so still far from acceptable. What if we add an index to a saledate column? It slowed down query. Really. So the substring was not efficient at all. At this point the only possible solutions was to make an additional column for sales table with substring of saledate column.
UPDATE sales SET salesubdate = SUBSTR(saledate, 1, 13)
So now I had mirror columns in both tables varing one another with a type of index: a key index instead of unique one (there was many sales on every hour so dates in subdate field was repeated multiple times). Final query looked like this:
SELECT date, saledate, COUNT( * ) as cnt FROM `sales` RIGHT JOIN ( SELECT * FROM calendar LIMIT 300 ) AS AllHours ON salesubdate = subdate GROUP BY date ORDER BY 1 ASC
Query was executed in 0,031 sec. So it gave nearly 10 000% performance's improvement. Maybe they are even better solutions, more cleaver, without additional tables or columns but for now this one satisfies me completely.