I have a simple table with a date column and a column of values that reflect the total disk space occupied by a database. Given that I have values for all preceding dates, how can I predict future values using SQL or PL/SQL, i.e., trend prediction?
Assume a simple table:
    Col1  Col2
    JAN   1000
    FEB   1300
    MAR   1800
    APR   2100
    ...and so on.

How would I predict the value for any month in the future?
This is probably one of those problems more easily solved with a spreadsheet application such as Microsoft Excel. However, the SQL challenge has been put forth, and it is my obligation to accept. I came up with two different solutions: one for linear growth and one for exponential growth. Their implementations are very similar. Which solution you use depends on the type of thing you are measuring. For example, linear growth would be evident in your car's odometer, whereas population growth tends to be exponential.
The solutions involve calculating an addend for linear growth and a multiplier for exponential growth. They also use a helper table to assist in extending out the forecast. Let's start by creating our helper table. Its utility will become evident shortly.
    create table Cardinals (
      digit numeric(1) primary key check (digit >= 0)
    );
    insert into Cardinals values (0);
    insert into Cardinals values (1);
    insert into Cardinals values (2);
    insert into Cardinals values (3);
    insert into Cardinals values (4);
    insert into Cardinals values (5);
    insert into Cardinals values (6);
    insert into Cardinals values (7);
    insert into Cardinals values (8);
    insert into Cardinals values (9);

This table simply holds the numbers 0 through 9. By cross joining the Cardinals table to itself and using each instance of the table for a different order of magnitude, we can produce a view that returns numbers 0 through 99. We will use this view as the basis for our forecast consisting of one hundred time periods.
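Before defining the view, the cross-join arithmetic described above can be sanity-checked with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for your database, and it declares the column as `integer` rather than `numeric(1)`; both choices are assumptions made only so that the example runs anywhere.

```python
import sqlite3

# In-memory SQLite database, used only as an illustrative harness (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Cardinals (digit integer primary key check (digit >= 0))"
)
conn.executemany(
    "insert into Cardinals values (?)", [(d,) for d in range(10)]
)

# One instance of Cardinals supplies the tens digit, the other the ones digit.
rows = conn.execute(
    """
    select (Tens.digit * 10) + Ones.digit as Cardinal
    from Cardinals Ones
    cross join Cardinals Tens
    order by Cardinal
    """
).fetchall()
cardinals = [row[0] for row in rows]

print(len(cardinals), cardinals[0], cardinals[-1])  # prints: 100 0 99
```

Each of the 10 x 10 digit pairs maps to a distinct number, so the result has exactly one hundred rows with no gaps or duplicates.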
    create view TwoDigitCardinals as
    select ( Tens.Digit * 10 ) + Ones.Digit Cardinal
    from Cardinals Ones
    cross join Cardinals Tens;

(Please note, if your database doesn't support the SQL-92 CROSS JOIN syntax, you should be able to accomplish the same by simply separating the two instances of the table by a comma with no WHERE clause.) Incidentally, by cross joining even more instances of the Cardinals table, we can create queries that return even larger series of numbers.

Now, let's create a table to store our history. It's pretty generic, with time periods and magnitudes represented by simple integers. Without too much difficulty, we could modify this example to use dates to solve the original request. For now, I'll keep my example as simple as possible for the sake of both clarity and brevity. Here's the History table:
    create table History (
      TimePeriod smallint primary key,
      Magnitude integer
    );
    insert into History values (1, 1000);
    insert into History values (2, 1300);
    insert into History values (3, 1800);
    insert into History values (4, 2100);

For linear growth, I decided to take the average of the increases from one period to the next as the basis for future growth. The query could be easily modified to consider only the last five or ten time periods. Here it is:
    select avg(H.Magnitude - PrevH.Magnitude) Addend
    from History H
    inner join History PrevH
      on H.TimePeriod - 1 = PrevH.TimePeriod;

    ADDEND
    ------
       366

Mathematically, my formula for linear growth looks like f(x) = a*x + b, where x is the number of time periods past the last recorded one, a is the addend, b is the last recorded magnitude, and f(x) is the projected magnitude.
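To see where the addend comes from, here is a small sketch that runs the same self-join against the sample History rows. It assumes Python's built-in sqlite3 module as a stand-in database. SQLite's `avg` returns a fractional value (about 366.67), whereas the output above shows the whole number 366, so expect the exact figure to vary by database. The `shortcut` variable is my own illustration, not part of the original answer: when the time periods are contiguous, the average of the period-to-period increases equals the total increase divided by the number of intervals.

```python
import sqlite3

# In-memory SQLite database, used only as an illustrative harness (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table History (TimePeriod smallint primary key, Magnitude integer)"
)
conn.executemany(
    "insert into History values (?, ?)",
    [(1, 1000), (2, 1300), (3, 1800), (4, 2100)],
)

# Average increase between consecutive periods (increases: 300, 500, 300).
(addend,) = conn.execute(
    """
    select avg(H.Magnitude - PrevH.Magnitude)
    from History H
    inner join History PrevH
      on H.TimePeriod - 1 = PrevH.TimePeriod
    """
).fetchone()

# Cross-check: with contiguous periods the differences telescope, so the
# average increase is (last magnitude - first magnitude) / (rows - 1).
(row_count,) = conn.execute("select count(*) from History").fetchone()
(first_mag,) = conn.execute(
    "select Magnitude from History order by TimePeriod limit 1"
).fetchone()
(last_mag,) = conn.execute(
    "select Magnitude from History order by TimePeriod desc limit 1"
).fetchone()
shortcut = (last_mag - first_mag) / (row_count - 1)

print(round(addend, 2), round(shortcut, 2))
```

One consequence of this cross-check is that the addend depends only on the first and last magnitudes in the window, which is worth keeping in mind when deciding how many periods to average over.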
To make the final SQL a bit easier to understand, I'll first illustrate it with pseudo-SQL, using simplified placeholder tokens enclosed in angle brackets:
    select Cardinal + 1 + <Last Recorded TimePeriod>,
           <Magnitude of the Last Recorded TimePeriod> + ( <Addend> * ( Cardinal + 1 ) )
    from TwoDigitCardinals
    order by Cardinal;
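As a rough illustration of how the tokens might be filled in, the sketch below replaces each one with a subquery and runs the result against the sample data. This is only one possible expansion, not necessarily the final SQL given in the continuation. It assumes Python's built-in sqlite3 module as a stand-in database, and the aliases `C`, `L`, and `A` are hypothetical names chosen for this example.

```python
import sqlite3

# In-memory SQLite database, used only as an illustrative harness (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table Cardinals (digit integer primary key check (digit >= 0));
    insert into Cardinals values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

    create view TwoDigitCardinals as
        select (Tens.digit * 10) + Ones.digit as Cardinal
        from Cardinals Ones cross join Cardinals Tens;

    create table History (TimePeriod smallint primary key, Magnitude integer);
    insert into History values (1, 1000), (2, 1300), (3, 1800), (4, 2100);
    """
)

# L = the last recorded row, A = the addend; both are single-row subqueries,
# so cross joining them to the view leaves one row per forecast period.
forecast = conn.execute(
    """
    select C.Cardinal + 1 + L.TimePeriod as TimePeriod,
           L.Magnitude + (A.Addend * (C.Cardinal + 1)) as Magnitude
    from TwoDigitCardinals C
    cross join (select TimePeriod, Magnitude
                from History
                where TimePeriod = (select max(TimePeriod) from History)) L
    cross join (select avg(H.Magnitude - PrevH.Magnitude) as Addend
                from History H
                inner join History PrevH
                  on H.TimePeriod - 1 = PrevH.TimePeriod) A
    order by C.Cardinal
    """
).fetchall()

# Show the first three forecast periods (5, 6, and 7).
for period, magnitude in forecast[:3]:
    print(period, round(magnitude, 2))
```

With the four sample rows, the forecast starts at period 5 and runs for one hundred periods, each adding one more addend to the last recorded magnitude of 2100.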
This answer is continued.
This was first published in March 2002