Interpolation is a highly useful technique for converting data between different time periods.
For example, one data source might be compiled daily, another weekly, another biweekly, another monthly, etc. You might have limited control over how often this data is reported, too, in which case you’ll run into difficulties plotting data points over different periods.
A relatively simple way to tackle this practical issue is interpolation. With interpolation, unknown data is predicted between two or more known values.
There are many examples of interpolation in day-to-day technology, such as image interpolation, which estimates the value of a pixel from the values of neighboring pixels.
Why interpolate data?
Generally speaking, daily data is preferable to weekly or monthly data. Models built with daily data have more definition because they have 7 times as many data points as models built with weekly data and approximately 30 times as many as models built with monthly data.
While interpolating daily values isn’t as good as possessing daily values in the first place, it still adds detail to a model.
For example, here is a graph of daily brand searches. This is the format you’d most likely receive such data in.
The graph here features data across 882 days, or 127 weeks. You can see how this data follows clear weekly and monthly trends with daily definition, including big spikes on some days and a general increase throughout much of the year.
This second graph shows the same daily data grouped by week, without interpolation. We can compare the daily data to this graph, but suppose we only ever had weekly data in the first place. What would that look like?
If we scale back up to 882 days from 127 weeks, we get the following graph:
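The scaling step can be sketched in plain Python. This is a minimal illustration, assuming the gaps are filled with straight lines between consecutive weekly values; the function name `weekly_to_daily` and the sample figures are hypothetical and are not the brand search data behind the graphs.

```python
from datetime import date, timedelta

def weekly_to_daily(weekly):
    """Linearly interpolate a sorted list of (date, value) pairs into daily values."""
    daily = {}
    for (d0, v0), (d1, v1) in zip(weekly, weekly[1:]):
        span = (d1 - d0).days  # number of days between the two known points
        for offset in range(span):
            # Move a proportional share of the way from v0 towards v1.
            daily[d0 + timedelta(days=offset)] = v0 + (v1 - v0) * offset / span
    # The final known point is kept as-is.
    last_date, last_value = weekly[-1]
    daily[last_date] = last_value
    return daily

# Hypothetical weekly averages (illustrative values only).
weekly = [
    (date(2022, 1, 2), 100.0),
    (date(2022, 1, 9), 114.0),
    (date(2022, 1, 16), 107.0),
]
daily = weekly_to_daily(weekly)
for day in sorted(daily)[:8]:
    print(day, round(daily[day], 2))
```

Three weekly rows become fifteen daily rows here. If the data already lives in pandas, a similar result is typically obtained with `Series.resample("D").interpolate()`.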
Now, let’s see how much definition we lose by grouping data monthly, which provides just 30 rows from two and a half years of data. A model built on this would be inaccurate unless the data extends back several years and we’re only looking at large-scale macro variations.
Scaling back up to daily from that monthly data produces the following graph. There was an issue at the end of the last month where the average was incorrectly calculated due to lack of definition in the data.
Here’s where it really comes together. The following graph shows actual daily data vs the interpolated versions of both the weekly and monthly data as shown above.
As you can see, interpolation does a good job of replicating daily data. It is therefore usually quite effective if you need to resample weekly or monthly data into daily data.
The benefits of interpolating data
The main benefit of interpolating data is that it is generally more accurate than extrapolating data. It is also easier to understand and more reliable.
Extrapolation involves estimating an unknown value by extending a known sequence of values beyond its observed range. In other words, it infers data that lies outside what the existing information covers. Conversely, interpolation involves estimating a value between two known values within a sequence.
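The distinction can be shown with a short sketch. It assumes a straight line through two known points; the function name `estimate` and the sample values are made up for illustration.

```python
def estimate(x, x0, y0, x1, y1):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    # Inside the known range it is interpolation; outside it is extrapolation.
    kind = "interpolation" if x0 <= x <= x1 else "extrapolation"
    return y0 + slope * (x - x0), kind

# Hypothetical known values: 100 searches on day 0 and 170 on day 7.
inside = estimate(3, 0, 100.0, 7, 170.0)    # day 3 lies between the known days
outside = estimate(10, 0, 100.0, 7, 170.0)  # day 10 lies beyond the last known day
print(inside)   # (130.0, 'interpolation')
print(outside)  # (200.0, 'extrapolation')
```

The arithmetic is identical in both cases. The difference is that the interpolated estimate is bounded by known values on both sides, while the extrapolated one relies on the trend continuing unchanged.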
Interpolated data can be easier to understand and more reliable, as it can be challenging to determine how much weight should be given to each point when using extrapolated data.
The disadvantages of interpolating data
Interpolation is not as effective as possessing daily variables, but it is an effective way of resampling less useful weekly or monthly variables into daily formats.
A further disadvantage of interpolating data is that it can lead to overfitting, as the model may over-learn the averages created during the interpolation process. Interpolation could also lead to a loss of generality, which means that the model will not work well with other data sets.
The importance of interpolating data
Fundamentally, interpolation involves using a model to estimate values between two known data points.
Extrapolation uses a model to estimate values that lie outside the range of the known data points. Both techniques provide a means to predict missing values from existing values.
Interpolation fills in the gaps between known points. This can be done using linear, quadratic, or cubic interpolation.
Linear interpolation is the simplest form of interpolation and connects data points with a straight line. This is the method used for the graphs above.
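As a sketch in standard notation (the symbols are introduced here for illustration), the linear estimate at a point x between two known points (x_0, y_0) and (x_1, y_1) can be written as:

```latex
\[
  y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}, \qquad x_0 \le x \le x_1
\]
```

For instance, with hypothetical values of 100 searches on day 0 and 170 searches on day 7, the estimate for day 3 is 100 + 3 × (170 − 100) / 7 = 130.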
Quadratic interpolation is a strategy for improving the interpolation estimate by introducing curvature into the line connecting the points. Unlike linear interpolation, which requires two points, quadratic interpolation requires the existence of three points. There are other types of complex interpolation for spatial coordinates.
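One common way to fit a curve through three points is the Lagrange form of the parabola, sketched below. The function name `quadratic_interpolate` and the sample values are hypothetical. This is one possible formulation, not necessarily the one a given tool uses.

```python
def quadratic_interpolate(x, points):
    """Evaluate the parabola through three (x, y) points at x (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = points
    # Each term equals its own y-value at its own x and zero at the other two.
    term0 = y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    term1 = y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    term2 = y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    return term0 + term1 + term2

# Hypothetical weekly values on days 0, 7 and 14.
points = [(0, 100.0), (7, 170.0), (14, 190.0)]
print(quadratic_interpolate(7, points))    # passes through the known point
print(quadratic_interpolate(3.5, points))  # curved estimate between days 0 and 7
```

With these sample values the estimate for day 3.5 is 141.25, compared with 135 from a straight line between the first two points. The difference comes from the curvature that the third point introduces.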
How you can easily interpolate data
A quick and easy way to interpolate data is by using linear regression. Linear regression is a statistical technique that allows you to find the line of best fit for a data set.
This can be done by plotting the data on a scatter plot, finding the equation for the line of best fit, and then reading the slope and intercept from that equation. See this guide to linear regression.
The slope is the change in y divided by the corresponding change in x, while the y-intercept is the value of y where the line crosses the y-axis (that is, where x = 0). Extrapolation is also performed using regression equations.
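A line of best fit can be computed with ordinary least squares, as in the minimal sketch below. The function name `fit_line` and the sample observations are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly observations: day number and searches.
days = [0, 7, 14, 21]
searches = [100.0, 120.0, 130.0, 160.0]
slope, intercept = fit_line(days, searches)

# Estimate a day between known observations from the fitted line.
estimate_day_10 = slope * 10 + intercept
print(round(slope, 3), round(intercept, 3), round(estimate_day_10, 3))
```

Unlike point-to-point interpolation, a regression line generally does not pass exactly through the known values. Its estimates between observations follow the overall trend, not the individual points.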
Find out more about interpolating real data with Vexpower’s courses.
Interpolation seeks to predict an unknown value between two or more known values. It’s generally more accurate than extrapolation, which seeks to predict an unknown value outside of two or more known values.
In marketing and marketing mix modeling (MMM), interpolation can be used to fill in missing daily data when weekly or monthly data is present.
Interpolation is simple but effective, despite being somewhat long-winded to calculate. It is therefore worth using to add definition to a model by creating daily data from data reported over other periods.