Interpolate Data

Interpolation is useful for transforming data between time periods

Interpolation is a highly useful technique for converting data between different time periods. 

For example, one data source might be compiled daily, another weekly, another biweekly, and another monthly. You may also have little control over how often this data is reported, which makes it difficult to plot or combine data points recorded over different periods.

A relatively simple way to tackle this practical issue is interpolation. With interpolation, unknown data is predicted between two or more known values. 

There are many examples of interpolation in everyday technology, such as image interpolation, which estimates the value of a pixel from the values of its neighboring pixels.

Why interpolate data?

Generally speaking, daily data is preferable to weekly or monthly data. Models built with daily data have more definition because they have 7 times as many data points as models built with weekly data and roughly 30 times as many as models built with monthly data.

While interpolating daily values isn’t as good as possessing daily values in the first place, it still adds detail to a model. 

For example, the following graph shows daily brand searches, which is the format you’re most likely to receive such data in.

The graph covers 882 days, or 127 weeks. You can see that the data clearly follows weekly and monthly patterns at daily resolution, including large spikes on some days and a general increase throughout much of the year.

Daily data

This second graph shows the same daily data grouped into weeks, without interpolation. We can compare it with the daily graph, but suppose we only ever had weekly data in the first place. What would that look like?

Weekly data

If we interpolate the 127 weekly values back up to 882 daily values, we get the following graph:

Daily data interpolated from weekly data

Now, let’s see how much definition we lose by grouping the data monthly, which leaves just 30 rows from two and a half years of data. A model built on this would be inaccurate unless the data extended back several years and we were looking only at large-scale macro variations.

Monthly data

Scaling back up to daily from the monthly data produces the following graph. Note the issue at the end of the final month, where the average was calculated incorrectly because of the lack of definition in the data.

Daily data interpolated from monthly data

Here’s where it all comes together. The following graph compares the actual daily data with the interpolated versions of both the weekly and monthly data shown above.

As you can see, interpolation replicates the overall shape of the daily data well, although it cannot recover the spikes on individual days. Interpolation is therefore usually quite effective if you need to resample weekly or monthly data into daily data.

Daily, weekly and monthly data
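The weekly and monthly walkthrough above can be sketched in Python with pandas and NumPy. This is a minimal sketch under stated assumptions, not the method used to produce the graphs: the daily series is synthetic (an upward trend, a weekend dip and random noise), and `interpolate_from_periods` is a hypothetical helper that averages the data per period and then linearly interpolates those averages back to daily values.

```python
import numpy as np
import pandas as pd


def interpolate_from_periods(daily: pd.Series, freq: str) -> pd.Series:
    """Hypothetical helper: aggregate a daily series to `freq` averages,
    then linearly interpolate those averages back onto the daily index.

    Assumption: each period's average is anchored at the average position
    of the days it covers, so partial periods sit where their data lies.
    """
    t = np.arange(len(daily), dtype=float)  # day number 0, 1, 2, ...
    frame = pd.DataFrame({"t": t, "y": daily.to_numpy()}, index=daily.index)
    # One row per period: mean day position and mean daily value.
    periods = frame.resample(freq).mean().dropna()
    # Straight lines between period anchors; np.interp holds the first/last
    # period value flat before the first anchor and after the last one.
    estimate = np.interp(t, periods["t"].to_numpy(), periods["y"].to_numpy())
    return pd.Series(estimate, index=daily.index)


# Synthetic stand-in for daily brand searches (NOT the data in the graphs).
days = pd.date_range("2021-01-04", periods=882, freq="D")
rng = np.random.default_rng(seed=0)
trend = 100 + 0.05 * np.arange(882)
weekend_dip = np.where(days.dayofweek >= 5, -15.0, 5.0)
daily = pd.Series(trend + weekend_dip + rng.normal(0, 5, 882), index=days)

from_weekly = interpolate_from_periods(daily, "W")    # weeks ending Sunday
from_monthly = interpolate_from_periods(daily, "MS")  # calendar months

# Mean absolute error of each reconstruction against the "actual" daily values.
mae_weekly = (daily - from_weekly).abs().mean()
mae_monthly = (daily - from_monthly).abs().mean()
print(f"MAE from weekly: {mae_weekly:.2f}, from monthly: {mae_monthly:.2f}")
```

Neither reconstruction can recover the weekend dip, because averaging over weeks or months smooths it away before interpolation; this is the loss of definition the graphs illustrate. Anchoring each period at the average position of its days is one design choice among several. Also note that interpolating averages does not guarantee the daily estimates add back up to the original weekly or monthly totals.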

The benefits of interpolating data

The main benefit of interpolation is that it is generally more accurate than extrapolation. It is also easier to understand and more reliable.

Extrapolation estimates an unknown value by extending a known sequence of values beyond its range; it infers data that the existing information does not cover. Interpolation, by contrast, estimates a value between two known values in a sequence.

Interpolated data can be easier to understand and more reliable, as it can be challenging to determine how much weight should be given to each point when using extrapolated data.
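To make the accuracy point concrete, here is a small NumPy sketch on assumed data that lies on the curve y = x²: a linear estimate inside the known range is compared with a straight-line extrapolation beyond it. It illustrates one case, not a general proof.

```python
import numpy as np

# Known points sampled from a curved relationship, y = x**2 (assumed for illustration).
x_known = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_known = x_known ** 2

# Interpolation: estimate inside the known range, at x = 2.5.
interp_est = np.interp(2.5, x_known, y_known)
interp_err = abs(interp_est - 2.5 ** 2)

# Extrapolation: extend the last straight-line segment out to x = 8.
slope = (y_known[-1] - y_known[-2]) / (x_known[-1] - x_known[-2])
extrap_est = y_known[-1] + slope * (8.0 - x_known[-1])
extrap_err = abs(extrap_est - 8.0 ** 2)

print(interp_est, interp_err)  # 6.5 0.25
print(extrap_est, extrap_err)  # 44.0 20.0
```

`np.interp` itself does not extrapolate: for x values outside the known range it returns the first or last known y value, which is why the extrapolation above extends the final segment by hand.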

The disadvantages of interpolating data

Interpolation is not as good as having daily variables in the first place, but it is an effective way of resampling less granular weekly or monthly variables into a daily format.

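The main disadvantage of interpolating data is that it can lead to overfitting, because the model may over-learn the averages created during interpolation. Interpolated points are estimates derived from the original values rather than new observations, so a daily series built from weekly data carries no more real information than the weekly data did. Interpolation can also reduce generality, meaning the model may not perform well on other data sets.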

The importance of interpolating data

Fundamentally, interpolation involves using a model to estimate values between two known data points. 

Extrapolation uses a model to estimate values outside the known data points. Both techniques estimate missing values from existing ones.

Interpolation fills in the gaps between known points so that the series has a value at every step. This can be done with linear, quadratic, or cubic interpolation.

Linear interpolation is the simplest form of interpolation: it connects neighboring data points with a straight line. This is the method used in the examples above.

Quadratic interpolation improves on the linear estimate by introducing curvature into the line connecting the points. Whereas linear interpolation needs two points, quadratic interpolation needs three. Other, more complex types of interpolation exist for spatial coordinates.
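A minimal NumPy sketch, using three assumed points that lie on y = x², shows how curvature changes the estimate at x = 1.5. Fitting a degree-2 polynomial with `np.polyfit` to exactly three points returns the unique parabola through them, which is one way to perform quadratic interpolation.

```python
import numpy as np

# Three known points (assumed); they lie on y = x**2, so the true value at x = 1.5 is 2.25.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 4.0])

# Linear interpolation uses only the two neighbours of x = 1.5: (1, 1) and (2, 4).
linear_est = np.interp(1.5, x, y)

# Quadratic interpolation: with exactly three points, a degree-2 fit passes through all of them.
coeffs = np.polyfit(x, y, deg=2)
quadratic_est = np.polyval(coeffs, 1.5)

print(linear_est)               # 2.5
print(round(quadratic_est, 6))  # 2.25
```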

How you can easily interpolate data

A quick and easy way to interpolate data is by using linear regression. Linear regression is a statistical technique that allows you to find the line of best fit for a data set. 

This can be done by plotting the data on a scatter plot and finding the equation of the line of best fit, y = mx + b, where m is the slope and b is the y-intercept. Substituting any x value within the range of the data into that equation gives an estimate of y.

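For a line through two points, the slope is the change in y divided by the change in x. For a least-squares line of best fit, the slope is the covariance of x and y divided by the variance of x, and the y-intercept is the value of y where the line crosses x = 0. Extrapolation can also be performed with the same regression equation, by substituting x values outside the range of the data.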
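The slope and intercept calculation can be sketched with NumPy on a small assumed data set. The block computes the least-squares line by hand and checks it against `np.polyfit`. Unlike linear interpolation, the fitted line does not generally pass through the known points, so treat its estimates as approximations.

```python
import numpy as np

# Example data (assumed for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least-squares slope = covariance(x, y) / variance(x); intercept from the means.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# np.polyfit with deg=1 returns the same line as [slope, intercept].
fit_slope, fit_intercept = np.polyfit(x, y, deg=1)

# Estimate inside the data range (interpolation) and beyond it (extrapolation).
inside = intercept + slope * 2.5
beyond = intercept + slope * 7.0
print(f"y = {slope:.2f}x + {intercept:.2f}; y(2.5) = {inside:.2f}; y(7) = {beyond:.2f}")
```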


Summary: Interpolation

Interpolation seeks to predict an unknown value between two or more known values. It’s generally more accurate than extrapolation, which seeks to predict an unknown value outside the range of the known values.

In marketing and marketing mix modeling (MMM), interpolation can be used to fill in missing daily data when weekly or monthly data is present. 

Interpolation is simple but effective, even if it is somewhat long-winded to calculate by hand. It’s therefore worth adding definition to a model by creating daily data from data recorded over other periods.


Frequently Asked Questions

What is the equation for interpolation?

The formula is y = y1 + ((x - x1) / (x2 - x1)) * (y2 - y1), where x is the point at which you want an estimate, y is the unknown value being estimated, (x1, y1) is the known data point just below x, and (x2, y2) is the known data point just above x.
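The formula translates directly into a small Python function. The function name and the week/search numbers in the worked example are hypothetical.

```python
def linear_interpolate(x: float, x1: float, y1: float, x2: float, y2: float) -> float:
    """Estimate y at x on the straight line through (x1, y1) and (x2, y2)."""
    if x2 == x1:
        raise ValueError("x1 and x2 must differ")
    return y1 + (x - x1) / (x2 - x1) * (y2 - y1)


# Worked example with assumed numbers: 120 searches in week 1 and 160 in week 3.
print(linear_interpolate(2, 1, 120, 3, 160))  # 140.0
```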

How do you interpolate missing data?

To interpolate missing data, you estimate each missing value from the known values around it. Linear interpolation, the simplest approach, connects the known points on either side of a gap with a straight line and reads the missing value off that line. There are more complex forms of interpolation for estimating values across 2D or 3D space.
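In pandas, missing values in a `Series` can be filled with `Series.interpolate`. This sketch uses assumed values: the default `method="linear"` treats rows as evenly spaced, while `method="time"` weights by the gaps between dates on a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

# A series with gaps (NaN) to fill; values are assumed for illustration.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 8.0])
filled = s.interpolate()  # default method="linear": straight lines between known values
print(filled.tolist())    # [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]

# "linear" ignores the dates and treats rows as equally spaced;
# "time" accounts for the 1-day and 3-day gaps between these dates.
dated = pd.Series([0.0, np.nan, 4.0],
                  index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]))
print(dated.interpolate(method="linear").tolist())  # [0.0, 2.0, 4.0]
print(dated.interpolate(method="time").tolist())    # [0.0, 1.0, 4.0]
```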

What is linear interpolation?

Linear interpolation is a technique for estimating an unknown value from the two known values on either side of it. It is the simplest form of interpolation, requiring only two known values.

What is the difference between interpolation and extrapolation?

Both interpolation and extrapolation predict non-present or missing values from existing values. Extrapolation involves estimating an unknown value based on a known sequence of values or facts. To extrapolate is to infer something outside of existing information. Interpolation is the act of estimating a value within two known values from a sequence. Both may involve linear regression.