Ground Truth

"Ground truth" is a term borrowed from meteorology. In marketing and data science, the ground truth encapsulates the objective reality behind models, data and predictions.


The term “ground truth” tends to crop up in discussions about machine learning and data science, but it’s a useful concept to apply to the world of marketing. 

Beyond meteorology, the term also has roots in aviation and the military. In its simplest form, the ground truth is information collected through direct observation and measurement, rather than inference.


The term "ground truth" originated when scientists and engineers had to validate the images taken by satellites, aircraft, and other airborne vehicles. Taking measurements on the ground enabled them to validate remote images on a pixel-by-pixel basis, ensuring that data captured by sensors above the ground corresponded to real features on the ground.

For example, you can task a satellite with scanning a mountain range to reveal its contours and elevations, but without ground truthing that data, it’d be impossible to verify whether the remote data is accurate. Collecting empirical data from the ground allows sensors and imaging technologies to be calibrated, lowering the error rate of future measurements. Ground truth in machine learning follows the same principle; it’s the reality behind the predictions inferred by the model.


There are two types of error that ground truthing seeks to reduce or eliminate:

Errors of commission

An error of commission is when a feature is reported that is actually absent. So, a remote sensor or model might predict the presence of a tree, when in reality, there is no tree. You could also call this a false positive. 

Ground truthing helps reduce false positives. For example, in machine learning, data scientists could evaluate a training set for why it produces false positives in the model (e.g., grassy soil is being wrongly classified as a tree). 

Errors of omission

Errors of omission occur when a model, remote sensor, or other technology (or person) fails to report a feature that is actually present. For example, the sensor may fail to detect trees that really are there.

In machine learning, an error of omission is usually called a false negative.
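To make the two error types concrete, here is a minimal sketch that counts them by comparing predictions against ground-truth labels. The function name `commission_omission` and the label lists are hypothetical and invented for illustration; they do not come from any particular tool.

```python
def commission_omission(ground_truth, predicted, positive="tree"):
    """Count errors of commission (false positives) and omission (false negatives)."""
    if len(ground_truth) != len(predicted):
        raise ValueError("ground_truth and predicted must be the same length")
    # Commission: the feature is reported, but it is absent on the ground.
    commission = sum(
        1 for truth, pred in zip(ground_truth, predicted)
        if pred == positive and truth != positive
    )
    # Omission: the feature is present on the ground, but it is not reported.
    omission = sum(
        1 for truth, pred in zip(ground_truth, predicted)
        if truth == positive and pred != positive
    )
    return commission, omission


# Hypothetical labels: what was observed on the ground vs. what the model reported.
ground_truth = ["tree", "grass", "tree", "grass", "tree"]
predicted = ["tree", "tree", "grass", "grass", "tree"]

commission, omission = commission_omission(ground_truth, predicted)
print(f"Errors of commission (false positives): {commission}")
print(f"Errors of omission (false negatives): {omission}")
```

In this made-up sample, one patch of grass is reported as a tree (commission) and one real tree is missed (omission).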

Ground truth in marketing 

Ground truthing bridges direct observation with information concluded through the process of inference. 

Inference occurs when data is fed into a model, like a supervised machine learning model, which infers predictions based on what it has learned already. This is broadly analogous to human learning - humans learn to generalize data based on what we know already. The problem is, machines, like humans, may believe they know the answer, when really, they only have a partial understanding (or no understanding at all). 

If we, as humans, are confident we know the answer because we believe our data makes it unequivocal, and yet we’re wrong, we might have a problem.

To put a finer point on it, if you believe you know the answer because you have so much data, but actually, you’re wrong, and you go and invest heavily into your conclusions, you might have a problem. 

This is clearly a problem in marketing, where data is used to extrapolate complex concepts all of the time. 

From segmentation to personalization, marketing relies on inference. It relies on the data being right, or right enough to make things work. While clean, accurate data is king, paying attention to the ground truth can augment your marketing campaigns and ensure that you’re not selling yourself a lie. 

Finding the ground truth in marketing 

Performance-based marketing can masquerade as objective science, but often, it contains a good deal of abstraction and inference. Sure, by analyzing your web traffic and customer data together with your users’ affinity and interest data, you can get a pretty solid idea of:

  1. Who your customers/users are, demographically, geographically, and socially 
  2. What your customers/users do, behaviourally, ethically, and morally 
  3. What your customers/users are interested in, culturally, socially, activity-wise, and so on

By performing segmentation analysis, you might find that, actually, your eCommerce store users are totally different from what you assumed. Your findings might shape your entire strategy, from your products and branding to your marketing and advertising. 

However, before making hasty (and possibly expensive) decisions, it’s crucial to collect more data to discover whether your inferences square up with the ground truth. 

But how?

Ground truth in marketing mix modeling (MMM)

Ground truth applies to marketing mix modeling (MMM). In MMM, we don't know what the ground truth is - and will never know! The model is just an estimation of what the truth could be, and there's no way to truly verify it in an objective way. 

So, we use different metrics, like NRMSE, R², and MoE, to gauge how robust the model looks, cross-checking predictions against observed results.
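As a rough illustration of two of these metrics, the sketch below computes NRMSE and R² for a set of predictions. It assumes NRMSE is the RMSE divided by the range of the observed values, which is one common normalization (others divide by the mean). The sales figures and function names are hypothetical.

```python
import math


def nrmse(actual, predicted):
    """RMSE normalized by the range of the actual values (an assumed convention)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (max(actual) - min(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical weekly sales: observed values vs. what the model predicted.
actual_sales = [100.0, 120.0, 90.0, 110.0]
predicted_sales = [105.0, 115.0, 95.0, 105.0]

print(f"NRMSE: {nrmse(actual_sales, predicted_sales):.3f}")
print(f"R²: {r_squared(actual_sales, predicted_sales):.3f}")
```

Both metrics only measure how well predictions match observed outcomes such as sales. They say nothing directly about whether the model attributes those sales to the right channels, which is the part of the ground truth that stays unknown.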

One source of frustration for practitioners is that when you select your model on any one or more of these metrics, particularly if you automate that selection (for example, by testing thousands of models), you're often left with garbage!

All models are wrong in some way, and to varying extents - we must choose one that's the most useful. 

This is why marketing mix modeling is still a relatively manual process, in an era where many other analytical tasks are automated. Many tools can automate parts of the job, but there's no substitute for domain expertise. Enriching the model with other data, particularly qualitative data, can add understanding and help marketers optimize strategies in light of the ground truth while negotiating the limitations of their models. 

Qualitative data matters 

First and foremost, collecting qualitative data is extremely important for building a customer view. It’s tough to beat the granular, high-quality data you can discover from customer feedback, interviews, and traditional market research. 

You may have noticed that some brands really badger you for customer feedback, offering prizes and rewards for those who participate in some often-lengthy surveys.

Why? Because they obviously value that data, especially when it comes from long-term customers. 

While qualitative data is far from empirically robust (aside from 1 to 10 scores and other scales you find on feedback forms), it still qualifies as direct observation. Therefore, collecting qualitative data is an excellent way to ground truth your other marketing data. 

  • Create surveys and offer incentives 
  • Target your long-term customers with detailed feedback forms and surveys
  • Conduct traditional market research, focus groups, and interviews 

With the power of data, it’s tempting to forgo qualitative data and believe that your conclusions are unequivocal because you have so much other data. However, collecting data from real people enables you to enrich your conclusions and get under the skin of what your customers feel and want. 

Social listening

Social listening, or social media listening, essentially allows you to ‘eavesdrop’ on public conversations people are having about your brand, or even your competitors’ brands and products. Social media listening provides an insight into real conversations taking place between individuals, and some brands have used it to impressive effect.


For example, Ocean Spray ‘listened’ to a conversation from TikTok user 420doggface, who was longboarding down a road at sunset listening to Fleetwood Mac’s “Dreams”, while swigging from a bottle of Ocean Spray. In true viral fashion, thousands of others joined in the conversation, and some other major influencers filmed their own videos that featured Ocean Spray, like the original. 

Fast forward, and Ocean Spray delivers 420doggface a red truck full of Ocean Spray products.

Obviously, he filmed and posted it online, and millions saw it. The rest is history - Ocean Spray pulled off one of the most famous impromptu marketing stunts of recent memory, and what did it cost them? A truck full of their products. 

Of course, Ocean Spray must’ve picked up on the conversation somehow, even though the original video didn’t contain any direct handles or hashtags to notify them. So, did they use social media listening? It’s certainly possible. Either way, the point is that they found out and reacted swiftly.

Social listening and ground truthing 

Social media listening provides another avenue to explore the ground truth of people’s thoughts, feelings, actions, and opinions. While this is far from objective science, it’s more detailed and granular than going by affinity categories alone. In other words, social media listening gives marketers another angle to explore their users, and the market as a whole.

Summary: Ground truth

Ground truth has become somewhat of a generic term to define the objective reality behind data. In marketing, the ground truth lies behind the veneer of a model, and it's a similar story in machine learning.

It's worth remembering that models are just that - models. They are not a complete representation of objective reality, and they 'bake in' subjective interpretations. Subjective interpretations are liable to bias and misjudgement, which practitioners must navigate using both qualitative and quantitative methods.

In terms of a marketing mix model, no statistical test can fully account for the near-limitless factors that lie beneath the data. It's crucial to enrich your understanding using observations taken from the ground, which is what qualitative data does very well.

Frequently Asked Questions

What is ground truth in AI?

Ground truth refers to the actual nature of the problem that is the target of a machine learning model, reflected by the relevant data sets associated with the use case in question. It refers to the objective reality underpinning the model and its function. For example, a model might misclassify a tree as a shrub, an error of commission. The tree is the ground truth, and the model will need to be trained/optimized to lower its error rate.

What is ground truth in classification?

Ground truth is a term used in statistics and machine learning that means checking the results of machine learning for accuracy against the real world. The term is borrowed from meteorology, where "ground truth" refers to information obtained on site. Ground truthing helps reduce errors of omission and commission.

What are some examples of ground truth?

Ground truth applies to meteorology, remote sensing, machine learning, and data science. It can also be applied to marketing and psychology. In all cases, it refers to the objective, empirical truth underpinning information retrieved via inference. The ground truth is collected from direct observation and measurement, rather than prediction, estimate or inference.