This article dives into Google Analytics 4's data modeling techniques. We'll explore behavioral, conversion, attribution, and predictive models—how they fill data gaps, impact your reports, and why Google's pushing them, with a focus on the growing importance of first-party data.
What is GA4 Data Modeling #
In Google Analytics 4, modeling refers to different types of machine learning techniques used to:
- Estimate missing user data (behavioral modeling)
- Reattribute conversions (conversion modeling)
- Assign conversion credit (DDA attribution modeling)
- Predict future user behavior (predictive modeling)
These techniques are applied where observed data from user interactions and conversions is missing or incomplete.
Observed Data vs. Modeled Data #
Traditionally, data in analytics has been categorised into two distinct types: observed data and modeled data.
Observed data is the lifeblood of analytics, representing the actual information captured directly from user interactions.
Modeled data, on the other hand, is not directly measured; it is an estimate of what that data would look like if it had been collected.
Google Analytics 4 (GA4) exemplifies this hybrid approach. It collects observed data through website and app activity, but it can also use that data to generate modeled data.
GA4 uses a mix of observed data where possible and modeled data where necessary.
This ensures a more complete picture of user behaviour even when direct measurements are not possible.
Why is Google Introducing These Models? #
The Shift Towards Modeled Data
Previously, we lived in a world of fully observable data, facilitated by tools like cookies and device identifiers. However, the measurement landscape has changed dramatically in recent years with the rise of privacy-focused browsers and the decline of cookies.
Regulatory changes: Regulations are impacting how user data can be captured and used.
Technology changes: Increased restrictions are impacting traditional data collection (e.g., third-party cookies and mobile ad identifiers).
These changes pose challenges for accurate data collection. According to an IAS 2021 Industry Pulse Report, 85% of digital media professionals cite cookie loss and accurate measurement as two of their top three challenges, with cross-device attribution being the third most significant concern.
In this new ecosystem, where change is the norm and the availability of identifiers is diminishing over time, the reliance on conversion modeling to provide comprehensive measurement reporting will continue to increase.
Why This Matters to Google #
My personal opinion (coming from a cynical perspective):
I believe the primary reason behind these changes is to ensure that advertisers see a higher, or at least a stable, ROI from their advertising spend on Google platforms.
Do you remember Apple's ATT (App Tracking Transparency)? A couple of years ago, if you had an iPhone, you would have started seeing a prompt asking whether to allow apps to track you.
Meta reported that this feature cost them $10 billion. Not because it reduced their ability to serve ads, but because it reduced Meta's ability to measure how effective those ads were.
Advertisers need to see a clear link between their spend and the resulting conversions. If they can’t measure this, they’re likely to cut back on their advertising budgets — a scenario that Google, a company that generated $230 billion in advertising revenue last year, wants to avoid at all costs.
Google has learned from Meta's experience that providing accurate measurement of advertising return on investment (ROI) is crucial. For this reason, Google has released this suite of modeling techniques – conversion, behavioural, data-driven attribution, and predictive modeling. These models help ensure that advertisers don't perceive a lower ROI or a higher cost per acquisition (CPA), even as tracking user data becomes harder.
Note: Google Analytics is shifting from a data tool to a marketing tool, focusing more on driving marketing strategies rather than just analysing data.
The GA4 Models Explained #
Here’s a breakdown of these models and their pros and cons.
Behavioural modeling #
Filling the Gaps
Behavioural modeling predicts the behaviour of users who decline analytics cookies, based on the behaviour of similar users who accept them.
According to Jaime Rodera, “Behavioral modeling drove a 23% increase in the observable traffic in analytics reporting on European and UK websites.”
Purpose
To fill gaps in user behaviour data across various touchpoints, especially when users have not granted consent for analytics storage (cookies).
Challenges Addressed
This is where we need to introduce Consent Mode.
What is Consent Mode?
Consent Mode is another Google feature that restricts data collection for users who opt out of analytics storage cookies.
While it limits data collection, it doesn’t completely block it; some anonymized data is still collected from users who opt out. This anonymized data, combined with data from consenting users, is used in behavioural modeling.
By analysing these combined datasets, behavioural modeling can identify patterns and trends, allowing it to predict the missing user behaviour for those who opted out.
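To make this concrete, here is a minimal sketch of what a Consent Mode setup might look like with gtag.js. It assumes gtag.js is already loaded on the page, and the `onConsentGranted` callback name is purely illustrative of how you might wire it to your cookie banner.

```typescript
// Minimal Consent Mode sketch (assumes gtag.js is already loaded on the page).
// With analytics_storage denied, GA4 tags don't set cookies; in the "advanced"
// implementation they still send anonymised, cookieless pings - the signal
// behavioural modeling combines with data from consenting users.
declare function gtag(...args: unknown[]): void;

// Set consent defaults before any measurement fires.
gtag('consent', 'default', {
  analytics_storage: 'denied',
  ad_storage: 'denied',
});

// Illustrative callback: call this when the visitor accepts your cookie banner.
function onConsentGranted(): void {
  gtag('consent', 'update', {
    analytics_storage: 'granted',
    ad_storage: 'granted',
  });
}
```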
Pros:
Cons:
Conversion modeling #
Reattributing Conversions
Purpose
Conversion modeling aims to attribute conversions to the “correct” marketing channels, particularly those that appear as “direct” traffic in Google Analytics.
Challenges Addressed
In Google Analytics, direct traffic usually signifies traffic without a referrer. That could mean users navigating via bookmarks or typing URLs directly, but it also includes clicks from untracked sources (e.g. inbound links without UTM parameters).
This is problematic for advertisers because direct traffic is uninvestable — it lacks actionable data about the user’s journey.
Conversion modeling reassigns these conversions to more specific channels (often where Google Ads might play a role 😉).
Note: Conversion modeling will not change the total number of conversions collected by Google Analytics 4, but it will change the channels that those conversions are attributed to.
Example of Impact:
| Channel | Before conversion modeling | After conversion modeling |
|---|---|---|
| Direct | 40 | 20 |
| Paid Search | 40 | 60 |
| Total | 80 | 80 |
Imagine you spent £1000 on a paid search campaign. You see 40 conversions from the campaign and another 40 from “direct” traffic, giving you a cost per conversion of £25 (£1000 / 40 conversions).
Conversion modeling steps in and reallocates 20 of those “direct” conversions to your campaign. Now, you have 60 conversions from the campaign, reducing your cost per conversion to £16.67 (£1000 / 60 conversions).
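As a quick sanity check on those numbers, here is the same cost-per-conversion arithmetic written out in a short snippet (the figures are the illustrative ones from the example above):

```typescript
// Cost per conversion before and after conversion modeling,
// using the illustrative figures from the example above.
const spend = 1000; // £ spent on the paid search campaign

const costPerConversion = (conversions: number): number => spend / conversions;

console.log(costPerConversion(40).toFixed(2)); // "25.00" before modeling
console.log(costPerConversion(60).toFixed(2)); // "16.67" after 20 "direct" conversions are reassigned
```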
Pros:
Cons:
Attribution modeling (DDA) #
Assigning Conversion Credit
For more details about this model, refer to my dedicated post.
Attribution modeling, particularly data-driven attribution (DDA), assigns conversion credit across various touchpoints in the customer journey.
Purpose:
Give marketers a more accurate view of how different channels and touchpoints influence conversions, rather than just looking at the last click.
Challenges Addressed:
Traditional last-click attribution models often overlook the contribution of earlier touchpoints. DDA uses machine learning to distribute credit more accurately across all interactions, providing a holistic view of marketing effectiveness.
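To illustrate the idea only (GA4's actual DDA trains a machine learning model on real conversion paths and is far more sophisticated than this), here is a toy comparison of last-click credit versus credit spread across the whole journey:

```typescript
// Toy illustration: last-click credit vs credit spread across the whole path.
// This is NOT Google's data-driven attribution algorithm, just a sketch of
// why distributing credit changes how channels look in reports.
type Credit = Record<string, number>;

function lastClick(path: string[]): Credit {
  // All credit goes to the final touchpoint.
  return { [path[path.length - 1]]: 1 };
}

function evenSplit(path: string[]): Credit {
  // Naive equal split across every touchpoint in the journey.
  const share = 1 / path.length;
  const credit: Credit = {};
  for (const channel of path) {
    credit[channel] = (credit[channel] ?? 0) + share;
  }
  return credit;
}

const journey = ['Paid Search', 'Email', 'Direct'];
console.log(lastClick(journey)); // { Direct: 1 }
console.log(evenSplit(journey)); // { 'Paid Search': 0.33.., Email: 0.33.., Direct: 0.33.. }
```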
Pros:
Cons:
Predictive modeling #
Forecasting the Future
Purpose
Predictive modeling forecasts future outcomes based on historical data and trends.
Challenges Addressed:
Beyond understanding past and present conversions, predictive modeling allows advertisers to anticipate future performance, optimising their strategies proactively.
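For a sense of how these forecasts get used in practice, here is a toy sketch in the spirit of GA4's predictive audiences (for example, "likely 7-day purchasers"). The scores and the 0.6 threshold below are made up for illustration; GA4 computes its purchase and churn probabilities internally with its own models.

```typescript
// Toy sketch of acting on a predictive metric: keep users whose modelled
// purchase probability clears a threshold, the idea behind predictive audiences.
// Scores and the 0.6 threshold are invented for illustration; GA4 computes
// its purchase/churn probabilities internally with its own models.
interface ScoredUser {
  userId: string;
  purchaseProbability: number; // 0..1, hypothetical modelled score
}

const scoredUsers: ScoredUser[] = [
  { userId: 'u1', purchaseProbability: 0.82 },
  { userId: 'u2', purchaseProbability: 0.12 },
  { userId: 'u3', purchaseProbability: 0.67 },
];

const likelyPurchasers = scoredUsers
  .filter((u) => u.purchaseProbability >= 0.6)
  .map((u) => u.userId);

console.log(likelyPurchasers); // ['u1', 'u3']
```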
Pros:
Cons:
Conclusions #
The Importance of First-Party Data #
In today’s marketing landscape, artificial intelligence (AI) and machine learning (ML) models are playing an increasingly prominent role. Platforms like Google Analytics 4, Meta, and Microsoft are all utilising these advanced technologies to analyse user data, model behaviour, attribute conversions, and optimise ad performance.
However, this heavy reliance on AI/ML raises some important questions about transparency and objectivity.
With so much data processing and modeling happening behind the scenes, it can be difficult to understand what you’re truly looking at.
What is it I’m looking at?
Are these models presenting an accurate reflection of reality, or are they designed to show the platforms’ services in the best light possible?
This lack of clarity is concerning, especially when you consider that major ad platforms aren’t exactly known for their transparency. There’s a risk that the data could be manipulated or skewed to encourage more ad spend on their networks.
Do you trust Google to mark their own homework?
Without first-party data, you’re left blindly trusting that Google, Meta, Microsoft, and others are accurately representing performance metrics and optimising campaigns in an unbiased way. But with your own data warehouse or repository, you can apply AI/ML models yourself or scrutinise the platforms’ methodologies.
Of course, establishing a first-party data strategy is easier said than done. Many brands and businesses still don’t have a centralised data hub or the technical resources to effectively utilise advanced modeling. But you can’t even think about that until you at least have the data.
In this new AI/ML-centric marketing reality, having ownership and control of your customer data is paramount. While the leading ad platforms are already wielding these technologies, the real power comes from developing your own robust, first-party data strategy. Only then can you truly understand what the models are telling you and ensure your marketing efforts are optimised objectively.
Vidhya Srinivasan (Google’s VP of Ads Buying, Analytics and Measurement) put it this way:
“The future is based on first-party data. The future is consented. And the future is modeled.”