GA4 Active Users: Querying With BigQuery

by Jhon Lennon

Alright guys, let's dive into the awesome world of Google Analytics 4 (GA4) and how we can leverage the power of BigQuery to extract some seriously insightful data about our active users. If you're just scratching the surface with GA4's standard reports, you're missing out on a goldmine of potential. BigQuery opens the door to custom analysis and a much deeper understanding of user behavior.

Why Use BigQuery for GA4 Active User Analysis?

GA4 is a fantastic evolution from Universal Analytics, offering a more event-driven model and enhanced cross-platform tracking. However, the standard GA4 interface has its limitations. When you need to perform complex queries, combine data from different sources, or retain data for longer periods, BigQuery becomes your best friend.

  • Scalability and Power: BigQuery is designed to handle massive datasets with ease. Forget about the sampling that can affect GA4 explorations, or the thresholding and "(other)" row grouping that can show up in reports when dealing with large volumes of data. BigQuery crunches the raw event data without breaking a sweat, so you work with the complete, unsampled dataset.
  • Customization and Flexibility: With BigQuery, you're not restricted to pre-defined reports. You can write custom SQL queries to slice and dice your data exactly the way you want. Want to segment active users based on specific behaviors, demographics, or acquisition channels? No problem! BigQuery gives you the freedom to explore your data in infinite ways.
  • Data Retention and Integration: GA4 has data retention limits on user-level and event-level data, which can be a pain if you need to analyze historical trends over extended periods. BigQuery allows you to store your GA4 data for as long as you need, giving you a complete historical view from the day the export starts (the export is not backfilled, so link early). Plus, you can easily integrate your GA4 data with other data sources in BigQuery, such as CRM data, advertising data, or even data from other marketing platforms. This opens up exciting possibilities for cross-channel analysis and a holistic understanding of your customer journey.
  • Advanced Analysis and Machine Learning: BigQuery isn't just for querying data; it's also a powerful platform for advanced analysis and machine learning. You can use BigQuery ML to build predictive models, identify user segments with high churn risk, or personalize user experiences based on their behavior. The possibilities are truly endless.

Understanding Active Users in GA4

Before we jump into the BigQuery queries, let's clarify what we mean by "active users" in the context of GA4. GA4 defines active users based on engagement. An active user is someone who has engaged with your website or app in a meaningful way. This typically includes users who:

  • Launch your app or visit your website.
  • View a page or screen.
  • Trigger an event that registers engagement (for example, an event that carries engagement time).

GA4 tracks several user metrics, including:

  • Total Users: The total number of unique users who logged any event, whether or not they were engaged.
  • Active Users (28-day, 7-day, 1-day): The number of users who have been active within the last 28 days, 7 days, or 1 day, respectively. These metrics provide a snapshot of your user base over different time horizons.
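To make the difference between total and active users concrete, here is a minimal sketch that counts both for a single day. It assumes the standard GA4 export schema, where engagement time arrives in the engagement_time_msec event parameter, and it approximates active users as those with at least one event carrying positive engagement time. The GA4 interface may apply further rules, so expect small differences. The project, dataset, and date are placeholders.

```sql
SELECT
  event_date,
  -- Every user who logged at least one event that day
  COUNT(DISTINCT user_pseudo_id) AS total_users,
  -- Users with at least one event that carries positive engagement time
  -- (an approximation of GA4's active users, not the official formula)
  COUNT(DISTINCT IF(
    (SELECT ep.value.int_value
     FROM UNNEST(event_params) AS ep
     WHERE ep.key = 'engagement_time_msec') > 0,
    user_pseudo_id,
    NULL)) AS approx_active_users
FROM
  `your_project.analytics_your_ga4_property_id.events_20230131` -- placeholder table
GROUP BY
  event_date
```

If the two numbers are close, most of your visitors engage; a large gap usually points to a lot of very short, non-engaged visits.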

Setting Up the BigQuery Connection

Okay, so you're sold on the power of BigQuery. Now, how do you actually connect your GA4 data to BigQuery? Here’s a breakdown:

  1. Link GA4 to BigQuery:
    • In GA4, go to Admin (the gear icon at the bottom left).
    • Under Property, click on BigQuery Links.
    • Click Choose a BigQuery project and select the Google Cloud project you want to link to. If you don't have a project yet, you'll need to create one in the Google Cloud Console.
    • Configure the data stream. You can choose to export data for all web streams or specific ones. It's generally a good idea to export data for all streams to have a comprehensive view.
    • Enable daily exports and, optionally, streaming exports. Daily exports provide a batch of data once per day, while streaming exports provide data in near real-time. Streaming exports are great for up-to-the-minute analysis, but they can incur additional costs.
    • Review your settings and click Submit.
  2. Verify the Connection:
    • Once the link is established, GA4 will start exporting data to your BigQuery project. It may take up to 24 hours for the first export to complete.
    • In the BigQuery console, navigate to your project and you should see a dataset named analytics_xxxxxxxx (where xxxxxxxx is your GA4 property ID).
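    • Inside the dataset, you'll find tables named events_YYYYMMDD (where YYYYMMDD is the date of the export). Each table contains the raw event data for that day. If you enabled streaming exports, you'll also see events_intraday_YYYYMMDD tables holding the current day's data.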
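If you'd rather check from SQL than click through the console, a quick sketch like the following lists the newest export tables. It assumes the dataset name shown above (a placeholder) and reads BigQuery's INFORMATION_SCHEMA.TABLES view for that dataset.

```sql
-- List the most recent GA4 export tables in the linked dataset
SELECT
  table_name,
  creation_time
FROM
  `your_project.analytics_your_ga4_property_id.INFORMATION_SCHEMA.TABLES`
WHERE
  STARTS_WITH(table_name, 'events_') -- daily and intraday export tables
ORDER BY
  table_name DESC
LIMIT 10
```

An empty result usually just means the first export has not landed yet.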

Querying Active Users in BigQuery: Practical Examples

Alright, let's get our hands dirty with some SQL queries. Here are a few examples to get you started:

1. Daily Active Users (DAU)

This query calculates the number of daily active users (DAU) for a specific date range.

SELECT
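  FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS date, -- event_date is a YYYYMMDD string
  COUNT(DISTINCT user_pseudo_id) AS daily_active_users
FROM
  `your_project.analytics_your_ga4_property_id.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131' -- Replace with your desired date range
GROUP BY
  date
ORDER BY
  date

Explanation:

  • FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)): The export stores event_date as a string in YYYYMMDD format, so PARSE_DATE converts it to a DATE first and FORMAT_DATE then renders it in a human-readable format.
  • COUNT(DISTINCT user_pseudo_id): This counts the number of unique user IDs for each day. Note that it includes every user who logged at least one event, so the result can run slightly higher than the Active users figure in the GA4 interface.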
  • your_project.analytics_your_ga4_property_id.events_*: Replace your_project with your Google Cloud project ID and your_ga4_property_id with your GA4 property ID. The events_* wildcard matches all tables with the events_ prefix.
  • _TABLE_SUFFIX BETWEEN '20230101' AND '20230131': This filters the data to include only tables within the specified date range. Remember to replace the dates with your desired range.
  • GROUP BY date: This groups the results by date, so you get a DAU count for each day.
  • ORDER BY date: This orders the results by date.
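Hard-coded dates get stale quickly, so here is a hedged variation of the DAU query that always covers the last seven complete days. It builds the table suffix bounds from CURRENT_DATE(), which uses UTC by default, while the export tables follow your property's time zone, so the window edges can be off by a day depending on where you are.

```sql
SELECT
  FORMAT_DATE('%Y-%m-%d', PARSE_DATE('%Y%m%d', event_date)) AS date,
  COUNT(DISTINCT user_pseudo_id) AS daily_active_users
FROM
  `your_project.analytics_your_ga4_property_id.events_*`
WHERE
  -- Suffix bounds are computed relative to today instead of being typed in
  _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
  date
ORDER BY
  date
```

Yesterday is used as the upper bound because today's daily table normally does not exist yet.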

2. Monthly Active Users (MAU)

This query calculates the number of monthly active users (MAU) for a specific date range.

SELECT
  FORMAT_DATE('%Y-%m', PARSE_DATE('%Y%m%d', event_date)) AS month, -- event_date is a YYYYMMDD string
  COUNT(DISTINCT user_pseudo_id) AS monthly_active_users
FROM
  `your_project.analytics_your_ga4_property_id.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230331' -- Replace with your desired date range
GROUP BY
  month
ORDER BY
  month

Explanation:

  • The query is very similar to the DAU query, but it groups the data by calendar month instead of day.
  • FORMAT_DATE('%Y-%m', PARSE_DATE('%Y%m%d', event_date)): This parses the event_date string into a DATE and then formats it to show only the year and month.
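Calendar months are not the same thing as the rolling 1-day, 7-day, and 28-day windows GA4 reports. As a rough sketch, the rolling counts as of one end date can be computed in a single pass. The end date (2023-01-31) and the window start dates derived from it are illustrative assumptions, and the comparison works on strings because event_date is stored as YYYYMMDD.

```sql
SELECT
  -- 1-day window: 2023-01-31 only
  COUNT(DISTINCT IF(event_date = '20230131', user_pseudo_id, NULL)) AS active_1_day,
  -- 7-day window: 2023-01-25 through 2023-01-31
  COUNT(DISTINCT IF(event_date >= '20230125', user_pseudo_id, NULL)) AS active_7_day,
  -- 28-day window: every table scanned, 2023-01-04 through 2023-01-31
  COUNT(DISTINCT user_pseudo_id) AS active_28_day
FROM
  `your_project.analytics_your_ga4_property_id.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230104' AND '20230131'
```

Like the queries above, this counts every user with at least one event, so treat the results as an upper bound on GA4's engagement-based figures.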

3. Active Users by Country

This query shows the number of active users by country for a specific date range.

SELECT
  geo.country AS country,
  COUNT(DISTINCT user_pseudo_id) AS active_users
FROM
  `your_project.analytics_your_ga4_property_id.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131' -- Replace with your desired date range
  AND geo.country IS NOT NULL -- geo is a top-level record, not an event parameter
GROUP BY
  country
ORDER BY
  active_users DESC

Explanation:

  • geo.country: This reads the country from the geo record, which is a top-level column in the GA4 export.
  • No UNNEST is needed: geo is a record on every event row rather than an entry in the event_params array, so its fields can be referenced directly. Unnesting event_params and filtering on a 'geo' key would return no rows.
  • geo.country IS NOT NULL: This excludes rows where the country is not available.
  • ORDER BY active_users DESC: This orders the results by the number of active users in descending order, so you see the countries with the most active users first.

4. Active Users by Device Category

This query shows the number of active users by device category (e.g., mobile, desktop, tablet) for a specific date range.

SELECT
  device.category AS device_category,
  COUNT(DISTINCT user_pseudo_id) AS active_users
FROM
  `your_project.analytics_your_ga4_property_id.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230131' -- Replace with your desired date range
  AND device.category IS NOT NULL -- device is a top-level record, so no UNNEST is needed
GROUP BY
  device_category
ORDER BY
  active_users DESC

Explanation:

  • This query is similar to the