Why Most BigQuery Projects Fail (and It's Almost Never BigQuery)
BigQuery projects rarely fail on the tool. They fail on raw GA4 export with no model, SELECT * eating the budget, and missing tests. Four patterns, one counter-design.
The pattern: BigQuery becomes a data graveyard
It almost always plays out the same way. Someone enables the GA4 BigQuery export, the raw data flows, and for two weeks it feels like progress. Finally your own data, finally SQL, finally no more sampling limits.
Then nothing happens.
The events_ tables grow daily. Nobody builds a clean layer on top. Whoever needs a number writes a fresh query by hand, copies it from an old dashboard, or asks the person who last had a similar question. Six months later there are forty half-redundant queries, three definitions of "conversion", and a BigQuery bill nobody can explain.
This isn't a tooling problem. It's a missing plan disguised as a tooling problem.
Raw sources fire straight into the dashboard. No modeling, no tests. The logic sits scattered, the dashboard shakes.
Mistake 1: The export is not a model
The GA4 export is raw. Deliberately raw. One row per event, nested event_params as arrays, user_properties as structs, all denormalized. Google ships it that way to be maximally flexible, not because you're meant to report on it directly.
Many teams skip exactly this step. They report straight off the raw table. Every dashboard, every ad-hoc query then has to unpack event_params itself, reconstruct the session itself, decide for itself what counts as a purchase. That logic ends up scattered across thirty queries instead of living in one place.
-- A single order out of the raw events, everything has to be unpacked
SELECT
event_date,
(SELECT value.string_value FROM UNNEST(event_params)
WHERE key = 'transaction_id') AS transaction_id,
(SELECT value.int_value FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id,
ecommerce.purchase_revenue AS revenue
FROM `project.analytics_123456789.events_*`
WHERE event_name = 'purchase';
That unpacking logic belongs in the model exactly once, not in every dashboard.
The same goes for metrics. COUNT(DISTINCT user_pseudo_id) is not the GA4 user count, and you count sessions via ga_session_id per user, not via session_start. Skip writing that into the model and you get numbers that contradict the GA4 interface, and you lose the argument in the meeting.
The modeling layer isn't optional. It's the point where events become business logic:
stg_eventscleans and types the raw dataint_sessionsreconstructs sessions consistentlyfct_ordersis the one truth about ordersdim_*keeps channels, campaigns and products cleanly separated
That's exactly what dbt does, which is why it's our default layer between BigQuery and BI. Not for fashion. Because the alternative is that every definition exists forty times and forty times slightly differently.
Mistake 2: SELECT * eats the budget
In on-demand mode BigQuery bills by bytes scanned, around $6 per terabyte. That sounds harmless until someone writes SELECT * over a year of GA4 events and the query has no idea that only three columns and one month are needed.
An example from a setup I took over in 2024: the monthly BigQuery bill was higher than the GA4 360 licence the team had switched to the free tier to avoid. They'd traded the expensive tool for a supposedly cheap one and then spent just as much. Full stop.
The causes are almost always the same:
- No partitioning on
event_date, so every query scans the entire history - No clustering, so no pruning benefit on filters
- Dashboards hitting the raw table live instead of a slim, materialized model
SELECT *as a reflex, even though nobody needs 300 columns
-- ❌ Full scan: reads the entire event history, every time SELECT * FROM `project.analytics_123456789.events_*`; -- ✅ Smart scan: only the range you need, only the columns you need SELECT event_date, event_name, user_pseudo_id FROM `project.analytics_123456789.events_*` WHERE _TABLE_SUFFIX BETWEEN '20260501' AND '20260531';
Same result, a fraction of the bytes scanned. On GA4's daily-sharded tables, the _TABLE_SUFFIX filter does what partitioning does elsewhere.
Partitioning plus a materialized model often cuts bytes scanned by more than 90%. This isn't an optimization detail for specialists. It's the difference between a bill you sign off and one you have to defend.
The SELECT * budget burner
BigQuery bills on-demand by bytes scanned, about $6.25 per terabyte (as of June 2026). Drag the table size and toggle the query type. Watch in real time what a bad query costs.
Scanned per query
800 GB
Cost per query
$4.88
Storage / month (active)
$16.00
Compute extrapolated / month
$29,296.88
A non-partitioned raw table, queried by LLM agents or generic BI tools, can bankrupt a startup overnight.
Experience in real time what bad queries cost.
Mistake 3: No tests, no trust
Marketing data is less often wrong than unnoticed-wrong. A tag breaks, a parameter gets renamed, a join doubles rows. Without tests nobody catches it until a quarterly report looks odd and the discussion lands on "are these numbers even right?"
In classic software a test is mandatory. In data pipelines it gets treated as a nice-to-have. Yet the tests you need are trivial:
- Is
order_idunique? - Is
revenuenever negative? - Do the daily totals reconcile between raw export and model?
- Does every expected event type still show up, or did something quietly break?
dbt tests catch exactly this, automatically, on every run. The real payoff isn't technical. It's social: once the pipeline is green, the argument about whether the numbers are credible stops and the argument about the decision starts. That's the whole point of the exercise.
Every merge runs against the tests. Clean models pass, broken data (NULL, duplicate IDs) is stopped at the barrier before it reaches production.
Mistake 4: Dashboard-first, model-never
The most expensive mistake is cultural. The expectation is: the dashboard ships in two weeks. So someone quickly builds a dashboard straight on the raw data, it looks good, everyone's happy, and the modeling layer gets done "later".
Later never comes.
The dashboard becomes the source of truth, even though the logic sits inside it, invisible, untested, unversioned. Want to change the definition of "active customer"? You have to hunt it down in every report one by one. This is exactly where the three teams with three numbers for the same revenue come from, not out of bad faith, but because the logic was never pulled into one shared place.
A dashboard is the last layer, not the first. Build it first and you build on sand.
What a working setup does differently
The good setups look boring, and that's a compliment. They have a clear order, and they hold it:
- Raw export lands in its own dataset. Untouched, the source of truth for everything downstream.
- dbt models in layers. Staging cleans, intermediate reconstructs, marts deliver business logic. Every model has a name, a test and an owner.
- Definitions live once. "Conversion", "active customer", "attributed revenue", each defined exactly once, in the model, not in the dashboard.
- Partitioned and clustered. Reporting runs on slim marts, not on the raw table.
- BI is the display, not the logic. Power BI or Data (Looker) Studio shows what the model computed. Nothing more.
The same architecture pays off twice the moment AI enters the picture. A language model that's supposed to reason over your marketing data is only as good as the models beneath it. Clean, tested, named tables are exactly what an LLM needs to avoid hallucinating. The messy raw table isn't. Data quality isn't preparation for AI. It's the precondition.
BigQuery is an excellent warehouse. Fast, cheap in the EU region europe-west3, tightly wired to GA4. In almost every failed project we've taken over, BigQuery was the part that worked. What failed was the layer nobody built.
Warehouse in place but the numbers don't hold? An Audit Sprint reviews modeling, cost and tests, and delivers a prioritized plan within two weeks on what to fix first.
Need help with your setup?
Audit Sprint in two weeks, prioritised report, concrete action steps.
Request an audit →-
Is it really never BigQuery itself?
Rarely. BigQuery scales reliably and is cheap in on-demand mode, as long as queries are properly partitioned. The problems sit one layer up, in modeling, cost control and tests. You only hit real tool limits with very specific requirements, such as hard real-time SLAs.
-
Do I need dbt, or are views enough?
For a small setup a few maintained views are fine. As soon as more than one person changes definitions, or you need tests and versioning, dbt is the standard. It forces the logic into one place and makes it testable and traceable.
-
How do I keep BigQuery costs under control?
Partition on `event_date`, cluster on the most common filter columns, report off materialized marts rather than the raw table, and avoid `SELECT *`. These four steps usually cut bytes scanned drastically. Custom cost controls per project add a safety net on top.
-
Is the GA4 BigQuery export worth it for mid-size companies?
Yes, the free-tier export covers most mid-market setups. But the value only shows up with the modeling layer on top. Without it, you just move the sampling problem into a more expensive, messier format.
-
What does data quality have to do with AI readiness?
Everything. An LLM or an attribution model is only as reliable as the data beneath it. Tested, cleanly named models reduce hallucinations and misreads. Most AI projects fail not on the model, but on the data it reaches into.
-
Why do my BigQuery numbers differ from the GA4 interface?
Because the GA4 interface shows modeled and partly estimated values, while the export holds raw events. Counting users via `COUNT(DISTINCT user_pseudo_id)` ignores Consent Mode modeling and cross-device identity, and sessions need `ga_session_id` per user, not `session_start`. Small gaps are normal. Large ones point to a modeling error, not a bug in BigQuery.