
How Rawbbit Turns Game Events into Managed Analytics Infrastructure

Game analytics usually shows up as dashboards — funnels, retention charts, revenue reports, cohort tables. Those matter, but they are not the foundation. Before any dashboard can answer a useful question, a studio needs a reliable pipeline underneath it: something that receives events, survives traffic spikes, stores the raw history, makes it fast to query, models it into useful tables, and only then shows it to the team.

That pipeline is what Rawbbit manages.

Rawbbit hosts, deploys, and maintains your game data pipeline, ClickHouse database, data models, and dashboards — so your studio can understand player behavior without hiring data engineers.

This article follows one game event from the moment a player does something inside the game to the moment your team sees that behavior in a dashboard — and explains why many studios are better off not building this stack from scratch.

Collect: get events in safely
  • Game event
  • Collector API
  • NATS JetStream
Store: keep the raw history you own (source of truth)
  • Raw Writer
  • Parquet files
  • Object storage
Serve: make the data fast and useful
  • ClickHouse
  • Data models
  • Dashboards

The whole pipeline in three stages

Most game analytics stacks are easier to understand if you stop thinking about charts first and follow the movement of data instead. Everything Rawbbit does fits into three stages.

Collect: events arrive from your game and are received and buffered safely.

Store: they land as raw Parquet files you own, the durable record of what happened.

Serve: that raw history is made fast to query, modeled into useful tables, and shown in dashboards.

The order matters, and so does the boundary in the middle:

Raw events stay as the source of truth. Everything downstream can be rebuilt from them.

Keep that idea in mind as we walk through each stage. The raw layer is the thing you never want to lose; query engines, models, and dashboards can all change around it.


One event, many future questions

Imagine a player opens your game and starts level 5. The event looks small:

{
  "event_name": "level_started",
  "user_id": "player_123",
  "level": 5,
  "platform": "ios",
  "app_version": "1.4.2",
  "timestamp": "2026-06-23T10:15:00Z"
}

On its own it is one row. But stored properly, that single event becomes the raw material for questions you have not even thought of yet: How many players reached level 5? How many failed before completing it? Did the new tutorial improve level 5 completion for organic users? Do Android and iOS players behave differently here? Did players from one campaign drop at this point more often? Did level 5 difficulty affect day-one retention?

If the raw event was stored properly, all of those stay answerable later. If it disappeared into a closed dashboard that only kept a daily count, the team is stuck with whatever the vendor decided to aggregate.
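To make the contrast concrete, here is a minimal Python sketch. The sample events and the helper name count_by are made up for illustration; the fields are the ones from the example event above.

```python
from collections import Counter

# Hypothetical sample of raw events, shaped like the example event above.
EVENTS = [
    {"event_name": "level_started", "user_id": "player_123", "level": 5, "platform": "ios"},
    {"event_name": "level_started", "user_id": "player_456", "level": 5, "platform": "android"},
    {"event_name": "level_completed", "user_id": "player_123", "level": 5, "platform": "ios"},
    {"event_name": "level_started", "user_id": "player_789", "level": 4, "platform": "ios"},
]


def count_by(events, event_name, field):
    """Count events of one type, grouped by any field the raw event carries."""
    return Counter(e[field] for e in events if e["event_name"] == event_name)


# A pre-aggregated daily count answers only the question it was built for.
daily_level_starts = sum(1 for e in EVENTS if e["event_name"] == "level_started")

# Raw events can be re-cut later along dimensions nobody planned for.
starts_by_level = count_by(EVENTS, "level_started", "level")
starts_by_platform = count_by(EVENTS, "level_started", "platform")
```

The single number in daily_level_starts cannot be split by level or platform after the fact. The raw rows can.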

Store the raw facts first. Build metrics, models, and dashboards on top of them afterward.

That principle is the whole reason the pipeline is shaped the way it is. Now let's follow the event through it.


Stage 1 — Collect: get events in safely

The event enters through the Collector API

The first component is the Collector API. Your game client, backend, or browser build sends events to Rawbbit over a simple HTTP endpoint, authenticated with a per-project API key.

Game / Backend  --HTTP-->  Collector API
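As a rough illustration of what a client-side call could look like, the sketch below builds a batched HTTP request with Python's standard library. The endpoint URL, the {"events": [...]} body shape, and the Bearer header scheme are all assumptions made for this example; the real values come from your Rawbbit project, not from this article.

```python
import json
import urllib.request

# Hypothetical values: the real endpoint and auth header are project-specific.
COLLECTOR_URL = "https://collector.example.com/v1/events"
API_KEY = "project-api-key-placeholder"


def build_batch_request(events, url=COLLECTOR_URL, api_key=API_KEY):
    """Build (but do not send) an HTTP POST carrying a batch of events."""
    body = json.dumps({"events": events}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed header scheme
        },
    )


request = build_batch_request([
    {
        "event_name": "level_started",
        "user_id": "player_123",
        "level": 5,
        "platform": "ios",
        "app_version": "1.4.2",
        "timestamp": "2026-06-23T10:15:00Z",
    }
])
# Sending would be: urllib.request.urlopen(request, timeout=5)
```

Batching several events into one request keeps the number of HTTP calls low, which matters most during the traffic spikes described below.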

The collector's job is deliberately narrow. It does not calculate retention, build funnels, or run heavy queries. It accepts event batches, checks that required fields exist, validates the basic structure, adds technical metadata, rejects clearly invalid requests, and passes the valid events forward. That is all.
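The narrow job described above can be sketched in a few lines of Python. This is an illustration under assumptions, not Rawbbit's actual collector: the required-field list, the rejection messages, and the project_id and received_at metadata names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed required fields, taken from the example event earlier in this article.
REQUIRED_FIELDS = ("event_name", "user_id", "timestamp")


def validate_event(event):
    """Return None if the event looks valid, otherwise a short rejection reason."""
    if not isinstance(event, dict):
        return "event must be a JSON object"
    for field in REQUIRED_FIELDS:
        if not isinstance(event.get(field), str) or not event[field]:
            return f"missing or empty field: {field}"
    try:
        # Older Python versions reject a trailing "Z" in fromisoformat().
        datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    except ValueError:
        return "timestamp is not ISO 8601"
    return None


def accept_batch(events, project_id, now=None):
    """Split a batch into accepted events (with metadata added) and rejections."""
    received_at = (now or datetime.now(timezone.utc)).isoformat()
    accepted, rejected = [], []
    for event in events:
        reason = validate_event(event)
        if reason is None:
            # Metadata is merged into a copy; the original event is untouched.
            accepted.append({**event, "project_id": project_id, "received_at": received_at})
        else:
            rejected.append({"event": event, "reason": reason})
    return accepted, rejected
```

Note what is absent: no aggregation, no joins, no queries. Everything here is cheap per-event work, which is what lets the ingestion layer stay fast under load.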

That simplicity is the point. In a game, events arrive unevenly — one hour is quiet, the next includes a content update, a push notification, a streamer spike, or a paid user-acquisition test. The ingestion layer has to stay boring, fast, and reliable exactly when everything else is loud.

NATS JetStream absorbs the spike

After the collector accepts an event, Rawbbit pushes it into NATS JetStream, the queue layer. Without a queue, the system would be fragile — if storage slows down, the API slows down with it, and events can be lost at the worst possible moment. With a queue in between, ingestion and storage are decoupled:

Game  ->  Collector API  ->  NATS JetStream  ->  Raw Writer  ->  storage

The queue gives the pipeline room to breathe, with at-least-once delivery so events are not silently dropped under pressure. That matters for games specifically, because the messiest moments are usually the ones you most want clean data for: launch day, soft launch, live events, economy tests, ad campaigns, weekend peaks, major updates, regional rollouts, streamer-driven spikes. A queue is what keeps those moments from becoming gaps in your history.
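At-least-once delivery has one practical consequence worth spelling out: the same message can arrive more than once, so the consumer should be idempotent. The sketch below is a toy in-memory simulation of that idea, not the NATS JetStream API. ToyQueue and drain are hypothetical names, and event_id is an assumed field that the example event above does not include.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a durable queue with explicit acknowledgements."""

    def __init__(self):
        self._pending = deque()

    def publish(self, message):
        self._pending.append(message)

    def fetch(self):
        """Hand out the oldest message without removing it."""
        return self._pending[0] if self._pending else None

    def ack(self):
        """Remove the oldest message once the consumer confirms it was stored."""
        self._pending.popleft()


def drain(queue, store, seen_ids, fail_first_ack=False):
    """Write every queued event to `store`, skipping redelivered duplicates."""
    while (message := queue.fetch()) is not None:
        if message["event_id"] not in seen_ids:  # idempotent write
            store.append(message)
            seen_ids.add(message["event_id"])
        if fail_first_ack:
            # Simulate a crash after the write but before the ack:
            # the message stays queued and will be delivered again.
            fail_first_ack = False
            continue
        queue.ack()
```

With fail_first_ack=True the first event is delivered twice but stored once. The message is only removed after the write is confirmed, so a writer failure causes a redelivery rather than a gap.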


Stage 2 — Store: keep the raw history you own

The Raw Writer lands events as Parquet

The Raw Writer reads from the queue and writes events into Parquet files, partitioned by project and time:

/events/project_id=my_game/date=2026-06-23/hour=10/part-0001.parquet
/events/project_id=my_game/date=2026-06-23/hour=11/part-0002.parquet
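A path like the ones above can be derived directly from an event's timestamp. The following Python sketch shows one way to do it; the function name partition_path is hypothetical, and bucketing by UTC hour is an assumption rather than documented Rawbbit behavior.

```python
from datetime import datetime, timezone


def partition_path(project_id, timestamp, part_number):
    """Build a partition path from an event's ISO 8601 timestamp.

    Assumes partitions are bucketed by UTC date and hour, and that the
    caller supplies the part number.
    """
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00")).astimezone(timezone.utc)
    return (
        f"/events/project_id={project_id}"
        f"/date={ts:%Y-%m-%d}/hour={ts:%H}"
        f"/part-{part_number:04d}.parquet"
    )


path = partition_path("my_game", "2026-06-23T10:15:00Z", 1)
```

Because project, date, and hour are encoded in the path, a query engine can skip whole directories when a question only concerns one game or one time window.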

This is where Rawbbit's philosophy becomes concrete. Raw events are not treated as a temporary export — they are the durable record of what happened. Parquet suits analytical data well because it is compact, column-oriented, and readable by almost every query engine, but the format is only half the point. The bigger idea:

Your raw event history should not be trapped inside a dashboard vendor.

Object storage keeps the raw layer portable

Those Parquet files live in object storage — cloud object storage, or an S3-compatible backend like SeaweedFS, depending on the deployment. The raw layer stays exportable and reusable, which is what gives the studio freedom later. The same raw history can feed ClickHouse for fast analytics, BigQuery for a cloud-native warehouse workflow, DuckDB for local inspection, or new data models if your metric definitions change.

The raw layer becomes the contract. Everything downstream is allowed to evolve.

Rawbbit is built on open-source infrastructure to reduce vendor lock-in. Your raw events stay exportable, and your analytics stack stays transparent.

That does not mean every studio wants to operate this themselves. It means the architecture keeps your history in the open instead of hiding it inside one closed product — and Rawbbit can manage the infrastructure while keeping that foundation clear.

If you want the architectural argument in more detail, read Why the Raw Layer Should Be Your Analytics Contract.


Stage 3 — Serve: make the data fast and useful

ClickHouse makes events fast to query

Raw Parquet is the source of truth, but teams also need fast analytics. That is where ClickHouse fits, as the main query layer for day-to-day game analytics. A scheduled CRON job loads the raw Parquet into ClickHouse, where it lands in a single wide events table — analytics.events, a One Big Table (OBT) model that keeps querying simple:

Parquet in object storage  --CRON-->  ClickHouse (analytics.events)  ->  SQL / dashboards
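One plausible shape for that scheduled load is a small script that builds an INSERT ... SELECT statement per hourly partition, using ClickHouse's s3() table function to read Parquet from S3-compatible storage. The sketch below only builds the SQL string. The bucket URL and function name are hypothetical, credentials are omitted, and it assumes the Parquet columns line up with analytics.events.

```python
# Hypothetical bucket URL; replace with your own object storage endpoint.
BUCKET_URL = "https://storage.example.com/my-bucket"


def build_load_sql(project_id, date, hour, table="analytics.events"):
    """Return an INSERT ... SELECT that loads one hourly partition from Parquet.

    Inputs are assumed to be trusted configuration values, not user input.
    """
    source = (
        f"{BUCKET_URL}/events/project_id={project_id}"
        f"/date={date}/hour={hour:02d}/*.parquet"
    )
    # s3() is ClickHouse's table function for S3-compatible storage.
    # SELECT * assumes the Parquet columns match the target table.
    return f"INSERT INTO {table} SELECT * FROM s3('{source}', 'Parquet')"


sql = build_load_sql("my_game", "2026-06-23", 10)
```

A real job would also need to pass credentials and track which partitions have already been loaded, so that a rerun does not insert the same hour twice.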

ClickHouse is well suited to event data, large tables, and the repeated aggregations dashboards need to stay responsive. It is what makes questions like these quick to answer: D1, D7, and D30 retention by cohort; which levels create the most retries; which countries monetize better after onboarding; which app versions changed ad engagement; which players saw an offer, skipped it, and bought later; what behavior tends to happen right before churn.
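To show what sits behind a number like D1 or D7 retention, here is a small Python sketch of one common definition: the share of a first-seen-day cohort that is active again exactly n days later. In practice this would run as SQL in ClickHouse; the function name and input shape are assumptions chosen to keep the logic visible, and other retention definitions exist.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_n_retention(activity, n):
    """Share of each first-seen-day cohort that was active again exactly n days later.

    `activity` maps user_id -> set of dates on which that user sent any event.
    A user's cohort is their earliest active date.
    """
    cohort_size = defaultdict(int)
    retained = defaultdict(int)
    for active_days in activity.values():
        first_day = min(active_days)
        cohort_size[first_day] += 1
        if first_day + timedelta(days=n) in active_days:
            retained[first_day] += 1
    return {day: retained[day] / size for day, size in cohort_size.items()}


# Hypothetical activity data for three players.
activity = {
    "p1": {date(2026, 6, 23), date(2026, 6, 24)},
    "p2": {date(2026, 6, 23)},
    "p3": {date(2026, 6, 24), date(2026, 6, 25), date(2026, 7, 1)},
}
d1 = day_n_retention(activity, 1)
d7 = day_n_retention(activity, 7)
```

The same function answers D1, D7, and D30 by changing n, which is only possible because the per-player daily activity was kept rather than a pre-computed total.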

The raw files stay the durable record; ClickHouse is the fast serving layer on top of them. If the ClickHouse layer ever needs to be rebuilt, resized, or moved, the studio reloads it from the raw Parquet history. ClickHouse is the database optimized for the current workload — not the permanent owner of your data.

If you want the deeper ClickHouse rationale, read ClickHouse as a Query Layer for Raw Game Events.

Studios already on Google Cloud can run the same pattern through BigQuery instead: the raw Parquet is exposed as a BigQuery external table and modeled into a base_dataquery__events table, with SQLMesh handling the transformations on a schedule triggered by Cloud Scheduler. ClickHouse is Rawbbit's primary, open-source-aligned default, but the point is not "ClickHouse versus BigQuery forever". The point is this:

Both can sit downstream from the same raw event record.

Data models turn raw events into a shared language

Raw events are facts, but teams do not want to work with raw facts all day. A producer should not have to hand-join event streams every time they need level performance; a product manager should not rebuild retention logic from scratch; a founder should not have three dashboards showing three different definitions of revenue.

So Rawbbit shapes raw events like session_started, level_started, level_completed, ad_watched, and purchase_completed into reusable analytical tables — players, sessions, payments, daily revenue, retention cohorts, tutorial funnels, level progression, ad engagement. Instead of everyone writing complex SQL against raw events, the studio works with tables that are easier to trust, audit, and visualize.
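As an illustration of what such a model does, the Python sketch below turns raw level events into a level progression table with one row per level. It is not how Rawbbit implements its models, which the article describes as tables in the query layer. The function name and output column names are hypothetical.

```python
from collections import defaultdict


def level_progression(events):
    """Aggregate raw level events into one row per level."""
    started = defaultdict(set)
    completed = defaultdict(set)
    for event in events:
        if event["event_name"] == "level_started":
            started[event["level"]].add(event["user_id"])
        elif event["event_name"] == "level_completed":
            completed[event["level"]].add(event["user_id"])
    rows = []
    for level in sorted(started):
        players_started = len(started[level])
        # Count only completions by players who also have a start event.
        players_completed = len(completed[level] & started[level])
        rows.append({
            "level": level,
            "players_started": players_started,
            "players_completed": players_completed,
            "completion_rate": players_completed / players_started,
        })
    return rows
```

The value of the model is that decisions such as "count unique players, not attempts" are made once, in one place, instead of being re-decided in every ad hoc query.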

Raw events are the memory. Data models are the language your studio uses to understand that memory.

Dashboards show behavior, but do not own it

Once the data is modeled, dashboards make it accessible to people who do not want to write SQL every day. This is where Metabase, an open-source BI tool, fits — DAU and ARPU, retention by cohort, tutorial completion, level difficulty, ad engagement, purchase conversion, live-event performance, country and platform breakdowns.

But the dashboard is only the top layer. It should never be the system of record; it reads from a reliable foundation underneath it. That is the difference between a dashboard-first setup and an infrastructure-first one — and Rawbbit is built around the second.


Ask your data in plain language: the MCP and agents layer

There is one more layer that makes the open foundation pay off in a way closed analytics tools can't easily match.

Because your modeled data lives in a real database you control, Rawbbit can expose it through an MCP (Model Context Protocol) server, a standard interface that lets AI agents query your analytics directly. Tools in the agents layer, like Opencode and OpenClaw, can connect to that MCP server and answer questions in plain language by writing and running SQL against ClickHouse for you.

In practice, that means someone on the team can ask "what's D7 retention for players who finished the tutorial last week, split by platform?" and an agent can translate it into a query against the analytics.events table, run it, and return the answer — without that person hand-writing the SQL. The agent is working against your own modeled events, in your own database, so the answers come from the same source of truth as your dashboards.

This only works because of everything upstream: the data is in an open format, in a real query engine you control, modeled into stable tables. A closed dashboard product typically can't hand an external agent that kind of access. An owned, open stack can.


Deployment: how the stack gets there and stays current

All of this runs as a set of Docker services. The images are built from a docker-compose definition, pushed to a container registry (Artifact Registry), and deployed onto a VM — the collector, NATS, and raw writer run as containers, with the storage, OLAP, and BI layers alongside.

For a self-hosting team, that is a stack to stand up and maintain. Under Rawbbit's managed model, it is deployed and kept up to date for you: the same architecture, in infrastructure you own, without your engineers babysitting it.


Build it yourself, or use Rawbbit managed

A technical studio can absolutely build this internally. The components are not mysterious — an HTTP collector, a message queue, a writer service, Parquet partitioning, object storage, ClickHouse, data models, dashboards, plus the monitoring, backups, deployment, and ongoing maintenance that hold it together.

The real question is not whether it can be built. It is whether your studio should spend its own engineering time building and operating it.

Build internally

What your team handles: the collector, queue, storage, ClickHouse, models, dashboards, monitoring, and backups — all of it.

Best for: maximum internal control.

Tradeoff: requires data engineering and ongoing operations.

Dashboard-first analytics tool

What your team handles: mostly configuring events and using the vendor's UI.

Best for: a fast start with a familiar product-analytics workflow.

Tradeoff: limited raw-event ownership and little control over the underlying infrastructure.

Rawbbit is for teams that want a serious data foundation but do not want to turn their game engineers into infrastructure maintainers.


Why this matters as a game grows

The pattern is common. Early on, a studio needs only basic analytics — installs, DAU, revenue, retention, maybe a simple funnel — and a hosted dashboard tool is enough.

Then the game grows, and the questions get specific. Did the new tutorial improve retention for organic users only? Which levels cause churn for players who came from campaign A? Did the economy change raise revenue but hurt long-term retention? Which players saw the starter pack twice before buying? What happened right before players stopped joining the live event? Did version 1.4.2 behave differently in Brazil and Turkey?

At that point analytics stops being dashboard work and becomes data-infrastructure work. You need raw events, SQL access, reliable models, and dashboards built around your game rather than a generic product template. That is the moment Rawbbit is designed for.


Who it's for, and who it isn't

Rawbbit fits studios that have outgrown basic analytics but don't want to hire a data team yet — teams that want control over raw game events, ClickHouse-powered analytics, custom SQL questions, game-specific models, dashboards built around their own event logic, a transparent open-source-based stack, and managed setup instead of internal data engineering.

It is not the right fit for everyone. If your game only needs a simple plug-and-play analytics UI, a hosted product-analytics tool may be easier. If your project is still validating core gameplay and has little event volume, it may be too early. And if no one on the team needs raw events, SQL access, custom models, or long-term data ownership, Rawbbit is more infrastructure than you need right now. That's fine — it isn't trying to be every analytics tool. It's built for studios that need a managed data foundation.


The main idea

A serious game analytics stack is not just a dashboard. It is a pipeline: collect events, buffer them safely, store the raw history, query it efficiently, model it into useful tables, show it in dashboards, and keep the whole thing maintainable. Rawbbit turns that pipeline into managed infrastructure — the game data pipeline, ClickHouse database, data models, and dashboards your studio needs to understand player behavior, without hiring data engineers first.

Game analytics infrastructure, managed for you.

Want Rawbbit to manage your game analytics stack?

If your studio needs raw event ownership, ClickHouse analytics, data models, and dashboards without building the pipeline internally, Rawbbit can set it up and maintain it for you.

Book a call or read more about how Rawbbit works.