How Rawbbit Works

Rawbbit is a self-hosted event-tracking and raw-storage pipeline for product, application, and game analytics. It is open source under Apache 2.0 and designed for teams that want to keep control of their event data and avoid vendor lock-in.

RawbbitEvent is the ingestion and raw-storage layer of a portable analytics system. In the current public release, producers send event batches over HTTP, the collector validates and enriches them, NATS JetStream buffers the write path, and a raw writer lands partitioned Parquet files in object storage.

The supported query path builds on that raw layer: Parquet files can be exposed through a BigQuery external table, and the repository also includes a small SQLMesh starter project for downstream modeling. The raw Parquet layer stays the system-of-record boundary.

Producer → Collector API → NATS JetStream → Raw Writer → Parquet in object storage

Figure: End-to-end Rawbbit pipeline: HTTP ingestion to raw Parquet to a queryable warehouse layer.

Components

Collector API

HTTP ingestion service. Accepts and validates event batches at POST /v1/events:batch, then enriches the accepted events before publishing them into the stream.
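
As a rough illustration of what a producer might do, the sketch below builds a POST request for the /v1/events:batch endpoint using only the Python standard library. Only the endpoint path comes from this page. The base URL, the "events" wrapper key, and every event field name are assumptions made for the example, so check the repository for the actual request schema.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

# Assumption: a collector running locally; adjust to your deployment.
COLLECTOR_URL = "http://localhost:8080"


def make_event(name: str, properties: dict) -> dict:
    """Build one event. Field names are illustrative, not Rawbbit's schema."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_name": name,
        "client_ts": datetime.now(timezone.utc).isoformat(),
        "properties": properties,
    }


def build_batch_request(events: list, base_url: str = COLLECTOR_URL) -> urllib.request.Request:
    """Wrap events in a JSON body aimed at the documented batch endpoint."""
    body = json.dumps({"events": events}).encode("utf-8")  # hypothetical envelope
    return urllib.request.Request(
        url=f"{base_url}/v1/events:batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_batch(events: list) -> int:
    """Send the batch and return the HTTP status code."""
    with urllib.request.urlopen(build_batch_request(events), timeout=5) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running collector.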

NATS JetStream

Message broker between the collector and the writer. It separates request handling from storage writes and provides buffering and durability between ingestion and raw-file landing.
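
To make the decoupling concrete, here is a toy in-memory stand-in for a durable stream. It is not the NATS JetStream API, and the class and method names are hypothetical. It only illustrates the general pattern: the publisher returns as soon as a message is buffered, and the consumer acknowledges a message once it has been handled, so unacknowledged messages can be delivered again. Whether the raw writer acknowledges exactly this way is an assumption.

```python
from collections import deque


class InMemoryStream:
    """Toy stand-in for a durable stream; NOT the NATS JetStream API."""

    def __init__(self):
        self._pending = deque()  # published, not yet delivered
        self._in_flight = {}     # delivered, awaiting acknowledgement
        self._next_seq = 1

    def publish(self, payload: bytes) -> int:
        """Buffer a message and return its sequence number immediately."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending.append((seq, payload))
        return seq

    def fetch(self, max_messages: int) -> list:
        """Deliver up to max_messages and track them until they are acked."""
        batch = []
        while self._pending and len(batch) < max_messages:
            seq, payload = self._pending.popleft()
            self._in_flight[seq] = payload
            batch.append((seq, payload))
        return batch

    def ack(self, seq: int) -> None:
        """Mark a delivered message as fully handled."""
        self._in_flight.pop(seq, None)

    def redeliver_unacked(self) -> None:
        """Put unacked messages back at the front, in original order."""
        for seq in sorted(self._in_flight, reverse=True):
            self._pending.appendleft((seq, self._in_flight.pop(seq)))
```

In this model a writer that crashes between fetch and ack loses nothing: the message is still tracked and comes back on the next delivery.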

Raw Writer

JetStream consumer that writes partitioned Parquet files to object storage. The raw Parquet layer is the durable system-of-record boundary for downstream analytics work.
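
This page does not spell out the partition layout, so the following is only a sketch of one common convention: Hive-style date and hour partitions derived from the event time in UTC. The "raw/events" root, the dt= and hour= keys, and the file naming are assumptions for illustration, not Rawbbit's actual layout.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(event_time: datetime, root: str = "raw/events") -> str:
    """Return a Hive-style prefix such as raw/events/dt=2024-05-02/hour=01."""
    utc = event_time.astimezone(timezone.utc)  # expects a timezone-aware datetime
    return f"{root}/dt={utc:%Y-%m-%d}/hour={utc:%H}"


def object_key(event_time: datetime, file_id: str, root: str = "raw/events") -> str:
    """Full object key for one Parquet file inside its partition."""
    return f"{partition_prefix(event_time, root)}/part-{file_id}.parquet"


# An event stamped 23:30 at UTC-2 lands in the next UTC day's partition.
ts = datetime(2024, 5, 1, 23, 30, tzinfo=timezone(timedelta(hours=-2)))
print(object_key(ts, "0001"))
# prints raw/events/dt=2024-05-02/hour=01/part-0001.parquet
```

Normalizing to UTC before partitioning keeps events from producers in different time zones in consistent partitions.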

Object Storage

Your choice of Google Cloud Storage or an S3-compatible backend such as SeaweedFS. The documented BigQuery external-table path currently uses GCS.

BigQuery External Tables

Query raw Parquet directly through BigQuery without loading the data into managed tables first. The raw layer stays portable while the warehouse becomes a query surface.
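
For orientation, defining an external table over Parquet in GCS comes down to a short DDL statement. The helper below renders one using standard BigQuery syntax. The table name and bucket path are placeholders, and the repository's documented setup may differ, for example by declaring partition columns.

```python
def external_table_ddl(table: str, gcs_uri: str) -> str:
    """Render BigQuery DDL for an external table over Parquet files in GCS."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = ['{gcs_uri}']\n"
        ");"
    )


# Placeholder names; substitute your own project, dataset, and bucket.
print(external_table_ddl(
    "my_project.analytics.raw_events",
    "gs://my-bucket/raw/events/*",
))
```

Because the table only points at files, dropping or recreating it leaves the Parquet data in object storage untouched.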

SQLMesh Starter Project

Included downstream modeling layer. It reads from the BigQuery external table over raw Parquet and provides a small starting point for downstream shaping, not a full modeling system.

Why this shape

  • The collector accepts and validates event batches.
  • NATS JetStream separates request handling from storage writes.
  • The raw writer lands durable Parquet files in object storage.
  • Raw Parquet is the system-of-record boundary for downstream analytics work.
  • Downstream modeling can evolve without changing the ingestion contract.

What’s working today

  • Ingestion path (collector + NATS + raw-writer)
  • Raw Parquet landing
  • Storage backend selection (GCS or S3-compatible)
  • BigQuery external-table querying
  • SQLMesh starter project

The current release is intentionally narrow: it focuses on reliable ingestion, durable raw storage, and a simple first query path.

Open source

Rawbbit is released under the Apache 2.0 License.

All source code, documentation, deploy scaffolding, and the starter SQLMesh project are public on GitHub.

Self-host it free

Want to run this on your infrastructure?

Use the open-source repository for the self-serve path, or book a setup call if you want the fastest route to a working deployment.
