Ingestion

How data flows into Berserk and how to configure your OpenTelemetry Collector

Berserk ingests telemetry — logs, traces, and metrics — via the OpenTelemetry Protocol (OTLP). You configure a standard OpenTelemetry Collector to send data to Berserk. Everything after that is handled automatically.

How Data Flows

  1. Your OpenTelemetry Collector sends data to Berserk's ingest component, named Tjalfe, over OTLP (gRPC or HTTP). Alternatively, Promtail or any Loki-compatible client can send logs via the Loki push API.
  2. Your collector includes an ingest token in each request. Tjalfe validates it with the Meta service, which authenticates the token. Tjalfe then batches incoming data and uploads it to S3.
  3. The query component (Nursery) follows each stream, downloads batches from S3, routes data to the correct datasets, and makes it searchable. Nursery also merges small batches into larger, optimized segments in the background.

Tjalfe holds each request open until S3 confirms the upload, then returns the result to the collector. A success response means the data is durably stored. If S3 is temporarily unavailable, Tjalfe makes a few quick retries but returns a failure to the collector rather than buffering locally. Your OpenTelemetry Collector is the durability layer — it is responsible for retrying failed requests and persistently queuing data until Tjalfe accepts it.

Ingest Tokens

Every request to Tjalfe must carry an ingest token for authentication and routing:

Authorization: Bearer ing_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Each token is bound to a dataset. All data sent with that token routes to that dataset. You can override routing per-record by setting the bzrk.dataset resource attribute.
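
For example, the standard OpenTelemetry Collector resource processor can set this attribute for everything in a pipeline. This is one way to apply the override, not the only one; the dataset name below is illustrative:

processors:
  resource/dataset:
    attributes:
      - key: bzrk.dataset
        value: payments # hypothetical dataset name
        action: upsert

Add the processor to the relevant pipeline(s) so the attribute is upserted onto each resource before export.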

Create a token with the CLI:

bzrk ingest-token create --dataset default my-token

The token value is only shown once at creation time. Store it securely.

Default Ingest Token (Kubernetes)

When deploying with the Helm chart, the ingest service can be configured with a default ingest token via a Kubernetes Secret. This token is used to authenticate incoming data when no other token is provided.

Managed mode (recommended): Set global.ingestToken.managed: true in your Helm values. An init container will automatically create the token by calling Meta's API and store it in a Kubernetes Secret before Tjalfe starts. This is idempotent — if the secret already exists, the init container is skipped entirely.

global:
  ingestToken:
    managed: true

Manual mode: Create the secret yourself and reference it in the chart:

kubectl create secret generic ingest-token \
  --from-literal=default_ingest_token="ing_<your-token-value>"

The Helm chart references this secret by default (ingest-token with key default_ingest_token).

Streams

A stream is a sequential write path in S3. Tjalfe registers a stream with Meta on startup and writes all incoming data — from any number of collectors and ingest tokens — to that single stream. Data from different tokens targeting different datasets is batched together in the same upload; Nursery handles the routing.

In some cases Meta may assign more than one stream to a Tjalfe instance (e.g., after a restart or during scaling), but typically there is just one. Streams are created and managed automatically — you do not need to configure or interact with them directly.

Latency, Error Recovery, and Durability

| Property | Behavior |
| --- | --- |
| Ingest latency | Data is batched for up to 2 seconds (or 10 MB) in Tjalfe before S3 upload. This is configurable. End-to-end latency from collector send to searchable is typically 1-10 seconds. |
| Durability | Data is durable once the collector receives a success response. This confirms data has been written to S3. |
| Backpressure | If Tjalfe cannot keep up, it returns errors (503/UNAVAILABLE or 429/RESOURCE_EXHAUSTED). The collector's retry and queue handle this automatically. |
| Error recovery | When S3 or Meta is having problems, Tjalfe returns retryable error codes to the collector. The collector queues failed requests and retries automatically. |

Protocols

Tjalfe accepts OTLP over both gRPC and HTTP, and the Loki push API for log ingestion:

| Protocol | Default Port | Use |
| --- | --- | --- |
| OTLP gRPC | 4317 | Standard transport. Preferred. |
| OTLP HTTP | 4318 | Useful when gRPC is not available (e.g., browser, Lambda). |
| Loki push | 3100 | Promtail-compatible HTTP endpoint for log ingestion. |

The Loki receiver accepts JSON and Protobuf push requests at /loki/api/v1/push, including Loki 3.0+ structured metadata. Stream labels are mapped to resource attributes and log lines become the OTLP log body. This makes it easy to ingest logs from existing Promtail or Grafana Agent deployments without switching to an OpenTelemetry Collector.
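
The JSON push payload follows the standard Loki format: a list of streams, each with a label set and `[nanosecond-timestamp, line]` pairs. A minimal sketch using only the Python standard library — the `tjalfe:3100` host and the token are placeholders for your own values:

```python
import json
import time
import urllib.request

def build_loki_payload(labels: dict, lines: list) -> dict:
    # One Loki stream: the label set maps to resource attributes in Berserk,
    # and each value is a [nanosecond-timestamp, log line] pair.
    now_ns = str(time.time_ns())
    return {"streams": [{"stream": labels,
                         "values": [[now_ns, line] for line in lines]}]}

def push(base_url: str, token: str, payload: dict):
    # POST to the Loki-compatible endpoint with the ingest token.
    req = urllib.request.Request(
        base_url + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + token},
    )
    return urllib.request.urlopen(req)

payload = build_loki_payload({"service": "checkout"}, ["payment accepted"])
# push("http://tjalfe:3100", "ing_<your-token>", payload)
```

In production you would keep using Promtail or Grafana Agent, which emit this same payload shape; the sketch is just to make the mapping from stream labels to resource attributes concrete.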

OpenTelemetry Collector Configuration

Below is the recommended default configuration for sending data to Berserk. Your setup may vary depending on your environment and use case, but these settings are a good starting point.

# Disk-backed queue so buffered data survives collector restarts.
extensions:
  file_storage/queue:
    directory: /var/lib/otel/queue

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp/berserk:
    endpoint: "<your-endpoint>:4317"
    tls:
      insecure: true # set to false when Tjalfe is served over TLS/HTTPS
    headers:
      authorization: "Bearer <your-ingest-token>"

    # Tjalfe batches for up to 2s before uploading to S3.
    # Increase default 5s timeout.
    timeout: 30s

    sending_queue:
      # Persist the queue to disk so data survives collector restarts.
      # Without this, an in-memory queue loses all buffered data on restart.
      storage: file_storage/queue

      # Parallel connections to Tjalfe.
      num_consumers: 10

      # 1 GiB disk queue — buffers data during longer outages.
      queue_size: 1073741824
      sizer: bytes

      # Combine queued items into larger OTLP requests before sending.
      # Tjalfe holds each request for up to 2s or 10 MB,
      # so larger requests mean fewer round-trips waiting on S3 uploads.
      batch:
        sizer: bytes
        flush_timeout: 1s
        max_size: 10485760 # 10 MiB

    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 60s
      # 0 = retry forever. The default 5min limit drops data after timeout.
      max_elapsed_time: 0
      multiplier: 2

processors:
  # Backpressure on receivers when approaching memory limit.
  # Prevents OOM before the disk queue absorbs everything.
  memory_limiter:
    check_interval: 1s
    limit_mib: 256
    spike_limit_mib: 64

service:
  extensions: [file_storage/queue]
  # No batch processor — batching is handled inside the exporter's sending_queue.
  # A separate batch processor would fragment data across streams,
  # working against Tjalfe's per-stream batching.
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp/berserk]
    logs:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp/berserk]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp/berserk]

Why These Settings Matter

timeout: 30s — Tjalfe batches data for up to 2 seconds before uploading to S3. The collector's default 5-second timeout will cause spurious failures during normal operation. 30 seconds gives plenty of headroom for S3 uploads under load.

file_storage/queue — Tjalfe has no local durability — your collector is the durability layer. If the collector restarts with an in-memory queue, all buffered data is lost. The file_storage extension persists the queue to disk.

max_elapsed_time: 0 — Disables the default 5-minute retry limit. With a disk-backed queue, the collector should retry indefinitely until Tjalfe recovers. Setting a limit means data is silently dropped after the timeout.

sending_queue.batch — Tjalfe holds each request for up to 2 seconds or 10 MB before uploading to S3. Sending many small requests means each one waits independently for the batch to fill. Combining items in the queue into larger requests reduces round-trips.

memory_limiter — Applies backpressure to receivers when the collector approaches its memory limit. Without this, if Tjalfe is slow and the queue is filling, the collector can OOM before the disk queue absorbs everything.

No batch processor — Do not add a batch processor to pipelines sending to Berserk. Tjalfe batches data per-stream internally. A collector-side batch processor splits and recombines requests on its own timer, fragmenting data across streams and working against Tjalfe's batching.

Verifying Ingestion

After configuring your collector, verify data is flowing:

bzrk search "<your dataset> | take 10" --since "5m ago"

If no data appears, check:

  • The ingest token is correct and not revoked
  • The collector can reach Tjalfe (tjalfe:4317)
  • The collector logs for export errors or retries
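
To isolate a problem, you can bypass the collector and send a single test record directly to Tjalfe's OTLP/HTTP endpoint. The payload below is the standard OTLP/HTTP JSON encoding for a log record; the endpoint host and token are placeholders:

curl -X POST "http://tjalfe:4318/v1/logs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ing_<your-token>" \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"smoke-test"}}]},"scopeLogs":[{"logRecords":[{"severityText":"INFO","body":{"stringValue":"hello from curl"}}]}]}]}'

A 2xx response means Tjalfe accepted and durably stored the record, which narrows the problem to the collector side.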
