Service Metrics
OpenTelemetry metrics emitted by Berserk services
All Berserk services emit metrics via OpenTelemetry. Metrics are exported over OTLP to the configured collector endpoint and can be queried in Berserk itself.
Each metric name starts with the prefix `bzrk.`, followed by the service scope and then the metric itself (e.g. `bzrk.ui.query_duration`, where `ui` is the scope).
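As a small illustration of the naming convention, the sketch below splits a metric name into its scope and metric parts. The helper name `split_metric_name` is hypothetical and not part of Berserk; it only assumes the `bzrk.<scope>.<metric>` layout described above.

```python
from typing import Tuple


def split_metric_name(name: str) -> Tuple[str, str]:
    """Split 'bzrk.<scope>.<metric>' into (scope, metric).

    Hypothetical helper: assumes the naming convention described in this page.
    """
    parts = name.split(".", 2)
    if len(parts) != 3 or parts[0] != "bzrk" or not parts[1] or not parts[2]:
        raise ValueError(f"not a bzrk.<scope>.<metric> name: {name!r}")
    return parts[1], parts[2]


if __name__ == "__main__":
    print(split_metric_name("bzrk.ui.query_duration"))
```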
A pre-built Grafana dashboard is available for download: bzrk-service-metrics.json. Import it into Grafana and select your Berserk datasource to visualize all metrics below.
Janitor
Background service responsible for segment lifecycle management: merging small segments into larger ones, deleting tombstoned segments from cloud storage, and running probe queries to monitor query service health.
| Metric | Type | Unit | Description |
|---|---|---|---|
| bzrk.janitor.segment_count | gauge | — | Current number of segments in the cluster |
| bzrk.janitor.total_data_size | gauge | bytes | Total size of all segment data in cloud storage |
| bzrk.janitor.segments_deleted | counter | — | Total segments deleted from cloud storage |
| bzrk.janitor.merge_cycle_duration | histogram | ms | Duration of segment merge cycles |
| bzrk.janitor.probe_duration | histogram | ms | Duration of probe query executions |
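Histogram metrics such as `bzrk.janitor.probe_duration` are usually read as percentiles rather than raw bucket counts. The following is a minimal sketch of estimating a quantile from one explicit-bucket histogram data point by linear interpolation. It assumes the OpenTelemetry explicit-bucket layout (N upper bounds, N+1 counts, the last bucket unbounded) and non-negative values. The function name and the example bucket boundaries are illustrative and are not taken from Berserk's configuration.

```python
from typing import Optional, Sequence


def estimate_quantile(
    q: float, bounds: Sequence[float], counts: Sequence[int]
) -> Optional[float]:
    """Estimate the q-quantile of an explicit-bucket histogram.

    bounds: upper bucket boundaries in ascending order (length N, N >= 1).
    counts: per-bucket counts (length N + 1; the last bucket is unbounded).
    Returns None when the histogram is empty.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not bounds or len(counts) != len(bounds) + 1:
        raise ValueError("expected N >= 1 bounds and N + 1 counts")

    total = sum(counts)
    if total == 0:
        return None

    rank = q * total
    cumulative = 0
    for i, count in enumerate(counts):
        if count == 0:
            continue
        if cumulative + count >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper bound; report the last boundary.
                return float(bounds[-1])
            # Assumption: values are non-negative, so the first bucket starts at 0.
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            fraction = (rank - cumulative) / count
            return lower + (upper - lower) * fraction
        cumulative += count
    return float(bounds[-1])


if __name__ == "__main__":
    # Illustrative buckets in ms: <=10, <=50, <=100, >100
    print(estimate_quantile(0.9, [10, 50, 100], [5, 3, 2, 0]))
```

The estimate is only as precise as the bucket boundaries, so it is best treated as an approximation rather than an exact latency.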
Nursery
Ingestion service that receives OpenTelemetry data from the collector, converts it into segments, and manages segment merging for optimal query performance.
| Metric | Type | Unit | Description |
|---|---|---|---|
| bzrk.nursery.streams_active | up_down_counter | — | Number of currently active stream followers |
| bzrk.nursery.ingest_lag_seconds | histogram | s | Ingest lag across all streams (seconds since last ingest time) |
| bzrk.nursery.download_duration_ms | histogram | ms | S3 segment download duration |
| bzrk.nursery.conversion_duration_ms | histogram | ms | Protobuf to segment conversion duration |
| bzrk.nursery.total_duration_ms | histogram | ms | Total segment processing duration (download + conversion) |
| bzrk.nursery.bytes_ingested | counter | By | Total compressed bytes downloaded from S3 (use rate() for throughput) |
| bzrk.nursery.bytes_ingested_uncompressed | counter | By | Total uncompressed proto bytes ingested (use rate() for throughput) |
| bzrk.nursery.segment_output_bytes | counter | By | Total bytes of segment files produced (use rate() for throughput) |
| bzrk.nursery.data_errors | counter | — | Data errors (malformed protobuf, conversion failures) |
| bzrk.nursery.infra_errors | counter | — | Infrastructure errors (S3 failures, I/O errors) |
| bzrk.nursery.active_streams | gauge | — | Number of active streams reported by Meta |
| bzrk.nursery.closed_streams | gauge | — | Number of closed streams reported by Meta |
| bzrk.nursery.merge_count | counter | — | Total number of completed merges |
| bzrk.nursery.merge_output_size_mb | histogram | MB | Compressed output size of merged segments |
| bzrk.nursery.merge_duration | histogram | ms | Duration of segment merge operations |
| bzrk.nursery.merge_speed_mbps | histogram | MB/s | Merge throughput in megabytes per second |
| bzrk.nursery.time_to_merge_seconds | histogram | s | Time from baby segment ingest to merge completion |
| bzrk.nursery.rows_ingested | counter | — | Total rows ingested across all streams |
| bzrk.nursery.ingest_delay | histogram | ms | Delay between event timestamp and ingest time |
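The byte counters above only grow, which is why their descriptions suggest taking a rate. The sketch below shows the arithmetic behind a rate over two counter samples, plus a compression ratio derived from `bzrk.nursery.bytes_ingested_uncompressed` and `bzrk.nursery.bytes_ingested`. It assumes cumulative counter values and treats a decrease as a counter reset. The `Sample` type and function names are hypothetical and not part of Berserk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One observation of a cumulative counter (hypothetical type)."""
    timestamp_s: float
    value: float


def counter_increase(earlier: Sample, later: Sample) -> float:
    """Increase between two samples; a drop is treated as a counter reset."""
    delta = later.value - earlier.value
    if delta < 0:
        # Assumption: the counter restarted from zero after `earlier`.
        delta = later.value
    return delta


def counter_rate(earlier: Sample, later: Sample) -> float:
    """Average per-second rate between two samples."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return counter_increase(earlier, later) / elapsed


def compression_ratio(
    uncompressed_increase: float, compressed_increase: float
) -> Optional[float]:
    """Uncompressed bytes per compressed byte, or None if nothing was ingested."""
    if compressed_increase <= 0:
        return None
    return uncompressed_increase / compressed_increase


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    compressed = (Sample(0.0, 1_000.0), Sample(10.0, 6_000.0))
    uncompressed = (Sample(0.0, 4_000.0), Sample(10.0, 24_000.0))
    print(counter_rate(*compressed))  # bytes per second
    print(compression_ratio(counter_increase(*uncompressed),
                            counter_increase(*compressed)))
```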
Query
Query execution service that receives KQL queries over HTTP and gRPC, plans and executes them against segments, and streams results back to clients.
| Metric | Type | Unit | Description |
|---|---|---|---|
| bzrk.query.execution_duration | histogram | ms | End-to-end query execution duration |
| bzrk.query.requests | counter | — | Total query requests received |
| bzrk.query.result_rows | histogram | — | Number of rows returned per query |
| bzrk.query.errors | counter | — | Total query errors by error type |
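One common way to combine `bzrk.query.errors` and `bzrk.query.requests` is an error ratio over a time window. The sketch below assumes that every counted error belongs to a counted request and that both inputs are increases over the same window, summed across error types. The function name is hypothetical.

```python
from typing import Optional


def error_ratio(errors_increase: float, requests_increase: float) -> Optional[float]:
    """Fraction of requests that failed in a window, or None with no traffic."""
    if requests_increase <= 0:
        return None
    return errors_increase / requests_increase


if __name__ == "__main__":
    # Illustrative values only: 5 errors out of 200 requests in the window.
    print(error_ratio(5, 200))
```

Returning `None` for an idle window keeps "no traffic" distinct from "no errors", which matters when the ratio drives an alert.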
Tjalfe
OpenTelemetry collector that receives logs, traces, and metrics over OTLP, batches them, and exports to Berserk's ingest pipeline via a persistent WAL queue.
| Metric | Type | Unit | Description |
|---|---|---|---|
| bzrk.tjalfe.queue_rejections | counter | — | Total batches rejected due to full queue |
| bzrk.tjalfe.batch_flush_duration | histogram | ms | Duration of batch flush operations to downstream exporters |
| bzrk.tjalfe.data_dropped | counter | — | Total requests dropped due to missing ingest token or channel full |
UI
Web UI for querying Berserk.
| Metric | Type | Unit | Description |
|---|---|---|---|
| bzrk.ui.query_duration | histogram | ms | Duration of proxied queries from start to stream completion |
| bzrk.ui.site_visits | counter | — | Number of page visits to the UI |