Berserk Docs

Service Metrics

OpenTelemetry metrics emitted by Berserk services

All Berserk services emit metrics via OpenTelemetry. Metrics are exported over OTLP to the configured collector endpoint and can be queried in Berserk itself.

Each metric name consists of the prefix bzrk., the service scope, and the metric name (e.g. bzrk.ui.query_duration is the query_duration metric of the ui scope).
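The naming convention can be illustrated with a short sketch. The helper split_metric_name below is hypothetical and not part of Berserk; it assumes the scope is the single dot-separated component directly after bzrk.

```python
from typing import NamedTuple

PREFIX = "bzrk"


class MetricName(NamedTuple):
    scope: str  # service scope, e.g. "ui"
    name: str   # metric name within the scope, e.g. "query_duration"


def split_metric_name(full_name: str) -> MetricName:
    """Split 'bzrk.<scope>.<name>' into its scope and metric name."""
    prefix, _, rest = full_name.partition(".")
    scope, _, name = rest.partition(".")
    if prefix != PREFIX or not scope or not name:
        raise ValueError(f"not a Berserk service metric: {full_name!r}")
    return MetricName(scope, name)


print(split_metric_name("bzrk.ui.query_duration"))
# prints: MetricName(scope='ui', name='query_duration')
```

Grouping by the parsed scope is one way to build per-service views from a flat list of metric names.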

A pre-built Grafana dashboard is available for download: bzrk-service-metrics.json. Import it into Grafana and select your Berserk datasource to visualize all metrics below.

Janitor

Background service responsible for segment lifecycle management: merging small segments into larger ones, deleting tombstoned segments from cloud storage, and running probe queries to monitor query service health.

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| bzrk.janitor.segment_count | gauge | | Current number of segments in the cluster |
| bzrk.janitor.total_data_size | gauge | bytes | Total size of all segment data in cloud storage |
| bzrk.janitor.segments_deleted | counter | | Total segments deleted from cloud storage |
| bzrk.janitor.merge_cycle_duration | histogram | ms | Duration of segment merge cycles |
| bzrk.janitor.probe_duration | histogram | ms | Duration of probe query executions |
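Because the Janitor merges small segments into larger ones, the two gauges can be combined into a mean segment size. The sketch below is a derived value, not a metric Berserk emits; the function name and the sample readings are made up for illustration.

```python
def mean_segment_size(total_data_size_bytes: float, segment_count: int) -> float:
    """Mean segment size in bytes; 0.0 when the cluster has no segments."""
    if segment_count < 0 or total_data_size_bytes < 0:
        raise ValueError("gauge values must be non-negative")
    if segment_count == 0:
        return 0.0
    return total_data_size_bytes / segment_count


# Hypothetical readings of bzrk.janitor.total_data_size and
# bzrk.janitor.segment_count: 1.5 TiB spread over 12,288 segments.
size = mean_segment_size(1.5 * 1024**4, 12_288)
print(f"{size / 1024**2:.1f} MiB per segment")
# prints: 128.0 MiB per segment
```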

Nursery

Ingestion service that receives OpenTelemetry data from the collector, converts it into segments, and manages segment merging for optimal query performance.

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| bzrk.nursery.streams_active | up_down_counter | | Number of currently active stream followers |
| bzrk.nursery.ingest_lag_seconds | histogram | s | Ingest lag across all streams (seconds since last ingest time) |
| bzrk.nursery.download_duration_ms | histogram | ms | S3 segment download duration |
| bzrk.nursery.conversion_duration_ms | histogram | ms | Protobuf to segment conversion duration |
| bzrk.nursery.total_duration_ms | histogram | ms | Total segment processing duration (download + conversion) |
| bzrk.nursery.bytes_ingested | counter | By | Total compressed bytes downloaded from S3 (use rate() for throughput) |
| bzrk.nursery.bytes_ingested_uncompressed | counter | By | Total uncompressed proto bytes ingested (use rate() for throughput) |
| bzrk.nursery.segment_output_bytes | counter | By | Total bytes of segment files produced (use rate() for throughput) |
| bzrk.nursery.data_errors | counter | | Data errors (malformed protobuf, conversion failures) |
| bzrk.nursery.infra_errors | counter | | Infrastructure errors (S3 failures, I/O errors) |
| bzrk.nursery.active_streams | gauge | | Number of active streams reported by Meta |
| bzrk.nursery.closed_streams | gauge | | Number of closed streams reported by Meta |
| bzrk.nursery.merge_count | counter | | Total number of completed merges |
| bzrk.nursery.merge_output_size_mb | histogram | MB | Compressed output size of merged segments |
| bzrk.nursery.merge_duration | histogram | ms | Duration of segment merge operations |
| bzrk.nursery.merge_speed_mbps | histogram | MB/s | Merge throughput in megabytes per second |
| bzrk.nursery.time_to_merge_seconds | histogram | s | Time from baby segment ingest to merge completion |
| bzrk.nursery.rows_ingested | counter | | Total rows ingested across all streams |
| bzrk.nursery.ingest_delay | histogram | ms | Delay between event timestamp and ingest time |
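The byte counters above only become throughput once a rate is taken over them. The following is a minimal sketch of that idea, not a description of how Berserk's rate() is implemented. It assumes the counter is exported with cumulative temporality and that a drop in value means the counter was reset; the function name and sample values are hypothetical.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def counter_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a cumulative counter over the samples.

    A drop in value is treated as a counter reset (e.g. a service restart):
    the post-reset value is counted as the increase since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


# Hypothetical bzrk.nursery.bytes_ingested samples taken 30 s apart.
samples = [(0.0, 1_000_000.0), (30.0, 4_000_000.0), (60.0, 7_000_000.0)]
print(f"{counter_rate(samples) / 1e6:.2f} MB/s")
# prints: 0.10 MB/s
```

The same calculation applies to bzrk.nursery.rows_ingested to obtain rows per second.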

Query

Query execution service that receives KQL queries over HTTP and gRPC, plans and executes them against segments, and streams results back to clients.

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| bzrk.query.execution_duration | histogram | ms | End-to-end query execution duration |
| bzrk.query.requests | counter | | Total query requests received |
| bzrk.query.result_rows | histogram | | Number of rows returned per query |
| bzrk.query.errors | counter | | Total query errors by error type |
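Histogram metrics such as bzrk.query.execution_duration are usually read as percentiles. The sketch below estimates a quantile from an explicit-bucket histogram by linear interpolation inside the bucket, similar in spirit to Prometheus's histogram_quantile. The bucket boundaries and counts are invented for illustration; the actual boundaries Berserk uses are not stated here. It also assumes non-negative observations, so the first bucket starts at 0.

```python
from typing import Sequence


def estimate_quantile(q: float, bounds: Sequence[float], counts: Sequence[int]) -> float:
    """Estimate the q-quantile (0 < q <= 1) of an explicit-bucket histogram.

    bounds are the upper bucket boundaries in ascending order; counts has one
    more entry than bounds, the last one being the overflow bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not bounds or len(counts) != len(bounds) + 1:
        raise ValueError("counts must have len(bounds) + 1 entries")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper bound; report the last boundary.
                return float(bounds[-1])
            lower = bounds[i - 1] if i > 0 else 0.0
            # Assume observations are spread evenly inside the bucket.
            return lower + (bounds[i] - lower) * (rank - seen) / count
        seen += count
    return float(bounds[-1])


# Hypothetical execution_duration buckets in ms.
bounds = [10, 50, 100, 500]
counts = [40, 30, 20, 8, 2]
print(estimate_quantile(0.95, bounds, counts))
# prints: 350.0
```

The estimate is only as precise as the bucket layout: any quantile that falls into the overflow bucket is reported as the last boundary.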

Tjalfe

OpenTelemetry collector that receives logs, traces, and metrics over OTLP, batches them, and exports to Berserk's ingest pipeline via a persistent WAL queue.

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| bzrk.tjalfe.queue_rejections | counter | | Total batches rejected due to full queue |
| bzrk.tjalfe.batch_flush_duration | histogram | ms | Duration of batch flush operations to downstream exporters |
| bzrk.tjalfe.data_dropped | counter | | Total requests dropped due to missing ingest token or channel full |

UI

Web UI for querying Berserk.

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| bzrk.ui.query_duration | histogram | ms | Duration of proxied queries from start to stream completion |
| bzrk.ui.site_visits | counter | | Number of page visits to the UI |
