# Storage

Local disk configuration for Berserk services — caching, temporary storage, and persistence options.
Berserk services use local disk for caching and temporary storage. The source of truth for all segment data is S3 (configured in Dependencies), so local storage is ephemeral — losing it means a cold cache, not data loss.
## Query Service
The query service uses a local segment cache to avoid re-fetching segments from S3 on every query. For best performance, use a local SSD- or NVMe-backed disk. The cache mount point is `/segment_cache`.
Size the cache to hold roughly a week of ingested data after compression (~10x reduction):
| Ingest rate | Raw / week | Cache size (compressed) |
|---|---|---|
| 1 MB/s | 605 GB | ~60 GB |
| 5 MB/s | 3.0 TB | ~300 GB |
| 10 MB/s | 6.0 TB | ~600 GB |
| 25 MB/s | 15.1 TB | ~1.5 TB |
| 50 MB/s | 30.2 TB | ~3 TB |
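The sizing rule behind the table can be checked with quick shell arithmetic (the 7-day window and ~10x compression factor are the assumptions stated above; decimal units):

```shell
# Estimate weekly raw volume and compressed cache size for a given ingest rate.
RATE_MB_S=5                                   # ingest rate in MB/s
RAW_GB=$(( RATE_MB_S * 86400 * 7 / 1000 ))    # raw data per week, in GB
CACHE_GB=$(( RAW_GB / 10 ))                   # ~10x compression
echo "raw/week: ${RAW_GB} GB, cache: ~${CACHE_GB} GB"
```

For 5 MB/s this prints `raw/week: 3024 GB, cache: ~302 GB`, matching the table's 3.0 TB / ~300 GB row.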
**emptyDir**: Uses ephemeral node storage. Simple to set up, but the cache is lost when the pod is rescheduled.

```yaml
query:
  cache:
    enabled: true
    persistent: false
    size: "128Gi"
```

**hostPath**: Mounts a directory from the host node — ideal when you have a local NVMe disk mounted at a known path. The cache survives pod restarts on the same node.

```yaml
query:
  cache:
    enabled: true
    persistent: false
    hostPath: "/mnt/nvme"
```

**PVC**: Uses a PersistentVolumeClaim. The Helm chart deploys query as a StatefulSet in this mode so the PVC is retained across restarts. Use a storageClass backed by local NVMe or fast SSD for best results.

```yaml
query:
  cache:
    enabled: true
    persistent: true
    size: "128Gi"
    storageClass: "local-nvme" # your storage class
```

For production workloads, a local NVMe-backed disk (via hostPath or emptyDir) gives the best query latency. The cache is purely a performance optimization — S3 remains the source of truth.
## Other Services
The janitor, nursery, and ingest services use local disk for temporary working storage (e.g., segment merging scratch space, baby segment buffering). Disk speed is not critical for these services — standard node storage is sufficient.
All three use emptyDir volumes with configurable size limits:
```yaml
janitor:
  cache:
    size: "20Gi" # scratch space for merge operations
nursery:
  workingDir:
    sizeLimit: "50Gi" # baby segment buffer
# ingest uses a small emptyDir with no size limit by default
```

## Node Scheduling
Control which Kubernetes nodes Berserk services run on using `nodeSelector`, `tolerations`, and `affinity`. Each can be set globally or per service — per-service values take precedence.
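To illustrate that precedence, a hypothetical values file could set a global selector and override it for a single service (the `disktype: hdd` value is purely illustrative):

```yaml
global:
  nodeSelector:
    disktype: ssd   # default for every Berserk service
ingest:
  nodeSelector:
    disktype: hdd   # per-service value wins: ingest schedules on hdd nodes
```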
### Global defaults
Apply scheduling constraints to all services:
```yaml
global:
  nodeSelector:
    disktype: ssd
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "berserk"
      effect: "NoSchedule"
  affinity: {}
```

### Per-service overrides
Override global defaults for individual services:
```yaml
query:
  nodeSelector:
    kubernetes.io/hostname: my-dedicated-node
```

### Pinning to a specific node
To pin a service to a specific node by hostname:
```yaml
query:
  nodeSelector:
    kubernetes.io/hostname: my-node-name
nursery:
  nodeSelector:
    kubernetes.io/hostname: my-node-name
```
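Beyond `nodeSelector`, the per-service `affinity` field accepts a standard Kubernetes affinity stanza. For example, to prefer (rather than require) SSD-labeled nodes for the query service (a generic Kubernetes sketch, not a documented chart default):

```yaml
query:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
```

Unlike a `nodeSelector`, a preferred affinity still lets pods schedule elsewhere when no matching node has capacity.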