Berserk Docs

Storage

Local disk configuration for Berserk services — caching, temporary storage, and persistence options

Berserk services use local disk for caching and temporary storage. The source of truth for all segment data is S3 (configured in Dependencies), so local storage is ephemeral — losing it means a cold cache, not data loss.

Query Service

The query service uses a local segment cache to avoid re-fetching segments from S3 on every query. For best performance, use a local SSD or NVMe-backed disk. The cache mount point is /segment_cache.

Size the cache to hold roughly a week of ingested data after compression (~10x reduction):

| Ingest rate | Raw / week | Cache size (compressed) |
|-------------|------------|-------------------------|
| 1 MB/s      | 605 GB     | ~60 GB                  |
| 5 MB/s      | 3.0 TB     | ~300 GB                 |
| 10 MB/s     | 6.0 TB     | ~600 GB                 |
| 25 MB/s     | 15.1 TB    | ~1.5 TB                 |
| 50 MB/s     | 30.2 TB    | ~3 TB                   |
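The table follows directly from raw-per-week = ingest rate × 604,800 seconds, divided by the ~10x compression ratio. A small sketch for sizing your own cache (a hypothetical helper, not part of Berserk):

```python
def cache_size_gb(ingest_mb_per_s: float, days: float = 7, compression: float = 10.0) -> float:
    """Estimate the compressed segment-cache size in GB for a retention window."""
    raw_gb = ingest_mb_per_s * days * 86_400 / 1_000  # MB/s over the window, in GB
    return raw_gb / compression

# e.g. 5 MB/s ingest over one week at ~10x compression
print(round(cache_size_gb(5)))  # 302 -- matches the ~300 GB row above
```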

emptyDir (default)

Uses ephemeral node storage. Simple to set up, but the cache is lost when the pod is rescheduled.

values.yaml
query:
  cache:
    enabled: true
    persistent: false
    size: "128Gi"

hostPath

Mounts a directory from the host node — ideal when you have a local NVMe disk mounted at a known path. The cache survives pod restarts on the same node.

values.yaml
query:
  cache:
    enabled: true
    persistent: false
    hostPath: "/mnt/nvme"

Persistent volume

Uses a PersistentVolumeClaim. The Helm chart deploys query as a StatefulSet in this mode so the PVC is retained across restarts. Use a storageClass backed by local NVMe or fast SSD for best results.

values.yaml
query:
  cache:
    enabled: true
    persistent: true
    size: "128Gi"
    storageClass: "local-nvme" # your storage class

For production workloads, a local NVMe-backed disk (via hostPath or emptyDir) gives the best query latency. The cache is purely a performance optimization — S3 remains the source of truth.

Other Services

The janitor, nursery, and ingest services use local disk for temporary working storage (e.g., segment merging scratch space, baby segment buffering). Disk speed is not critical for these services — standard node storage is sufficient.

All three use emptyDir volumes with configurable size limits:

values.yaml
janitor:
  cache:
    size: "20Gi" # scratch space for merge operations

nursery:
  workingDir:
    sizeLimit: "50Gi" # baby segment buffer

# ingest uses a small emptyDir with no size limit by default

Node Scheduling

Control which Kubernetes nodes Berserk services run on using nodeSelector, tolerations, and affinity. Each can be set globally or per-service — per-service values take precedence.

Global defaults

Apply scheduling constraints to all services:

values.yaml
global:
  nodeSelector:
    disktype: ssd
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "berserk"
      effect: "NoSchedule"
  affinity: {}
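The empty affinity block accepts any standard Kubernetes affinity stanza. As an illustrative sketch (the `disktype: ssd` label is the same placeholder used above, not a Berserk default), a soft node-affinity rule that prefers SSD nodes instead of requiring them via nodeSelector might look like:

```yaml
global:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
```

Unlike nodeSelector, a preferred rule still lets pods schedule onto other nodes when no SSD node has capacity.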

Per-service overrides

Override global defaults for individual services:

values.yaml
query:
  nodeSelector:
    kubernetes.io/hostname: my-dedicated-node

Pinning to a specific node

To pin a service to a specific node by hostname:

values.yaml
query:
  nodeSelector:
    kubernetes.io/hostname: my-node-name
nursery:
  nodeSelector:
    kubernetes.io/hostname: my-node-name
