# Storage

Local disk configuration for Berserk services — caching, temporary storage, and persistence options.
Berserk services use local disk for caching and temporary storage. The source of truth for all segment data is S3 (configured in Dependencies), so local storage is ephemeral — losing it means a cold cache, not data loss.
## Query Service
The query service uses a local segment cache to avoid re-fetching segments from S3 on every query. For best performance, use a local SSD- or NVMe-backed disk. The cache mount point is `/segment_cache`.
Size the cache to hold roughly a week of ingested data after compression (~10x reduction):
| Ingest rate | Raw / week | Cache size (compressed) |
|---|---|---|
| 1 MB/s | 605 GB | ~60 GB |
| 5 MB/s | 3.0 TB | ~300 GB |
| 10 MB/s | 6.0 TB | ~600 GB |
| 25 MB/s | 15.1 TB | ~1.5 TB |
| 50 MB/s | 30.2 TB | ~3 TB |
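The sizing rule behind the table can be checked with quick shell arithmetic (the 7-day window and ~10x compression factor are the assumptions stated above; decimal units):

```shell
# Estimate weekly raw volume and compressed cache size for a given ingest rate.
RATE_MB_S=5                                   # ingest rate in MB/s
RAW_GB=$(( RATE_MB_S * 86400 * 7 / 1000 ))    # raw data per week, in GB
CACHE_GB=$(( RAW_GB / 10 ))                   # ~10x compression
echo "raw/week: ${RAW_GB} GB, cache: ~${CACHE_GB} GB"
```

For 5 MB/s this prints `raw/week: 3024 GB, cache: ~302 GB`, matching the table's 3.0 TB / ~300 GB row.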
**emptyDir**: Uses ephemeral node storage. Simple to set up, but the cache is lost when the pod is rescheduled.

```yaml
query:
  cache:
    enabled: true
    persistent: false
    size: "128Gi"
```

**hostPath**: Mounts a directory from the host node — ideal when you have a local NVMe disk mounted at a known path. The cache survives pod restarts on the same node.

```yaml
query:
  cache:
    enabled: true
    persistent: false
    hostPath: "/mnt/nvme"
```

**PVC**: Uses a PersistentVolumeClaim. The Helm chart deploys query as a StatefulSet in this mode so the PVC is retained across restarts. Use a storageClass backed by local NVMe or fast SSD for best results.

```yaml
query:
  cache:
    enabled: true
    persistent: true
    size: "128Gi"
    storageClass: "local-nvme" # your storage class
```

For production workloads, a local NVMe-backed disk (via hostPath or emptyDir) gives the best query latency. The cache is purely a performance optimization — S3 remains the source of truth.
## Other Services
The janitor, nursery, and ingest services use local disk for temporary working storage (e.g., segment merging scratch space, baby segment buffering). Disk speed is not critical for these services — standard node storage is sufficient.
All three use emptyDir volumes with configurable size limits:
```yaml
janitor:
  cache:
    size: "20Gi" # scratch space for merge operations
nursery:
  workingDir:
    sizeLimit: "50Gi" # baby segment buffer
# ingest uses a small emptyDir with no size limit by default
```

## Node Scheduling
Control which Kubernetes nodes Berserk services run on using `nodeSelector`, `tolerations`, and `affinity`. Each can be set globally or per service — per-service values take precedence.
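To illustrate that precedence, a hypothetical values file could set a global selector and override it for a single service (the `disktype: hdd` value is purely illustrative):

```yaml
global:
  nodeSelector:
    disktype: ssd   # default for every Berserk service
ingest:
  nodeSelector:
    disktype: hdd   # per-service value wins: ingest schedules on hdd nodes
```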
### Global defaults
Apply scheduling constraints to all services:
```yaml
global:
  nodeSelector:
    disktype: ssd
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "berserk"
      effect: "NoSchedule"
  affinity: {}
```

### Per-service overrides
Override global defaults for individual services:
```yaml
query:
  nodeSelector:
    kubernetes.io/hostname: my-dedicated-node
```

### Pinning to a specific node
To pin a service to a specific node by hostname:
```yaml
query:
  nodeSelector:
    kubernetes.io/hostname: my-node-name
nursery:
  nodeSelector:
    kubernetes.io/hostname: my-node-name
```
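Beyond `nodeSelector`, the per-service `affinity` field accepts a standard Kubernetes affinity stanza. For example, to prefer (rather than require) SSD-labeled nodes for the query service (a generic Kubernetes sketch, not a documented chart default):

```yaml
query:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
```

Unlike a `nodeSelector`, a preferred affinity still lets pods schedule elsewhere when no matching node has capacity.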