Extracting Metrics from Logs
Use parse and make-series to extract ad-hoc metrics from unstructured log text and analyze them as time series.
In a real-world production environment, logs are a mess of structured JSON, text logs with embedded numbers, syslog lines, nginx access logs, and more. Not everything conforms to OpenTelemetry conventions. Berserk is built to handle unstructured logs efficiently — you can extract fields on the fly and use them in meaningful ways without pre-processing.
This guide shows you how to pull numeric values out of free-text log lines with
parse, aggregate them with summarize, and build gap-filled time series with
make-series.
Sample data
The examples below use these application logs with embedded candidate counts and size ranges:
| $time:datetime | body:string |
|---|---|
| 2024-03-01T10:00:00Z | fetched 120 candidates for size range 0-1MB |
| 2024-03-01T10:01:00Z | fetched 85 candidates for size range 0-1MB |
| 2024-03-01T10:02:00Z | fetched 200 candidates for size range 1-10MB |
| 2024-03-01T10:03:00Z | fetched 150 candidates for size range 0-1MB |
| 2024-03-01T10:04:00Z | fetched 310 candidates for size range 1-10MB |
| 2024-03-01T10:06:00Z | fetched 175 candidates for size range 1-10MB |
| 2024-03-01T10:07:00Z | fetched 60 candidates for size range 10-100MB |
| 2024-03-01T10:08:00Z | fetched 130 candidates for size range 0-1MB |
| 2024-03-01T10:09:00Z | fetched 220 candidates for size range 1-10MB |
| no logs between 10:10 and 10:14 — the process was idle | |
| 2024-03-01T10:15:00Z | fetched 45 candidates for size range 10-100MB |
| 2024-03-01T10:16:00Z | fetched 110 candidates for size range 0-1MB |
| 2024-03-01T10:18:00Z | fetched 280 candidates for size range 1-10MB |
Notice the gap: no log lines exist between 10:10 and 10:14. This is common in real data — processes go idle, services restart, or log volume simply drops to zero.
Step 1: Extract values with parse
Use parse to pull the candidate count and size range out of each log line:
...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range

This adds two new columns to each row:
| $time:datetime | candidate_count:long | size_range:string |
|---|---|---|
| 2024-03-01T10:00:00Z | 120 | 0-1MB |
| 2024-03-01T10:01:00Z | 85 | 0-1MB |
| 2024-03-01T10:02:00Z | 200 | 1-10MB |
| ... | ... | ... |
The :long type suffix tells parse to convert the extracted text to an integer.
Rows that don't match the pattern get null for both fields.
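
To make the matching rules concrete, here is a minimal Python sketch of what the pattern above does to a single line. It models the behavior described in this step and is not Berserk's implementation: the regular expression and the `parse_line` helper are illustrative names, and anchoring the match to the whole line is an assumption of the sketch.

```python
import re

# Rough regex equivalent of the parse pattern: literal text, an integer
# capture (candidate_count:long), more literal text, then the rest of the
# line (size_range has no type suffix, so it stays a string).
PATTERN = re.compile(r"fetched (-?\d+) candidates for size range (.*)")


def parse_line(body):
    """Return (candidate_count, size_range) for one log line.

    Lines that do not match yield (None, None), mirroring the
    null-for-both-fields behavior described above.
    """
    match = PATTERN.fullmatch(body)  # assumption: the whole line must match
    if match is None:
        return None, None
    return int(match.group(1)), match.group(2)


print(parse_line("fetched 120 candidates for size range 0-1MB"))
print(parse_line("connection reset by peer"))
```

The second call uses a made-up line that does not fit the pattern, so both fields come back empty.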
Step 2: Quick summary with summarize
The simplest way to aggregate is summarize with bin():
...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| summarize avg(candidate_count) by bin($time, 5m)

| $time:datetime | avg_candidate_count:real |
|---|---|
| 2024-03-01T10:00:00Z | 173.0 |
| 2024-03-01T10:05:00Z | 146.25 |
| 2024-03-01T10:15:00Z | 145.0 |
The 10:10 bucket is missing entirely — summarize only produces rows for buckets
that contain data. If you chart this directly, the line jumps from 10:05 straight
to 10:15 with no indication that a gap exists. For dashboards and alerting, that
silent gap is a problem.
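
The bucket arithmetic can be checked with a short Python sketch. It is a model of the bin-then-average logic, not of how Berserk executes the query; `bin_time` and `summarize_avg` are hypothetical helper names, and the sample data is reduced to minute offsets from 10:00.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# (minutes after 10:00, candidate_count) pairs taken from the sample data.
SAMPLES = [(0, 120), (1, 85), (2, 200), (3, 150), (4, 310), (6, 175),
           (7, 60), (8, 130), (9, 220), (15, 45), (16, 110), (18, 280)]
START = datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc)
STEP = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_time(ts, step):
    """Round a timestamp down to the start of its bucket."""
    return ts - (ts - EPOCH) % step


def summarize_avg(samples):
    """Average the values per bucket; only buckets that contain data appear."""
    buckets = defaultdict(list)
    for minutes, value in samples:
        ts = START + timedelta(minutes=minutes)
        buckets[bin_time(ts, STEP)].append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}


for bucket, avg in summarize_avg(SAMPLES).items():
    print(bucket.strftime("%H:%M"), avg)
```

Because the grouping only ever sees timestamps that exist, nothing in this logic can produce a 10:10 row.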
Step 3: Build time series with make-series
make-series solves this by producing arrays with every bucket present, filling
gaps with a default value:
...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series
    avg_candidates = avg(candidate_count),
    sample_count = count()
    default = 0
    on $time step 5m

| $time:dynamic | avg_candidates:dynamic | sample_count:dynamic |
|---|---|---|
| [10:00, 10:05, 10:10, 10:15] | [173.0, 146.25, 0, 145.0] | [5, 4, 0, 3] |
Notice the columns are all dynamic — in KQL, arrays are represented as the
dynamic type regardless of element type.
The 10:10 bucket now appears with 0 — the default value. When from/to are
omitted, the time range is inferred from the data's min/max timestamps, so every
5-minute bucket within it is guaranteed to be present. Charts render a continuous
axis and downstream series functions (smoothing, anomaly detection) get a regular
time grid to work with.
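
The gap-filling idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Berserk's algorithm: `make_series` is a hypothetical helper, timestamps are reduced to minute offsets from 10:00, and the axis runs from the first to the last non-empty bucket.

```python
# (minutes after 10:00, candidate_count) pairs taken from the sample data.
SAMPLES = [(0, 120), (1, 85), (2, 200), (3, 150), (4, 310), (6, 175),
           (7, 60), (8, 130), (9, 220), (15, 45), (16, 110), (18, 280)]
STEP = 5  # bucket width in minutes


def make_series(samples, step, default=0):
    """Return (axis, averages, counts) with every bucket on the axis present."""
    if not samples:
        return [], [], []
    grouped = {}
    for minute, value in samples:
        grouped.setdefault(minute - minute % step, []).append(value)
    # Build the full, regular axis first, then look each bucket up.
    axis = list(range(min(grouped), max(grouped) + step, step))
    averages = [sum(grouped[b]) / len(grouped[b]) if b in grouped else default
                for b in axis]
    counts = [len(grouped[b]) if b in grouped else default for b in axis]
    return axis, averages, counts


axis, averages, counts = make_series(SAMPLES, STEP)
print([f"10:{m:02d}" for m in axis])
print(averages)
print(counts)
```

The key difference from plain bucketing is the order of operations: the axis is generated independently of the data, so an empty bucket becomes a slot holding the default rather than a missing row.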
Step 4: Group by extracted dimension
Add by size_range to get one series per size range:
...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series
    avg_candidates = avg(candidate_count),
    sample_count = count()
    default = 0
    on $time step 5m
    by size_range

Now each size range gets its own row, each with gap-filled arrays. The gap at
10:10 is filled with 0 independently per group:
| size_range:string | $time:dynamic | avg_candidates:dynamic | sample_count:dynamic |
|---|---|---|---|
| 0-1MB | [10:00, 10:05, 10:10, 10:15] | [118.33, 130.0, 0, 110.0] | [3, 1, 0, 1] |
| 1-10MB | [10:00, 10:05, 10:10, 10:15] | [255.0, 197.5, 0, 280.0] | [2, 2, 0, 1] |
| 10-100MB | [10:00, 10:05, 10:10, 10:15] | [0, 60.0, 0, 45.0] | [0, 1, 0, 1] |
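
The per-group averages in this table can be reproduced with a small Python sketch. It is an illustrative model only: `make_series_by` is a hypothetical helper, timestamps are minute offsets from 10:00, and the sketch assumes all groups share one time axis spanning the whole data set, which is what the table shows for the 10-100MB row.

```python
# (minutes after 10:00, candidate_count, size_range) from the sample data.
SAMPLES = [(0, 120, "0-1MB"), (1, 85, "0-1MB"), (2, 200, "1-10MB"),
           (3, 150, "0-1MB"), (4, 310, "1-10MB"), (6, 175, "1-10MB"),
           (7, 60, "10-100MB"), (8, 130, "0-1MB"), (9, 220, "1-10MB"),
           (15, 45, "10-100MB"), (16, 110, "0-1MB"), (18, 280, "1-10MB")]
STEP = 5  # bucket width in minutes


def make_series_by(samples, step, default=0):
    """Return (axis, {group: averages}) with one gap-filled array per group."""
    if not samples:
        return [], {}
    grouped = {}
    for minute, value, key in samples:
        bucket = minute - minute % step
        grouped.setdefault(key, {}).setdefault(bucket, []).append(value)
    # One shared axis across all groups, so the arrays line up by index.
    all_buckets = [b for buckets in grouped.values() for b in buckets]
    axis = list(range(min(all_buckets), max(all_buckets) + step, step))
    series = {}
    for key, buckets in sorted(grouped.items()):
        series[key] = [
            round(sum(buckets[b]) / len(buckets[b]), 2) if b in buckets else default
            for b in axis
        ]
    return axis, series


axis, series = make_series_by(SAMPLES, STEP)
for key, averages in series.items():
    print(key, averages)
```

Averages are rounded to two decimals here only to match the table's presentation.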
From here you can pipe into series functions
(series_fir, series_decompose_anomalies) or visualize with | render timechart.
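
To see why a regular time grid matters for smoothing, here is a minimal Python sketch of a generic moving average, which is an FIR filter with equal weights. It is not a reimplementation of `series_fir`: the `moving_average` helper is hypothetical, and treating positions before the first sample as 0 is an assumption of this sketch.

```python
def moving_average(series, window):
    """Trailing moving average over `window` points.

    Equivalent to an FIR filter with `window` equal taps of 1/window.
    Positions before the first sample count as 0 (an assumption here).
    """
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        smoothed.append(sum(series[start:i + 1]) / window)
    return smoothed


# avg_candidates from the ungrouped series in Step 3, with default = 0.
gap_filled = [173.0, 146.25, 0, 145.0]
print(moving_average(gap_filled, 3))
```

Each output point assumes its neighbors are exactly one step apart, which only holds when every bucket is present. The filled 0 at 10:10 also pulls the smoothed values down, so the choice of default affects downstream results.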
When to use which approach
| | summarize ... by bin() | make-series |
|---|---|---|
| Output shape | One row per bucket | One row per group, array columns |
| Gap handling | Missing buckets omitted | Missing buckets filled with default |
| Best for | Quick counts, tabular reports | Charting, series functions, anomaly detection |
| Downstream | Filter, sort, join | series_fir, series_decompose_anomalies, render timechart |
Using Grafana?
Stick with summarize. Grafana handles gap-filling, interpolation,
and smoothing on the client side — sending pre-filled arrays just gets in the
way. Use make-series when the analysis happens inside KQL itself.
See the Grafana integration guide for setup and configuration details.