Berserk Docs

Extracting Metrics from Logs

Use parse and make-series to extract ad-hoc metrics from unstructured log text and analyze them as time series.

In a real-world production environment, logs are a mess of structured JSON, text logs with embedded numbers, syslog lines, nginx access logs, and more. Not everything conforms to OpenTelemetry conventions. Berserk is built to handle unstructured logs efficiently — you can extract fields on the fly and use them in meaningful ways without pre-processing.

This guide shows you how to pull numeric values out of free-text log lines with parse, aggregate them with summarize, and build gap-filled time series with make-series.

Sample data

The examples below use these application logs with embedded candidate counts and size ranges:

$time:datetime          body:string
2024-03-01T10:00:00Z    fetched 120 candidates for size range 0-1MB
2024-03-01T10:01:00Z    fetched 85 candidates for size range 0-1MB
2024-03-01T10:02:00Z    fetched 200 candidates for size range 1-10MB
2024-03-01T10:03:00Z    fetched 150 candidates for size range 0-1MB
2024-03-01T10:04:00Z    fetched 310 candidates for size range 1-10MB
2024-03-01T10:06:00Z    fetched 175 candidates for size range 1-10MB
2024-03-01T10:07:00Z    fetched 60 candidates for size range 10-100MB
2024-03-01T10:08:00Z    fetched 130 candidates for size range 0-1MB
2024-03-01T10:09:00Z    fetched 220 candidates for size range 1-10MB
(no logs between 10:10 and 10:14; the process was idle)
2024-03-01T10:15:00Z    fetched 45 candidates for size range 10-100MB
2024-03-01T10:16:00Z    fetched 110 candidates for size range 0-1MB
2024-03-01T10:18:00Z    fetched 280 candidates for size range 1-10MB

Notice the gap: no log lines exist between 10:10 and 10:14. This is common in real data — processes go idle, services restart, or log volume simply drops to zero.

Step 1: Extract values with parse

Use parse to pull the candidate count and size range out of each log line:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range

This adds two new columns to each row:

$time:datetime          candidate_count:long    size_range:string
2024-03-01T10:00:00Z    120                     0-1MB
2024-03-01T10:01:00Z    85                      0-1MB
2024-03-01T10:02:00Z    200                     1-10MB
...                     ...                     ...

The :long type suffix tells parse to convert the extracted text to an integer. Rows that don't match the pattern get null for both fields.
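Because non-matching rows produce nulls, it is often worth filtering them out before aggregating. A minimal sketch, assuming the standard KQL isnotnull function is available:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| where isnotnull(candidate_count)

This keeps unrelated log lines from dragging nulls into later aggregations.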

Step 2: Quick summary with summarize

The simplest way to aggregate is summarize with bin():

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| summarize avg(candidate_count) by bin($time, 5m)

$time:datetime          avg_candidate_count:real
2024-03-01T10:00:00Z    173.0
2024-03-01T10:05:00Z    146.25
2024-03-01T10:15:00Z    145.0

The 10:10 bucket is missing entirely — summarize only produces rows for buckets that contain data. If you chart this directly, the line jumps from 10:05 straight to 10:15 with no indication that a gap exists. For dashboards and alerting, that silent gap is a problem.

Step 3: Build time series with make-series

make-series solves this by producing arrays with every bucket present, filling gaps with a default value:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series
    avg_candidates = avg(candidate_count) default = 0,
    sample_count = count() default = 0
  on $time step 5m

$time:dynamic                   avg_candidates:dynamic       sample_count:dynamic
[10:00, 10:05, 10:10, 10:15]    [173.0, 146.25, 0, 145.0]    [5, 4, 0, 3]

Notice the columns are all dynamic — in KQL, arrays are represented as the dynamic type regardless of element type.
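If you need a tabular shape back, for example to inspect individual buckets, the arrays can be unpacked row by row. A sketch assuming the standard KQL mv-expand operator is supported:

...
| make-series avg_candidates = avg(candidate_count) default = 0 on $time step 5m
| mv-expand $time to typeof(datetime), avg_candidates to typeof(real)

When several columns are listed, mv-expand zips them, so each output row pairs one timestamp with its corresponding average.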

The 10:10 bucket now appears with 0, the default value. When the optional from/to bounds are omitted, make-series infers the time range from the data's minimum and maximum timestamps, so every 5-minute bucket within that range is guaranteed to be present. Charts render a continuous axis, and downstream series functions (smoothing, anomaly detection) get a regular time grid to work with.
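You can also pin the range explicitly instead of relying on inference, which is useful when a dashboard should always cover a fixed window even if the edges contain no data. A sketch with illustrative bounds:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series avg_candidates = avg(candidate_count) default = 0
  on $time from datetime(2024-03-01T10:00:00Z) to datetime(2024-03-01T10:20:00Z) step 5m

Buckets that fall inside the stated range but contain no data are still emitted with the default value.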

Step 4: Group by extracted dimension

Add by size_range to get one series per size range:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series
    avg_candidates = avg(candidate_count) default = 0,
    sample_count = count() default = 0
  on $time step 5m
  by size_range

Now each size range gets its own row, each with gap-filled arrays. The gap at 10:10 is filled with 0 independently per group:

size_range:string    $time:dynamic                   avg_candidates:dynamic       sample_count:dynamic
0-1MB                [10:00, 10:05, 10:10, 10:15]    [118.33, 130.0, 0, 110.0]    [3, 1, 0, 1]
1-10MB               [10:00, 10:05, 10:10, 10:15]    [255.0, 197.5, 0, 280.0]     [2, 2, 0, 1]
10-100MB             [10:00, 10:05, 10:10, 10:15]    [0, 60.0, 0, 45.0]           [0, 1, 0, 1]

From here you can pipe into series functions (series_fir, series_decompose_anomalies) or visualize with | render timechart.
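For instance, a simple moving average smooths each per-group series. A sketch assuming the standard KQL series_fir function is available:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series avg_candidates = avg(candidate_count) default = 0 on $time step 5m by size_range
| extend smoothed = series_fir(avg_candidates, dynamic([1, 1, 1]), true, true)
| render timechart

With a normalized, centered three-tap filter, each point becomes the average of itself and its two neighbors, which works only because make-series guarantees a regular grid.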

When to use which approach

                summarize ... by bin()           make-series
Output shape    One row per bucket               One row per group, array columns
Gap handling    Missing buckets omitted          Missing buckets filled with default
Best for        Quick counts, tabular reports    Charting, series functions, anomaly detection
Downstream      Filter, sort, join               series_fir, series_decompose_anomalies, render timechart

Using Grafana?

Stick with summarize. Grafana handles gap-filling, interpolation, and smoothing on the client side — sending pre-filled arrays just gets in the way. Use make-series when the analysis happens inside KQL itself.

See the Grafana integration guide for setup and configuration details.
