Berserk Docs

Extracting Metrics from Logs

Use parse and make-series to extract ad-hoc metrics from unstructured log text and analyze them as time series.

In a real-world production environment, logs are a mess of structured JSON, text logs with embedded numbers, syslog lines, nginx access logs, and more. Not everything conforms to OpenTelemetry conventions. Berserk is built to handle unstructured logs efficiently — you can extract fields on the fly and use them in meaningful ways without pre-processing.

This guide shows you how to pull numeric values out of free-text log lines with parse, aggregate them with summarize, and build gap-filled time series with make-series.

Sample data

The examples below use these application logs with embedded candidate counts and size ranges:

$time:datetime          body:string
2024-03-01T10:00:00Z    fetched 120 candidates for size range 0-1MB
2024-03-01T10:01:00Z    fetched 85 candidates for size range 0-1MB
2024-03-01T10:02:00Z    fetched 200 candidates for size range 1-10MB
2024-03-01T10:03:00Z    fetched 150 candidates for size range 0-1MB
2024-03-01T10:04:00Z    fetched 310 candidates for size range 1-10MB
2024-03-01T10:06:00Z    fetched 175 candidates for size range 1-10MB
2024-03-01T10:07:00Z    fetched 60 candidates for size range 10-100MB
2024-03-01T10:08:00Z    fetched 130 candidates for size range 0-1MB
2024-03-01T10:09:00Z    fetched 220 candidates for size range 1-10MB
(no logs between 10:10 and 10:14; the process was idle)
2024-03-01T10:15:00Z    fetched 45 candidates for size range 10-100MB
2024-03-01T10:16:00Z    fetched 110 candidates for size range 0-1MB
2024-03-01T10:18:00Z    fetched 280 candidates for size range 1-10MB

Notice the gap: no log lines exist between 10:10 and 10:14. This is common in real data — processes go idle, services restart, or log volume simply drops to zero.

Step 1: Extract values with parse

Use parse to pull the candidate count and size range out of each log line:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range

This adds two new columns to each row:

$time:datetime          candidate_count:long    size_range:string
2024-03-01T10:00:00Z    120                     0-1MB
2024-03-01T10:01:00Z    85                      0-1MB
2024-03-01T10:02:00Z    200                     1-10MB
...                     ...                     ...

The :long type suffix tells parse to convert the extracted text to an integer. Rows that don't match the pattern get null for both fields.
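Because non-matching rows produce nulls, it is often worth filtering them out before aggregating. A minimal sketch, assuming the standard KQL isnotnull function is available:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| where isnotnull(candidate_count)

This keeps unrelated log lines from dragging nulls into later aggregations.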

Step 2: Quick summary with summarize

The simplest way to aggregate is summarize with bin():

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| summarize avg(candidate_count) by bin($time, 5m)

$time:datetime          avg_candidate_count:real
2024-03-01T10:00:00Z    173.0
2024-03-01T10:05:00Z    146.25
2024-03-01T10:15:00Z    145.0

The 10:10 bucket is missing entirely — summarize only produces rows for buckets that contain data. If you chart this directly, the line jumps from 10:05 straight to 10:15 with no indication that a gap exists. For dashboards and alerting, that silent gap is a problem.

Step 3: Build time series with make-series

make-series solves this by producing arrays with every bucket present, filling gaps with a default value:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series
    avg_candidates = avg(candidate_count) default = 0,
    sample_count = count() default = 0
  on $time step 5m

$time:dynamic                   avg_candidates:dynamic       sample_count:dynamic
[10:00, 10:05, 10:10, 10:15]    [173.0, 146.25, 0, 145.0]    [5, 4, 0, 3]

Notice the columns are all dynamic — in KQL, arrays are represented as the dynamic type regardless of element type.
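If you need a tabular shape back, for example to inspect individual buckets, the arrays can be unpacked row by row. A sketch assuming the standard KQL mv-expand operator is supported:

...
| make-series avg_candidates = avg(candidate_count) default = 0 on $time step 5m
| mv-expand $time to typeof(datetime), avg_candidates to typeof(real)

When several columns are listed, mv-expand zips them, so each output row pairs one timestamp with its corresponding average.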

The 10:10 bucket now appears with 0, the default value. When the optional from/to bounds are omitted, make-series infers the time range from the data's minimum and maximum timestamps, so every 5-minute bucket within that range is guaranteed to be present. Charts render a continuous axis, and downstream series functions (smoothing, anomaly detection) get a regular time grid to work with.
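You can also pin the range explicitly instead of relying on inference, which is useful when a dashboard should always cover a fixed window even if the edges contain no data. A sketch with illustrative bounds:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series avg_candidates = avg(candidate_count) default = 0
  on $time from datetime(2024-03-01T10:00:00Z) to datetime(2024-03-01T10:20:00Z) step 5m

Buckets that fall inside the stated range but contain no data are still emitted with the default value.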

Step 4: Group by extracted dimension

Add by size_range to get one series per size range:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series
    avg_candidates = avg(candidate_count) default = 0,
    sample_count = count() default = 0
  on $time step 5m
  by size_range

Now each size range gets its own row, each with gap-filled arrays. The gap at 10:10 is filled with 0 independently per group:

size_range:string    $time:dynamic                   avg_candidates:dynamic       sample_count:dynamic
0-1MB                [10:00, 10:05, 10:10, 10:15]    [118.33, 130.0, 0, 110.0]    [3, 1, 0, 1]
1-10MB               [10:00, 10:05, 10:10, 10:15]    [255.0, 197.5, 0, 280.0]     [2, 2, 0, 1]
10-100MB             [10:00, 10:05, 10:10, 10:15]    [0, 60.0, 0, 45.0]           [0, 1, 0, 1]

From here you can pipe into series functions (series_fir, series_decompose_anomalies) or visualize with | render timechart.
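For instance, a simple moving average smooths each per-group series. A sketch assuming the standard KQL series_fir function is available:

...
| parse body with "fetched " candidate_count:long " candidates for size range " size_range
| make-series avg_candidates = avg(candidate_count) default = 0 on $time step 5m by size_range
| extend smoothed = series_fir(avg_candidates, dynamic([1, 1, 1]), true, true)
| render timechart

With a normalized, centered three-tap filter, each point becomes the average of itself and its two neighbors, which works only because make-series guarantees a regular grid.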

When to use which approach

                summarize ... by bin()           make-series
Output shape    One row per bucket               One row per group, array columns
Gap handling    Missing buckets omitted          Missing buckets filled with default
Best for        Quick counts, tabular reports    Charting, series functions, anomaly detection
Downstream      Filter, sort, join               series_fir, series_decompose_anomalies, render timechart

Using Grafana?

Stick with summarize. Grafana handles gap-filling, interpolation, and smoothing on the client side — sending pre-filled arrays just gets in the way. Use make-series when the analysis happens inside KQL itself.

See the Grafana integration guide for setup and configuration details.
