Berserk Docs

Compared to TraceQL

How Berserk's trace-find operator relates to Grafana's TraceQL language for querying distributed traces

Berserk's trace-find operator and Grafana's TraceQL both solve the same problem: finding traces that match structural patterns across spans. They share core concepts — span-set selectors, structural operators, and composition — but differ in syntax, output model, and capabilities.

This page compares the two approaches to help users familiar with TraceQL understand trace-find, and vice versa.

Core Concepts

Both languages use structural operators to express parent-child relationships between spans, and predicates inside selectors to filter spans by their attributes.

Concept               TraceQL                       trace-find
Span selector         { .http.status_code = 200 }   { attributes.http.status_code == 200 }
Structural operator   { A } >> { B }                { A } >> { B }
Logical composition   { A } && { B }                { A } and { B }
Match all spans       { }                           { }

Structural Operators

Both languages share the core structural operators. The table below also lists the operators that only one of them supports:

Relationship             TraceQL   trace-find   Notes
Descendant (any depth)   >>        >>           Same syntax
Child (direct)           >         >            Same syntax
Ancestor                 <<        <<           Same syntax
Parent                   <         <            Same syntax
Sibling                  ~         ~            Same syntax
Negated descendant       !>>       (none)       Not supported in trace-find
Negated child            !>        (none)       Not supported in trace-find
Correlated log           (none)    ::           Shorthand for span → log child relationship
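The operator semantics fit in a few lines of Python. This is an illustrative sketch, not Berserk's or Tempo's implementation. A trace is a list of hypothetical `Span` records linked by `parent_id`, and selectors are plain Python predicates standing in for `{ ... }` blocks. A trace matches `{ A } op { B }` when some span matching A and some span matching B stand in the given relationship. The negated operators are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]                      # None for the root span
    attrs: Dict[str, str] = field(default_factory=dict)

Selector = Callable[[Span], bool]                 # stands in for a { ... } block

def ancestors(span: Span, by_id: Dict[str, Span]) -> Iterator[Span]:
    """Yield the parent, grandparent, ... of `span` (stops on missing or cyclic links)."""
    seen = set()
    cur = by_id.get(span.parent_id)
    while cur is not None and cur.span_id not in seen:
        seen.add(cur.span_id)
        yield cur
        cur = by_id.get(cur.parent_id)

def related(sa: Span, sb: Span, op: str, by_id: Dict[str, Span]) -> bool:
    """Does `sb` (right-hand side) stand in relation `op` to `sa` (left-hand side)?"""
    if op == ">>":   # B is a descendant of A, at any depth
        return any(x.span_id == sa.span_id for x in ancestors(sb, by_id))
    if op == ">":    # B is a direct child of A
        return sb.parent_id == sa.span_id
    if op == "<<":   # B is an ancestor of A, at any depth
        return any(x.span_id == sb.span_id for x in ancestors(sa, by_id))
    if op == "<":    # B is the direct parent of A
        return sa.parent_id == sb.span_id
    if op == "~":    # B is a sibling of A: same parent, different span
        return (sa.span_id != sb.span_id and sa.parent_id is not None
                and sa.parent_id == sb.parent_id)
    raise ValueError(f"unsupported operator: {op}")

def trace_matches(trace: List[Span], a: Selector, op: str, b: Selector) -> bool:
    """Trace-level match: some pair of spans (A, B) satisfies `{ A } op { B }`."""
    by_id = {s.span_id: s for s in trace}
    return any(a(sa) and b(sb) and related(sa, sb, op, by_id)
               for sa in trace for sb in trace)

if __name__ == "__main__":
    trace = [
        Span("1", None, {"kind": "SERVER"}),
        Span("2", "1", {"kind": "CLIENT"}),
        Span("3", "2", {"status": "ERROR"}),
    ]
    is_server = lambda s: s.attrs.get("kind") == "SERVER"
    is_error = lambda s: s.attrs.get("status") == "ERROR"
    print(trace_matches(trace, is_server, ">>", is_error))  # True: the error span is a grandchild
    print(trace_matches(trace, is_server, ">", is_error))   # False: it is not a direct child
```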

Logs as span children: In trace-find, OTel log records are treated as children of the span they're attached to (via shared span_id). This means >> and > work naturally with logs — you don't need special syntax. The :: operator is shorthand for > when the RHS targets log fields, making it convenient for queries like { status.code == "ERROR" } :: { body has "OutOfMemory" }. Logs are leaf nodes in the tree (they cannot have children). TraceQL has no log correlation — it operates on spans only.
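The log-as-child model can be sketched as follows, under stated assumptions. This is hypothetical Python rather than Berserk code. Spans and log records are merged into one node list. Each log becomes a child of the span whose `span_id` it carries. Only spans are indexed as possible parents, so logs always end up as leaves. Under that model, `::` is simply `>` with the right-hand side restricted to log records. The names `Node`, `build_tree`, `child`, and `correlated_log` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Node:
    node_id: str              # span_id for spans; a synthetic id for logs (hypothetical)
    parent_id: Optional[str]  # parent span for spans; the carrying span for logs
    is_log: bool
    attrs: dict

Selector = Callable[[Node], bool]

def build_tree(spans: List[dict], logs: List[dict]) -> List[Node]:
    """Merge spans and log records; each log hangs off the span named by its span_id."""
    nodes = [Node(s["span_id"], s.get("parent_span_id"), False, s) for s in spans]
    nodes += [Node(f"log-{i}", log["span_id"], True, log) for i, log in enumerate(logs)]
    return nodes

def child(nodes: List[Node], a: Selector, b: Selector) -> bool:
    """`{ A } > { B }`: some node matching B whose parent span matches A.
    Only spans are indexed as parents, so a log can never have children."""
    parents = {n.node_id: n for n in nodes if not n.is_log}
    return any(b(n) and n.parent_id in parents and a(parents[n.parent_id]) for n in nodes)

def correlated_log(nodes: List[Node], a: Selector, b: Selector) -> bool:
    """`{ A } :: { B }`: the child operator with the right-hand side limited to logs."""
    return child(nodes, a, lambda n: n.is_log and b(n))

if __name__ == "__main__":
    spans = [{"span_id": "s1", "parent_span_id": None, "status.code": "OK"},
             {"span_id": "s2", "parent_span_id": "s1", "status.code": "ERROR"}]
    logs = [{"span_id": "s2", "severity_text": "ERROR",
             "body": "OutOfMemory while allocating buffer"}]
    nodes = build_tree(spans, logs)
    is_error_span = lambda n: not n.is_log and n.attrs.get("status.code") == "ERROR"
    # Python substring test: a rough stand-in for KQL's term-based `has`.
    is_oom_log = lambda n: (n.attrs.get("severity_text") == "ERROR"
                            and "OutOfMemory" in n.attrs.get("body", ""))
    print(correlated_log(nodes, is_error_span, is_oom_log))  # True
```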

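TraceQL also has negated structural operators (!>>, !>, and related forms). In Tempo, { A } !>> { B } matches spans satisfying B that are not descendants of any span satisfying A, and { A } !> { B } matches B spans that are not direct children of an A span. These operators are experimental in Tempo and not yet supported in trace-find.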

Predicate Syntax

TraceQL uses its own expression language. trace-find uses KQL where-clause syntax.

Feature          TraceQL                             trace-find
Span status      { status = ok }                     { status.code == "OK" }
Span name        { name = "GET /api" }               { name == "GET /api" }
Span kind        { span:kind = server }              { kind == "SERVER" }
Duration         { span:duration > 100ms }           { duration > 100ms }
Span attribute   { .http.method = "GET" }            { attributes.http.method == "GET" }
Service name     { resource.service.name = "api" }   { resource.attributes.service.name == "api" }
String match     { name =~ "GET.*" }                 { name matches regex "GET.*" }

Key differences in field naming:

  • TraceQL uses intrinsic shorthand (status, name, span:kind, span:duration)
  • Berserk maps OTel fields to a structured layout: status is a property bag with status.code ("OK", "ERROR", "UNSET"), kind is a string ("SERVER", "CLIENT", etc.), and span attributes live under attributes
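The field-naming differences are mechanical enough to sketch as a translation table. The helper below is hypothetical. It is not a Berserk or Tempo tool, and it covers only the simple `field op value` forms from the predicate table above. It also assumes that an unscoped TraceQL attribute such as `.http.method` maps to a span attribute. TraceQL actually searches both span and resource attributes for unscoped names, so this is a simplification.

```python
# Hypothetical mapping from TraceQL intrinsics to trace-find field paths,
# taken from the predicate comparison table.
FIELD_MAP = {
    "status": "status.code",
    "name": "name",
    "span:kind": "kind",
    "span:duration": "duration",
    "resource.service.name": "resource.attributes.service.name",
}
ENUM_FIELDS = {"status", "span:kind"}   # bare TraceQL enums become upper-case strings

def translate(field: str, op: str, value: str) -> str:
    """Translate one simple TraceQL comparison into a trace-find predicate string."""
    if field in FIELD_MAP:
        target = FIELD_MAP[field]
    elif field.startswith("."):
        target = "attributes" + field   # simplification: unscoped -> span attribute
    else:
        raise ValueError(f"unmapped field: {field}")
    if field in ENUM_FIELDS:
        value = f'"{value.upper()}"'    # ok -> "OK", server -> "SERVER"
    if op == "=":
        return f"{target} == {value}"
    if op == "=~":
        return f"{target} matches regex {value}"
    if op in (">", ">=", "<", "<=", "!="):
        return f"{target} {op} {value}"
    raise ValueError(f"unsupported operator: {op}")

if __name__ == "__main__":
    print(translate("span:kind", "=", "server"))          # kind == "SERVER"
    print(translate(".http.method", "=", '"GET"'))        # attributes.http.method == "GET"
```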

trace-find predicates use full KQL syntax, so KQL operators and functions (has, contains, startswith, isempty, isnull, etc.) all work inside { } blocks. You can also use { search "term" } for full-text search across all columns, with the same syntax as the KQL search operator. TraceQL has a more limited expression grammar but includes native regex operators (=~, !~); in KQL, regex matching is written as matches regex.

Composition

Feature      TraceQL                    trace-find
AND          &&                         and
OR           ||                         or
Chaining     { A } >> { B } >> { C }    { A } >> { B } >> { C }
Precedence   structural, then && / ||   ::, then structural, then and / or

Precedence is listed from tightest to loosest binding.

Chaining works the same way: { A } >> { B } >> { C } finds traces where A has a descendant B which has a descendant C. In trace-find, this is desugared to { A } >> { B } and { B } >> { C }.
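The desugaring can be modeled in Python. In this hypothetical sketch, `desugar`, `pair_holds`, and the dict-based trace are illustrations only, not Berserk code. One consequence is worth checking against the engine. If `and` composes clauses at the trace level, as it does in the service-a/service-b example further down, then `{ A } >> { B } and { B } >> { C }` only requires each pair to occur somewhere in the trace. The B span that satisfies the first clause need not be the one that satisfies the second. The sketch contrasts that reading with a stricter check that requires a single A-to-B-to-C path. It does not assume either one is trace-find's actual behavior.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical minimal trace: span_id -> (parent_span_id, label).
Trace = Dict[str, Tuple[Optional[str], str]]

def is_descendant(trace: Trace, span: str, ancestor: str) -> bool:
    """Walk parent links upward from `span` (assumes the links are acyclic)."""
    parent = trace[span][0]
    while parent is not None:
        if parent == ancestor:
            return True
        parent = trace.get(parent, (None, ""))[0]
    return False

def pair_holds(trace: Trace, a: str, b: str) -> bool:
    """Trace-level `{ A } >> { B }`: some B span descends from some A span."""
    return any(la == a and lb == b and is_descendant(trace, sb, sa)
               for sa, (_, la) in trace.items()
               for sb, (_, lb) in trace.items())

def desugar(chain: List[str]) -> List[Tuple[str, str]]:
    """{ A } >> { B } >> { C }  ->  [(A, B), (B, C)], combined with `and`."""
    return list(zip(chain, chain[1:]))

def chain_desugared(trace: Trace, chain: List[str]) -> bool:
    return all(pair_holds(trace, a, b) for a, b in desugar(chain))

def chain_same_span(trace: Trace, chain: List[str]) -> bool:
    """Stricter reading: one path A -> B -> C that reuses the same B span."""
    def extend(span: str, rest: List[str]) -> bool:
        if not rest:
            return True
        return any(label == rest[0] and is_descendant(trace, sid, span) and extend(sid, rest[1:])
                   for sid, (_, label) in trace.items())
    return any(label == chain[0] and extend(sid, chain[1:])
               for sid, (_, label) in trace.items())

if __name__ == "__main__":
    # A -> B on one branch, B -> C on another branch: the two B spans differ.
    split = {"r": (None, "X"), "a1": ("r", "A"), "b1": ("a1", "B"),
             "b2": ("r", "B"), "c1": ("b2", "C")}
    print(chain_desugared(split, ["A", "B", "C"]))   # True
    print(chain_same_span(split, ["A", "B", "C"]))   # False
```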

Time Window and Execution Model

This is a fundamental architectural difference that affects both performance and semantics.

TraceQL does not have a trace duration bound — structural patterns are evaluated against traces of any duration.

trace-find uses a within <duration> clause (default: 5 minutes) that bounds the maximum time window for a trace. This is the key to trace-find's performance: the engine divides the query time range into bins of this size and processes them incrementally, combining adjacent bins to ensure traces that span a bin boundary are still found. Shorter windows are faster and use less memory.

// Default: 5-minute trace window
spans | trace-find { A } >> { B }

// Long-running traces: expand the window
spans | trace-find within 1h { A } >> { B }

// Low-latency microservices: narrow for speed
spans | trace-find within 30s { A } >> { B }

Set within to at least the expected duration of the traces you want to find:

  • Microservice requests (milliseconds to seconds): within 30s or the default within 5m
  • Batch jobs or workflows (minutes): within 30m or within 1h
  • Very long traces (hours): within 4h — but expect higher memory usage and slower queries
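The bin-and-combine strategy described earlier in this section can be sketched in Python. This is a simplified model under stated assumptions, not Berserk's engine. Spans are reduced to hypothetical `(trace_id, start_seconds)` pairs. Each trace is evaluated exactly once, in the window formed by the bin where its first span starts plus the following bin. Assume every span of a trace starts within `within` of the trace's first span. Then a trace that starts in bin k cannot reach past bin k+1, so that two-bin window always holds the complete trace.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[str, float]   # hypothetical (trace_id, start time in seconds)

def find_traces_binned(spans: List[Span], t0: float, t1: float, within: float,
                       predicate: Callable[[List[Span]], bool]) -> Set[str]:
    """Evaluate `predicate` once per trace using two-bin windows of width `within`."""
    n_bins = max(1, math.ceil((t1 - t0) / within))
    bins: Dict[int, List[Span]] = defaultdict(list)
    for span in spans:
        if t0 <= span[1] < t1:
            bins[int((span[1] - t0) // within)].append(span)

    found: Set[str] = set()
    for k in range(n_bins):
        # Traces with spans in bin k-1 started earlier and were evaluated in an earlier window.
        started_earlier = {tid for tid, _ in bins[k - 1]}
        window: Dict[str, List[Span]] = defaultdict(list)
        for span in bins[k] + bins[k + 1]:     # combine bin k with the next bin
            window[span[0]].append(span)
        for tid in {tid for tid, _ in bins[k]} - started_earlier:
            if predicate(window[tid]):
                found.add(tid)
    return found

if __name__ == "__main__":
    spans = [("t1", 250.0), ("t1", 320.0),    # crosses the 300 s bin boundary
             ("t2", 400.0), ("t2", 410.0),
             ("t3", 590.0), ("t3", 650.0),    # crosses the 600 s bin boundary
             ("t4", 700.0)]
    print(sorted(find_traces_binned(spans, 0, 900, 300, lambda t: len(t) >= 2)))
```

A trace longer than `within` would be cut off at the end of bin k+1 and judged on partial data, which is why `within` should be at least the expected trace duration. Deciding ownership only needs the trace IDs from the previous bin, so the state carried between windows stays bounded.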

Memory budget and sampling

trace-find operates under a fixed memory budget per query. When the number of matching traces or their aggregation state exceeds this budget, the engine uses deterministic sampling to evict entire traces — keeping a consistent, reproducible subset. This means:

  • Simple queries (summarize count()) can track many traces cheaply
  • Complex aggregations (summarize make_set(service_name), dcount(span_id)) use more memory per trace, so fewer traces are retained
  • The sampling is deterministic: the same query over the same data always returns the same traces
  • Wider within windows also increase per-trace memory, which can reduce the number of retained traces
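This page does not say how Berserk chooses which traces to evict. One common way to get deterministic, reproducible sampling is to hash each `trace_id` and keep only traces whose hash falls below a threshold, halving the threshold whenever the budget is exceeded. The sketch below assumes that technique and a fixed per-trace memory cost. `sample_traces`, `budget`, and `cost_per_trace` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, Set

def trace_hash(trace_id: str) -> float:
    """Stable hash of a trace ID in [0, 1). Python's built-in hash() is salted
    per process, so it would not give reproducible samples across runs."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def sample_traces(trace_ids: Iterable[str], budget: int, cost_per_trace: int) -> Set[str]:
    """Keep whole traces under a fixed budget using a shrinking hash threshold."""
    threshold = 1.0
    kept: Dict[str, float] = {}
    for tid in trace_ids:
        h = trace_hash(tid)
        if h >= threshold or tid in kept:
            continue                      # outside the current sample, or already kept
        kept[tid] = h
        while len(kept) * cost_per_trace > budget:
            threshold /= 2                # lower the sampling rate ...
            kept = {t: v for t, v in kept.items() if v < threshold}  # ... evicting whole traces
    return set(kept)

if __name__ == "__main__":
    ids = [f"trace-{i}" for i in range(1000)]
    cheap = sample_traces(ids, budget=100, cost_per_trace=1)   # e.g. summarize count()
    costly = sample_traces(ids, budget=100, cost_per_trace=4)  # e.g. make_set + dcount
    print(len(cheap), len(costly))
```

The threshold only drops when the traces seen so far already exceed the budget. As a result, the final sample is "every trace whose hash is below the largest threshold that fits". That set does not depend on arrival order, and a costlier aggregation yields a subset of the cheaper one's sample.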

If you see fewer results than expected, the memory budget may be the cause. Narrowing the within window, simplifying the summarize expressions, or narrowing the time range can help.

Output Model

This is the biggest philosophical difference.

TraceQL returns spansets by default — the individual spans within each trace that matched the query.

trace-find returns trace summaries by default — one row per matching trace with aggregated metadata (trace_id, root_name, services, spans, start_time, end_time, duration). However, with project you get individual spans (like TraceQL's default), and with summarize you get custom aggregations. The output mode is controlled by the clause you choose:

Output mode                  TraceQL                       trace-find
Default output               Matched spans                 Trace summary (7 columns)
Select specific attributes   | select(.http.status)        project name, status.code
Aggregate per trace          | count() > 2                 summarize count()
Group within trace           | by(resource.service.name)   summarize count() by service_name
Filter by aggregate          | count() > 10                where count() > 10
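Conceptually, the default output collapses each matching trace into one row. The sketch below shows one plausible way to derive the seven columns from a trace's spans. The column semantics are assumptions inferred from the column names, not documented definitions: root_name is taken as the name of the parentless span, services as the distinct service names, and spans as a span count. The `Span` fields and `summarize_trace` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    service_name: str
    start_time: datetime
    end_time: datetime

def summarize_trace(spans: List[Span]) -> Dict[str, object]:
    """Build one summary row with the seven default columns (semantics assumed)."""
    roots = [s for s in spans if s.parent_span_id is None]
    start = min(s.start_time for s in spans)
    end = max(s.end_time for s in spans)
    return {
        "trace_id": spans[0].trace_id,
        "root_name": roots[0].name if roots else None,         # parentless span's name
        "services": sorted({s.service_name for s in spans}),   # distinct services
        "spans": len(spans),                                   # span count
        "start_time": start,
        "end_time": end,
        "duration": end - start,
    }

if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    trace = [
        Span("abc", "1", None, "GET /api", "api-gateway", t0, t0 + timedelta(milliseconds=120)),
        Span("abc", "2", "1", "SELECT", "db",
             t0 + timedelta(milliseconds=10), t0 + timedelta(milliseconds=90)),
    ]
    print(summarize_trace(trace))
```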

Query Examples

Find traces with errors downstream of an API gateway

TraceQL:

{ resource.service.name = "api-gateway" } >> { status = error }

trace-find:

spans
| trace-find
    { resource.attributes.service.name == "api-gateway" }
    >> { status.code == "ERROR" }

Find traces where a server calls a client (direct hop)

TraceQL:

{ span:kind = server } > { span:kind = client }

trace-find:

spans
| trace-find { kind == "SERVER" } > { kind == "CLIENT" }

Find traces with more than 10 spans

TraceQL:

{ } | count() > 10

trace-find:

spans
| trace-find {} >> {}
  where count() > 10

Count spans per service in matching traces

TraceQL:

{ resource.service.name = "api-gateway" } >> { status = error }
  | by(resource.service.name) | count()

trace-find:

spans
| trace-find
    { resource.attributes.service.name == "api-gateway" }
    >> { status.code == "ERROR" }
  summarize spans=count()
    by resource.attributes.service.name

Find traces touching both service A and service B

TraceQL:

{ resource.service.name = "service-a" } && { resource.service.name = "service-b" }

trace-find:

spans
| trace-find
    { resource.attributes.service.name == "service-a" }
    and { resource.attributes.service.name == "service-b" }

Emit individual matching spans with selected fields

TraceQL:

{ status = error } | select(span:name, resource.service.name, span:duration)

trace-find:

spans
| trace-find { status.code == "ERROR" } >> {}
  project name, resource.attributes.service.name, duration

Find error spans with correlated error logs

TraceQL: No direct equivalent — TraceQL operates on spans only, not logs.

trace-find:

union otel_logs, spans
| trace-find
    { status.code == "ERROR" }
    :: { severity_text == "ERROR" and body has "OutOfMemory" }

What trace-find Does That TraceQL Doesn't

  • Logs in the span tree: OTel log records participate as children of their span. All structural operators (>>, >, ~) work with logs naturally. The :: shorthand makes log queries concise. TraceQL operates on spans only.
  • KQL aggregation functions: Full access to make_set, dcount, countif, take_anyif, arg_min, avg, percentile, etc. TraceQL has only count, avg, max, min, sum.
  • Composable output clauses: where, summarize, project can be combined. For example, where count() > 5 summarize make_set(service_name) first filters then aggregates.
  • KQL ecosystem integration: Results pipe into any KQL operator (| sort by, | top 10 by, | join, etc.).
  • Incremental execution via within: The time window bounds trace duration, enabling streaming evaluation across time bins. This makes trace-find fast on large datasets — it never needs to buffer all data at once. Default is 5 minutes; tunable per query.
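A per-trace model of the composed clause `where count() > 5 summarize make_set(service_name)` shows the ordering. The filter runs first on each trace's span count, and the aggregation then runs only on the traces that survive it. The function name and the dict-based span rows are hypothetical. KQL's make_set returns an unordered array, so the sketch sorts the set to give stable output.

```python
from collections import defaultdict
from typing import Dict, List

def where_then_summarize(spans: List[dict], min_spans: int) -> Dict[str, List[str]]:
    """Model of `where count() > min_spans summarize make_set(service_name)`, per trace."""
    by_trace: Dict[str, List[dict]] = defaultdict(list)
    for s in spans:
        by_trace[s["trace_id"]].append(s)
    out: Dict[str, List[str]] = {}
    for tid, trace in by_trace.items():
        if len(trace) > min_spans:                                  # where count() > min_spans
            out[tid] = sorted({s["service_name"] for s in trace})   # make_set(service_name)
    return out

if __name__ == "__main__":
    rows = [{"trace_id": "t1", "service_name": s}
            for s in ["api", "db", "api", "cache", "db", "api"]]
    rows += [{"trace_id": "t2", "service_name": "api"}] * 3
    print(where_then_summarize(rows, 5))   # {'t1': ['api', 'cache', 'db']}
```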

What TraceQL Does That trace-find Doesn't

  • Negated structural operators (!>>, !>, etc.): Find traces where a relationship does NOT exist.
  • Spanset output by default: TraceQL returns spans without needing to opt in. trace-find defaults to trace summaries; use project to get individual spans.
  • Regex predicates (=~, !~): Native regex matching in selectors (trace-find uses KQL's matches regex).
  • Intrinsic field optimization: TraceQL's span:duration, trace:rootService etc. use pre-indexed metadata for faster queries.
  • Metrics generation: rate(), count_over_time(), quantile_over_time() etc. produce time-series from trace queries.
  • Exemplars: Connect metric data points back to the specific traces that produced them.

Key Differences Summary

Aspect              TraceQL                    trace-find
Query language      TraceQL-specific           KQL (Kusto Query Language)
Default output      Spans (spansets)           Trace summaries (use project for spans)
Predicate power     Limited expressions        Full KQL where-clause
Aggregation power   5 functions                All KQL aggregate functions
Log support         Spans only                 Spans + correlated logs
Time window         No duration bound          within clause (default 5m, tunable)
Execution                                      Streaming incremental (time-binned)
Negation            Supported (experimental)   Not yet
Metrics             Built-in                   Via downstream KQL operators
Backend             Grafana Tempo              Berserk
