
Performance tuning

A quick-reference page listing the tuning knobs and their defaults. For the rationale, measurement recipes, and trade-offs, see the Performance section.

User-facing knobs (in shaktiman.toml)

| Knob | Default | Effect summary |
|------|---------|----------------|
| `search.max_results` | 10 | Max results per search. Higher = more scoring work. |
| `search.default_mode` | `locate` | `locate` (compact) or `full` (inline source). |
| `search.min_score` | 0.15 | Relevance floor. Higher = less noise. |
| `context.budget_tokens` | 4096 | Default assembly budget. Higher = more work. |
| `embedding.batch_size` | 128 | Ollama batch size. Higher = better throughput if hardware allows. |
| `embedding.timeout` | `"120s"` | HTTP timeout per batch. Affects circuit-breaker sensitivity. |
| `embedding.query_prefix` / `document_prefix` | `""` | Model-specific task prefixes. Not performance per se, but affects recall. |
| `vector.backend` | `brute_force` | See Backend selection. |
| `database.backend` | `sqlite` | See Backend selection. |
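Taken together, the defaults above correspond to a shaktiman.toml like the following. The section/key layout is inferred from the dotted knob names; check your generated config for the exact shape.

```toml
[search]
max_results = 10
default_mode = "locate"
min_score = 0.15

[context]
budget_tokens = 4096

[embedding]
batch_size = 128
timeout = "120s"
query_prefix = ""
document_prefix = ""

[vector]
backend = "brute_force"

[database]
backend = "sqlite"
```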

Knobs not exposed via TOML (but in DefaultConfig)

These ship with defaults that are almost always correct. If you need to change them, edit internal/types/config.go:DefaultConfig and rebuild:

| Knob | Default | Effect summary |
|------|---------|----------------|
| `EnrichmentWorkers` | 4 | Parallel parse / extract workers. |
| `WatcherDebounceMs` | 200 | File-event coalescing window. |
| `WriterChannelSize` | 500 | SQLite writer backpressure. |
| `MaxBudgetTokens` | 4096 | Cap for assembly (usually matches `context.budget_tokens`). |
| `Tokenizer` | `cl100k_base` | Tokenizer for budget accounting. |

Per-call knobs (MCP / CLI)

Many MCP tools accept per-call overrides. Full list under each tool's reference:

  • search: mode, max_results, min_score, path, scope.
  • context: budget_tokens, scope.
  • dependencies: direction, depth, scope.
  • diff: since, limit, scope.

Per-call values win over TOML defaults.
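For example, a per-call override on the search tool might look like this in an MCP arguments payload. This is a hypothetical call shape; see the tool's reference for the exact schema.

```json
{
  "name": "search",
  "arguments": {
    "query": "parse config",
    "mode": "full",
    "max_results": 5,
    "min_score": 0.3
  }
}
```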

Tuning playbook

Don't tune speculatively. Reach for these in order when something's actually slow:

  1. Measure first. Run time shaktiman search "query" --root . and shaktiman enrichment-status, and record the baseline before changing anything.
  2. Lower max_results for interactive use. 10 is plenty for most agent workflows.
  3. Raise min_score if marginal hits dominate result sets.
  4. Switch to hnsw (single-dev) or qdrant (shared) once the repo grows past ~75k chunks.
  5. Raise batch_size if Ollama is running on a GPU and the embedding queue is lagging.
  6. Raise timeout if Ollama is occasionally slow and the circuit breaker trips unnecessarily.

If you reach step 6 and queries are still slow, consult Troubleshooting → Performance problems.

See also