Embeddings

Shaktiman uses Ollama to generate vector embeddings for semantic search. This page covers the knobs that actually ship, grouped by what they affect.

The authoritative schema is [embedding] in Config File.

Provider

Today the only embedding client is Ollama (internal/vector/ollama.go). Point it at any compatible Ollama-served model via:

```toml
[embedding]
ollama_url = "http://localhost:11434"
model = "nomic-embed-text"
dims = 768
```

dims must match what the model actually produces. Common combinations:

| Model | dims |
| --- | --- |
| nomic-embed-text | 768 |
| mxbai-embed-large | 1024 |
| snowflake-arctic-embed-m | 768 |

If dims and the model disagree you'll get insertion errors or silently wrong results — there's no cross-check. See Known Limitations.
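Since there is no built-in cross-check, a one-off probe against Ollama's /api/embed endpoint can catch a mismatch before you index. A minimal Python sketch, assuming the standard Ollama response shape with an `embeddings` array; `check_dims` and `probe` are hypothetical helpers, not part of shaktiman:

```python
import json
import urllib.request

def check_dims(response: dict, expected_dims: int) -> None:
    """Raise if the model's vector length disagrees with the configured dims."""
    got = len(response["embeddings"][0])
    if got != expected_dims:
        raise ValueError(f"model produced {got}-dim vectors, config says {expected_dims}")

def probe(ollama_url: str, model: str, expected_dims: int) -> None:
    """Send a single /api/embed request and validate the vector length."""
    req = urllib.request.Request(
        f"{ollama_url}/api/embed",
        data=json.dumps({"model": model, "input": "dims probe"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        check_dims(json.load(resp), expected_dims)
```

Run `probe("http://localhost:11434", "nomic-embed-text", 768)` once after changing `model` or `dims`; a `ValueError` here is much cheaper than a corrupted index.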

Task prefixes

Some embedding models — nomic-embed-text notably — are trained with task-specific prefixes. Using them improves retrieval quality noticeably:

```toml
[embedding]
query_prefix = "search_query: "
document_prefix = "search_document: "
```

query_prefix is prepended to every search query before it's sent to Ollama. document_prefix is prepended to every chunk before embedding. Leave both empty for models that don't use prefixes.
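The prefixing rule above amounts to plain string concatenation before the text reaches Ollama. A sketch of that behaviour (`prepare_query` and `prepare_chunks` are illustrative names, not shaktiman's API):

```python
QUERY_PREFIX = "search_query: "        # [embedding].query_prefix
DOCUMENT_PREFIX = "search_document: "  # [embedding].document_prefix

def with_prefix(text: str, prefix: str) -> str:
    # An empty prefix (models without task prefixes) leaves the text untouched.
    return prefix + text if prefix else text

def prepare_query(query: str) -> str:
    return with_prefix(query, QUERY_PREFIX)

def prepare_chunks(chunks: list[str]) -> list[str]:
    return [with_prefix(c, DOCUMENT_PREFIX) for c in chunks]
```

Note the prefixes carry a trailing space by convention; dropping it silently degrades retrieval for prefix-trained models.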

Batching and timeout

```toml
[embedding]
batch_size = 128
timeout = "120s"
```
  • batch_size — how many chunks go into a single /api/embed request. Higher values give better throughput on a fast GPU but use more memory on the Ollama side. The default of 128 is safe for nomic-embed-text on most machines.
  • timeout — HTTP timeout for a single batch. 120s is generous; lower it if you'd rather fail fast. On failure the circuit breaker opens — see below.
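The batching behaviour is easy to picture as slicing the pending chunks into fixed-size payloads. A minimal sketch (the `batches` helper is illustrative, not shaktiman's internal code):

```python
from typing import Iterator

def batches(chunks: list[str], batch_size: int = 128) -> Iterator[list[str]]:
    """Yield successive /api/embed payloads of at most batch_size chunks."""
    for i in range(0, len(chunks), batch_size):
        yield chunks[i : i + batch_size]
```

With 300 pending chunks and the default batch_size of 128, this produces three requests of 128, 128, and 44 chunks; each request must individually complete within `timeout`.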

Circuit breaker

The embedding worker wraps the Ollama client in a circuit breaker:

| State | Behaviour |
| --- | --- |
| closed | Normal operation; batches flow through. |
| open | Embedding paused after repeated failures. Retries after a cooldown. |
| half_open | Probe batch in flight to see if Ollama recovered. |
| disabled | Permanently off — either by config (embedding.enabled = false) or after extended unavailability. Restart shaktimand to retry. |

Check the state with enrichment_status. When it's not closed, semantic search automatically falls back to keyword ranking.
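The state machine above can be sketched in a few lines of Python. This is illustrative only: the failure threshold and cooldown values are assumptions (shaktiman's actual values are not documented here), and the "disabled after extended unavailability" transition is omitted:

```python
import time

FAILURE_THRESHOLD = 5    # consecutive failures before opening (assumed value)
COOLDOWN_SECONDS = 30.0  # open -> half_open delay (assumed value)

class CircuitBreaker:
    def __init__(self, enabled: bool = True, now=time.monotonic):
        self.now = now
        self.failures = 0
        self.opened_at = 0.0
        self.state = "closed" if enabled else "disabled"

    def allow(self) -> bool:
        """Whether the worker may send the next batch to Ollama."""
        if self.state == "open" and self.now() - self.opened_at >= COOLDOWN_SECONDS:
            self.state = "half_open"  # the next batch is the probe
        return self.state in ("closed", "half_open")

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= FAILURE_THRESHOLD:
            self.state = "open"
            self.opened_at = self.now()
```

A failed probe in half_open immediately re-opens the breaker; a successful one resets the failure count and closes it.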

Daemon vs CLI indexing

Shaktiman has two code paths that populate the index, and they handle embeddings differently — worth knowing so the defaults don't surprise you.

| Path | What it is | Embedding behaviour |
| --- | --- | --- |
| Daemon (shaktimand) | Launched automatically by your MCP client (Claude Code, Cursor, Zed, …). Owns the watcher and runs incremental re-indexes on file saves. | Honors embedding.enabled (default true). Starts the embedding worker on launch; new/changed chunks are embedded continuously in the background. |
| CLI (shaktiman index) | You run it by hand, e.g. in CI, scripting, or one-shot cold-indexing. | Opt-in — embedding is off unless you pass --embed (and have Ollama reachable). The enabled flag in the TOML is ignored here; the CLI flag wins. |

Both paths write to the same metadata store and vector store, so a CLI cold-index without --embed followed by a daemon start is a valid sequence — the daemon will embed the pending chunks on startup. Check progress with enrichment_status.

Disabling embedding

If you don't have Ollama installed or want the keyword-only path (faster cold index, no GPU required), disable embedding entirely:

```toml
[embedding]
enabled = false
```

This only affects the daemon path. For the CLI, simply don't pass --embed.

Keyword search, symbol lookup, dependencies, and diff all work without embeddings. Only semantic ranking and the vector-based facets of context need them.

See also