Vault Retrieval

Retrieval over your own vault — related notes, semantic search, and grounded chat — running locally and offline.

Vault Retrieval turns your notes into a searchable knowledge base without sending anything to the cloud. It reads a small embedding index that ships with your vault and answers three questions: What else have I written about this? Where did I say something like that? What does my vault know about X? Generation (chat) runs against a local LLM endpoint you control.

Features

Related notes: a side panel ranks the notes most similar to the one you're reading. Ranking uses cosine similarity over a compact note-level index and is computed on-device, so it works fully offline, including on mobile.
Semantic search: find notes by meaning, not just keywords.
Grounded RAG chat: ask your vault a question and get an answer grounded in retrieved notes, streamed token-by-token from your local LLM. An editable live-context panel shows exactly which notes feed the answer, with source chips that link back.
Visible thinking, with an off switch: for reasoning models, the live "💭 thinking" stream appears in a collapsible block above the answer, folds away once the answer arrives, and is never sent back into the conversation history. A toggle suppresses thinking when you want faster answers, using hints that are portable across servers, and a settings test tells you whether your model actually honours the toggle.
Model capability hints: settings show, best-effort, whether the selected chat model supports vision and/or thinking, so you can pick the right one. Each endpoint has an inline connection test, and the model pickers populate from the server.
Live indexing: notes are re-embedded on save; edits made offline are queued and re-embedded automatically on reconnect.
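The offline catch-up behaviour of live indexing can be pictured as a small de-duplicating queue. The following is a minimal sketch, not the plugin's actual code: `ReembedQueue` and its methods are hypothetical names, and the caller is assumed to perform the real embedding request for each returned path.

```typescript
// Hypothetical sketch: these names do not come from the plugin's source.
class ReembedQueue {
  // A Set de-duplicates repeated saves of the same note while offline.
  private pending = new Set<string>();

  /** Record that a note was saved and needs re-embedding. */
  enqueue(path: string): void {
    this.pending.add(path);
  }

  /** Number of notes still waiting to be re-embedded. */
  get size(): number {
    return this.pending.size;
  }

  /**
   * Return the paths to embed now and clear them from the queue.
   * While offline nothing is handed out, so the edits simply accumulate.
   */
  take(online: boolean): string[] {
    if (!online) return [];
    const batch = Array.from(this.pending);
    this.pending.clear();
    return batch;
  }

  /** Put a path back if its embedding request failed. */
  requeue(path: string): void {
    this.pending.add(path);
  }
}
```

Calling `take(true)` on reconnect hands out every note edited in the meantime exactly once, in the order of its first save.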

Requirements

Obsidian 1.4+ (desktop or mobile).
An embedding index in <vault>/_vaultrag/ — produced by your indexing backend and synced with the vault. The related-notes panel and semantic search need only this index; no running server.
For chat (and live re-indexing): an OpenAI-compatible local LLM endpoint — e.g. LM Studio, Ollama, or an MLX server. Configurable in settings; nothing leaves your machine.
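For orientation, OpenAI-compatible servers stream chat answers as server-sent events: each `data:` line carries a JSON chunk whose `choices[0].delta.content` holds the next piece of text, and `data: [DONE]` ends the stream. The sketch below shows one way to pull the text out of such a stream; `extractDeltas` is a hypothetical helper, not a function from the plugin, and it assumes it receives whole lines.

```typescript
// Shape of the fields this sketch reads from each streamed chunk.
interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

/** Extract the text deltas from a block of OpenAI-style SSE lines. */
function extractDeltas(sseText: string): string[] {
  const tokens: string[] = [];
  for (const rawLine of sseText.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue; // end-of-stream marker
    try {
      const chunk = JSON.parse(payload) as StreamChunk;
      const content = chunk.choices?.[0]?.delta?.content;
      if (typeof content === "string" && content.length > 0) tokens.push(content);
    } catch {
      // A line cut off mid-read is not valid JSON; a real client would buffer
      // it and retry once the rest arrives. This sketch just skips it.
    }
  }
  return tokens;
}
```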

Install

Community Plugins

In Obsidian, open Settings → Community plugins → Browse, search for Vault Retrieval, then install and enable it.

Manual

Download main.js, manifest.json and styles.css from the latest release, drop them into <vault>/.obsidian/plugins/vault-retrieval/, then enable Settings → Community plugins → Vault Retrieval.

BRAT (beta)

Add the GitHub mirror johannes-kaindl/vault-rag in the BRAT plugin to track pre-release builds.

From source

```sh
git clone https://codeberg.org/jkaindl/vault-rag
cd vault-rag
npm install
npm run build      # → main.js
# copy main.js, manifest.json, styles.css into <vault>/.obsidian/plugins/vault-retrieval/
```

Usage

Enable the plugin and open a note. The Related notes panel (ribbon: 🔍) populates automatically.
Open Semantic search (ribbon: 🔭) to query the vault by meaning.
Open Vault Chat (ribbon: 💬), point the chat endpoint at your local LLM in settings, and ask away. Edit the live-context list to control which notes ground the answer.
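To illustrate how the live-context list can ground an answer, here is a minimal sketch of assembling a chat request from the selected notes. The prompt wording, the `ContextNote` shape and `buildGroundedMessages` are all assumptions made for illustration; the plugin's real prompt may differ.

```typescript
// Hypothetical types: a note chosen in the live-context panel and a chat message.
interface ContextNote {
  path: string;
  text: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

/** Build an OpenAI-style message list whose system prompt carries the context notes. */
function buildGroundedMessages(question: string, notes: ContextNote[]): ChatMessage[] {
  // Number each note so the model can cite it and the UI can link back to it.
  const sources = notes
    .map((note, i) => `[${i + 1}] ${note.path}\n${note.text}`)
    .join("\n\n");
  const system =
    "Answer using only the notes below. Cite sources by their bracketed number.\n\n" +
    sources;
  return [
    { role: "system", content: system },
    { role: "user", content: question },
  ];
}
```

Removing a note from the list before sending simply means it never reaches the `notes` array, so the model cannot draw on it.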

Configuration

| Setting | What it does | Default |
| --- | --- | --- |
| Embedding endpoint / model | Re-embeds notes on save | `http://localhost:11434` · `qwen3-embedding:8b` |
| Chat endpoint / model | LLM for RAG chat | `http://localhost:8080` · `qwen3` |
| Index directory | Where the synced index lives | `_vaultrag` |
| Similarity / top-k | Retrieval thresholds | tunable |
| Excluded folders | Paths skipped by indexing | `Templates/`, `Archive/` |
| Context budget | Max characters fed as context (ceiling follows the model window) | `12000` |
| Suppress thinking | Default for new chats; also a per-chat toggle in the panel | off |
| Enter sends | On: Enter sends, Shift+Enter inserts a newline · Off: reversed | on |
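As an illustration of the context budget, the sketch below packs note texts in ranked order until a character limit is reached and truncates the last one that fits only partly. `fitToBudget` is a hypothetical helper; whether the plugin truncates or drops the overflowing note is not stated here, so treat the truncation as an assumption.

```typescript
/**
 * Keep note texts, in order, until `maxChars` characters are used.
 * The note that crosses the limit is cut short; later notes are dropped.
 */
function fitToBudget(texts: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const text of texts) {
    const remaining = maxChars - used;
    if (remaining <= 0) break; // budget exhausted
    const piece = text.length <= remaining ? text : text.slice(0, remaining);
    kept.push(piece);
    used += piece.length;
  }
  return kept;
}
```

With the default budget, `fitToBudget(noteTexts, 12000)` would never return more than 12,000 characters in total.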

Endpoint tip: you can enter the base URL with or without a trailing /v1. Both forms are accepted; the plugin appends /v1 when it is missing.
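A minimal sketch of how such normalization might look, assuming the standard OpenAI-style `/v1/chat/completions` route. `chatCompletionsUrl` is a hypothetical name, not the plugin's function.

```typescript
/** Turn a user-entered base URL into a chat-completions URL, adding /v1 only if absent. */
function chatCompletionsUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, ""); // drop trailing slashes
  const withV1 = trimmed.endsWith("/v1") ? trimmed : `${trimmed}/v1`;
  return `${withV1}/chat/completions`;
}
```

Under this sketch, `http://localhost:8080`, `http://localhost:8080/` and `http://localhost:8080/v1` all resolve to the same request URL.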

How it works

Your indexing backend exports a portable note-level mini-index (Matryoshka embeddings truncated to 256 dimensions and quantized to int8, ~1.4 MB) into <vault>/_vaultrag/. The plugin loads it and runs a brute-force cosine-similarity scan locally: no daemon, no VPN, and no on-device LLM are needed for retrieval. Only chat talks to an LLM, over an endpoint you configure. Because the index is portable, retrieval works identically across all your synced devices.
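The brute-force scan can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: the index is assumed to be a flat, row-major `Int8Array` with one 256-value row per note, and `cosineInt8`, `topK` and `Hit` are hypothetical names. If each vector was quantized with a single scale factor, cosine similarity on the raw int8 values equals cosine on the dequantized ones, because cosine ignores vector length.

```typescript
interface Hit {
  index: number; // row number of the note in the index
  score: number; // cosine similarity in [-1, 1]
}

/** Cosine similarity between two int8 vectors of equal length. */
function cosineInt8(a: Int8Array, b: Int8Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction; treat it as dissimilar to everything.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** Score every row of a flat row-major index against the query and keep the best k. */
function topK(query: Int8Array, index: Int8Array, dim: number, k: number): Hit[] {
  const hits: Hit[] = [];
  const count = Math.floor(index.length / dim);
  for (let n = 0; n < count; n++) {
    const row = index.subarray(n * dim, (n + 1) * dim); // a view, no copy
    hits.push({ index: n, score: cosineInt8(query, row) });
  }
  hits.sort((x, y) => y.score - x.score); // highest similarity first
  return hits.slice(0, k);
}
```

At 256 bytes per note, a 1.4 MB file would hold roughly 5,500 vectors if it contained nothing else, which is small enough that a linear scan needs no approximate-nearest-neighbour structure.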

Architecture, module layout and contributor conventions live in AGENTS.md.

Image transcription (handwriting/screenshots → Markdown) lives in the sibling plugin image-to-markdown.

Contributing

Issues and pull requests are welcome on Codeberg. The project is test-driven — every change ships with tests (npm test), and larger features go through a brainstorm → spec → plan → TDD flow (docs/superpowers/). See AGENTS.md for conventions.

License

Code: GNU Affero General Public License v3.0 or later (LICENSE). A commercial dual-license is available on request if the AGPL's copyleft doesn't fit your use case.
Documentation & text: Creative Commons Attribution-ShareAlike 4.0 (LICENSE-DOCS).
