README file from
GithubVault Retrieval
Retrieval over your own vault — related notes, semantic search, and grounded chat — running locally and offline.
Vault Retrieval turns your notes into a searchable knowledge base without sending anything to the cloud. It reads a small embedding index that ships with your vault and answers three questions: What else have I written about this? Where did I say something like that? What does my vault know about X? Generation (chat) runs against a local LLM endpoint you control.
Features
- Related notes — a side panel ranks the notes most similar to the one you're reading. Cosine similarity over a compact note-level index, computed on-device — works fully offline, including on mobile.
- Semantic search — find notes by meaning, not just keywords.
- Grounded RAG chat — ask your vault a question and get an answer grounded in retrieved notes, streamed token-by-token from your local LLM. An editable live-context panel shows exactly which notes feed the answer, with source chips that link back.
- Visible thinking, with an off switch — for reasoning models, the live "💭 thinking" stream appears in a collapsible block above the answer and folds away once it arrives (and is never sent back into the conversation history). A toggle suppresses thinking when you want faster answers — via cross-server-portable hints — and a settings test tells you whether your model actually honours it.
- Model capability hints — settings show, best-effort, whether the selected chat model supports vision and/or thinking, so you can pick the right one. Each endpoint has an inline connection test, and the model pickers populate from the server.
- Live indexing — notes are re-embedded on save; edits made offline queue up and catch up automatically on reconnect.
Requirements
- Obsidian 1.4+ (desktop or mobile).
- An embedding index in
<vault>/_vaultrag/— produced by your indexing backend and synced with the vault. The related-notes panel and semantic search need only this index; no running server. - For chat (and live re-indexing): an OpenAI-compatible local LLM endpoint — e.g. LM Studio, Ollama, or an MLX server. Configurable in settings; nothing leaves your machine.
Install
Community Plugins
In Obsidian, open Settings → Community plugins → Browse, search for Vault Retrieval, then install and enable it.
Manual
Download main.js, manifest.json and styles.css from the latest release, drop them into <vault>/.obsidian/plugins/vault-retrieval/, then enable Settings → Community plugins → Vault Retrieval.
BRAT (beta)
Add the GitHub mirror johannes-kaindl/vault-rag in the BRAT plugin to track pre-release builds.
From source
git clone https://codeberg.org/jkaindl/vault-rag
cd vault-rag
npm install
npm run build # → main.js
# copy main.js, manifest.json, styles.css into <vault>/.obsidian/plugins/vault-retrieval/
Usage
- Enable the plugin and open a note. The Related notes panel (ribbon: 🔍) populates automatically.
- Open Semantic search (ribbon: 🔭) to query the vault by meaning.
- Open Vault Chat (ribbon: 💬), point the chat endpoint at your local LLM in settings, and ask away. Edit the live-context list to control which notes ground the answer.
Configuration
| Setting | What it does | Default |
|---|---|---|
| Embedding endpoint / model | Re-embeds notes on save | http://localhost:11434 · qwen3-embedding:8b |
| Chat endpoint / model | LLM for RAG chat | http://localhost:8080 · qwen3 |
| Index directory | Where the synced index lives | _vaultrag |
| Similarity / top-k | Retrieval thresholds | tunable |
| Excluded folders | Paths skipped by indexing | Templates/, Archive/ |
| Context budget | Max characters fed as context (ceiling follows the model window) | 12000 |
| Suppress thinking | Default for new chats; also a per-chat toggle in the panel | off |
| Enter sends | On: Enter sends, Shift+Enter newlines · Off: reversed | on |
Endpoint tip: enter the base URL without a trailing
/v1— the plugin appends it. Both forms are accepted.
How it works
Your indexing backend exports a portable note-level Matryoshka-256 int8 mini-index (~1.4 MB) into <vault>/_vaultrag/. The plugin loads it and runs brute-force cosine locally — no daemon, no VPN, no on-device LLM needed for retrieval. Only chat talks to an LLM, over an endpoint you configure. The portable index is what makes retrieval work identically across all your synced devices.
Architecture, module layout and contributor conventions live in AGENTS.md.
Related
Image transcription (handwriting/screenshots → Markdown) lives in the sibling plugin image-to-markdown.
Contributing
Issues and pull requests are welcome on Codeberg. The project is test-driven — every change ships with tests (npm test), and larger features go through a brainstorm → spec → plan → TDD flow (docs/superpowers/). See AGENTS.md for conventions.
License
- Code: GNU Affero General Public License v3.0 or later (
LICENSE). A commercial dual-license is available on request if the AGPL's copyleft doesn't fit your use case. - Documentation & text: Creative Commons Attribution-ShareAlike 4.0 (
LICENSE-DOCS).
Copyright © 2026 Johannes Kaindl.