Vault Retrieval

by Johannes Kaindl
5
4
3
2
1
New Plugin

Description

Local, offline retrieval over your vault: related notes, semantic search, and grounded chat with a local LLM — nothing leaves your machine. - This plugin has not been manually reviewed by Obsidian staff.

Reviews

No reviews yet.

Stats

stars
downloads
0
forks
0
days
NaN
days
NaN
days
0
total PRs
0
open PRs
0
closed PRs
0
merged PRs
0
total issues
0
open issues
0
closed issues
0
commits

Latest Version

Invalid date

Changelog

README file from

Github

Vault Retrieval

License: AGPL-3.0 Docs: CC BY-SA 4.0 Release Platform

Retrieval over your own vault — related notes, semantic search, and grounded chat — running locally and offline.

Vault Retrieval turns your notes into a searchable knowledge base without sending anything to the cloud. It reads a small embedding index that ships with your vault and answers three questions: What else have I written about this? Where did I say something like that? What does my vault know about X? Generation (chat) runs against a local LLM endpoint you control.

Features

  • Related notes — a side panel ranks the notes most similar to the one you're reading. Cosine similarity over a compact note-level index, computed on-device — works fully offline, including on mobile.
  • Semantic search — find notes by meaning, not just keywords.
  • Grounded RAG chat — ask your vault a question and get an answer grounded in retrieved notes, streamed token-by-token from your local LLM. An editable live-context panel shows exactly which notes feed the answer, with source chips that link back.
  • Visible thinking, with an off switch — for reasoning models, the live "💭 thinking" stream appears in a collapsible block above the answer and folds away once it arrives (and is never sent back into the conversation history). A toggle suppresses thinking when you want faster answers — via cross-server-portable hints — and a settings test tells you whether your model actually honours it.
  • Model capability hints — settings show, best-effort, whether the selected chat model supports vision and/or thinking, so you can pick the right one. Each endpoint has an inline connection test, and the model pickers populate from the server.
  • Live indexing — notes are re-embedded on save; edits made offline queue up and catch up automatically on reconnect.

Requirements

  • Obsidian 1.4+ (desktop or mobile).
  • An embedding index in <vault>/_vaultrag/ — produced by your indexing backend and synced with the vault. The related-notes panel and semantic search need only this index; no running server.
  • For chat (and live re-indexing): an OpenAI-compatible local LLM endpoint — e.g. LM Studio, Ollama, or an MLX server. Configurable in settings; nothing leaves your machine.

Install

Community Plugins

In Obsidian, open Settings → Community plugins → Browse, search for Vault Retrieval, then install and enable it.

Manual

Download main.js, manifest.json and styles.css from the latest release, drop them into <vault>/.obsidian/plugins/vault-retrieval/, then enable Settings → Community plugins → Vault Retrieval.

BRAT (beta)

Add the GitHub mirror johannes-kaindl/vault-rag in the BRAT plugin to track pre-release builds.

From source

git clone https://codeberg.org/jkaindl/vault-rag
cd vault-rag
npm install
npm run build      # → main.js
# copy main.js, manifest.json, styles.css into <vault>/.obsidian/plugins/vault-retrieval/

Usage

  1. Enable the plugin and open a note. The Related notes panel (ribbon: 🔍) populates automatically.
  2. Open Semantic search (ribbon: 🔭) to query the vault by meaning.
  3. Open Vault Chat (ribbon: 💬), point the chat endpoint at your local LLM in settings, and ask away. Edit the live-context list to control which notes ground the answer.

Configuration

Setting What it does Default
Embedding endpoint / model Re-embeds notes on save http://localhost:11434 · qwen3-embedding:8b
Chat endpoint / model LLM for RAG chat http://localhost:8080 · qwen3
Index directory Where the synced index lives _vaultrag
Similarity / top-k Retrieval thresholds tunable
Excluded folders Paths skipped by indexing Templates/, Archive/
Context budget Max characters fed as context (ceiling follows the model window) 12000
Suppress thinking Default for new chats; also a per-chat toggle in the panel off
Enter sends On: Enter sends, Shift+Enter newlines · Off: reversed on

Endpoint tip: enter the base URL without a trailing /v1 — the plugin appends it. Both forms are accepted.

How it works

Your indexing backend exports a portable note-level Matryoshka-256 int8 mini-index (~1.4 MB) into <vault>/_vaultrag/. The plugin loads it and runs brute-force cosine locally — no daemon, no VPN, no on-device LLM needed for retrieval. Only chat talks to an LLM, over an endpoint you configure. The portable index is what makes retrieval work identically across all your synced devices.

Architecture, module layout and contributor conventions live in AGENTS.md.

Image transcription (handwriting/screenshots → Markdown) lives in the sibling plugin image-to-markdown.

Contributing

Issues and pull requests are welcome on Codeberg. The project is test-driven — every change ships with tests (npm test), and larger features go through a brainstorm → spec → plan → TDD flow (docs/superpowers/). See AGENTS.md for conventions.

License

  • Code: GNU Affero General Public License v3.0 or later (LICENSE). A commercial dual-license is available on request if the AGPL's copyleft doesn't fit your use case.
  • Documentation & text: Creative Commons Attribution-ShareAlike 4.0 (LICENSE-DOCS).

Copyright © 2026 Johannes Kaindl.