Vault Brain
Fully-local, multimodal AI for Obsidian. Talk to your vault, snap images into notes, search across everything semantically, and write with AI — all powered by Gemma 4 on Ollama. No cloud. No API keys. Zero data leaves your machine.
Every AI feature runs against a local Ollama endpoint on 127.0.0.1. There is exactly one network call site in the entire codebase (src/core/ollama-provider.ts) and it only ever talks to your configured localhost endpoint. See Privacy & network audit.
Contents: Features · Requirements · Setup · Commands · Settings · Privacy · Development · License
Features
🎙️ Voice
- Voice memo → structured note — right-click an audio file (or process your most recent) → verbatim transcript + ≤5-bullet summary + extracted - [ ] tasks, written to your daily note (or a new note). Serbian & English. Your original audio is never modified.
- In-app recording — a mic ribbon button opens a floating control bar with Pause / Resume / Stop / Cancel and a live timer. Stop → it transcribes and opens the note at the new memo. (Transcribe-and-discard — no audio clutter.)
- Microphone selection — choose your input device in settings.
- Meeting mode — right-click audio → "Process as meeting (who said what)" → diarized transcript + decisions + action items.
- Auto-watch folder — drop audio into a configured folder and it's transcribed automatically.
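
The memo output described above (transcript, summary bullets, tasks) is shaped by the Output template setting, which uses the placeholders {{date}} {{title}} {{summary}} {{tasks}} {{transcript}}. As a rough illustration of how such a template could be filled, here is a minimal sketch. `MemoFields` and `renderTemplate` are hypothetical names chosen for this example, not the plugin's actual API, and the bullet and checkbox formatting is an assumption.

```typescript
// Hypothetical shape of a processed voice memo (not the plugin's real types).
interface MemoFields {
  date: string;
  title: string;
  summary: string[]; // one entry per summary bullet
  tasks: string[]; // one entry per extracted task
  transcript: string;
}

// Replace each {{placeholder}} with the matching memo field.
function renderTemplate(template: string, memo: MemoFields): string {
  const values = new Map<string, string>([
    ["date", memo.date],
    ["title", memo.title],
    ["summary", memo.summary.map((s) => `- ${s}`).join("\n")],
    ["tasks", memo.tasks.map((t) => `- [ ] ${t}`).join("\n")],
    ["transcript", memo.transcript],
  ]);
  // A replacer function is used so "$" in field values is inserted literally.
  // Unknown placeholders are left as-is so typos in the template stay visible.
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = values.get(key);
    return value !== undefined ? value : match;
  });
}
```

Leaving unknown placeholders untouched is a design choice for the sketch: a misspelled `{{sumary}}` shows up in the note instead of silently disappearing.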
🖼️ Vision
- Image → Markdown — cursor on an embedded image → "Extract text from image below cursor" → clean Markdown (tables included) inserted below it.
💬 Chat, search & edit
A right-sidebar panel with three modes:
- This note + links — chat over the active note and its 1-hop links. Add more with the + Context picker (any note or folder).
- Whole vault (RAG) — semantic search across your whole vault with [[note]] citations (local nomic-embed-text, file-based index, auto-built & incremental).
- Edit this note — give an instruction ("replace X with Y", "make headings consistent"); preview a red/green diff, then Apply (Cmd-Z undoable).
Answers render as Markdown with clickable [[links]], a copy button, and modern chat bubbles.
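
To make the whole-vault mode more concrete: retrieval of this kind usually embeds the question, scores every indexed chunk against it, and keeps the top-K. The sketch below assumes cosine similarity as the metric and an in-memory array as the index; the README does not specify either, and `IndexedChunk`, `cosineSimilarity`, and `topK` are hypothetical names.

```typescript
// Hypothetical index entry: one chunk of a note plus its embedding vector.
interface IndexedChunk {
  notePath: string;
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: IndexedChunk[], k: number): IndexedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

With the default setting, `k` would be 6. The same scoring could in principle serve a related-notes view by using the active note's embedding as the query.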
🔗 Discovery
- Related notes sidebar — a live panel showing the most semantically-related notes to whatever you're viewing (reuses the vault index).
✍️ Writing
- Selection actions (right-click → Vault Brain): Summarize · Improve · Format · Translate SR↔EN · Fix grammar.
- Custom prompts — define your own Name :: instruction actions in settings; they appear in the same submenu.
- Continue writing — "Continue writing at cursor" streams an AI continuation in place.
- Suggest tags — AI suggests tags and writes them to the note's frontmatter.
- Link mentions — wraps plain-text mentions of your other note titles in [[ ]] (deterministic, no AI).
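
Because link mentions is deterministic, it can be illustrated with plain string processing. The following is a minimal sketch of one possible approach, not the plugin's actual implementation: it matches titles case-sensitively, prefers the longest title, and leaves existing [[ ]] links alone. `linkMentions` and `escapeRegExp` are hypothetical helpers, and the sketch does not skip code blocks or frontmatter.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap plain-text mentions of note titles in [[ ]], skipping existing links.
function linkMentions(text: string, titles: string[]): string {
  // Longest titles first so "Project Alpha" wins over "Alpha".
  const sorted = titles
    .filter((t) => t.length > 0)
    .sort((a, b) => b.length - a.length);
  if (sorted.length === 0) return text;

  // Require a non-letter, non-digit boundary on both sides of the title.
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}])(?:${sorted.map(escapeRegExp).join("|")})(?![\\p{L}\\p{N}])`,
    "gu"
  );

  // Splitting on a capture group yields plain text at even indexes and
  // existing [[...]] links at odd indexes; only the plain text is rewritten.
  return text
    .split(/(\[\[[^\]]*\]\])/)
    .map((segment, i) =>
      i % 2 === 1 ? segment : segment.replace(pattern, (m) => `[[${m}]]`)
    )
    .join("");
}
```

For example, with the titles `Alpha` and `Project Alpha`, the text `See Project Alpha and [[Project Alpha]] plus Alpha.` becomes `See [[Project Alpha]] and [[Project Alpha]] plus [[Alpha]].`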
🔌 Status
- Connection indicator (🟢/🟡/🔴) with one-click fixes, and a 🧠 activity indicator showing live status of every running operation + a recent-activity log.
Requirements
- Obsidian 1.5.0+ on desktop (macOS / Windows / Linux). Desktop-only — local inference isn't available on mobile.
- Ollama 0.30.5+ installed and running.
- gemma4:12b (multimodal — audio + vision + text) and, for vault search, nomic-embed-text.
- ~16 GB RAM recommended (the 12B model is ~7.6 GB on disk). Apple Silicon or comparable.
Setup
- Install Ollama (ollama.com) and make sure it's running (ollama serve, or open the app).
- Pull the models:

      ollama pull gemma4:12b
      ollama pull nomic-embed-text   # for whole-vault search

- Install the plugin (until it's in the community directory): copy main.js, manifest.json, and styles.css into <your-vault>/.obsidian/plugins/vault-brain/, then enable Vault Brain under Settings → Community plugins.
- Open Settings → Vault Brain — defaults work out of the box; the status bar should show 🟢 Vault Brain.
About the model: gemma4:12b is the recommended build — it's fully multimodal (audio + vision + text), so voice, image OCR, and chat all run on one model. It needs Ollama 0.30.5+ (older versions return a 412 when pulling it). Want something lighter? The 8B gemma4:latest is also fully multimodal and faster (just less capable) — pick it in Settings → Vault Brain → Model. Avoid gemma4:12b-mlx: that MLX build is text-only (no voice or vision).
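
Since the recommended model depends on a minimum Ollama version, a version comparison is a natural pre-flight check. The sketch below compares dotted version strings numerically, so that 0.30.10 counts as newer than 0.30.5. `parseVersion` and `isVersionAtLeast` are hypothetical helpers; how the plugin actually obtains or checks the version is not stated here, and the string could come from something like `ollama --version`.

```typescript
// Turn "0.30.5" (optionally "v0.30.5" or "0.30.5-rc1") into [0, 30, 5].
function parseVersion(version: string): number[] {
  const core = version.trim().replace(/^v/, "").split(/[-+]/)[0];
  // Non-numeric parts fall back to 0 rather than NaN.
  return core.split(".").map((part) => parseInt(part, 10) || 0);
}

// True when `actual` is the same as or newer than `required`.
function isVersionAtLeast(actual: string, required: string): boolean {
  const a = parseVersion(actual);
  const r = parseVersion(required);
  const len = Math.max(a.length, r.length);
  for (let i = 0; i < len; i++) {
    const x = i < a.length ? a[i] : 0; // missing parts count as 0
    const y = i < r.length ? r[i] : 0;
    if (x !== y) return x > y;
  }
  return true; // all parts equal
}
```

Comparing part by part as numbers avoids the classic string-comparison mistake where "0.9.9" sorts after "0.30.5".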
Commands
Voice memo → note · Start/Stop voice recording · Process as meeting · Extract text from image · Open Q&A panel · Open related notes · Rebuild vault index · Selection: Summarize/Improve/Format/Translate/Fix grammar (+ your custom prompts) · Continue writing at cursor · Suggest tags for this note · Link mentions of existing notes · Test connection.
Settings
| Setting | Default | Notes |
|---|---|---|
| Ollama host / port | http://127.0.0.1 / 11434 | Localhost only; non-local triggers a privacy warning |
| Model | gemma4:12b | multimodal (audio + vision + text); pick any installed model from the dropdown |
| Embedding model | nomic-embed-text:latest | for whole-vault search |
| Vault search results (top-K) | 6 | chunks retrieved per question |
| Daily-note mode | Append to today's note | or "new note per memo" |
| Output language | Auto | or force EN / SR |
| Context token cap | 8000 | hard cap for chat context |
| Microphone | System default | input device for recording |
| Auto-watch folder | (off) | auto-transcribe audio dropped here |
| Keep model warm | off | periodic ping to avoid cold starts |
| Output template | (editable) | {{date}} {{title}} {{summary}} {{tasks}} {{transcript}} |
| Custom prompts | (empty) | one per line: Name :: instruction |
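
The Custom prompts setting takes one `Name :: instruction` pair per line. A minimal sketch of how such a value could be parsed is shown below; `CustomPrompt` and `parseCustomPrompts` are hypothetical names, and the handling of blank or malformed lines (skipping them) is an assumption rather than documented behavior.

```typescript
// Hypothetical parsed form of one custom prompt line.
interface CustomPrompt {
  label: string; // text shown in the submenu
  instruction: string; // text sent to the model
}

// Parse a multi-line setting value, one "Name :: instruction" per line.
function parseCustomPrompts(setting: string): CustomPrompt[] {
  const prompts: CustomPrompt[] = [];
  for (const line of setting.split(/\r?\n/)) {
    const sep = line.indexOf("::");
    if (sep === -1) continue; // blank line or no separator: skip
    const label = line.slice(0, sep).trim();
    // Only the first "::" separates; later ones stay in the instruction.
    const instruction = line.slice(sep + 2).trim();
    if (label && instruction) prompts.push({ label, instruction });
  }
  return prompts;
}
```

For instance, the two lines `Shorten :: Make this half as long` and `ELI5 :: Explain like I'm five` would yield two entries labeled Shorten and ELI5.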
Privacy & network audit
Local-only by design. Verify it yourself:
- Read the code. One network call site — src/core/ollama-provider.ts:

      grep -rln fetch src/   # → only src/core/ollama-provider.ts

- Watch the traffic. With Little Snitch / lsof -i / a proxy, exercise every feature — the only connections are to 127.0.0.1:11434.
- Pull the plug. Turn Wi-Fi off and record/transcribe a memo — it still works end-to-end.
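
The settings note that a non-local host triggers a privacy warning. One way such a check could look is sketched below, under the assumption that "local" means a loopback address (localhost, 127.0.0.0/8, or ::1). `isLoopbackEndpoint` is a hypothetical function, not taken from the plugin's source.

```typescript
// Return true only when the endpoint's host is a loopback address.
function isLoopbackEndpoint(endpoint: string): boolean {
  let hostname: string;
  try {
    // The URL parser normalizes the host, e.g. "127.1" becomes "127.0.0.1".
    hostname = new URL(endpoint).hostname.toLowerCase();
  } catch {
    return false; // unparseable endpoint: treat as non-local
  }
  // IPv6 hostnames keep their brackets in URL.hostname.
  if (hostname === "localhost" || hostname === "[::1]") return true;
  // The whole 127.0.0.0/8 block is loopback; the anchors reject
  // look-alikes such as "127.0.0.1.example.com".
  return /^127(\.\d{1,3}){3}$/.test(hostname);
}
```

Parsing with `URL` before matching matters here: a plain substring test for "127.0.0.1" would wrongly accept hosts that merely contain that text.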
Development
npm install
npm run dev # esbuild watch → main.js
npm test # node:test unit + live Ollama contract tests (skip if offline)
npm run build # type-check + production bundle
Architecture: a thin, obsidian-free src/core/ (24 unit-tested modules — transport, context building, prompts, parsing, similarity, diff) behind an LlmProvider interface, with 9 feature orchestrators in src/features/. 87 tests. Design specs and build plans live in docs/.
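
To illustrate why a provider interface keeps the core testable without Obsidian or a running model, here is a minimal sketch. Only the name LlmProvider comes from the description above; the method names, the `ChatMessage` type, the `FakeProvider` class, and the `summarize` function are all assumptions made for this example and may not match the real code.

```typescript
// Hypothetical message shape for a chat-style request.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumed interface: the real LlmProvider may expose different methods.
interface LlmProvider {
  chat(messages: ChatMessage[]): Promise<string>;
  embed(text: string): Promise<number[]>;
}

// In-memory stand-in used by unit tests; makes no network calls.
class FakeProvider implements LlmProvider {
  async chat(messages: ChatMessage[]): Promise<string> {
    const last = messages[messages.length - 1];
    return `echo: ${last ? last.content : ""}`;
  }
  async embed(text: string): Promise<number[]> {
    return [text.length, 1]; // deterministic toy embedding
  }
}

// A feature depends on the interface only, never on a concrete transport.
async function summarize(provider: LlmProvider, noteText: string): Promise<string> {
  return provider.chat([
    { role: "system", content: "Summarize the note in at most five bullets." },
    { role: "user", content: noteText },
  ]);
}
```

With this shape, a feature such as `summarize` can be exercised in a unit test by passing `new FakeProvider()`, while production code passes the Ollama-backed implementation.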
License
MIT — see LICENSE.