Vault Curate

by Jacob Mei
Description

Hybrid semantic search for your vault notes: BM25 + WebGPU embeddings + fuzzy retrieval. Multilingual, with particularly strong Chinese/CJK support, and optional LLM curation.

Vault Curate


High-quality Chinese-friendly semantic search for Obsidian, with optional AI curation.

Traditional Chinese (繁體中文) README available.

Previously published as vault-search (plugin id and repository renamed). A different plugin authored by a separate developer now occupies the vault-search id — see the Upgrading from vault-search section below before installing if you used earlier versions.

Why Vault Curate?

Obsidian's built-in search is literal: search for "prayer" when your note says "devotional" and you'll miss it. Most semantic-search plugins use generic multilingual models, which tend to under-perform on Chinese content.

Andrej Karpathy shared his vision of LLM-maintained knowledge bases — letting AI "compile" your notes into structured wikis. Compelling, but it asks you to hand over full editorial control. Vault Curate takes a different stance: AI should help you see, not think for you.

Three differentiators

| Feature | How it works |
| --- | --- |
| Chinese semantic quality beats generic multilingual models | Ships with bge-small-zh-v1.5 (Chinese-only training). In head-to-head testing on Chinese names, religious terms, and colloquial phrases, generic MiniLM-style multilingual models miss most of the matches; Vault Curate consistently recalls the right notes. |
| Zero-config to run, WebGPU accelerated | ~110 MB model downloads once. WebGPU indexing: 342 notes / 5,004 chunks in about 1m23s (WASM fallback still works, around 27 minutes). |
| AI curation is opt-in, never silent | Description generation, MOC clustering, and frontmatter rewrites all require explicit opt-in. Nothing runs LLMs in the background and nothing rewrites your notes without you asking. |

Quick Start

  1. In Obsidian, go to Settings → Community plugins and search for Vault Curate
  2. After enabling, the Welcome to Vault Curate modal opens. Under Embedding provider, pick Built-in (on-device, WebGPU) and click Index my vault now
  3. After the ~110 MB model download and WebGPU indexing finish, click the sidebar compass icon and start searching

⚠️ Vault Curate is currently going through Obsidian's community review. If you can't find it in Community plugins yet, use the Manual install path below.


Installation

Requirements

  • Obsidian desktop (v1.0.0+)
  • Advanced paths only: a local Ollama instance or any OpenAI-compatible server
Install from Community plugins

  1. Open Settings → Community plugins in Obsidian
  2. Make sure Restricted mode is off, click Community plugins → Browse
  3. Search for Vault Curate → Install → Enable
  4. The Welcome to Vault Curate modal opens automatically on first launch

Manual install

  1. Download main.js, manifest.json, styles.css, worker.js, ort-wasm-simd-threaded.wasm from Releases
  2. Copy them into .obsidian/plugins/vault-curate/ in your vault
  3. Enable in Settings → Community plugins

Tip: If your vault is Git-tracked, add .obsidian/plugins/*/data.json and .obsidian/plugins/*/index.sqlite to .gitignore.


Upgrading from vault-search

If you used the earlier vault-search plugin, follow this path:

  1. Open your vault folder and locate .obsidian/plugins/vault-search/
  2. Delete that folder directly from the filesystem. ⚠️ Do not use Community plugins → Uninstall — a different plugin now occupies the vault-search id and may insert itself when you uninstall.
  3. Install Vault Curate via the Installation steps above
  4. Enable. The Welcome to Vault Curate modal will guide you through rebuilding the index.

Embeddings are not reused across versions — a from-scratch rebuild takes ~1–2 minutes on WebGPU for a few hundred notes. Frontmatter descriptions and tags already in your notes are preserved (they live in the .md files, not in the index).

If you had keybindings set on vault-search:* commands, redo them under vault-curate:* in Settings → Hotkeys (9 commands total — see Commands).


Features

Search (Hybrid Fusion)

Three signals combined via Reciprocal Rank Fusion (k=60):

| Path | Catches |
| --- | --- |
| BM25 (pure TS, CJK trigram) | Exact phrases, keyword combinations |
| Semantic embedding | Different wording, same meaning |
| Fuzzy title (Jaro–Winkler) | Typos, spelling variants |
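The fusion step described above can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion with k = 60, not the plugin's actual code; the function and note names are invented:

```typescript
// Reciprocal Rank Fusion: each ranker contributes 1 / (k + rank) per note.
// k = 60 dampens the advantage of any single ranker's top hit.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((noteId, i) => {
      // Ranks are 1-based: the top hit of each ranker scores 1 / (k + 1).
      scores.set(noteId, (scores.get(noteId) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([noteId]) => noteId);
}

// A note ranked well by several signals beats one ranked first by only one:
const bm25     = ["exact.md", "shared.md"];
const semantic = ["meaning.md", "shared.md"];
const fuzzy    = ["typo.md", "shared.md"];
const fused = rrfFuse([bm25, semantic, fuzzy]); // "shared.md" first
```

Because scores are summed across rankers, a note that appears in all three lists can outrank a note that tops just one of them, which is the point of fusing lexical, semantic, and fuzzy signals.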

Two entry points:

  • Cmd/Ctrl+P → Vault Curate: Semantic search (modal) for quick jump
  • Sidebar → Search tab for persistent results

Search results + Canvas drag

Discover

Discover works on notes, not query strings — it surfaces semantically related Cold notes you haven't touched recently:

  • Current note: when you open a file, related notes appear automatically, with Cold notes visually highlighted ("you haven't read this one")
  • Global: Cold notes most related to your entire Hot pool — intentional blind-spot mining
  • Results can be exported to a topic-grouped Map of Content via Generate MOC (falls back to a flat MOC if results are too few or too similar)

Discover sidebar — current note

Hot / Cold auto-tiering

Notes are auto-classified by internal links + recency:

  • Hot: linked to / recently touched
  • Cold: orphan / untouched for a while

The "recent" cutoff is tunable in Settings → Advanced → Hot window (days). Cold notes don't get buried in Discover — they're exactly the content you should be re-seeing.

Find similar notes

Right-click any .md file → VC: Find similar notes. Results show up in the sidebar; you can drag them straight to Canvas.

AI curation (off by default)

Turn it on under Settings → AI Curation → Enable AI curation to unlock three actions:

  • Generate a description + tags into a single note's frontmatter
  • Run description generation across the sidebar's search / discover results
  • Generate a topic-grouped MOC via HDBSCAN clustering + LLM naming

The LLM provider is configured separately under Settings → AI Curation (local Ollama or any OpenAI-compatible endpoint).


Commands

From Command Palette (Cmd/Ctrl+P), type Vault Curate: to see them all.

| Command | What it does | Requires |
| --- | --- | --- |
| Semantic search (modal) | Modal-style semantic search with quick jump | always available |
| Open search panel | Open the sidebar panel | always available |
| Find similar notes | Find notes semantically related to the active .md | always available |
| Rebuild index | Wipe the existing index and re-index everything | always available |
| Update index | Incremental update (re-index files with newer mtime) | always available |
| Discover related Cold notes | Global discover: Cold notes most related to your Hot pool | always available |
| Generate description for active note | LLM-write description + tags to the active file's frontmatter | AI curation on |
| Generate descriptions for current results | Batch description for the current sidebar results | AI curation on |
| Generate MOC (topic-grouped) | HDBSCAN cluster + LLM-name each group | AI curation on |
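The difference between Rebuild index and Update index comes down to mtime bookkeeping: update only re-embeds files changed since they were last indexed. A sketch of that selection logic, with invented names (the plugin's actual bookkeeping lives in its SQLite store and may differ):

```typescript
// "Update index" sketch: compare each file's mtime against the mtime recorded
// at index time, and re-embed only the stale ones.
interface FileStat { path: string; mtimeMs: number; }

function filesNeedingReindex(
  files: FileStat[],
  indexedAt: Map<string, number>, // path -> mtime recorded when last indexed
): { stale: string[]; removed: string[] } {
  const stale = files
    .filter(f => (indexedAt.get(f.path) ?? -Infinity) < f.mtimeMs)
    .map(f => f.path);
  // Files deleted from the vault but still present in the index.
  const present = new Set(files.map(f => f.path));
  const removed = Array.from(indexedAt.keys()).filter(p => !present.has(p));
  return { stale, removed };
}
```

New files are picked up automatically because they have no recorded mtime, so they always compare as stale.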

Right-click menus expose two of these directly on a .md:

  • VC: Find similar notes
  • VC: Generate description (AI curation on)

Settings

The settings panel is split into three sections:

Quick setup

| Setting | Default | Note |
| --- | --- | --- |
| Embedding provider | Built-in (on-device, WebGPU) | One of three: Built-in / Ollama / OpenAI-compatible |
| Excluded folders | (empty) | Folder globs that won't be indexed |

Changing the embedding provider or model triggers a confirmation modal — the index is wiped and rebuilt.

AI Curation

| Setting | Default | Note |
| --- | --- | --- |
| Enable AI curation | off | When off, description / MOC commands stay hidden |
| LLM provider | Ollama | The endpoint used for description + MOC naming |
| LLM model | qwen3:1.7b | Recommended default; any Ollama model works |

Advanced

Collapsible <details> block: top results / min score / Hot window (days) / default search scope (Hot / Cold / All) / chunk size + overlap / synonym list / auto-index toggle / rebuild + update buttons / index stats.
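Chunk size and overlap control how each note is split before embedding: overlapping windows keep meaning that straddles a boundary inside at least one chunk. A minimal sliding-window splitter over characters, purely illustrative (the plugin's actual chunker and its defaults may differ):

```typescript
// Split text into overlapping windows. `overlap` characters are repeated
// between consecutive chunks so boundary-spanning phrases stay searchable.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Larger chunks give the embedding model more context per vector; more overlap costs index size and indexing time but reduces the chance a phrase is cut in half.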


Privacy

Three embedding modes, picked from Quick setup → Embedding provider:

| Mode | Where embeddings run | Where note text goes |
| --- | --- | --- |
| Built-in | On-device WebGPU / WASM | Stays on your device |
| Ollama (local daemon) | Local Ollama daemon on 127.0.0.1 | Stays on your device |
| OpenAI-compatible API | Any endpoint you point it at: local (LM Studio, llama.cpp, …) or remote (OpenAI etc.) | Depends on the endpoint you choose; may leave your device |

The same applies to AI curation (description / MOC naming), which uses an independently-configured LLM endpoint.

No telemetry. No usage tracking. Nothing is sent to any server unless you configure a remote endpoint.

🔒 About API key storage

Vault Curate, like every Obsidian plugin, stores its settings (including any OpenAI API key) as plain text in <vault>/.obsidian/plugins/vault-curate/data.json. This is Obsidian's plugin storage mechanism, not a vault-curate-specific design choice.

If your vault syncs to a cloud service (iCloud / Dropbox / Google Drive) or pushes to a public Git repository, you should:

  1. Add .obsidian/plugins/vault-curate/data.json to your sync exclusion list or .gitignore
  2. Or use the Built-in model / Ollama path — neither requires an API key

Tech Stack

  • TypeScript + esbuild (two-stage bundle for worker + main)
  • sql.js (SQLite via WASM) for the storage layer — replaces v0.x's data.json / index.json
  • Pure-TS BM25+ (src/storage/bm25.ts) for CJK-aware full-text search (no native FTS5 dependency)
  • @huggingface/transformers + bge-small-zh-v1.5 q8 (~110 MB, WebGPU/WASM) for on-device embeddings
  • hdbscan-ts for topic clustering (MOC)
  • Reciprocal Rank Fusion (k=60) combining BM25 + semantic + fuzzy
  • Optional: Ollama / any OpenAI-compatible endpoint for higher-end embedding or LLM models
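The "CJK trigram" approach mentioned for BM25 can be illustrated as follows: Latin words stay whole, while runs of CJK characters are split into overlapping character trigrams, so BM25 can match Chinese text without a dictionary-based word segmenter. This is a sketch under that assumption, not the code in src/storage/bm25.ts:

```typescript
// CJK-aware tokenizer sketch: Latin tokens split on non-letter boundaries;
// CJK runs become overlapping 3-character grams (short runs kept whole).
const CJK = /[\u4e00-\u9fff\u3400-\u4dbf]/;

function tokenize(text: string): string[] {
  const tokens: string[] = [];
  for (const run of text.toLowerCase().split(/[^\p{L}\p{N}]+/u)) {
    if (!run) continue;
    if (!CJK.test(run) || run.length <= 3) {
      tokens.push(run); // Latin/numeric word, or a CJK run too short to slide over
      continue;
    }
    for (let i = 0; i + 3 <= run.length; i++) tokens.push(run.slice(i, i + 3));
  }
  return tokens;
}
```

Trigrams trade index size for recall: any 3-character substring of a query can match a note, which is why exact Chinese phrases are caught even though neither side is word-segmented.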

Development

```sh
git clone https://github.com/notoriouslab/vault-curate.git
cd vault-curate
npm install
npm run dev    # watch mode
npm run build  # production build
npm test       # vitest unit tests (59 tests)
```

License

MIT