README file from
Githubhermes-TTS
Convert any Obsidian Markdown note into lightweight speech audio, then prepend a timestamped metadata callout with an embedded audio link.
What changed
This plugin now uses an Aloud-style API link-up pattern:
- One Model Provider selector in settings.
- Provider-specific fields shown only for the selected provider.
- Voice selection is done via dropdowns for all major providers.
- New Voice prompt section for optional speaking-style instructions.
- Output is always normalized to MP3.
- Character limit is no longer user-configurable (notes are processed without a fixed UI cap).
- File name prefix and speech speed settings were removed to simplify configuration.
Supported providers
- OpenAI
- Google Gemini
- Google Cloud Text-to-Speech
- Azure Speech
- ElevenLabs
- AWS Polly
- OpenAI-compatible endpoints (custom base URL)
Policy disclosures
- Network access is required. The plugin sends note text to the selected external TTS provider.
- External accounts and API keys are required for provider usage (OpenAI, Google, Azure, ElevenLabs, AWS, or compatible API).
- The plugin does not include telemetry or ads.
Mobile compatibility
- Hermes TTS is configured to load on mobile (
isDesktopOnly: false). - The bundle is built for browser-compatible runtimes to support Obsidian mobile.
- The plugin avoids regex lookbehind and Node-only
Bufferusage in runtime paths for broader mobile compatibility. - Provider behavior may still vary by service/API/network conditions on mobile devices.
Voice dropdown behavior
- OpenAI/Gemini: curated built-in voice dropdowns.
- Google Cloud/Azure/ElevenLabs/AWS Polly: dropdowns with refresh buttons to fetch latest provider voices.
- OpenAI-compatible: OpenAI-style voice dropdown.
- Audio from all providers is normalized and saved as MP3.
Voice prompt behavior
- The Voice prompt setting is global and optional.
- OpenAI: sent as
instructionsonly when usinggpt-4o-mini-ttsmodels (per API behavior). - Gemini: prepended as style notes before the transcript in the prompt.
- Other providers currently ignore this field.
Gemini reliability fallback
- Gemini uses the official
@google/genaiSDK flow (matching Aloud plugin setup). - On Gemini
400"tried to generate text" errors, the plugin retries in segmented transcript mode with rolling previous-context continuity. - If Gemini fails with transient errors and Google Cloud TTS is configured, generation automatically falls back to Google Cloud.
- Metadata uses the provider that actually generated the audio.
Commands
Generate Hermes-TTS audio (current note)
Provider documentation
The same docs are also available from buttons in the plugin settings tab.
Metadata block format
The plugin prepends a callout block near the top of the note (after frontmatter if present). Metadata lines can be toggled in settings. The title is a clean timestamp. For example:
> [!tts]+ 2026-02-17 15:42:10.321
> generated_at: 2026-02-17T14:42:10.321Z
> source_note: [[02 Projects/My Note]]
> provider: openai
> provider_name: OpenAI
> model: gpt-4o-mini-tts
> voice: shimmer
> format: mp3
> mime_type: audio/mpeg
> source_characters_sent: 2412
> provider_docs: https://platform.openai.com/docs/guides/text-to-speech
> voice_docs: https://platform.openai.com/docs/guides/text-to-speech#voice-options
> audio_file: ![[Attachments/TTS Audio/my-note-20260217-154210.mp3]]
Build
npm ci
npm run build
Release assets expected by Obsidian:
manifest.jsonmain.jsstyles.css