PDF to Markdown Local
Desktop-only Obsidian plugin that converts text-based PDFs to traceable Markdown locally from the file context menu.
Features
- Adds `Convert to Markdown` to the file menu for `.pdf` files.
- Adds `Convert PDFs in folder tree to Markdown` to folders with PDF descendants.
- Adds optional merged batch conversion for folders with multiple PDFs.
- Shows a mandatory preview popup before any folder batch conversion creates files.
- Adds an optional per-conversion checkbox to extract and embed detected PDF images as local PNG files.
- Adds `Clean generated PDF Markdown in folder tree` to remove legacy page markers and source-index boilerplate from generated notes.
- Ships with a built-in clean template when no user template is configured.
- Uses Obsidian locale for Spanish or English menus, modals, notices, and built-in output.
- Reads PDFs from the local vault using Obsidian APIs.
- Uses bundled `pdf.js` with an inline local worker fallback inside `main.js`; no CDN or remote worker is used.
- Creates a clean `.md` file beside the source PDF.
- Never overwrites existing Markdown files.
- Preserves the original PDF.
- Keeps compact notes clean by default, with optional page markers and full technical metadata when needed.
- Applies conservative Markdown cleanup for paragraph joins, hyphenated line breaks, simple two-column pages, and repeated headers/footers.
- Applies local Markdown templates and infers related notes from resolved Obsidian links.
- Cleans empty template scaffolding, replaces placeholders in body and frontmatter, and keeps a single document H1.
- Creates local previous, next, and conservative mention links between generated Markdown notes without adding visible plugin marker comments.
Templates and Relationships
If no user template is selected or resolved, the plugin uses a built-in template. In Spanish it creates tipo: Nota convertida de PDF a MD, nombre, tags from H1/H2 headings, one H1, and the converted content. In English it creates type: PDF to Markdown note, name, tags, one H1, and the converted content.
Configure any vault template folder and optional source-folder rules in the plugin settings. The paths remain editable, offer vault-aware suggestions, and show a warning when the template folder does not exist. Selecting an autocomplete suggestion saves the complete vault path. Every Markdown note below the configured folder is available as a template in the conversion modal. Templates are regular Markdown notes and support:
{{content}}
{{pdf_link}}
{{related_notes}}
{{heading_tags}}
{{title}}
{{pdf_title}}
{{document_title}}
{{date}}
{{time}}
{{source_path}}
{{title}} is the final note title. It normally comes from the cleaned PDF filename, and falls back to the first document H1 only when the filename is clearly generic, such as a hash, date, or random alphanumeric code. {{pdf_title}} always uses the cleaned PDF filename. {{document_title}} always uses the first H1 detected in the converted content.
{{heading_tags}} is a YAML list generated from H1/H2 headings. Tags are normalized for Obsidian by lowercasing, removing accents, replacing punctuation with hyphens, deduplicating, and limiting the list to 12 values.
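The normalization steps above can be sketched roughly as follows. This is an illustration only: `headingTags` is a hypothetical name, and the plugin's actual handling of edge cases (non-Latin scripts, numeric-only headings) is not documented here.

```typescript
// Hypothetical helper, not the plugin's actual function.
// Assumption: anything outside a-z and 0-9 counts as punctuation.
function headingTags(headings: string[], limit = 12): string[] {
  const seen = new Set<string>();
  for (const heading of headings) {
    const tag = heading
      .toLowerCase()
      .normalize("NFD") // split accented letters into base letter + combining mark
      .replace(/[\u0300-\u036f]/g, "") // drop the combining marks (accents)
      .replace(/[^a-z0-9]+/g, "-") // punctuation and spaces become hyphens
      .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
    if (tag) seen.add(tag); // the Set deduplicates and keeps first-seen order
  }
  return Array.from(seen).slice(0, limit);
}
```

For example, `headingTags(["Introducción", "Conceptos básicos"])` would yield `introduccion` and `conceptos-basicos`.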
Template selection is local and deterministic:
- A unique `pdf_template` property from a note that already links to the PDF.
- The most specific configured source-folder rule.
- Manual selection in the conversion modal.
- The built-in template fallback.
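A minimal sketch of this resolution, assuming the list above is a priority order and that "unique" means exactly one distinct `pdf_template` value among the linking notes. `resolveTemplate`, `TemplateInputs`, and the `BUILT_IN` marker are hypothetical names, not the plugin's API.

```typescript
// Hypothetical types and names for illustration only.
interface TemplateInputs {
  linkedNoteTemplates: string[]; // pdf_template values from notes that link to the PDF
  folderRules: { folder: string; template: string }[];
  pdfPath: string;
  manualChoice?: string; // template picked in the conversion modal, if any
}

const BUILT_IN = "(built-in)";

function resolveTemplate(input: TemplateInputs): string {
  // 1. Exactly one distinct pdf_template value wins.
  const unique = Array.from(new Set(input.linkedNoteTemplates));
  if (unique.length === 1) return unique[0];

  // 2. Otherwise the most specific (assumed: longest) matching folder rule.
  const matching = input.folderRules
    .filter((rule) => input.pdfPath.startsWith(rule.folder.replace(/\/+$/, "") + "/"))
    .sort((a, b) => b.folder.length - a.folder.length);
  if (matching.length > 0) return matching[0].template;

  // 3. Then the manual selection, 4. then the built-in fallback.
  return input.manualChoice ?? BUILT_IN;
}
```

Because every input is local vault data, the same vault state always resolves to the same template.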
If a template does not contain {{content}}, the plugin inserts converted content under Apuntes, Contenido, Notas, Content, or Notes when available and removes empty template scaffolding. The modal reports template compatibility before conversion.
When a template uses {{pdf_link}} or {{related_notes}} in its own properties, those properties are filled without adding duplicate canonical properties. When enabled in the modal, the plugin also adds the generated Markdown link to the related property of notes that already link to the PDF. Generated-note relationships are stored in frontmatter; the plugin does not add visible HTML relation markers to compact notes.
Folder Batch Conversion
Right-click a folder containing PDFs anywhere below it and choose Convert PDFs in folder tree to Markdown. The plugin first opens an options modal, then a mandatory preview popup. Conversion only starts from the preview popup.
- PDFs inside subfolders are processed recursively at any depth.
- Each Markdown file is created beside its source PDF.
- Existing base Markdown files such as `Manual.md` are skipped.
- Scanned PDFs and failed PDFs do not stop the remaining batch.
- A final Notice reports converted, skipped, no-text, and failed totals.
- The preview shows a short summary, collapsible PDF name lists, skipped existing Markdown names, template sources, editable merged-note names, and active options.
- The preview does not read PDF contents, so PDFs without extractable text are still detected only during conversion.
- The optional generated-note linking pass creates previous, next, and conservative mention wikilinks.
- `Refresh PDF Markdown links in folder tree` recalculates generated-note links without reconverting PDFs.
- `Clean generated PDF Markdown in folder tree` cleans existing generated notes associated with PDFs without reconverting them.
- Batch conversion does not open generated notes.
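As a rough illustration of the previous/next part of the linking pass, the sketch below orders note names and pairs each one with its neighbours. The ordering rule (numeric-aware name sort) and the names `sequenceLinks` and `NoteLinks` are assumptions; the plugin's actual ordering and its mention-link logic are not described here.

```typescript
// Hypothetical shape for illustration; the plugin stores relationships in frontmatter.
interface NoteLinks {
  note: string;
  previous?: string;
  next?: string;
}

function sequenceLinks(noteNames: string[]): NoteLinks[] {
  // Assumption: notes are ordered by name, with "2" sorting before "10".
  const ordered = noteNames
    .slice()
    .sort((a, b) => a.localeCompare(b, undefined, { numeric: true, sensitivity: "base" }));
  return ordered.map((note, i) => ({
    note,
    previous: i > 0 ? `[[${ordered[i - 1]}]]` : undefined,
    next: i < ordered.length - 1 ? `[[${ordered[i + 1]}]]` : undefined,
  }));
}
```

The first note has no `previous` link and the last has no `next` link.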
When two or more PDFs are detected, the modal can merge conversions instead of creating one note per PDF:
- `No merge` keeps the existing behavior.
- `Merge all PDFs into one note` creates one merged note in the selected folder.
- `Merge by PDF folder` creates one merged note beside each direct PDF group.
- Merge mode does not create individual Markdown notes.
- The built-in merge template stores the clean PDF titles as tags, removing leading numbering such as `1 -`, `01.`, or `(3)`.
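The leading-numbering cleanup could look something like the sketch below. It covers only the three documented forms; `stripLeadingNumbering` is a hypothetical name and the plugin's real pattern may differ.

```typescript
// Hypothetical helper: removes a leading "1 -", "01." or "(3)" style prefix.
function stripLeadingNumbering(title: string): string {
  return title
    .replace(/^\s*(?:\(\d+\)|\d+\s*[-.])\s*/, "") // "(3)" or digits followed by "-" or "."
    .trim();
}
```

A title such as `2024 Report` is left untouched because the digits are not followed by `-` or `.`.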
Optional Images
Each individual and folder conversion modal includes Extract and embed images. It is off by default. When enabled, the plugin detects pages with raster image operations through local pdf.js, renders those pages as PNG files, stores them beside the generated note under assets/{note-name}/, and embeds them with Obsidian wikilinks such as ![[assets/Manual/Manual-p2-1.png]].
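To make the asset layout concrete, here is a small sketch that builds the path and embed for one rendered page image. The `{note-name}-p{page}-{index}.png` file name pattern is inferred from the single example above and is an assumption, as is the `imageEmbed` name.

```typescript
// Hypothetical helper; the file name pattern is inferred from one documented example.
function imageEmbed(noteName: string, page: number, index: number): { path: string; embed: string } {
  // Path is relative to the folder that holds the generated note.
  const path = `assets/${noteName}/${noteName}-p${page}-${index}.png`;
  return { path, embed: `![[${path}]]` };
}
```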
This is local and non-destructive. It does not perform OCR, extract text from images, or reconstruct vector-only diagrams as separate image assets.
Output
For Folder/Manual.pdf, the plugin creates Folder/Manual.md. If that file already exists, it creates Folder/Manual 1.md, Folder/Manual 2.md, and so on.
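The non-overwriting naming rule can be sketched as below. `nextFreeMarkdownPath` is a hypothetical name, and the `exists` callback stands in for whatever vault lookup the plugin actually uses.

```typescript
// Hypothetical helper: returns the first Markdown path that is not already taken.
function nextFreeMarkdownPath(pdfPath: string, exists: (path: string) => boolean): string {
  const base = pdfPath.replace(/\.pdf$/i, ""); // "Folder/Manual.pdf" -> "Folder/Manual"
  let candidate = `${base}.md`;
  for (let n = 1; exists(candidate); n++) {
    candidate = `${base} ${n}.md`; // "Folder/Manual 1.md", "Folder/Manual 2.md", ...
  }
  return candidate;
}
```

Inside an Obsidian plugin, one plausible `exists` callback is `(path) => app.vault.getAbstractFileByPath(path) !== null`.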
The default compact metadata profile keeps template properties and relationships without exposing converter metadata. Page markers are hidden by default:
Spanish built-in compact output:
---
tipo: Nota convertida de PDF a MD
nombre: Manual
tags:
- introduccion
- conceptos-basicos
---
# Manual
Compact output also removes source-index boilerplate lines such as Tema, Subtema, Titulo, Archivo, Fila CSV, URL original, and URL final when they appear as a generated preamble. Select the full metadata profile when source path, hash, page count, converter version, warnings, page markers, and source-index lines must be stored.
Privacy
Conversion runs inside Obsidian on the local machine. The plugin code does not call remote APIs, upload PDFs, configure remote workers, or send telemetry. It passes the local PDF bytes to bundled pdf.js with local worker execution, disabled auto-fetch/range/stream loading, and no CDN configuration. The source PDF stays in the vault and is not modified.
The bundled pdf.js distribution contains generic network loader and viewer telemetry code paths, so static scans of main.js may find strings such as fetch, XMLHttpRequest, or reporttelemetry. They are vendored library code paths, not plugin upload or telemetry integrations.
Limits
Version 0.11.1 focuses on fast, local extraction of embedded PDF text with clean Obsidian defaults, built-in ES/EN output, conservative Markdown cleanup, flexible local templates, recursive folder batch conversion with mandatory preview, optional merged batch notes, optional local image embedding, deterministic Obsidian relationships, and a local cleanup command for existing generated notes. It does not infer semantic relationships, fill editorial sections such as objectives or key ideas, include OCR, reconstruct tables, or reproduce PDF layout losslessly.
conversion_warnings may include simple layout notes such as detected two-column pages, removed repeated headers/footers, removed page numbers, or pages with very low extractable text.
Select the full metadata profile when those technical properties must be stored in frontmatter. In compact mode they remain available only during conversion notices and batch summaries.
Manual Install
Build the plugin and copy these files to .obsidian/plugins/pdf-to-markdown-local/:
main.js
manifest.json
styles.css
Enable PDF to Markdown Local in Obsidian community plugins.
Development
npm install
npm run test
npm run build
node --check main.js
Full local check:
npm run check
Install the current release build into the active local vault plugin folder:
npm run prepare-release
npm run install-local
Then reload Obsidian or disable and re-enable the plugin. install-local copies release artifacts to .obsidian/plugins/pdf-to-markdown-local, removes any stale pdf.worker.mjs, and verifies the copied hashes.
If you do not have git on this machine, use GITHUB-UPLOAD-0.11.1.md for the exact manual upload order and file list for both the repository source and the GitHub release assets.
Release
manifest.json and versions.json must include the release version expected by Obsidian. The GitHub release tag should match manifest.version.
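A pre-release check along these lines might catch a mismatch early. This is a sketch under assumptions: `releaseProblems` is a hypothetical helper that is not part of the repository scripts, and it expects the already-parsed contents of `manifest.json` and `versions.json`.

```typescript
// Hypothetical check, not one of the repository's npm scripts.
interface Manifest {
  version: string;
}
// versions.json maps each plugin version to a minimum Obsidian version.
type Versions = Record<string, string>;

function releaseProblems(manifest: Manifest, versions: Versions, tag: string): string[] {
  const problems: string[] = [];
  if (!(manifest.version in versions)) {
    problems.push(`versions.json has no entry for ${manifest.version}`);
  }
  if (tag !== manifest.version) {
    problems.push(`release tag "${tag}" does not match manifest.version "${manifest.version}"`);
  }
  return problems;
}
```

An empty result means the three values agree; a tag such as `v0.11.1` would be reported because it differs from `0.11.1`.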
Prepare release artifacts:
npm run prepare-release
The command builds the plugin and copies the required release files to release/:
release/main.js
release/manifest.json
release/styles.css
## Release Notes
The Obsidian review may still report scanner findings that come from the bundled `pdfjs-dist`, especially around `new Function("")`, DOM internals, or base64 helpers. Those are third-party library code paths bundled into `main.js`, not plugin network or telemetry code. v0.11.1 removes the plugin-owned issues around unsupported release assets, deprecated settings API usage, stale worker packaging, and global vault enumeration helpers.