2025 12-December 15
Good: https://us1.eam.hxgnsmartcloud.com/web/base/logindisp?tenant=MEMPHISTN_PRD
| Function Name | Purpose | Suitability for Library (`__init__`) | Rationale |
|---|---|---|---|
| `get_link_rect` | Extracts a rect tuple from a PyMuPDF link dict. | No | Pure internal helper function for PyMuPDF data transformation. Should remain private (e.g., `_get_link_rect`). |
| `get_anchor_text` | Extracts text given a `fitz.Page` and a rect tuple. | No | Pure internal helper function reliant on PyMuPDF objects (`fitz.Page`). Should remain private (e.g., `_get_anchor_text`). |
| `analyze_toc_fitz` | Extracts TOC/Bookmarks from a `fitz.Document`. | No | Requires a raw `fitz.Document` object, making it low-level and tightly coupled to PyMuPDF. Should remain internal or private. |
| `inspect_pdf_hyperlinks_fitz` | Core extractor. Gets all links and TOC from a path. | Yes, but renamed | This is the primary data extraction function. It should be exposed, likely renamed to a generic, public-facing name like `extract_links_and_toc(pdf_path)`. |
| `print_structural_toc` | Prints TOC in a specific, formatted way. | No | A utility function for console output. Library functions should focus on returning data. If exposed, it should be in a separate `pdflinkcheck.cli.output` module. |
| `run_analysis` | High-Level CLI. Extracts, processes, prints summary, and handles remnants. | Yes | This is the public main entry point for the library's high-level functionality. It combines all tasks (extraction, remnants, reporting). It is the most logical function to expose in `__init__`. |
| `call_stable` | Placeholder for `__main__` entry. | No | Only used for module execution setup (e.g., to load arguments). Should be removed or kept as an `if __name__ == "__main__":` block. |
📦 Library Access (Advanced)
For developers importing pdflinkcheck into other Python projects, the core analysis functions are exposed directly in the root namespace:
| Function | Description |
|---|---|
| `run_analysis()` | (Primary function) Performs the full analysis, prints to console, and handles file export. |
| `extract_links()` | Low-level function to retrieve all explicit links (URIs, GoTo, etc.) from a PDF path. |
| `extract_toc()` | Low-level function to extract the PDF's internal Table of Contents (bookmarks/outline). |
```python
from pdflinkcheck.analyze import run_analysis, extract_links, extract_toc
```
🚀 CLI Usage
The core functionality is accessed via the analyze command. All commands include the built-in --help flag for quick reference.
Available Commands
| Command | Description |
|---|---|
| `pdflinkcheck analyze` | Analyzes a PDF file for links and remnants. |
| `pdflinkcheck gui` | Explicitly launches the Graphical User Interface. |
| `pdflinkcheck license` | Displays the full AGPLv3+ license text in the terminal. |
analyze Command Options
| Option | Description | Default |
|---|---|---|
| `<PDF_PATH>` | Required. The path to the PDF file to analyze. | N/A |
| `--check-remnants` / `--no-check-remnants` | Toggle scanning the text layer for unlinked URLs/Emails. | `--check-remnants` |
| `--max-links INTEGER` | Maximum number of links/remnants to display in the detailed report sections. Use 0 to show all. | 0 (Show All) |
| `--export-format FORMAT` | Format for the exported report. If specified, the report is saved to a file named after the PDF. Currently supported: JSON. | JSON |
| `--help` | Show command help and exit. | N/A |
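
To make the remnant check and the `--max-links` semantics more concrete, the following is a minimal sketch of how unlinked URLs and email addresses could be found in already-extracted page text. The regular expressions and the names `find_remnants` and `limit_for_display` are hypothetical. The tool's actual patterns and internals are not shown in these notes and may differ.

```python
import re

# Assumed, deliberately simple patterns; the real tool may use different ones.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>()\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_remnants(text, linked_uris=()):
    """Return URL/email strings in `text` that are not among `linked_uris`."""
    linked = {uri.rstrip("/").lower() for uri in linked_uris}
    remnants = []
    for kind, pattern in (("url", URL_RE), ("email", EMAIL_RE)):
        for match in pattern.finditer(text):
            # Drop trailing sentence punctuation picked up by the pattern.
            value = match.group().rstrip(".,;:")
            if value.rstrip("/").lower() not in linked:
                remnants.append({"type": kind, "text": value})
    return remnants


def limit_for_display(items, max_links=0):
    """Mirror the documented rule: 0 shows everything, N shows the first N."""
    return items if max_links == 0 else items[:max_links]


sample = "See https://example.com/docs, or write to ops@example.org. Also www.example.net."
found = find_remnants(sample, linked_uris=["https://example.com/docs"])
for item in limit_for_display(found, max_links=0):
    print(item["type"], item["text"])
```

In this sketch the URL that already has a real link is filtered out, so only `www.example.net` and `ops@example.org` are reported as remnants.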
gui Command Options
| Option | Description | Default |
|---|---|---|
| `--auto-close INTEGER` | (For testing/automation only). Delay in milliseconds after which the GUI window will automatically close. | 0 (Disabled) |
Example Runs
```bash
# Analyze a document, show all links/remnants, and save the report as JSON
pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --export-format JSON

# Analyze a document but skip the time-consuming remnant check
pdflinkcheck analyze "another_doc.pdf" --no-check-remnants

# Analyze a document but keep the print block short, showing only the first 10 links for each type
pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10

# Show the GUI for only a moment, like in a build check
pdflinkcheck gui --auto-close 3000
```
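
For scripted or batch use, the same invocations can be assembled from Python. The sketch below only builds the argument list from the options documented above. The helper name `build_analyze_command` is hypothetical, and actually running the command assumes the `pdflinkcheck` executable is on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_analyze_command(
    pdf_path: str,
    check_remnants: bool = True,
    max_links: int = 0,
    export_format: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `pdflinkcheck analyze` from the documented options."""
    cmd = ["pdflinkcheck", "analyze", pdf_path]
    cmd.append("--check-remnants" if check_remnants else "--no-check-remnants")
    cmd += ["--max-links", str(max_links)]
    if export_format is not None:
        cmd += ["--export-format", export_format]
    return cmd


def run_analyze(cmd: List[str]) -> int:
    """Run the command and return its exit code (requires pdflinkcheck installed)."""
    return subprocess.run(cmd).returncode


# Passing a list (not a shell string) means the spaces and "&" in the
# file name need no quoting or escaping.
command = build_analyze_command(
    "TE Maxson WWTF O&M Manual.pdf", max_links=10, export_format="JSON"
)
print(command)
```

Calling `run_analyze(command)` would then execute the analysis. It is left out of the demonstration so the sketch runs even where the tool is not installed.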