2025 12-December 15

Date: 2025 12-December 15

exkbq70zkppbmo7fT5d7

Good: https://us1.eam.hxgnsmartcloud.com/web/base/logindisp?tenant=MEMPHISTN_PRD

| Function Name | Purpose | Suitability for Library (`__init__`) | Rationale |
| --- | --- | --- | --- |
| `get_link_rect` | Extracts a rect tuple from a PyMuPDF link dict. | No | Pure internal helper function for PyMuPDF data transformation. Should remain private (e.g., `_get_link_rect`). |
| `get_anchor_text` | Extracts text given a `fitz.Page` and a rect tuple. | No | Pure internal helper function reliant on PyMuPDF objects (`fitz.Page`). Should remain private (e.g., `_get_anchor_text`). |
| `analyze_toc_fitz` | Extracts TOC/Bookmarks from a `fitz.Document`. | No | Requires a raw `fitz.Document` object, making it low-level and tightly coupled to PyMuPDF. Should remain internal or private. |
| `inspect_pdf_hyperlinks_fitz` | Core extractor. Gets all links and TOC from a path. | Yes, but renamed | This is the primary data extraction function. It should be exposed, likely renamed to a generic, public-facing name like `extract_links_and_toc(pdf_path)`. |
| `print_structural_toc` | Prints TOC in a specific, formatted way. | No | A utility function for console output. Library functions should focus on returning data. If exposed, it should be in a separate `pdflinkcheck.cli.output` module. |
| `run_analysis` | High-Level CLI. Extracts, processes, prints summary, and handles remnants. | Yes | This is the public main entry point for the library's high-level functionality. It combines all tasks (extraction, remnants, reporting). It is the most logical function to expose in `__init__`. |
| `call_stable` | Placeholder for `__main__` entry. | No | Only used for module execution setup (e.g., to load arguments). Should be removed or kept as `if __name__ == "__main__":` block. |

📦 Library Access (Advanced)

For developers importing pdflinkcheck into other Python projects, the core analysis functions are available from the `pdflinkcheck.analyze` module:

| Function | Description |
| --- | --- |
| `run_analysis()` | (Primary function) Performs the full analysis, prints to console, and handles file export. |
| `extract_links()` | Low-level function to retrieve all explicit links (URIs, GoTo, etc.) from a PDF path. |
| `extract_toc()` | Low-level function to extract the PDF's internal Table of Contents (bookmarks/outline). |

```python
from pdflinkcheck.analyze import run_analysis, extract_links, extract_toc
```

🚀 CLI Usage

The core functionality is accessed via the `analyze` command. Every command accepts the built-in `--help` flag for quick reference.

Available Commands

| Command | Description |
| --- | --- |
| `pdflinkcheck analyze` | Analyzes a PDF file for links and remnants. |
| `pdflinkcheck gui` | Launches the graphical user interface explicitly. |
| `pdflinkcheck license` | Displays the full AGPLv3+ license text in the terminal. |
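
The "remnants" that `analyze` reports are described in these notes as unlinked URLs and emails found in the text layer. A minimal sketch of that idea follows, assuming simple regular expressions and a hypothetical `find_remnants` helper; the patterns pdflinkcheck actually uses are not shown here and may well differ.

```python
import re

# Deliberately simple patterns for illustration, not an exhaustive URL/email grammar.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>()'\"]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_remnants(page_text: str, linked_uris: list[str]) -> list[dict[str, str]]:
    """Return URL/email strings in the text that no existing link already covers."""
    linked = {uri.rstrip("/").lower() for uri in linked_uris}
    remnants = []
    for kind, pattern in (("url", URL_RE), ("email", EMAIL_RE)):
        for match in pattern.finditer(page_text):
            text = match.group().rstrip(".,;:")  # drop sentence punctuation
            key = text.rstrip("/").lower()
            if key in linked or f"mailto:{key}" in linked:
                continue  # already a real link, so not a remnant
            remnants.append({"type": kind, "text": text})
    return remnants


sample = "See https://example.com/docs. Contact ops@example.org or visit www.example.net"
print(find_remnants(sample, ["https://example.com/docs"]))
```

In the sample, the first URL is skipped because it is already linked, while the bare `www.example.net` and the email address are reported.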

`analyze` Command Options

| Option | Description | Default |
| --- | --- | --- |
| `<PDF_PATH>` | Required. The path to the PDF file to analyze. | N/A |
| `--check-remnants / --no-check-remnants` | Toggle scanning the text layer for unlinked URLs/Emails. | `--check-remnants` |
| `--max-links INTEGER` | Maximum number of links/remnants to display in the detailed report sections. Use `0` to show all. | `0` (Show All) |
| `--export-format FORMAT` | Format for the exported report. If specified, the report is saved to a file named after the PDF. Currently supported: `JSON`. | `JSON` |
| `--help` | Show command help and exit. | N/A |
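
The option table maps fairly directly onto an argument parser. The sketch below uses `argparse` from the standard library purely for illustration; these notes do not say which CLI framework pdflinkcheck is built on, and `build_parser` and `limit_for_display` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented commands and options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="pdflinkcheck")
    commands = parser.add_subparsers(dest="command", required=True)

    analyze = commands.add_parser("analyze", help="Analyze a PDF file for links and remnants.")
    analyze.add_argument("pdf_path", metavar="PDF_PATH", help="Path to the PDF file to analyze.")
    # BooleanOptionalAction (Python 3.9+) creates the --check-remnants / --no-check-remnants pair.
    analyze.add_argument("--check-remnants", action=argparse.BooleanOptionalAction, default=True,
                         help="Scan the text layer for unlinked URLs/emails.")
    analyze.add_argument("--max-links", type=int, default=0,
                         help="Maximum links/remnants to display; 0 shows all.")
    analyze.add_argument("--export-format", choices=["JSON"], default="JSON",
                         help="Format for the exported report.")

    gui = commands.add_parser("gui", help="Launch the graphical user interface.")
    gui.add_argument("--auto-close", type=int, default=0,
                     help="Close the window after this many milliseconds; 0 disables.")
    return parser


def limit_for_display(items: list, max_links: int) -> list:
    """Apply the --max-links rule: 0 means show everything, otherwise truncate."""
    return items if max_links == 0 else items[:max_links]


args = build_parser().parse_args(["analyze", "doc.pdf", "--no-check-remnants", "--max-links", "10"])
print(args.check_remnants, args.max_links, args.export_format)
```

Treating `0` as "no limit" keeps the common case flag-free, at the cost of not being able to request an empty detail section.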

`gui` Command Options

| Option | Description | Default |
| --- | --- | --- |
| `--auto-close INTEGER` | (For testing/automation only). Delay in milliseconds after which the GUI window will automatically close. | `0` (Disabled) |
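
One way an auto-close delay like this could be wired up is sketched below. It assumes a Tk-style window object with `after(ms, callback)` and `destroy()` methods; whether pdflinkcheck's GUI is built this way is not stated in these notes, and both function names are hypothetical.

```python
def schedule_auto_close(root, delay_ms: int) -> bool:
    """Ask the window to destroy itself after delay_ms; 0 or less leaves it open."""
    if delay_ms <= 0:
        return False  # matches the documented default: 0 means disabled
    root.after(delay_ms, root.destroy)
    return True


def launch_gui(auto_close: int = 0) -> None:
    """Open a bare window and optionally close it again (assumes tkinter is available)."""
    import tkinter as tk  # imported lazily so headless environments can still import this module

    root = tk.Tk()
    root.title("pdflinkcheck (sketch)")
    schedule_auto_close(root, auto_close)
    root.mainloop()
```

Keeping the scheduling logic in its own function means a build check can exercise it with a stand-in object, without needing a display.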

Example Runs

```bash
# Analyze a document, show all links/remnants, and save the report as JSON
pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --export-format JSON

# Analyze a document but skip the time-consuming remnant check
pdflinkcheck analyze "another_doc.pdf" --no-check-remnants

# Analyze a document but keep the print block short, showing only the first 10 links for each type
pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10

# Show the GUI for only a moment, like in a build check
pdflinkcheck gui --auto-close 3000
```
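
The same runs can be scripted from Python when many PDFs need checking. The sketch below only uses the flags documented in these notes; `build_analyze_command` and `analyze_folder` are hypothetical helpers, and since the meaning of the CLI's exit codes is not documented here, they are simply collected rather than interpreted.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_analyze_command(pdf_path, *, check_remnants: bool = True,
                          max_links: int = 0,
                          export_format: Optional[str] = None) -> list[str]:
    """Build the argument list for one `pdflinkcheck analyze` run."""
    cmd = ["pdflinkcheck", "analyze", str(pdf_path)]
    if not check_remnants:
        cmd.append("--no-check-remnants")
    if max_links:
        cmd += ["--max-links", str(max_links)]
    if export_format:
        cmd += ["--export-format", export_format]
    return cmd


def analyze_folder(folder, **options) -> dict[str, int]:
    """Run the CLI on every PDF in a folder and return {file name: exit code}."""
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Passing a list (no shell) means spaces and "&" in file names need no quoting.
        completed = subprocess.run(build_analyze_command(pdf, **options), check=False)
        results[pdf.name] = completed.returncode
    return results


print(build_analyze_command("TE Maxson WWTF O&M Manual.pdf", max_links=10))
```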