2025 12-December 15

Good: https://us1.eam.hxgnsmartcloud.com/web/base/logindisp?tenant=MEMPHISTN_PRD

| Function Name | Purpose | Suitability for Library (`__init__`) | Rationale |
|---|---|---|---|
| `get_link_rect` | Extracts a rect tuple from a PyMuPDF link dict. | No | Pure internal helper for PyMuPDF data transformation. Should remain private (e.g., `_get_link_rect`). |
| `get_anchor_text` | Extracts the text under a rect tuple on a `fitz.Page`. | No | Pure internal helper that relies on PyMuPDF objects (`fitz.Page`). Should remain private (e.g., `_get_anchor_text`). |
| `analyze_toc_fitz` | Extracts the TOC/bookmarks from a `fitz.Document`. | No | Requires a raw `fitz.Document` object, which makes it low-level and tightly coupled to PyMuPDF. Should remain internal or private. |
| `inspect_pdf_hyperlinks_fitz` | Core extractor. Gets all links and the TOC from a path. | Yes, but renamed | This is the primary data-extraction function. It should be exposed, likely under a generic, public-facing name such as `extract_links_and_toc(pdf_path)`. |
| `print_structural_toc` | Prints the TOC in a specific format. | No | A console-output utility. Library functions should return data rather than print it; if exposed, this belongs in a separate `pdflinkcheck.cli.output` module. |
| `run_analysis` | High-level CLI entry: extracts, processes, prints a summary, and handles remnants. | Yes | The public main entry point for the library's high-level functionality. It combines all tasks (extraction, remnants, reporting), which makes it the most logical function to expose in `__init__`. |
| `call_stable` | Placeholder for the `__main__` entry. | No | Only used for module-execution setup (e.g., loading arguments). Should be removed or replaced by an `if __name__ == "__main__":` block. |

📦 Library Access (Advanced)

For developers importing pdflinkcheck into other Python projects, the core analysis functions can be imported from the pdflinkcheck.analyze module:

| Function | Description |
|---|---|
| `run_analysis()` | (Primary function) Performs the full analysis, prints to the console, and handles file export. |
| `extract_links()` | Low-level function that retrieves all explicit links (URIs, GoTo, etc.) from a PDF path. |
| `extract_toc()` | Low-level function that extracts the PDF's internal Table of Contents (bookmarks/outline). |

```python
from pdflinkcheck.analyze import run_analysis, extract_links, extract_toc
```
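
Because the table above recommends that library functions return data rather than print it, a caller can render the outline itself. The sketch below assumes `extract_toc()` returns entries in PyMuPDF's `get_toc()` shape, `[level, title, page]`; that is an assumption to check against the real return type, and `format_toc` is a hypothetical helper, not part of pdflinkcheck.

```python
def format_toc(toc: list[list]) -> str:
    """Render [level, title, page] entries as an indented outline (hypothetical helper)."""
    lines = []
    for level, title, page, *_ in toc:  # tolerate extra trailing fields, if any
        indent = "  " * (level - 1)     # top-level bookmarks have level 1
        lines.append(f"{indent}{title} (p. {page})")
    return "\n".join(lines)


# Hand-written entries in the assumed [level, title, page] shape.
sample = [[1, "Introduction", 1], [2, "Scope", 2], [1, "Appendix A", 40]]
print(format_toc(sample))
# Introduction (p. 1)
#   Scope (p. 2)
# Appendix A (p. 40)
```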

🚀 CLI Usage

The core functionality is accessed via the analyze command. All commands include the built-in --help flag for quick reference.

Available Commands

| Command | Description |
|---|---|
| `pdflinkcheck analyze` | Analyzes a PDF file for links and remnants. |
| `pdflinkcheck gui` | Explicitly launches the Graphical User Interface. |
| `pdflinkcheck license` | Displays the full AGPLv3+ license text in the terminal. |
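
Remnants are URLs or email addresses that appear in the text layer without a clickable link (the --check-remnants option below describes them as "unlinked URLs/Emails"). One plausible way to find them is a regex scan of each page's text, skipping anything already covered by a link annotation. This is a minimal sketch under that assumption: the patterns are deliberately simple and hypothetical, and pdflinkcheck's own detection logic may differ.

```python
import re

# Simplified, hypothetical patterns; robust URL/email matching needs more care.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_remnants(page_text: str, linked_uris: set[str]) -> list[str]:
    """Return URLs/emails found in the text that no link annotation already points to."""
    # Normalize linked targets so "mailto:x@y.org" matches a bare "x@y.org" in the text.
    linked = {u.lower().removeprefix("mailto:") for u in linked_uris}
    found = URL_RE.findall(page_text) + EMAIL_RE.findall(page_text)
    remnants = []
    for candidate in found:
        candidate = candidate.rstrip(".,;:")  # drop trailing sentence punctuation
        if candidate.lower() not in linked and candidate not in remnants:
            remnants.append(candidate)
    return remnants
```

For example, on the text "See https://example.com/manual. Contact ops@example.org or visit www.example.net today." with only `https://example.com/manual` linked, the sketch reports `www.example.net` and `ops@example.org` as remnants.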

analyze Command Options

| Option | Description | Default |
|---|---|---|
| `<PDF_PATH>` | Required. The path to the PDF file to analyze. | N/A |
| `--check-remnants` / `--no-check-remnants` | Toggle scanning the text layer for unlinked URLs/Emails. | `--check-remnants` |
| `--max-links INTEGER` | Maximum number of links/remnants to display in the detailed report sections. Use `0` to show all. | `0` (Show All) |
| `--export-format FORMAT` | Format for the exported report. If specified, the report is saved to a file named after the PDF. Currently supported: JSON. | `JSON` |
| `--help` | Show command help and exit. | N/A |
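
The option table maps directly onto a standard argument parser. The sketch below mirrors the `analyze` options with Python's `argparse` purely for illustration; the real CLI may be built on a different framework, and `build_parser`, `limit_items`, and `report_path` are hypothetical helpers. The `<stem>.json` output name is likewise an assumption about what "named after the PDF" means.

```python
import argparse
from pathlib import Path
from typing import Sequence, TypeVar

T = TypeVar("T")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the `analyze` options table."""
    parser = argparse.ArgumentParser(prog="pdflinkcheck analyze")
    parser.add_argument("pdf_path", help="Path to the PDF file to analyze.")
    # BooleanOptionalAction (Python 3.9+) generates both --check-remnants and --no-check-remnants.
    parser.add_argument("--check-remnants", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--max-links", type=int, default=0, help="0 shows everything.")
    parser.add_argument("--export-format", choices=["JSON"], default="JSON")
    return parser


def limit_items(items: Sequence[T], max_links: int) -> list[T]:
    """Apply the --max-links rule: 0 (or less) means no limit, otherwise keep the first N."""
    return list(items) if max_links <= 0 else list(items[:max_links])


def report_path(pdf_path: str, export_format: str) -> Path:
    """Hypothetical naming rule: reuse the PDF's stem with the export format's extension."""
    return Path(pdf_path).with_suffix("." + export_format.lower())


args = build_parser().parse_args(
    ["TE Maxson WWTF O&M Manual.pdf", "--no-check-remnants", "--max-links", "10"]
)
print(args.check_remnants, args.max_links, report_path(args.pdf_path, args.export_format))
# → False 10 TE Maxson WWTF O&M Manual.json
```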

gui Command Options

| Option | Description | Default |
|---|---|---|
| `--auto-close INTEGER` | (For testing/automation only.) Delay in milliseconds after which the GUI window will automatically close. | `0` (Disabled) |
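
An auto-close option like this is usually a one-shot timer on the GUI event loop. Assuming a Tkinter interface (the toolkit is not named here), `root.after(delay_ms, root.destroy)` schedules the window to close. `schedule_auto_close` is a hypothetical helper; it accepts any object with `after` and `destroy` methods so the logic can be exercised without a display.

```python
from typing import Any


def schedule_auto_close(root: Any, delay_ms: int) -> bool:
    """Arrange for the GUI window to close itself after delay_ms milliseconds.

    `root` is assumed to be a tkinter.Tk instance; Tk.after(ms, func) calls func once,
    roughly ms milliseconds later, from inside the running event loop.
    Returns True if a close was scheduled.
    """
    if delay_ms <= 0:  # 0 disables auto-close, matching the option's default
        return False
    root.after(delay_ms, root.destroy)
    return True
```

In a Tkinter app this would be called once, just before `root.mainloop()`, so a build check can open the window and let it close on its own.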

Example Runs

```bash
# Analyze a document, show all links/remnants, and save the report as JSON
pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --export-format JSON

# Analyze a document but skip the time-consuming remnant check
pdflinkcheck analyze "another_doc.pdf" --no-check-remnants

# Analyze a document but keep the printed report short, showing only the first 10 links of each type
pdflinkcheck analyze "TE Maxson WWTF O&M Manual.pdf" --max-links 10

# Show the GUI for only 3 seconds (3000 ms), e.g. as a quick build check
pdflinkcheck gui --auto-close 3000
```