pyssg.plugins.content_meta
Content-meta plugin: TOC/outline, word count, reading time, excerpt.
Runs in the parse phase at stage 300, i.e. after the markdown plugin (stage 200)
has populated node.ast with the heading tree (Python-Markdown toc_tokens),
node.meta["content_html"] and node.meta["title"]. This plugin only reads
those derived facts plus the raw body and writes four new node.meta keys; it
owns no graph algorithm or cache state (plugins declare facts, the engine owns
invalidation).
Every computation here is pure: it depends solely on the declared inputs, never on a clock or randomness, so two builds of the same input are byte-identical.
This module uses only the standard library; the markdown engine lives in the markdown
plugin, never in pyssg.core.
slugify(text: str) -> str
Return a GitHub-style slug for a heading.
Rules, applied in order:
- NFC-normalise so visually identical strings slug identically (determinism).
- Lowercase and strip surrounding whitespace.
- Replace each run of whitespace with a single hyphen.
- Remove every character that is not a Unicode word character or a hyphen.
Unicode letters are kept (no ASCII folding), so a Vietnamese heading such
as "Giới thiệu" yields a readable "giới-thiệu".
- Collapse runs of hyphens and trim leading/trailing hyphens.
Empty or whitespace-only input returns "".
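A minimal sketch of these rules using only re and unicodedata, written from the description above rather than copied from the pyssg source. It relies on Python 3's Unicode-aware \w for str patterns, which is what keeps letters such as "ớ" and "ệ". The NFC step matters because in decomposed (NFD) text the accents are separate combining marks, which are not word characters and would otherwise be stripped.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Sketch of the documented slug rules; not the pyssg source."""
    # 1. NFC-normalise so precomposed and decomposed forms slug identically.
    text = unicodedata.normalize("NFC", text)
    # 2. Lowercase and strip surrounding whitespace.
    text = text.lower().strip()
    # 3. Replace each run of whitespace with a single hyphen.
    text = re.sub(r"\s+", "-", text)
    # 4. Drop anything that is not a Unicode word character or a hyphen.
    #    For str patterns, \w is Unicode-aware by default in Python 3.
    text = re.sub(r"[^\w-]", "", text)
    # 5. Collapse hyphen runs, then trim hyphens at either end.
    return re.sub(r"-{2,}", "-", text).strip("-")
```

For example, "  Hello,   World!  " becomes "hello-world", and "A -- B" becomes "a-b" because the whitespace-derived hyphens merge with the literal ones before the collapse step. Underscores survive, since "_" counts as a word character.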
outline(toc_tokens: object) -> list[dict[str, object]]
Flatten Python-Markdown toc_tokens into a TOC list in document order.
toc_tokens is the nested heading tree that the toc extension records on the
parser (each node a {"level", "id", "name", "children"} dict), which the
markdown plugin copies onto node.ast. Each returned entry is
{"level": int, "text": str, "slug": str}, where slug is the heading's
id (so in-page anchors always resolve) and text carries no markup. The
list stays flat, one entry per heading, produced by a depth-first walk;
nesting is a presentation concern left to the consumer. Input that does not
have the expected shape yields an empty list.
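A minimal sketch of the flattening, assuming each token carries the level/id/name/children keys described above (extra keys are ignored, and a missing "children" is treated as no children). How strictly pyssg validates the shape is not stated; this sketch takes one reading of "not the expected shape" and discards the whole result as soon as any node is malformed.

```python
def outline(toc_tokens: object) -> list[dict[str, object]]:
    """Sketch: flatten nested toc_tokens depth-first into document order."""
    entries: list[dict[str, object]] = []

    def walk(tokens: object) -> bool:
        # Returns False as soon as anything deviates from the expected shape.
        if not isinstance(tokens, list):
            return False
        for token in tokens:
            if not isinstance(token, dict):
                return False
            level, slug, name = token.get("level"), token.get("id"), token.get("name")
            if not (isinstance(level, int) and isinstance(slug, str) and isinstance(name, str)):
                return False
            # Pre-order: a heading is emitted before its sub-headings.
            entries.append({"level": level, "text": name, "slug": slug})
            if not walk(token.get("children", [])):
                return False
        return True

    # Any malformed node discards the partial result.
    return entries if walk(toc_tokens) else []
```

Pre-order traversal is what makes the flat list match document order: an h1 appears before the h2 sections nested under it, and the next sibling h1 follows after all of them.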
reading_time(word_count: int) -> int
Estimated reading time in whole minutes, at least 1.
Computed as max(1, round(word_count / WORDS_PER_MINUTE)), so even a near-empty
document reports one minute rather than zero.
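A minimal sketch of the formula. The value of WORDS_PER_MINUTE is not given here, so 200 is a hypothetical stand-in.

```python
WORDS_PER_MINUTE = 200  # hypothetical value; the real constant is not documented here


def reading_time(word_count: int) -> int:
    """Estimated reading time in whole minutes, never less than 1."""
    # round() with no ndigits returns an int and rounds exact halves to even.
    return max(1, round(word_count / WORDS_PER_MINUTE))
```

One consequence of Python's round-half-to-even: at 200 words per minute, 300 words (1.5) reports 2 minutes but 500 words (2.5) also reports 2, while 700 words (3.5) reports 4. The result is still deterministic, which is what the build needs.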
first_paragraph_excerpt(plain_text: str, limit: int = _EXCERPT_LIMIT) -> str
Derive a plain-text excerpt from the first paragraph.
Takes everything up to the first blank line (the first paragraph), collapses
all internal whitespace to single spaces, and truncates to limit
characters on a word boundary, appending an ellipsis when content was cut.
Returns "" for empty input.
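A minimal sketch of the excerpt rule. The default limit of 200 and the single "…" (U+2026) ellipsis character are assumptions, since neither _EXCERPT_LIMIT's value nor the exact ellipsis is stated here. The sketch also strips leading whitespace first, so a body that opens with blank lines still yields its first real paragraph, and it falls back to a hard cut when the first word alone exceeds the limit.

```python
import re

_EXCERPT_LIMIT = 200  # hypothetical default; the real value is not documented here
_ELLIPSIS = "\u2026"  # assumption: a single "…" character


def first_paragraph_excerpt(plain_text: str, limit: int = _EXCERPT_LIMIT) -> str:
    """Sketch: first paragraph, whitespace collapsed, cut on a word boundary."""
    # The first paragraph ends at the first blank (or whitespace-only) line.
    paragraph = re.split(r"\n\s*\n", plain_text.strip(), maxsplit=1)[0]
    # Collapse every run of whitespace, newlines included, to one space.
    text = " ".join(paragraph.split())
    if len(text) <= limit:
        return text  # nothing was cut, so no ellipsis ("" stays "")
    head = text[:limit]
    if text[limit] != " ":
        # The limit falls inside a word: back up to the previous space, if any.
        space = head.rfind(" ")
        if space > 0:
            head = head[:space]
    return head.rstrip() + _ELLIPSIS
```

Because the ellipsis is appended after the cut, a truncated excerpt can be one character longer than limit; whether pyssg counts the ellipsis against the limit is not stated here.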
class ContentMetaPlugin
Derives TOC, word count, reading time and excerpt from parsed Markdown.
ContentMetaPlugin.apply(self, builder: Builder) -> None
content_meta() -> ContentMetaPlugin
Factory used in pyssg.config.py.