`pyssg.plugins.content_meta`

Content-meta plugin: TOC/outline, word count, reading time, excerpt.

Runs in the parse phase at stage 300, i.e. after the markdown plugin (stage 200) has populated node.ast with the heading tree (Python-Markdown toc_tokens), node.meta["content_html"] and node.meta["title"]. This plugin only reads those derived facts plus the raw body and writes four new node.meta keys; it owns no graph algorithm or cache state (plugins declare facts, the engine owns invalidation).

Every computation here is pure: it depends solely on the declared inputs, never on a clock or randomness, so two builds of the same input are byte-identical.

This module is pure standard library; the markdown engine lives in the markdown plugin, never in pyssg.core.

`slugify(text: str) -> str`

Return a GitHub-style slug for a heading.

Rules, applied in order:

1. NFC-normalise so visually identical strings slug identically (determinism).
2. Lowercase and strip surrounding whitespace.
3. Replace each run of whitespace with a single hyphen.
4. Remove every character that is not a Unicode word char or a hyphen. Unicode letters are kept (no ASCII folding), so a Vietnamese heading such as "Giới thiệu" yields a readable "giới-thiệu".
5. Collapse runs of hyphens and trim leading/trailing hyphens.

Empty or whitespace-only input returns "".
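The rules above can be sketched in a few lines of standard-library Python. This is an illustration of the documented behaviour, not the module's actual source; the name `slugify_sketch` is hypothetical and chosen to avoid confusion with the real function.

```python
import re
import unicodedata


def slugify_sketch(text: str) -> str:
    """Hypothetical re-implementation of the documented slug rules."""
    # 1. NFC-normalise so composed and decomposed forms slug identically.
    text = unicodedata.normalize("NFC", text)
    # 2. Lowercase and strip surrounding whitespace.
    text = text.lower().strip()
    # 3. Replace each run of whitespace with a single hyphen.
    text = re.sub(r"\s+", "-", text)
    # 4. Drop everything that is not a Unicode word char or a hyphen.
    #    In Python 3, \w on str patterns is Unicode-aware by default.
    text = re.sub(r"[^\w-]", "", text)
    # 5. Collapse runs of hyphens and trim them from both ends.
    return re.sub(r"-+", "-", text).strip("-")


print(slugify_sketch("Giới thiệu"))
print(slugify_sketch("  Hello,   World!  "))
```

Normalising first matters: in a decomposed (NFD) string the combining marks are not word characters, so step 4 would silently strip the diacritics if it ran before step 1.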

`outline(toc_tokens: object) -> list[dict[str, object]]`

Flatten Python-Markdown toc_tokens into a flat TOC in document order.

toc_tokens is the nested heading tree the toc extension records on the parser -- each node a {"level", "id", "name", "children"} dict -- which the markdown plugin copies onto node.ast. Each returned entry is {"level": int, "text": str, "slug": str}. slug is the heading's id, so in-page anchors always resolve; text is the heading's name, which carries no markup. The list stays flat (one entry per heading) and is built by a depth-first walk; nesting is a presentation concern left to the consumer. Input that is not the expected shape yields an empty list.
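A minimal sketch of the depth-first flattening, assuming the token shape described above. The name `outline_sketch` is hypothetical, and skipping individual malformed nodes (rather than rejecting the whole input) is an assumption of this sketch; the real function may be stricter.

```python
def outline_sketch(toc_tokens: object) -> list[dict[str, object]]:
    """Hypothetical flattening of Python-Markdown style toc_tokens."""
    entries: list[dict[str, object]] = []

    def walk(nodes: object) -> None:
        # Anything that is not a list of dicts contributes nothing.
        if not isinstance(nodes, list):
            return
        for node in nodes:
            if not isinstance(node, dict):
                continue
            level, name, slug = node.get("level"), node.get("name"), node.get("id")
            if isinstance(level, int) and isinstance(name, str) and isinstance(slug, str):
                # Emit the parent before its children: document order.
                entries.append({"level": level, "text": name, "slug": slug})
            walk(node.get("children", []))

    walk(toc_tokens)
    return entries


tokens = [
    {"level": 1, "id": "intro", "name": "Intro", "children": [
        {"level": 2, "id": "setup", "name": "Setup", "children": []},
    ]},
    {"level": 1, "id": "usage", "name": "Usage", "children": []},
]
for entry in outline_sketch(tokens):
    print(entry)
```

A consumer that wants a nested TOC can rebuild it from the `level` values, which is why the flat list loses no information.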

`reading_time(word_count: int) -> int`

Estimated reading time in whole minutes, at least 1.

Computed as max(1, round(word_count / WORDS_PER_MINUTE)), so even a near-empty document reports one minute rather than zero.
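The formula is small enough to try directly. The sketch below assumes a value of 200 for WORDS_PER_MINUTE purely for illustration; the module's actual constant is not stated here, and `reading_time_sketch` is a hypothetical name.

```python
WORDS_PER_MINUTE = 200  # assumed value for illustration only


def reading_time_sketch(word_count: int) -> int:
    """Whole minutes, never less than 1."""
    return max(1, round(word_count / WORDS_PER_MINUTE))


for words in (0, 50, 300, 400, 500):
    print(words, reading_time_sketch(words))
```

One detail worth knowing: Python's built-in `round` rounds exact halves to the nearest even integer, so under the assumed constant both 300 words (1.5) and 500 words (2.5) report 2 minutes.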

`first_paragraph_excerpt(plain_text: str, limit: int = _EXCERPT_LIMIT) -> str`

Derive a plain-text excerpt from the first paragraph.

Takes everything up to the first blank line (the first paragraph), collapses all internal whitespace to single spaces, and truncates to limit characters on a word boundary, appending an ellipsis when content was cut. Returns "" for empty input.
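The excerpt steps can be sketched as follows. Several details are assumptions of this sketch, not facts about the module: the default limit of 160, the use of the single character "…" as the ellipsis, the ellipsis not counting toward the limit, and the name `excerpt_sketch`.

```python
import re

EXCERPT_LIMIT = 160  # assumed default; the real _EXCERPT_LIMIT is not shown


def excerpt_sketch(plain_text: str, limit: int = EXCERPT_LIMIT) -> str:
    """Hypothetical first-paragraph excerpt, truncated on a word boundary."""
    stripped = plain_text.strip()
    if not stripped:
        return ""
    # The first paragraph is everything up to the first blank line.
    first = re.split(r"\n\s*\n", stripped, maxsplit=1)[0]
    # Collapse all internal whitespace (including newlines) to single spaces.
    collapsed = " ".join(first.split())
    if len(collapsed) <= limit:
        return collapsed
    cut = collapsed[:limit]
    # If the cut landed mid-word, back up to the previous space.
    if collapsed[limit] != " " and " " in cut:
        cut = cut[: cut.rindex(" ")]
    return cut.rstrip() + "…"  # ellipsis marks that content was cut


print(excerpt_sketch("Hello   world\nagain\n\nSecond paragraph"))
print(excerpt_sketch("alpha beta gamma delta", limit=12))
```

A single word longer than the limit has no boundary to fall back to; this sketch cuts it hard at the limit, which is one of several reasonable choices.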

`class ContentMetaPlugin`

Derives TOC, word count, reading time and excerpt from parsed Markdown.

`ContentMetaPlugin.apply(self, builder: Builder) -> None`

`content_meta() -> ContentMetaPlugin`

Factory used in pyssg.config.py.