pyssg is pre-1.0 and under active development - APIs, config, and themes may change.
pyssg.plugins.content_meta

2 min read

pyssg.plugins.content_meta

Content-meta plugin: TOC/outline, word count, reading time, excerpt.

Runs in the parse phase at stage 300, i.e. after the markdown plugin (stage 200) has populated node.ast with the heading tree (Python-Markdown toc_tokens), node.meta["content_html"] and node.meta["title"]. This plugin only reads those derived facts plus the raw body and writes four new node.meta keys; it owns no graph algorithm or cache state (plugins declare facts, the engine owns invalidation).

Every computation here is pure: it depends solely on the declared inputs, never on a clock or randomness, so two builds of the same input are byte-identical.

This module is pure standard library; the markdown engine lives in the markdown plugin, never in pyssg.core.

slugify(text: str) -> str

Return a GitHub-style slug for a heading.

Rules, applied in order:

  1. NFC-normalise so visually identical strings slug identically (determinism).
  2. Lowercase and strip surrounding whitespace.
  3. Replace each run of whitespace with a single hyphen.
  4. Remove every character that is not a Unicode word char or a hyphen. Unicode letters are kept (no ASCII folding), so a Vietnamese heading such as "Giới thiệu" yields a readable "giới-thiệu".
  5. Collapse runs of hyphens and trim leading/trailing hyphens.

Empty or whitespace-only input returns "".

outline(toc_tokens: object) -> list[dict[str, object]]

Flatten Python-Markdown toc_tokens into a flat TOC in document order.

toc_tokens is the nested heading tree the toc extension records on the parser -- each node a {"level", "id", "name", "children"} dict -- which the markdown plugin copies onto node.ast. Each returned entry is {"level": int, "text": str, "slug": str} where slug is the heading's id (so in-page anchors always resolve and text carries no markup). The list stays flat (one entry per heading), walked depth-first; nesting is a presentation concern left to the consumer. Input that is not the expected shape yields an empty list.

reading_time(word_count: int) -> int

Estimated reading time in whole minutes, at least 1.

max(1, round(word_count / WORDS_PER_MINUTE)) so even a near-empty document reports one minute rather than zero.

first_paragraph_excerpt(plain_text: str, limit: int = _EXCERPT_LIMIT) -> str

Derive a plain-text excerpt from the first paragraph.

Takes everything up to the first blank line (the first paragraph), collapses all internal whitespace to single spaces, and truncates to limit characters on a word boundary, appending an ellipsis when content was cut. Returns "" for empty input.

class ContentMetaPlugin

Derives TOC, word count, reading time and excerpt from parsed Markdown.

ContentMetaPlugin.apply(self, builder: Builder) -> None

content_meta() -> ContentMetaPlugin

Factory used in pyssg.config.py.