Topic modeling is the unsupervised discovery of themes in a corpus of documents, classically via algorithms like LDA and increasingly via embedding-based clustering.
Why it matters
Knowing what a corpus is *about* without reading it is the foundational task of corpus analytics. LDA (Latent Dirichlet Allocation) was the standard approach until around 2020; newer methods such as BERTopic and Top2Vec cluster document embeddings instead, which tends to produce more coherent topics with less hyperparameter tuning.
The downstream use is typically navigation: instead of scrolling through 500 documents, you see 12 topics, each with representative documents. Search-by-topic, recommendation-by-topic, and trend detection all build on this.
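One common way to pick the representative documents for a topic is to rank its members by similarity to the topic's centroid. The sketch below assumes each document already has an embedding and a topic assignment; the function name `representatives`, the toy 2-D vectors, and the choice of cosine similarity are illustrative assumptions, not a description of any particular tool.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def representatives(vectors, labels, top_n=1):
    """Return {topic: indices of the documents closest to the topic centroid}."""
    members = defaultdict(list)
    for idx, topic in enumerate(labels):
        members[topic].append(idx)

    result = {}
    for topic, idxs in members.items():
        # Centroid = component-wise mean of the topic's document vectors.
        columns = zip(*(vectors[i] for i in idxs))
        centroid = [sum(col) / len(idxs) for col in columns]
        ranked = sorted(idxs, key=lambda i: cosine(vectors[i], centroid), reverse=True)
        result[topic] = ranked[:top_n]
    return result


# Toy data (hypothetical): six documents, two topics.
vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.6, 0.4],   # topic 0
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],   # topic 1
]
labels = [0, 0, 0, 1, 1, 1]

reps = representatives(vectors, labels, top_n=1)
print(reps)
```

With real embeddings the same ranking gives a short list of documents to show next to each topic, so a reader can judge the theme from a few examples.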
How Pith relates
Pith's topic map is built by clustering bookmark embeddings. Hover over a cluster to see its theme; click it to filter bookmarks by that cluster. Cluster labels are LLM-generated from each cluster's content.
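To make the idea concrete, here is a minimal sketch of clustering bookmark embeddings and then filtering by cluster. Pith's actual clustering algorithm is not stated here, so the use of k-means on cosine similarity is an assumption, as are the function names, the toy 3-D vectors, and the bookmark titles. The LLM labeling step is omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-first initialization (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point least similar to every centroid chosen so far.
        far = min(points, key=lambda p: max(dot(p, c) for c in centroids))
        centroids.append(far)
    return centroids


def kmeans(vectors, k, iterations=20):
    """Cluster vectors by cosine similarity; returns one cluster id per vector."""
    points = [normalize(v) for v in vectors]
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its most similar centroid.
        labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the normalized mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Hypothetical bookmarks with toy embeddings (real ones have hundreds of dimensions).
bookmarks = [
    ("Sourdough starter guide", [0.9, 0.1, 0.0]),
    ("No-knead bread", [0.8, 0.2, 0.1]),
    ("Rust ownership explained", [0.1, 0.9, 0.1]),
    ("Borrow checker tips", [0.0, 0.8, 0.2]),
    ("Pizza dough hydration", [0.85, 0.05, 0.1]),
]

labels = kmeans([vec for _, vec in bookmarks], k=2)

# "Clicking" a cluster amounts to keeping only the bookmarks with that cluster id.
selected = labels[0]
filtered = [title for (title, _), label in zip(bookmarks, labels) if label == selected]
print(labels, filtered)
```

K-means needs the number of clusters up front; density-based methods such as HDBSCAN avoid that, which is one reason embedding-based topic tools often prefer them.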
Last reviewed: 10 May 2026 · Licensed CC BY 4.0 · cite freely with attribution to Pith.