
Entity extraction is the task of identifying named entities (people, companies, places, products, concepts) within unstructured text, typically performed by a language model or specialised NLP pipeline.

Why it matters

Entity extraction turns prose into structured data. A news article mentioning 'Sundar Pichai announced…' becomes a (Person, Sundar Pichai) entity linked to (Company, Google), which is then usable for search, recommendation, alerting, and graph construction.
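
As a minimal sketch of what "structured data" could look like for this example, the Python below stores the two entities and the link between them. The class names, the `associated_with` label, and the adjacency-map layout are illustrative assumptions, not a format this entry prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    type: str  # e.g. "Person", "Company"
    name: str


@dataclass(frozen=True)
class Relation:
    subject: Entity
    predicate: str  # hypothetical label, chosen only for illustration
    object: Entity


pichai = Entity("Person", "Sundar Pichai")
google = Entity("Company", "Google")
link = Relation(pichai, "associated_with", google)

# A tiny adjacency map: the kind of structure a search index or graph builds on.
graph: dict[Entity, list[tuple[str, Entity]]] = {}
graph.setdefault(link.subject, []).append((link.predicate, link.object))
```

Frozen dataclasses are hashable, so entities can serve directly as dictionary keys or graph nodes.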

The state of the art is now LLM-based: a single GPT-4-class call can extract entities, their types, and the relationships between them with reasonable accuracy. Specialised tools (spaCy, Stanford NLP) remain faster and cheaper for high-volume pipelines.
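
A rough sketch of the single-call approach, assuming the model is asked to reply in JSON: one function builds the prompt and another validates the reply. The prompt wording, the JSON shape, and the sample reply are all hypothetical, and the model call itself is left out because it depends on the provider.

```python
import json


def build_prompt(text: str) -> str:
    # Hypothetical JSON shape requested from the model.
    schema = (
        '{"entities": [{"name": str, "type": str}], '
        '"relations": [{"subject": str, "predicate": str, "object": str}]}'
    )
    return (
        "Extract named entities and relationships from the text below.\n"
        "Reply with JSON only, in this shape: " + schema + "\n\n"
        "Text: " + text
    )


def parse_response(raw: str) -> tuple[list[dict], list[dict]]:
    data = json.loads(raw)
    # Keep only entities that carry both a name and a type.
    entities = [e for e in data.get("entities", []) if e.get("name") and e.get("type")]
    names = {e["name"] for e in entities}
    # Drop relations that point at entities the model did not list.
    relations = [
        r for r in data.get("relations", [])
        if r.get("subject") in names and r.get("object") in names
    ]
    return entities, relations


# Hand-written sample reply, standing in for real model output.
sample_reply = json.dumps({
    "entities": [
        {"name": "Sundar Pichai", "type": "Person"},
        {"name": "Google", "type": "Company"},
    ],
    "relations": [
        {"subject": "Sundar Pichai", "predicate": "associated_with", "object": "Google"},
        {"subject": "Sundar Pichai", "predicate": "associated_with", "object": "Unlisted Co"},
    ],
})
entities, relations = parse_response(sample_reply)
```

Validating the reply before use matters because model output is not guaranteed to follow the requested shape.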

How Pith relates

Pith extracts entities (people, companies, frameworks) from every summarised bookmark. These feed the auto-tag service (which matches them against client names and aliases), the entities surface, and the topic map's clustering.
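
To illustrate the idea of matching extracted entities against client names and aliases, here is a minimal sketch. It is not Pith's implementation: the function names, the sample clients, and the choice of exact matching after case folding are all assumptions made for the example.

```python
def build_alias_index(clients: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised client name and alias to its canonical client name."""
    index: dict[str, str] = {}
    for client, aliases in clients.items():
        for label in [client, *aliases]:
            index[label.casefold().strip()] = client
    return index


def auto_tag(entity_names: list[str], index: dict[str, str]) -> list[str]:
    """Return the canonical client names matched by the extracted entities."""
    tags: list[str] = []
    for name in entity_names:
        client = index.get(name.casefold().strip())
        if client is not None and client not in tags:
            tags.append(client)  # keep first-seen order, skip duplicates
    return tags


# Hypothetical client list and extracted entity names.
clients = {"Acme Corp": ["Acme", "ACME Corporation"], "Globex": ["Globex Inc"]}
index = build_alias_index(clients)
tags = auto_tag(["acme", "Sundar Pichai", "Globex Inc", "ACME Corporation"], index)
```

Exact matching on normalised strings keeps false positives low; a real service might add fuzzy matching at the cost of more ambiguity.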

Last reviewed: 10 May 2026 · Licensed CC BY 4.0 · cite freely with attribution to Pith.