Entity Salience — How Named Things Are Weighted

What It Is

Entity salience is a 0.0-1.0 score that measures how prominent and important a specific named entity is within a piece of content. A salience of 0.8 means the entity is central to the page's topic; a salience of 0.1 means it's a passing mention. Google's Natural Language API returns a salience score for every entity it detects, and the same idea helps explain how search systems judge what a page is truly about, not just what keywords it contains.

Why It Matters for Your SEO

Search engines have evolved from keyword matching to entity understanding. Google builds knowledge graphs connecting entities (people, places, products, organisations) and evaluates whether your content demonstrates genuine topical coverage. Pages that mention key entities with high salience — in titles, headings, and throughout the body — signal deep relevance. Pages where important entities only appear once in a footnote signal shallow coverage.

How korvex Measures It

Each entity receives a salience score from 5 weighted components:

Component	Weight	What It Measures
Frequency	30%	How often the entity appears (10+ mentions reach the maximum)
Position	25%	Where the entity first appears (earlier = higher salience)
Keyword Proximity	20%	Distance between the entity and target keywords
Title/Heading Presence	10%	Whether the entity appears in the page title (+7%) or headings (+3%)
Contextual Importance	15%	Semantic similarity between the entity and the overall page topic

Salience Thresholds

Range	Meaning
0.5-1.0	Primary entity — the page is fundamentally about this entity
0.3-0.5	Supporting entity — important to the topic, mentioned substantially
0.15-0.3	Referenced entity — relevant but not central
0.0-0.15	Passing mention — barely relevant to the page's core topic
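
The bands above can be applied mechanically. The following is a minimal sketch, not korvex's own code: the function name and the short band labels are hypothetical, and because the table's ranges share their boundary values (0.15, 0.3, 0.5), the sketch assumes a boundary score belongs to the higher band.

```python
def classify_salience(score: float) -> str:
    """Map a 0.0-1.0 salience score to one of the four bands in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("salience must be between 0.0 and 1.0")
    # Assumption: a score exactly on a shared boundary goes to the higher band.
    if score >= 0.5:
        return "primary"
    if score >= 0.3:
        return "supporting"
    if score >= 0.15:
        return "referenced"
    return "passing"
```

For example, the 0.8 and 0.1 scores from the introduction fall into the primary and passing bands respectively.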

How to Improve Your Score

Mention key entities early — the position signal rewards entities that appear in the first 20% of content
Use entities in headings — placing an entity in an H2 or H3 adds up to 3% to its salience
Include entities in the title — worth up to 7% of the salience score
Reference entities consistently — 5-10 natural mentions across different sections score well
Co-locate entities with keywords — keep target entities near your tracked keywords for the proximity signal
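
To check the first tip on a draft, it can help to measure how far into the text an entity first appears. The helper below is a hypothetical sketch, not part of korvex. It assumes position is counted in characters and that matching is a simple case-insensitive substring search, which is cruder than real entity recognition.

```python
from typing import Optional


def first_mention_fraction(text: str, entity: str) -> Optional[float]:
    """Return how far into the text (0.0 = start) the entity first appears.

    Returns None when the entity is absent or the text is empty.
    """
    if not text or not entity:
        return None
    index = text.lower().find(entity.lower())
    if index == -1:
        return None
    return index / len(text)


def appears_early(text: str, entity: str, cutoff: float = 0.20) -> bool:
    """True when the first mention falls within the first `cutoff` share of the text."""
    fraction = first_mention_fraction(text, entity)
    return fraction is not None and fraction <= cutoff
```

A result of 0.0 means the entity opens the text; anything above 0.20 falls outside the window the first tip describes.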

<details> <summary>Technical Deep Dive</summary>

Salience Formula

salience = (frequency × 0.30) + (position × 0.25) + (proximity × 0.20) + (title_heading × 0.10) + (contextual × 0.15)

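Where each component is a raw score between 0.0 and 1.0, and its weight is applied once, in the formula above:

Frequency: min(occurrence_count / 10.0, 1.0), so 10+ occurrences reach the cap (contributes up to 0.30)
Position: 1.0 - (first_position / text_length), so an entity at position 0 (the start) scores the maximum (contributes up to 0.25)
Keyword Proximity: 1.0 / (1.0 + min_distance / 1000.0), so entities closer to keywords score higher (contributes up to 0.20)
Title/Heading: 0.7 if in the title plus 0.3 if in a heading, capped at 1.0 (contributes +0.07 and +0.03, capped at 0.10)
Contextual Importance: cosine_similarity(entity_embedding, mean_context_embedding) (contributes up to 0.15)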
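
The weighted sum can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production scorer: every component is treated as a raw 0.0-1.0 score with its weight applied exactly once, a negative cosine similarity is clamped to 0, and the function and parameter names are hypothetical.

```python
WEIGHTS = {
    "frequency": 0.30,
    "position": 0.25,
    "proximity": 0.20,
    "title_heading": 0.10,
    "contextual": 0.15,
}


def salience(
    occurrence_count: int,
    first_position: int,
    text_length: int,
    min_distance: float,
    in_title: bool,
    in_heading: bool,
    contextual_similarity: float,
) -> float:
    """Combine the five raw component scores into one salience value."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")

    frequency = min(occurrence_count / 10.0, 1.0)        # 10+ mentions hit the cap
    position = 1.0 - (first_position / text_length)      # earlier first mention is better
    proximity = 1.0 / (1.0 + min_distance / 1000.0)      # closer to keywords is better
    title_heading = min((0.7 if in_title else 0.0) + (0.3 if in_heading else 0.0), 1.0)
    # Assumption: negative similarity is treated as no contextual contribution.
    contextual = max(0.0, min(contextual_similarity, 1.0))

    return (
        frequency * WEIGHTS["frequency"]
        + position * WEIGHTS["position"]
        + proximity * WEIGHTS["proximity"]
        + title_heading * WEIGHTS["title_heading"]
        + contextual * WEIGHTS["contextual"]
    )
```

Under these assumptions, an entity mentioned 10 times from the very start, adjacent to a keyword, present in both title and heading, and perfectly aligned with the page topic scores the maximum of 1.0.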

Entity Extraction Pipeline

NER: spaCy en_core_web_sm identifies named entities and their types
Type mapping: spaCy types → standardised types (e.g., ORG → ORGANIZATION, GPE/LOC/FAC → LOCATION)
Filtering: entities must be longer than 2 characters and contain at least one alphabetic character
Quality filter: 3-layer filter removes technical artifacts, template text, and low-quality entities
Salience scoring: 5-component weighted formula applied to each surviving entity
Corpus storage: entities stored in corpus_entity_occurrences with page reference and salience score
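
The basic filtering step is simple enough to sketch directly. The snippet below covers only the length and alphabetic-character rule; the 3-layer quality filter is not specified here, so it is deliberately left out. The function names are hypothetical, and stripping surrounding whitespace before measuring length is an assumption.

```python
from typing import Iterable, List


def passes_basic_filter(entity_text: str) -> bool:
    """Keep entities longer than 2 characters with at least one letter."""
    text = entity_text.strip()  # assumption: surrounding whitespace is ignored
    return len(text) > 2 and any(ch.isalpha() for ch in text)


def filter_entities(entity_texts: Iterable[str]) -> List[str]:
    """Apply the basic filter to a sequence of candidate entity strings."""
    return [text for text in entity_texts if passes_basic_filter(text)]
```

Candidates that pass this step would still go through the quality filter before salience scoring.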

Entity Types

Standardised Type	Source Types	Example
PERSON	PERSON	"Koray Tugberk"
ORGANIZATION	ORG	"Google", "korvex"
LOCATION	GPE, LOC, FAC	"London", "Canary Wharf"
CONSUMER_GOOD	PRODUCT	"iPhone 15"
EVENT	EVENT	"Google I/O"
NUMBER	PERCENT, QUANTITY, CARDINAL	"45%", "3.5 million"
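
The table above translates directly into a lookup. This is a sketch of the type-mapping step rather than the actual implementation: the names are hypothetical, only the rows listed in the table are included, and returning None for any other spaCy label is an assumption.

```python
from typing import Optional

# spaCy NER label -> standardised type, one entry per source type in the table.
SPACY_TO_STANDARD = {
    "PERSON": "PERSON",
    "ORG": "ORGANIZATION",
    "GPE": "LOCATION",
    "LOC": "LOCATION",
    "FAC": "LOCATION",
    "PRODUCT": "CONSUMER_GOOD",
    "EVENT": "EVENT",
    "PERCENT": "NUMBER",
    "QUANTITY": "NUMBER",
    "CARDINAL": "NUMBER",
}


def standardise_type(spacy_label: str) -> Optional[str]:
    """Return the standardised type for a spaCy label, or None if unmapped."""
    # Assumption: labels not listed in the table are treated as unmapped.
    return SPACY_TO_STANDARD.get(spacy_label)
```

With spaCy, the label passed in would be the `label_` attribute of each span in `doc.ents`.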

Data Sources

Extraction: Phase 5 page scoring pipeline
Embeddings: SentenceTransformer all-MiniLM-L6-v2 (384 dimensions)
Storage: corpus_entity_occurrences table (entity × page × salience)
Knowledge graph: Linkable types (PERSON, ORGANIZATION, LOCATION, EVENT, WORK_OF_ART) stored in Neo4j
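
The contextual importance component compares an entity embedding with the mean of the context embeddings. The dependency-free sketch below shows that comparison on plain lists of floats. It assumes the vectors have already been produced by the embedding model (384 dimensions each in practice), and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contextual_contribution(entity_vec, context_vecs, weight: float = 0.15) -> float:
    """Weighted contextual importance for one entity."""
    return cosine_similarity(entity_vec, mean_vector(context_vecs)) * weight
```

An entity whose embedding points in the same direction as the averaged context contributes the full 0.15; an unrelated one contributes close to nothing.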

Related Concepts

The Koray Score — entity coverage feeds the Central Entity fundamental
Information Gain — unique entities drive the information gain score
Semantic Networks — entities form nodes in the internal linking graph

</details>