What It Is

Entity salience is a 0.0-1.0 score that measures how prominent and important a specific named entity is within a piece of content. A salience of 0.8 means the entity is central to the page's topic; a salience of 0.1 means it's a passing mention. Google uses entity salience in its Natural Language API and ranking systems to understand what a page is truly about — not just what keywords it contains.

Why It Matters for Your SEO

Search engines have evolved from keyword matching to entity understanding. Google builds knowledge graphs connecting entities (people, places, products, organisations) and evaluates whether your content demonstrates genuine topical coverage. Pages that mention key entities with high salience — in titles, headings, and throughout the body — signal deep relevance. Pages where important entities only appear once in a footnote signal shallow coverage.

How Korvex Measures It

Each entity receives a salience score from 5 weighted components:

| Component | Weight | What It Measures |
| --- | --- | --- |
| Frequency | 30% | How often the entity appears (10+ mentions approaches maximum) |
| Position | 25% | Where the entity first appears (earlier = higher salience) |
| Keyword Proximity | 20% | Distance between the entity and target keywords |
| Title/Heading Presence | 10% | Whether the entity appears in the page title (+7%) or headings (+3%) |
| Contextual Importance | 15% | Semantic similarity between the entity and the overall page topic |

Salience Thresholds

| Range | Meaning |
| --- | --- |
| 0.5-1.0 | Primary entity: the page is fundamentally about this entity |
| 0.3-0.5 | Supporting entity: important to the topic, mentioned substantially |
| 0.15-0.3 | Referenced entity: relevant but not central |
| 0.0-0.15 | Passing mention: barely relevant to the page's core topic |
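
The thresholds above can be expressed as a small lookup. This is a minimal sketch, not Korvex's implementation: the function name `salience_band` and the short band labels are hypothetical, and because the published ranges share their edge values, the sketch assumes a boundary score (for example exactly 0.3) belongs to the higher band.

```python
def salience_band(score: float) -> str:
    """Map a salience score to one of the four documented bands.

    Assumption: a score sitting exactly on a boundary is placed in the
    higher band, since the documented ranges overlap at their edges.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("salience must be between 0.0 and 1.0")
    if score >= 0.5:
        return "primary"
    if score >= 0.3:
        return "supporting"
    if score >= 0.15:
        return "referenced"
    return "passing"


if __name__ == "__main__":
    for value in (0.8, 0.35, 0.2, 0.1):
        print(value, salience_band(value))
```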

How to Improve Your Score

  1. Mention key entities early — the position signal rewards entities that appear in the first 20% of content
  2. Use entities in headings — placing an entity in an H2 or H3 adds up to 3% to its salience
  3. Include entities in the title — worth up to 7% of the salience score
  4. Reference entities consistently — 5-10 natural mentions across different sections scores well
  5. Co-locate entities with keywords — keep target entities near your tracked keywords for the proximity signal
<details> <summary>Technical Deep Dive</summary>

Salience Formula

salience = (frequency × 0.30) + (position × 0.25) + (proximity × 0.20) + (title_heading × 0.10) + (contextual × 0.15)

Where:

  Each component is a value between 0.0 and 1.0; the weights are applied once, in the formula above.

  • Frequency: min(occurrence_count / 10.0, 1.0), so 10 or more occurrences reach the cap
  • Position: 1.0 - (first_position / text_length), so an entity at position 0 (the start) scores the maximum
  • Keyword Proximity: 1.0 / (1.0 + min_distance / 1000.0), so entities closer to keywords score higher
  • Title/Heading: 0.7 if in the title plus 0.3 if in a heading, capped at 1.0 (after weighting, +0.07 and +0.03, at most 0.10)
  • Contextual Importance: cosine_similarity(entity_embedding, mean_context_embedding)
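
To make the arithmetic concrete, here is a minimal Python sketch of the weighted sum. It assumes each component is first normalised to 0.0-1.0 and then multiplied by its weight exactly once. The `EntitySignals` container, its field names, and the clamping of the similarity value are illustrative assumptions rather than Korvex's actual code.

```python
from dataclasses import dataclass

# Weights as documented in the salience formula.
WEIGHTS = {
    "frequency": 0.30,
    "position": 0.25,
    "proximity": 0.20,
    "title_heading": 0.10,
    "contextual": 0.15,
}


@dataclass
class EntitySignals:
    """Hypothetical container for the raw inputs of one entity."""
    occurrence_count: int
    first_position: int           # character offset of the first mention
    text_length: int              # total characters in the page text
    min_distance: int             # characters to the nearest target keyword
    in_title: bool
    in_heading: bool
    contextual_similarity: float  # cosine similarity, assumed precomputed


def salience(s: EntitySignals) -> float:
    """Combine the five normalised components into one 0.0-1.0 score."""
    if s.text_length <= 0:
        raise ValueError("text_length must be positive")
    components = {
        "frequency": min(s.occurrence_count / 10.0, 1.0),
        "position": 1.0 - (s.first_position / s.text_length),
        "proximity": 1.0 / (1.0 + s.min_distance / 1000.0),
        # 0.7 for the title and 0.3 for a heading become +0.07 and +0.03
        # once the 0.10 weight is applied.
        "title_heading": min(0.7 * s.in_title + 0.3 * s.in_heading, 1.0),
        # Assumption: negative similarities are clamped to zero.
        "contextual": max(0.0, min(s.contextual_similarity, 1.0)),
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


if __name__ == "__main__":
    example = EntitySignals(
        occurrence_count=1, first_position=500, text_length=1000,
        min_distance=1000, in_title=False, in_heading=False,
        contextual_similarity=0.0,
    )
    print(round(salience(example), 3))
```

In the usage example, a single mention halfway through a 1,000-character page, 1,000 characters from the nearest keyword, contributes 0.03 for frequency, 0.125 for position, and 0.10 for proximity, which lands in the referenced band.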

Entity Extraction Pipeline

  1. NER: spaCy en_core_web_sm identifies named entities and their types
  2. Type mapping: spaCy types → standardised types (e.g., ORG → ORGANIZATION, GPE/LOC/FAC → LOCATION)
  3. Filtering: entities must be > 2 characters with at least one alphabetic character
  4. Quality filter: 3-layer filter removes technical artifacts, template text, and low-quality entities
  5. Salience scoring: 5-component weighted formula applied to each surviving entity
  6. Corpus storage: entities stored in corpus_entity_occurrences with page reference and salience score
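
Steps 2 and 3 of the pipeline can be sketched without loading spaCy by treating the NER output as `(text, label)` pairs. This is an illustration under stated assumptions: the names `TYPE_MAP`, `passes_basic_filter`, and `standardise` are hypothetical, the mapping is taken from the Entity Types table, and labels that the table does not list are passed through unchanged, which is a guess about the real behaviour. The 3-layer quality filter is not reproduced here because its rules are not documented.

```python
# spaCy label -> standardised type, following the Entity Types table.
TYPE_MAP = {
    "PERSON": "PERSON",
    "ORG": "ORGANIZATION",
    "GPE": "LOCATION",
    "LOC": "LOCATION",
    "FAC": "LOCATION",
    "PRODUCT": "CONSUMER_GOOD",
    "EVENT": "EVENT",
    "PERCENT": "NUMBER",
    "QUANTITY": "NUMBER",
    "CARDINAL": "NUMBER",
}


def passes_basic_filter(text: str) -> bool:
    """Step 3: more than 2 characters and at least one alphabetic character."""
    cleaned = text.strip()
    return len(cleaned) > 2 and any(ch.isalpha() for ch in cleaned)


def standardise(entities):
    """Apply the basic filter, then map each surviving label.

    `entities` is an iterable of (text, spacy_label) pairs.
    Assumption: labels missing from TYPE_MAP are kept as-is.
    """
    result = []
    for text, label in entities:
        if not passes_basic_filter(text):
            continue
        result.append((text.strip(), TYPE_MAP.get(label, label)))
    return result


if __name__ == "__main__":
    raw = [("Google", "ORG"), ("UK", "GPE"), ("2024", "CARDINAL"),
           ("Canary Wharf", "FAC")]
    print(standardise(raw))
```

With real spaCy output, the pairs would come from `(ent.text, ent.label_)` for each `ent` in `doc.ents`.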

Entity Types

| Standardised Type | Source Types | Example |
| --- | --- | --- |
| PERSON | PERSON | "Koray Tugberk" |
| ORGANIZATION | ORG | "Google", "Korvex" |
| LOCATION | GPE, LOC, FAC | "London", "Canary Wharf" |
| CONSUMER_GOOD | PRODUCT | "iPhone 15" |
| EVENT | EVENT | "Google I/O" |
| NUMBER | PERCENT, QUANTITY, CARDINAL | "45%", "3.5 million" |

Data Sources

  • Extraction: Phase 5 page scoring pipeline
  • Embeddings: SentenceTransformer all-MiniLM-L6-v2 (384 dimensions)
  • Storage: corpus_entity_occurrences table (entity × page × salience)
  • Knowledge graph: Linkable types (PERSON, ORGANIZATION, LOCATION, EVENT, WORK_OF_ART) stored in Neo4j
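
The contextual importance component compares an entity's embedding with the mean of the context embeddings. The sketch below shows that comparison in plain Python with toy 2-dimensional vectors. In practice the vectors would be the 384-dimensional outputs of the embedding model listed above; the helper names and the choice to return 0.0 for a zero-length vector are assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    count = len(vectors)
    if count == 0:
        raise ValueError("at least one context vector is required")
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contextual_importance(entity_embedding, context_embeddings):
    """Similarity between the entity and the mean context embedding."""
    return cosine_similarity(entity_embedding, mean_vector(context_embeddings))


if __name__ == "__main__":
    entity = [1.0, 0.0]
    context = [[1.0, 0.0], [0.0, 1.0]]
    print(round(contextual_importance(entity, context), 3))
```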
</details>
Last updated: 2026-03-20