What It Is
Information Gain is a 0-100 score that measures how much unique value your content provides compared with competing pages ranking for the same keywords. It answers the question: "Does this page tell the reader something they can't find elsewhere?" The idea mirrors Google's Information Gain patent (US20200349181A1). The patent describes scoring a document by how much new information it adds beyond documents the user has already seen, and using that score to decide which documents to surface.
Why It Matters for Your SEO
In competitive SERPs, pages that merely rewrite what every other result says offer little or no information gain. Google aims to reward pages that contribute unique entities, original concepts, proprietary data, or novel perspectives. A high Information Gain score means your content is genuinely additive rather than another commodity article. Pages with scores above 60 consistently outperform similar-quality pages that lack unique content.
How Korvex Measures It
The score combines three components:
| Component | Points | What It Measures |
|---|---|---|
| Unique Entities | 0-40 | Named things (people, products, organisations) that appear in your content but not in competing pages |
| Unique Concepts | 0-40 | Ideas, phrases, and topic clusters unique to your page |
| Content Depth | 0-20 | Structural quality — word count, heading structure, paragraph depth |
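The three maximums add up to exactly 100, which suggests the overall score is a plain sum of the components. Here is a minimal sketch under that assumption. The function and dictionary names are hypothetical, not Korvex's API.

```python
# Hypothetical helper: assumes the overall score is the plain sum of the
# three components, each checked against its documented maximum.
COMPONENT_MAX = {
    "unique_entities": 40.0,
    "unique_concepts": 40.0,
    "content_depth": 20.0,
}


def information_gain_score(unique_entities: float, unique_concepts: float,
                           content_depth: float) -> float:
    """Combine the three component scores into a 0-100 Information Gain score."""
    parts = {
        "unique_entities": unique_entities,
        "unique_concepts": unique_concepts,
        "content_depth": content_depth,
    }
    for name, value in parts.items():
        if not 0.0 <= value <= COMPONENT_MAX[name]:
            raise ValueError(f"{name} must be in 0-{COMPONENT_MAX[name]:g}, got {value}")
    # The maximums sum to 100, so no extra clamping is needed.
    return sum(parts.values())


print(information_gain_score(28.5, 22.0, 15.5))  # → 66.0
```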
Score Ranges
| Range | Rating | What It Means |
|---|---|---|
| 70-100 | High Gain | Substantial original content — strong differentiation |
| 50-69 | Moderate Gain | Some unique angles but overlaps significantly with competitors |
| 30-49 | Low Gain | Mostly commodity content with limited original contribution |
| 0-29 | Minimal Gain | Nearly all content duplicates what competitors already cover |
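The band lookup can be sketched as a descending list of lower bounds. The function name is hypothetical. The table does not settle how fractional scores fall, so treating each lower bound as inclusive (69.5 rates as Moderate Gain) is an assumption.

```python
# Lower bound of each band from the Score Ranges table, highest first.
# Assumption: bounds are inclusive, so fractional scores like 69.5 fall
# into the band below 70.
RATING_BANDS = [
    (70.0, "High Gain"),
    (50.0, "Moderate Gain"),
    (30.0, "Low Gain"),
    (0.0, "Minimal Gain"),
]


def rating_for(score: float) -> str:
    """Return the rating label for a 0-100 Information Gain score."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be in 0-100, got {score}")
    # First band whose lower bound the score reaches; the 0.0 band always matches.
    return next(label for threshold, label in RATING_BANDS if score >= threshold)


for s in (85, 66, 42, 12):
    print(s, rating_for(s))
```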
How to Improve Your Score
- Add proprietary data — original research, case studies with real numbers, surveys, or benchmarks
- Cover entities competitors miss — identify entity gaps using the Entity Intelligence page and fill them
- Go deeper on subtopics — expand thin sections with original analysis rather than surface-level summaries
- Include expert perspectives — quotes, interviews, or commentary from practitioners
- Structure for depth — use 10+ headings and 15+ paragraphs to demonstrate comprehensive coverage
Scoring Components
Unique Entities (0-40 points):
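- Base score (0-30): `base_score = uniqueness_ratio × 30`, where `uniqueness_ratio` is the share of your entities NOT found in competitor pages
- Count bonus (0-10): `bonus = min(unique_count / 20, 1.0) × 10`, rewarding the absolute number of unique entities
- Component score: `base_score + bonus`, for a maximum of 30 + 10 = 40 points
- Semantic deduplication: entities with cosine similarity > 0.85 to a competitor entity count as duplicates
- Uses spaCy entity extraction + SentenceTransformer embeddings (all-MiniLM-L6-v2, 384-dim)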
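The entity component can be sketched with NumPy. This sketch assumes entity embeddings are already computed (in the pipeline described here they come from all-MiniLM-L6-v2, giving 384-dimensional rows). It also assumes an entity is unique when its best cosine match among competitor entities is at most 0.85. The function name and the toy 2-dimensional vectors are illustrative only.

```python
import numpy as np

# Assumption: embeddings are precomputed and every row is non-zero.
DUPLICATE_THRESHOLD = 0.85  # cosine similarity above this counts as a duplicate


def unique_entity_score(own: np.ndarray, competitor: np.ndarray) -> float:
    """Score 0-40 from entity embeddings (hypothetical helper).

    own:        (n, d) array, one embedding per entity on your page
    competitor: (m, d) array, one embedding per entity on competitor pages
    """
    if len(own) == 0:
        return 0.0
    if len(competitor) == 0:
        is_unique = np.ones(len(own), dtype=bool)
    else:
        # Normalise rows so the dot product equals cosine similarity.
        a = own / np.linalg.norm(own, axis=1, keepdims=True)
        b = competitor / np.linalg.norm(competitor, axis=1, keepdims=True)
        best_match = (a @ b.T).max(axis=1)  # closest competitor entity per own entity
        is_unique = best_match <= DUPLICATE_THRESHOLD
    unique_count = int(is_unique.sum())
    uniqueness_ratio = unique_count / len(own)
    base_score = uniqueness_ratio * 30            # 0-30
    bonus = min(unique_count / 20, 1.0) * 10      # 0-10, saturates at 20 entities
    return base_score + bonus


own = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
competitor = np.array([[1.0, 0.0]])
print(round(unique_entity_score(own, competitor), 2))  # → 21.0
```

In the example, the first entity matches a competitor exactly (cosine 1.0) and counts as a duplicate. The other two score 0.0 and about 0.71, so 2 of 3 are unique: 20 base points plus a 1-point bonus.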
Unique Concepts (0-40 points):
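- Same formula as entities, applied to concept phrases (unigrams ≥ 4 chars, bigrams, trigrams), except that the count bonus saturates at 50 unique concepts: `bonus = min(unique_count / 50, 1.0) × 10`
- Top 200 concepts per side (your page and the competitor set) are sampled for performance
- Stop-word filtered and BERT-normalised (NFD decomposition, then combining marks stripped)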
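The concept pipeline can be sketched with the standard library alone. This is an illustrative approximation, not the production code, and it rests on several assumptions:

- The stop-word list is a tiny placeholder.
- Stop words are removed before n-grams are formed.
- "Top 200" means the 200 most frequent concepts.
- Concepts are compared by exact string match after normalisation. The production pipeline may also apply the semantic deduplication used for entities.

All function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Illustrative stop-word list only; the production list is not documented here.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of",
              "on", "the", "this", "that", "to", "with", "your"}


def normalise(text: str) -> str:
    """BERT-style normalisation: lowercase, NFD-decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def concept_counts(text: str) -> Counter:
    """Count unigrams (>= 4 chars), bigrams and trigrams after stop-word removal."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", normalise(text)) if t not in STOP_WORDS]
    counts = Counter(t for t in tokens if len(t) >= 4)
    for n in (2, 3):
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def top_concepts(counts: Counter, top_n: int = 200) -> set[str]:
    """Keep the top_n most frequent concepts (the per-side sample)."""
    return {concept for concept, _ in counts.most_common(top_n)}


def unique_concept_score(own_text: str, competitor_texts: list[str], top_n: int = 200) -> float:
    """Score 0-40 using exact matching of normalised concept strings."""
    own = top_concepts(concept_counts(own_text), top_n)
    if not own:
        return 0.0
    competitor_counts = Counter()
    for text in competitor_texts:
        competitor_counts.update(concept_counts(text))
    competitor = top_concepts(competitor_counts, top_n)
    unique_count = len(own - competitor)
    base_score = unique_count / len(own) * 30     # 0-30
    bonus = min(unique_count / 50, 1.0) * 10      # 0-10, saturates at 50 concepts
    return base_score + bonus


print(round(unique_concept_score("Solar panel efficiency", ["Solar panel cost"]), 2))  # → 15.6
```

Here 3 of the page's 6 concepts ("efficiency", "panel efficiency", "solar panel efficiency") are absent from the competitor. That gives 15 base points plus a 0.6-point bonus.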
Content Depth (0-20 points):
- Word count (0-8 pts): 2000+ words = 8, 1500+ = 6, 1000+ = 4, 500+ = 2
- Heading structure (0-6 pts): `min(heading_count / 10, 1.0) × 6`
- Paragraph structure (0-4 pts): `min(paragraph_count / 15, 1.0) × 4`
- Technical detail (0-2 pts): average words per paragraph ≥ 100 = 2, ≥ 50 = 1.5
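The depth rules translate directly into code. The rules above leave two details implicit, and this hypothetical sketch assumes both. First, pages under 500 words score 0 word-count points. Second, average words per paragraph is total words divided by paragraph count.

```python
def content_depth_score(word_count: int, heading_count: int, paragraph_count: int) -> float:
    """Score 0-20 from basic structure counts (hypothetical helper)."""
    # Word count (0-8 pts); pages under 500 words are assumed to score 0.
    if word_count >= 2000:
        word_pts = 8.0
    elif word_count >= 1500:
        word_pts = 6.0
    elif word_count >= 1000:
        word_pts = 4.0
    elif word_count >= 500:
        word_pts = 2.0
    else:
        word_pts = 0.0

    heading_pts = min(heading_count / 10, 1.0) * 6       # 0-6, full marks at 10 headings
    paragraph_pts = min(paragraph_count / 15, 1.0) * 4   # 0-4, full marks at 15 paragraphs

    # Technical detail (0-2 pts); average = total words / paragraphs (assumption).
    avg_words = word_count / paragraph_count if paragraph_count else 0.0
    if avg_words >= 100:
        detail_pts = 2.0
    elif avg_words >= 50:
        detail_pts = 1.5
    else:
        detail_pts = 0.0

    return word_pts + heading_pts + paragraph_pts + detail_pts


print(content_depth_score(1200, 5, 20))  # → 12.5
```

The paragraph and technical-detail rules interact. Fifteen paragraphs averaging 100+ words already imply 1,500+ words, so a page that maxes both also earns at least 6 word-count points.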
Data Sources
- Competitor content: Fetched during , stored in `page_scores` with `is_competitor = true`
- Entity extraction: `services/analyzers/entity_extractor.py` using spaCy `en_core_web_sm`
- Embedding model: `sentence-transformers/all-MiniLM-L6-v2` (384 dimensions, cosine similarity)
- Update frequency: Recalculated when the page is re-scored in Phase 5
Related Concepts
- Entity Salience — how individual entities are weighted
- The Koray Score — information gain contributes to overall content quality
- Content Opportunities — finding topics with high information gain potential