What It Is

Information Gain is a 0-100 score that measures how much unique value your content provides compared to competing pages ranking for the same keywords. It answers the question: "Does this page tell the reader something they can't find elsewhere?" Google's Information Gain patent (US20200349181A1) describes exactly this concept — pages that add new information to a topic deserve higher rankings.

Why It Matters for Your SEO

In competitive SERPs, pages that merely rewrite what every other result says offer zero information gain. Google actively looks for pages that contribute unique entities, original concepts, proprietary data, or novel perspectives. A high Information Gain score means your content is genuinely additive — not just another commodity article. Pages with scores above 60 consistently outperform similar-quality pages that lack unique content.

How Korvex Measures It

The score combines three components:

| Component | Points | What It Measures |
| --- | --- | --- |
| Unique Entities | 0-40 | Named things (people, products, organisations) that appear in your content but not in competitors |
| Unique Concepts | 0-40 | Ideas, phrases, and topic clusters unique to your page |
| Content Depth | 0-20 | Structural quality: word count, heading structure, paragraph depth |
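
As a rough illustration of how the three components fit together, here is a minimal sketch that assumes the overall score is the plain sum of the component points (the caps of 40, 40, and 20 add up to 100). The `GainComponents` class and its field names are hypothetical and are not part of Korvex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GainComponents:
    """Hypothetical container for the three component scores."""
    unique_entities: float  # 0-40 points
    unique_concepts: float  # 0-40 points
    content_depth: float    # 0-20 points

    def total(self) -> float:
        """Return the 0-100 score, assuming the components are simply added."""
        limits = (
            ("unique_entities", self.unique_entities, 40),
            ("unique_concepts", self.unique_concepts, 40),
            ("content_depth", self.content_depth, 20),
        )
        for name, value, cap in limits:
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}")
        return self.unique_entities + self.unique_concepts + self.content_depth
```

For example, `GainComponents(30, 25, 15).total()` gives 70 under this assumption.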

Score Ranges

| Range | Rating | What It Means |
| --- | --- | --- |
| 70-100 | High Gain | Substantial original content with strong differentiation |
| 50-69 | Moderate Gain | Some unique angles but overlaps significantly with competitors |
| 30-49 | Low Gain | Mostly commodity content with limited original contribution |
| 0-29 | Minimal Gain | Nearly all content duplicates what competitors already cover |
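
The ranges above can be read as a simple lookup. The sketch below is illustrative only: the function name `rating_for` is hypothetical, and treating fractional scores by their lower bound (so 69.5 falls in Moderate Gain) is an assumption, since the table lists whole numbers.

```python
def rating_for(score: float) -> str:
    """Map a 0-100 Information Gain score to its rating band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "High Gain"
    if score >= 50:
        return "Moderate Gain"
    if score >= 30:
        return "Low Gain"
    return "Minimal Gain"
```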

How to Improve Your Score

  1. Add proprietary data — original research, case studies with real numbers, surveys, or benchmarks
  2. Cover entities competitors miss — identify entity gaps using the Entity Intelligence page and fill them
  3. Go deeper on subtopics — expand thin sections with original analysis rather than surface-level summaries
  4. Include expert perspectives — quotes, interviews, or commentary from practitioners
  5. Structure for depth — use 10+ headings and 15+ paragraphs to demonstrate comprehensive coverage
<details> <summary>Technical Deep Dive</summary>

Scoring Components

Unique Entities (0-40 points):

  • base_score = uniqueness_ratio × 30 (ratio of entities NOT found in competitors)
  • bonus = min(unique_count / 20, 1.0) × 10 (absolute count bonus)
  • Semantic deduplication: entities with cosine similarity > 0.85 to a competitor entity count as duplicates
  • Uses spaCy entity extraction + SentenceTransformer embeddings (all-MiniLM-L6-v2, 384-dim)
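
The entity formula can be sketched in plain Python. This is a simplified illustration, not the production code: it assumes entity embeddings have already been computed (for instance with the model named above) and are passed in as plain lists of floats, and it assumes `uniqueness_ratio` is the number of unique entities divided by the total number of entities on the page. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unique_entity_score(page, competitors, threshold=0.85):
    """Score 0-40 from two dicts mapping entity name -> embedding vector.

    An entity counts as a duplicate when its similarity to any competitor
    entity is strictly greater than `threshold`.
    """
    if not page:
        return 0.0
    unique = [
        name
        for name, vec in page.items()
        if all(cosine(vec, other) <= threshold for other in competitors.values())
    ]
    uniqueness_ratio = len(unique) / len(page)  # assumed denominator: all page entities
    base_score = uniqueness_ratio * 30
    bonus = min(len(unique) / 20, 1.0) * 10
    return base_score + bonus
```

With three page entities of which one is unique, this yields a base of 10 plus a bonus of 0.5. The count bonus only reaches its full 10 points at 20 or more unique entities.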

Unique Concepts (0-40 points):

  • Same formula as entities but for concept phrases (unigrams ≥4 chars, bigrams, trigrams)
  • Top 200 concepts per side sampled for performance
  • Stop-word filtered, BERT-normalised (NFD + strip combining marks)
  • bonus = min(unique_count / 50, 1.0) × 10
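
A minimal sketch of the concept pipeline, using only the standard library. Several details are assumptions rather than documented behaviour: the tiny `STOP_WORDS` set is a placeholder, text is lowercased, stop words are removed before n-grams are formed, and "top 200" is interpreted as the 200 most frequent phrases. The function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# Placeholder stop-word list for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"})


def normalise(text):
    """Lowercase, decompose to NFD, and strip combining marks (e.g. 'Café' -> 'cafe')."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def concept_phrases(text, top_n=200):
    """Return up to `top_n` unigrams (>= 4 chars), bigrams, and trigrams by frequency."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", normalise(text)) if t not in STOP_WORDS]
    counts = Counter()
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if n == 1 and len(gram[0]) < 4:
                continue  # short unigrams are dropped
            counts[" ".join(gram)] += 1
    return [phrase for phrase, _ in counts.most_common(top_n)]


def unique_concept_score(unique_count, total_count):
    """Score 0-40: same shape as the entity formula, with a count bonus capped at 50."""
    if total_count <= 0:
        return 0.0
    base_score = (unique_count / total_count) * 30
    bonus = min(unique_count / 50, 1.0) * 10
    return base_score + bonus
```

How a concept is matched against competitor concepts (exact string match versus embedding similarity) is not specified here, so the sketch takes the unique count as an input rather than computing it.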

Content Depth (0-20 points):

  • Word count (0-8 pts): 2000+ words = 8, 1500+ = 6, 1000+ = 4, 500+ = 2
  • Heading structure (0-6 pts): min(heading_count / 10, 1.0) × 6
  • Paragraph structure (0-4 pts): min(paragraph_count / 15, 1.0) × 4
  • Technical detail (0-2 pts): average words per paragraph ≥ 100 = 2, ≥ 50 = 1.5
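
The four depth rules translate directly into a small function. In this sketch, the function name is hypothetical, and average words per paragraph is assumed to be the total word count divided by the paragraph count. Pages under 500 words, or averaging under 50 words per paragraph, are assumed to earn 0 for that rule.

```python
def content_depth_score(word_count, heading_count, paragraph_count):
    """Return the 0-20 Content Depth score from basic page statistics."""
    # Word count: stepped thresholds, 0-8 points.
    if word_count >= 2000:
        word_pts = 8
    elif word_count >= 1500:
        word_pts = 6
    elif word_count >= 1000:
        word_pts = 4
    elif word_count >= 500:
        word_pts = 2
    else:
        word_pts = 0

    # Headings and paragraphs: linear up to a cap (10 headings, 15 paragraphs).
    heading_pts = min(heading_count / 10, 1.0) * 6
    paragraph_pts = min(paragraph_count / 15, 1.0) * 4

    # Technical detail: based on average paragraph length.
    avg_words = word_count / paragraph_count if paragraph_count else 0
    if avg_words >= 100:
        detail_pts = 2
    elif avg_words >= 50:
        detail_pts = 1.5
    else:
        detail_pts = 0

    return word_pts + heading_pts + paragraph_pts + detail_pts
```

A 2,000-word page with 10 headings and 15 paragraphs reaches the full 20 points, which matches the "10+ headings and 15+ paragraphs" guidance earlier on this page.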

Data Sources

  • Competitor content: Fetched during , stored in page_scores with is_competitor = true
  • Entity extraction: services/analyzers/entity_extractor.py using spaCy en_core_web_sm
  • Embedding model: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions, cosine similarity)
  • Update frequency: Recalculated when page is re-scored in Phase 5
</details>
Last updated: 2026-03-20