Question Type Detection and Entity Type Weighting

Overview

The Question Type Detection system automatically analyzes queries to determine what kind of answer the user is looking for, then adjusts entity type weights accordingly. This dramatically improves search accuracy by understanding user intent.

How It Works

1. Question Type Detection

The system recognizes 8 different question types:

Question Type	Patterns	Expected Answer	Example
WHO	who, whom, whose, which person	Person (PER)	"Who was the California Attorney General?"
WHERE	where, which place/location	Location (LOC)	"Where did Kamala Harris work?"
WHEN	when, what time/date/year	Date/Time	"When was Gavin Newsom governor?"
WHAT_ORG	what organization/company	Organization (ORG)	"What organization did she lead?"
WHAT_POSITION	what position/role/title	Position/Role	"What position did Harris hold?"
HOW_MANY	how many/much, what number	Numbers/Quantities	"How many years was she AG?"
WHY	why, what reason/cause	Reasons/Events	"Why did she leave office?"
WHAT_EVENT	what event/happened	Events	"What happened in 2020?"

2. Entity Type Weighting

Based on the detected question type, the system applies different weights to entity types:

WHO Questions (Person-focused)

entity_weights = {
    'PER': 2.0,    # ⭐⭐ Strong preference for persons
    'ORG': 0.5,    # Organizations less relevant
    'LOC': 0.3,    # Locations even less relevant
    'DATE': 0.8,   # Dates somewhat relevant (context)
}

Example:

Query: "Who was the California Attorney General in 2020?"

Node A: "Xavier Becerra" (PER entity)
- Entity type score: 2.0 (high boost) ✅

Node B: "California" (LOC entity)
- Entity type score: 0.3 (penalty) ❌

Result: Person entities rank higher

WHERE Questions (Location-focused)

entity_weights = {
    'LOC': 2.0,    # ⭐⭐ Strong preference for locations
    'GPE': 2.0,    # Geopolitical entities
    'FAC': 1.5,    # Facilities
    'PER': 0.5,    # Persons less relevant
    'ORG': 0.7,    # Organizations somewhat relevant
}

Example:

Query: "Where did Kamala Harris work as district attorney?"

Node A: "San Francisco" (LOC entity)
- Entity type score: 2.0 (high boost) ✅

Node B: "Kamala Harris" (PER entity)
- Entity type score: 0.5 (penalty) ❌

Result: Location entities rank higher

WHEN Questions (Time-focused)

entity_weights = {
    'DATE': 2.0,   # ⭐⭐ Strong preference for dates
    'TIME': 2.0,   # Times
    'EVENT': 1.5,  # Events (often have dates)
    'PER': 0.8,    # Persons somewhat relevant
}

3. Scoring Integration

The entity type score is integrated as the 5th factor in the ranking system:

final_score = 0.30 × semantic_score +        # RRF hybrid search
              0.15 × connection_score +       # Graph connectivity
              0.20 × temporal_score +         # Time period match
              0.20 × query_match_score +      # Term matching
              0.15 × entity_type_score        # Question-type alignment ⭐

Complete Question Type Patterns

WHO - Person Queries

Patterns:

who, whom, whose
which person, what person
name the person

Entity Weights:

PER: 2.0x (strong boost)
ORG: 0.5x (penalty)
LOC: 0.3x (penalty)
DATE: 0.8x (slight penalty)

Use Case: Finding specific people, identifying individuals

WHERE - Location Queries

Patterns:

where
which place/location
in which city/state/country
what location

Entity Weights:

LOC: 2.0x (strong boost)
GPE: 2.0x (geopolitical entities)
FAC: 1.5x (facilities)
PER: 0.5x (penalty)
ORG: 0.7x (slight penalty)

Use Case: Finding places, locations, geographic information

WHEN - Time Queries

Patterns:

when
what time/date/year
in which year
how long
what period

Entity Weights:

DATE: 2.0x (strong boost)
TIME: 2.0x (strong boost)
EVENT: 1.5x (boost)
PER/ORG/LOC: 0.8x (slight penalty)

Use Case: Finding dates, time periods, durations

WHAT_ORG - Organization Queries

Patterns:

what organization/company/agency
which organization/institution
name the organization

Entity Weights:

ORG: 2.0x (strong boost)
PER: 0.7x (slight penalty)
LOC: 0.8x (slight penalty)

Use Case: Finding organizations, companies, agencies

WHAT_POSITION - Role/Title Queries

Patterns:

what position/role/title/job
which position/office
attorney general, governor, senator, etc.

Entity Weights:

PER: 1.5x (boost)
ORG: 1.5x (boost)
LOC: 1.2x (slight boost)
DATE: 1.5x (boost)

Use Case: Finding roles, positions, job titles

HOW_MANY - Quantity Queries

Patterns:

how many/much
what number/percentage/amount

Entity Weights:

CARDINAL: 2.0x (numbers)
MONEY: 2.0x (money amounts)
PERCENT: 2.0x (percentages)
QUANTITY: 2.0x (quantities)

Use Case: Finding numbers, counts, amounts

WHY - Reason Queries

Patterns:

why
what reason/cause
why did

Entity Weights:

EVENT: 1.5x (boost)
PER: 1.2x (slight boost)
ORG: 1.2x (slight boost)
DATE: 1.2x (slight boost)

Use Case: Finding reasons, causes, explanations

WHAT_EVENT - Event Queries

Patterns:

what event/incident/occurrence
what happened

Entity Weights:

EVENT: 2.0x (strong boost)
DATE: 1.5x (boost)
LOC: 1.3x (boost)
PER: 1.2x (slight boost)

Use Case: Finding events, incidents, happenings

Example Comparisons

Example 1: WHO Question

Query: "Who was the California Attorney General in 2020?"

Without Question-Type Detection:

Results:
1. California (LOC) - high semantic match
2. Attorney General (ORG) - role match
3. Xavier Becerra (PER) - person match

With Question-Type Detection:

Question Type: WHO (Person-focused)
Entity Type Weights: PER=2.0x, LOC=0.3x, ORG=0.5x

Results:
1. Xavier Becerra (PER) - person match + 2.0x boost ✅
2. Kamala Harris (PER) - person match + 2.0x boost
3. California (LOC) - high semantic but 0.3x penalty

Improvement: Correctly prioritizes person entities!

Example 2: WHERE Question

Query: "Where did Kamala Harris work as district attorney?"

Without Question-Type Detection:

Results:
1. Kamala Harris (PER) - name match
2. District Attorney (ORG) - role match
3. San Francisco (LOC) - location match

With Question-Type Detection:

Question Type: WHERE (Location-focused)
Entity Type Weights: LOC=2.0x, PER=0.5x, ORG=0.7x

Results:
1. San Francisco (LOC) - location + 2.0x boost ✅
2. California (LOC) - location + 2.0x boost
3. District Attorney (ORG) - role but 0.7x penalty

Improvement: Correctly prioritizes location entities!

Example 3: WHEN Question

Query: "When did Gavin Newsom become governor?"

Without Question-Type Detection:

Results:
1. Gavin Newsom (PER) - name match
2. Governor (ORG) - role match
3. January 7, 2019 (DATE) - date match

With Question-Type Detection:

Question Type: WHEN (Time-focused)
Entity Type Weights: DATE=2.0x, PER=0.8x, ORG=0.8x

Results:
1. January 7, 2019 (DATE) - date + 2.0x boost ✅
2. 2019 (DATE) - year + 2.0x boost
3. Gavin Newsom (PER) - name but 0.8x penalty

Improvement: Correctly prioritizes date entities!

Usage

Basic Usage

from maintest import QuestionTypeDetector

# Initialize detector
detector = QuestionTypeDetector()

# Detect question type
query = "Who was the California Attorney General in 2020?"
question_info = detector.detect_question_type(query)

print(f"Type: {question_info['type']}")
print(f"Description: {question_info['description']}")
print(f"Confidence: {question_info['confidence']}")
print(f"Entity Weights: {question_info['entity_weights']}")

Output:

Type: WHO
Description: Person-focused query
Confidence: 1.0
Entity Weights: {'PER': 2.0, 'ORG': 0.5, 'LOC': 0.3, 'DATE': 0.8}

Integrated Search

# Use question-aware search
await search_episodes_with_question_aware_ranking(
    graphiti,
    "Who was the California Attorney General in 2020?",
    enricher=enricher,
    connection_weight=0.15,
    temporal_weight=0.20,
    query_match_weight=0.20,
    entity_type_weight=0.15  # 15% weight for entity type matching
)

Benefits

1. Intent Understanding

Automatically detects what kind of answer user wants
No manual configuration needed
Works across different query phrasings

2. Improved Accuracy

WHO questions return people, not places
WHERE questions return locations, not people
WHEN questions return dates, not entities

3. Confidence Scoring

Provides confidence score for question type detection
Multiple pattern matches increase confidence
Fallback to general search if uncertain

4. Flexible Weighting

Entity type weight can be adjusted (0.0 to 1.0)
Higher weight = stronger entity type preference
Lower weight = more balanced with other factors

Configuration

Adjusting Entity Type Weight

# Strong entity type preference (20%)
entity_type_weight=0.20

# Moderate entity type preference (15%) - RECOMMENDED
entity_type_weight=0.15

# Light entity type preference (10%)
entity_type_weight=0.10

# No entity type preference (0%)
entity_type_weight=0.0

Custom Question Patterns

You can extend the QUESTION_PATTERNS dictionary to add custom patterns:

QUESTION_PATTERNS['CUSTOM_TYPE'] = {
    'patterns': [r'\bcustom pattern\b'],
    'entity_weights': {'PER': 1.5, 'LOC': 1.2},
    'description': 'Custom query type'
}

Performance Impact

Detection Speed: ~1-5ms per query (regex matching)
Entity Extraction: ~50-500ms per node (depends on spaCy)
Overall Impact: +10-20% search time for significantly better accuracy

Limitations

English-Only: Patterns are English-specific
Pattern-Based: May miss complex or unusual phrasings
Entity Extraction Dependency: Requires NER for best results
Single Question Type: Detects primary question type only

Future Enhancements

Multi-Language Support: Patterns for other languages
ML-Based Detection: Use ML model instead of regex patterns
Multi-Question Handling: Handle compound questions
Context Awareness: Consider conversation history
Custom Entity Types: Support domain-specific entity types
Learning System: Learn from user feedback to improve weights

Best Practices

Use with NER: Entity type weighting works best with NER enrichment
Tune Weights: Adjust entity_type_weight based on your use case
Monitor Confidence: Low confidence may indicate unclear question
Combine with Temporal: Use temporal ranking for date-specific queries
Test Patterns: Test with various query phrasings

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question Type Detection and Entity Type Weighting

Overview

How It Works

1. Question Type Detection

2. Entity Type Weighting

WHO Questions (Person-focused)

WHERE Questions (Location-focused)

WHEN Questions (Time-focused)

3. Scoring Integration

Complete Question Type Patterns

WHO - Person Queries

WHERE - Location Queries

WHEN - Time Queries

WHAT_ORG - Organization Queries

WHAT_POSITION - Role/Title Queries

HOW_MANY - Quantity Queries

WHY - Reason Queries

WHAT_EVENT - Event Queries

Example Comparisons

Example 1: WHO Question

Example 2: WHERE Question

Example 3: WHEN Question

Usage

Basic Usage

Integrated Search

Benefits

1. Intent Understanding

2. Improved Accuracy

3. Confidence Scoring

4. Flexible Weighting

Configuration

Adjusting Entity Type Weight

Custom Question Patterns

Performance Impact

Limitations

Future Enhancements

Best Practices

FilesExpand file tree

QUESTION_TYPE_DETECTION.md

Latest commit

History

QUESTION_TYPE_DETECTION.md

File metadata and controls

Question Type Detection and Entity Type Weighting

Overview

How It Works

1. Question Type Detection

2. Entity Type Weighting

WHO Questions (Person-focused)

WHERE Questions (Location-focused)

WHEN Questions (Time-focused)

3. Scoring Integration

Complete Question Type Patterns

WHO - Person Queries

WHERE - Location Queries

WHEN - Time Queries

WHAT_ORG - Organization Queries

WHAT_POSITION - Role/Title Queries

HOW_MANY - Quantity Queries

WHY - Reason Queries

WHAT_EVENT - Event Queries

Example Comparisons

Example 1: WHO Question

Example 2: WHERE Question

Example 3: WHEN Question

Usage

Basic Usage

Integrated Search

Benefits

1. Intent Understanding

2. Improved Accuracy

3. Confidence Scoring

4. Flexible Weighting

Configuration

Adjusting Entity Type Weight

Custom Question Patterns

Performance Impact

Limitations

Future Enhancements

Best Practices