The Question Type Detection system automatically analyzes queries to determine what kind of answer the user is looking for, then adjusts entity type weights accordingly. This improves search accuracy by ranking entities of the expected answer type more highly.
The system recognizes 8 different question types:
| Question Type | Patterns | Expected Answer | Example |
|---|---|---|---|
| WHO | who, whom, whose, which person | Person (PER) | "Who was the California Attorney General?" |
| WHERE | where, which place/location | Location (LOC) | "Where did Kamala Harris work?" |
| WHEN | when, what time/date/year | Date/Time | "When was Gavin Newsom governor?" |
| WHAT_ORG | what organization/company | Organization (ORG) | "What organization did she lead?" |
| WHAT_POSITION | what position/role/title | Position/Role | "What position did Harris hold?" |
| HOW_MANY | how many/much, what number | Numbers/Quantities | "How many years was she AG?" |
| WHY | why, what reason/cause | Reasons/Events | "Why did she leave office?" |
| WHAT_EVENT | what event/happened | Events | "What happened in 2020?" |
Based on the detected question type, the system applies different weights to entity types. For a WHO question, for example:

```python
entity_weights = {
    'PER': 2.0,   # ⭐⭐ Strong preference for persons
    'ORG': 0.5,   # Organizations less relevant
    'LOC': 0.3,   # Locations even less relevant
    'DATE': 0.8,  # Dates somewhat relevant (context)
}
```

Example:
Query: "Who was the California Attorney General in 2020?"
Node A: "Xavier Becerra" (PER entity)
- Entity type score: 2.0 (high boost) ✅
Node B: "California" (LOC entity)
- Entity type score: 0.3 (penalty) ❌
Result: Person entities rank higher
For a WHERE question:

```python
entity_weights = {
    'LOC': 2.0,  # ⭐⭐ Strong preference for locations
    'GPE': 2.0,  # Geopolitical entities
    'FAC': 1.5,  # Facilities
    'PER': 0.5,  # Persons less relevant
    'ORG': 0.7,  # Organizations somewhat relevant
}
```

Example:
Query: "Where did Kamala Harris work as district attorney?"
Node A: "San Francisco" (LOC entity)
- Entity type score: 2.0 (high boost) ✅
Node B: "Kamala Harris" (PER entity)
- Entity type score: 0.5 (penalty) ❌
Result: Location entities rank higher
And for a WHEN question:

```python
entity_weights = {
    'DATE': 2.0,   # ⭐⭐ Strong preference for dates
    'TIME': 2.0,   # Times
    'EVENT': 1.5,  # Events (often have dates)
    'PER': 0.8,    # Persons somewhat relevant
}
```

The entity type score is integrated as the 5th factor in the ranking system:
```
final_score = 0.30 × semantic_score    +  # RRF hybrid search
              0.15 × connection_score  +  # Graph connectivity
              0.20 × temporal_score    +  # Time period match
              0.20 × query_match_score +  # Term matching
              0.15 × entity_type_score    # Question-type alignment ⭐
```
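As a rough sketch of how these factors combine (the function and the node field names are illustrative, not the actual implementation; only the 0.30/0.15/0.20/0.20/0.15 split comes from the formula above):

```python
def rank_node(node: dict, entity_weights: dict) -> float:
    """Blend the five ranking factors for one candidate node (illustrative)."""
    # The entity-type factor is the question-type weight for this node's label,
    # defaulting to a neutral 1.0 when the label has no explicit weight.
    entity_type_score = entity_weights.get(node['entity_label'], 1.0)
    return (0.30 * node['semantic_score'] +      # RRF hybrid search
            0.15 * node['connection_score'] +    # graph connectivity
            0.20 * node['temporal_score'] +      # time period match
            0.20 * node['query_match_score'] +   # term matching
            0.15 * entity_type_score)            # question-type alignment
```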
The patterns and entity weights for each question type are listed below.

**WHO** (person-focused)
Patterns: `who`, `whom`, `whose`, `which person`, `what person`, `name the person`
Entity Weights:
- PER: 2.0x (strong boost)
- ORG: 0.5x (penalty)
- LOC: 0.3x (penalty)
- DATE: 0.8x (slight penalty)
Use Case: Finding specific people, identifying individuals
**WHERE** (location-focused)
Patterns: `where`, `which place/location`, `in which city/state/country`, `what location`
Entity Weights:
- LOC: 2.0x (strong boost)
- GPE: 2.0x (geopolitical entities)
- FAC: 1.5x (facilities)
- PER: 0.5x (penalty)
- ORG: 0.7x (slight penalty)
Use Case: Finding places, locations, geographic information
**WHEN** (time-focused)
Patterns: `when`, `what time/date/year`, `in which year`, `how long`, `what period`
Entity Weights:
- DATE: 2.0x (strong boost)
- TIME: 2.0x (strong boost)
- EVENT: 1.5x (boost)
- PER/ORG/LOC: 0.8x (slight penalty)
Use Case: Finding dates, time periods, durations
**WHAT_ORG** (organization-focused)
Patterns: `what organization/company/agency`, `which organization/institution`, `name the organization`
Entity Weights:
- ORG: 2.0x (strong boost)
- PER: 0.7x (slight penalty)
- LOC: 0.8x (slight penalty)
Use Case: Finding organizations, companies, agencies
**WHAT_POSITION** (position/role-focused)
Patterns: `what position/role/title/job`, `which position/office`, and role terms such as `attorney general`, `governor`, `senator`
Entity Weights:
- PER: 1.5x (boost)
- ORG: 1.5x (boost)
- LOC: 1.2x (slight boost)
- DATE: 1.5x (boost)
Use Case: Finding roles, positions, job titles
**HOW_MANY** (quantity-focused)
Patterns: `how many/much`, `what number/percentage/amount`
Entity Weights:
- CARDINAL: 2.0x (numbers)
- MONEY: 2.0x (money amounts)
- PERCENT: 2.0x (percentages)
- QUANTITY: 2.0x (quantities)
Use Case: Finding numbers, counts, amounts
**WHY** (reason-focused)
Patterns: `why`, `what reason/cause`, `why did`
Entity Weights:
- EVENT: 1.5x (boost)
- PER: 1.2x (slight boost)
- ORG: 1.2x (slight boost)
- DATE: 1.2x (slight boost)
Use Case: Finding reasons, causes, explanations
**WHAT_EVENT** (event-focused)
Patterns: `what event/incident/occurrence`, `what happened`
Entity Weights:
- EVENT: 2.0x (strong boost)
- DATE: 1.5x (boost)
- LOC: 1.3x (boost)
- PER: 1.2x (slight boost)
Use Case: Finding events, incidents, happenings
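Internally, these types are driven by a `QUESTION_PATTERNS` dictionary (see the extension example further below). The sketch here shows one plausible shape for that dictionary and a regex-based detector, using only the WHO and WHERE entries; the exact structure and matching logic in the codebase may differ.

```python
import re

# Illustrative subset of QUESTION_PATTERNS (two of the eight types shown).
QUESTION_PATTERNS = {
    'WHO': {
        'patterns': [r'\bwho\b', r'\bwhom\b', r'\bwhose\b', r'\bwhich person\b'],
        'entity_weights': {'PER': 2.0, 'ORG': 0.5, 'LOC': 0.3, 'DATE': 0.8},
        'description': 'Person-focused query',
    },
    'WHERE': {
        'patterns': [r'\bwhere\b', r'\bwhich (city|state|country)\b', r'\bwhat location\b'],
        'entity_weights': {'LOC': 2.0, 'GPE': 2.0, 'FAC': 1.5, 'PER': 0.5, 'ORG': 0.7},
        'description': 'Location-focused query',
    },
}

def detect_type(query: str) -> dict:
    """Return the first type whose patterns match the query, else a neutral fallback."""
    q = query.lower()
    for qtype, spec in QUESTION_PATTERNS.items():
        if any(re.search(p, q) for p in spec['patterns']):
            return {'type': qtype, **spec}
    return {'type': 'GENERAL', 'entity_weights': {}, 'description': 'No specific type'}

print(detect_type("Who was the California Attorney General in 2020?")['type'])  # WHO
```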
Query: "Who was the California Attorney General in 2020?"
Without Question-Type Detection:
Results:
1. California (LOC) - high semantic match
2. Attorney General (ORG) - role match
3. Xavier Becerra (PER) - person match
With Question-Type Detection:
Question Type: WHO (Person-focused)
Entity Type Weights: PER=2.0x, LOC=0.3x, ORG=0.5x
Results:
1. Xavier Becerra (PER) - person match + 2.0x boost ✅
2. Kamala Harris (PER) - person match + 2.0x boost
3. California (LOC) - high semantic but 0.3x penalty
Improvement: Correctly prioritizes person entities!
Query: "Where did Kamala Harris work as district attorney?"
Without Question-Type Detection:
Results:
1. Kamala Harris (PER) - name match
2. District Attorney (ORG) - role match
3. San Francisco (LOC) - location match
With Question-Type Detection:
Question Type: WHERE (Location-focused)
Entity Type Weights: LOC=2.0x, PER=0.5x, ORG=0.7x
Results:
1. San Francisco (LOC) - location + 2.0x boost ✅
2. California (LOC) - location + 2.0x boost
3. District Attorney (ORG) - role but 0.7x penalty
Improvement: Correctly prioritizes location entities!
Query: "When did Gavin Newsom become governor?"
Without Question-Type Detection:
Results:
1. Gavin Newsom (PER) - name match
2. Governor (ORG) - role match
3. January 7, 2019 (DATE) - date match
With Question-Type Detection:
Question Type: WHEN (Time-focused)
Entity Type Weights: DATE=2.0x, PER=0.8x, ORG=0.8x
Results:
1. January 7, 2019 (DATE) - date + 2.0x boost ✅
2. 2019 (DATE) - year + 2.0x boost
3. Gavin Newsom (PER) - name but 0.8x penalty
Improvement: Correctly prioritizes date entities!
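To make the before/after effect concrete, the toy snippet below re-ranks the WHEN example's candidates by multiplying an assumed base relevance score with the question-type weight for each entity label (the base scores are made up for illustration).

```python
# Candidates from the "When did Gavin Newsom become governor?" example,
# with made-up base scores that rank the person first before boosting.
candidates = [
    {'name': 'Gavin Newsom',    'label': 'PER',  'base': 0.90},
    {'name': 'Governor',        'label': 'ORG',  'base': 0.85},
    {'name': 'January 7, 2019', 'label': 'DATE', 'base': 0.80},
]
when_weights = {'DATE': 2.0, 'TIME': 2.0, 'EVENT': 1.5, 'PER': 0.8, 'ORG': 0.8}

# Apply the WHEN weights and re-rank: the date now outranks the name match.
for c in candidates:
    c['boosted'] = c['base'] * when_weights.get(c['label'], 1.0)
for c in sorted(candidates, key=lambda x: x['boosted'], reverse=True):
    print(f"{c['name']:<16} {c['label']:<5} {c['boosted']:.2f}")
```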
To detect the question type for a query directly:

```python
from maintest import QuestionTypeDetector

# Initialize detector
detector = QuestionTypeDetector()

# Detect question type
query = "Who was the California Attorney General in 2020?"
question_info = detector.detect_question_type(query)

print(f"Type: {question_info['type']}")
print(f"Description: {question_info['description']}")
print(f"Confidence: {question_info['confidence']}")
print(f"Entity Weights: {question_info['entity_weights']}")
```

Output:
```
Type: WHO
Description: Person-focused query
Confidence: 1.0
Entity Weights: {'PER': 2.0, 'ORG': 0.5, 'LOC': 0.3, 'DATE': 0.8}
```
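The confidence value can act as a guard: if detection is uncertain, skip the entity-type boosts and fall back to general ranking. A minimal sketch of such a guard (the 0.6 threshold is an arbitrary illustrative choice, not a documented default):

```python
from maintest import QuestionTypeDetector

CONFIDENCE_THRESHOLD = 0.6  # illustrative cutoff; tune for your queries

detector = QuestionTypeDetector()
question_info = detector.detect_question_type("Who was the California Attorney General in 2020?")

if question_info['confidence'] >= CONFIDENCE_THRESHOLD:
    entity_weights = question_info['entity_weights']  # apply question-type boosts
else:
    entity_weights = {}  # uncertain: fall back to general, type-neutral ranking
```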
To run a question-aware search with the detected entity-type weights:

```python
# Use question-aware search
await search_episodes_with_question_aware_ranking(
    graphiti,
    "Who was the California Attorney General in 2020?",
    enricher=enricher,
    connection_weight=0.15,
    temporal_weight=0.20,
    query_match_weight=0.20,
    entity_type_weight=0.15  # 15% weight for entity type matching
)
```

Question-aware ranking provides several benefits:
- Automatically detects what kind of answer the user wants
- No manual configuration needed
- Works across different query phrasings
- WHO questions return people, not places
- WHERE questions return locations, not people
- WHEN questions return dates, not entities
- Provides confidence score for question type detection
- Multiple pattern matches increase confidence
- Fallback to general search if uncertain
- Entity type weight can be adjusted (0.0 to 1.0)
- Higher weight = stronger entity type preference
- Lower weight = more balanced with other factors
```python
# Strong entity type preference (20%)
entity_type_weight=0.20

# Moderate entity type preference (15%) - RECOMMENDED
entity_type_weight=0.15

# Light entity type preference (10%)
entity_type_weight=0.10

# No entity type preference (0%)
entity_type_weight=0.0
```

You can extend the QUESTION_PATTERNS dictionary to add custom patterns:
```python
QUESTION_PATTERNS['CUSTOM_TYPE'] = {
    'patterns': [r'\bcustom pattern\b'],
    'entity_weights': {'PER': 1.5, 'LOC': 1.2},
    'description': 'Custom query type'
}
```

Performance characteristics:
- Detection Speed: ~1-5ms per query (regex matching)
- Entity Extraction: ~50-500ms per node (depends on spaCy)
- Overall Impact: +10-20% search time for significantly better accuracy
Current limitations:
- English-Only: Patterns are English-specific
- Pattern-Based: May miss complex or unusual phrasings
- Entity Extraction Dependency: Requires NER for best results
- Single Question Type: Detects primary question type only
Possible future enhancements:
- Multi-Language Support: Patterns for other languages
- ML-Based Detection: Use ML model instead of regex patterns
- Multi-Question Handling: Handle compound questions
- Context Awareness: Consider conversation history
- Custom Entity Types: Support domain-specific entity types
- Learning System: Learn from user feedback to improve weights
Best practices:
- Use with NER: Entity type weighting works best with NER enrichment
- Tune Weights: Adjust `entity_type_weight` based on your use case
- Monitor Confidence: Low confidence may indicate an unclear question
- Combine with Temporal: Use temporal ranking for date-specific queries
- Test Patterns: Test with various query phrasings, for example with the sketch below
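A quick way to sanity-check pattern coverage is to loop over several phrasings of the same intent and confirm they map to the expected type (detector API as used above; the phrasings are just samples):

```python
from maintest import QuestionTypeDetector

# Sample phrasings that should all be detected as WHO questions.
who_phrasings = [
    "Who was the California Attorney General in 2020?",
    "Which person served as California Attorney General in 2020?",
    "Name the person who was California Attorney General in 2020.",
]

detector = QuestionTypeDetector()
for q in who_phrasings:
    info = detector.detect_question_type(q)
    print(f"{info['type']:<6} conf={info['confidence']:.2f}  {q}")
```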