Easiest entry point: minimal. It offers the highest observed token reduction (95.7%) with the smallest system prompt overhead (~4 lines).
Observed results (gpt-4o-mini, 5 scenarios):
| Variant | Compliance | Token Reduction | System Prompt Size |
|---|---|---|---|
| minimal | 40% | 95.7% | Smallest (~4 lines) |
| balanced | 40% | 95.4% | Medium |
| strict | 80% | 88.8% | Largest |
Open questions:
- Does higher compliance justify larger prompt overhead?
- How does compliance vary across models and tasks?
- What's the optimal balance between prompt size and compliance?
Reference: README.md for test results.
Observed symptom: Model outputs English instead of Vector-Native symbols.
Questions to explore:
- Does strict variant consistently achieve higher compliance across models?
- How does temperature affect compliance for each variant?
- Can fallback parsing handle low-compliance scenarios gracefully?
Potential approaches:
- Try different variants (strict shows 80% compliance in limited tests)
- Experiment with temperature (lower may increase compliance, but needs validation)
- Add fallback parsing (detect English output, parse as natural language)
- Strengthen attention symbols (ensure operations start with ●, ○, or ━)
- Include examples in the system prompt
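As a starting point, compliance detection can be as simple as checking that every non-empty line of the output begins with an attention symbol. This is a sketch; `is_compliant` is an illustrative name, not part of the project's API.

```python
# Flag outputs whose operations do not start with an attention symbol.
ATTENTION_SYMBOLS = ("●", "○", "━")

def is_compliant(output: str) -> bool:
    """True if every non-empty line begins with ●, ○, or ━."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if not lines:
        return False
    # str.startswith accepts a tuple of candidate prefixes.
    return all(ln.startswith(ATTENTION_SYMBOLS) for ln in lines)

print(is_compliant("●analyze|dataset:Q4"))           # True
print(is_compliant("Sure! Here's the analysis..."))  # False
```

A check like this could gate whether to parse as Vector-Native or fall through to natural-language handling.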
What have you observed? Share your findings to help build collective knowledge.
Reference: prompts/README.md for troubleshooting ideas.
Observed patterns (limited testing):
| Variant | Temperature Range Tested | Compliance Observed |
|---|---|---|
| strict | 0.1 - 0.2 | 80% |
| balanced | 0.3 - 0.5 | 40% |
| minimal | 0.5 - 0.7 | 40% |
Open questions:
- Does lower temperature always increase compliance?
- How does temperature interact with different models?
- What's the optimal temperature for your specific use case?
Research direction: Test temperature ranges systematically and share results.
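A systematic sweep like the one suggested above could be sketched as follows. `run_scenario` is a stub standing in for a real model call; swap in your own client and system prompt variant.

```python
# Sketch of a temperature sweep over a fixed set of scenarios.
def run_scenario(prompt: str, temperature: float) -> str:
    # Stub: a real implementation would call the model at this temperature.
    return "●analyze|dataset:Q4"

def compliance_at(temperature: float, scenarios: list[str]) -> float:
    """Fraction of outputs that start with an attention symbol."""
    hits = sum(
        run_scenario(p, temperature).strip().startswith(("●", "○", "━"))
        for p in scenarios
    )
    return hits / len(scenarios)

scenarios = ["summarize Q4 sales", "list top metrics"]
for temp in (0.1, 0.3, 0.5, 0.7):
    print(f"temperature={temp}: compliance={compliance_at(temp, scenarios):.0%}")
```

Running the same scenarios at several temperatures per variant would give comparable data points for the table above.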
Reference: prompts/README.md for initial observations.
Unanswered questions:
- Does hybrid usage reduce efficiency? (Needs measurement)
- How does mixing affect compliance rates?
- Can attention symbols (●, ○, ━) at operation start help isolate Vector-Native from English?
Potential research:
- Test hybrid prompts with delimited Vector-Native blocks
- Measure token reduction vs pure Vector-Native
- Compare compliance rates
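One way to test delimited hybrid prompts is to mark Vector-Native blocks with explicit delimiters and extract them before parsing. The `<<VN>>`/`<<END>>` markers below are an assumption for illustration, not part of the spec.

```python
import re

# A hybrid prompt mixing English with a delimited Vector-Native block.
hybrid = (
    "Please analyze the quarterly data.\n"
    "<<VN>>●analyze|dataset:Q4|metrics:revenue<<END>>\n"
    "Then explain the results in plain English."
)

# Pull out only the Vector-Native portions for structured parsing.
vn_blocks = re.findall(r"<<VN>>(.*?)<<END>>", hybrid, re.DOTALL)
print(vn_blocks)  # ['●analyze|dataset:Q4|metrics:revenue']
```

Comparing token counts for hybrid prompts like this against pure Vector-Native would address the efficiency question above.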
What's your use case? Understanding real-world needs helps guide research priorities.
Reference: README.md for use case context.
Research questions:
- What validation strategies work best?
- How reliable is fallback parsing?
- Can we detect and recover from format errors automatically?
Example fallback approach:

```python
def parse_with_fallback(output: str):
    if "●" in output and "|" in output:
        return parse_vector_native(output)    # Vector-Native
    else:
        return parse_natural_language(output)  # Fallback
```

Validation ideas:
- Check for the ● symbol, pipe separators, and colon pairs
- Retry with a different variant if parsing fails
- Log failures to track compliance patterns
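The retry idea above could be sketched as a loop over variants, escalating to stricter prompts on failure. `call_model` is a stub and the variant names mirror the ones tested here; the control flow is the point, not the stubbed outputs.

```python
# Sketch: attempt parsing, then re-prompt with a stricter variant on failure.
def call_model(prompt: str, variant: str) -> str:
    # Stub standing in for a real API call using the given variant's system prompt.
    return "●report|status:ok" if variant == "strict" else "Here is the report."

def parse_or_retry(prompt: str, variants=("minimal", "balanced", "strict")):
    for variant in variants:
        output = call_model(prompt, variant)
        if output.startswith(("●", "○", "━")):
            return variant, output  # compliant: parse as Vector-Native
    return None, output             # exhausted: fall back to natural language

variant, output = parse_or_retry("generate report")
print(variant, output)  # strict ●report|status:ok
```

Logging which variant finally succeeded would feed directly into the compliance-pattern tracking suggested above.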
What error patterns have you seen? Share observations to improve robustness.
Reference: how-it-works.md for parsing examples.
Yes. Use vector_native.parser for parsing outputs.
Basic usage:

```python
from vector_native.parser import parse_vector_native

output = "●analyze|dataset:Q4|metrics:revenue"
operations = parse_vector_native(output)
```

For custom parsing: see how-it-works.md for implementation examples.
Reference: LANGUAGE_SPEC.md for parser guidelines.
```shell
python tests/test_token_reduction.py --variant minimal --scenarios 10
```

Or measure directly with the tokenizer:

```python
from vector_native.tokenizer import count_tokens

english_tokens = count_tokens("Give attention and add values")
vn_tokens = count_tokens("●⊕")
reduction = (1 - vn_tokens / english_tokens) * 100
```

- Generate the same task in English vs Vector-Native
- Compare completion token counts from the API response
- Calculate the percentage reduction
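Putting those steps together, the reduction calculation from two completion token counts is straightforward. The counts below are illustrative placeholders, not measured results:

```python
# Completion token counts you would read from each API response's usage data.
english_completion_tokens = 412        # illustrative value
vector_native_completion_tokens = 23   # illustrative value

reduction = (1 - vector_native_completion_tokens / english_completion_tokens) * 100
print(f"Token reduction: {reduction:.1f}%")  # Token reduction: 94.4%
```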
What reduction rates are you seeing? Share results to expand the dataset.
Reference: token-savings.md for observed results (88-95% reduction in limited tests).
Research questions:
- How does compliance change across multiple turns?
- Does context accumulation degrade performance?
- What state management patterns work best?
Example pattern:
Turn 1: ●task|description:analyze_sales|context:Q4
Turn 2: ●result|status:complete|data:revenue:50000
Turn 3: ●next|action:generate_report|format:pdf
Potential challenges:
- Context accumulation (each turn adds tokens)
- Compliance drift (later turns may degrade)
- State management (track conversation state in parameters)
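The turn pattern above can be sketched as an accumulating history, which makes the context-growth concern concrete. Character length is used here as a rough proxy for tokens; a real measurement would use the project's tokenizer.

```python
# Each turn appends one operation string; context grows with the history.
turns = [
    "●task|description:analyze_sales|context:Q4",
    "●result|status:complete|data:revenue:50000",
    "●next|action:generate_report|format:pdf",
]

history: list[str] = []
for i, op in enumerate(turns, start=1):
    history.append(op)
    context = "\n".join(history)
    print(f"turn {i}: context length = {len(context)} chars")
```

Tracking this growth per turn, alongside per-turn compliance, would quantify both the accumulation and the drift questions.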
What patterns have you tested? Multi-turn usage needs more research.
Reference: how-it-works.md for A2A examples.
Research approach:
- Identify verbose sections in your prompts
- Convert to Vector-Native equivalents
- Test compliance with your model/temperature
- Measure token reduction
- Compare results
Example conversion:
- English: "You are helpful. Provide details. Focus on needs."
- Vector-Native: ●assistant|mode:helpful|detail:high|attention:needs
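A minimal sketch of this conversion, assuming a hand-built lookup table of English directives. The mapping is illustrative, not a defined part of the language:

```python
# Map English directives to Vector-Native parameters via a lookup table.
DIRECTIVE_MAP = {
    "You are helpful.": ("mode", "helpful"),
    "Provide details.": ("detail", "high"),
    "Focus on needs.": ("attention", "needs"),
}

def convert(english: str, role: str = "assistant") -> str:
    params = [
        f"{key}:{value}"
        for sentence, (key, value) in DIRECTIVE_MAP.items()
        if sentence in english
    ]
    return "●" + role + "|" + "|".join(params)

prompt = "You are helpful. Provide details. Focus on needs."
print(convert(prompt))  # ●assistant|mode:helpful|detail:high|attention:needs
```

A table-driven approach like this only covers directives you have anticipated; anything outside the map would need to stay in English, which is one argument for the hybrid strategy above.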
Open questions:
- How much can you convert before compliance drops?
- Does hybrid (English + Vector-Native) work better than pure conversion?
- What conversion patterns work best?
What have you tried? Migration strategies need real-world testing.
Reference: how-it-works.md for integration steps.
This indicates low compliance. See "What happens if compliance is low?" for research directions.
Questions to investigate:
- Why does compliance vary?
- Can we predict when English output will occur?
- What recovery strategies work best?
Potential approaches:
- Try different variants (strict shows higher compliance in limited tests)
- Experiment with temperature
- Add fallback parsing
What patterns have you observed? Understanding failure modes helps improve the system.
Research contribution steps:
- Create prompts/your_variant.txt with a system prompt
- Test with python tests/test_token_reduction.py
- Document the compliance rate, temperature, model, and scenarios tested
- Submit PR with test results and observations
What to include:
- Minimum 10 test scenarios
- Compliance rate observed
- Token reduction measured
- Model and temperature used
- Any patterns or anomalies noticed
Reference: CONTRIBUTING.md and prompts/README.md for guidelines.