**Shaurya** is a high-frequency trading (HFT) market data feed handler engineered for sub-microsecond latency. By leveraging **Zero-Copy parsing**, **Lock-Free concurrency**, and **Stack-based memory management**, it bypasses the performance bottlenecks of standard software architectures to process financial data with deterministic speed.

---

## ⚡ Performance Impact & Comparison

Shaurya was benchmarked using high-resolution hardware timers (`QueryPerformanceCounter`).

| Implementation Approach | Average Latency | Min Latency | Why it's Slow/Fast? |

> **The Result:** Shaurya achieves a minimum internal reaction time of **300 nanoseconds**, approximately **50x faster** than standard Python implementations.
>
> *Measured in a pure mock environment.*

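
One way to collect per-call minimum and average figures like those above is sketched here. It swaps in `std::chrono::steady_clock` as a portable stand-in for `QueryPerformanceCounter`, and `LatencyStats` and `measure` are hypothetical names, so treat it as an outline rather than Shaurya's actual benchmark harness.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical result type: all latencies are in nanoseconds.
struct LatencyStats {
    std::int64_t min_ns;
    std::int64_t avg_ns;
    std::int64_t max_ns;
};

// Times each call of `work` separately and keeps the minimum, mean and maximum.
// steady_clock is an assumption here; the README's own timer is QueryPerformanceCounter.
template <typename Fn>
LatencyStats measure(Fn&& work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;

    std::int64_t min_ns = std::numeric_limits<std::int64_t>::max();
    std::int64_t max_ns = 0;
    std::int64_t total_ns = 0;

    for (int i = 0; i < iterations; ++i) {
        const Clock::time_point start = Clock::now();
        work();
        const Clock::time_point stop = Clock::now();

        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        min_ns = std::min(min_ns, ns);
        max_ns = std::max(max_ns, ns);
        total_ns += ns;
    }
    return LatencyStats{min_ns, total_ns / iterations, max_ns};
}
```

Each sample includes the cost of two clock reads, so figures for very short workloads are overstated. The gap between `min_ns` and `avg_ns` gives a first impression of jitter from the scheduler and caches.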
### 🌍 Real-World Validation: The "Fragmented Liquidity" Test

Shaurya was subjected to a **30-minute stress test** aggregating live ticks from **Binance, Coinbase, and Bitstamp** simultaneously.

* **Outcome:** The engine normalized the fragmented liquidity streams in real time. Average latency increased under OS scheduler load (the cores were not isolated), but the **minimum latency remained at 0.3 µs**, which suggests the core engine's efficiency stays stable even during crypto market volatility.

---
## 🏗 Key Technical Innovations

### 1. Zero-Copy Architecture

Instead of copying network packets into new `std::string` objects (each copy typically costs a heap allocation plus a byte-for-byte copy), Shaurya uses a custom `StringViewLite` class. It creates a lightweight, non-owning "view" over the raw socket buffer, so the engine can parse prices in place without copying the payload.

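
As a rough illustration of the idea, the sketch below parses fields in place from a receive buffer. It is not Shaurya's actual `StringViewLite`; `ByteView`, `field`, `parse_fixed`, and the comma-separated `symbol,price,quantity` layout are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical non-owning view: a pointer and a length, no allocation.
struct ByteView {
    const char* data;
    std::size_t size;
};

// Returns the idx-th comma-separated field of buf as a view into buf itself.
// Returns an empty view (data == nullptr) when buf has fewer fields.
inline ByteView field(ByteView buf, std::size_t idx) {
    assert(buf.data != nullptr);
    const char* p = buf.data;
    const char* end = buf.data + buf.size;
    while (true) {
        const char* comma = static_cast<const char*>(
            std::memchr(p, ',', static_cast<std::size_t>(end - p)));
        const char* stop = comma ? comma : end;
        if (idx == 0) return ByteView{p, static_cast<std::size_t>(stop - p)};
        if (!comma) return ByteView{nullptr, 0};
        p = comma + 1;
        --idx;
    }
}

// Parses an unsigned decimal such as "43250.75" into an integer scaled by
// 10^decimals, reading straight from the view. Extra fractional digits are
// truncated. Returns false on any non-numeric character.
inline bool parse_fixed(ByteView v, int decimals, std::int64_t& out) {
    if (v.size == 0) return false;
    std::int64_t value = 0;
    int frac_digits = -1;  // stays -1 until the decimal point is seen
    for (std::size_t i = 0; i < v.size; ++i) {
        const char c = v.data[i];
        if (c == '.') {
            if (frac_digits >= 0) return false;  // second decimal point
            frac_digits = 0;
        } else if (c >= '0' && c <= '9') {
            if (frac_digits >= 0) {
                if (frac_digits == decimals) continue;  // drop extra precision
                ++frac_digits;
            }
            value = value * 10 + (c - '0');
        } else {
            return false;
        }
    }
    if (frac_digits < 0) frac_digits = 0;
    for (; frac_digits < decimals; ++frac_digits) value *= 10;  // pad to scale
    out = value;
    return true;
}
```

For a buffer holding `BTCUSD,43250.75,0.5`, `field(buf, 1)` returns a view that points at offset 7 of the same buffer, and `parse_fixed` with two decimals turns it into the integer `4325075`. No `std::string` is constructed along the way.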
### 2. Lock-Free Concurrency (SPSC)

Traditional systems share data between threads through mutex locks (`std::mutex`). Under contention, a waiting thread can be put to sleep by the OS, and the resulting context switch is expensive. Shaurya implements a **Single-Producer Single-Consumer (SPSC) Ring Buffer** using `std::atomic` operations. This allows the Network Thread to push data while the Strategy Thread reads it, without either thread ever blocking on a lock.

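
A minimal sketch of such a queue follows, assuming a fixed power-of-two capacity and exactly one thread on each side. `SpscRing`, `try_push`, and `try_pop` are hypothetical names chosen for the example, not Shaurya's API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer single-consumer ring buffer.
// Capacity must be a power of two so that index wrapping is a bit mask.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    SpscRing() : head_(0), tail_(0) {}

    // Called only by the producer thread. Returns false when the ring is full.
    bool try_push(const T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;
        slots_[tail & (Capacity - 1)] = item;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer thread. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        assert(tail - head <= Capacity);  // invariant: never more than Capacity items
        if (head == tail) return false;
        out = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_;  // next slot to read, written only by the consumer
    std::atomic<std::size_t> tail_;  // next slot to write, written only by the producer
};
```

Each index has a single writer, so no compare-and-swap loop is needed. The release store on one side pairs with the acquire load on the other, which guarantees the consumer sees a fully written slot before it sees the advanced `tail_`. A caller that finds the ring full or empty simply retries or does other work instead of sleeping.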
### 3. CPU Cache Optimization

Critical data structures are aligned to 64-byte cache lines (`alignas(64)`). This prevents **False Sharing**, where threads on different cores modify separate variables that happen to share a cache line, so the line bounces between cores and performance drops sharply on multi-core systems.

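
The effect of `alignas(64)` on layout can be seen in a small sketch. The struct and helper names are hypothetical, and the 64-byte line size is an assumption that holds for typical x86-64 parts but not for every CPU.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes.
constexpr std::size_t kCacheLine = 64;

// Without alignment, both counters normally land on the same cache line,
// so a producer and a consumer updating them invalidate each other's line.
struct PackedCounters {
    std::atomic<std::size_t> produced;
    std::atomic<std::size_t> consumed;
};

// With alignas, each counter starts on its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::size_t> produced;
    alignas(kCacheLine) std::atomic<std::size_t> consumed;
};

// The compiler pads the struct so the two members cannot share a line.
static_assert(alignof(PaddedCounters) == kCacheLine,
              "struct must be aligned to a cache line");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLine,
              "each counter must occupy its own cache line");

// Debug-time check that an object really starts on a cache-line boundary.
template <typename T>
void assert_line_aligned(const T& obj) {
    assert(reinterpret_cast<std::uintptr_t>(&obj) % kCacheLine == 0);
}
```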

---

## 🚀 Quick Start

### Prerequisites

* **OS:** Windows (required for `winsock2` and `QueryPerformanceCounter`)
* **Compiler:** G++ (MinGW) with support for C++11 or later

### Execution Guide

1. **Build the System:**

   ```cmd
   build.bat
   ```

2. **Start Data Source:**

   ```cmd
   python bridge.py
   ```

3. **Start Shaurya Engine:**

   ```cmd
   bin\Shaurya.exe
   ```

*Upon completion, the engine generates a `Shaurya_Metrics.txt` report detailing the nanosecond-level performance of the run.*

---
## Resources

If you are new to High-Frequency Trading systems, these concepts explain the "Why" behind Shaurya's architecture:

* **Latency vs. Jitter:** [Understand why "Average Speed" is useless in HFT](https://www.youtube.com/watch?v=NH1Tta7purM).