Skip to content

Commit 34277f2

Browse files
release
1 parent 88a790a commit 34277f2

16 files changed

Lines changed: 599 additions & 49 deletions

.github/workflows/publish.yml

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
name: Publish to PyPI
2+
3+
on:
4+
release:
5+
types: [published]
6+
7+
permissions:
8+
contents: read
9+
10+
jobs:
11+
build_wheels:
12+
name: Build wheels on ${{ matrix.os }}
13+
runs-on: ${{ matrix.os }}
14+
strategy:
15+
matrix:
16+
# Build on Windows (YOU), Linux (Servers), and Mac (Devs)
17+
os: [ubuntu-latest, windows-latest, macos-latest]
18+
19+
steps:
20+
- uses: actions/checkout@v4
21+
22+
# Used to build the C++ Extension automatically
23+
- name: Build wheels
24+
uses: pypa/cibuildwheel@v2.16.5
25+
# env:
26+
# CIBW_SOME_OPTION: value
27+
28+
- uses: actions/upload-artifact@v4
29+
with:
30+
name: cibw-wheels-${{ matrix.os }}-${{ strategy.job-index }}
31+
path: ./wheelhouse/*.whl
32+
33+
build_sdist:
34+
name: Build source distribution
35+
runs-on: ubuntu-latest
36+
steps:
37+
- uses: actions/checkout@v4
38+
- name: Build sdist
39+
run: pipx run build --sdist
40+
- uses: actions/upload-artifact@v4
41+
with:
42+
name: cibw-sdist
43+
path: dist/*.tar.gz
44+
45+
publish_to_pypi:
46+
needs: [build_wheels, build_sdist]
47+
runs-on: ubuntu-latest
48+
environment: pypi
49+
permissions:
50+
id-token: write # IMPORTANT: Mandatory for trusted publishing
51+
52+
steps:
53+
- name: Download all the dists
54+
uses: actions/download-artifact@v4
55+
with:
56+
pattern: cibw-*
57+
path: dist
58+
merge-multiple: true
59+
60+
- name: Publish distribution 📦 to PyPI
61+
uses: pypa/gh-action-pypi-publish@release/v1

README.md

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
# SHAURYA: Scalable High-frequency Architecture for Ultra-low Response Yield Access
2+
3+
![Language](https://img.shields.io/badge/language-C%2B%2B17-blue.svg)
4+
![Latency](https://img.shields.io/badge/min%20latency-300%20ns-brightgreen.svg)
5+
![Architecture](https://img.shields.io/badge/architecture-Lock--Free-orange.svg)
6+
![Parsing](https://img.shields.io/badge/parsing-Zero--Copy-red.svg)
7+
8+
**Shaurya** is a high-frequency trading (HFT) market data feed handler engineered for sub-microsecond latency. By leveraging **Zero-Copy parsing**, **Lock-Free concurrency**, and **Stack-based memory management**, it bypasses the performance bottlenecks of standard software architectures to process financial data with deterministic speed.
9+
10+
---
11+
12+
## ⚡ Performance Impact & Comparison
13+
14+
Shaurya was benchmarked using high-resolution hardware timers (`QueryPerformanceCounter`).
15+
16+
| Implementation Approach | Average Latency | Min Latency | Why it's Slow/Fast? |
17+
| :--- | :--- | :--- | :--- |
18+
| **Python Script** | ~45.0 µs | ~30.0 µs | Interpreter overhead & Garbage Collection pauses. |
19+
| **Standard C++ (`std::string`)** | ~5.0 µs | ~3.5 µs | Frequent Heap Allocations (`malloc`) & deep memory copying. |
20+
| **SHAURYA (Zero-Copy)** | **1.88 µs*** | **0.3 µs** | **Zero-Copy** pointer arithmetic & **Lock-Free** queues. |
21+
22+
> **The Result:** Shaurya achieves a minimum internal reaction time of **300 nanoseconds**, approximately **50x faster** than standard Python implementations.
23+
>
24+
> **Measured in Pure Mock Environment*
25+
26+
![WhatsApp Image 2025-12-12 at 23 55 35_9e0c137d](https://github.com/user-attachments/assets/c095eb1a-0a6b-40d3-8a43-8a725090134b)
27+
28+
29+
### 🌍 Real-World Validation: The "Fragmented Liquidity" Test
30+
Shaurya was subjected to a **30-minute stress test** aggregating live ticks from **Binance, Coinbase, and Bitstamp** simultaneously.
31+
32+
* **Test Duration:** 30 Minutes
33+
* **Total Messages:** 21,862 (Live Volatility Bursts)
34+
* **Outcome:** The engine successfully normalized fragmented liquidity streams in real-time. While average latency increased under OS scheduler load (due to non-isolated cores), the **minimum latency remained at 0.3 µs**, proving the core engine's efficiency remains stable even during crypto market volatility.
35+
36+
---
37+
38+
## 🏗 Key Technical Innovations
39+
40+
### 1. Zero-Copy Architecture
41+
Instead of copying network packets into new `std::string` objects (which forces the OS to allocate memory), Shaurya uses a custom `StringViewLite` class. This creates a lightweight "view" over the raw socket buffer, allowing the engine to parse prices without moving a single byte of memory.
42+
43+
### 2. Lock-Free Concurrency (SPSC)
44+
Traditional systems use Mutex locks (`std::mutex`) to share data between threads, which forces the CPU to stop and switch contexts (expensive). Shaurya implements a **Single-Producer Single-Consumer Ring Buffer** using `std::atomic` instructions. This allows the Network Thread to push data and the Strategy Thread to read data simultaneously without ever blocking.
45+
46+
### 3. CPU Cache Optimization
47+
Critical data structures are aligned to 64-byte cache lines (`alignas(64)`). This prevents **False Sharing**, a phenomenon where two threads fight over the same CPU cache line, drastically reducing performance on multi-core systems.
48+
49+
---
50+
51+
## 🚀 Quick Start
52+
53+
### Prerequisites
54+
* **OS:** Windows (Required for `winsock2` and `QueryPerformanceCounter`)
55+
* **Compiler:** G++ (MinGW) supporting C++11 or higher.
56+
57+
### Execution Guide
58+
1. **Build the System:**
59+
```cmd
60+
build.bat
61+
```
62+
2. **Start Data Source:**
63+
```python bridge.py```
64+
3. **Start Shaurya Engine:**
65+
```cmd
66+
bin\Shaurya.exe
67+
```
68+
69+
*Upon completion, the engine generates a `Shaurya_Metrics.txt` report detailing the nanosecond-level performance of the run.*
70+
71+
---
72+
73+
## Resources
74+
75+
If you are new to High-Frequency Trading systems, these concepts explain the "Why" behind Shaurya's architecture:
76+
77+
* **Latency vs. Jitter:** [Understand why "Average Speed" is useless in HFT](https://www.youtube.com/watch?v=NH1Tta7purM).
78+
* **Zero-Copy Networking:** [How avoiding memory copies saves microseconds](https://en.wikipedia.org/wiki/Zero-copy).
79+
* **Lock-Free Programming:** [An introduction to Atomics and Ring Buffers](https://www.1024cores.net/home/lock-free-algorithms/queues).
80+
* **False Sharing:** [The hidden killer of multi-threaded performance](https://mechanical-sympathy.blogspot.com/2011/07/false-sharing.html).
81+
82+
`Developed by your's truly 🛩️!`

bridge.py

Lines changed: 33 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -2,26 +2,32 @@
22
import websockets
33
import json
44
import socket
5-
import time
65

7-
TCP_HOST = "127.0.0.1"
8-
TCP_PORT = 5000
9-
RUN_DURATION = 1800 # 30 Minutes
6+
# --- CONFIGURATION ---
7+
MULTICAST_GROUP = "239.0.0.1"
8+
MULTICAST_PORT = 30001
9+
RUN_DURATION = 120 # 30 Minutes
1010

11+
# --- DATA SOURCES ---
1112
SOURCES = [
1213
{"name": "BINANCE", "url": "wss://stream.binance.com:9443/ws/btcusdt@trade"},
1314
{"name": "COINBASE", "url": "wss://ws-feed.exchange.coinbase.com"},
1415
{"name": "BITSTAMP", "url": "wss://ws.bitstamp.net"}
1516
]
1617

17-
async def forward_to_engine(writer, fix_msg):
18+
# Setup UDP Multicast Socket
19+
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
20+
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
21+
22+
def send_multicast(fix_msg):
1823
try:
19-
writer.write(fix_msg.encode())
20-
await writer.drain()
24+
sock.sendto(fix_msg.encode(), (MULTICAST_GROUP, MULTICAST_PORT))
25+
# Optional: Print a dot for every message to verify liveness without spam
26+
# print(".", end="", flush=True)
2127
except Exception as e:
22-
print(f"[TCP ERROR] {e}")
28+
print(f"[ERROR] UDP Send Failed: {e}")
2329

24-
async def stream_binance(writer):
30+
async def stream_binance():
2531
url = SOURCES[0]["url"]
2632
async for websocket in websockets.connect(url):
2733
try:
@@ -30,12 +36,14 @@ async def stream_binance(writer):
3036
msg = await websocket.recv()
3137
data = json.loads(msg)
3238
price = data['p']
39+
# Tag 49 identifies the Source Exchange
3340
fix = f"8=FIX.4.2\x0135=X\x0149=BINANCE\x0155=BTCUSDT\x01269=0\x01270={price}\x01"
34-
await forward_to_engine(writer, fix)
41+
send_multicast(fix)
3542
except Exception:
43+
await asyncio.sleep(1) # Reconnect delay
3644
continue
3745

38-
async def stream_coinbase(writer):
46+
async def stream_coinbase():
3947
url = SOURCES[1]["url"]
4048
async for websocket in websockets.connect(url):
4149
try:
@@ -51,11 +59,12 @@ async def stream_coinbase(writer):
5159
if 'price' in data:
5260
price = data['price']
5361
fix = f"8=FIX.4.2\x0135=X\x0149=COINBASE\x0155=BTCUSD\x01269=0\x01270={price}\x01"
54-
await forward_to_engine(writer, fix)
62+
send_multicast(fix)
5563
except Exception:
64+
await asyncio.sleep(1)
5665
continue
5766

58-
async def stream_bitstamp(writer):
67+
async def stream_bitstamp():
5968
url = SOURCES[2]["url"]
6069
async for websocket in websockets.connect(url):
6170
try:
@@ -71,39 +80,31 @@ async def stream_bitstamp(writer):
7180
if 'data' in data and 'price' in data['data']:
7281
price = data['data']['price']
7382
fix = f"8=FIX.4.2\x0135=X\x0149=BITSTAMP\x0155=BTCUSD\x01269=0\x01270={price}\x01"
74-
await forward_to_engine(writer, fix)
83+
send_multicast(fix)
7584
except Exception:
85+
await asyncio.sleep(1)
7686
continue
7787

7888
async def main():
79-
server = await asyncio.start_server(handle_client, TCP_HOST, TCP_PORT)
80-
addr = server.sockets[0].getsockname()
81-
print(f"[GATEWAY] Multi-Source Bridge listening on {addr}")
82-
print(f"[GATEWAY] Test Duration: {RUN_DURATION / 60} Minutes")
83-
84-
async with server:
85-
await server.serve_forever()
89+
print(f"[GATEWAY] Starting Multi-Exchange UDP Broadcast to {MULTICAST_GROUP}:{MULTICAST_PORT}")
90+
print(f"[GATEWAY] Aggregating Liquidity from Binance, Coinbase, Bitstamp...")
8691

87-
async def handle_client(reader, writer):
88-
print("[GATEWAY] Shaurya Engine Connected! Starting Streams...")
89-
92+
# Run all streams concurrently
9093
tasks = [
91-
asyncio.create_task(stream_binance(writer)),
92-
asyncio.create_task(stream_coinbase(writer)),
93-
asyncio.create_task(stream_bitstamp(writer))
94+
asyncio.create_task(stream_binance()),
95+
asyncio.create_task(stream_coinbase()),
96+
asyncio.create_task(stream_bitstamp())
9497
]
9598

99+
# Run for the specified duration
96100
await asyncio.sleep(RUN_DURATION)
97101

98-
print("\n[GATEWAY] 30 Minutes Complete. Stopping Test...")
99-
writer.close()
100-
await writer.wait_closed()
101-
102+
print("\n[GATEWAY] Test Duration Complete. Stopping...")
102103
for task in tasks:
103104
task.cancel()
104105

105106
if __name__ == "__main__":
106107
try:
107108
asyncio.run(main())
108109
except KeyboardInterrupt:
109-
pass
110+
print("Gateway Stopped.")

cmakelists.txt

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
cmake_minimum_required(VERSION 3.15)
2+
project(shaurya_hft)
3+
4+
# 1. Find Python and Pybind11
5+
find_package(Python COMPONENTS Interpreter Development REQUIRED)
6+
find_package(pybind11 CONFIG REQUIRED)
7+
8+
# 2. Create the Module
9+
pybind11_add_module(shaurya_hft src/bindings.cpp src/FixParser.cpp src/NetworkClient.cpp)
10+
11+
# 3. Optimizations
12+
if(MSVC)
13+
target_compile_options(shaurya_hft PRIVATE /O2)
14+
else()
15+
target_compile_options(shaurya_hft PRIVATE -O3 -march=native)
16+
endif()
17+
18+
# 4. Includes & Links
19+
target_include_directories(shaurya_hft PRIVATE include)
20+
if(WIN32)
21+
target_link_libraries(shaurya_hft PRIVATE ws2_32)
22+
endif()
23+
24+
# 5. CRITICAL FIX: Tell CMake to install the file into the package!
25+
install(TARGETS shaurya_hft DESTINATION .)
3.43 KB
Binary file not shown.

dist/hft_shaurya-0.1.0.tar.gz

3.5 KB
Binary file not shown.

0 commit comments

Comments
 (0)