This repository contains two main components working together to enable automated copy trading on Solana:
The Rust code (src/) provides the live execution engine for copy trading. It:
- Monitors Wallet Activity: Listens to swap transactions from high-performing wallets via the `solana_ingestion` system
- Automatic Position Management: Opens positions when tracked wallets buy tokens, monitors them in real-time, and closes them based on configured strategies
- Risk Management: Implements stop-loss (with optional trailing stop-loss) and hold-time strategies to manage positions
- Fast Execution: Uses Jito bundles for sub-second transaction submission and settlement
- Position Lifecycle: Manages complete position lifecycle from creation through opening, monitoring, closing, and finalization
- Migration Support: Handles Pumpfun bonding curve migrations to Pumpswap pools automatically
The core components are:
- PositionManager: Central coordinator that monitors swap transactions and market updates, and manages all active positions
- Position: Represents a single copy trading position with full lifecycle state management
- Async event loops for transaction monitoring, market update processing, and new pool detection
Rust functions, structs, and modules are extensively documented in the provided rustdoc: https://yuno-research.github.io/docs/target/doc/copy_trading/index.html
The Python code provides historical data analysis and strategy optimization tools:
- Backtesting Framework: Tests copy trading strategies on historical swap transaction data
- Wallet Scoring: Evaluates wallets based on multiple performance metrics (PNL, win rate, hold times, etc.)
- Genetic Algorithm Optimization: Finds optimal weight combinations for multi-metric wallet ranking
- Position Analysis: Creates and analyzes first-buy-first-sell (FBFS) positions from historical data
- Performance Evaluation: Calculates what PNL would have been by copying specific wallets with various hold times and strategies
Key scripts include wallet scoring, position creation pipelines, genetic algorithm weighting optimization, and various analysis notebooks for understanding strategy performance.
- Group buy and sell swap pairs on one wallet into first-buy-first-sell positions. This is explained and implemented in the make_fbfs_positions.py file. This step also backtests what the PNL would have been copying that buy for a specific amount of time, including the wallet's median and Q1 hold times. The hypothesis is that selling before the wallet does (at its Q1 hold time) could lead to increased profits.
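The grouping step can be sketched roughly as below. This is an illustration, not the actual make_fbfs_positions.py implementation; the `Swap` fields and position dict keys are assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Swap:
    wallet: str
    token: str
    side: str          # "buy" or "sell"
    sol_amount: float  # SOL paid (buy) or received (sell)
    block_time: int

def make_fbfs_positions(swaps):
    """Pair each wallet/token's first buy with its first later sell."""
    by_key = {}
    for s in sorted(swaps, key=lambda s: s.block_time):
        by_key.setdefault((s.wallet, s.token), []).append(s)
    positions = []
    for (wallet, token), txs in by_key.items():
        first_buy = next((t for t in txs if t.side == "buy"), None)
        if first_buy is None:
            continue
        first_sell = next((t for t in txs
                           if t.side == "sell"
                           and t.block_time > first_buy.block_time), None)
        if first_sell is None:
            continue  # never sold in the window; the real pipeline decides how to treat these
        positions.append({
            "wallet": wallet,
            "token": token,
            "hold_time": first_sell.block_time - first_buy.block_time,
            "pnl": first_sell.sol_amount - first_buy.sol_amount,
        })
    return positions
```

The resulting hold times per wallet are what the median/Q1 backtest above is computed over.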
- Score wallets based on various performance metrics. Each metric is gathered and standardized to a score between 0 and 1, with 1 being the best and 0 the worst. This is done in the score_wallets.py file.
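A minimal sketch of that 0-to-1 standardization, assuming simple min-max scaling (score_wallets.py may use a different scheme, e.g. rank- or percentile-based):

```python
def standardize(values, higher_is_better=True):
    """Min-max scale a metric across wallets to [0, 1]; 1 = best, 0 = worst."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # degenerate case: all wallets tie on this metric
    scaled = [(v - lo) / (hi - lo) for v in values]
    # For metrics where smaller is better (e.g. drawdown), flip the scale
    return scaled if higher_is_better else [1 - s for s in scaled]
```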
- The hypothesis is that the ideal way to rank wallets is not by just 1 or a few metrics, but by a weighted combination of all of them. To find this ideal combination, we rank wallets based on some window's worth of positions (say, 1 day).
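The weighted combination itself is just a weighted sum over the standardized scores; a sketch, with illustrative metric names:

```python
def rank_wallets(scores, weights):
    """Rank wallets by a weighted sum of standardized metric scores.
    `scores` maps wallet -> {metric: score in [0, 1]};
    `weights` maps metric -> weight. Names are illustrative."""
    combined = {
        wallet: sum(weights[m] * s for m, s in metrics.items())
        for wallet, metrics in scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

The genetic algorithm described below searches over the `weights` vector.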
- To set this up from the starting token metadata and swap txs, use the make_fbfs_and_sols_all.py script. It is not a function and can be run directly. Make sure to change the directories to the correct ones.
- How the genetic step is done: The genetic algorithm and Bayesian optimization, to my limited understanding, are both optimizers that take n-dimensional inputs and try to find the combination of n inputs that gives the smallest possible output. In our case, the n inputs are the n weights that we use to score wallets, and the output is the NEGATIVE PNL of a specific backtesting window that involves copying the top 25 wallets chosen using a specific set of weights for a period of time. The goal was to find the combination of weights that would yield the lowest negative PNL, and therefore the highest PNL. So essentially:
- Using a big list of wallets we'd get from score_wallets.py
- Using a list of backtesting positions (what it would have been like to copy those wallets for their optimal hold times), made by create_backtesting_positions_for_wallets.py
- We'd put these 2 big tables into an objective function in objective.py
- This function would try weights, select the top 25 wallets, and then see what the results of copying those 25 wallets would be
- The weighting.py file would then run the genetic algorithm, which repeatedly calls the objective function to find the best PNL
- The reason we chose genetic over Bayesian is that Bayesian optimization gets slower the longer it runs, which is not something we saw with the genetic algorithm, and the genetic algorithm also gave better overall results.
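The loop described above can be sketched with a toy, hand-rolled genetic algorithm (weighting.py presumably uses a library implementation; this only illustrates the selection/crossover/mutation cycle, where `objective` stands in for objective.py's negative-PNL function):

```python
import random

def genetic_minimize(objective, n_dims, pop_size=40, generations=60, seed=0):
    """Evolve weight vectors in [0, 1]^n_dims to minimize `objective`."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_dims)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=objective)       # lower objective = fitter
        elite = scored[: pop_size // 4]           # selection: keep the best quarter
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # crossover: blend two parents
            i = rng.randrange(n_dims)
            # mutation: nudge one weight, clamped back into [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return min(pop, key=objective)
```

In the real pipeline each `objective` call is a full backtest (rank wallets with the candidate weights, copy the top 25, return negative PNL), so evaluations dominate the runtime.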
- Lamports are units used on Solana: $10^9$ lamports $= 1$ SOL. We store values in lamports in some places; to get the actual SOL value, divide anything that ends in `_lp` by $10^9$.
- The concept of time on a blockchain, and therefore in DeFi, is that the timestamp is an estimate in seconds. Timestamps of any kind come from the block time of the block where the DeFi event occurred, and everything in that block shares the same timestamp. Because Solana has a high-performance block time, multiple blocks can share the same block time in seconds. Each Solana block has a slot number, a unique identifier incremented by 1 for each block; higher slot numbers mean chronologically later blocks. A block is an ordered list of transactions, and ordering within a single block is given by the transaction's index in that block's ordered list of txs. One blockchain transaction can contain multiple DeFi events. See this example: https://solscan.io/tx/5u42oXBS6E62KAkr5RxURHwAjT8wc4NVL5YqAyHTXLRvpHNkZzgSXAiDsfqoBPJu2yvBU6Q77qpW9URjcp6cAPJZ
- In this single blockchain transaction, you can see a routing program perform 2 DeFi events: first swapping wrapped SOL to DogeAI, then swapping DogeAI for USDC. Those are 2 DeFi events that occur atomically in one single transaction. We order these events by an atomic instruction index, with chronologically later events having higher atomic instruction indices. Therefore, the way to order events chronologically is the following:
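As a plain-Python sketch of that ordering rule (the dict keys mirror the sort columns; a dataframe library expresses the same thing as a multi-column sort):

```python
def order_events(events):
    """Chronologically order DeFi events: by block_time, then slot
    (distinct blocks can share a block_time), then the transaction's
    index within the block, then the atomic instruction index within
    the transaction."""
    return sorted(events, key=lambda e: (
        e["block_time"],
        e["slot"],
        e["index"],
        e["atomic_instruction_index"],
    ))
```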
`.sort(["block_time", "slot", "index", "atomic_instruction_index"])`

Here are the schemas of the data that we use:
- Python imports become a bit easier when you use module syntax. Run everything from the root and import everything like `from Preprocessing.make_sol_swaps import make_sol_swaps`, for example
- Use `python -m` to run files. Has been easier in my experience
- Also integrates well with Jupyter Notebooks