This document outlines the plan to integrate the Attack TimeArcs visualization with the IP Bar Diagram visualization, allowing users to select attack arcs and view detailed TCP flows for those IPs within the selected time frame.
| System | Data Source | Size | Features |
|---|---|---|---|
| Attack TimeArcs | set1_first90_minutes.csv | 25 MB | Attack labels, aggregated arcs |
| IP Bar Diagram | decoded_set1_full.csv | 60 GB/day | TCP flows, microsecond timestamps |
- Different data sources: Attack data is small/aggregated; flow data is massive/detailed
- No attack labels in streaming data: The 60GB file lacks attack type information
- Scale mismatch: Cannot load 60GB into browser memory
- Timestamp formats differ: Attack data uses minutes, streaming data uses microseconds
- Multi-day support gap: Attack TimeArcs supports multiple files, streaming loader does not
| System | Multi-File Support | Implementation |
|---|---|---|
| attack_timearcs.js | ✅ Yes | Iterates files, combines into combinedData array |
| tcp_data_loader_streaming.py | ❌ No | Single --data argument only |
This gap must be addressed for multi-day analysis scenarios.
Attack TimeArcs: 20954244 minutes
Streaming Data: 1257254615805569 microseconds
↓ convert to minutes
20954243.60 minutes
Difference: 0.40 minutes ✓ (same time period)
┌─────────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 1. ATTACK TIMEARCS (Browser) │ │
│ │ • Load small attack CSV (25 MB) │ │
│ │ • User selects attack arcs via click/brush │ │
│ │ • Selection: {ips, timeRange, attackType} │ │
│ └────────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 2. COMMAND GENERATOR (Browser) │ │
│ │ • Convert minutes → microseconds │ │
│ │ • Format filter parameters │ │
│ │ • Display Python command to user │ │
│ └────────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 3. STREAMING LOADER (Python - User runs locally) │ │
│ │ • tcp_data_loader_streaming.py with new filter options │ │
│ │ • Processes 60GB file in 500K row chunks (~200MB RAM) │ │
│ │ • Filters by IP + time range during streaming │ │
│ │ • Outputs small subset folder (1-50 MB) │ │
│ └────────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 4. IP BAR DIAGRAM (Browser) │ │
│ │ • Load generated subset folder │ │
│ │ • Display TCP flows for selected IPs/time │ │
│ │ • Attack context from manifest.json │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│ UNIFIED VISUALIZATION │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ unified_timearcs.html │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ ┌───────────────────────────────────────────────────────────────────┐ │ │
│ │ │ ATTACK TIMEARCS PANEL (Top) │ │ │
│ │ │ ════════════════════════════════════════════════════════════════ │ │ │
│ │ │ IP1 ──────●────●●●●●────●────────────────── │ │ │
│ │ │ IP2 ──────●────●●●●●────●────────────────── │ │ │
│ │ │ IP3 ────────────────────●●●●────●────────── │ │ │
│ │ │ ════════════════════════════════════════════════════════════════ │ │ │
│ │ │ [====BRUSH SELECTION====] │ │ │
│ │ └───────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────┐ │ │
│ │ │ SELECTION PANEL (Middle) │ │ │
│ │ │ IPs: 172.28.185.51, 60.203.52.184 (numeric: 1, 2) │ │ │
│ │ │ Time: 20954244 - 20954250 (6 minutes) │ │ │
│ │ │ Attack: client compromise │ │ │
│ │ │ │ │ │
│ │ │ [Copy Command] [Show Instructions] │ │ │
│ │ └───────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────┐ │ │
│ │ │ IP BAR DIAGRAM PANEL (Bottom) │ │ │
│ │ │ [Load Generated Folder] │ │ │
│ │ │ ─────────────────────────────────────────────────────────────── │ │ │
│ │ │ Flow 1: 172.28.185.51:49382 ↔ 60.203.52.184:80 │ │ │
│ │ │ ┌─────┬─────────────────────────┬─────┐ │ │ │
│ │ │ │ EST │ DATA TRANSFER │ CLS │ │ │ │
│ │ │ └─────┴─────────────────────────┴─────┘ │ │ │
│ │ └───────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Estimated changes: ~100 lines (increased due to multi-file support)
# Change --data to accept multiple files (like attack_extract.py)
parser.add_argument('--data',
nargs='+', # Accept one or more files
required=True,
help='Input TCP data file(s) (CSV or CSV.GZ) - can specify multiple files for multi-day analysis')
parser.add_argument('--filter-ips', type=str,
help='Comma-separated list of IP IDs to filter (e.g., "1,2,7204")')
parser.add_argument('--filter-time-start', type=int,
help='Filter packets >= this timestamp (microseconds)')
parser.add_argument('--filter-time-end', type=int,
help='Filter packets <= this timestamp (microseconds)')
parser.add_argument('--attack-context', type=str,
    help='Attack type label for this subset (stored in manifest)')

def process_tcp_data_chunked(data_files, ip_map_file, output_dir, ...):
"""
Process multiple TCP data files sequentially.
Args:
data_files: List of input CSV file paths (can be single file or multiple)
...
"""
# data_files is now a list (even if single file)
if isinstance(data_files, str):
data_files = [data_files]
print(f"Processing {len(data_files)} input file(s)...")
# Process each file sequentially
for file_index, data_file in enumerate(data_files, start=1):
print(f"\n[{file_index}/{len(data_files)}] Processing: {data_file}")
if not Path(data_file).exists():
print(f" WARNING: File not found, skipping: {data_file}")
continue
compression = 'gzip' if data_file.endswith('.gz') else None
csv_iterator = pd.read_csv(data_file, chunksize=chunk_read_size,
compression=compression)
for df_chunk in csv_iterator:
# ... existing chunk processing with filtering ...
# Finalize flows from all files combined
# (connection_map persists across files for cross-file flows)

def process_tcp_data_chunked(..., filter_ips=None, filter_time_start=None,
                             filter_time_end=None, attack_context=None):
# Parse filter IPs into set for O(1) lookup
ip_filter_set = None
if filter_ips:
ip_filter_set = set(filter_ips.split(','))
for df_chunk in csv_iterator:
# ... existing IP conversion code ...
# === NEW: Apply filters BEFORE processing ===
# Time range filter
if filter_time_start is not None:
df_chunk = df_chunk[df_chunk['timestamp'] >= filter_time_start]
if filter_time_end is not None:
df_chunk = df_chunk[df_chunk['timestamp'] <= filter_time_end]
# IP filter (either src or dst must match)
if ip_filter_set:
# Convert numeric IPs to string for comparison
src_match = df_chunk['src_ip'].astype(str).isin(ip_filter_set)
dst_match = df_chunk['dst_ip'].astype(str).isin(ip_filter_set)
df_chunk = df_chunk[src_match | dst_match]
# Skip chunk if empty after filtering
if len(df_chunk) == 0:
print(f"Chunk {chunk_number}: skipped (no matching packets)")
continue
# ... rest of existing processing ...

When processing multiple files, TCP flows may span across file boundaries:
# connection_map persists across files
# This allows flows that start in day1.csv to continue in day2.csv
connection_map = {} # Initialized once, shared across all files
for data_file in data_files:
for df_chunk in pd.read_csv(data_file, chunksize=...):
# Incremental flow detection uses shared connection_map
completed_flows, flow_counter, timed_out = detect_tcp_flows_incremental(
tcp_chunk,
connection_map, # Shared across files!
...
)

# At the end, add attack context to manifest
manifest = {
'version': '2.0',
'format': 'chunked',
# ... existing fields ...
# NEW: Source files (supports multiple)
'source_files': data_files, # List of all input files processed
# NEW: Attack context from selection
'attack_context': {
'type': attack_context,
'source': 'attack_timearcs_selection'
},
'filter_applied': {
'ips': filter_ips.split(',') if filter_ips else None,
'time_start': filter_time_start,
'time_end': filter_time_end,
'time_start_minutes': filter_time_start // 60_000_000 if filter_time_start else None,
'time_end_minutes': filter_time_end // 60_000_000 if filter_time_end else None
}
}

Files to create:
- unified_timearcs.html (~200 lines)
- unified_timearcs.js (~400 lines)
<!DOCTYPE html>
<html>
<head>
<title>Unified TimeArcs - Attack + Flow Analysis</title>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<!-- Header with file loaders -->
<div id="header">
<div id="attack-loaders">
<input type="file" id="attack-csv" accept=".csv">
<input type="file" id="ip-map" accept=".json">
<input type="file" id="event-mapping" accept=".json">
</div>
</div>
<!-- Attack TimeArcs Panel -->
<div id="attack-panel">
<h3>Attack TimeArcs</h3>
<svg id="attack-svg"></svg>
</div>
<!-- Selection Info Panel -->
<div id="selection-panel">
<h3>Selection</h3>
<div id="selection-info">
<p>IPs: <span id="selected-ips">-</span></p>
<p>Time: <span id="selected-time">-</span></p>
<p>Attack: <span id="selected-attack">-</span></p>
</div>
<div id="command-area">
<pre id="python-command"></pre>
<button id="copy-command">Copy Command</button>
</div>
</div>
<!-- IP Bar Diagram Panel -->
<div id="flow-panel">
<h3>TCP Flows</h3>
<button id="load-folder">Load Generated Folder</button>
<svg id="flow-svg"></svg>
</div>
<script type="module" src="unified_timearcs.js"></script>
</body>
</html>

// unified_timearcs.js
// Timestamp conversion utilities
function minutesToMicroseconds(minutes) {
return BigInt(minutes) * 60n * 1_000_000n;
}
function microsecondsToMinutes(us) {
  // Divide as Number (value is well below 2^53) so fractional minutes survive;
  // BigInt division would truncate to whole minutes
  return Number(BigInt(us)) / 60_000_000;
}
// Selection state
let currentSelection = {
ips: [],
ipNames: [],
timeRange: [null, null],
attackType: null
};
// Handle arc selection (called from attack_timearcs interaction)
function onArcSelection(selectedArcs) {
// Extract unique IPs from selected arcs
const ipSet = new Set();
const ipNameSet = new Set();
let minTime = Infinity, maxTime = -Infinity;
const attackCounts = {};
selectedArcs.forEach(arc => {
// Collect IPs (both numeric IDs and resolved names)
ipSet.add(arc.sourceId);
ipSet.add(arc.targetId);
ipNameSet.add(arc.source);
ipNameSet.add(arc.target);
// Track time range
minTime = Math.min(minTime, arc.minute);
maxTime = Math.max(maxTime, arc.minute);
// Count attack types
attackCounts[arc.attack] = (attackCounts[arc.attack] || 0) + arc.count;
});
// Find dominant attack type
const dominantAttack = Object.entries(attackCounts)
.sort((a, b) => b[1] - a[1])[0]?.[0] || 'unknown';
currentSelection = {
ips: Array.from(ipSet),
ipNames: Array.from(ipNameSet),
timeRange: [minTime, maxTime],
attackType: dominantAttack
};
updateSelectionUI();
generatePythonCommand();
}
// Generate Python command for subset extraction
function generatePythonCommand() {
const { ips, timeRange, attackType } = currentSelection;
if (ips.length === 0 || !timeRange[0]) {
document.getElementById('python-command').textContent = '# Make a selection first';
return;
}
// Convert time range from minutes to microseconds
const timeStartUs = minutesToMicroseconds(timeRange[0]);
const timeEndUs = minutesToMicroseconds(timeRange[1] + 1); // +1 to include full minute
const command = `python tcp_data_loader_streaming.py \\
--data /path/to/decoded_set1_full.csv \\
--ip-map combined_pcap_data_set5_compressed_ip_map.json \\
--output-dir subset_${attackType.replace(/\s+/g, '_')}_${Date.now()}/ \\
--filter-ips ${ips.join(',')} \\
--filter-time-start ${timeStartUs} \\
--filter-time-end ${timeEndUs} \\
--attack-context "${attackType}"`;
document.getElementById('python-command').textContent = command;
}
// Copy command to clipboard
document.getElementById('copy-command').addEventListener('click', () => {
const command = document.getElementById('python-command').textContent;
navigator.clipboard.writeText(command);
alert('Command copied to clipboard!');
});

The unified UI will import and use existing modules:
// unified_timearcs.js
// Import existing attack_timearcs functionality
import { parseCSVStream, render as renderAttackArcs } from './attack_timearcs.js';
// Import existing IP bar diagram functionality
import { visualizeTimeArcs } from './ip_bar_diagram.js';
// Import folder loader for generated subset
import { FolderLoader } from './folder_loader.js';

┌─────────────────────────────────────────────────────────────────────────────┐
│ USER WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: Open Unified Visualization │
│ ─────────────────────────────────── │
│ • Open unified_timearcs.html in Chrome/Edge │
│ • Load attack CSV: set1_first90_minutes.csv │
│ • Load IP map: combined_pcap_data_set5_compressed_ip_map.json │
│ • Load event mapping: event_type_mapping.json │
│ │
│ STEP 2: Explore Attack Patterns │
│ ────────────────────────────── │
│ • View attack arcs in top panel │
│ • Hover over arcs to see details │
│ • Use legend to filter by attack type │
│ │
│ STEP 3: Select Attack Arcs │
│ ───────────────────────────── │
│ • Click individual arc to select │
│ • Or brush (click+drag) to select time range │
│ • Selection panel shows: IPs, time range, attack type │
│ │
│ STEP 4: Copy and Run Python Command │
│ ──────────────────────────────────── │
│ • Click "Copy Command" button │
│ • Open terminal, navigate to tcp_timearcs directory │
│ • Paste and run command (adjust --data path as needed) │
│ • Wait for processing (typically 10-60 seconds) │
│ │
│ STEP 5: Load Generated Subset │
│ ───────────────────────────── │
│ • Click "Load Generated Folder" in bottom panel │
│ • Select the output directory from Python command │
│ • View TCP flows for selected IPs in selected time range │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
# User selects arcs showing DDoS attack between IPs 1,2 at minutes 20954244-20954250
# Generated command (single day):
python tcp_data_loader_streaming.py \
--data /mnt/data/decoded_set1_full.csv \
--ip-map combined_pcap_data_set5_compressed_ip_map.json \
--output-dir subset_ddos_1702345678/ \
--filter-ips 1,2 \
--filter-time-start 1257254640000000 \
--filter-time-end 1257255000000000 \
--attack-context "ddos"
# Output:
# Loading IP mapping from combined_pcap_data_set5_compressed_ip_map.json...
# Processing 1 input file(s)...
# [1/1] Processing: /mnt/data/decoded_set1_full.csv
# Chunk 1: skipped (no matching packets)
# Chunk 2: skipped (no matching packets)
# ...
# Chunk 47: processed 12,345 packets, 11,234 TCP, 156 active flows...
# ...
# Successfully processed data:
# - Total packets: 45,678
# - TCP packets: 42,345
# - Unique IPs: 2
# - Total flows: 234
# - Output directory: subset_ddos_1702345678/

# User loads multiple days in attack_timearcs, selects arcs spanning day 1 and day 2
# Generated command (multi-day):
python tcp_data_loader_streaming.py \
--data /mnt/data/decoded_set1_day1.csv \
/mnt/data/decoded_set1_day2.csv \
/mnt/data/decoded_set1_day3.csv \
--ip-map combined_pcap_data_set5_compressed_ip_map.json \
--output-dir subset_multiday_ddos/ \
--filter-ips 1,2,15,42 \
--filter-time-start 1257254640000000 \
--filter-time-end 1257427440000000 \
--attack-context "ddos"
# Output:
# Loading IP mapping from combined_pcap_data_set5_compressed_ip_map.json...
# Processing 3 input file(s)...
#
# [1/3] Processing: /mnt/data/decoded_set1_day1.csv
# Chunk 1: skipped (no matching packets)
# ...
# Chunk 47: processed 12,345 packets, 11,234 TCP, 156 active flows...
# File complete: 45,678 matched rows
#
# [2/3] Processing: /mnt/data/decoded_set1_day2.csv
# Chunk 1: processed 8,901 packets, 8,234 TCP, 89 active flows...
# ...
# File complete: 38,901 matched rows
#
# [3/3] Processing: /mnt/data/decoded_set1_day3.csv
# ...
# File complete: 12,456 matched rows
#
# Successfully processed data:
# - Source files: 3
# - Total packets: 97,035
# - TCP packets: 89,234
# - Unique IPs: 4
# - Total flows: 567 (including 23 cross-file flows)
# - Output directory: subset_multiday_ddos/

The streaming loader maintains low memory usage by:
- Chunked CSV reading: 500,000 rows at a time (configurable)
- Early filtering: Packets filtered before processing
- Incremental flow writing: Completed flows written to disk, freed from memory
- Timeout-based completion: Flows completed after 300s inactivity
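The timeout rule in the last bullet can be sketched as a small helper. This is a minimal sketch, not the loader's actual implementation: the `connection_map` name follows the code above, but the per-flow field layout (`last_seen` in microseconds) is an assumption.

```python
# Sketch of timeout-based flow completion. Assumes each tracked flow
# records the timestamp (microseconds) of its last observed packet.
FLOW_TIMEOUT_US = 300 * 1_000_000  # 300 s of inactivity

def complete_timed_out_flows(connection_map, now_us):
    """Pop and return flows idle longer than FLOW_TIMEOUT_US."""
    completed = []
    for key in list(connection_map):
        if now_us - connection_map[key]['last_seen'] > FLOW_TIMEOUT_US:
            completed.append(connection_map.pop(key))
    return completed

# Example: one flow idle for 301 s completes, one idle for 11 s stays active
cm = {('1', '2', 49382, 80): {'last_seen': 0},
      ('1', '5', 49400, 80): {'last_seen': 290 * 1_000_000}}
done = complete_timed_out_flows(cm, now_us=301 * 1_000_000)
print(len(done), len(cm))  # → 1 1
```

Popping completed flows out of the map is what keeps memory bounded: only active flows stay resident.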
Memory usage comparison:
─────────────────────────────────────────
Without streaming: 10-20 GB (entire file in memory)
With streaming: ~200 MB (constant regardless of file size)
With filtering: ~50-100 MB (fewer active flows)
60 GB file, 6-minute selection, 2 IPs:
─────────────────────────────────────────
Total chunks: ~120 (500K rows each)
Chunks with matches: ~5-10
Processing time: 10-60 seconds
Output size: 1-50 MB
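The early-filtering path that produces these numbers can be exercised on synthetic data. A minimal sketch of the per-chunk time and IP filters (column names follow the streaming CSV format; the rows and thresholds are made up):

```python
import pandas as pd

# Synthetic chunk mimicking the streaming CSV columns (values made up)
chunk = pd.DataFrame({
    'timestamp': [100, 200, 300, 400],
    'src_ip':    [1,   3,   1,   2],
    'dst_ip':    [2,   4,   5,   1],
})

ip_filter_set = {'1', '2'}
start_us, end_us = 150, 350

# Time range filter first, then IP filter (src OR dst must match)
chunk = chunk[(chunk['timestamp'] >= start_us) & (chunk['timestamp'] <= end_us)]
match = (chunk['src_ip'].astype(str).isin(ip_filter_set) |
         chunk['dst_ip'].astype(str).isin(ip_filter_set))
chunk = chunk[match]
print(len(chunk))  # → 1 (only the row at t=300 involves a filtered IP)
```

Because both filters are vectorized boolean masks, the cost per chunk is linear in rows, and chunks with zero matches are skipped before any flow processing.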
Since the streaming data lacks attack labels, they are inherited from the selection:
Attack TimeArcs Selection:
├── IPs: [1, 2]
├── Time: 20954244 - 20954250 (minutes)
└── Attack: "ddos"
│
▼
All extracted flows labeled as "ddos" in manifest.json
This labeling is valid because:
- Attack arcs represent the dominant attack type for that IP pair at that time
- The user explicitly selected that attack pattern
- The streaming data carries no per-packet attack labels that could conflict
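Under that assumption, attaching the inherited label when the subset is written could look like the following sketch (the flow-dict shape and field names are hypothetical, not the loader's actual records):

```python
def label_flows(flows, attack_context):
    """Attach the selection's attack type to every extracted flow record."""
    return [{**flow, 'attack': attack_context} for flow in flows]

labeled = label_flows([{'flow_id': 1}, {'flow_id': 2}], 'ddos')
print(labeled[0])  # → {'flow_id': 1, 'attack': 'ddos'}
```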
| File | Changes | Lines |
|---|---|---|
| tcp_data_loader_streaming.py | Add multi-file support + filter args + logic | ~100 |
| File | Purpose | Lines |
|---|---|---|
| unified_timearcs.html | Combined UI layout | ~200 |
| unified_timearcs.js | Selection + command generation (multi-file aware) | ~450 |
| File | Purpose |
|---|---|
| attack_timearcs.js | Attack arc rendering (already supports multi-file) |
| ip_bar_diagram.js | TCP flow rendering |
| folder_loader.js | Folder-based data loading |
| folder_integration.js | UI integration for folder loader |
The unified UI must track which source files were loaded in attack_timearcs to generate the correct --data arguments:
// Track loaded attack data files
let loadedAttackFiles = [];
fileInput.addEventListener('change', (e) => {
loadedAttackFiles = Array.from(e.target.files).map(f => f.name);
});
// When generating command, include all corresponding streaming data files
function generateCommand(selection) {
// Map attack CSV names to streaming data file paths
// e.g., "day1_attacks.csv" → "/path/to/decoded_day1.csv"
const streamingFiles = loadedAttackFiles.map(f => mapToStreamingFile(f));
return `python tcp_data_loader_streaming.py \\
--data ${streamingFiles.join(' \\\n ')} \\
--filter-ips ${selection.ips.join(',')} \\
...`;
}

For seamless integration without manual Python runs:
# tcp_server.py - Simple Flask server
from flask import Flask, request, jsonify
import subprocess
app = Flask(__name__)
@app.route('/extract', methods=['POST'])
def extract():
params = request.json
# Run streaming loader with filter params
subprocess.run([
'python', 'tcp_data_loader_streaming.py',
'--data', params['data_path'],
'--ip-map', params['ip_map'],
'--output-dir', params['output_dir'],
'--filter-ips', params['filter_ips'],
'--filter-time-start', str(params['time_start']),
'--filter-time-end', str(params['time_end']),
'--attack-context', params['attack_context']
])
return jsonify({"status": "complete", "output": params['output_dir']})
if __name__ == '__main__':
app.run(port=5000)For even faster filtering on repeated queries:
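A client would send the selection as a JSON body whose keys mirror what the handler above reads. A sketch of building that payload (the helper name and example values are illustrative; the `+1` minute follows the command generator's convention of including the full final minute):

```python
def build_extract_payload(ips, minute_start, minute_end, attack_type,
                          data_path, ip_map):
    """Convert a TimeArcs selection into the /extract request body."""
    return {
        'data_path': data_path,
        'ip_map': ip_map,
        'output_dir': f"subset_{attack_type}/",
        'filter_ips': ','.join(str(ip) for ip in ips),
        'time_start': minute_start * 60_000_000,
        'time_end': (minute_end + 1) * 60_000_000,  # include the full last minute
        'attack_context': attack_type,
    }

payload = build_extract_payload(
    [1, 2], 20954244, 20954250, 'ddos',
    '/mnt/data/decoded_set1_full.csv',
    'combined_pcap_data_set5_compressed_ip_map.json')
print(payload['time_start'])  # → 1257254640000000
```

The dict could then be POSTed to `http://localhost:5000/extract` with any HTTP client.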
# tcp_time_index.py - One-time index generation
# Creates byte offsets for each minute in 60GB file
# Enables O(1) seeking instead of a sequential scan

Minutes to Microseconds:
────────────────────────
minutes × 60 × 1,000,000 = microseconds
Example:
20954244 × 60 × 1,000,000 = 1,257,254,640,000,000 μs
Microseconds to Minutes:
────────────────────────
microseconds ÷ 60,000,000 = minutes
Example:
1,257,254,615,805,569 ÷ 60,000,000 = 20,954,243.60 minutes
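The two conversions above can be wrapped as helpers; since Python integers are arbitrary precision, no BigInt handling is needed on this side:

```python
def minutes_to_us(minutes):
    """Minutes → microseconds (exact integer arithmetic)."""
    return minutes * 60 * 1_000_000

def us_to_minutes(us):
    """Microseconds → minutes; keeps the fraction, round or floor as needed."""
    return us / 60_000_000

print(minutes_to_us(20954244))                       # → 1257254640000000
print(round(us_to_minutes(1257254615805569), 2))     # → 20954243.6
```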
Attack TimeArcs CSV (minutes, numeric IP IDs, attack labels):

timestamp,length,src_ip,dst_ip,protocol,src_port,dst_port,flags,attack,count
20954244,66,7204,7203,6,80,52784,16,25,1

Streaming CSV (microseconds, per-packet fields, no attack labels):

timestamp,length,protocol,src_port,dst_port,src_ip,dst_ip,flags,seq_num,ack_num
1257254615805569,66.0,6,49382,80,1.0,2.0,16,2183799410,4243715536

Generated subset folder layout:

subset_output/
├── manifest.json # Includes attack_context
├── packets.csv # Filtered packets
├── flows/
│ ├── flows_index.json
│ └── chunk_00000.json
├── ips/
│ ├── ip_stats.json
│ ├── flag_stats.json
│ └── unique_ips.json
└── indices/
└── bins.json
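The folder loader reads manifest.json first to recover the attack context. The same check can be scripted in Python, e.g. for batch validation of generated subsets (a sketch, assuming the manifest fields shown earlier):

```python
import json
from pathlib import Path

def read_attack_context(subset_dir):
    """Return the attack label recorded in the subset's manifest, if any."""
    manifest = json.loads((Path(subset_dir) / 'manifest.json').read_text())
    return manifest.get('attack_context', {}).get('type')
```

For example, `read_attack_context('subset_ddos_1702345678/')` would return `'ddos'` for the single-day extraction shown above.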