Commit bce270a

Copilot and CodeRafay committed
Add steganography detection and hash verification analysis modules with tests
Co-authored-by: CodeRafay <154733908+CodeRafay@users.noreply.github.com>
1 parent 08129b7 commit bce270a

8 files changed

Lines changed: 1972 additions & 0 deletions

.coverage

52 KB
Binary file not shown.

Descriptions/Hash_Verification.md

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
1+
# 🔑 Cryptographic Hash Verification
2+
3+
## What is Hash Verification?
4+
5+
Hash verification is like creating a **digital fingerprint** for an image. Just as your fingerprint identifies you, a cryptographic hash identifies a specific file, uniquely for all practical purposes. This module uses two types of hashing:
6+
7+
1. **Cryptographic Hash (SHA-256)**: Exact matching - even one bit changed creates a completely different hash
8+
2. **Perceptual Hash**: Resilient matching - similar images produce similar hashes
9+
10+
Think of it like DNA testing:
11+
- **Cryptographic hash** = Exact DNA match (100% identical twins)
12+
- **Perceptual hash** = Family resemblance (siblings look similar)
13+
14+
## What is Blockchain-Based Provenance?
15+
16+
**Provenance** means the history of ownership and modifications of an image. Our blockchain simulation:
17+
18+
- **Records every registered image** with timestamps
19+
- **Creates an audit trail intended to be immutable** (like a notary's ledger)
20+
- **Tracks the chain of custody** for legal validity
21+
- **Detects unauthorized modifications** by comparing hashes
22+
23+
### How It Works:
24+
25+
```
26+
Original Image → Generate Hashes → Store in "Blockchain" → Verify Later
27+
28+
Timestamp + Hashes + Metadata (immutable record)
29+
```
30+
31+
When you later verify an image, the module compares its current hashes with the stored records to determine authenticity.
32+
33+
## Types of Hashing Used
34+
35+
### 1. 🔒 Cryptographic Hash (SHA-256)
36+
37+
**Purpose**: Exact integrity verification
38+
39+
- **How it works**: Processes every single bit of the file
40+
- **Length**: 256 bits (64 hexadecimal characters)
41+
- **Collision resistance**: Computationally infeasible to find two different files with the same hash
42+
- **Sensitivity**: Changing even ONE pixel completely changes the hash
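
The properties above can be illustrated with Python's standard `hashlib` module. This is a minimal sketch, not the module's actual code: the function name `sha256_file` and the chunk size are assumptions made for illustration.

```python
import hashlib


def sha256_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is always 64 hexadecimal characters, and because the hash covers the raw file bytes, a change to a single byte (pixel data or metadata) produces an unrelated digest.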
43+
44+
**Use Cases**:
45+
- Legal evidence verification
46+
- Exact duplicate detection
47+
- Tamper detection
48+
- Digital chain of custody
49+
50+
### 2. 👁️ Perceptual Hash (pHash, aHash, dHash, wHash)
51+
52+
**Purpose**: Find similar images despite minor changes
53+
54+
**pHash (Perceptual Hash)**:
55+
- Most robust against modifications
56+
- Based on discrete cosine transform (DCT)
57+
- Resistant to: compression, resizing, color adjustment
58+
59+
**aHash (Average Hash)**:
60+
- Fast and simple
61+
- Based on average pixel values
62+
- Good for basic similarity matching
63+
64+
**dHash (Difference Hash)**:
65+
- Based on gradient between adjacent pixels
66+
- Resistant to gamma correction and color changes
67+
68+
**wHash (Wavelet Hash)**:
69+
- Uses wavelet transform
70+
- Good for texture similarity
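
To make the two simplest variants concrete, here is a rough sketch of aHash and dHash in plain Python. It assumes the image has already been converted to grayscale and downscaled to a small grid (for example 8x8 for aHash, or 9 columns by 8 rows for dHash, which yields 64 bits). The function names are hypothetical, and the comparison direction in dHash is a convention that differs between libraries. pHash and wHash are left out because they need a DCT or wavelet transform.

```python
def average_hash(pixels):
    """aHash sketch: one bit per pixel, set when the pixel is above the mean.

    `pixels` is a list of rows of grayscale values (already downscaled).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def difference_hash(pixels):
    """dHash sketch: one bit per horizontally adjacent pair, set when left > right."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits
```

Because both hashes depend only on relative brightness, adding the same offset to every pixel leaves them unchanged, which is the kind of resilience the descriptions above refer to.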
71+
72+
**Common Tolerances** (Hamming distance between two hashes of the same type):
73+
- 0-5 bits different: Nearly identical (minor compression/resize)
74+
- 6-10 bits different: Very similar (moderate edits)
75+
- 11-15 bits different: Similar (significant edits)
76+
- 16+ bits different: Possibly different images
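
The tolerance bands above translate directly into a lookup. The sketch below is illustrative only: `classify_distance` and its labels are hypothetical names, and the cut-offs are simply the ones listed in this document.

```python
def classify_distance(bits_different):
    """Map a Hamming distance to the tolerance bands listed above."""
    if bits_different < 0:
        raise ValueError("distance cannot be negative")
    if bits_different <= 5:
        return "nearly identical"
    if bits_different <= 10:
        return "very similar"
    if bits_different <= 15:
        return "similar"
    return "possibly different"
```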
77+
78+
## What Does the Analysis Show?
79+
80+
### 📊 Authenticity Score (0-100)
81+
82+
- **100**: Exact cryptographic match - identical file
83+
- **85-99**: Strong perceptual match - minor modifications only
84+
- **70-84**: Moderate match - some modifications detected
85+
- **55-69**: Weak match - significant changes
86+
- **0-54**: No match or unknown provenance
87+
88+
### 🔍 Match Types
89+
90+
**Exact Match**:
91+
- SHA-256 hashes identical
92+
- Byte-for-byte identical file
93+
- Highest confidence (100%)
94+
95+
**Perceptual Match**:
96+
- SHA-256 differs, but perceptual hashes similar
97+
- Indicates modifications like:
98+
- Format conversion (PNG → JPEG)
99+
- Compression level change
100+
- Minor cropping or resizing
101+
- Color adjustment
102+
- Watermark addition
103+
104+
**No Match**:
105+
- Neither cryptographic nor perceptual match
106+
- Unknown provenance
107+
- Possibly original (never registered)
108+
109+
### ⚖️ Legal Validity Assessment
110+
111+
**Chain of Custody**: Critical for legal admissibility
112+
113+
- **Intact**: Exact match found, high legal validity
114+
- **Likely Intact**: Minor modifications only, likely still admissible
115+
- **Questionable**: Moderate modifications, requires explanation
116+
- **Broken**: Significant changes, likely not admissible
117+
118+
## Interpretation Guidelines
119+
120+
### ✅ High Authenticity (Score: 85-100)
121+
122+
**Indicators**:
123+
- Exact SHA-256 match OR
124+
- Perceptual hash distance of 5 bits or fewer
124+
- Clear modification history in database
125+
- Timestamps match expected timeline
126+
127+
**Confidence**: High - Image is authentic or minimally modified
128+
129+
### 🟡 Medium Authenticity (Score: 55-84)
130+
131+
**Indicators**:
132+
- Perceptual hash distance 6-15 bits
134+
- Moderate modifications detected
135+
- Some inconsistencies in timeline
136+
- Format or size changes
137+
138+
**Confidence**: Medium - Image may be authentic but edited
139+
140+
### 🔴 Low Authenticity (Score: 0-54)
141+
142+
**Indicators**:
143+
- No perceptual match in database
144+
- Hash distance > 15 bits
145+
- Unknown provenance
146+
- Possible forgery or new image
147+
148+
**Confidence**: Low - Cannot verify authenticity
149+
150+
## Database Management
151+
152+
### Adding Images to Blockchain
153+
154+
When you add an image, the module:
155+
1. Generates all hash types (SHA-256, pHash, aHash, dHash, wHash)
156+
2. Records timestamp (ISO 8601 format)
157+
3. Stores file metadata (size, dimensions, format)
158+
4. Creates immutable record in JSON "blockchain"
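
The registration steps above could look roughly like the following. This is a sketch under stated assumptions: the field names (`sha256`, `perceptual_hashes`, `timestamp`, `metadata`) and the function `build_record` are hypothetical, perceptual hashes are passed in precomputed, and only the file size is recorded because dimensions and format would need an imaging library.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_record(image_bytes, filename, perceptual_hashes=None):
    """Build one registration record (hypothetical schema)."""
    return {
        "filename": filename,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        # e.g. {"phash": "...", "ahash": "..."}; computed elsewhere
        "perceptual_hashes": dict(perceptual_hashes or {}),
        # ISO 8601 timestamp in UTC
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metadata": {"size_bytes": len(image_bytes)},
    }


def save_database(records, path):
    """Write the list of records to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
```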
159+
160+
### Searching the Database
161+
162+
When you verify an image, the module:
163+
1. Generates current hashes
164+
2. Searches database for matches
165+
3. Calculates similarity scores
166+
4. Returns modification history
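
A possible shape for this lookup is sketched below: try an exact SHA-256 match first, then fall back to the closest perceptual hash within a threshold. The record layout (a `sha256` string and an integer `phash`) and the function name `verify` are assumptions for illustration, not the module's API.

```python
def verify(candidate, database, threshold=10):
    """Return (match_type, record, distance) for a candidate image.

    `candidate` and each database record are dicts with a "sha256" string
    and an integer "phash" (hypothetical layout).
    """
    # 1. Exact match: identical cryptographic hash.
    for record in database:
        if record["sha256"] == candidate["sha256"]:
            return "exact", record, 0

    # 2. Perceptual match: smallest Hamming distance within the threshold.
    best = None
    for record in database:
        distance = bin(record["phash"] ^ candidate["phash"]).count("1")
        if distance <= threshold and (best is None or distance < best[1]):
            best = (record, distance)
    if best is not None:
        return "perceptual", best[0], best[1]

    # 3. Nothing close enough: unknown provenance.
    return "none", None, None
```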
167+
168+
### Import/Export Functionality
169+
170+
**Export**: Save database to share with other systems
171+
**Import**: Load trusted database (merge or replace)
172+
**Backup**: Regular exports for disaster recovery
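
The merge-or-replace choice on import can be sketched as follows. The source does not say how duplicates are resolved when merging, so keying on the `sha256` field and keeping the existing record is an assumption, as are the function names.

```python
import json


def export_database(records, path):
    """Write all records to a JSON file for sharing or backup."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


def import_database(current, path, mode="merge"):
    """Load records from `path` and return the resulting database.

    mode="replace" discards the current records; mode="merge" keeps them and
    appends only incoming records whose SHA-256 is not already present.
    """
    with open(path, encoding="utf-8") as f:
        incoming = json.load(f)
    if mode == "replace":
        return incoming
    if mode != "merge":
        raise ValueError("mode must be 'merge' or 'replace'")
    known = {record["sha256"] for record in current}
    merged = list(current)
    for record in incoming:
        if record["sha256"] not in known:
            merged.append(record)
            known.add(record["sha256"])
    return merged
```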
173+
174+
## Use Cases
175+
176+
### Digital Forensics:
177+
- **Evidence verification**: Show that an image has not been tampered with
178+
- **Chain of custody**: Track image from capture to court
179+
- **Timeline establishment**: Show that an image existed no later than its registration timestamp
180+
- **Duplicate detection**: Find all versions of an image
181+
182+
### Copyright Protection:
183+
- **Proof of ownership**: Establish creation date
184+
- **Infringement detection**: Find unauthorized copies
185+
- **Licensing verification**: Confirm licensed versions
186+
187+
### Journalism & Media:
188+
- **Source verification**: Confirm image origin
189+
- **Deepfake detection**: Check against known authentic images
190+
- **Archive integrity**: Ensure historical images unchanged
191+
192+
### Corporate Security:
193+
- **Data leak prevention**: Track sensitive images
194+
- **Insider threat detection**: Monitor unauthorized distribution
195+
- **Compliance auditing**: Verify document integrity
196+
197+
## Technical Implementation
198+
199+
### Hash Generation:
200+
```
201+
Image File → SHA-256 → Cryptographic Hash (exact)
202+
→ pHash → Perceptual Hash (similarity)
203+
→ aHash → Average Hash (fast similarity)
204+
→ dHash → Difference Hash (gradient)
205+
→ wHash → Wavelet Hash (texture)
206+
```
207+
208+
### Similarity Calculation:
209+
- **Hamming Distance**: Counts differing bits between hashes
210+
- **Lower distance** = More similar images
211+
- **Threshold**: Typically 10 bits for "similar" classification
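
As a small worked example, the Hamming distance between two hex-encoded hashes can be computed by XOR-ing them and counting the set bits. The function names are hypothetical, and the default threshold of 10 is the one mentioned above.

```python
def hamming_distance(hash_a, hash_b):
    """Count differing bits between two hex-encoded hashes of equal length."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_similar(hash_a, hash_b, threshold=10):
    """Treat two hashes as similar when at most `threshold` bits differ."""
    return hamming_distance(hash_a, hash_b) <= threshold
```

For instance, `hamming_distance("ff00", "ff01")` is 1, since only the lowest bit differs.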
212+
213+
### Blockchain Simulation:
214+
- **JSON-based storage**: Simple, portable database
215+
- **Timestamped records**: Each entry is timestamped and intended to be append-only
216+
- **Chronological ordering**: Establishes timeline
217+
- **Metadata included**: Full context for each image
218+
219+
## Limitations
220+
221+
### 1. **Not a True Blockchain**
222+
- Simulated blockchain (JSON file, not distributed ledger)
223+
- Not cryptographically chained (no hash linking)
224+
- Suitable for demo/educational purposes
225+
- Production use would require real blockchain
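
For comparison, the hash linking that this simulation omits can be sketched in a few lines. This is not what the module does; it only illustrates the idea that each block stores the hash of its predecessor, so editing an earlier record invalidates everything after it. All names here (`GENESIS`, `append_block`, `chain_is_valid`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(payload, prev_hash):
    """Hash a block's payload together with the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that links back to the current chain tip."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    block = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(payload, prev_hash),
    }
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Recompute every link; any edited payload or broken link fails."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True
```

Even with linking, someone who can rewrite the whole file can recompute every hash, so the latest hash still has to be anchored somewhere trusted for the trail to be tamper-evident.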
226+
227+
### 2. **Perceptual Hash Limitations**
228+
- Cannot detect all modifications
229+
- Advanced forgery can fool perceptual hashing
230+
- Threshold selection affects accuracy
231+
232+
### 3. **Database Security**
233+
- JSON file can be edited manually by anyone with write access
234+
- No built-in tamper protection
235+
- Should be stored securely with access controls
236+
237+
### 4. **Initial Registration Required**
238+
- Images must be registered BEFORE verification
239+
- Cannot verify previously unknown images
240+
- Empty database = no verification possible
241+
242+
### 5. **Storage Considerations**
243+
- Database grows with each registered image
244+
- Regular backups recommended
245+
- Export/import for database migration
246+
247+
## Best Practices
248+
249+
✔️ **Register images immediately** upon capture/creation
250+
✔️ **Regular database backups** to prevent data loss
251+
✔️ **Secure database storage** with restricted access
252+
✔️ **Document all modifications** when editing registered images
253+
✔️ **Use exact match** for legal evidence (SHA-256)
254+
✔️ **Use perceptual match** for finding similar versions
255+
✔️ **Export database** before sharing with external parties
256+
✔️ **Verify timestamps** match expected timeline
257+
✔️ **Cross-reference** with other forensic techniques
258+
259+
## Legal and Ethical Considerations
260+
261+
### Admissibility in Court:
262+
- **Chain of custody** must be documented
263+
- **Exact hash match** provides strong evidence
264+
- **Modification history** must be explained
265+
- **Database integrity** must be proven
266+
267+
### Privacy Concerns:
268+
- Hash databases may contain sensitive information
269+
- Follow data protection regulations (GDPR, etc.)
270+
- Obtain proper authorization before hashing images
271+
272+
### Ethical Use:
273+
- Don't use for unauthorized surveillance
274+
- Respect copyright and intellectual property
275+
- Maintain transparency in forensic analysis
276+
277+
---
278+
279+
## Educational Context
280+
281+
This module demonstrates critical **Information Security** concepts:
282+
283+
- **Cryptographic Integrity**: Using hashes for verification
284+
- **Digital Provenance**: Tracking asset history
285+
- **Blockchain Technology**: Immutable audit trails
286+
- **Similarity Detection**: Perceptual vs. exact matching
287+
- **Legal Admissibility**: Chain of custody requirements
288+
289+
**Real-World Application**: Similar systems are used by:
290+
- Law enforcement for digital evidence
291+
- Content platforms for copyright detection (YouTube Content ID)
292+
- News organizations for image verification
293+
- Blockchain projects for NFT authenticity
294+
295+
---
296+
297+
_Hash verification is a cornerstone of digital forensics. Understanding both cryptographic and perceptual hashing enables comprehensive image authentication and provenance tracking._
