Computer Vision, Edge Computing, Cloud Computing, Blockchain
This repository accompanies the paper:
Nada Shahin, Leila Ismail
Towards Trustworthy Sign Language Translation System:
A Privacy-Preserving Edge–Cloud–Blockchain Approach
Mathematics, 2025, 13, 3759.
DOI: 10.3390/math13233759
Prof. Leila Ismail, Intelligent Distributed Computing and Systems (INDUCE) Lab, College of Information Technology, United Arab Emirates University, leila@uaeu.ac.ae
If you use this work, please cite: Shahin, Nada, and Leila Ismail. 2025. "Towards Trustworthy Sign Language Translation System: A Privacy-Preserving Edge–Cloud–Blockchain Approach" Mathematics 13, no. 23: 3759. https://doi.org/10.3390/math13233759
Overview:
This work envisions a new generation of trustworthy sign language translation systems that go beyond translation accuracy to address privacy, accountability, and real-world deployment. In response to the global shortage of certified sign language interpreters and the growing need for inclusive assistive technologies, the paper introduces a privacy-preserving, consent-aware sign language machine translation (SLMT) architecture that integrates edge computing, cloud intelligence, and blockchain governance. By operating on abstract keypoint representations rather than raw video and enforcing explicit, auditable user consent, the system enables real-time communication while safeguarding user rights and regulatory compliance.
At its core, the proposed system demonstrates that responsible AI and high performance are not competing goals. Through a comparative evaluation of Transformer-based models on large-scale and medical-domain datasets, the study shows that lightweight adaptive architectures can deliver accurate translations with substantially lower latency and computational cost in distributed environments. By embedding consent management and auditability directly into the AI pipeline, this work establishes a blueprint for ethically grounded, scalable assistive AI, with relevance extending beyond sign language translation to privacy-sensitive applications in healthcare and other biometric domains.
- 🔹 End-to-end edge–cloud–blockchain architecture for trustworthy SLMT
- 🔹 Consent-aware design supporting both application-level and system-level privacy
- 🔹 Comparative evaluation of:
  - Encoder–Decoder Transformer
  - Adaptive Transformer (ADAT)
- 🔹 New medical-domain dataset (MedASL) for sign-to-text translation
- 🔹 Comprehensive runtime analysis, including:
  - Training time
  - Inference latency
  - Edge–cloud communication
  - End-to-end system delay
System Architecture:
Our proposed end-to-end edge-cloud-blockchain system for SLMT is presented in Figure 1. It consists of the following modules:
- Sign Language Recognition Module
  Captures sign videos via camera input for keypoint extraction and processing.
- AI-Enabled Translation Application
  Acts as a gateway for user interaction and consent management.
- Edge Computing Layer
  - Keypoint extraction (MediaPipe)
  - Preprocessing and real-time inference
  - Reduced latency and improved privacy
- Cloud Computing Layer
  - Model training and retraining
  - Dataset storage (with consent)
  - Deployment of updated models
- Blockchain Layer
  - Immutable logging of:
    - User consent receipts
    - Policy versions
    - System certificates
    - Audit trails
  It ensures transparency, integrity, and regulatory compliance.
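To illustrate what tamper-evident logging of consent receipts can look like, here is a minimal sketch of an append-only, hash-chained log in plain Python. It is not the blockchain implementation used in the paper: the `ConsentLog` class, its field names, and the example values are hypothetical, and a real deployment would rely on a distributed ledger rather than an in-memory list.

```python
import hashlib
import json


def _digest(record: dict) -> str:
    """Deterministic SHA-256 digest of a record (keys sorted for stability)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ConsentLog:
    """Hypothetical append-only log; each entry commits to the previous one."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self.entries = []

    def append(self, user_id: str, policy_version: str, granted: bool) -> dict:
        """Record one consent decision and link it to the previous entry."""
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "index": len(self.entries),
            "user_id": user_id,
            "policy_version": policy_version,
            "granted": granted,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Because each entry stores the hash of its predecessor, changing an earlier consent receipt invalidates that entry's own hash, so `verify()` detects the change.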
Encoder–Decoder Transformer:
- Baseline and widely adopted architecture for SLMT
- Captures long-range spatiotemporal dependencies
- Higher computational cost due to quadratic self-attention
Adaptive Transformer (ADAT):
- Lightweight and efficient variant
- Key features:
  - LogSparse Self-Attention (O(n log n))
  - Adaptive gating for short- and long-range dependencies
- Demonstrates:
  - Faster convergence
  - Reduced model size
  - Lower inference and communication latency
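To make the O(n log n) cost of log-sparse attention concrete, the sketch below builds one common causal log-sparse pattern, in which position i attends to itself and to positions i - 1, i - 2, i - 4, and so on. This pattern is an assumption chosen for illustration; the README does not specify the exact pattern ADAT uses, and the function names are hypothetical.

```python
def logsparse_indices(i: int) -> list[int]:
    """Positions that query position i attends to: itself plus i - 2**k."""
    attended = {i}
    step = 1
    while i - step >= 0:
        attended.add(i - step)
        step *= 2  # exponentially growing look-back distance
    return sorted(attended)


def logsparse_mask(n: int) -> list[list[bool]]:
    """n x n boolean mask; mask[i][j] is True when i may attend to j."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in logsparse_indices(i):
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    n = 16
    mask = logsparse_mask(n)
    kept = sum(sum(row) for row in mask)
    print(f"dense entries: {n * n}, log-sparse entries: {kept}")
```

Each row keeps on the order of log2(i) entries instead of i + 1, so the whole mask has O(n log n) active entries rather than the O(n^2) of dense self-attention.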
PHOENIX14T:
- German Sign Language (DGS)
- Weather domain
- Large-scale, multi-signer benchmark dataset
MedASL:
- American Sign Language (ASL)
- Medical and healthcare conversations
- Designed to reflect real-world assistive scenarios
Dataset characteristics, preprocessing pipelines, and splits are fully described in the paper.
- Keypoint extraction using MediaPipe
- Hands, face, pose, and iris landmarks
- Normalization, rescaling, and padding
- Sliding-window segmentation for inference
- Tokenization and subword modeling for text output
This design improves privacy, efficiency, and robustness compared to raw RGB-based approaches.
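The rescaling and padding steps listed above can be sketched with NumPy as follows. The README does not state which normalization the pipeline applies, so the per-sequence min-max rescaling and trailing zero-padding here are assumptions, and the function names are hypothetical.

```python
import numpy as np


def rescale_keypoints(seq: np.ndarray) -> np.ndarray:
    """Min-max rescale one keypoint sequence of shape (T, D) into [0, 1]."""
    lo, hi = seq.min(), seq.max()
    if hi == lo:  # constant input, e.g. no landmarks detected in any frame
        return np.zeros_like(seq, dtype=np.float32)
    return ((seq - lo) / (hi - lo)).astype(np.float32)


def pad_or_truncate(seq: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad at the end, or cut, so the result has exactly target_len frames."""
    out = np.zeros((target_len, seq.shape[1]), dtype=np.float32)
    n = min(len(seq), target_len)
    out[:n] = seq[:n]
    return out


if __name__ == "__main__":
    # Ten frames of a toy 4-dimensional feature vector.
    frames = np.arange(40, dtype=np.float64).reshape(10, 4)
    fixed = pad_or_truncate(rescale_keypoints(frames), target_len=12)
    print(fixed.shape)
```

Fixed-length, bounded inputs let sequences of different durations be batched together, which is one reason keypoint pipelines usually rescale before padding.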
Sign Language Machine Translation Edge/Cloud Demo with Transformer baseline
End-to-end edge-cloud pipeline for sign language translation, including keypoint capture via camera and model inference on edge + communication with cloud.
Experimental Setup:
- Edge: opencv-python, mediapipe, numpy, torch, requests, sentencepiece
- Cloud: tensorflow, flask, numpy, pyyaml
- Clone the repository
  git clone https://github.com/INDUCE-Lab/End-to-End-Sign-Language-AI-Translation-System.git
  cd End-to-End-Sign-Language-AI-Translation-System
- Train the translation model
  - Navigate to one of the translation models (Transformer/ or ADAT/):
    cd Transformer # or cd ADAT
  - Install dependencies:
    pip install -r requirements.txt
  - Train the model. By default, this trains the model end-to-end on randomly generated video/gloss/text sequences:
    python train.py --config config.yaml
  - Run a forward pass:
    python run.py
  - Train on real sign language datasets. To train on real datasets such as PHOENIX14T and ISL-CSLTR, replace the synthetic loader in data.py with the following preprocessed tensors:
    - video_data: (N, T, 52, 65, 3) or (N, T, feature_dim)
    - gloss_indices: (N, max_g) # Padded integer sequences
    - text_indices: (N, max_t) # Padded integer sequences
    Then run:
    python train.py --config config.yaml
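As a rough illustration of the tensor layout described above, the following sketch builds toy arrays with the `(N, T, feature_dim)`, `(N, max_g)`, and `(N, max_t)` shapes. The `pad_sequences` helper, the padding ID of 0, and all sizes and token values are assumptions for illustration; the loader in data.py may expect a different padding ID or dtype.

```python
import numpy as np

PAD_ID = 0  # assumed padding token ID; use the value your vocabulary defines


def pad_sequences(seqs, pad_id=PAD_ID):
    """Right-pad variable-length integer sequences into one (N, max_len) array."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
    for row, s in enumerate(seqs):
        out[row, :len(s)] = s
    return out


# Toy stand-ins for a real dataset: N=3 clips, T=8 frames, feature_dim=1662.
N, T, FEATURE_DIM = 3, 8, 1662
video_data = np.zeros((N, T, FEATURE_DIM), dtype=np.float32)      # (N, T, feature_dim)
gloss_indices = pad_sequences([[5, 9], [7], [4, 6, 8]])            # (N, max_g)
text_indices = pad_sequences([[11, 12, 13, 14], [15, 16], [17]])   # (N, max_t)

# All three arrays must agree on the number of samples N.
assert video_data.shape[0] == gloss_indices.shape[0] == text_indices.shape[0]
```

Checking the shapes before training catches mismatched sample counts early, which is cheaper than debugging a failure inside the training loop.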
- Prepare the cloud service
  - Place the following files under the model directory:
    - transformer.pth: best Transformer model checkpoint
    - special_ids.json: token IDs and sizes
    - medasl_bpe.model: SentencePiece-BPE model for text decoding
  - Install cloud dependencies and run:
    pip install -r Cloud_requirements.txt
    # run server
    python -m Cloud
    # or
    python Cloud/.py
- Configure the edge client
  - Open Edge.py and set:
    - CLOUD_UPLOAD_URL = "http://<CLOUD_HOST>:/upload_keypoints"
    - CLOUD_MODEL_URL = "http://<CLOUD_HOST>:/get_model"
    - Adjust SEQ_LEN, MAX_SAMPLES, and paths if needed.
  - Install edge dependencies and run:
    pip install -r Edge_requirements.txt
    # run edge
    python -m edge
    # or
    python edge.py
  - Controls:
    - When the edge script runs, a window opens showing the landmarks and a caption ("translating.." is shown until the translation confidence threshold is met).
    - Press u to force an upload + model fetch cycle.
    - Press q to quit.
- The edge periodically calls /get_model, /get_specials, and /get_spm to refresh assets.
- The camera captures frames; the edge extracts keypoints with MediaPipe and concatenates them into a vector of 1662 floats per frame.
- Sliding window of length SEQ_LEN forms an array [SEQ_LEN, 1662].
- When the sliding window is full, the window is saved locally (.npy).
- The edge preprocesses the npy file.
- Inference is run through the model to produce translations.
- When the local cache reaches MAX_SAMPLES, the edge packs all windows into samples.npz and transmits them to the cloud, along with the translations.
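The buffering and packing steps above can be sketched as follows. This is a simplified illustration, not the code in the edge script: the `WindowCache` class, the file naming, the stride of one frame between windows, and the small `SEQ_LEN` and `MAX_SAMPLES` values are all assumptions, and the HTTP upload to the cloud is left out.

```python
import os
import tempfile
from collections import deque

import numpy as np

SEQ_LEN = 4        # assumed small values, for illustration only
MAX_SAMPLES = 2
FEATURES = 1662    # floats per frame, as stated above


class WindowCache:
    """Hypothetical helper: buffers frames, saves full windows, packs them."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.frames = deque(maxlen=SEQ_LEN)  # oldest frame drops out automatically
        self.saved = []                      # paths of cached .npy windows

    def add_frame(self, keypoints):
        """Add one frame vector; return the packed .npz path once the cache is full."""
        self.frames.append(np.asarray(keypoints, dtype=np.float32))
        if len(self.frames) < SEQ_LEN:
            return None  # window not full yet
        window = np.stack(self.frames)  # shape [SEQ_LEN, FEATURES]
        path = os.path.join(self.cache_dir, f"window_{len(self.saved):05d}.npy")
        np.save(path, window)
        self.saved.append(path)
        if len(self.saved) < MAX_SAMPLES:
            return None
        return self._pack()

    def _pack(self):
        """Bundle all cached windows into samples.npz and reset the cache list."""
        windows = np.stack([np.load(p) for p in self.saved])
        out = os.path.join(self.cache_dir, "samples.npz")
        np.savez(out, windows=windows)
        self.saved = []
        return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = WindowCache(tmp)
        for i in range(5):
            packed = cache.add_frame(np.full(FEATURES, float(i)))
        print(np.load(packed)["windows"].shape)
```

A `deque` with `maxlen` keeps the window sliding without manual index bookkeeping; the returned `samples.npz` path is what an upload step would then send to the cloud.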
- Confidence gate: CONF_THRESH on edge controls when a translation becomes the displayed caption.
- Warm-up: MIN_WINDOWS_BEFORE_DISPLAY skips the first windows for stabilization before translating sign language to text.
- Local inference: toggle with RUN_LOCAL_INFERENCE on edge. If enabled, the edge loads MODEL_PATH and detokenizes with medasl_bpe.model + special_ids.json.
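The confidence gate and warm-up settings described above can be sketched as a small state machine. The `CaptionGate` class and the threshold values are hypothetical, and the choice to keep the last confident caption when a later window scores below `CONF_THRESH` is an assumption about the intended behavior.

```python
CONF_THRESH = 0.6                # assumed value; tune for your model
MIN_WINDOWS_BEFORE_DISPLAY = 3   # assumed value
PLACEHOLDER = "translating.."


class CaptionGate:
    """Hypothetical helper that decides which caption the window shows."""

    def __init__(self):
        self.windows_seen = 0
        self.caption = PLACEHOLDER

    def update(self, text: str, confidence: float) -> str:
        """Feed one window's translation; return the caption to display."""
        self.windows_seen += 1
        warmed_up = self.windows_seen > MIN_WINDOWS_BEFORE_DISPLAY
        if warmed_up and confidence >= CONF_THRESH:
            self.caption = text  # only confident, post-warm-up results replace it
        return self.caption
```

For example, with the values above the first three windows always show the placeholder, and a fourth window with confidence 0.9 becomes the displayed caption.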
- Set the edge CLOUD_* URLs to the cloud host/IP and open the cloud port in the firewall.
- Run edge from the repo root:
  python -m edge
  # or
  python edge.py
- Run cloud similarly:
  python -m cloud
  # or
  python cloud.py
This project is released under the Creative Commons Attribution (CC BY 4.0) License, consistent with the published article.