ESFF: Enhanced Semantic Full Fusion for Multimodal Sarcasm Detection: Leveraging Textual and Visual Cues

BERT base model (uncased)
Vision Transformer (base-sized model)
data
OCR Tesseract 5