Bridging the Communication Gap: A Deep Learning-Based System for Real-Time Sign Language to Speech and Text Translation
Keywords:
Convolutional Neural Networks (CNNs), Deep learning, Gesture detection, Real-time communication, Sign language recognition, Text-to-speech
Abstract
People with hearing and speech impairments often face communication barriers that hinder their daily interactions with the wider society. Sign language is a vital medium of communication for them, but few people in the general population understand it. This paper presents a real-time system that converts American Sign Language (ASL) hand gestures into both text and speech, enabling more inclusive and accessible communication. The system uses a Convolutional Neural Network (CNN), implemented in TensorFlow and trained on custom-collected gesture data, to recognize gestures. Each recognized sign is immediately converted to the corresponding text and to synthesized speech. The system achieves high accuracy in controlled environments and is a step toward bridging the communication gap between sign language users and the broader community. Its performance is reported with 95% confidence intervals and p-values to establish statistical significance. The CNN is benchmarked against two baseline models (an SVM and ResNet18) to justify the architectural choices. Preprocessing choices (RGB vs. grayscale input) and layer-removal (ablation) experiments are also discussed. Finally, the paper reports the number of participants, their skin tones, and the number of gesture samples per class, along with the steps taken to mitigate bias, such as data augmentation for diverse skin tones and hand sizes.
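As an illustration of how a 95% confidence interval on classification accuracy can be computed, the following minimal sketch uses the Wilson score interval for a binomial proportion. The choice of the Wilson method, the function name `wilson_interval`, and the counts in the usage line are assumptions made for illustration only; they are not the interval method or the results reported in this paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion such as test accuracy.

    correct: number of correctly classified samples (hypothetical input)
    total:   number of evaluated samples (hypothetical input)
    z:       normal quantile; 1.96 corresponds to a two-sided 95% interval
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")

    p_hat = correct / total                      # observed accuracy
    denom = 1 + z ** 2 / total
    center = (p_hat + z ** 2 / (2 * total)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / total + z ** 2 / (4 * total ** 2))
        / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Illustrative counts only, not measurements from this study.
    low, high = wilson_interval(correct=950, total=1000)
    print(f"accuracy 95% CI: [{low:.4f}, {high:.4f}]")
```

The Wilson interval is used in this sketch because, unlike the simple normal approximation, it stays within [0, 1] and behaves reasonably when accuracy is close to 100%, which is the regime in which a gesture classifier evaluated in controlled conditions is likely to operate.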