AI-based Interview Coaching Using Voice and Video Analysis
Keywords:
Artificial intelligence, Communication skills, Facial expression detection, Interview coaching, Machine learning, Soft skills assessment, Video analysis, Voice analysis
Abstract
In the modern job market, interview performance plays an important role in selecting suitable candidates for employment, internships, and academic opportunities. Although many students and job seekers have sound technical knowledge, they often perform poorly in interviews because of weak communication skills, low confidence, nervousness, and poor body language. Traditional preparation methods such as mock interviews, classroom practice, and mentor guidance are helpful, but they are often limited by subjectivity, time constraints, and a lack of personalized feedback. In many cases, candidates receive no detailed insight into their verbal and non-verbal performance. This research proposes VocalVision, an AI-based interview coaching system that uses voice and video analysis to evaluate interview performance and provide real-time feedback. The system analyzes speech clarity, tone, confidence level, filler-word usage, eye contact, facial expressions, and emotional consistency. By combining speech processing, computer vision, and machine learning techniques, the proposed model forms an intelligent and accessible interview preparation platform. The methodology comprises data acquisition, preprocessing, feature extraction, model training, testing, and performance evaluation. Audio features such as Mel-frequency cepstral coefficients (MFCCs) and visual features obtained through facial landmark detection are used to classify each interview response into categories such as "strong," "moderate," or "needs improvement." The system is expected to achieve high accuracy, a balanced F1-score, and low processing time, making it suitable for deployment on mobile and desktop devices. VocalVision aims to support students and job seekers by offering scalable, objective, personalized, and cost-effective interview coaching.
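The abstract describes fusing verbal and non-verbal features into a three-way rating and evaluating it with a balanced F1-score. The Python sketch below illustrates one way that final stage could work. It assumes that upstream components (a speech recognizer, an MFCC extractor such as librosa, and a facial-landmark gaze estimator) already supply a transcript, per-coefficient MFCC means, and per-frame eye-contact flags. The filler-word list, the 200 wpm scaling constant, the nearest-centroid classifier, and all names (`Response`, `feature_vector`, `fit_centroids`, `predict`, `macro_f1`) are hypothetical choices for illustration, not VocalVision's actual model. "Balanced F1-score" is interpreted here as macro-averaged F1.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence

# Hypothetical filler lexicon; a real system would need context to
# separate filler "like" from the verb "like".
FILLERS = {"um", "uh", "er", "like", "basically"}
LABELS = ("strong", "moderate", "needs improvement")


@dataclass
class Response:
    transcript: str              # ASR output for one answer (assumed upstream)
    duration_s: float            # answer length in seconds
    gaze_on_camera: List[bool]   # per-frame eye-contact flags (assumed upstream)
    mfcc_means: List[float]      # per-coefficient MFCC means, e.g. from
                                 # librosa.feature.mfcc(...).mean(axis=1)


def feature_vector(r: Response) -> List[float]:
    """Early fusion: concatenate verbal and non-verbal features into one vector."""
    words = [w.strip(".,?!;:").lower() for w in r.transcript.split()]
    words = [w for w in words if w]
    filler_rate = sum(w in FILLERS for w in words) / max(len(words), 1)
    wpm = len(words) * 60.0 / r.duration_s if r.duration_s > 0 else 0.0
    eye_contact = sum(r.gaze_on_camera) / max(len(r.gaze_on_camera), 1)
    # wpm is divided by an arbitrary constant (200) to keep it near the 0-1 range
    return [filler_rate, wpm / 200.0, eye_contact] + list(r.mfcc_means)


def fit_centroids(X: Sequence[List[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Nearest-centroid 'training': average the feature vectors of each label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], x: List[float]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels=LABELS) -> float:
    """Unweighted mean of per-class F1 scores (one reading of 'balanced F1')."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage with made-up feature vectors (not results from this work)
train_X = [[0.00, 0.70, 0.90], [0.02, 0.65, 0.85], [0.10, 0.50, 0.60], [0.30, 0.30, 0.20]]
train_y = ["strong", "strong", "moderate", "needs improvement"]
model = fit_centroids(train_X, train_y)
print(predict(model, [0.28, 0.30, 0.25]))  # → needs improvement
```

Macro-averaging weights each of the three classes equally, so a model that rarely predicts the minority class is penalized even when overall accuracy looks high. A practical system would likely standardize the features and train a stronger classifier, but the fuse-then-classify structure would stay the same.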