AI-based Interview Coaching Using Voice and Video Analysis

Sejal V. Gaud; Sanskruti P. Shinde; Pratiksha D. Patil; Shubhangi Bhaigade

Authors

Sejal V. Gaud
Sanskruti P. Shinde
Pratiksha D. Patil
Shubhangi Bhaigade

Keywords:

Artificial intelligence, Communication skills, Facial expression detection, Interview coaching, Machine learning, Soft skills assessment, Video analysis, Voice analysis

Abstract

In the modern job market, interview performance plays an important role in selecting candidates for employment, internships, and academic opportunities. Although many students and job seekers have good technical knowledge, they often perform poorly in interviews because of weak communication skills, low confidence, nervousness, and poor body language. Traditional preparation methods such as mock interviews, classroom practice, and mentor guidance are helpful, but they are limited by subjectivity, time constraints, and a lack of personalized feedback, and candidates often receive no detailed insight into their verbal and non-verbal performance. This research proposes VocalVision, an AI-based interview coaching system that uses voice and video analysis to evaluate interview performance and provide real-time feedback. The system analyzes speech clarity, tone, confidence level, filler word usage, eye contact, facial expressions, and emotional consistency. By combining speech processing, computer vision, and machine learning techniques, the proposed model provides an accessible interview preparation platform. The methodology comprises data acquisition, preprocessing, feature extraction, model training, testing, and performance evaluation. Audio features such as Mel-frequency cepstral coefficients (MFCCs) and visual features obtained through facial landmark detection are used to classify each interview response as strong, moderate, or needs improvement. The system is expected to achieve high accuracy, a balanced F1-score, and low processing time, making it suitable for mobile and desktop deployment. VocalVision aims to support students and job seekers by offering scalable, objective, personalized, and cost-effective interview coaching.
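As a rough illustration of one signal named in the abstract, filler word usage, the sketch below computes a filler-word rate from a transcript and maps it to the three response labels. It is not the authors' method: the abstract describes a trained machine learning classifier over audio and visual features, whereas this is a rule-based stand-in. The filler list, the function names `filler_word_rate` and `rate_response`, and both thresholds are assumptions made for illustration only.

```python
import re

# Hypothetical filler list; the abstract does not say which words are counted.
# Note that "like" and "actually" also have non-filler uses, so this overcounts.
FILLER_WORDS = {"um", "uh", "er", "like", "basically", "actually"}


def filler_word_rate(transcript: str) -> float:
    """Return the fraction of word tokens in the transcript that are fillers."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    fillers = sum(1 for token in tokens if token in FILLER_WORDS)
    return fillers / len(tokens)


def rate_response(transcript: str,
                  moderate_at: float = 0.05,
                  weak_at: float = 0.15) -> str:
    """Map the filler rate to one of three labels.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    rate = filler_word_rate(transcript)
    if rate >= weak_at:
        return "needs improvement"
    if rate >= moderate_at:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    for answer in (
        "I led a team of five engineers",
        "Um, I think, uh, I am like a good fit",
    ):
        print(f"{rate_response(answer):>17}  {answer}")
```

In a system like the one described, a rate of this kind would more plausibly serve as one input feature alongside MFCCs and facial landmark features than as a standalone decision rule.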

