Optimized Gaussian Naive Bayes Model for Predictive Analysis of Income Brackets Using Multi-Dimensional Data
Keywords:
Classification models, Ensemble methods, Feature selection, Gaussian naive Bayes, Naive Bayes classifiers, Predictive analytics
Abstract
This project presents an optimized Gaussian Naive Bayes (GNB) model for predicting income brackets from multi-dimensional data. GNB, a probabilistic classifier rooted in Bayes’ Theorem, assumes that features are conditionally independent given the class and normally distributed within each class, which makes it well suited to classification tasks with continuous features. The model classifies individuals as earning either above or below $50K annually, a task with significant implications for economic policy and market analysis. To enhance performance, the project applies preprocessing techniques (normalization, encoding, and feature selection) together with cross-validation and hyperparameter tuning. Accuracy, precision, recall, F1-score, and ROC-AUC are used to assess the model’s robustness and reliability. The results indicate that the optimized GNB model offers a scalable, interpretable, and efficient solution for income classification, with potential applications in real-time analytics and data-driven decision-making.
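To make the classifier described above concrete, the following is a minimal sketch of Gaussian Naive Bayes in plain Python. It is not the project's implementation: the class name `TinyGaussianNB`, the `var_smoothing` parameter (an assumed absolute term added to each variance for numerical stability), and the toy age and hours-per-week rows are all hypothetical, chosen only to illustrate per-class Gaussian likelihoods combined with class priors.

```python
import math
from collections import defaultdict


class TinyGaussianNB:
    """Illustrative Gaussian Naive Bayes for numeric features (hypothetical helper)."""

    def __init__(self, var_smoothing=1e-9):
        # Assumed tuning knob: a small constant added to every variance.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        # Group training rows by class label.
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)

        n_total = len(X)
        self.stats_ = {}
        for label, rows in rows_by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            # Per-class, per-feature mean and (population) variance.
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            # Store the log prior alongside the Gaussian parameters.
            self.stats_[label] = (math.log(n / n_total), means, variances)
        return self

    def _log_joint(self, row, label):
        # log P(class) + sum over features of log N(x | mean, variance),
        # which relies on the conditional independence assumption.
        log_prior, means, variances = self.stats_[label]
        total = log_prior
        for x, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return total

    def predict(self, X):
        # Pick the class with the highest log joint probability for each row.
        return [max(self.stats_, key=lambda c: self._log_joint(row, c)) for row in X]


# Made-up toy rows: [age, hours_per_week]. These are not real census records.
X_train = [[25, 30], [30, 35], [28, 32], [45, 50], [50, 55], [48, 52]]
y_train = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]

model = TinyGaussianNB(var_smoothing=1e-3).fit(X_train, y_train)
predictions = model.predict([[27, 31], [49, 53]])
print(predictions)
```

In practice a library implementation would normally be preferred. scikit-learn's `GaussianNB`, for example, exposes a `var_smoothing` parameter of its own (defined relative to the largest feature variance rather than as an absolute constant), and that parameter is the usual target of hyperparameter tuning for this model.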