Comparative Analysis of Gaussian Naïve Bayes and Categorical Naïve Bayes Algorithms with Laplace Smoothing in COVID-19 Detection
DOI:
https://doi.org/10.54082/jiki.286
Keywords:
COVID-19, Laplace Smoothing, Naïve Bayes, Python
Abstract
In January 2020, it was confirmed that COVID-19 can be transmitted from human to human through the upper respiratory tract with a high infection rate. The number of cases worldwide rose rapidly through close contact, droplets, and airborne transmission. In response, governments and the WHO implemented preventive measures, including preparing COVID-19 treatment, expanding emergency healthcare capacity, and screening patients. Early detection of COVID-19 became crucial for taking action, providing treatment, and protecting others. The Naïve Bayes algorithm can assign a zero probability to a feature or attribute value that never occurs with a given class in the COVID-19 prediction training data, and a single zero factor drives the whole class probability to zero. Laplace Smoothing is used to address this problem. This study compares the average accuracy of the Gaussian Naïve Bayes and Categorical Naïve Bayes algorithms for COVID-19 detection, using different proportions of training data but the same testing data. Both algorithms, with Laplace Smoothing, are implemented using the Python library scikit-learn. The results show that Gaussian Naïve Bayes has an average accuracy of 0.902165 without Laplace Smoothing and 0.973448 with it. Categorical Naïve Bayes has an average accuracy of 0.983864 without Laplace Smoothing and 0.984273 with it. In conclusion, Laplace Smoothing improves the average accuracy of both Naïve Bayes algorithms, with a much larger gain for Gaussian Naïve Bayes. Averaged over the runs with and without Laplace Smoothing, Categorical Naïve Bayes achieves the highest average accuracy, 0.9840685, while Gaussian Naïve Bayes achieves 0.947549. Overall, Categorical Naïve Bayes has a higher average accuracy than Gaussian Naïve Bayes.
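The zero-probability issue and the Laplace Smoothing fix described above can be illustrated with a minimal scikit-learn sketch. The toy data, the variable names, and the helper laplace_smoothed are hypothetical and are not the study's dataset or code. The sketch assumes the features are integer-coded categories and that Laplace Smoothing corresponds to alpha=1.0 in CategoricalNB. GaussianNB in scikit-learn has no alpha parameter (it exposes var_smoothing instead), so it is fitted here with its default settings.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import CategoricalNB, GaussianNB

# Hypothetical toy data: two integer-coded features (values 0, 1, 2) and a
# binary label (1 = positive). These are NOT the study's data.
X_train = np.array([
    [0, 1], [0, 0], [1, 1], [1, 0],   # class 0
    [1, 2], [2, 2], [2, 1], [2, 2],   # class 1
])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0, 0], [2, 2], [1, 1], [0, 2]])
y_test = np.array([0, 1, 0, 1])


def laplace_smoothed(counts, alpha=1.0):
    """P(value | class) = (count + alpha) / (class_total + alpha * n_values)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return (counts + alpha) / (totals + alpha * counts.shape[1])


# Count how often each value of feature 0 occurs within each class.
counts_f0 = np.array([
    [np.sum((X_train[:, 0] == value) & (y_train == label)) for value in range(3)]
    for label in (0, 1)
])

# Without smoothing, a value never seen with a class gets probability zero.
unsmoothed_f0 = laplace_smoothed(counts_f0, alpha=0.0)
# With alpha = 1 (Laplace Smoothing), every value keeps a small probability.
manual_f0 = laplace_smoothed(counts_f0, alpha=1.0)

# CategoricalNB with alpha=1.0 learns the same smoothed estimates.
cat_nb = CategoricalNB(alpha=1.0).fit(X_train, y_train)
sklearn_f0 = np.exp(cat_nb.feature_log_prob_[0])

# GaussianNB treats the same columns as continuous values (default settings).
gauss_nb = GaussianNB().fit(X_train, y_train)

# Both models are scored on the same test set.
cat_acc = accuracy_score(y_test, cat_nb.predict(X_test))
gauss_acc = accuracy_score(y_test, gauss_nb.predict(X_test))

print("Unsmoothed P(feature 0 | class):\n", unsmoothed_f0)
print("Laplace-smoothed P(feature 0 | class):\n", manual_f0)
print("CategoricalNB accuracy:", cat_acc)
print("GaussianNB accuracy:", gauss_acc)
```

In this sketch, the manually smoothed probabilities should agree with those learned by CategoricalNB, while the unsmoothed estimate for a value never observed with a class is exactly zero. The accuracies on such a tiny toy set say nothing about the results reported in the abstract.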
License
Copyright (c) 2025 Dila Saputra, Abdul Aziz Fahmi 'Alauddin, Mochamad Azizan

This work is licensed under a Creative Commons Attribution 4.0 International License.