Comparative Analysis of Bidirectional Encoder Representations from Transformers Models for Twitter Sentiment Classification using Text Mining on Streamlit

Authors

  • Ahmad Fajar Tatang, Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
  • Mohammad Hasbi Assidiqi, Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia

DOI:

https://doi.org/10.54082/jiki.307

Keywords:

BERT, Fine-Tuning, Sentiment Analysis, Social Media, Text Mining, Twitter

Abstract

Social media platforms such as Twitter have become highly influential in shaping public opinion, making sentiment analysis of tweet data crucial. Traditional techniques, however, struggle with the nuances and complexity of informal social media text. This research addresses these challenges through a comparative analysis of a non-optimized BERT (Bidirectional Encoder Representations from Transformers) model and a BERT model optimized with fine-tuning for sentiment analysis of Indonesian Twitter data using text mining methods. Following the CRISP-DM methodology, the study collects data by crawling Twitter with the keyword "biznet"; applies preprocessing steps including case folding, cleaning, tokenization, normalization, and data augmentation; and splits the dataset into training, validation, and testing subsets for modeling and evaluation with IndoBERT-base-p1, a BERT model pre-trained specifically for the Indonesian language. The results show that the fine-tuned BERT model significantly outperforms the non-optimized BERT, achieving 91% accuracy, 0.91 precision, 0.90 recall, and a 0.91 F1-score on the test set. Fine-tuning lets BERT adapt to the particular characteristics of Twitter sentiment data, improving its recognition of the language and context patterns associated with sentiment expressions. The optimized model is deployed as a web application for practical use. This research affirms the superiority of fine-tuned BERT for accurate sentiment analysis of Indonesian Twitter data, offering valuable insights for businesses, governments, and researchers leveraging social media data.
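The preprocessing stages named in the abstract (case folding, cleaning, tokenization, normalization) could be sketched as a minimal pipeline like the one below. This is an illustrative assumption, not the authors' actual implementation: the regular-expression cleaning rules and the small Indonesian slang dictionary (`SLANG`) are hypothetical stand-ins for whatever lexicon and rules the paper used.

```python
import re

# Hypothetical slang/abbreviation lexicon for normalization; the paper's
# actual normalization resource is not specified in the abstract.
SLANG = {"gk": "tidak", "bgt": "banget"}

def preprocess(tweet: str) -> list[str]:
    """Apply case folding, cleaning, tokenization, and normalization."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # cleaning: strip URLs
    text = re.sub(r"[@#]\w+", " ", text)        # cleaning: strip mentions/hashtags
    text = re.sub(r"[^a-z\s]", " ", text)       # cleaning: strip punctuation/digits
    tokens = text.split()                       # tokenization (whitespace)
    return [SLANG.get(t, t) for t in tokens]    # normalization via lexicon
```

For example, an informal tweet about the "biznet" keyword such as `"Internet Biznet gk stabil bgt hari ini! @biznethome https://t.co/x"` would be lower-cased, stripped of the URL, mention, and punctuation, and normalized to standard Indonesian tokens before being passed to the IndoBERT tokenizer.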


Published

2025-12-31

How to Cite

Tatang, A. F., & Assidiqi, M. H. (2025). Comparative Analysis of Bidirectional Encoder Representations from Transformers Models for Twitter Sentiment Classification using Text Mining on Streamlit. Jurnal Ilmu Komputer Dan Informatika, 5(2), 127–142. https://doi.org/10.54082/jiki.307