Enhancing Fashion Product Sales Segmentation Using Random Forest with SMOTE and Hyperparameter Optimization
DOI: https://doi.org/10.54082/jiki.280
Keywords: Classification, Data Mining, E-commerce, Fashion, Random Forest, Sales Segmentation
Abstract
The rapid expansion of the fashion e-commerce sector has intensified the need for accurate sales segmentation to support targeted marketing and efficient inventory management. This study proposes a methodology for classifying fashion product sales into three categories (high-selling, moderately-selling, and low-selling) using the Random Forest algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE) and hyperparameter optimization. A real-world dataset of over 20,000 product records from an online marketplace was preprocessed by handling missing values, encoding categorical variables, and standardizing numerical features. Class labels were derived from quantiles of sales volume, and the resulting classes were balanced with SMOTE. The Random Forest model was tuned with RandomizedSearchCV and evaluated using accuracy, precision, recall, F1-score, and the area under the Receiver Operating Characteristic curve (ROC-AUC). Experimental results show strong predictive performance, with an accuracy of 90.43%, macro-precision of 90.60%, macro-recall of 90.45%, macro-F1 of 90.50%, and macro ROC-AUC of 0.9783. Feature importance analysis showed that price, category, and customer ratings were the most influential predictors of the sales segment. These findings support the effectiveness of ensemble learning combined with class-imbalance handling for multi-class classification on retail datasets. Scientifically, this research contributes a reproducible, data-driven framework for product segmentation in heterogeneous and imbalanced datasets. Practically, the proposed approach can help fashion retailers refine pricing strategies, optimize marketing campaigns, and improve inventory decisions in competitive online marketplaces. The methodology can be adapted to other e-commerce domains, with broader implications for business intelligence and predictive analytics.
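The abstract states only that class labels come from quantiles of sales volume. The sketch below assumes equal-frequency tertiles (two cut points splitting the data into thirds); the actual cut points used in the study are not given, and the function name `label_by_sales_quantile` is hypothetical.

```python
from statistics import quantiles


def label_by_sales_quantile(sales):
    """Hypothetical helper: assign low/moderate/high labels from tertiles of sales volume.

    Assumption: three equal-frequency bins; the study's exact quantiles are not stated.
    """
    # n=3 returns the two cut points that split the data into thirds;
    # "inclusive" treats the observed sales as the whole population.
    q1, q2 = quantiles(sales, n=3, method="inclusive")
    labels = []
    for s in sales:
        if s <= q1:
            labels.append("low-selling")
        elif s <= q2:
            labels.append("moderately-selling")
        else:
            labels.append("high-selling")
    return labels
```

When many products share the same sales count (for example, many items with zero sales), values tied at a cut point all fall into the lower bin, so the three classes can end up unequal in size. This is one reason a balancing step such as SMOTE may still be needed after quantile labelling.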
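SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following standard-library sketch illustrates that idea only. It is not the implementation used in the study, which the abstract does not specify (in practice a library such as imbalanced-learn is typically used). The function `smote_oversample` and its parameters are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic points from minority-class vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    Neighbours are recomputed per draw, which is quadratic but keeps the sketch short.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic
```

Oversampling is normally applied to the training split only, so that synthetic points derived from test samples cannot leak into evaluation. In imbalanced-learn this is usually arranged by placing `SMOTE` inside an `imblearn.pipeline.Pipeline`, which resamples only during `fit`.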
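The macro-averaged scores reported in the abstract weight each of the three classes equally. Precision, recall, and F1 are computed per class and then averaged regardless of class size. The following minimal sketch uses a hypothetical function name, `macro_scores`. It follows the common convention (also used by scikit-learn's `average="macro"`) that macro-F1 is the mean of per-class F1 scores.

```python
def macro_scores(y_true, y_pred):
    """Hypothetical helper: return (accuracy, macro-precision, macro-recall, macro-F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (no predictions or no true samples for c) count as 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro averages: unweighted mean over classes
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```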
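A multi-class macro ROC-AUC is commonly computed one-vs-rest. For each class, the binary AUC equals the probability that a randomly chosen sample of that class receives a higher score for the class than a randomly chosen sample of any other class, with ties counting one half. The per-class AUCs are then averaged. The abstract does not state whether one-vs-rest or one-vs-one averaging was used. This pure-Python sketch assumes one-vs-rest and per-class probability scores such as a classifier's `predict_proba` output. The function names are hypothetical.

```python
def binary_auc(scores_pos, scores_neg):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def macro_ovr_auc(y_true, proba, classes):
    """Hypothetical helper: unweighted mean of one-vs-rest AUCs.

    proba[i][c_idx] is the predicted probability of classes[c_idx] for sample i.
    """
    aucs = []
    for c_idx, c in enumerate(classes):
        pos = [row[c_idx] for t, row in zip(y_true, proba) if t == c]
        neg = [row[c_idx] for t, row in zip(y_true, proba) if t != c]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)
```

The pairwise count is quadratic in the number of samples. A rank-based formula or a library routine is preferable for 20,000 records, but the pairwise form makes the definition explicit.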
License
Copyright (c) 2025 Guntur Tri Atmaja

This work is licensed under a Creative Commons Attribution 4.0 International License.