Abdirahman, Abdullahi Ahmed (2023) Comparative Analysis of Machine Learning and Deep Learning Models for Sentiment Analysis in Somali Language. SSRG International Journal of Electrical and Electronics Engineering.
IJEEE-V10I7P104.pdf - Published Version
Abstract
Understanding and analysing sentiment in user-generated content has become crucial with the increasing use of
social media and online platforms. However, sentiment analysis in less-resourced languages like Somali poses unique
challenges. This paper compares the performance of three machine learning (ML) algorithms (DTC, RFC, XGB) and two deep learning (DL) models (CNN, LSTM)
in classifying sentiment in Somali text. The CC100-Somali dataset, comprising 78M monolingual Somali texts from
Common Crawl snapshots, was used for training and evaluation. The study employed rigorous evaluation techniques,
including train-test splits and cross-validation, to assess classification accuracy and other performance metrics. The results
showed that DTC achieved the highest accuracy among the ML algorithms (87.94%), while LSTM achieved the highest
accuracy among the DL models (88.58%). This study's findings contribute to sentiment analysis in less-resourced languages,
specifically Somali, and provide valuable insights into the performance of ML and DL techniques. Moreover, the study
highlights the potential of leveraging both ML and DL approaches to analyse sentiment in Somali text effectively. The results
and evaluation metrics provide a benchmark for future research in sentiment analysis for Somali and other low-resource languages.
Keywords - Somali language, Sentiment analysis, Machine learning, Deep learning, Somali dataset
| Item Type: | Article |
|---|---|
| Subjects: | A General Works > AC Collections. Series. Collected works |
| Divisions: | Faculty of Computing |
| Depositing User: | Unnamed user with email crd@smiad.edu.so |
| Date Deposited: | 20 Sep 2025 11:39 |
| Last Modified: | 20 Sep 2025 11:39 |
| URI: | https://repository.simad.edu.so/id/eprint/347 |
