Ahmed, Mohamed Mustaf (2025) Application of machine learning algorithms and SHAP explanations to predict fertility preference among reproductive women in Somalia. Scientific Reports.
s41598-025-04704-y.pdf - Published Version
Abstract
Fertility preferences significantly influence population dynamics and reproductive health outcomes,
particularly in low-resource settings, such as Somalia, where high fertility rates and limited healthcare
infrastructure pose significant challenges. Understanding the determinants of fertility preferences is
critical for designing targeted interventions. This study leverages machine learning (ML) algorithms
and SHapley Additive exPlanations (SHAP) to identify key predictors of fertility preferences among
reproductive-aged women in Somalia. This cross-sectional study utilized data from the 2020 Somalia
Demographic and Health Survey (SDHS), encompassing 8,951 women aged 15–49 years. The outcome
variable, fertility preference, was dichotomized as either desire for more children or preference to
cease childbearing. Predictor variables included sociodemographic factors, such as age, education,
parity, wealth, residence, and distance to health facilities. Seven ML algorithms were evaluated for
predictive performance, with Random Forest emerging as the optimal model based on metrics such
as accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic
Curve (AUROC). SHAP was employed to interpret the model by quantifying the feature contributions.
The SHAP analysis identified the most influential predictors of fertility preferences as age group,
region, number of births in the last five years, number of children born, marital status, wealth index,
education level, residence, and distance to health facilities. Specifically, age group was the most
significant feature, followed by region and number of births in the last five years. Women aged 45–49
years and those with higher parity were significantly more likely to prefer no additional children.
Distance to health facilities emerged as a critical barrier, with better access being associated
with a greater likelihood of desiring more children. The Random Forest model demonstrated superior
performance, achieving an accuracy of 81%, precision of 78%, recall of 85%, F1-score of 82%, and
AUROC of 0.89. SHAP analysis provided interpretable insights, highlighting the nuanced interplay
of sociodemographic factors. This study underscores the potential of ML algorithms and SHAP in
advancing our understanding of fertility preferences in low-resource settings. By identifying critical
sociodemographic determinants, such as age group, region, number of births in the last five years,
number of children born, marital status, wealth index, education level, residence, distance to health
facilities, and employment status, these findings offer actionable insights to inform evidence-based
reproductive health interventions in Somalia. Future research should expand the application of ML
to longitudinal data and incorporate additional cultural and psychosocial predictors to enhance the
robustness and applicability of this model.
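
The abstract describes SHAP as quantifying each feature's contribution to a prediction. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a single instance by averaging each feature's marginal contribution over all feature orderings. The scoring function `toy_score`, the feature names `age_45_49` and `children_born`, and every coefficient and value are hypothetical. They are not taken from the study, its Random Forest model, or the SDHS data, and practical SHAP implementations rely on faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features start at their baseline values and are switched to the
    instance's values one at a time; each feature's marginal change in
    the prediction is averaged over every possible ordering.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)        # every feature "absent"
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # reveal one feature
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(row):
    """Hypothetical score for 'prefers no more children' (illustration only)."""
    score = 0.1
    score += 0.3 * row["age_45_49"]
    score += 0.05 * row["children_born"]
    # Hypothetical interaction: older women with high parity score higher.
    score += 0.1 * row["age_45_49"] * (row["children_born"] >= 5)
    return score


# Hypothetical woman to explain, and a hypothetical reference profile.
instance = {"age_45_49": 1, "children_born": 6}
baseline = {"age_45_49": 0, "children_born": 2}

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for the instance and the score for the baseline. This additivity property is what allows per-feature attributions to be read as shares of a single prediction.
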
| Item Type: | Article |
|---|---|
| Subjects: | A General Works > AC Collections. Series. Collected works |
| Divisions: | Faculty of Medicine |
| Depositing User: | Unnamed user with email crd@smiad.edu.so |
| Date Deposited: | 20 Sep 2025 09:12 |
| Last Modified: | 20 Sep 2025 09:12 |
| URI: | https://repository.simad.edu.so/id/eprint/278 |
