TY - JOUR
T1 - Evaluating Feature Selection Methods for Accurate Diagnosis of Diabetic Kidney Disease
AU - Maeda-Gutiérrez, Valeria
AU - Galván-Tejada, Carlos E.
AU - Galván-Tejada, Jorge I.
AU - Cruz, Miguel
AU - Celaya-Padilla, José M.
AU - Gamboa-Rosales, Hamurabi
AU - García-Hernández, Alejandra
AU - Luna-García, Huizilopoztli
AU - Villalba-Condori, Klinge Orlando
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/12
Y1 - 2024/12
AB - Background/Objectives: The increase in patients with type 2 diabetes, coupled with the development of complications caused by the disease, is alarming for the health sector. One of the main complications of diabetes is nephropathy, which is also the main cause of kidney failure. In Mexican patients, kidney damage is already advanced by the time of diagnosis, which is why acting preventively is extremely important. The aim of this research is to compare distinct feature selection methodologies to identify discriminant risk factors that may be beneficial for early treatment and prevention. Methods: This study evaluated a Mexican dataset collected from 22 patients and containing 32 attributes. To reduce the dimensionality and choose the most important variables, four feature selection algorithms were implemented: Univariate, Boruta, Galgo, and Elastic net. The features selected by each method were then used to train a random forest classifier, yielding four models. Results: Galgo with random forest achieved the best performance with only three predictors: “creatinine”, “urea”, and “lipids treatment”. The model displayed moderate classification performance, with an area under the curve of 0.80 (±0.3535 SD), a sensitivity of 0.909, and a specificity of 0.818. Conclusions: The proposed methodology has the potential to facilitate the prompt identification of nephropathy and non-nephropathy patients, and could therefore be used in the clinical area as a preliminary computer-aided diagnosis tool.
KW - diabetic kidney disease
KW - feature selection algorithms
KW - machine learning
KW - random forest
KW - risk factors
UR - http://www.scopus.com/inward/record.url?scp=85213219215&partnerID=8YFLogxK
U2 - 10.3390/biomedicines12122858
DO - 10.3390/biomedicines12122858
M3 - Article
AN - SCOPUS:85213219215
SN - 2227-9059
VL - 12
JO - Biomedicines
JF - Biomedicines
IS - 12
M1 - 2858
ER -