TY - GEN
T1 - Minería de Patrones Secuenciales aplicada a la Predicción del Plegamiento de Proteínas
AU - Quintana-Zaez, J.
AU - Velarde-Bedregal, Héctor R.
AU - Calderón-Ruiz, Guillermo
AU - Santiesteban-Toca, Cosme E.
N1 - Publisher Copyright: © 2019 Latin American and Caribbean Consortium of Engineering Institutions. All rights reserved.
PY - 2019
Y1 - 2019
AB - Sequence mining consists of finding statistically relevant patterns in sequentially represented data collections. Sequences are an important type of data in which the order of the elements matters, and they have a wide range of applications in Bioinformatics and Computational Biology. The prediction of protein structures is one of these applications: a protein is a sequence of amino acids that forms patterns known as alpha helices, beta sheets, and turns. For the purposes of our investigation, these secondary structures are the itemsets, while the amino acids that make up the entire sequence are the items. Despite multiple attempts to predict protein folding, the algorithms developed to date reach an effectiveness of only 35%. We therefore propose SPMCcm, an algorithm based on the prediction of frequent sequences and a scheme of classifiers, which uses the information provided by the amino acid sequence in two stages. The first stage learns the interactions between the secondary structures of the proteins, which it extracts as frequent sequences (itemsets). The second stage learns the interactions between the amino acids (items) present in the interacting structures. The experimental evaluation showed that SPMCcm behaves similarly regardless of the base classifier used, reaching prediction accuracies of up to 48%, higher than the 35% reported in the literature, without requiring large computational resources and while retaining explanatory capacity.
KW - Classification schemes
KW - Contact maps
KW - Mining sequential patterns
KW - Protein folding
UR - http://www.scopus.com/inward/record.url?scp=85073627233&partnerID=8YFLogxK
U2 - 10.18687/LACCEI2019.1.1.37
DO - 10.18687/LACCEI2019.1.1.37
M3 - Conference contribution
AN - SCOPUS:85073627233
T3 - Proceedings of the LACCEI international Multi-conference for Engineering, Education and Technology
BT - 17th LACCEI International Multi-Conference for Engineering, Education, and Technology
PB - Latin American and Caribbean Consortium of Engineering Institutions
T2 - 17th LACCEI International Multi-Conference for Engineering, Education, and Technology, LACCEI 2019
Y2 - 24 July 2019 through 26 July 2019
ER -