New Generation of Machine Learning: Analysis of Single Nucleotide Polymorphism Data with Deep Learning

18th National Biostatistics Conference

Publication

Objective

In recent years, both bioinformatics experts and medical doctors have been working with programming languages such as R, Python and C#. Most of this work focuses on machine learning, biostatistics and optimization. The biggest problem with these analyses was that the computing capacity of PCs and on-premises servers was low and unsustainable. Thanks to cloud systems, this problem has been eliminated. Meanwhile, machine learning, which has brought a breath of fresh air to genome data analysis, has been included in almost every genomic data analysis in the last 10 years. This study aimed to compare Deep Learning, which has emerged as “Next Generation Machine Learning”, with the classical Neural Network and Random Decision Forest methods for the analysis of Single Nucleotide Polymorphism (SNP) data. In this way, the performance and effect of this new generation of methods will be revealed.

Method

The data used in this study were simulated with the PLINK software. Data sets were prepared for different numbers of SNPs (one, two and three million) in a population-based design with equally distributed patients and controls, at sample sizes of 100, 250 and 500. For the analysis, “Microsoft Azure Machine Learning with Microsoft R Server” and the “h2o” package were used. All parameters for both groups of methods were optimized with the “Hyper Search” technique. The results were compared using accuracy, precision and recall.
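
As an illustration of the three comparison measures, the following minimal Python sketch computes accuracy, precision and recall from true and predicted labels. It is not the code used in the study, which relied on Microsoft R Server and h2o. The function name `classification_metrics`, the label coding (1 = patient, 0 = control) and the toy labels are assumptions made only for this example.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels.

    Assumed coding (illustrative only): 1 = patient, 0 = control.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    return {
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted patients that are true patients.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true patients that were predicted as patients.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy example with 4 patients and 4 controls (made-up labels, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With equally distributed patients and controls, as in the simulated design, accuracy is not distorted by class imbalance, while precision and recall still separate the two kinds of misclassification.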