Big Data Analytics in Metagenomics
Saman K. Halgamuge, Melbourne School of Engineering, University of Melbourne, Australia
In collaboration with researchers at Academia Sinica and at Metabolomics Australia/Department of Botany at Melbourne, we have been working in two areas of Bioinformatics: Metabolomics focusing on plants and Metagenomics focusing on microbes. Technological advances in whole genome sequencing and MALDI Imaging type technologies produce large sets of data, and profiling these data can reveal vital information about the environment and about plants, which are our primary source of food on Earth. Recently we have demonstrated considerable success in using unsupervised clustering techniques to analyse genetic and metabolomic data. This includes analysis of viral quasispecies, metabolomic data and microbial metagenomes.
Some microbes in the environment look very similar and are found “living together” in communities in non-separable ways, making them harder to culture in a lab. To make matters worse, we believe, if that belief is correct at all, that we know at most about 2% of the microbes around us. When we know so little about the data labels, in this case the identity of the species, it is even more challenging to recognise patterns associated with the genomes of quasispecies (sets of genetically related but non-identical viral mutant types, also referred to as strains) that are able to co-exist within a host. Uncovering information about quasispecies populations of microbes significantly benefits the study of disease progression, antiviral drug design, vaccine design and viral pathogenesis. We present a new analysis pipeline called ViQuaS for viral quasispecies spectrum reconstruction using short next-generation sequencing reads. ViQuaS is based on a novel reference-assisted de novo assembly algorithm for constructing local haplotypes.