Introduction to Machine Learning in Bioinformatics
The 22nd International Conference on Bioinformatics (InCoB 2023) invited workshop
Description:
This workshop is jointly hosted by the Sonika Tyagi Lab, COMBINE and Australian BioCommons.
Abstract: DNA is the blueprint defining all living organisms. Therefore, understanding the nature and function of DNA is at the core of all biological studies. Rapid advances in DNA sequencing and computing technologies over the past few decades have resulted in large quantities of DNA sequence data being generated for diverse experiments, at a rate exceeding the growth of all major social media platforms and astronomy data combined [1]. However, biological data is both complex and high-dimensional, making it difficult to analyse with conventional methods.
Machine learning is naturally well suited to problems involving large volumes of complex data [2]. In particular, applying natural language processing (NLP) to the genome is intuitive, since DNA can be read as a natural language. However, genome NLP poses challenges not found in natural languages, including the difficulty of word segmentation (DNA sequences contain no explicit word boundaries) and of comparing corpora.
To tackle these challenges, we developed the first automated and open-source genomeNLP workflow, which enables efficient and accurate knowledge extraction from biological data [1] by automating and abstracting away preprocessing steps unique to biology. This lowers the barrier to knowledge extraction for both machine learning practitioners and computational biologists. In this tutorial, we will demonstrate how our workflow can be used to address these challenges, with implications for fields such as personalised medicine [3-4].
[1] Chen, T., Tyagi, N., Chauhan, S., Peleg, A.Y., Tyagi, S. genomicBERT and data-free deep-learning model evaluation, bioRxiv, 2023, https://doi.org/10.1101/2023.05.31.542682 (preprint; not certified by peer review)
[2] Chen, T., Tyagi, S. Integrative computational epigenomics to build data-driven gene regulation hypotheses, GigaScience, Volume 9, Issue 6, June 2020, giaa064, https://doi.org/10.1093/gigascience/giaa064
[3] Chen, T., Philip, M., Lê Cao, K-A., Tyagi, S. A multi-modal data harmonisation approach for discovery of COVID-19 drug targets, Briefings in Bioinformatics, Volume 22, Issue 6, November 2021, bbab185, https://doi.org/10.1093/bib/bbab185
[4] Mu, A., Klare, W.P., Baines, S.L. et al. Integrative omics identifies conserved and pathogen-specific responses of sepsis-causing bacteria, Nature Communications, Volume 14, 2023, 1530, https://doi.org/10.1038/s41467-023-37200-w
Start: Sunday, 12 November 2023 @ 09:00
End: Wednesday, 15 November 2023 @ 17:00
Duration: 4 hours
Timezone: Brisbane
Venue: Translational Research Institute, 37 Kent Street
City: Woolloongabba
Country: Australia
Postcode: 4102
Prerequisites:
- [required] Command-line interface (CLI) usage (e.g. bash shell)
- [optional] Connecting to and working on a remote server (e.g. via ssh)
- [optional] Basic knowledge of machine learning
- [optional] Machine learning dashboards (e.g. TensorBoard, wandb)
- [optional] Package/environment managers (e.g. conda, mamba)
Learning objectives:
- Understand the basics of machine learning
- Describe the taxonomy of various learning algorithms and how to evaluate them
- Understand common representations of biological data
- Understand common biological data preprocessing steps
- Investigate biological sequence data for use in machine learning
- Perform machine learning classification on genomic data
Eligibility:
- By invitation
Organiser: 22nd International Conference on Bioinformatics (InCoB 2023)
Contact: tyrone.chen@monash.edu, sonika.tyagi@rmit.edu.au, christina.hall@biocommons.org.au, melissa.burke@biocommons.org.au, combine@combine.org.au
Host institution: Translational Research Institute, Queensland University of Technology, Asia Pacific Bioinformatics Network
Keywords: Deep learning, Machine learning, Natural language processing, Bioinformatics, Computational Biology
Fields: Bioinformatics, Natural Language Processing, Bioinformatics Software
Target audience:
- Bioinformaticians
- Life scientists
- Data scientists
Capacity: 40
Event type:
- Workshop
- Conference
Requirements: Laptop with internet connection
Cost Basis: Cost incurred by all
