PhD defence

PhD defence by Alfred Ferrer Florensa

Alfred Ferrer Florensa will defend his PhD thesis "Modeling of biological sequences for prediction and characterization of pathogenicity using deep learning".

Principal supervisor:

Professor Frank Aarestrup, DTU Food

Co-supervisors:

Associate Professor Henrik Nielsen, DTU Health Tech
Scientist Jose Juan Almagro Armenteros, Bristol Myers Squibb

Examiners:

Associate Professor Carolina Barra Quaglia, DTU Health Tech
Professor Simon Rasmussen, University of Copenhagen
Senior Researcher Vanessa Rossetto Marcelino, IATA, Spain

Chairperson at defence:

Associate Professor Christian Brinch, DTU Food

Summary
Every organism on this planet, from humans to viruses, carries a recipe (the genome, made of DNA) that defines its traits and capacities. Because we are able to collect and study these sequences of letters, they are central to our study of microorganisms, such as bacteria and viruses, and to our efforts to identify those that can harm us (pathogens).

Most of our methods for doing so are based on similarity. If the genome of virus A is very similar to the genome of a virus that causes an infection in human lungs, virus A will probably be able to cause that infection too. If bacterium B has a part of its genome (a gene) that we know confers a certain capacity, bacterium B will most probably have that capacity. Similarity-based methods, which often rely on sequence alignment, face two major challenges: they struggle with complex traits, and they struggle when there is little or no similarity between the new sequence and what we already know. The second issue is the focus of this thesis: when a new, previously unseen pathogen appears, these methods cannot recognize it as a threat.

In this project, we propose new methods that avoid these limitations by bringing deep learning algorithms used on text to the world of microbial genomes. We develop deep learning models that predict the threat posed by a virus or a bacterium directly from its genome, without relying on similarity. This lets us apply them as soon as a new microorganism or DNA sequence is found and immediately assess its capacity to harm us. For bacteria, we predict their pathogenic capacity; for viruses, we predict which hosts they are able to infect. Moreover, we create approaches to fairly evaluate these deep learning models in the complex world of biology and evolution, ensuring that they are accurate on completely new specimens. These approaches aim to improve early detection of emerging pathogens, supporting global health and biosecurity.

A copy of the PhD thesis is available for reading at the department.