Unmasking genomic patterns of variation with artificial intelligence
Despite major efforts in genetic studies involving thousands of individuals, the proportion of variance explained by the genetic associations discovered so far is no higher than about 30%. This is the long-debated problem of “missing heritability”, which has haunted geneticists since it was first named in 2008.
Larger sample sizes, new methods, the use of next-generation sequencing (NGS) and the inclusion of de novo and rare variants have all improved the proportion of explained heritability: a proportion called “hidden heritability”, as opposed to the heritability that is actually missing. However, although these strategies increase the amount of phenotypic variance accounted for, they do not close the gap of missing/hidden heritability: approximately 41% still remains unaccounted for.
In this scenario, where the genome has accumulated a significant number of variants of recent origin, only a limited number of methods include all types of variants (SNVs, INDELs, SVs) in the analysis. We believe that the distribution of these variants in the genome is not random, and we suggest that specific genomic patterns can be recognised with the appropriate tools. The development of methods to unmask genomic patterns, while complementary to the more traditional efforts on hidden heritability, can significantly advance the understanding of the genomic determinants of phenotypes, because it:
1) can overcome the limits of currently available methods;
2) will provide a new way to analyse the genome and understand the way it works;
3) will offer a more holistic perspective on genome variation and its relationship with phenotype.
We intend to use machine/deep learning methods to tackle this challenge, with a key effort dedicated to interpretable artificial intelligence solutions and to improving the usability of these approaches in biology.
Modelling inflammation as a key predictor of phenotype trajectories
We would like to focus the above efforts in particular on uncovering the genomic drivers of the inflammatory response, and in turn to use these data to model the evolution of phenotypes and diseases.
Inflammation is one of the fundamental physiological processes and constitutes a life-long mechanism of defence, remodelling and repair. Chronic inflammation is a well-established factor in the pathogenesis, but also in the evolution, modification and phenotypic variability, of many disorders, including behavioural and psychiatric disorders. The inflammatory response connects a number of fundamental drivers in physiology and pathology: it is a key response in the interaction with the non-self, and as such it records the history of an individual’s response to infection, but also the interaction with one’s microbiota throughout life, thus establishing an immediate connection with nutrition and therefore lifestyle. Cytokines produced by the immune system are well known to be involved in neural development, and inflammation is a key element in early-life events, capable of exerting long-lasting effects throughout an individual’s life.
From this short overview, it is clear that inflammation is a central element in a wide variety of phenotypes, and that it also plays an important role in an individual’s well-being in normal health, in close connection with nutrition and lifestyle. It is linked to numerous measurable markers and quantitative indicators, and therefore represents an ideal phenomenon for modelling, as a universal determinant of health status throughout life. We aim to combine our work in computational genomics with artificial intelligence methods trained on inflammation indicators, in order to identify predictors that help anticipate the evolution of different phenotypes.
Open Science and Responsible Research
We firmly believe that the advancement of knowledge rests on a few core values: transparency and reproducibility of methods, accessibility of code (open source) and results (open data), and responsible research and innovation (RRI).
For this reason, all our code will always be available on GitHub; our methods are executed through Nextflow and our reports are prepared in Markdown/R Markdown, in order to ensure the reproducibility and portability of our pipelines.
We participate in the nf-core community, because we believe in the value of collaboration and community standards.
We challenge ourselves by contributing to responsible research and innovation initiatives, because our work should ultimately benefit society and therefore respond to its needs.