One of biologists' great disappointments after the sequencing of the human genome is that access to this “great book of life” and its approximately 22,000 genes does not provide all the keys to understanding how our DNA, inherited from two parental cells, gives rise to an individual with all the tissue diversity that makes up a body. Our complex makeup, and also some of our diseases, depend on a language that regulates gene expression, switching genes on or off, and whose grammar still defies human understanding. Here too, artificial intelligence (AI), crowned at the beginning of October with two Nobel Prizes, in physics and in chemistry, seems able to make a contribution, as shown by a study published on October 24 in Nature.
“Gene expression is regulated in many different ways,” recalls Sager Gosai (Broad Institute of MIT and Harvard), first author of the study. Together with colleagues from two other US laboratories, he focused on so-called “cis-regulatory elements” (CREs). These small DNA fragments, such as promoters, are generally located upstream of the genes they regulate and bind proteins called “transcription factors,” which switch on or leave off the expression of a given gene and thus the production of its protein. The researchers describe having used machine learning, whose goal is to give machines the ability to “learn” through mathematical models, to design CREs that are active in certain cells with greater specificity than those found in nature. This held when the sequences were tested not only in vitro but also in transgenic animals such as zebrafish.
Randomly synthesizing CREs to find the most suitable ones is not an option: the number of possible combinations for a sequence of 200 nucleotides, the length of the DNA sequences tested by the researchers, “would exceed that of atoms in the observable universe,” they note in Nature. They therefore started with a powerful molecular biology tool that can test the activity of hundreds of thousands of CREs in different cell types (in this case, nerve, blood and liver cells).
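To give a sense of the scale behind that comparison, the count can be worked out in a few lines. This is only a back-of-the-envelope sketch: it assumes four possible nucleotides (A, C, G, T) at each of the 200 positions, and it takes the commonly quoted rough estimate of 10^80 atoms in the observable universe, a figure that does not come from the article.

```python
import math

SEQUENCE_LENGTH = 200          # nucleotides per tested sequence (from the article)
ALPHABET_SIZE = 4              # A, C, G, T
LOG10_ATOMS_IN_UNIVERSE = 80   # assumption: common rough estimate, not from the article

# Exact number of distinct 200-nucleotide sequences (Python integers are unbounded).
possible_sequences = ALPHABET_SIZE ** SEQUENCE_LENGTH

# Same quantity expressed as a power of ten, which is easier to compare.
log10_sequences = SEQUENCE_LENGTH * math.log10(ALPHABET_SIZE)

print(f"Possible sequences: about 10^{log10_sequences:.1f}")
print(f"Orders of magnitude above the atom estimate: "
      f"{log10_sequences - LOG10_ATOMS_IN_UNIVERSE:.1f}")
```

Under these assumptions the number of candidate sequences is on the order of 10^120, which is why exhaustive synthesis is out of reach and a model is needed to search the space.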
“Emerging field”
This large data set was used to train artificial neural networks to recognize the CREs likely to be active in one cell type but not in the other two. The researchers then asked these models to create new sequences capable of regulating the expression of a gene in a specific cell type. These artificial CREs proved to be very efficient.
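The selection criterion described here, active in one cell type but not in the other two, can be illustrated with a toy score. The sketch below is an assumption made for illustration and not the method reported in the study: it scores a sequence by its activity in the target cell type minus its highest activity in any other cell type. The function names, sequence labels and activity values are all hypothetical.

```python
def specificity_score(activity, target):
    """Activity in the target cell type minus the highest off-target activity."""
    off_target = [value for cell, value in activity.items() if cell != target]
    return activity[target] - max(off_target)


def rank_by_specificity(candidates, target):
    """Return candidate names sorted from most to least specific for the target."""
    return sorted(
        candidates,
        key=lambda name: specificity_score(candidates[name], target),
        reverse=True,
    )


# Hypothetical activity values (arbitrary units) for three made-up sequences.
candidates = {
    "seq_a": {"nerve": 5.0, "blood": 4.5, "liver": 4.8},  # active everywhere
    "seq_b": {"nerve": 0.5, "blood": 0.4, "liver": 3.9},  # liver-specific
    "seq_c": {"nerve": 1.0, "blood": 2.0, "liver": 2.5},  # weakly liver-biased
}

liver_ranking = rank_by_specificity(candidates, "liver")
print(liver_ranking)
```

With these made-up numbers, the sequence that is strongly active everywhere ranks below the one that is active in liver cells only, which is the behaviour a specificity criterion is meant to capture.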