Google unveiled an artificial intelligence tool on Wednesday that its scientists said would help unravel the mysteries of the human genome — and could one day lead to new treatments for disease.
The deep learning model AlphaGenome was hailed by outside researchers as a “breakthrough” that would let scientists study and even simulate the roots of hard-to-treat genetic diseases.
While the first complete map of the human genome in 2003 “gave us the book of life, reading it remained a challenge,” Pushmeet Kohli, vice president of research at Google DeepMind, told reporters.
“We have the text,” he said, referring to the sequence of three billion nucleotide pairs, represented by the letters A, T, C and G, that make up human DNA.
But “understanding the grammar of this genome — what’s encoded in our DNA and how it governs life — is the next critical frontier of research,” said Kohli, co-author of a new study in the journal Nature.
Only about 2% of our DNA contains instructions for making proteins, which are the molecules that build and run the body.
The other 98% was long dismissed as “junk DNA” as scientists struggled to understand what it was for.
But this “non-coding DNA” is now thought to act as a conductor that directs how genetic information works in each of our cells.
These sequences also contain many variants that have been linked to disease, and it is these sequences that AlphaGenome aims to understand.
A million letters
The project is just one part of Google’s AI-powered scientific work, which also includes AlphaFold, whose creators shared the 2024 Nobel Prize in Chemistry.
AlphaGenome’s model was trained on data from public projects that measured non-coding DNA across hundreds of different cell and tissue types in humans and mice.

The tool can analyze long DNA sequences and predict how each nucleotide pair will affect various biological processes in the cell.
This includes where genes start and stop and how much RNA – molecules that transmit genetic instructions inside cells – is produced.
Other models already serve a similar purpose, but they have to make compromises, either by analyzing much shorter DNA sequences or by reducing the level of detail in their predictions, known as resolution.
DeepMind scientist and lead study author Ziga Avsec said long sequences – up to a million DNA letters long – were “required to understand the full regulatory environment of a single gene”.
The model’s high resolution also allows researchers to study the impact of genetic variants by comparing predictions for mutated and non-mutated versions of a sequence.
“AlphaGenome can accelerate our understanding of the genome by helping to map where the functional elements are and what their roles are at a molecular level,” said study co-author Natasha Latysheva.
The model has already been tested by 3,000 scientists in 160 countries and is open to anyone for non-commercial use, Google said.
“We hope researchers will expand it with more data,” Kohli added.
‘Breakthrough’
Ben Lehner, a researcher at Cambridge University who was not involved in the development of AlphaGenome but tested it, said the model “really works very well”.
“Identifying the precise differences in our genomes that make us more or less prone to developing thousands of diseases is an important step towards developing better therapies,” he explained.
But AlphaGenome “is far from perfect and there is still a lot of work to be done”, he added.
“AI models are only as good as the data used to train them,” and the existing data is not particularly well suited to the task, he said.
Robert Goldstone, head of genomics at the UK’s Francis Crick Institute, warned that AlphaGenome was “not a magic bullet for all biological questions”.
This was partly because “gene expression is influenced by complex environmental factors that the model cannot see”, he said.
But the tool still represented a “breakthrough” that would allow scientists to “study and simulate the genetic roots of complex disease,” Goldstone added.