A celebrated AI has learned a new trick: doing chemistry

Artificial intelligence has changed the way science is done by enabling researchers to analyze the vast amounts of data that modern scientific instruments generate. It can find a needle in a million haystacks of information and, with the help of deep learning, it can learn from the data itself. AI is accelerating progress in gene hunting, medicine, drug design and the creation of organic compounds.

Deep learning uses algorithms, often neural networks trained on large amounts of data, to extract information from new data. It is very different from traditional computing with its step-by-step instructions. Instead, it learns from data. Deep learning is much less transparent than traditional computer programming and leaves open important questions: what has the system learned, what does it know?

As a chemistry professor, I like to design tests with at least one difficult question that stretches students’ knowledge, to determine whether they can combine different ideas and synthesize new ideas and concepts. We came up with such a question for the poster child of AI advocates, AlphaFold, which has solved the protein folding problem.

Protein folding

Proteins are present in all living organisms. They give structure to cells, catalyze reactions, transport small molecules, digest food and do much more. They are made up of long chains of amino acids, like beads on a string. But for a protein to do its job in the cell, it must twist and bend into a complex three-dimensional structure, a process called protein folding. Misfolded proteins can lead to disease.

In his 1972 Nobel Prize acceptance speech, Christian Anfinsen postulated that it should be possible to calculate the three-dimensional structure of a protein from the sequence of its building blocks, the amino acids.

Just as the order and spacing of the letters in this article give it meaning and message, so the order of the amino acids determines the identity and shape of the protein, which results in its function.

Within milliseconds of an amino acid chain (left) leaving the ribosome, it is folded into the lowest-energy 3D shape (right), which is required for the protein’s function.
Marc Zimmer, CC BY-ND

Due to the inherent flexibility of the amino acid building blocks, a typical protein can take an estimated 10 to the power of 300 different shapes. This is a huge number, more than the number of atoms in the universe. But within a millisecond, every protein in an organism will fold into its own specific shape: the lowest-energy arrangement of all the chemical bonds that make up the protein. Change just one amino acid out of the hundreds of amino acids typically found in a protein and it can misfold and stop working.
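To get a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. It takes the figure of 10 to the power of 300 shapes and the millisecond folding time from the text. The number of atoms in the universe (about 10 to the power of 80, a commonly cited estimate) and the rate at which a protein might try out shapes (`SHAPES_PER_SECOND`) are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
import math

# Figures taken from the text above.
LOG10_SHAPES = 300        # roughly 10**300 possible shapes per protein
FOLDING_TIME_S = 1e-3     # folding finishes within about a millisecond

# Illustrative assumptions, not from the article.
LOG10_ATOMS_IN_UNIVERSE = 80    # commonly cited order-of-magnitude estimate
SHAPES_PER_SECOND = 1e13        # hypothetical rate of trying out shapes
SECONDS_PER_YEAR = 3.156e7


def log10_years_to_try_all(log10_shapes: float, shapes_per_second: float) -> float:
    """Return log10 of the years needed to try every shape once."""
    # Work in logarithms because 10**300 is too large for ordinary floats
    # to handle comfortably alongside the other quantities.
    log10_seconds = log10_shapes - math.log10(shapes_per_second)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


def shapes_tried_while_folding(folding_time_s: float, shapes_per_second: float) -> float:
    """Return how many shapes could be tried within the folding time."""
    return folding_time_s * shapes_per_second


if __name__ == "__main__":
    years = log10_years_to_try_all(LOG10_SHAPES, SHAPES_PER_SECOND)
    tried = shapes_tried_while_folding(FOLDING_TIME_S, SHAPES_PER_SECOND)
    excess = LOG10_SHAPES - LOG10_ATOMS_IN_UNIVERSE
    print(f"Shapes outnumber atoms by a factor of about 10^{excess}")
    print(f"Trying every shape would take about 10^{years:.1f} years")
    print(f"Shapes that fit into one millisecond: about {tried:.0e}")
```

Under these assumptions, only about ten billion shapes could be tried in a millisecond, a vanishing fraction of the total, so a protein cannot be finding its shape by blind trial and error.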

AlphaFold

For 50 years, computer scientists have been trying to solve the problem of protein folding – with little success. In 2016, DeepMind, an AI subsidiary of Google’s parent company Alphabet, started its AlphaFold program. As its training set, it used the Protein Data Bank, which contains the experimentally determined structures of more than 150,000 proteins.

In less than five years, AlphaFold had beaten the protein folding problem – at least the most useful part of it, which is determining a protein’s structure from its amino acid sequence. AlphaFold does not explain how proteins fold so quickly and accurately. Even so, the breakthrough was a big win for AI: it not only earned tremendous scientific prestige but was also a major scientific advance that could affect everyone’s lives.

Thanks to programs like AlphaFold2 and RoseTTAFold, researchers like me can now determine the three-dimensional structure of proteins from the sequence of amino acids that make up the protein – at no cost – in an hour or two. Before AlphaFold2, we had to crystallize the proteins and solve the structures using X-ray crystallography, a process that took months and cost tens of thousands of dollars per structure.

We also now have access to the AlphaFold Protein Structure Database, where DeepMind has deposited the 3D structures of nearly all proteins found in humans, mice and more than 20 other species. To date, they have solved more than a million structures and plan to add another 100 million structures this year alone. Knowledge of proteins has skyrocketed. The structures of half of all known proteins are likely to be documented by the end of 2022, including many new unique structures associated with new useful functions.

Think like a chemist

AlphaFold2 was not designed to predict how proteins would interact with each other, but it has been able to model how individual proteins combine to form large complex units composed of multiple proteins. We had a challenging question for AlphaFold: did the structural training set teach it some chemistry? Could it tell if amino acids would react with each other – a rare but important event?

I am a computational chemist interested in fluorescent proteins. These are proteins found in hundreds of marine organisms such as jellyfish and corals. Their glow can be used to illuminate and study diseases.

Neurons expressing fluorescent proteins reveal the brain structures of two fruit fly larvae.
Wen Lu and Vladimir I. Gelfand, Feinberg School of Medicine, Northwestern University

There are 578 fluorescent proteins in the Protein Data Bank, 10 of which are “broken” and do not fluoresce. Proteins only rarely attack themselves, a process called autocatalytic post-translational modification, and it is very difficult to predict which proteins will react with themselves and which will not.

Only a chemist with extensive knowledge of fluorescent proteins could look at an amino acid sequence and pick out the fluorescent proteins that have the right sequence to undergo the chemical transformations required to make them fluorescent. When we presented AlphaFold2 with the sequences of 44 fluorescent proteins that are not in the Protein Data Bank, it folded the working fluorescent proteins differently from the broken ones.

AlphaFold2 can take the amino acid sequence of fluorescent proteins (letters at the top) and predict their 3D barrel shapes (center). This is not surprising. What is totally unexpected is that it can also predict which fluorescent proteins are ‘broken’ and cannot fluoresce.
Marc Zimmer, CC BY-ND

The result surprised us: AlphaFold2 had learned some chemistry. It had worked out which amino acids in fluorescent proteins do the chemistry that makes them glow. We suspect that the Protein Data Bank training set and multiple sequence alignments allow AlphaFold2 to “think” like chemists and search for the amino acids needed to react with each other to make the protein fluorescent.

A folding program that learns some chemistry from its training set also has broader implications. What else can be learned from other deep learning algorithms by asking the right questions? Can facial recognition algorithms find hidden markers for diseases? Could algorithms designed to predict consumer spending patterns also find a propensity for petty theft or cheating? Most importantly, is this capability, along with comparable leaps in ability in other AI systems, desirable?

This article by Marc Zimmer, Connecticut College chemistry professor, is republished from The Conversation under a Creative Commons license. Read the original article.
