Brief Summary
This video discusses DreaMS, an AI model that uses self-supervised learning to interpret tandem mass spectra and map natural molecules. It addresses the challenge that the vast majority of natural molecules are still unidentified, and it explores potential applications in drug discovery, disease diagnosis, and materials science. Key points include:
- DreaMS AI interprets molecular spectra to uncover hidden chemical information.
- The DreaMS Atlas maps 201 million spectra of natural molecules, revealing relationships between them and pointing to potential new discoveries.
- The AI can be fine-tuned to predict specific molecular properties, such as drug suitability and fluorine presence.
DreaMS Intro
Life is composed of molecules, the fundamental building blocks of everything. Currently, less than 10% of all natural molecules have been identified. An AI that helps identify the remaining 90% could lead to significant advances across many fields, including disease diagnosis, drug discovery, and new materials development. The paper introduces DreaMS, an AI model designed to learn molecular representations from tandem mass spectra.
Background on natural molecules
The human body, like every living organism, is a complex chemical landscape made of molecules whose functions are crucial for life, such as metabolism, healing, and fighting disease. However, over 90% of natural small molecules remain unidentified, and discovering them could unlock new drugs, materials, and chemicals. To identify these molecules, scientists use liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). Chromatography separates a sample's molecules, and the mass spectrometer then fragments them to produce a spectrum of fragment masses and intensities that acts as a molecular fingerprint. Interpreting these spectra is difficult: fewer than 10% can be matched to known molecular structures, leaving over 90% as uninterpretable data.
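A standard way to interpret a spectrum is to match it against a library of reference spectra from known molecules. Spectra with no good match stay uninterpretable. The sketch below illustrates this idea under simplifying assumptions. A spectrum is a list of (m/z, intensity) peaks, similarity is a plain cosine score over 1-Da bins, and the 0.7 threshold is an arbitrary, hypothetical cutoff. Real tools such as GNPS use more refined scores (for example, modified cosine), so this is an illustration only and not their implementation.

```python
import math
from collections import defaultdict

# Assumption: a spectrum is a list of (m/z, intensity) peaks.
Spectrum = list[tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    bins: dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return dict(bins)


def cosine_similarity(a: Spectrum, b: Spectrum, bin_width: float = 1.0) -> float:
    """Cosine similarity of two binned spectra (1.0 = identical peak pattern)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb.get(k, 0.0) for k in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query: Spectrum, library: dict[str, Spectrum],
                       threshold: float = 0.7):
    """Return (name, score) of the best hit, or None if nothing passes the threshold."""
    scored = [(name, cosine_similarity(query, ref)) for name, ref in library.items()]
    if not scored:
        return None
    name, score = max(scored, key=lambda t: t[1])
    return (name, score) if score >= threshold else None
```

A query whose best score falls below the threshold gets no annotation. According to the summary, this is what happens to the large majority of spectra.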
DreaMS AI for molecular spectra
Researchers developed DreaMS, a neural network designed to interpret previously unreadable spectra. It is trained with self-supervised learning to map the hidden chemical universe. This means it learns from unlabeled data, without needing to know the structure behind each spectrum. DreaMS uses millions of molecular fingerprints from natural materials to learn the properties of the molecules behind each spectrum. Around 201 million unlabeled spectra are extracted from a large GNPS dataset and fed to the model. Over millions of training steps, the model learns the "grammar" or "language" of spectra, decoding the chemical and structural properties of the underlying molecules. DreaMS cannot determine a molecule's exact structure from its spectrum, but it can estimate the molecule's chemical and structural properties.
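Self-supervised learning creates its training signal from the data itself instead of from human labels. A common recipe, borrowed from masked language modeling, hides part of the input and trains the model to reconstruct it. The sketch below shows only that data-preparation step for spectra: it hides random peaks and keeps their m/z values as prediction targets. The mask placeholder, the 30% default fraction, and the function name are hypothetical choices and are not taken from the DreaMS code.

```python
import random
from typing import Optional

# Assumption: a spectrum is a list of (m/z, intensity) peaks.
Spectrum = list[tuple[float, float]]

# Hypothetical placeholder that stands in for a hidden peak.
MASK = (0.0, 0.0)


def make_masked_example(peaks: Spectrum, mask_fraction: float = 0.3,
                        rng: Optional[random.Random] = None):
    """Hide a random subset of peaks.

    Returns (masked_input, masked_positions, targets), where targets are the
    true m/z values a model would learn to predict from the visible peaks.
    No human-provided labels are needed.
    """
    rng = rng or random.Random()
    # Mask at least one peak (when there are any), never more than exist.
    n_mask = min(len(peaks), max(1, round(len(peaks) * mask_fraction)))
    positions = sorted(rng.sample(range(len(peaks)), n_mask))
    masked = list(peaks)
    targets = []
    for i in positions:
        targets.append(peaks[i][0])  # true m/z of the hidden peak
        masked[i] = MASK
    return masked, positions, targets
```

Repeating this over a very large pool of unlabeled spectra yields an effectively unlimited supply of training examples. That abundance is what makes the self-supervised approach attractive when almost no spectra are annotated.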
DreaMS Atlas
Once trained, DreaMS processes 201 million spectra of unknown natural molecules and places each one in a multi-dimensional space called the DreaMS Atlas. The atlas contains both known and unknown natural molecules, and each molecule's position is determined by its similarity to the others. Molecules with similar properties sit close together, while those with very different structures or chemical properties are far apart. This is analogous to how language models map words into a shared space where words with related meanings cluster together.
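Placing spectra in a shared space turns "similar" into a measurable quantity. Assume each spectrum has already been converted into an embedding vector; the toy three-dimensional vectors used with this sketch are hypothetical, and real embeddings are much larger. Finding a molecule's neighbors then reduces to a nearest-neighbor search. The sketch uses cosine similarity and a brute-force scan, which is fine for illustration. An atlas with hundreds of millions of entries would need an approximate nearest-neighbor index instead.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbors(query: list[float], atlas: dict[str, list[float]],
                      k: int = 3) -> list[tuple[str, float]]:
    """Return the k atlas entries most similar to `query`, most similar first."""
    scored = [(name, cosine(query, vec)) for name, vec in atlas.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]
```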
Key findings
The DreaMS Atlas is highly interconnected, which indicates that DreaMS finds meaningful similarities linking molecules, including previously unidentified ones. The atlas can serve as a kind of Wikipedia for molecules. Researchers can input a molecule's spectrum, find its position in the graph, and look up neighboring molecules and their properties. The atlas can also estimate a molecule's novelty, since molecules far from any known substance may have unique properties. It reveals that many natural molecules are distinct from those already identified, which suggests vast potential for discovering new drugs, chemicals, and materials.
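The ideas above can be sketched on top of embedding vectors: linking molecules into a graph, and scoring how far a new spectrum lies from known substances. This is a simplified illustration with two assumptions. First, the atlas is treated as a k-nearest-neighbor graph over embeddings. Second, novelty is scored as one minus the similarity to the closest known molecule. The function names and the scoring rule are hypothetical and are not DreaMS's published definitions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(atlas: dict[str, list[float]], k: int = 2) -> dict[str, list[str]]:
    """Link each entry to its k most similar other entries (hypothetical atlas graph)."""
    graph = {}
    for name, vec in atlas.items():
        others = [(other, cosine(vec, ov)) for other, ov in atlas.items() if other != name]
        others.sort(key=lambda t: t[1], reverse=True)
        graph[name] = [other for other, _ in others[:k]]
    return graph


def novelty(query: list[float], known: dict[str, list[float]]) -> float:
    """Hypothetical novelty score: 1 - similarity to the closest known molecule.

    0.0 means identical to a known molecule; values near 1.0 mean nothing
    known is nearby.
    """
    if not known:
        return 1.0
    return 1.0 - max(cosine(query, v) for v in known.values())
```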
Mapping different foods
Researchers tested DreaMS by analyzing various food items with mass spectrometry, recording spectra of their molecules, and plotting these on the network graph. The AI clustered the foods according to basic food taxonomy: plant-based foods, animal-based foods, and beverages formed distinct clusters. This shows that DreaMS can group samples by their molecular properties without any prior knowledge of the food sources.
Skin disorders and chemicals
The DreaMS Atlas identified a close relationship between psoriasis, a skin disorder, and the fungicide azoxystrobin. Their proximity in the atlas suggests a possible link between exposure to the chemical and the skin condition, which warrants further investigation. In this way, the atlas acts as a discovery engine. It surfaces surprising molecular similarities between seemingly unrelated entities and prompts research into how they are related.
Chemicals, diabetes, cancers
The atlas also found the same plant metabolite in seemingly unrelated plant species, suggesting a shared characteristic. In addition, it linked a family of lipids to type 2 diabetes and to brain, lung, and renal cancers. These findings are not conclusive. However, they point to avenues for further research into how these lipids relate to these diseases, which could eventually lead to new treatments.
Finding drug suitability
DreaMS can be fine-tuned to predict specific properties from a molecule's spectrum. Researchers fine-tuned it to predict whether a molecule satisfies Lipinski's rule of five, a widely used rule of thumb for judging whether a compound is likely to work as an orally active drug. This allows large numbers of natural molecules to be screened for potential drug candidates.
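A common reading of Lipinski's rule of five is that a compound is likely to have poor oral absorption if it breaks more than one of four limits:
- molecular weight over 500 Da
- logP over 5
- more than 5 hydrogen-bond donors
- more than 10 hydrogen-bond acceptors

The sketch below encodes that rule for a molecule whose properties are already known. In practice, these properties would be computed from a structure, for example with a cheminformatics toolkit. The fine-tuned DreaMS model instead predicts this kind of label directly from the spectrum. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MolecularProperties:
    molecular_weight: float  # daltons
    logp: float              # octanol-water partition coefficient (log10 scale)
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(p: MolecularProperties) -> int:
    """Count how many of the four rule-of-five limits the molecule exceeds."""
    rules = [
        p.molecular_weight > 500,
        p.logp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(rules)


def is_drug_like(p: MolecularProperties, max_violations: int = 1) -> bool:
    """Treat a molecule as drug-like if it breaks at most `max_violations` limits."""
    return lipinski_violations(p) <= max_violations
```

Values exactly at a limit (for example, a molecular weight of exactly 500 Da) do not count as violations, because the rule is stated as "over" or "more than".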
Finding fluorine in molecules
DreaMS was also fine-tuned to predict whether a molecule contains fluorine. Fluorinated compounds are used in many industries because carbon-fluorine bonds tend to make a compound very stable and resistant to heat and chemicals. DreaMS predicts fluorine presence with 91% accuracy, a significant improvement over older methods. This helps identify stable molecules for use in pharmaceuticals, non-stick coatings, refrigerants, high-performance plastics, and electronics manufacturing.
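Predicting fluorine presence is a binary classification task. Fine-tuning DreaMS adapts the pretrained network to this task. The core idea can be illustrated with a simpler, swapped-in technique called a linear probe: a logistic-regression classifier trained on top of fixed spectrum embeddings. Everything here is an assumption for illustration. The embeddings, the labels (1 = contains fluorine), the learning rate, and the epoch count are all hypothetical. The accuracy function simply reports the fraction of correct predictions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 200):
    """Fit logistic-regression weights on fixed embeddings with full-batch gradient descent."""
    if not X:
        raise ValueError("need at least one training example")
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: list[float], b: float, x: list[float], threshold: float = 0.5) -> int:
    """Return 1 (fluorine predicted) if the probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def accuracy(w: list[float], b: float, X: list[list[float]], y: list[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)
```

Accuracy alone can be misleading when fluorinated molecules are rare in a dataset. In that case, a classifier that always answers "no fluorine" can still score highly, so precision and recall on the positive class are worth checking as well.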
Limitations and next steps
DreaMS can currently estimate the chemical properties and composition of a molecule from its spectrum, and the ultimate goal is an AI that can predict the molecule's full structure. The DreaMS code has been released on GitHub and Hugging Face under the MIT license, with instructions for mapping molecular spectra and fine-tuning the model for specific properties. Scientists can use the DreaMS Atlas to find new drug candidates in two ways: by exploring regions near known drug candidates, or by venturing into unexplored areas to uncover molecules with novel chemical properties.