At the Institute of Artificial Intelligence (IAI), researchers have developed MolVision, a new artificial intelligence (AI) vision-language model (VLM) capable of accurately interpreting a molecule's structure from its image. The project grew from a bold idea: to make AI models learn scientific principles the same way students do. Leading the study is Assistant Professor of Materials Science and Engineering Shruti Vyas. The MolVision research team includes Associate Professor of Computer Science and IAI member Yogesh Singh Rawat and Deepan Adak, a researcher from the National Institute of Technology, Kurukshetra.
"AI should learn chemistry the way humans do, by seeing molecular structures, not just reading linear strings," Vyas says. "While large language models have shown promise for molecular property prediction, their reliance on representations like SMILES or SELFIES [textual representations] limits their ability to capture the rich structural cues chemists rely on."
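For context, SMILES encodes a molecule's structure as a single linear string of atoms and bonds. The short Python sketch below illustrates why such strings are a challenge for text-only models: structural features like an aromatic ring appear only as character patterns. The two SMILES strings are standard published representations; the ring check is a deliberately crude, illustrative heuristic, not real cheminformatics.

```python
# SMILES represents molecular structure as a linear string of text.
# Standard published SMILES for two common molecules:
aspirin = "CC(=O)Oc1ccccc1C(=O)O"
ibuprofen = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"

# To a text-only model these are just character sequences; the aromatic
# benzene ring a chemist sees at a glance is only the substring pattern
# "c1...c1" here. A crude textual check (illustrative only):
for name, smiles in [("aspirin", aspirin), ("ibuprofen", ibuprofen)]:
    has_aromatic_ring = "c1ccc" in smiles
    print(name, len(smiles), has_aromatic_ring)
```

A vision-language model instead sees the drawn 2D structure, where rings, bonds, and functional groups are explicit spatial features rather than substrings.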
According to Vyas, this work opens a new pathway for chemical prediction and molecular analysis by creating an AI system that operates more intuitively.
A Challenging Vision
According to Vyas, one of the biggest challenges facing the field of artificial intelligence and computer vision is shifting AI models from a textual to a visual understanding of chemical reactions.
"Molecular images represent a very different data domain compared to the natural images or text that vision-language models are typically trained on," Vyas says. "Molecules contain highly specific structural relationships (bonding patterns, stereochemistry, and functional group arrangements) that are subtle yet crucial for property prediction."
Many VLMs have limited exposure to visual representations of scientific data, which makes training and adapting them to understand the nuances of molecules and their atomic structure a primary challenge.
Transforming How Scientists and AI See Chemistry
To address these challenges, Vyas and her research team developed a multimodal data set for MolVision to refer to during its training. The data set pairs 2D diagrams with text-based descriptions of a variety of molecules and atomic structures, and it was crucial for training the MolVision VLM to integrate textual and visual information effectively. Using LoRA (low-rank adaptation), the team fine-tuned a model with billions of parameters for complex tasks such as molecular property prediction and chemical description without the cost of full retraining.
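The idea behind low-rank adaptation is to freeze the pretrained weight matrices and train only a small low-rank correction, so adapting a billion-parameter model touches a tiny fraction of its weights. A minimal NumPy sketch of that mechanism (the layer sizes, rank, and scaling factor are illustrative assumptions, not MolVision's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 1024, 1024, 8          # layer dimensions and LoRA rank (illustrative)
W = rng.standard_normal((d, k))  # frozen pretrained weight; never updated

# Trainable low-rank factors. B starts at zero so the adapted layer
# initially behaves exactly like the pretrained one.
A = rng.standard_normal((r, k)) * 0.01
B = np.zeros((d, r))
alpha = 16.0                     # common LoRA scaling hyperparameter

def adapted_forward(x):
    """Forward pass through the adapted weight W + (alpha/r) * B @ A."""
    return x @ (W + (alpha / r) * (B @ A)).T

full_params = W.size             # what full fine-tuning would train
lora_params = A.size + B.size    # what LoRA actually trains
print(f"trainable fraction: {lora_params / full_params:.4f}")  # ~1.6% of the layer
```

During fine-tuning, gradients flow only into `A` and `B`; the scaling `alpha / r` keeps the size of the update roughly stable as the rank changes.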
"Recent advances in vision-language models have transformed how AI understands the world, but most of that progress has focused on natural images and everyday language," says Yogesh Singh Rawat. "With MolVision, we're bringing those same AI capabilities into chemistry, allowing models to reason about molecules visually, in ways that are much closer to how scientists actually think."
This work has the potential to transform drug discovery, the personalization of medicine, and even sustainable design and engineering. "Over the next few years, we can expect this multimodal approach to reduce experimental screening burdens, support faster identification of promising drug candidates and materials, and offer more interpretable insights into structure-property relationships," Vyas says.
Vyas and her team plan to scale up the MolVision project in terms of both its data set and its capabilities. The team plans to integrate the chemistry VLM with current AI neural networks and large molecular simulators to create hybrid systems that combine symbolic, visual, and physical reasoning.
Vyas will also present an exhibit on AI for chemistry and molecules at an upcoming event. Those interested in viewing the exhibit can attend from 7:45 to 11:00 p.m. this Saturday on the 4th floor.