The Simplified Molecular Input Line Entry System (SMILES) is a line notation for describing molecular structures and is commonly used as unstructured text input to Large Language Models (LLMs) for molecular property prediction. To harness the unique strengths of different LLMs, we propose a novel statistical modeling method that fuses the outputs of multiple LLMs into a cohesive framework. We first train three distinct LLMs on a unified SMILES dataset and obtain their preliminary predictions. These predictions then serve as inputs to a second-level model, which fuses the results of the first-level models and produces the final predictions. Alternatively, we assign subsets of a single SMILES dataset to multiple LLMs and obtain the final prediction as a weighted combination of the individual models' results. Our results show that the LLM fusion architecture outperforms standard deep learning baselines, demonstrating its potential for advancing molecular property prediction.
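The abstract does not state which second-level model is used or how the combination weights are chosen, so the following is only a minimal sketch of the two fusion strategies under stated assumptions. It assumes a linear second-level model fitted by least squares (a standard stacking setup) and, for the subset-based variant, weights proportional to each model's inverse validation error. The LLMs themselves are replaced by plain lists of predicted property values, and all function names (`fit_stacker`, `inverse_error_weights`, etc.) are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_stacker(base_preds: List[Sequence[float]], y: Sequence[float],
                ridge: float = 1e-8) -> List[float]:
    """Fit an ASSUMED linear second-level model y ~ w0 + sum_k w_k * p_k.

    base_preds[k][i] is first-level model k's prediction for molecule i.
    Returns [w0, w1, ..., wK] from ridge-regularised least squares.
    """
    rows = [[1.0] + [p[i] for p in base_preds] for i in range(len(y))]
    d = len(rows[0])
    # Normal equations (X^T X + ridge * I) w = X^T y
    xtx = [[sum(r[a] * r[b] for r in rows) + (ridge if a == b else 0.0)
            for b in range(d)] for a in range(d)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(d)]
    return _solve(xtx, xty)


def predict_stacker(w: Sequence[float], preds: Sequence[float]) -> float:
    """Fuse one molecule's first-level predictions with the fitted weights."""
    return w[0] + sum(wk * pk for wk, pk in zip(w[1:], preds))


def inverse_error_weights(val_mse: Sequence[float]) -> List[float]:
    """ASSUMED weighting: proportional to 1 / validation MSE, summing to 1."""
    inv = [1.0 / e for e in val_mse]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_combination(weights: Sequence[float], preds: Sequence[float]) -> float:
    """Final prediction as a weighted sum of the per-model predictions."""
    return sum(w * p for w, p in zip(weights, preds))


if __name__ == "__main__":
    # Toy predictions from three hypothetical first-level LLMs on five molecules.
    p1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    p2 = [2.0, 1.0, 4.0, 3.0, 6.0]
    p3 = [0.5, 1.5, 1.0, 2.5, 2.0]
    y = [1.3, 1.7, 3.0, 3.5, 4.8]  # synthetic targets, not real data
    w = fit_stacker([p1, p2, p3], y)
    print("stacking weights:", [round(v, 4) for v in w])
    weights = inverse_error_weights([0.1, 0.2, 0.4])
    print("subset-model weights:", [round(v, 4) for v in weights])
```

The synthetic targets were built as `0.1 + 0.5*p1 + 0.3*p2 + 0.2*p3`, so the stacker should recover approximately those coefficients. In practice the second-level model would be fitted on held-out (out-of-fold) first-level predictions to avoid leaking training fit into the fusion weights; whether the authors do this is not stated in the abstract.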
Language of Oral Presentation: English
Language of Visual Aids: English