De Novo Generated Combinatorial Library Design

S.V. Johansson, M. Chehreghani, O. Engkvist and A. Schliep

Digital Discovery 2023, issue 1, 2024, 122–135. First published 27 Nov 2023.

Artificial intelligence (AI) contributes new methods for designing compounds in drug discovery, ranging from de novo design models that suggest new molecular structures or optimize existing leads to predictive models that evaluate their toxicological properties. However, a limiting factor for the effectiveness of AI methods in drug discovery is the lack of access to high-quality data sets, which has led to a focus on approaches that optimize data generation. Combinatorial library design is a popular approach for bioactivity testing, as a large number of molecules can be synthesized from a limited number of building blocks. We propose a framework for designing combinatorial libraries from de novo generated building blocks using k-Determinantal Point Processes and Gibbs sampling. We explore optimization of biological activity, Quantitative Estimate of Drug-likeness (QED) and diversity, and the trade-offs between them, in both single-objective and multi-objective library design settings. Using retrosynthesis models to estimate building block availability, the proposed framework can explore the prospective benefit of expanding a stock of available building blocks by synthesizing or purchasing the preferred building blocks before designing a library. In simulation experiments with building block collections from all available commercial vendors, near-optimal libraries could be found without synthesis of additional building blocks; in other simulation experiments, we showed that even one synthesis step to increase the number of available building blocks could improve library designs when starting with an in-house building block collection of reasonable size.
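To make the sampling idea in the abstract concrete, the following is a minimal sketch of a Gibbs-style exchange sampler for a k-Determinantal Point Process, in which a subset S of exactly k items is drawn with probability proportional to det(L_S). It is not the authors' implementation. The kernel construction (a per-item quality score multiplied by an RBF similarity on a one-dimensional feature), the function names, and all numeric values are assumptions chosen purely for illustration; in a real setting the quality might stand in for predicted activity or QED, and the similarity for a fingerprint-based measure.

```python
import math
import random


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def build_kernel(features, quality, bandwidth=1.0):
    """Hypothetical kernel: L[i][j] = q_i * rbf(x_i, x_j) * q_j."""
    n = len(features)
    return [
        [
            quality[i] * quality[j]
            * math.exp(-((features[i] - features[j]) ** 2) / (2 * bandwidth ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def subset_det(kernel, subset):
    """det(L_S), clamped at zero to guard against round-off."""
    sub = [[kernel[i][j] for j in subset] for i in subset]
    return max(det(sub), 0.0)


def gibbs_kdpp(kernel, k, n_steps=2000, seed=0):
    """Exchange sampler: swap one selected item for one unselected item."""
    rng = random.Random(seed)
    n = len(kernel)
    current = rng.sample(range(n), k)
    current_det = subset_det(kernel, current)
    for _ in range(n_steps):
        pos = rng.randrange(k)  # which selected item to swap out
        candidate = rng.choice([j for j in range(n) if j not in current])
        proposal = current[:]
        proposal[pos] = candidate
        proposal_det = subset_det(kernel, proposal)
        total = current_det + proposal_det
        # Accept the swap with probability det(L_S') / (det(L_S) + det(L_S')).
        if total > 0 and rng.random() < proposal_det / total:
            current, current_det = proposal, proposal_det
    return sorted(current)


# Toy data (made up): eight building blocks on a 1-D feature axis.
features = [0.0, 0.1, 0.2, 1.5, 1.6, 3.0, 3.1, 4.5]
quality = [1.0, 0.9, 0.8, 1.2, 0.7, 1.1, 0.6, 1.0]
kernel = build_kernel(features, quality)
selected = gibbs_kdpp(kernel, k=3)
print(selected)
```

Because the determinant shrinks when two selected items are similar and grows with their quality scores, a chain of this kind tends to favour subsets that are both high-scoring and spread out, which is the quality versus diversity trade-off the abstract refers to.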

DOI: 10.1039/D3DD00095H.

The publication includes results from the following projects or software tools: IDADrugDesign.

Further publications by Alexander Schliep, Simon Johansson.