Summary: | Optical properties are crucial to the design of molecules for numerous applications, including display technologies and biological imaging. The accurate prediction of these properties has been the subject of decades of work in both physics-based approaches and statistical modeling. Recently, large datasets of both computed and experimental optical properties have become available, along with powerful deep learning approaches capable of learning meaningful representations from these large datasets. This thesis presents new approaches for predicting optical properties by fusing experimental and computational data in multi-fidelity models that achieve greater accuracy and generalizability than previous methods. Additionally, it conducts a thorough benchmark of various strategies for handling multi-fidelity data to inform the modeling choices of future practitioners working with optical properties and beyond. Although optical property data have recently become more widely available, the near-infrared (NIR) region of the spectrum remains comparatively data-sparse despite its promise in many applications. This thesis demonstrates the shortcomings of existing methods for predicting optical properties in this region of chemical space and recommends best practices for future research in this area. Finally, this thesis highlights the successful use of data-driven optical property prediction for the discovery of novel molecules for specific applications.
|