Novel QSPR Study on the Melting Points of a Broad Set of Drug-Like Compounds Using the Genetic Algorithm Feature Selection Approach Combined With Multiple Linear Regression and Support Vector Machine

Alireza Jalali, Mehdi Nekoei, Majid Mohammadhosseini

Abstract

A robust and reliable quantitative structure-property relationship (QSPR) study was carried out to predict the melting points (MPs) of a large and diverse set of 250 drug-like compounds. Molecular descriptors were calculated with the Dragon software package, and principal component analysis (PCA) was applied to them to detect homogeneities and to split the whole dataset into training and test sets. The PCA revealed no outliers in the resulting cluster. The genetic algorithm (GA) feature selection strategy was then used to select the most informative descriptors, yielding the best-fitted models. Multiple linear regression (MLR) and support vector machine (SVM) were used to develop linear and non-linear models, respectively, correlating the molecular descriptors with the melting points. The models were validated by cross-validation, a chance-correlation test, and statistical parameters computed on the external test set. The SVM model gave a determination coefficient (R²) of 0.853 and a root mean square error (RMSE) of 11.082, both better than those of the MLR model (R² = 0.712, RMSE = 15.042%), indicating the higher capability of the SVM-based model in predicting melting points. The GA approach thus selected powerful descriptors that carry useful information about the variables affecting MPs, which can be utilized in the further design of drug-like compounds with desired melting points.
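The two statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definitions R² = 1 − SS_res/SS_tot and RMSE = sqrt(mean squared error); the abstract does not state which variants the authors used (for example, R² is sometimes reported as the squared correlation coefficient). The function names and the melting-point values below are hypothetical and are not data from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy melting points, for illustration only.
observed_mp = [120.0, 150.0, 180.0, 210.0]
predicted_mp = [125.0, 145.0, 185.0, 205.0]

print("RMSE:", rmse(observed_mp, predicted_mp))
print("R2:", r_squared(observed_mp, predicted_mp))
```

Under these definitions, a model with a higher R² and a lower RMSE on the same external test set fits the observed melting points more closely, which is the basis of the SVM versus MLR comparison above.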
