Prediction of Biopharmaceutical Drug Disposition Classification System (BDDCS) by Structural Parameters.

Modeling of physicochemical and pharmacokinetic properties is important for prediction and mechanistic characterization in drug discovery and development. The Biopharmaceutics Drug Disposition Classification System (BDDCS) is a four-class system based on solubility and metabolism. It is used to delineate the role of transporters in pharmacokinetics and their interaction with metabolizing enzymes, and it further anticipates drug disposition and potential drug-drug interactions in the liver and intestine. According to BDDCS, drugs are classified into four groups by the extent of metabolism and by solubility (each high or low). In this study, structural parameters of drugs were used to develop classification-based models for the prediction of BDDCS class. Reported BDDCS data of drugs were collected from the literature, and structural descriptors (Abraham solvation parameters and the octanol-water partition coefficient, log P) were calculated by ACD/Labs software. Data were divided into training and test sets. Classification-based models were then used to predict the BDDCS class of each drug from its structural parameters, and the validity of the established models was evaluated on an external test set. The results showed that log P and Abraham solvation parameters are able to predict the solubility and metabolism classes of the BDDCS with good accuracy. Based on the methods developed for predicting the solubility and metabolism classes, the BDDCS class could be predicted correctly with acceptable accuracy. Structural properties of drugs, i.e. log P and Abraham solvation parameters (polarizability, hydrogen bonding acidity and basicity), are capable of estimating the solubility and metabolism classes with acceptable accuracy.


INTRODUCTION
A major stage in drug discovery and development is the evaluation of the pharmacokinetic (PK) and physicochemical (PC) properties of candidate drugs. Over recent decades, computational methods have become an increasingly important part of drug design and discovery. They are used to predict the PK and PC properties of a candidate drug from structural parameters and their correlations, which is required for the efficient use of existing drugs and the effective development of new drugs [1,2].
The important parameters that control the rate and extent of oral drug absorption are drug solubility and gastrointestinal permeability. The importance of these two properties is emphasized in the biopharmaceutics classification system (BCS), which categorizes drugs into four groups based on their solubility and permeability. In this system, a drug substance is "highly soluble" when its highest dose strength is soluble in 250 mL or less of aqueous media over a pH range of 1-7.5 at 37°C. A drug is "highly permeable" when the extent of absorption in humans is equal to or greater than 90% of an administered dose [3,4].
In 2005, Wu and Benet introduced a new system based on solubility and metabolism to predict drug disposition and potential drug-drug interactions in the intestine and/or liver. They named this system the biopharmaceutics drug disposition classification system (BDDCS) [5]. Table 1 illustrates the BCS and BDDCS classification of drugs based on solubility and permeability/metabolism.
For class 1 drugs, only metabolic interactions need to be considered in the intestine and the liver. For class 2 drugs, efflux transporter, metabolic and efflux transporter-enzyme interactions in the intestine must be taken into consideration.

Table 1. BCS and BDDCS classification of drugs based on solubility and permeability/metabolism.
Class I High solubility

... administered drugs whose highest dose strength is expressed as a mass quantity, not as a concentration (e.g. solutions), were included for metabolism and solubility modeling following the removal of all salt forms. For all 595 compounds, the SMILES (simplified molecular-input line-entry system) code was obtained via www.pubchem.ncbi.nlm.nih.gov, and the numerical values for clogP (calculated octanol-water partition coefficient) and the Abraham solvation parameters were computed by ACD/Labs software (https://ilab.acdlabs.com). These parameters are the independent variables, or descriptors, and represent the following solute properties: E is the excess molar refraction, S is the dipolarity/polarizability of the solute, A and B are the solute hydrogen-bond acidity and basicity, and V is the McGowan volume of the solute [14].
Classification of data to training and test sets
The dataset was sorted by ascending clogP, and one of every six consecutive compounds was allocated to the test set. The 595 compounds were thus divided into 496 compounds for the training set and the remaining 99 compounds for the test set. The training set was used to build separate models for the metabolism and solubility classes, and the test set was used to evaluate the prediction capability of the developed models.
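The allocation rule described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `split_every_sixth` is hypothetical, and the text does not say which of the six consecutive compounds goes to the test set, so the sketch assumes it is the sixth.

```python
def split_every_sixth(clogp_by_name):
    """Sort compounds by ascending clogP and send one in six to the test set.

    Assumption: the sixth compound of each consecutive group of six is the
    test compound; the source does not state which position was used.
    """
    ranked = sorted(clogp_by_name.items(), key=lambda item: item[1])
    train, test = [], []
    for index, (name, _clogp) in enumerate(ranked):
        if index % 6 == 5:
            test.append(name)
        else:
            train.append(name)
    return train, test


# Synthetic stand-in for the 595 compounds (names and clogP values are made up).
demo = {f"compound_{i}": -3.0 + 0.02 * i for i in range(595)}
train_names, test_names = split_every_sixth(demo)
print(len(train_names), len(test_names))
```

With 595 compounds this rule yields the 496/99 split reported in the text, and because the data are ranked by clogP first, the test set spans the same lipophilicity range as the training set.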
Binning of solubility compounds
The 496 compounds were divided into the following groups:
1) Compounds with high solubility (class 1 and 3) according to BDDCS as class H (high solubility).
2) Compounds with low solubility (class 2 and 4) according to BDDCS as class L (low solubility).

Binning of metabolism compounds
The 496 compounds were divided into the following groups:
1) Compounds with high metabolism (class 1 and 2) according to BDDCS as class H (high metabolism).
2) Compounds with low metabolism (class 3 and 4) according to BDDCS as class L (low metabolism).
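The two binnings amount to a lookup from BDDCS class to a pair of binary labels. A minimal sketch of that mapping, with hypothetical names (`SOLUBILITY_BIN`, `METABOLISM_BIN`, `bin_bddcs`) chosen only for illustration:

```python
# BDDCS class -> binary label, as listed in the two binning schemes.
SOLUBILITY_BIN = {1: "H", 2: "L", 3: "H", 4: "L"}
METABOLISM_BIN = {1: "H", 2: "H", 3: "L", 4: "L"}


def bin_bddcs(bddcs_class):
    """Return (solubility label, metabolism label) for a BDDCS class 1-4."""
    return SOLUBILITY_BIN[bddcs_class], METABOLISM_BIN[bddcs_class]


for cls in (1, 2, 3, 4):
    print(cls, bin_bddcs(cls))
```

Each compound therefore contributes one label to the solubility model and one to the metabolism model, which is what allows the two models to be trained separately.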

Modeling
Various thresholds and models developed on the training set (based on the structural parameters, clogP and Abraham solvation parameters) were analyzed, and the optimal parameters and their values were obtained by calculating the prediction accuracy of the solubility and metabolism classes (number of correct predictions/total data) for the test set. The threshold defining the boundary between high and low metabolism was set at clogP=2. A compound with clogP>2 was therefore defined as high metabolism, and a compound with clogP<2 was classified by a binary logistic regression model. The borderline between high and low solubility was set at a maximum dose of 10 mg. Compounds with a maximum dose under 10 mg were defined as highly soluble, and those with a maximum dose higher than 10 mg were classified into the two sub-groups by binary logistic regression using SPSS version 21 software (www.spss.com.hk). For the binary regression method, each class (metabolism and solubility) was set as the dependent variable, and binary classification was carried out using selected molecular descriptors as independent variables to develop the models for metabolism and solubility. Features were selected for the logistic regression model based on the probability value (p-value) associated with each descriptor, and a descriptor was retained when it was statistically significant at the 99% level (p<0.01). This means that the probability that the descriptor appears in the model by chance is less than 1% [15]. P-values and coefficients in regression analysis work together to show which relationships in the model are statistically significant. The software compares the t-statistic with values in the Student's t distribution to determine the p-value. The models for predicting BDDCS class were developed by binary logistic regression using the structural parameters (clogP and Abraham solvation parameters) and maximum dose. Separate models for metabolism and solubility were built using the training set of 496 compounds.
The prediction capability of the developed models was checked by a test set composed of 99 compounds.
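The two-stage scheme (a threshold rule first, logistic regression for the remaining compounds) can be sketched as below. This is an illustration under stated assumptions, not the published model: the coefficients are hypothetical placeholders because Eq. 1 and Eq. 2 are not reproduced in the text, the probability is assumed to refer to the "high" class, a 0.5 cut-off is assumed, and compounds lying exactly on a threshold are assumed to go to the regression stage.

```python
import math

# HYPOTHETICAL placeholder coefficients, for illustration only.
# The fitted values of Eq. 1 (solubility) and Eq. 2 (metabolism) are not given here.
SOLUBILITY_COEF = {"intercept": 0.5, "B": 0.4, "S": -0.3, "clogP": -0.5}
METABOLISM_COEF = {"intercept": 0.0, "clogP": 1.0, "A": -1.0}


def logistic_probability(coef, descriptors):
    """P = 1 / (1 + exp(-z)), with z the linear combination of descriptors."""
    z = coef["intercept"] + sum(coef[name] * value for name, value in descriptors.items())
    return 1.0 / (1.0 + math.exp(-z))


def predict_metabolism(clogp, a, coef=METABOLISM_COEF, cutoff=0.5):
    """Stage 1: clogP > 2 means high metabolism. Stage 2: logistic model on clogP and A."""
    if clogp > 2:
        return "H"
    p_high = logistic_probability(coef, {"clogP": clogp, "A": a})
    return "H" if p_high >= cutoff else "L"


def predict_solubility(max_dose_mg, b, s, clogp, coef=SOLUBILITY_COEF, cutoff=0.5):
    """Stage 1: maximum dose < 10 mg means high solubility. Stage 2: logistic model on B, S, clogP."""
    if max_dose_mg < 10:
        return "H"
    p_high = logistic_probability(coef, {"B": b, "S": s, "clogP": clogp})
    return "H" if p_high >= cutoff else "L"


print(predict_metabolism(clogp=3.1, a=0.2))                      # decided by the threshold
print(predict_solubility(max_dose_mg=500, b=0.0, s=0.0, clogp=5.0))  # decided by the regression
```

The threshold stage handles the compounds for which a single property is already a strong indicator of class, so the regression only has to separate the remaining, less clear-cut compounds.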

Solubility prediction
In the training set, 20% of the drugs had a maximum dose lower than 10 mg (N=100). Most of these drugs (85%) belonged to class 1 and 3 (highly soluble), so this criterion classifies them in the correct group with good accuracy. By definition, a drug substance is considered "highly soluble" when its highest dose strength is soluble in 250 mL or less of aqueous media over a pH range of 1-7.5 at 37°C [16]. Dose is therefore a critical parameter according to the results obtained in this work, and most drugs with a maximum dose of 10 mg or less are highly soluble. The remaining data points (N=396) were used to develop a model from Abraham solvation parameters and clogP by binary logistic regression, where the obtained model is: Eq. 1 where P is the probability of the binary response (class 0 or 1) based on the solubility. The probability values (p-values) associated with each descriptor were less than 0.01. The model (Eq. 1) classified 79% of the highly soluble and 68% of the poorly soluble drugs of the training set in the correct group. Overall, using maximum dose, B (hydrogen bond basicity), S (polarizability) and clogP, the solubility class of 74% of compounds was classified in the correct group.
To evaluate the prediction capability of the model, the 99 compounds in the test set were used to predict the solubility class. Of the compounds with maximum doses lower than 10 mg (N=17), 82% were highly soluble, and Eq. 1 predicted 73% of the drugs with maximum doses higher than 10 mg (N=82) in the correct group. The total prediction accuracy for the test set was therefore 74%.
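The total accuracy is the size-weighted average of the accuracies of the two stages. A minimal sketch of that arithmetic using the test-set figures quoted above (the helper name `pooled_accuracy` is hypothetical, and because the quoted percentages are rounded the result is only approximate):

```python
def pooled_accuracy(groups):
    """Combine per-group accuracies into one overall accuracy.

    groups: iterable of (number of compounds, fraction classified correctly).
    """
    groups = list(groups)
    total = sum(n for n, _ in groups)
    correct = sum(n * fraction for n, fraction in groups)
    return correct / total


# Test set: 17 compounds handled by the dose threshold (82% correct),
# 82 compounds handled by Eq. 1 (73% correct).
solubility_test = pooled_accuracy([(17, 0.82), (82, 0.73)])
print(round(solubility_test, 3))
```

The pooled value is about 0.745, in line with the reported 74% once rounding of the inputs is taken into account.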

Metabolism prediction
In the training set, 54% of the drugs had a clogP higher than 2 (N=270). Most of these drugs (91%) belonged to class 1 and 2 (high metabolism), so this criterion classifies them in the correct group with good accuracy. The remaining data points (N=226) were used to develop a model from Abraham solvation parameters and clogP by binary logistic regression, where the obtained model is: Eq. 2 Eq. 2 predicted 80% of the highly metabolized and 66% of the poorly metabolized drugs in the correct class. Overall, using A (hydrogen bond acidity) and clogP, the metabolism class of 83% of the studied drugs could be classified in the correct group.
To evaluate the prediction capability of the model, the 99 compounds in the test set were used to predict the metabolism class. Of the compounds with clogP higher than 2 (N=54), 89% were high metabolism, and Eq. 2 predicted 84% of the drugs with clogP lower than 2 (N=45) in the correct group. The total prediction accuracy for the test set was 86%.

BDDCS prediction
The results for the prediction of the BDDCS class of the studied drugs are shown in Table 2. Based on the methods developed for predicting the solubility and metabolism classes, 64% of the training set and 63% of the test set could be predicted in the correct BDDCS class.
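A BDDCS class is correct only when both the solubility and the metabolism labels are correct. The sketch below shows how the two binary predictions can be combined into a four-class label and scored, assuming the class definitions used for binning in the Methods (high solubility = class 1 and 3, high metabolism = class 1 and 2). The names `BDDCS_FROM_LABELS`, `bddcs_class` and `four_class_accuracy` are hypothetical, and the example data are made up.

```python
# (solubility label, metabolism label) -> BDDCS class
BDDCS_FROM_LABELS = {
    ("H", "H"): 1,
    ("L", "H"): 2,
    ("H", "L"): 3,
    ("L", "L"): 4,
}


def bddcs_class(solubility_label, metabolism_label):
    """Combine the two binary predictions into a BDDCS class (1-4)."""
    return BDDCS_FROM_LABELS[(solubility_label, metabolism_label)]


def four_class_accuracy(predicted, observed):
    """Fraction of compounds whose predicted BDDCS class equals the reported one."""
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(observed)


# Made-up example: four compounds, one of them with a wrong solubility label.
predicted = [bddcs_class("H", "H"), bddcs_class("L", "H"),
             bddcs_class("H", "L"), bddcs_class("H", "L")]
observed = [1, 2, 3, 4]
print(four_class_accuracy(predicted, observed))
```

Because an error in either binary model produces a wrong BDDCS class, the four-class accuracy is expected to be lower than either binary accuracy, which is consistent with the 63-64% reported here against 74% and 83-86% for the individual models.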

DISCUSSION
In this study, we described a computational method to predict the BDDCS class of compounds based on their molecular descriptors. The dataset was obtained from the published data set of Benet et al. [8]. After removal of the salts and the solution forms of drugs, the dataset was reduced to 595 oral drugs, divided into training (496 drugs) and test (99 drugs) sets. As outlined earlier, BDDCS is a modification of the BCS [17] that utilizes drug metabolism rather than intestinal permeability [18]. In this work, we attempted to build models to predict the metabolism class and the solubility class.
Both metabolism and solubility are important properties in drug discovery. However, these properties are complex and can be difficult to model. BDDCS class prediction can overcome variable metabolism and solubility data by predicting the compound classes rather than specific values as a primary initial screening of compounds. However, suitable thresholds for discriminating between high and low metabolism/solubility should be carefully considered.
Many factors influence solubility and metabolism, and a threshold for discriminating between high and low metabolism/solubility can be used to improve the prediction of BDDCS class. The borderline between high and low solubility was set at a maximum dose of 10 mg. Compounds with maximum doses lower than 10 mg were defined as highly soluble (correct for 85% and 82% of the training and test set, respectively), while only 55% of drugs with a maximum dose >10 mg are highly soluble. Eq. 1 is therefore necessary to classify the high-dose compounds in the correct group. Using B, S and clogP (Eq. 1), the solubility class could be predicted with acceptable accuracy (74% and 73% for training and test set, respectively) for compounds with maximum doses higher than 10 mg. The results for the prediction of the solubility class of drugs are shown in Figure 1. These data show that maximum dose is necessary for predicting the solubility class.
The optimal threshold defining the boundary between high and low metabolism, based on the training set, was set at clogP=2. For compounds with clogP higher than 2, metabolism classes were predicted with good accuracy (91% and 89% for training and test set, respectively). In other words, compounds with clogP>2 are lipophilic compounds with high metabolism. Moreover, the logistic regression model based on clogP and A (Eq. 2) for the less lipophilic drugs (clogP<2) was able to predict the metabolism class of the studied compounds with acceptable accuracy (73% and 84% for training and test set, respectively). These findings indicate that clogP, which has been used to predict various physicochemical, pharmacokinetic and biological properties of compounds [19-21] and can be calculated from the structure of compounds with good accuracy [22], is a crucial parameter for predicting the metabolism class of drugs in BDDCS. The intrinsic lipophilicity (log P) is a common parameter for predicting solubility and metabolism. It is a physical property introduced to describe a compound's affinity towards lipid-like environments, and it affects drug absorption, bioavailability, hydrophobic drug-receptor interactions, and the metabolism of molecules. It describes the equilibrium distribution of the unionized form of a molecule between water and octanol and is independent of pH. Several researchers have reported an inverse relationship between clogP and aqueous solubility [23-25]. ClogP has been utilized instead of experimental log P in modeling studies, as there is a high correlation between them [26, 27]. A (hydrogen bond acidity) is another significant parameter for predicting the metabolism of drugs of low lipophilicity (clogP<2). The results for the prediction of the metabolism class of the studied drugs are shown in Figure 2.
Collinear descriptors (R² > 0.8) should be avoided in developing models [28], as they may lead to overfitting of the data. The intercorrelation between the selected parameters in Eq. 1 and Eq. 2 was less than 0.5, a value that supports the validity of the developed models from this viewpoint. The similar results obtained for the solubility and metabolism classes of the external test set confirm the prediction capability of the developed models.
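A collinearity screen of this kind can be sketched as a pairwise check of the squared Pearson correlation between descriptor columns. The sketch below is illustrative only: the function names are hypothetical, the descriptor values are made up, and it assumes that the intercorrelation quoted above is a squared correlation coefficient, which the text does not state explicitly.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def collinear_pairs(descriptors, limit=0.8):
    """Return the descriptor pairs whose R^2 exceeds the collinearity limit."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(descriptors, 2)
        if r_squared(descriptors[name_a], descriptors[name_b]) > limit
    ]


# Made-up descriptor columns for four compounds.
demo_descriptors = {
    "clogP": [1.0, 2.0, 3.0, 4.0],
    "V": [2.0, 4.0, 6.0, 8.1],    # nearly proportional to clogP in this toy data
    "A": [1.0, -1.0, 1.0, -1.0],  # weakly related to the other two
}
print(collinear_pairs(demo_descriptors))
```

In this toy data only the clogP/V pair is flagged, so one of the two would be dropped before fitting the regression.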
Both models used the Abraham solvation parameters for the prediction of solubility and metabolism, in agreement with previous studies that applied these parameters to the analysis and prediction of physicochemical properties and pharmacokinetic parameters such as absorption, distribution and toxicological features of drugs [14,29,30].
A computational procedure for predicting the BDDCS class was attempted by Broccatelli et al. [31] using molecular descriptors calculated with the VolSurf+ software. Similarly, the method proposed here predicted the BDDCS class with relatively good accuracy but with a general lack of predictability for class 4 drugs. However, a relatively simple statistical method (logistic regression) and simple descriptors, i.e. clogP and the Abraham solvation parameters of the solute, are more acceptable in modeling studies [15], and could be useful in predicting the solubility and metabolism classes and in estimating drug-drug interactions and transporter effects in drug disposition.
Solubility and metabolism are complex properties whose values are affected by various factors. It is possible that the applied parameters were not sufficient to estimate the correct class. However, variation and inaccuracy in the data are among the possible reasons why medicinal chemists have not succeeded in developing models with a high capability of predicting the physicochemical, pharmacokinetic and activity properties of drugs. For instance, the best model for aqueous solubility prediction has a mean percentage deviation (MPD) of more than 100% [32], and the MPD value for solubility in solvent mixtures is higher than 25% for pharmaceuticals [33].
In this study, we demonstrated that the models developed for predicting solubility and metabolism could estimate the BDDCS class with 64% accuracy, a value which does not seem very satisfactory. However, following the publication of BDDCS classes for over 900 data points in 2011 [8], Benet and coworkers [34] amended the classification of 13 drugs in 2016. In the present data set, five compounds in the training set (colchicine, diclofenac, flecainide, pindolol and saxagliptin) and four compounds in the test set (aliskiren, clonidine, metoclopramide and pitavastatin) were corrected in terms of BDDCS class. These data (old and new class, and the class predicted in this study) are listed in Table 3. According to the old data set, the proposed method predicts only one compound in the correct group, while based on the updated data, the BDDCS class of five data points (diclofenac, flecainide, metoclopramide, pindolol and clonidine) was classified in the correct group. Given the possible errors in the experimental data and the 25% accuracy expected by chance for a four-class problem, the obtained results confirm the good accuracy of the developed models.

CONCLUSION
To predict the BDDCS class of compounds, we proposed the use of threshold values for two parameters, clogP for metabolism and maximum dose for solubility, together with logistic regression models based on clogP and Abraham solvation parameters.
The descriptors utilized in this work, namely three Abraham solvation parameters (A, B and S, representing hydrogen bond acidity, hydrogen bond basicity and polarizability, respectively), clogP and the maximum dose of the compounds, are adequate for the prediction of solubility and metabolism class and can be used to predict the BDDCS class of drugs with acceptable accuracy.

ACKNOWLEDGMENT
This article is a part of the results of Y.G's Pharm.D thesis No. 3986 registered at Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran.