To Be Drug or Prodrug: Structure-Property Exploratory Approach Regarding Oral Bioavailability

Purpose: Prodrug design is a strategy that can be used to adjust the physicochemical properties of drugs in order to overcome pharmacokinetic problems, such as poor oral bioavailability. However, Lipinski's and Veber's rules predict whether compounds will have absorption problems even before the design of prodrugs. In this context, our goal was to evaluate the molecular properties that most influenced the absorption process of prodrugs compared to their precursors through an exploratory data analysis approach. Methods: A variety of prodrugs and their respective precursors were randomly selected and classified by their percentage of human intestinal absorption. Subsequently, different molecular properties were calculated, and hierarchical cluster analysis (HCA) and principal components analysis (PCA) were carried out. Results: According to the findings, antiviral, anti-hypertensive, and antibiotic prodrugs exhibited higher absorption levels than their respective precursors. Also, some relevant descriptors (molecular weight, MW; rotatable bonds, rot_bonds; hydrogen bond acceptors, HBA_count; and polar surface area, PSA), which are included in Lipinski's and Veber's rules, influenced the separation between prodrugs and drugs. Furthermore, other molecular properties, such as polarizability (α) and molar refractivity (MR), were pointed out. Conclusion: Lipinski's and Veber's rules proved to be important for designing an orally administered drug, but other descriptors should be considered by medicinal chemists in prodrug design.


INTRODUCTION
The three phases of drug action, described by Ariëns (1), relate dose to pharmacological effect and are well known. In sequence, the pharmaceutical phase corresponds to the release of the drug (active substance) from the dosage form, and comprises all physical processes involved in the disintegration of the form in which a compound was administered as well as the dissolution of the active substance in organic fluids (pharmaceutical availability). The pharmacokinetic phase covers the processes involved in absorption, distribution, protein binding, metabolism, intracellular penetration, transport through membranes and through the blood-brain barrier, and renal excretion. The pharmacodynamic phase comprises the drug-target molecular interaction at the tissue site, which initiates a sequence of intrinsic events that finally results in the biological response.
Physicochemical properties depend directly on the chemical structure of compounds and play an important role in each phase. In this regard, molecules that do not have suitable physicochemical properties cannot pass through all three phases, which limits their use in clinical practice (1)(2)(3)(4). The main goal of the pharmaceutical industry today is to discover new, safe, effective, and orally administered chemical entities. Despite some limitations attributed to the oral route, it remains the preferred route for drug administration (5) due to high patient compliance, simple production conditions, and low cost (6,7). However, the major problem in the development of new orally administered drugs is low oral bioavailability.
Hit-to-lead optimization processes, for instance, are among the molecular modification strategies in which structural changes are planned in order to generate either more active analogues or temporarily inactive derivatives (prodrugs), which could improve pharmacokinetic and pharmacodynamic profiles (8)(9)(10). In the prodrug design strategy, or latentiation method, the main purpose is to modify the physicochemical properties of drugs to reduce undesirable pharmacokinetic features while maintaining the drug's intrinsic activity. Thus, a drug's physicochemical properties can be adjusted through a proper choice of carrier groups in order to increase oral absorption, for example (11,12).
Prior to the design of a prodrug, virtual and rational approaches have been applied in order to predict molecular behavior and thereby reduce the risk of failure in developing a new molecule. One of these approaches is the Rule of Five (Ro5) developed by Lipinski et al. (13), which generally allows the prediction of a drug's oral bioavailability. This rule identifies descriptors based on intrinsic and physicochemical properties, such as molecular weight (MW), lipophilicity (ClogP), hydrogen bond donors (HBD), and hydrogen bond acceptors (HBA) (13,14). Taken together, these descriptors help to infer whether or not the studied compounds might have absorption problems. Thus, Ro5 can be used as a rapid screening method to identify poorly absorbed chemical entities. Drugs whose physicochemical descriptor values fall within the established ranges would have sufficient absorption after oral administration and, consequently, suitable bioavailability. In general, high absorption rates for a chemical entity can be achieved when the number of HBD is no more than 5, the number of HBA is no more than 10, MW is no more than 500 Da, and ClogP is no more than 5 (15).
Subsequently, Ro5 was expanded by Veber et al. (16), who identified two other important descriptors for ideal oral bioavailability, namely the number of rotatable bonds (rot_bonds) and the polar surface area (PSA). The number of rotatable bonds must be less than or equal to 10. This descriptor is related to molecular flexibility (degrees of freedom), which is reflected in the conformational arrangement and is considered an important factor in passive transport through membranes. Furthermore, the number of degrees of freedom is closely related to the change in entropy upon binding, which is an important factor in determining the binding affinity of compounds toward their targets. In addition, Veber and coworkers (16) indicated that compounds with PSA values less than or equal to 140 Å² would have better oral bioavailability. In this regard, chemical entities that violate Lipinski's rules as well as Veber's expanded version could be classified as excellent candidates for prodrug design. Accordingly, prodrugs designed to increase oral absorption should have calculated values for descriptors such as ClogP, MW, HBD, HBA, PSA, and rot_bonds within the ranges previously established by Lipinski and Veber (13,14,16).
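The screening logic described above can be sketched as a short function. This is a minimal illustration, not the procedure used in this study: the `Descriptors` class and the `expanded_ro5_violations` function are hypothetical names, every limit is treated as an inclusive upper bound (an assumption about boundary handling), and the descriptor values in the example are invented for illustration rather than taken from the data set.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float        # molecular weight, Da
    clogp: float     # calculated n-octanol/water partition coefficient
    hbd: int         # hydrogen bond donors
    hba: int         # hydrogen bond acceptors
    rot_bonds: int   # rotatable bonds
    psa: float       # polar surface area, square angstroms


def expanded_ro5_violations(d: Descriptors) -> list:
    """Return the names of the descriptors that fall outside the expanded Ro5 ranges."""
    checks = [
        ("MW", d.mw <= 500.0),            # Lipinski
        ("ClogP", d.clogp <= 5.0),        # Lipinski
        ("HBD", d.hbd <= 5),              # Lipinski
        ("HBA", d.hba <= 10),             # Lipinski
        ("rot_bonds", d.rot_bonds <= 10), # Veber
        ("PSA", d.psa <= 140.0),          # Veber
    ]
    return [name for name, ok in checks if not ok]


# Hypothetical descriptor values, NOT taken from the study's data set.
parent = Descriptors(mw=350.0, clogp=-1.2, hbd=6, hba=11, rot_bonds=5, psa=160.0)
prodrug = Descriptors(mw=480.0, clogp=2.1, hbd=3, hba=10, rot_bonds=10, psa=120.0)

print(expanded_ro5_violations(parent))   # ['HBD', 'HBA', 'PSA']
print(expanded_ro5_violations(prodrug))  # []
```

Returning the list of violated descriptors, rather than a single pass/fail flag, makes it easier to see which property a carrier group would need to correct.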
Of note, other types of procedures, based on molecular mechanics and quantum mechanics approaches, have been applied in order to investigate and predict drug/prodrug absorption profiles (17). Herein, however, the aim was to evaluate the behavior of classical prodrugs with regard to the expanded Ro5 ranges, and to identify the differences between latent chemical entities and their respective precursors. In addition, based on an exploratory analysis, the molecular properties that most influenced the differentiation between prodrugs and drugs were detected and correlated with the pharmacokinetic profile.

Drugs and Prodrugs Selection
The set of prodrugs (PRO) and parent drugs was selected from Parise-Filho et al. (12). Among the commercially available prodrugs, those classified as classical prodrugs, which were designed in order to improve bioavailability, were randomly selected. A variety of therapeutic classes was considered in this investigation, such as antiviral, antibiotic, and anti-hypertensive drugs.

Calculation of Molecular Properties and Exploratory Data Analysis
The structures of prodrugs and their respective precursors were retrieved from DrugBank version 4.1 (www.drugbank.ca), and molecular properties of different nature were calculated using Marvin 5.10.3 (ChemAxon Ltd., 1998-2012), including those considered in Lipinski's and Veber's rules. Then, a table (X block) was generated in which the rows corresponded to the samples or drugs investigated (N = 26) and the columns to the calculated molecular properties (27 independent variables or descriptors). This table (see Supplementary Material) was used as input data for the exploratory data analysis.
Exploratory data analysis comprised two unsupervised methods: hierarchical cluster analysis (HCA) and principal components analysis (PCA) (32,33). HCA and PCA were carried out using Pirouette 3.11 (Infometrix Inc., 1990). An autoscaling procedure was applied as a preprocessing method in both analyses. The complete linkage method and Euclidean distance were considered in HCA. The distances between samples or variables were calculated according to Equation 1.
$$d_{kl} = \left( \sum_{j=1}^{m} \left| x_{kj} - x_{lj} \right|^{M} \right)^{1/M} \qquad \text{(eq. 1)}$$

The multivariate distance $d_{kl}$ between two sample vectors, k and l, is determined by computing differences in each of the m variables (24), where $x_{kj}$ is the value of variable j for sample k. M is the order of the distance, and here represents the Euclidean distance (M = 2). The distance values were transformed into a similarity matrix whose elements are the similarity indices ($\text{similarity}_{kl} = 1 - d_{kl}/d_{\max}$, where $d_{\max}$ is the largest distance in the data set). The similarity scale ranges from zero (dissimilar samples or variables) to one (identical samples or variables), and the larger the similarity index, the smaller the distance between any pair of samples or variables (32,33). The findings are expressed as a two-dimensional tree called a dendrogram.
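The steps described above (autoscaling, the order-M distance, the similarity index, and complete-linkage agglomeration) can be sketched in plain Python. This is an illustrative sketch only and does not reproduce the Pirouette implementation; the function names are hypothetical, the use of the sample standard deviation (n - 1 denominator) in autoscaling is an assumption, and the toy matrix is invented for the example.

```python
import math


def autoscale(X):
    """Mean-center each column and scale it to unit standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation (n - 1 denominator); assumed convention.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def distance(a, b, M=2):
    """Order-M distance between two sample vectors (M = 2 is Euclidean)."""
    return sum(abs(x - y) ** M for x, y in zip(a, b)) ** (1.0 / M)


def similarity_matrix(X, M=2):
    """similarity_kl = 1 - d_kl / d_max, ranging from 0 to 1."""
    n = len(X)
    d = [[distance(X[k], X[l], M) for l in range(n)] for k in range(n)]
    d_max = max(max(row) for row in d)
    return [[1.0 - d[k][l] / d_max for l in range(n)] for k in range(n)]


def complete_linkage(X, M=2):
    """Agglomerate samples; return merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(X))]
    merges = []

    def cluster_distance(a, b):
        # Complete linkage: the farthest pair of members defines the distance.
        return max(distance(X[i], X[j], M) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(cluster_distance(a, b), a, b)
                 for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        d, a, b = min(pairs)  # closest pair of clusters is merged first
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges


# Toy data (three samples, two variables), invented for illustration.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
for row in similarity_matrix(X):
    print([round(s, 3) for s in row])
for a, b, d in complete_linkage(X):
    print(a, b, round(d, 3))
```

In practice the descriptor table would be passed through `autoscale` before the distances are computed, so that descriptors on large numerical scales (such as MW) do not dominate those on small scales (such as HBA_count).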
PCA is a multivariate method whose principal components contain most of the variability of the data set in a much lower dimensional space. PC1, or factor 1, is defined along the direction of maximum variance of the whole data set, whereas PC2, or factor 2, lies along the direction that describes the maximum variance in the subspace orthogonal to PC1. The subsequent components are taken orthogonally to those previously chosen, and describe the maximum of the remaining variance. Once redundant information is removed, only the first few PCs are necessary to describe most of the information contained in the original data set. The data matrix $\mathbf{X}$ (I × J), corresponding to I molecules and J descriptors, is decomposed into two matrices, $\mathbf{T}$ and $\mathbf{L}$, in such a way that $\mathbf{X} = \mathbf{T}\mathbf{L}^{\mathrm{T}}$. The $\mathbf{T}$ matrix, known as the score matrix, represents the position (classification) of the compounds in the new coordinate system, where the PCs are the new axes. Scores are integral to exploratory analysis because they show intersample relationships. $\mathbf{L}$ is the loadings matrix, whose columns describe how the new axes (PCs) are built from the old axes. It also gives insight into the importance of the variables, that is to say, which variables contribute more or less to each PC or factor (32,33).
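To make the roles of scores and loadings concrete, the sketch below extracts only the first principal component of an autoscaled matrix. It uses power iteration on the correlation matrix, which is a technique chosen here for brevity and is not stated to be the algorithm used by Pirouette; the function names and the toy data are hypothetical.

```python
import math


def autoscale(X):
    """Mean-center each column and scale it to unit (sample) standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def first_pc(X, iters=500):
    """Return (loadings, scores, explained fraction) for PC1 of autoscaled X."""
    Z = autoscale(X)
    n, J = len(Z), len(Z[0])
    # Correlation matrix of the descriptors: C = Z^T Z / (n - 1).
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
          for b in range(J)] for a in range(J)]
    # Non-uniform start vector; power iteration fails only if the start
    # happens to be orthogonal to the leading eigenvector.
    v = [float(a + 1) for a in range(J)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(J)) for a in range(J)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along the loading vector.
    eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(J)) for a in range(J))
    scores = [sum(z * l for z, l in zip(row, v)) for row in Z]  # one column of T
    explained = eigval / J  # total variance of autoscaled data equals J
    return v, scores, explained


# Toy data: two perfectly correlated descriptors, invented for illustration.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
loadings, scores, explained = first_pc(X)
print([round(l, 4) for l in loadings])
print([round(t, 4) for t in scores])
print(round(explained, 4))
```

Because the two toy descriptors carry the same information, a single component explains all of the variance and both descriptors receive equal loadings, which is the redundancy-removal behavior the paragraph above describes.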
The Pirouette 3.11 software (34) displays many entities that can help to explore the relationships between samples, find sample outliers, choose the optimal number of factors, and make decisions about variables to be excluded. The outlier diagnosis was performed through the Mahalanobis distance (Equation 2) (34, 35), which is computed for each sample from its k-factor (PC) score vector.
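$$MD_i = (\mathbf{t}_i - \bar{\mathbf{t}})\, \mathbf{S}^{-1} (\mathbf{t}_i - \bar{\mathbf{t}})^{\mathrm{T}} \qquad \text{(eq. 2)}$$

In Equation 2, $\mathbf{t}_i$ is the k-factor score vector of sample i, $\mathbf{S}$ is the score covariance matrix, and $\bar{\mathbf{t}}$ is the mean score vector. Assuming that the Mahalanobis distance is normally distributed, a critical value ($MD_{crit}$) can be determined from the chi-squared distribution with k degrees of freedom. If a sample's Mahalanobis distance exceeds $MD_{crit}$, that sample might be an outlier.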
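The outlier test can be sketched for the special case of k = 2 factors, where both the inverse of the 2 × 2 score covariance matrix and the chi-squared quantile have closed forms. The choice of k = 2, the function names, and the toy scores are assumptions made for this illustration; the quantity computed is the quadratic form of the score deviations with the inverse covariance matrix, which is what is compared with the chi-squared critical value.

```python
import math


def mahalanobis_2d(scores):
    """Quadratic-form Mahalanobis distance of each sample from the mean score (k = 2)."""
    n = len(scores)
    m1 = sum(t[0] for t in scores) / n
    m2 = sum(t[1] for t in scores) / n
    # Elements of the score covariance matrix S (n - 1 denominator).
    s11 = sum((t[0] - m1) ** 2 for t in scores) / (n - 1)
    s22 = sum((t[1] - m2) ** 2 for t in scores) / (n - 1)
    s12 = sum((t[0] - m1) * (t[1] - m2) for t in scores) / (n - 1)
    det = s11 * s22 - s12 * s12
    out = []
    for t in scores:
        a, b = t[0] - m1, t[1] - m2
        # (a, b) S^{-1} (a, b)^T with S^{-1} = [[s22, -s12], [-s12, s11]] / det
        out.append((s22 * a * a - 2.0 * s12 * a * b + s11 * b * b) / det)
    return out


def md_crit_2d(confidence=0.95):
    """Chi-squared quantile for 2 degrees of freedom: -2 ln(1 - p)."""
    return -2.0 * math.log(1.0 - confidence)


# Toy PC1/PC2 scores, invented for illustration.
scores = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
md = mahalanobis_2d(scores)
crit = md_crit_2d(0.95)
print([round(v, 3) for v in md], round(crit, 3))
print([v > crit for v in md])  # True would flag a possible outlier
```

For other values of k, a general matrix inverse and a chi-squared quantile function from a statistics library would be needed instead of these closed forms.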

RESULTS
Herein, thirteen prodrugs and their corresponding precursors were selected, and their percent values of human intestinal absorption (% HIA) as well as the score of the absorption profile are listed in Table 1. Percent human intestinal absorption refers to the percentage of orally administered drug that reaches the hepatic portal vein, and it has been used to infer drug bioavailability.
Besides the properties considered in Ro5 and Veber's expanded version to predict oral bioavailability, other molecular properties [topological (Platt, Randic, Balaban, Harary, Szeged, Wiener, Wiener polarizability indices), geometric (ASA, ASA+, ASA-, ASA_H, ASA_P), steric/hydrophobic (MR, molar refractivity), electronic (polarizability, α), and steric (MSA_vdW)] were calculated in order to explore differences between prodrugs and their precursors, as well as to identify which molecular descriptors most influenced that discrimination. Then, an exploratory data analysis was carried out for the set of molecules selected. The findings are shown in Figures 1 and 2.
According to the factor selection, the first two PCs explained 83.79 % of the total variance of the original data (Figure 1A). In the scores plot for PC1 and PC2, the first component discriminated prodrugs (right side; positive) from drugs (left side; negative), whereas PC2 appeared to reflect mainly the therapeutic class. The loadings table (Figure 1C) provides information on which calculated properties had the greatest influence on the discrimination of the compounds. In PC1, intrinsic (MW; 0.42), steric/hydrophobic (MR; 0.43), electronic (α; 0.42), and topological (rot_bonds; 0.42) descriptors presented high loading values. In PC2, however, topological (HBA_count; 0.68) and geometric (PSA; 0.67) descriptors had more influence on the discrimination of the compounds.
In addition, the plot of sample residuals versus Mahalanobis distances (34) (Figure 2A) suggested that there were no outliers. The sample residual threshold (light green line in Figure 2A) corresponds to a 95% confidence level, which is set internally in Pirouette 3.11 (34). The compounds did not exceed the threshold value on either axis, which means that the calculated properties were sufficient to describe the structural features of the entire data set. Furthermore, complementary findings were obtained with both methods, HCA and PCA, reinforcing the separation pattern, as can be seen in the samples dendrogram (Figure 2B). In the dendrogram, four sub-clusters can be found, and FOS_PRO would be isolated from the rest.

DISCUSSION
Nowadays, there are many drugs that do not possess an optimal pharmacokinetic profile, which impairs their therapeutic efficacy. One strategy adopted in this regard is latentiation, which generates prodrugs that would overcome biological barriers and finally reach the receptor/target.
As shown in Table 1, the selected prodrugs presented higher % HIA values than their parent drugs, and the score of the absorption profile changed from low to moderate, low to high, and moderate to high (24). According to the Biopharmaceutics Classification System (BCS) (24), drug substances are classified as follows: Class I, high permeability and high solubility; Class II, high permeability and low solubility; Class III, low permeability and high solubility; Class IV, low permeability and low solubility. In general, the absorption of class II substances is overpredicted by absorption models because dissolution is the rate-limiting step of absorption. Thus, molecular features such as solubility, permeability, and diffusion rate (36) can affect absorption and, consequently, bioavailability.
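The four classes listed above amount to a simple two-way lookup, sketched below. The function name `bcs_class` is hypothetical, and the sketch takes the high/low decisions as inputs rather than deriving them, since the cut-offs that define high permeability and high solubility are not given in the text.

```python
def bcs_class(high_permeability: bool, high_solubility: bool) -> str:
    """Map permeability and solubility categories to a BCS class label."""
    if high_permeability and high_solubility:
        return "I"
    if high_permeability:
        return "II"   # dissolution is the rate-limiting step of absorption
    if high_solubility:
        return "III"
    return "IV"


print(bcs_class(True, False))  # II
```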
Solubility and permeability through membranes are related to aspects such as molecular size and lipid solubility, and can be numerically expressed by molecular descriptors such as molecular weight (MW), calculated n-octanol/water partition coefficient (ClogP), and number of rotatable bonds (rot_bonds). The drug diffusion rate is governed by the number of hydrogen bonds provided by hydrogen bond acceptor (HBA) and donor (HBD) groups. Therefore, physicochemical descriptors such as molecular polar surface area (PSA), HBD, and HBA can be applied to predict passive diffusion, for instance. Thus, the introduction of carrier groups in order to obtain a novel prodrug would modify the drug's molecular properties and, consequently, its solubility and permeability, which in most cases should be changed.
In general, prodrugs presented higher % HIA values than their parent drugs. The transformation of penciclovir, oseltamivir carboxylate, benazeprilat, ramiprilat, and cefuroxime into their respective prodrugs caused a remarkable increase in absorption, emphasizing the importance of prodrug design (latentiation).
Regarding the PCA findings, the first component discriminated prodrugs from drugs and, interestingly, PC2 reflected the therapeutic class. In the scores plot (Figure 1B), it is possible to verify that antibiotics are mostly placed in the upper region, antivirals in the center/left region, and anti-hypertensives in the center/bottom region of the plot. Ampicillin and bacampicillin were more distant from the antibiotics group, probably because of structural differences between penicillins and cephalosporins.
In PC1, intrinsic, steric/hydrophobic, electronic, and topological molecular properties were important for the discrimination of the compounds. In PC2, on the other hand, topological and geometric descriptors had more influence. Steric, geometric, and topological parameters are related to molecular shape and therefore play an important role in the absorption and molecular recognition of drugs.
Molecular weight (MW) corresponds to molecular mass of a chemical entity, and depends on the chemical elements in the structure (24,37,38). Considering the classical concept of prodrug design, where a drug and some chemical moiety are covalently attached, it would be expected that prodrugs present higher MW values than the corresponding parent drugs. MW was indeed one of the molecular properties that most influenced the discrimination process in PC1 (see Figure 1C).
Drug-membrane or drug-receptor interactions are associated with conformational arrangements that, in turn, depend on the presence of rotatable bonds (rot_bonds) in a molecular system or compound. Veber et al. (16) proposed that oral bioavailability depends on molecular flexibility, which can be inferred from the number of rotatable bonds (rot_bonds values). However, introducing atoms and sigma bonds into molecules in order to increase the number of rotatable bonds can also become a problem, since an excessive increase in degrees of freedom might yield a molecule with only a suboptimal improvement in % HIA. That is, when the gain in the number of rotatable bonds is substantial, other problems would affect the pharmacokinetic profile. The prodrugs investigated in this study have approximately ten rotatable bonds, whereas the corresponding drugs have around six. Thus, both have a number of rotatable bonds less than or equal to ten, which could provide suitable bioavailability according to Veber's proposition.
Interestingly, two other molecular descriptors that are included neither in Ro5 nor in Veber's extended version presented high loading values in PC1 regarding the separation between drugs and prodrugs: molar refractivity (MR) and polarizability (α). They are related to steric/hydrophobic and electronic features, respectively. Thus, molecular properties such as MR and α, which are not included in the rules of Lipinski and Veber, should also be investigated when designing novel prodrugs.
Complementary findings were observed in the PCA and HCA analyses. According to the dendrogram obtained, four groups were formed, mainly reflecting the therapeutic class. The first sub-cluster consists of antibiotic prodrugs (cephalosporin type), except for TN_PRO. The second sub-cluster comprises antiviral drugs. The third sub-cluster is a mixture of anti-hypertensive (81 % similarity) and cephalosporin (81 % similarity) drugs. The fourth sub-cluster comprises mostly anti-hypertensive (92 % similarity) and antiviral (66 %) prodrugs. Neither AMP nor AMP_PRO is part of the antibiotic sub-clusters, probably due to structural differences between the penicillin and cephalosporin classes, corroborating the PCA findings.

CONCLUSION
This study has provided useful information on which physicochemical descriptors most influenced the discrimination between prodrugs and drugs and, consequently, could affect % HIA values. Regarding the exploratory data analysis, descriptors such as MW, rot_bonds, HBA_count, and PSA, which are considered in Lipinski's and Veber's rules, influenced the discrimination between prodrugs and drugs. Moreover, other molecular properties, such as molar refractivity (MR) and polarizability (α), were also important in the separation of the investigated compounds, and should be considered by medicinal chemists in the prodrug design process.
Thus, the findings are in agreement with the expanded Ro5, which indeed has applicability as a rapid screening method in drug design, mainly for prodrugs that need to be administered orally. Nevertheless, the introduction of multivariate analysis allowed a better evaluation of the molecular properties responsible for the differences between prodrugs and drugs, highlighting molecular properties not included in the expanded Ro5 as relevant to the field of prodrug design.

ACKNOWLEDGMENTS
This work was supported by grants of the Provost's Office for Research of University of São Paulo, CAPES and CNPq (Brazil). The authors are grateful to FAPESP (proc. nr.: 2013/18160-4).