Application of Counter-propagation Artificial Neural Networks in Prediction of Topiramate Concentration in Patients with Epilepsy

Purpose: The application of artificial neural networks in the pharmaceutical sciences is broad, ranging from drug discovery to clinical pharmacy. In this study, we explored the applicability of counter-propagation artificial neural networks (CPANNs), combined with a genetic algorithm (GA), for the prediction of topiramate (TPM) serum levels based on identified factors important for its prediction. Methods: The study was performed on 118 TPM measurements obtained from 78 adult epileptic patients. Patients were on a stable TPM dosing regimen for at least 7 days; therefore, steady state was assumed. TPM serum concentration was determined by high performance liquid chromatography with fluorescence detection. The influence of demographic and biochemical parameters and of therapy characteristics of the patients on TPM levels was tested. Data analysis was performed by CPANNs. GA was used to find optimal CPANN parameters, to select variables and to adjust their relative importance. Results: Data for training included 88 measured TPM concentrations, while the remaining 30 were used for validation. Among all factors tested, TPM dose, renal function (eGFR) and carbamazepine dose significantly influenced TPM level, and their relative importance was 0.7500, 0.2813 and 0.0625, respectively. Relative error and root mean squared relative error (%) with their corresponding 95% confidence intervals were 2.14 [-2.41 to 6.70] and 21.5 [18.5 to 24.1] for the training set, and -6.21 [-21.2 to 8.77] and 39.9 [31.7 to 46.7] for the test set, respectively. Conclusions: Statistical parameters showed acceptable predictive performance. Results indicate the feasibility of CPANNs combined with GA to predict TPM concentrations and to adjust the relative importance of identified variability factors in a population of adult epileptic patients.


INTRODUCTION
The application of artificial neural networks (ANNs) in the pharmaceutical sciences is broad, ranging from drug discovery to clinical pharmacy. The flexibility of ANN models allows analysis of pharmacokinetic (PK) and pharmacodynamic (PD) data (1)(2)(3). The applicability of ANNs in predicting blood concentrations of several drugs has been reported previously (4)(5)(6). ANN algorithms have been successfully used for the prediction of peak and trough plasma levels of aminoglycoside antibiotics in patients (7,8). In these studies, the predictive performance of the ANNs was superior to that of multiple linear regression analysis. Additionally, several studies reported that ANNs had predictive capability similar to or even better than nonlinear mixed effects modelling performed with the NONMEM software (9)(10)(11). Chow et al. also showed the capability of neural networks to capture relations between plasma tobramycin levels and patient-related factors from routinely collected sparse data (11). Moreover, in the study by Haider et al., ANN modelling was used to evaluate the relative significance of various covariates on the PK-PD characteristics of repaglinide (12). Finally, the ANN approach has been implemented in the decision making process of drug dosing (13).
The ANNs most often used for the prediction of drug levels are back-propagation artificial neural networks (BPANNs), primarily because of their multi-layer architecture (9,11,14). Multi-layer networks have greater representational power for dealing with highly non-linear, strongly coupled, multivariable systems. However, if BPANNs are not properly trained, they can produce models that generalize poorly. In contrast, the generalization performance of counter-propagation artificial neural networks (CPANNs) is relatively easy to control compared to BPANNs (14).
CPANNs can be considered an extension of Kohonen maps, and this algorithm has been widely used in chemistry and related sciences (14)(15)(16). A genetic algorithm (GA) can be used to optimize CPANN models. This algorithm may serve not only for variable selection, but also for finding the most suitable CPANN parameters, as well as for adjusting the relative importance of the various input variables (17)(18)(19)(20). The advantage of the relative importance adjustment is that it allows us to examine the contributions of the individual descriptors (independent variables), which makes the models easier to interpret (17,19).
Various factors may affect the pharmacokinetic profile of topiramate (TPM), and consequently its concentration and possible therapeutic response. In addition, epilepsy is an episodic disease and it is challenging to monitor the efficacy of antiepileptic drugs, including TPM. Hence, identification of the variability factors important for predicting TPM concentration can support dosage regimen optimisation. In this study, we investigated the applicability of CPANNs combined with GA for the prediction of TPM serum levels based on selected factors important for its prediction. To our knowledge, this is the first study that explores the application of the CPANN approach as an alternative to the traditional, compartmental pharmacokinetic analysis of TPM data obtained from adult epileptic patients.

Data Set
The study was performed on 118 TPM measurements obtained from 78 adult epileptic patients treated at the Clinic of Neurology, Clinical Centre of Serbia, University of Belgrade - Faculty of Medicine. Patients' data were collected during therapeutic drug monitoring. Patients were on mono- or co-therapy with TPM and other antiepileptic drugs (carbamazepine (CBZ), valproic acid, lamotrigine, levetiracetam, phenobarbital, pregabalin) or psychoactive drugs (benzodiazepines, risperidone). TPM was administered one to three times a day in the form of 25, 50 or 100 mg tablets (Topamax®, Cilag AG, Switzerland). Patients were on a stable dosing regimen for at least 7 days; therefore, steady state was assumed. Data collection and analysis were approved by the Ethics Committee of the Clinical Centre of Serbia. All patients gave written informed consent before enrolment in the study. Patients' data were collected from medical records and during interviews with medical staff. This information included demographic characteristics (gender, age, body weight, height), smoking status, pathological characteristics (diagnosis and history of disease, comorbidities) and characteristics of therapy. Biological material (1-2 blood samples) was taken from patients at steady state, mostly just before administration of the morning dose and/or 1-6 h after it. TPM serum concentration was determined by high performance liquid chromatography with fluorescence detection following precolumn derivatization using 4-chloro-7-nitrobenzofurazan (NBD-Cl) for fluorescence labelling, with some modification of the method previously described by Bahrami and Mohammadi (21). In addition, serum was used for biochemical analysis (aspartate aminotransferase (AST), alkaline phosphatase (ALP), cholinesterase, bilirubin, albumin, total protein concentration and serum creatinine). Estimated glomerular filtration rate (eGFR) was calculated by the Modification of Diet in Renal Disease (MDRD) formula (22).
Each of the 118 TPM measurements along with its corresponding clinical information and dosing regimens was treated as a new input and output data pair. In order to perform proper validation of the model, data were randomly divided into a training and test set. Categorical covariates considered to influence TPM concentration included: smoking status, co-therapy with lamotrigine, levetiracetam, valproic acid, CBZ, benzodiazepines and risperidone. Continuous covariates considered for testing included: daily TPM dose, renal function (eGFR), liver enzymes (AST, ALP and cholinesterase), bilirubin, albumin, total protein concentration, daily CBZ dose. Missing covariate data were substituted by the median value.
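The median substitution and the random division into training and test sets described above can be sketched as follows. This is a minimal illustration and not the authors' Matlab code: the record fields (`tpm_dose`, `egfr`), the synthetic values and the fixed seed are hypothetical, and the split is made per measurement, which is an assumption since the text does not say whether measurements from the same patient were kept together.

```python
import random
import statistics

def impute_median(rows, key):
    """Replace missing (None) values of one covariate by the median of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    med = statistics.median(observed)
    return [dict(r, **{key: med if r[key] is None else r[key]}) for r in rows]

def random_split(rows, n_train, seed=0):
    """Randomly divide the records into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical records standing in for the 118 measurements (values are synthetic).
records = [
    {"tpm_dose": 100 + 25 * (i % 8), "egfr": None if i % 10 == 0 else 60 + i % 50}
    for i in range(118)
]
records = impute_median(records, "egfr")
train, test = random_split(records, n_train=88)   # 88 training, 30 test records
```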
Prior to optimization of the CPANN-based models, the experimental data were preprocessed by auto-scaling. In addition, for variables whose values spanned more than two orders of magnitude, the logarithm of the original variable was calculated first, and the log-transformed data were then auto-scaled.
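A minimal sketch of this preprocessing step is given below. Several details are assumptions because the text does not state them: auto-scaling is taken to mean centring to zero mean and scaling to unit sample standard deviation, the logarithm is taken to base 10, and "more than two orders of magnitude" is interpreted as a max/min ratio above 100 for strictly positive values.

```python
import math
import statistics

def autoscale(values):
    """Centre to zero mean and scale to unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def preprocess(values):
    """Log-transform variables spanning more than two orders of magnitude, then auto-scale."""
    lo, hi = min(values), max(values)
    if lo > 0 and hi / lo > 100:                 # assumed threshold: ratio above 10**2
        values = [math.log10(v) for v in values]
    return autoscale(values)
```

In practice the mean and standard deviation would usually be estimated on the training set and reused for the test set; the sketch scales a single list for brevity.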

Counter Propagation Neural Networks
Data analysis was performed in Matlab 6.5. The CPANN program used in this work is based on the SOM Toolbox (23). CPANNs consist of two layers: (1) a Kohonen layer and (2) an output (Grossberg) layer. The Kohonen layer maps the multidimensional input data onto a plane of neurons, which is most often two-dimensional. The mapping is performed by competitive learning, the so-called "winner-takes-all" strategy (14,16,24).
The optimization of CPANNs is performed similarly to that of Kohonen self-organizing maps (24). Vectors that represent all the variables (independent and dependent ones) are simultaneously presented to the neurons in both layers of the CPANN. The main distinction is the separate treatment of the dependent and independent variables (1) during the search for the winning neuron and (2) during the correction of the weights (14,16,23,24). When a CPANN is trained, the winning neuron is selected by comparing the independent variables with the corresponding weight levels of the Kohonen layer. After that, the weights are corrected simultaneously in both layers. The adjustment is made according to the distances between the weights and the corresponding variables (independent and dependent ones) of a particular object. The procedure is repeated a predetermined number of times (training epochs), until the weights are stabilized and the network is considered trained.
Objects with similar input vectors are positioned close to each other, and they are expected to have similar values of their output variables. These characteristics make CPANNs appropriate for classification and modelling purposes.
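The training procedure described above can be illustrated with a small, self-contained sketch. It is not the SOM Toolbox implementation used in the study: the linearly decreasing learning rate, the triangular neighbourhood on a square grid, the initialisation, and the way the relative importance weights enter the distance (as multipliers of the squared differences) are all assumptions made for illustration, and the function names are hypothetical.

```python
import random

def winner(kohonen, x, imp):
    """Index of the neuron whose Kohonen weights are closest to x (independent variables only)."""
    def dist(w):
        return sum(m * (a - b) ** 2 for m, a, b in zip(imp, x, w))
    return min(range(len(kohonen)), key=lambda n: dist(kohonen[n]))

def train_cpann(X, y, nx=3, ny=3, epochs=50, importance=None, seed=0):
    """Train a small counter-propagation network; returns (kohonen_weights, output_weights)."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    imp = importance or [1.0] * n_vars
    grid = [(i, j) for i in range(nx) for j in range(ny)]
    kohonen = [[rng.uniform(-1, 1) for _ in range(n_vars)] for _ in grid]
    output = [sum(y) / len(y) for _ in grid]          # output layer starts at the mean response
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for k in order:
            frac = step / n_steps
            rate = 0.5 * (1 - frac) + 0.01 * frac     # assumed learning-rate schedule
            radius = max(nx, ny) * (1 - frac)         # shrinking neighbourhood
            wi, wj = grid[winner(kohonen, X[k], imp)]
            for n, (gi, gj) in enumerate(grid):
                d = max(abs(gi - wi), abs(gj - wj))   # grid distance to the winning neuron
                if d <= radius:
                    h = rate * (1 - d / (radius + 1))
                    # Both layers are corrected at the same position and with the same factor.
                    kohonen[n] = [w + h * (x - w) for w, x in zip(kohonen[n], X[k])]
                    output[n] += h * (y[k] - output[n])
            step += 1
    return kohonen, output

def predict(model, x, importance=None):
    """Predict the response as the output-layer weight of the winning neuron."""
    kohonen, output = model
    imp = importance or [1.0] * len(x)
    return output[winner(kohonen, x, imp)]
```

A call such as `predict(train_cpann(X, y), x_new)` returns the output-layer value stored at the neuron that wins for `x_new`, so objects with similar inputs receive similar predictions.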

Genetic Algorithm
GA was used to find the CPANN parameters that would produce models with optimal performance. Specifically, GA was used for: (1) variable selection, (2) the search for the optimal network size, (3) finding the most suitable number of training epochs and (4) automatic adjustment of the relative importance of the input variables (20).
The first step in GA is to generate an initial population. Each new generation is then created using simulated evolution and involves 2 parent and 2 offspring chromosomes. The offspring chromosomes are obtained by coupling the parent chromosomes using genetic operations (crossover and mutation). The newly created offspring chromosomes replace the chromosomes with the worst performance in the current generation. This process of simulated evolution using an elitist strategy is repeated for a predefined number of generations.
As previously stated, we used GA for selection of CPANN models with best possible performances. Therefore, the best CPANN models were searched among the final populations obtained after repeating GA several times.
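The evolutionary loop described in this section can be sketched as follows. The chromosome layout, the parameter ranges, the mutation rate, the random choice of parents and the `fitness` callable are all hypothetical; in the study the fitness would correspond to a prediction error of the CPANN built from each chromosome, which is not reproduced here.

```python
import random

N_VARS = 6  # hypothetical number of candidate input variables

def random_chromosome(rng):
    """One candidate model: selected variables, their importance, map size and epochs."""
    return {
        "use": [rng.random() < 0.5 for _ in range(N_VARS)],
        "importance": [rng.random() for _ in range(N_VARS)],
        "size": rng.randint(3, 10),
        "epochs": rng.randint(10, 200),
    }

def crossover(p1, p2, rng):
    """One-point crossover on per-variable genes; scalar genes are exchanged at random."""
    cut = rng.randint(1, N_VARS - 1)
    c1, c2 = {}, {}
    for key in ("use", "importance"):
        c1[key] = p1[key][:cut] + p2[key][cut:]
        c2[key] = p2[key][:cut] + p1[key][cut:]
    for key in ("size", "epochs"):
        c1[key], c2[key] = (p1[key], p2[key]) if rng.random() < 0.5 else (p2[key], p1[key])
    return c1, c2

def mutate(c, rng, rate=0.1):
    """Flip selection bits and redraw other genes with a small probability."""
    c["use"] = [(not u) if rng.random() < rate else u for u in c["use"]]
    c["importance"] = [rng.random() if rng.random() < rate else w for w in c["importance"]]
    if rng.random() < rate:
        c["size"] = rng.randint(3, 10)
    if rng.random() < rate:
        c["epochs"] = rng.randint(10, 200)
    return c

def run_ga(fitness, pop_size=20, generations=200, seed=0):
    """Elitist GA: in each generation two offspring replace the two worst chromosomes."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        p1, p2 = rng.sample(pop, 2)                    # assumed: parents drawn at random
        children = [mutate(c, rng) for c in crossover(p1, p2, rng)]
        pop.sort(key=fitness)                          # lowest error first
        pop[-2:] = children                            # the best chromosomes always survive
    return min(pop, key=fitness)
```

Because only the two worst chromosomes are replaced, the best model found so far is never lost, which is the sense in which the strategy is elitist. Repeating `run_ga` with different seeds mirrors the repeated GA runs mentioned in the text.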

STATISTICAL ANALYSIS
Statistical measures used to assess predictive performance were the relative error (%), the root mean squared relative error (RMSRE), and their corresponding 95% confidence intervals. Relative error was used to describe accuracy (bias), and RMSRE to describe the precision of the predictions. In addition, the relationship between the relative error and the predicted concentrations was examined for the training and test sets (25)(26)(27). All indices of model performance were calculated for the test and training sets using Microsoft Office Excel 2003® and SPSS® software (version 17, Chicago, Illinois, USA).

RESULTS
Characteristics of the input data are presented in Table 1. Data for training included 88 measured TPM concentrations, while 30 TPM concentrations were used for validation.
The procedure for finding the best CPANN architecture and number of training epochs was repeated several times. GA optimization was used for the selection of the independent variables, for the determination of the training parameters and for the determination of the size of the CPANNs. Additionally, GA optimization was used for adjustment of the relative importance of the independent variables. The predictive performance of the models during the optimization was checked using a cross-validation procedure. The size of the CPANN, the number of epochs used in the training phase and the selected independent variables for the best model are presented in Table 2.
The predictive performance of the final model is summarized in Table 3. The calculated 95% confidence interval of the relative error included 0, indicating accuracy of the model prediction, while the RMSRE indicated an acceptable prediction error. The mean relative error indicates that in the training set the observed TPM concentration was on average 2.14% higher than the predicted concentration, while in the test set the measured concentration was on average 6.21% lower (Table 3). Figure 1 illustrates the relationship between the relative error and the predicted TPM concentrations for the training and test sets. The lowess fit shows no significant trend of over- or under-prediction with a change in TPM concentration.
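The accuracy and precision measures reported here can be computed along the following lines. The exact definitions are assumptions: the relative error is taken as (observed - predicted) / predicted x 100, which matches the interpretation that a positive value means the observed concentration is higher than the predicted one, and the confidence interval of the mean uses a normal approximation. The method used for the RMSRE confidence interval is not stated in the text and is not reproduced.

```python
import math
import statistics

def relative_errors(observed, predicted):
    """Percentage relative error; positive when the observed value exceeds the prediction."""
    return [100.0 * (o - p) / p for o, p in zip(observed, predicted)]

def mean_re_with_ci(re, z=1.96):
    """Mean relative error (bias) with a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(re)
    half = z * statistics.stdev(re) / math.sqrt(len(re))
    return mean, (mean - half, mean + half)

def rmsre(re):
    """Root mean squared relative error (precision), in percent."""
    return math.sqrt(statistics.fmean([e * e for e in re]))
```

A confidence interval of the mean relative error that contains 0, as reported for both data sets, indicates no detectable systematic bias under these definitions.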

DISCUSSION
This work demonstrates the capability of CPANNs combined with GA to predict TPM serum concentrations based on factors important for its prediction. To our knowledge, this is the first study that explores the application of the CPANN approach as an alternative to the traditional, compartmental PK analysis of TPM data obtained from adult epileptic patients.
In this study, predictability was checked using a test set, an appropriate diagnostic plot and statistical parameters. The parameters in Table 3 showed acceptable precision and accuracy of prediction. The calculated 95% confidence interval of the relative error included 0, indicating accuracy of the model prediction. Figure 1 illustrates adequate prediction over the whole TPM concentration range. The results confirmed that CPANNs optimized by simulated evolution are a reliable predictive tool for modelling nonlinear relationships. As documented in several previous studies, the predictability obtained by ANN modelling was similar to or even better than that obtained by standard NONMEM modelling (5)(9)(10)(11). However, in this case it was difficult to compare the obtained model with our previous NONMEM model, since internal validation was performed and a data set of a different size was used for model development (28).
The usefulness of some ANN models with acceptable prediction can be limited in PK studies, since such models, considered a "black box", may have lower interpretability. A model is difficult to understand if the contribution of the individual factors is unknown. CPANNs combined with GA allow adjustment of the relative importance of the input variables. Therefore, applying this approach, we explored the influence of various demographic and biochemical parameters and of the treatment characteristics of the patients on TPM concentrations, together with their relative importance. This information is valuable for a better understanding of the factors and for comparing their relative effects on the drug's level. Among all factors tested, only some showed a significant influence, which is in agreement with our previous findings (28). The results confirmed the importance of TPM dose, renal function (eGFR) and CBZ dose (Table 2). TPM given within the recommended dosing range shows linear PK, and consequently the highest influence was detected for TPM dose (29).
The influence of renal function (eGFR) is expected, as TPM is excreted predominantly (up to 70%) unchanged through the kidneys. Also, it is well known that enzyme-inducing drugs, such as CBZ, enhance TPM clearance and consequently decrease TPM concentrations (29). Other demographic, biochemical and co-therapy characteristics (Table 1) were not selected during the analysis. Patients' age, weight, height, gender and serum creatinine are required for the calculation of glomerular filtration rate. Therefore, it was not appropriate to investigate the independent impact of these factors. However, since the model was developed only on adult patients, its application to other populations would probably not be appropriate.
Measured drug concentrations reflect the influence of various factors on drug disposition, and therapeutic drug monitoring is a useful approach for routine control of drug efficacy and safety. Moreover, measured concentrations may be used to develop a neural network model for drug level prediction by detecting the factors important for its prediction. Identification of variability factors supports the optimization of the dosage regimen. The model developed in this study allows us to predict whether the drug concentration in a particular patient, taking into account significant patient factors and dose, will be within the reference range. This is especially useful for drugs like TPM, given the difficulty of monitoring the effect of antiepileptic drugs.
In general, ANNs allow modelling of complex relations between dependent and independent variables, even in cases where there is no previous knowledge about the exact relationship between input and output data. Moreover, these algorithms are recognized as a useful predictive tool, especially for data sets with nonlinear relationships (2,11,14). Based on our experience in this research, we can confirm that this approach is user-friendly and that a priori knowledge of the drug's PK is not required. Thus, neural networks, and CPANNs as one type of such analysis, can be useful in predicting drug levels by detecting the factors important for their prediction. This approach can be used for initial screening of influential covariates. The relative importance of the factors contributes significantly to model interpretation and applicability.
In conclusion, the results of this study demonstrate the feasibility of CPANNs combined with GA to predict TPM concentrations and to adjust the relative importance of the identified variability factors in a population of adult epileptic patients. The final model showed acceptable predictive performance. The limitations of the study are the small number of patients and the higher relative error in the test set. Also, it would be useful to include in the analysis more patients with renal impairment, for whom the GFR equation used is adequate. Possible effects of unmeasured and uninvestigated variables on TPM concentration may need to be examined, and further refinement of the model might be needed. Based on the available results of studies in this field, more research is warranted to improve the application of neural networks in PK analysis.