Potential Limitations of Bioluminescent Xenograft Mouse Models: A Systematic Review.

PURPOSE
Bioluminescent imaging (BLI) is a versatile technique that offers non-invasive and real-time monitoring of tumor development in preclinical cancer research. However, the technique may be limited by several factors that can lead to misinterpretation of the data. This review aimed to investigate the validity of current BLI tumor models and provide recommendations for future model development.


METHODS
Two major databases, MedLine and EMBASE, were searched from inception to July 2018 inclusively. Studies utilizing mouse xenograft models with demonstration of linear correlations between bioluminescent signal and tumor burden were included. Coefficients of correlation and determination were extracted along with data relating to animal model parameters.


RESULTS
116 studies were included for analysis. It was found that the majority of models demonstrate good correlation regardless of the model type. Selection of a single cell clone with highest luciferase expression resulted in a significantly better correlation. Lastly, appropriate tumor measurement techniques should be utilized when validating the BLI model.


CONCLUSIONS
In general, BLI remains a valid tool for pre-clinical assessment of tumor burden. While no single factor may be identified as a general limitation, data should be interpreted with caution.


INTRODUCTION
Optical imaging techniques have become a powerful tool in cancer research by providing a means of noninvasive tumor monitoring (1). In comparison to other current imaging methods, bioluminescent imaging (BLI) is becoming increasingly popular among researchers due to its high signal-to-noise ratio, rapid image acquisition, and relative ease of technical operation (2). In addition, the principle of light generation from live cells makes BLI highly specific to xenografts developed from luciferase-transfected cells; several studies have shown that BLI correlates well with tumor burden (3,4). As such, BLI has been commonly employed as a method of tracking tumor growth and assessing treatment efficacy in preclinical models (5-7).
Despite its versatility, many factors could impact the validity of BLI in preclinical tumor model monitoring (8). For instance, intrinsic properties of the tumor microenvironment such as necrosis and hypoxia can decrease light output, leading to inaccurate interpretation of data (8). In a study by Tuli et al., a plateau of bioluminescent signal was observed in large, necrotic tumors approaching the study end point (9). Another study demonstrated a poor correlation between BLI signal and tumor burden when tumor size exceeded 1.2 cm (10). Additionally, the route of luciferin administration may affect signal intensity due to differences in substrate availability (11). For instance, peak signal intensity, optimal time of imaging, and duration of the signal have been found to vary depending on the location of the tumor and the route of substrate delivery (12). Lastly, tumor models that generate ascites could also be susceptible to decreases in photon emission due to signal quenching by the fluid (13). Therefore, while BLI tumor monitoring has the potential to accelerate the assessment of drug efficacy and provide data that are normally not available from traditional models, these factors must be considered when utilizing this technique. Indeed, in a study validating a bioluminescent model of breast cancer, different efficacy outcomes were observed between orthotopic and intraperitoneal models (14). Importantly, although BLI correlates well with tumor burden in subcutaneous models, the correlation is reportedly reduced in disseminated intraperitoneal tumor models (5). While subcutaneous xenografts are relatively easy to establish, they do not recapitulate the normal pathophysiological conditions of cancer, as most tumors in the clinic arise within the body cavity. As such, orthotopic or intraperitoneal xenograft models have become increasingly popular due to their ability to better represent clinically relevant disease conditions.
Based on these reported limitations and the widespread use of BLI in preclinical tumor models, it is imperative to better understand and validate the use of BLI as a tool for assessing tumor burden. In the present review, we performed a systematic literature search to identify studies that have utilized BLI in mouse xenograft models. The values of the coefficient of determination (R²) for eligible publications were extracted to determine the primary limitations and the extent to which these factors impact the validity of the model.

Study Design
The present systematic review was conducted and documented in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Search Strategy
The aim of this search was to include articles reporting correlations between in vivo bioluminescence signal and tumor burden in mouse models. Two authors (YS and CP) independently searched two databases (MedLine and EMBASE) from inception to July 2018, with publications limited to the English language. Four sets of search terms were combined with "AND". The first set consisted of terms related to bioluminescence; the second set contained words describing cancer, xenografts, and peritoneal establishment; the third set included synonyms of mouse; and the last set comprised keywords describing the outcome measure. Terms within each set were combined with "OR". The complete search strategy is shown in Suppl. Table 1. Reference lists from relevant review articles were also searched and are summarized as "other sources" in the PRISMA flow chart (Figure 1).
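As an illustration of how the four term sets combine, the following minimal Python sketch builds a Boolean query string by joining terms within a set with "OR" and joining the sets with "AND". The terms shown are hypothetical placeholders, not the actual search terms from Suppl. Table 1, and the output is not formatted for any particular database interface.

```python
def build_query(term_sets):
    """Join terms within each set with OR, then join the sets with AND."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in term_sets]
    return " AND ".join(groups)


# Hypothetical placeholder terms (assumption: NOT the review's actual strategy).
term_sets = [
    ["bioluminescence", "luciferase"],      # set 1: bioluminescence
    ["cancer", "xenograft", "peritoneal"],  # set 2: cancer / xenograft / peritoneal
    ["mouse", "mice", "murine"],            # set 3: synonyms of mouse
    ["correlation", "tumor burden"],        # set 4: outcome measure
]

query = build_query(term_sets)
print(query)
```

Keeping each concept in its own list makes it straightforward to broaden one set with additional synonyms without altering how the sets intersect.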

Selection Criteria
Eligibility of all identified records was screened by two independent reviewers (YS and CP). Titles and abstracts were searched for a combination of keywords included in the complete search terms. Full articles were retrieved and screened if eligibility could not be determined from the title and abstract. Publications were included if they met the following criteria: 1) primary research articles using mouse tumor models; 2) tumor progression or tumor response to treatment monitored by luciferase-based bioluminescent imaging; 3) linear correlation examined between bioluminescent signal and tumor volume or tumor weight measured by caliper or other imaging modalities. Review articles were excluded. Publications were also excluded if the research was conducted in rats, if tumors were monitored using fluorescence or other luminescence techniques, or if non-linear correlations were performed to evaluate the relationship between tumor burden and bioluminescence.

Data Extraction and Analysis
Data were extracted independently by three authors (YS, CP, and RA). Values of the coefficient of correlation (R) or coefficient of determination (R²) between bioluminescent signal and tumor burden were extracted. When possible, details pertaining to types of cancer cell lines, location of xenograft, disease state, methods of tumor burden assessment, and number of animals used were obtained. In addition, parameters associated with the BLI technique, including selection of single cell clones after transfection or transduction, type of luciferase utilized, and route of substrate administration, were also recorded. Linear regression analyses were performed using STATA (version 15, College Station, TX, USA) to determine the association between R² values and parameters of the tumor model. For publications that reported only R values, R² values were obtained by squaring the respective R values. A p value < 0.05 was considered statistically significant for all analyses. R² values obtained from intravenous (i.v.) and orthotopic (o.t.) models were grouped in the analysis between correlation and tumor inoculation site. Additionally, R² values obtained in models measured by ex vivo BLI and fluorescence-activated cell sorting (FACS) were grouped in the regression analysis between R² and tumor assessment methods. R² values obtained from spontaneous transgenic models were not included in the regression analysis due to a small sample size.
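The data-handling rules described above (squaring R when only R is reported, grouping i.v. with o.t. models, and excluding spontaneous transgenic models from the regression) can be sketched in Python as follows. This is only an illustrative sketch: the records, field names, and values are hypothetical and are not data from the review, whose analyses were run in STATA.

```python
from statistics import median

# Hypothetical extracted records (illustrative values only, NOT review data).
records = [
    {"site": "s.c.", "r": 0.95, "r2": None},
    {"site": "i.p.", "r": None, "r2": 0.80},
    {"site": "o.t.", "r": -0.90, "r2": None},
    {"site": "i.v.", "r": None, "r2": 0.87},
    {"site": "transgenic", "r": None, "r2": 0.79},
]


def to_r2(rec):
    """Return R^2, squaring R when only R was reported."""
    if rec["r2"] is not None:
        return rec["r2"]
    return rec["r"] ** 2


def analysis_group(site):
    """Group i.v. with o.t.; return None for transgenic models (excluded)."""
    if site == "transgenic":
        return None
    if site == "i.v.":
        return "o.t."
    return site


grouped = {}
for rec in records:
    group = analysis_group(rec["site"])
    if group is None:
        continue  # spontaneous transgenic models are left out of the regression
    grouped.setdefault(group, []).append(to_r2(rec))

medians = {group: median(values) for group, values in grouped.items()}
print(medians)
```

Note that squaring discards the sign of R, so a negative correlation and a positive one of the same magnitude yield the same R².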

Characteristics of Studies
From the two databases and other sources, 7,874 records were identified (Figure 1). Of these, 2,024 were duplicates. A total of 5,850 records were included for initial title and abstract screening. 1,374 articles were excluded for clear lack of fit with our eligibility criteria. 2,606 conference abstracts were screened and subsequently excluded due to ineligibility. The remaining 1,870 articles were subject to full text screening, and 1,754 records were excluded for reasons including no correlation reported (n=1,422), lack of relevance to our objective (n=236), correlation reported only for in vitro study (n=87), and relationship determined through non-linear correlation (n=9). A final sample of 116 articles was included. Of the included articles, 12 reported on two different models, bringing the total number of models included in the review to 128. Of all the reported models, 37.5% (48/128) were subcutaneous (s.c.) models, 11.7% (15/128) were intraperitoneal (i.p.) models or o.t. models that resulted in peritoneal tumors, 41.4% (53/128) were o.t. models that resulted in tumors outside of the peritoneal cavity, and 9.4% (12/128) were i.v. or spontaneous transgenic mouse models. A majority of the models established solid tumors (70.3%, 90/128), whereas disseminated disease was observed in 38 models (29.7%). A total of 207 correlations were extracted from the 116 articles.

Correlation between BLI and Tumor Burden in Models with Different Inoculation Sites
It has been shown that BLI correlates well with tumor burden in s.c. models due to the ease of tumor assessment. Conversely, xenografts that grow inside the body cavity, such as in intraperitoneal or orthotopic models, may result in signal attenuation and poor correlations. Therefore, we first investigated whether the validity of BLI is impacted by different models of tumor origin. Linear regression analysis suggested that no relationship exists between R² and the type of tumor model (p=0.6642). The median R² values (Figure 2) were 0.86, 0.80, 0.80, 0.87 and 0.79 for subcutaneous, intraperitoneal, orthotopic, intravenous and transgenic models, respectively. The proportions of R² values over 0.5 for s.c., i.p., and o.t. models were 82% (65/79), 76% (19/25) and 87% (77/89), respectively.
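The screening counts and model percentages reported in this section can be re-derived with simple arithmetic. The short Python check below uses only the numbers stated in the text; the variable names are our own.

```python
# Counts taken from the text of this section.
identified = 7874
duplicates = 2024
screened = identified - duplicates  # title/abstract screening

excluded_title_abstract = 1374
excluded_conference = 2606
full_text = screened - excluded_title_abstract - excluded_conference

full_text_exclusions = {
    "no correlation reported": 1422,
    "not relevant to objective": 236,
    "in vitro correlation only": 87,
    "non-linear correlation": 9,
}
included = full_text - sum(full_text_exclusions.values())

# Models by inoculation site (12 articles reported two models each).
models = {"s.c.": 48, "i.p.": 15, "o.t.": 53, "i.v./transgenic": 12}
total_models = sum(models.values())
percentages = {k: round(100 * v / total_models, 1) for k, v in models.items()}

print(screened, full_text, included, total_models)
print(percentages)
```

The totals agree with the reported 5,850 screened records, 1,870 full-text articles, 116 included articles, and 128 models.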

Effect of Luciferase Expression on Correlation
The generation of luciferase-expressing cells is the foundation of BLI xenograft models. We therefore evaluated the impact of luciferase expression on the correlation between BLI and tumor burden. Our regression model revealed that a better correlation is associated with tumor models developed using the highest luciferase-expressing cell population, obtained through clonal selection (p<0.05). Specifically, the regression predicts a decrease of 0.1 in R² value for models that do not utilize clonally selected cells. In addition, less variability was observed for the clonally selected models, as shown by a narrower interquartile range (Figure 3).
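To make the interpretation of such a regression coefficient concrete, the sketch below fits an ordinary least squares line of R² on a single binary indicator (0 = clonally selected, 1 = not clonally selected). With one binary predictor, the slope equals the difference between the two group means and the intercept equals the mean of the reference group. The values are hypothetical, and the assumption of a single-predictor model may not match the specification actually used in STATA.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical R^2 values (illustrative only, NOT data from the review).
clonal = [0.90, 0.85, 0.95]
non_clonal = [0.70, 0.85, 0.80]

x = [0] * len(clonal) + [1] * len(non_clonal)  # 1 = not clonally selected
y = clonal + non_clonal

slope, intercept = ols_slope_intercept(x, y)
print(slope, intercept)
```

A negative slope here corresponds to a lower expected R² in models that were not clonally selected, which is how the reported decrease of 0.1 can be read.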

Effect of Tumor Burden Assessment Method on Correlation
In xenograft mouse models, several techniques may be used to assess tumor burden post-mortem. As such, the method of tumor assessment and its impact on the correlation with BLI were examined. Overall, the correlation between BLI and tumor burden remained strong across the different tumor assessment methods, with median R² values ranging from 0.72 to 0.92 (Figure 4). Despite this, considerable variability was observed, as demonstrated by the wide distribution of R² values across the methods of tumor burden assessment. Interestingly, the regression model demonstrated a significant negative relationship between R² and measurement of tumor burden by non-BLI imaging-based techniques (p<0.0001). Indeed, the correlation with BLI data was the lowest (median R²=0.71) when non-BLI imaging techniques (MRI/CT/PET/ultrasound) were used to assess tumor burden, as compared to the traditional methods for evaluating tumor burden (i.e. calipers and tumor weight). To further our analysis, we grouped the models by method of tumor burden measurement (i.e. calipers, tumor weight, or non-BLI imaging-based techniques) and evaluated the impact of tumor location (s.c., i.p. or o.t.) and disease presentation (solid or disseminated tumor) on the R² values. In models measuring tumor burden by caliper, no relationship was found between R² and tumor origin (p=0.0654). Similarly, no association was found when the analysis was restricted to solid tumors (p=0.2167). Comparison between models of disseminated disease was not done due to the small sample size in both the s.c. and i.p. groups (n=3 and n=2, respectively). Nonetheless, all three model types produced good correlations, although high variability was found in i.p. models (Figure 5). In contrast, in solid tumors, BLI correlated positively with tumor burden by weight in o.t. models (p<0.05). However, no relationship was found in disseminated tumors regardless of the model type (p=0.9236) (Figure 6). Interestingly, the regression analysis demonstrated a significant positive correlation between R² and non-BLI imaging techniques in i.p. models, regardless of whether the disease presented as a solid tumor or in a disseminated pattern (p<0.05) (Figure 7).

Figure 5. The correlation between tumor burden measured using bioluminescence signal and tumor burden measured using calipers in subcutaneous (s.c.), intraperitoneal (i.p.) and orthotopic (o.t.) models in a) solid tumor and disseminated disease; b) solid tumor alone; i.p. models were removed from analysis due to small sample size (n=3). Median and interquartile range are indicated on the scatter plot.

Figure 6. The correlation between tumor burden measured using bioluminescence signal and tumor burden measured using the weight of tumor nodule(s) in subcutaneous (s.c.), intraperitoneal (i.p.), orthotopic (o.t.), intravenous (i.v.) and spontaneous transgenic models in a) solid tumor and disseminated disease; b) disseminated disease alone; and c) solid tumor alone; several models were removed from the graphical presentation and analysis due to small sample size (n≤3). Median and interquartile range are indicated on the scatter plot.

DISCUSSION
Owing to its high sensitivity and ability to distinguish between live and dead cells, BLI has become a popular method for evaluating treatment response in xenograft models. While individual research groups validated their models before proceeding to treatment evaluation, contradictory results have been reported in the literature on the correlation between tumor burden as determined by BLI and by other techniques (5). In this review, we systematically searched two major databases from inception to July 2018 for publications that reported correlations between BLI and tumor burden in xenograft mouse models. Despite variability, BLI generally correlated strongly with tumor burden. Moderate to good correlations were observed for various methods of tumor assessment. BLI measurements were well correlated with caliper measurements for solid tumors, with tumor burden measured by weight in o.t. models, and with non-BLI imaging-based tumor measurements in i.p. xenografts. Importantly, selecting single cell populations with the highest luciferase expression improved the correlation between BLI and tumor burden.

Figure 7. The correlation between tumor burden measured using bioluminescence signal and tumor burden measured using non-BLI imaging-based techniques in subcutaneous (s.c.), intraperitoneal (i.p.), orthotopic (o.t.), and intravenous (i.v.) models with either a solid tumor or a disseminated disease; the i.v. model was removed from analysis due to small sample size (n=1) but was included in the figure for complete presentation of data. Median and interquartile range are indicated on the scatter plot.

Light generation from luciferase is highly specific to living cells due to the requirement of ATP as a co-factor. Consequently, photon output is thought to be positively correlated with tumor mass. As expected, strong correlations were observed between BLI and tumor burden in all models. Additionally, no interaction was observed between the type of tumor model and R², suggesting that BLI remains a valid tool for assessing tumor burden of xenografts in mouse models, regardless of the site of inoculation. However, a high degree of variability in correlation values was also observed in our analysis. This may be explained by the limitations of BLI techniques. Since luciferase requires ATP and oxygen for light output, it is not surprising that the most frequently reported factor leading to a poor correlation was the presence of necrosis or hypoxia in the tumors assessed. In the model developed by Godechal et al. to characterize melanoma, the authors suggested that the poor correlation between BLI and tumor weight was due to lack of oxygen and poor perfusion. As a result, limited substrate as well as cofactor (O2) led to underestimation of tumor mass by BLI (15). Moreover, in another s.c. xenograft model developed for detection of breast cancer, the authors also discussed the impact of hypoxia and necrosis on the extent of correlation between BLI and tumor burden (10). In their study, a good correlation was found between BLI and tumor volume in tumors with a diameter of less than 1.2 cm (10). In contrast, as tumors grew larger, the correlation became inferior (data not shown) due to excessive necrosis. A similar result was reported in another study using a glioma model (16). Furthermore, in the s.c. colon cancer model developed by Hadaschik et al., attenuation of light signal was observed at advanced disease stages. It was suggested that increased necrosis may be responsible for the discrepancy between tumor volume and in vivo BLI signal (17). A decrease in bioluminescent signal in areas of necrosis was further confirmed histologically in two separate studies (18,19).
Another major limitation of BLI is signal quenching. It is well known that light generated from luciferase is prone to tissue attenuation (20). In the i.p. gastric cancer model by Stollfuss and colleagues, the poor correlation between tumor volume and BLI was attributed to light scattering and absorption from tissue and organs (21). Specifically, a high detection rate of metastatic lesions was observed for those that invaded the peritoneum, whereas none of the lesions on the diaphragm, and only 1/13 of the lesions on the liver were detected (21). Another example of light quenching was demonstrated in a syngeneic, orthotopic, murine bladder model where bioluminescent signal was reduced in the hemorrhagic tumor area (22). Hemoglobin in red blood cells can decrease light penetration (23). In addition, surrounding tissue may also be affected by blood coming from the site of hemorrhaging, thereby leading to further reduction of photon emission.
Besides tissue attenuation, accumulation of ascites also has the potential to prevent light penetration. In this review, we identified two studies using ascitic models that reported correlations between BLI signal and tumor burden. Interestingly, excellent R² values of 0.8 (24) and 0.98 (25) were reported in these studies. In contrast, in an i.p., disseminated xenograft model of ovarian cancer recently developed by our group, we observed a reduction in bioluminescence after widespread ascites formation, which resulted in an inferior correlation (26). This is consistent with a previous study of an intrahepatic model developed by Sarraf-Yazdi et al., in which the correlation improved after paracentesis (27). The study by Sarraf-Yazdi et al. was not included in this review because a non-linear correlation was performed. The discrepancy observed between our group and the literature identified in this review may be explained by differences in the volume of ascites fluid and in the technique used to measure tumor burden. Our study was associated with an accumulation of more than 4 mL of ascites within the peritoneal cavity of mice. In the study by Lan et al. (27) and in that of our group, tumor burden was measured by weight. In contrast, in the study by Edinger et al., FACS was used to measure tumor burden in the liver and spleen (25). This suggests that in the presence of ascites fluid, tumor burden may be better measured by a combination of different techniques.
Discrepancies between BLI signal and tumor burden are often associated with advanced disease stages, which typically involve large tumors and sometimes the onset of ascites. Therefore, we sought to determine whether there is an interaction between tumor progression and the validity of BLI. However, no relationship was observed (p=0.1679). The inconclusive result was likely due to the heterogeneous nature of the data collected in this study. For example, it is known that different cancer cell lines exhibit distinct doubling times, and the data collected for this analysis consist of models developed from a diverse range of human or murine cancer cell lines inoculated into various strains of mice. As such, one can expect that advanced stages of tumor progression for one model may be at an optimal time for imaging in another model. It is plausible that if a large enough dataset of correlations were to be collected at various time points from the same model, a discrete interaction may be observed.
Next, we attempted to investigate the effect of the route of substrate delivery on the correlation between BLI and tumor burden. Unfortunately, a comparison could not be made because the large majority of the models administered the luciferase substrate only through i.p. injection. However, the route of substrate administration remains a factor to be considered when utilizing BLI, as it has been shown that intravenous administration of luciferin results in different tissue uptake and time to peak signal compared with intraperitoneal administration (11).
One important observation from this review is that xenograft models developed from clonally selected cell populations demonstrate a superior correlation between BLI and tumor burden. When creating a bioluminescent xenograft model, a plasmid typically containing genes for both luciferase and antibiotic resistance is introduced into cancer cells by transfection or transduction (28). While the addition of antibiotic allows for selection of successfully transfected or transduced cells, there is no control over the copy number of the plasmid introduced into the cells. This results in a heterogeneous cell population with differential luciferase expression. Since light intensity is proportional to the copy number of luciferases (29), using a mixed cell population can lead to inconsistent photon emission throughout a tumor nodule, thereby impacting the correlation. In addition, loss of copy number of the luciferase gene has been demonstrated in large s.c. tumors, leading to a poor correlation. Therefore, selection of a single cell clone with the highest luciferase expression is recommended. However, one major limitation associated with a clonally selected cell population is the loss of heterogeneity, which can impact the clinical relevance of the tumor model.
Imaging technology plays a crucial role in the detection of cancer. Beyond the traditional methods of tumor measurement by caliper and weight, imaging techniques such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and ultrasound have also been used to assess tumor burden. Interestingly, a negative relationship was found between BLI and tumor burden measured using non-BLI imaging-based techniques, likely due to the wide range of correlation values observed. A possible explanation for the discrepancies observed in this cohort could be the technical expertise required to operate these systems. While studies using ultrasound and PET generally resulted in a strong correlation (30-32), models with CT and MRI can result in an inferior correlation. In a model of breast cancer, the authors commented that the inferior correlation between BLI and CT-based assessment of tumor burden was likely due to poor animal positioning and tissue attenuation (33). Furthermore, while CT imaging provides high-resolution imaging of bony structures, its soft tissue visualization is limited (1). As such, a combination of technical limitations and human error likely contributed to the inability of CT to detect tumors in this case. In another study assessing tumor burden using MRI, the low correlation with BLI was attributed to motion artefacts from animal respiration (34). However, when respiration-synchronized images were taken, the correlation improved. Therefore, although technical error is associated with optical imaging, when properly controlled, a good correlation between BLI and tumor burden can be achieved (35-38).
Despite being regarded as a highly sensitive and reliable method for evaluating tumor burden in vivo, the accuracy of BLI models may still be impacted by various factors. However, it is surprising that validation of BLI models before subsequent experimentation is rarely performed. Of the 1,870 publications that underwent full text screening, 76% (1,422/1,870) did not report a correlation between BLI and tumor burden, and only 6% (116/1,870) validated the model. This observation is alarming, as an invalid model can lead to incorrect conclusions. Although there are limitations that are intrinsic to some tumor models (hypoxia, tissue attenuation) and difficult to control, parameters such as imaging time are surprisingly overlooked by researchers. It has been shown in several studies that the peak bioluminescent signal is reached approximately 10 minutes after an i.p. injection of D-luciferin (12,39,40). However, almost half of the eligible publications (54/116) did not report the timing of BLI relative to the administration of D-luciferin, and of those that did, only 32 groups had taken the image at the previously reported peak bioluminescent signal (i.e. at 10 minutes after i.p. injection). Additionally, the work by Inoue et al. suggested that the timing of the peak signal shifts depending on the number of days post-inoculation. Therefore, acquisition of BLI images at a predetermined time point may become inaccurate as the tumor progresses, and obtaining successive images may improve tumor monitoring by capturing the peak bioluminescent signal. Indeed, our regression analysis demonstrated a significant positive relationship between R² values and sequential imaging (p<0.05). As such, it is recommended that sequential images be acquired to avoid underestimation of tumor burden.

CONCLUSION
Collectively, we have shown that BLI remains a valid tool for preclinical assessment of tumor growth and drug response. Since many factors can influence BLI, conflicting results on its correlation with tumor burden have been reported, as shown by the high disparity observed across the analysis. Therefore, no single factor may be identified as a general limitation of the BLI technique. As such, we propose the following considerations when utilizing BLI in small animal imaging: 1) derive xenografts from single-cell populations with the highest luciferase expression using the limiting dilution method; 2) use appropriate non-BLI assessment methods to confirm tumor burden depending on the model of choice; 3) acquire sequential images to capture the peak bioluminescent signal; 4) complement the overall analysis with traditional efficacy assessments such as median survival, since BLI may become less reliable at advanced disease stages. In light of this, we hope to bring consensus between researchers and to refine the approach to BLI-based assessment of tumor burden in preclinical xenograft cancer models.