Overcoming the linguistic divide: a barrier to consumer health information

Seeking health information online has become very popular. Despite this popularity, health consumers face many barriers to successfully retrieving good-quality health information. This paper reviews the literature on the linguistic divide between health consumers and consumer health information. Consumer health vocabularies (CHV) and natural language processing (NLP) show potential for bridging this divide, thereby improving the recall and precision of information retrieval systems. Developers of digital libraries can incorporate CHV and (or) NLP as help tools to facilitate health consumers' search success. Deeper issues, such as health consumers' mental representation of the medical domain, must also be addressed in future research if such help tools are to deliver their full benefit.


Introduction
In the information age, patients have shifted from being "passive recipient[s] of healthcare" to being health care consumers [1]. Consumer health information supports a wide variety of needs, including the promotion of health and wellness, use of health care services, information about diseases and conditions, and information about medical tests, procedures, and treatments [2]. The public is using this information to inform decisions about wellness and health care treatment [3]. However, the potential for knowledge dissemination is limited by the digital divide and other barriers to access [4]. Of these barriers, one of the more tractable is the rift between health consumers' language and medical vocabulary. After a general introduction to consumer health searches, this paper focuses on efforts to democratize retrieval of consumer health information by overcoming this linguistic divide.
Knowledge of the obstacles consumers face when seeking information online can inform the development of digital libraries. In the context of this paper, a "digital library" is broadly defined as any institution that, possibly in addition to providing access to a collection of print consumer health material, acts as a repository for online health information and (or) a portal to selected consumer health Web sites. Such institutions range from the bricks-and-mortar public library with a mandate to provide consumer health information online, to consumer-focused Web sites, to government-sponsored information portals.

Data sources and selection
The following sources were mined for articles: Library, Information Science and Technology Abstracts (LISTA), Library and Information Science Abstracts (LISA), MEDLINE, Web of Science, ACM Digital Library, PsycINFO, CHASS, Google Scholar, and Google. Articles on the following topics were included in this narrative review: health consumers or laypersons; health consumers' models (mental representations, categories) of disease and health; information retrieval of consumer health information (CHI) by health consumers; Internet searches for CHI (in general and specifically in Canada); statistics on Internet searches for health information; consumer health vocabularies; and natural language processing. Articles had to be written in English. No date limits were set; the final search took place in April 2009.

The health consumer
Health consumers most commonly seek information about conditions and diseases [5-9]. Searches are most often conducted by patients with a medical condition looking for information on a specific condition [5,7,10,11]. Searches help to improve the health consumer's understanding of a health condition and promote further research [5,7]. The impact of a search for health information depends on context but may be greater for individuals who have received a serious diagnosis or are experiencing a health crisis [5].
In Canada, the number of households seeking health information from the Internet rose from 15% in 1999 to 36% in 2003 [12]. In 2004, 65% of households used the Internet at home to search for health-related information in a typical month [13], with comparable percentages in 2005 and 2007 [14]. These findings speak to Canadians' growing interest in accessing health information through digital technology. What these numbers do not reveal is the continued presence of a digital divide [15,16]. In a rural Canadian community, the most frequently cited sources of health information (each cited by ~60% of respondents) were the doctor and the Internet [17]. Seventy-four percent of residents in a rural Ontario community had looked for health or medical information in the year preceding the survey [17]. The Internet was the most frequently cited source of health information for women in a rural Ontario community [16]. Despite these numbers, some respondents did not have access to a telephone or the Internet. In general, the percentage of Canadian households seeking health information from the Internet was lower for rural (32.4%) than for urban (42.4%) households. If one looks only at households that use the Internet at home, this disparity is smaller (rural, 55.3%; urban, 59.5%) [14], which suggests that access to the Internet itself may be an impediment for rural Canadians.
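The size of the rural-urban gap in the two comparisons can be made explicit using only the figures reported above [14]; this is simple arithmetic on the published percentages, not a new analysis:

```latex
\begin{aligned}
\text{all households:}\quad & 42.4\% - 32.4\% = 10.0 \text{ percentage points} \\
\text{households using the Internet at home:}\quad & 59.5\% - 55.3\% = 4.2 \text{ percentage points}
\end{aligned}
```

Restricting the comparison to households with home Internet use shrinks the gap by more than half, which is consistent with (though does not by itself prove) the interpretation that Internet access accounts for much of the rural disadvantage.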
Few consumers start their search for health information at medical portals, sites of medical associations, or libraries [18]. According to the Pew Internet and American Life Project, most online searches by health consumers begin with a search engine (66%), while only 27% begin at a health-related Web site [5]. Similar results were found in a rural community in Ontario [17]. Despite the Canadian government's emphasis on e-health, there is evidence that Canadians are not aware of the government's health information portals and (or) do not tend to seek out those sites for their health information needs [16,17].
A plethora of health-related Web sites and digital libraries are available, and people differ greatly in how effectively they identify authoritative sources and appraise content [19]. Many health consumers could benefit from guidance in the selection of sources [9]. Online searches by health consumers often fail [9,20,21]. Interestingly, the majority of health consumers view the results of their searches positively [5,9], even when those searches have "failed" by objective criteria [9]. From the consumers' perspective, the volume of information, unhelpful results, problems interpreting resources, and difficulties navigating are all issues [5,9,16]. Low reading literacy can compound the problem of access to health information [22]. From a usability perspective, language [21,23-27], poorly formed search queries [9,18,28,29], use of short forms (abbreviations, acronyms) or slang [26,29,30], and spelling errors [11,20,28] are all barriers to retrieval of consumer health information.
A more fundamental barrier is health consumers' conceptualization of disease and illness. Patients and health professionals differ in their mental models of disease and illness, as well as in the language they use to express medical concepts [24,27,31]. Laypersons interpret health in many ways, from the absence of illness to the "capacity to do" [32]. Physicians are concerned with bodily mechanisms and the pathophysiological causes of illness (the disease model), whereas patients think about their health conditions in terms of a narrative reconstruction of events in their daily lives [33]. In light of these differences, it is not surprising that patients often find doctors' responses difficult to understand and that physicians feel they are not adequately trained to communicate about health issues [34].

The linguistic divide
Arguably the most tractable issue for a digital library that seeks to address the needs of health consumers by adding search functionality is the language gap. As described above, a linguistic rift exists between medical professionals and laypersons [23-27]. The layperson may neither understand the terminology used by his or her doctor during consultations nor have sufficient knowledge of basic anatomy [23]. Variability in consumer health language is driven by differences in social, cultural, educational, and personal or familial health backgrounds in the general public [35].
There is a reassuring degree of overlap between consumer and medical vocabularies, but there are also discrepancies [31]. Where mismatches occur, they fall into three types: lay synonyms, lay usage, and lay terms that cannot be mapped [27,31]. Lay synonyms are terms with different lexical forms (i.e., word forms) but the same meaning (e.g., heart attack and myocardial infarction). Lay usage refers to terms whose lexical form is the same in lay and medical vocabulary but whose meaning differs. For instance, the term "negative" (lexical form) is present in both medical and lay vocabularies, but it may mean different things to the two groups ("no indication" versus "unfavourable"). Finally, an unmappable lay term differs from the professional vocabulary in both lexical form and meaning (e.g., "soul", which has no equivalent professional term).
The most difficult of the three for cross-boundary communication is lay usage. Without careful exploration of the intended meaning, one may incorrectly assume that the health consumer's definition of a term matches that of the medical profession or of a controlled vocabulary such as the Unified Medical Language System (UMLS). Concepts captured by lay usage are also harder to identify with many of the research methods used in the consumer health vocabulary literature. For instance, analysis of transaction logs from a database or Web site does not reveal how searchers defined the terms they used in their queries. To return to the example above, the term "negative" may appear in a layperson's search query, but it might be a mistake to assume that the consumer was using the term in the same sense as a medical professional. These are likely the very concepts of greatest importance for identifying discrepancies in medical perspective between consumers and health professionals, because they carry a risk of falsely assumed shared understanding.
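The taxonomy of lay-professional mismatches [27,31] can be expressed as a small decision procedure. The sketch below is illustrative only: the toy vocabulary, the concept labels (e.g., HEART_MUSCLE_DEATH), and the function name classify are hypothetical, and it adds a fourth SHARED outcome for terms that agree in both form and meaning.

```python
from enum import Enum


class Mismatch(Enum):
    SHARED = "shared"            # same form, same meaning (not a mismatch)
    LAY_SYNONYM = "lay synonym"  # different form, same meaning
    LAY_USAGE = "lay usage"      # same form, different meaning
    UNMAPPABLE = "unmappable"    # no professional counterpart at all


# Hypothetical toy professional vocabulary: term -> concept label it denotes.
PROFESSIONAL = {
    "myocardial infarction": "HEART_MUSCLE_DEATH",
    "negative": "NO_INDICATION",
}


def classify(lay_term: str, lay_meaning: str) -> Mismatch:
    """Place a (lay term, intended meaning) pair in the mismatch taxonomy."""
    term = lay_term.lower().strip()
    if term in PROFESSIONAL:
        # Same lexical form exists; compare what each group means by it.
        if PROFESSIONAL[term] == lay_meaning:
            return Mismatch.SHARED
        return Mismatch.LAY_USAGE
    if lay_meaning in PROFESSIONAL.values():
        # The concept exists professionally, but under a different word form.
        return Mismatch.LAY_SYNONYM
    return Mismatch.UNMAPPABLE
```

Under these assumptions, "heart attack" meaning HEART_MUSCLE_DEATH is a lay synonym, "negative" meaning "unfavourable" is lay usage, and "soul" is unmappable. Note that classify needs the consumer's intended meaning as an input; that is exactly what a transaction log does not record, which is why lay usage is so hard to detect in practice.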

Consumer health vocabularies (CHV)
In recognition of the linguistic rift, new vocabularies are being developed to bridge the gap between laypersons' everyday "medical" language and professional medical terminology [35,36]. Consumer health vocabularies are "expressions (i.e., words and phrases) commonly used by laypersons to refer to medical concepts" [37]. Consumer health vocabularies can be used in information retrieval, medical records, and health care applications. They typically have to be mapped to more standard medical vocabularies, such as those in the UMLS Metathesaurus [35].
The process of generating a CHV requires "the identification and characterization of consumer expression by selecting and annotating candidate terms from a corpus, analyzing contextual information to discern the intended meaning, and reaching consensus among reviewers" [37]. Identification of terms consumers use to communicate about health conditions, symptoms, treatment, and wellness can be achieved with collaborative human review and automated methods [38]. Enabling technologies that map consumer terminology to clinical controlled vocabularies are in development. For instance, the Consumer Health Vocabulary Initiative (http://consumerhealthvocab.org) is a multidisciplinary project promoting research and development of consumer health vocabularies [35].
According to Zeng and Tse [35], a "first-generation" CHV is a collection of forms used in health-oriented communication for a particular task or need (e.g., information retrieval) by a substantial percentage of consumers from a specific discourse group, together with the relationship of those forms to professional concepts. One component in the development of a CHV by members of the Consumer Health Vocabulary Initiative (CHVI) has been the construction of a Web application for CHV development [37]. The application supports mapping consumer terms to a controlled vocabulary, searching for "loose ends" (terms that have not been mapped to a controlled vocabulary, e.g., "heart attack" might not map to "cardiac arrest"), and, finally, review of mappings by multiple reviewers for quality control. The application eases this lengthy and tedious task. It is a phased, distributed, and user source-based approach that had produced over 1000 concepts as of June 2005 [35]. Professionals from a wide variety of backgrounds have contributed their knowledge to this project, including physicians, nurses, informaticians, linguists, and medical librarians. Moreover, it is an open access project. There are plans to add the Open Access Collaborative Vocabulary developed by the CHVI to the UMLS at the US National Library of Medicine [39].
Despite advances made by projects such as the CHVI, Zeng and Tse's definition, which ties a CHV to a specific discourse group, highlights the variability in consumer health language [35]. The variability of consumers' medical vocabulary [31] presents a significant barrier to information access, as well as a challenge for the development of a universal English-language consumer health vocabulary. That is, a "one size fits all" solution may not be possible.
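The map, flag, and review workflow described for the CHVI Web application can be sketched as follows. This is not the CHVI application's code or data model: the PROPOSALS data, the mapping targets, the majority-vote rule, and the min_agreement threshold are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical reviewer proposals: consumer term -> controlled-vocabulary targets,
# one entry per reviewer. Targets are plain strings, not real UMLS concept IDs.
PROPOSALS = {
    "heart attack": ["myocardial infarction", "myocardial infarction"],
    "high blood pressure": ["hypertension", "hypertension", "hypertension"],
    "sugar": ["diabetes mellitus", "glucose"],
    "soul": [],  # no reviewer found a matching controlled-vocabulary term
}


def review(proposals: dict[str, list[str]], min_agreement: int = 2):
    """Split consumer terms into accepted mappings, disputed terms, and loose ends."""
    accepted: dict[str, str] = {}
    disputed: list[str] = []
    loose_ends: list[str] = []
    for term, targets in proposals.items():
        if not targets:
            loose_ends.append(term)  # unmapped: flag as a loose end
            continue
        target, votes = Counter(targets).most_common(1)[0]
        if votes >= min_agreement:
            accepted[term] = target  # enough reviewers agree on one target
        else:
            disputed.append(term)    # reviewers disagree: send back for review
    return accepted, disputed, loose_ends
```

With the default threshold of two agreeing reviewers, "heart attack" and "high blood pressure" are accepted, "sugar" (split between two readings) is sent back for further review, and "soul" is reported as a loose end.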

Natural language processing (NLP)
Natural language analysis offers another option as an intermediary between a layperson's terminology and the controlled vocabulary often used by a search engine. Natural language processors are "algorithms that allow computers to process and understand human languages" (The Stanford Natural Language Processing Group, http://nlp.stanford.edu/). Natural language processing is being explored for a variety of applications. For instance, the Stanford Natural Language Processing Group is researching topics ranging from sentence understanding to automatic question answering.
Natural language searches have been proposed as a solution to consumers' search problems [11]. For instance, Brennan and Aronson [40] used the free text of laypersons' electronic messages to develop an NLP application that links the consumer's query to the UMLS. MetaMap is a program designed to parse free text into noun phrases, identify UMLS terms related to those noun phrases, and then retain the concepts from the best-matching terms in the controlled vocabulary. Brennan and her colleagues used MetaMap to detect the presence of UMLS terms within the natural language of a consumer's e-mail query. MetaMap is reasonably effective at mapping the consumer's terms to the controlled vocabulary, although precision and coverage depend on which controlled vocabulary is used. Natural language processing has been implemented to varying degrees in a wide variety of search engines [41].
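The three steps attributed to MetaMap above (parse free text into phrases, find candidate vocabulary terms, keep the best match) can be illustrated with a deliberately crude sketch. It is not MetaMap and does not use its API: MetaMap relies on a real parser, lexical variant generation, and candidate scoring against the UMLS, whereas here phrase chunking is approximated by splitting on a hypothetical stopword list and "best match" is assumed to mean the longest exact match in a small hypothetical vocabulary (VOCAB).

```python
import re

# Hypothetical mini-vocabulary: surface string -> concept label.
# A real system would query the UMLS Metathesaurus; these entries are illustrative.
VOCAB = {
    "high blood pressure": "Hypertension",
    "blood pressure": "Blood pressure",
    "headache": "Headache",
    "heart attack": "Myocardial infarction",
}
# Hypothetical stopword list used as crude phrase boundaries.
STOPWORDS = {"i", "have", "and", "a", "my", "what", "is", "the", "of", "do", "about"}
MAX_LEN = max(len(k.split()) for k in VOCAB)


def chunk(text: str) -> list[list[str]]:
    """Crude stand-in for noun-phrase parsing: split on punctuation and stopwords."""
    phrases: list[list[str]] = []
    current: list[str] = []
    for token in re.findall(r"[a-z']+|[.,;?!]", text.lower()):
        if token in STOPWORDS or not token[0].isalpha():
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases


def map_concepts(text: str) -> list[str]:
    """Greedy longest exact match of word spans in each phrase against VOCAB."""
    found: list[str] = []
    for words in chunk(text):
        i = 0
        while i < len(words):
            for n in range(min(MAX_LEN, len(words) - i), 0, -1):
                candidate = " ".join(words[i:i + n])
                if candidate in VOCAB:
                    found.append(VOCAB[candidate])
                    i += n
                    break
            else:
                i += 1  # no vocabulary entry starts at this word; skip it
    return found
```

For example, map_concepts("I have high blood pressure and a bad headache, what is a heart attack?") returns ["Hypertension", "Headache", "Myocardial infarction"]; the longest-match rule prefers "high blood pressure" over the shorter entry "blood pressure".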

Obstacles for CHV and NLP
Although promising, the application of CHV and NLP to improve consumer health searches is challenging. Abbreviated word forms, such as acronyms and abbreviations, are a challenge for support tools relying on NLP. For instance, a natural language processor may misinterpret a lay term that happens to be an exact match for a UMLS term [26]. Smith [26] cites the example of "LATS": MetaMap interprets LATS as an acronym for the UMLS term "Long-acting Thyroid Stimulator [Amino Acid, Peptide, or Protein, Immunologic Factor]", when in fact a layperson may be searching for the muscle "latissimus dorsi". Plovnick and Zeng [29] found that replacing acronyms, abbreviations, and lay terms with UMLS vocabulary often improved search success. However, some UMLS terms are esoteric or a poor fit for the consumer's question (e.g., "epilepsy, absence" for "petit mal seizure"), resulting in poorer search success after query reformulation [29]. Spelling suggestions can help; in particular, spelling assistance designed specifically for medical queries shows promise [42]. Consumers' queries are also often ambiguous and poorly formulated, and no matter how well designed the CHV or natural language processor, such queries are likely to be difficult (if not impossible) for an automated system to reformulate or interpret effectively.
The help tools described above (e.g., spelling suggestions, mapping consumer health vocabulary to a controlled medical vocabulary, reformulating queries) address only one aspect of consumers' search problems. Successful information retrieval depends on competency in domain knowledge, general search strategies, resource knowledge, metaknowledge, and language [43]. For health consumers, a fundamentally impoverished mental representation of the medical domain underlies problems with terminology, and it is likely to affect three stages of information retrieval outlined by Keselman and her colleagues [43]: (i) the formation of a theory or hypothesis based on background knowledge, (ii) generation of a search goal, and (iii) evaluation of search results. Domain knowledge influences search strategies and the ability to benefit from help tools [44-47]. Therefore, the success of CHV-based efforts is likely to be limited without mechanisms that help consumers form a richer mental representation of the medical or health issue about which they seek information. Some of this kind of assistance occurs in in-person interactions with a librarian, but it is largely absent from interactions with digital libraries.
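One way a help tool might respond to ambiguous short forms and misspellings is to offer candidate readings rather than silently rewriting the query. The sketch below assumes hypothetical word and abbreviation lists (MEDICAL_TERMS, ABBREVIATIONS) and uses the string similarity in Python's standard difflib module, a generic technique rather than the medical-specific spelling method evaluated in [42].

```python
import difflib

# Hypothetical domain word list; a real tool would draw on a curated medical lexicon.
MEDICAL_TERMS = ["diabetes", "diarrhea", "hypertension", "migraine",
                 "latissimus dorsi", "arthritis"]

# Hypothetical abbreviation table: one short form with several plausible readings.
ABBREVIATIONS = {
    "lats": ["long-acting thyroid stimulator", "latissimus dorsi"],
}


def suggest(query_term: str, cutoff: float = 0.75) -> list[str]:
    """Return candidate readings for one query term instead of rewriting it silently."""
    term = query_term.lower().strip()
    if term in ABBREVIATIONS:
        # Ambiguous short form: surface every expansion and let the user choose.
        return list(ABBREVIATIONS[term])
    if term in MEDICAL_TERMS:
        # Already a known term: nothing to correct.
        return [term]
    # Otherwise offer up to three close spellings from the word list (possibly none).
    return difflib.get_close_matches(term, MEDICAL_TERMS, n=3, cutoff=cutoff)
```

Under these assumptions, suggest("LATS") returns both expansions, so the user rather than the system chooses between the thyroid stimulator and the muscle, while suggest("diabetis") offers "diabetes".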

Conclusion and future directions
This paper has explored barriers to consumer health information by examining characteristics of health consumers and their online search activities. The lay public is actively seeking material on health and wellness [3,4], but these efforts are often hampered by knowledge gaps [23]. A linguistic rift poses a serious impediment [9,27]. Bridging this linguistic divide has been approached from a variety of angles. These lines of research offer significant promise and, in the case of the consumer vocabulary development efforts of the Consumer Health Vocabulary Initiative, are made generally available to the community. Digital libraries with consumer health collections can take advantage of these initiatives to optimize information retrieval for their patrons.
Consumer health vocabularies (CHV) and natural language processing (NLP) are not panaceas, as deeper issues concerning the health consumer's mental representation of health and medicine also impede information retrieval. Solutions are challenging and require educational efforts [24]. This is an interesting and complex topic that requires more research. Despite the complexity of the issue, relatively simple measures may be possible in the meantime. For instance, the National Library for Health in the UK includes an online medical dictionary in a prominent location.
In addition, the hurdle from theory to practice must be overcome. Digital libraries must not only implement new advances (e.g., new consumer health vocabularies) in their search engines but also conduct thorough usability analysis and testing to ensure that the system works for their patrons. The translation from research to conceptual design and, finally, to implementation of a "physical" instantiation of a project is a complex process [48]. Collaborative efforts to offer digital library services to consumers have repeatedly found that acquiring user input is critical to optimizing system functionality [49]. User input at all stages of a project identifies problems before the system becomes too entrenched to change. Simple and cost-effective techniques are available, such as paper prototyping and storyboarding [50], enabling even the most financially constrained digital library initiative to conduct some level of user testing.
Democratizing access to health information requires financial resources and a commitment to understanding users' interests, competencies, and motives. A substantial body of research on improving health consumers' access to health information via digital technologies can inform digital library development. Digital libraries can offer enhanced search capabilities, integrated into the design of their search engines, as part of their services.