“How Do I Do That?” A Literature Review of Research Data Management Skill Gaps of Canadian Health Sciences Information Professionals

Abstract: There is a recognized need to provide research data management (RDM) services in health sciences libraries. A review of the literature reveals numerous strategies to provide training for health sciences librarians as they provide RDM services to health sciences researchers, faculty, and students. However, no consensus emerges through this literature review with respect to RDM training initiatives. With training initiatives being developed and documented, more in-depth research will emerge that verifies which initiatives have the greatest success for upskilling information professionals in managing research data. This is an area where future library and information studies research can be conducted. The author hopes this literature review will inform a future survey to gain more perspective on RDM in a Canadian health sciences library context.


Introduction
Research data management (RDM) is an increasingly common set of practices offered in Canadian health sciences libraries. RDM, as defined by Whyte and Tedds [1], "concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results." RDM organizes data that is created during the overall research lifecycle to make it accessible for current and future users. Managing research data is complex and comprises wide-ranging data services such as planning, data curation, data storage, data hosting, and long-term preservation. Specific services include consultations on funding body compliance, creating informatics, data licensing, depositing data into digital repositories, among others [2].
Though researchers are the ones creating data, they may not be the best managers of data. Conrad, Shorish, Whitmire, and Hswe claim "[m]ost researchers have not been formally trained to manage their own data" [3]. Who should manage research data, if not researchers? Library and information studies (LIS) researchers analyzed the impending "data deluge" and recognized librarians could provide services to manage research data [4][5]. Librarians have a long history of organizing and managing digital resources, making RDM a natural fit. Conrad, et al. recognize the link between curation of digital library resources and research data as they write: "[m]anagement of content and data aligns logically with digital curation practices" [3].
However, are health sciences librarians prepared to provide this new service? Do health sciences librarians have the necessary skills to provide expert data services? This paper reviews literature discussing data management services, highlights skills necessary to provide RDM services in different contexts, and identifies training initiatives that could be used in Canadian academic health sciences libraries. Following the literature review, potential research opportunities are presented to study how best to bridge the gap in research data management skills among Canadian health sciences librarians.

Methods
The author searched the University of Manitoba Libraries' discovery service (Ex Libris Primo) to find an initial set of peer-reviewed resources on research data management in academic and health sciences libraries. The author used the search string: ("research data management" OR "RDM" OR "research data services") AND ("academic" OR "health sciences" OR "medical" OR "medicine") AND librar*. The author limited the search results to scholarly articles. This search returned 2 142 results. The author completed further searches in MEDLINE, Scopus, and Web of Science using the same search string. In MEDLINE, the search string was modified by omitting quotation marks and searching the following field codes: .ti (title), .ab (abstract), .ot (original title), .kf (keyword heading word), and .hw (subject heading word). The author then used "pearl growing," or scanning of selected reference lists, of reliable and scholarly articles. The author defined reliable and scholarly articles as those with a Field-Weighted Citation Impact score greater than 1.00 in Scopus. Overall, the author found 38 articles that discussed research data management to build a historical narrative in this emerging area of LIS. As well, these articles list specific skills and training initiatives for information professionals to provide RDM library services. The author prioritized articles with a geographic focus on Canada; however, the author found little research in this area. The author had to discard one article due to its focus on bibliometrics despite the title indicating split coverage of bibliometrics and RDM [6]. The author included Cronin [7] and Gieryn [8] to reflect early cross-disciplinary activities based on feedback received from peers. Because research data services are an emerging field in LIS, sources are largely compiled from the mid-1990s to the present, with a focus on sources published after 2012.
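The search strategy above can be sketched in code for readers who wish to adapt it. The following is a minimal Python sketch, not the author's actual workflow: the concept groups and field codes come from the search reported above, while the function names and the one-line-per-concept layout of the field-coded variant are assumptions made for illustration.

```python
# Concept groups taken from the search string reported in the Methods.
CONCEPTS = [
    ["research data management", "RDM", "research data services"],
    ["academic", "health sciences", "medical", "medicine"],
]
TRUNCATED_TERM = "librar*"

# Field codes listed for the MEDLINE search.
MEDLINE_FIELDS = ["ti", "ab", "ot", "kf", "hw"]


def build_discovery_query(concepts, truncated_term):
    """Join quoted terms with OR inside each group, then AND the groups."""
    groups = ["(" + " OR ".join(f'"{t}"' for t in group) + ")" for group in concepts]
    return " AND ".join(groups + [truncated_term])


def build_medline_lines(concepts, truncated_term, fields):
    """Hypothetical field-coded variant: no quotation marks, one line per group.

    The exact layout the author used in MEDLINE is not reported; this is an
    assumed arrangement for illustration only.
    """
    suffix = "." + ",".join(fields) + "."
    lines = ["(" + " OR ".join(group) + ")" + suffix for group in concepts]
    lines.append(truncated_term + suffix)
    return lines


discovery_query = build_discovery_query(CONCEPTS, TRUNCATED_TERM)
medline_lines = build_medline_lines(CONCEPTS, TRUNCATED_TERM, MEDLINE_FIELDS)

print(discovery_query)
for line in medline_lines:
    print(line)
```

Keeping the concept groups as data rather than as a fixed string makes it easier to add or remove synonyms and to regenerate both variants consistently when the search is rerun.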

Cross-Disciplinary Activities in LIS
RDM is a new service provided by health sciences libraries, but its roots can be seen as far back as the 1980s in cross-disciplinary scientific research [7][8]. "Before information professionals can begin to improve existing services or develop new approaches that account for the complex needs of contemporary researchers," Palmer writes, "[information professionals] need to understand the activities and patterns involved in the cross-disciplinary research process" [9]. Palmer sees academic librarians as active participants in cross-disciplinary research, which was becoming increasingly common throughout the 1990s. For Palmer, librarians step outside traditional domains of knowledge to support researchers doing "boundary work," or work that interacts with multiple fields, such as sharing data between labs. This includes improving access to scholarly documents, which has a direct link to making research data more accessible. Palmer suggests, "[w]e will need to develop new standards and criteria for the presentation of raw data and results and create platforms for discussion around materials." Storing and providing access to research data becomes increasingly significant as research data proliferates, especially in digital formats.

The Data Deluge
Increasing amounts of research data are shared through cross-disciplinary studies in academic institutions. It is therefore important to define what research data is. In the late twentieth and into the early twenty-first century, researchers saw an exponential increase in scientific research data, with digital data becoming increasingly common [4,9]. Hey and Trefethen called this the "data deluge" and predicted an increase in repositories to store data, not unlike current domain or institutional repositories [4].
Much of the increased digital data stems from scientific fields and is given the name "e-science." E-science joins experimental, theoretical, and computational approaches in scientific research [10]. Jim Gray, a respected computer scientist, as quoted in Hey et al., says "[t]he goal is to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other. Lots of new tools are needed to make this happen" [10]. The world Gray describes is overwhelming and intimidating. Research data spans a diverse range of types and formats. Surkis and Read [11] use the example of studying before-and-after MRI images of patients in a clinical trial. The data produced in this study includes folders filled with images, spreadsheets of drug dosages, data examining tumour size, and processed data used to create figures for publication [11]. With the amount and variety of research data, it is clear that services are required to manage e-science data.

Services to Manage Research Data
The deluge of e-science data necessitates management. Read, Surkis, Larson, et al. interviewed basic scientists and clinical researchers and found major challenges in managing data, including lack of standards, a diverse range of types of data, and low quality of data associated with inconsistent data collection methods [12]. These challenges opened the door for libraries to offer data services. A major white paper published by Tenopir, Birch, and Allard gives RDM services scope [13]. The authors claimed that in 2012, "[o]nly a small minority of academic libraries in the United States and Canada currently offer research data services (RDS), but a quarter to a third of all academic libraries are planning to offer some services within the next two years." Yakel divides data management into five areas: "lifecycle management of materials; active long-term involvement by data creators and managers; appraisal and selection of materials; provision of access; and preservation" [14]. Lee and Stvilia list many lifecycle models of research data but use the Digital Curation Centre's (DCC) Curation Lifecycle Model in their study of the roles of institutional repository staff [15]. The DCC model lists the sequential activities in curating and preserving data: conceptualize, create or receive, appraise and select, ingest, preserve, store, and access, use and reuse [16].
Looking at the data management lifecycle, Walters and Skinner's report, New Roles for New Times: Digital Curation for Preservation, outlines academic librarians' role as "collaborative network creators and participants," which sees American academic librarians building digital frameworks to make scholarly data accessible [17]. Harkening back to Palmer (1996), Pryor and Donnelly describe data practitioners as "hybrid information specialists with boundary-spanning roles" [18]. Lee and Stvilia [15] list specific activities for data curators and metadata specialists: building "data governance structure," helping "data providers to create appropriate metadata for their dataset," maintaining software, and metadata creation. Lyon sees information professionals offering RDM planning, training, citing, licensing, and storage [19].
There are several data management service models proposed throughout the literature to deliver RDM services. Pinfield, Cox, and Smith describe a library-oriented model, in which library staff manage research data directly [20]. Whyte provides further clarity to the library-oriented model [21]. He sees libraries providing 3 levels of data management services: minimal, mediated, and consultancy. Others see librarians collaborating with other departments, echoing Palmer [9]. Wittenberg and Elings' case study shows a successful partnership of the University of California, Berkeley's Library with the Research Information Technologies department to manage research data [22]. Wang and Fong's study at Rutgers University-Newark sees health sciences librarians playing a central role by being embedded directly into the research process, including processing data in the lab and developing data management plans [23]. Wherever health sciences librarians physically find themselves, LIS researchers see them being involved in the management of research data.

National Policies on Research Data Management
Antell, Foote, Turner, and Shults explore RDM services in light of requirements by the United States' National Science Foundation (NSF) to have a data management plan before applying for grants [24]. Their work has relevance to Canada since Canadian federal funding bodies have similarities to NSF's requirements, such as the Tri-Agency Open Access Policy on Publications. In this policy, researchers who receive federal funding are mandated to make any peer-reviewed publication stemming from this funding open access within 12 months [25]. Further, Canadian Institutes of Health Research grant recipients are required to deposit "bioinformatics, atomic, and molecular coordinate data" into a publicly accessible database and retain datasets for a minimum of 5 years [25].
A research data management draft policy is currently being circulated for consultation by the Tri-Agencies for managing research data, which complements the existing Tri-Agency Statement of Principles on Digital Data Management [26]. In the draft policy, researchers are required to provide a complete data management plan (DMP) and deposit all research data into a recognized digital repository [27]. In addition, institutions that administer Tri-Agency funds will be required to have a local research data management policy [27]. It is the hope that with this policy, RDM will be seen as an integral step in the research process. Health sciences information professionals are well-positioned to provide services in this emerging area, including providing assistance with creating a DMP and providing support with depositing and preserving data in repositories.
Though Canadian perspectives are difficult to find, Steeleworthy speaks to the Tri-Agency requirements for open access [28]. However, because the Tri-Agencies' policies are relatively new, the policies are in flux and Steeleworthy's article shows its age. Steeleworthy advocates partnering with stakeholders and unravelling shifting open access requirements by federal funders. He recognizes managing research data is multi-faceted and includes scholarly communication, information technology, and liaison services in the provision of management services. Guindon presents an informal survey of research data practices at Concordia University, but it is not comprehensive as it only surveys one institution in Canada, nor is it peer-reviewed [29].

Infrastructure of Research Data Management
Health sciences information professionals currently provide, or are able to provide, a wide variety of services to manage research data, but adequate infrastructure is needed to properly preserve data. Long-term storage of scientific data is a consideration for RDM as this is how data will be preserved and made accessible. Digital repositories are one way to deposit, store, and make research data accessible. These online platforms host and provide access to research documents and data. Among the 3 types of digital repositories (domain, discipline, and institutional), academic institutions commonly offer institutional repositories which store theses, dissertations, and data produced at the university [30]. All U15 research institutions in Canada have institutional repositories, such as the University of Alberta's Education & Research Archive, the University of Toronto's Tspace, and the University of Manitoba's Mspace [31].
However, institutional repositories are generally not capable of ingesting large amounts of research data. Rather, Canadian institutions such as the University of Manitoba and Dalhousie University have implemented Dataverse, software for managing research data which also includes a repository.
Pinfield, Cox, and Smith highlight the need for secure storage of research data, albeit in a UK context [20]. In semi-structured interviews with librarians, they found data storage was often prioritized in their institutions and one of the major reasons research data services were being offered in the first place. The researchers note necessary collaboration with IT departments may limit libraries' access to storage, important to note for health sciences libraries without internal storage solutions.

Health Sciences Librarians' RDM Skill Sets
With new and evolving roles in data management, it should not be a surprise that health sciences librarians are currently underskilled to offer fully developed data management services. For example, Read, et al. found a perception from basic scientists and clinical researchers that librarians "do not understand research data and have no role to play in data management" [12]. This suggests a gap in skills, a lack of advocacy and marketing to show librarians' value, and a failure to translate traditional librarian skill sets to this new domain. While not a prerequisite to managing research data, Lyon claims few librarians have direct experience working in scientific environments, such as a laboratory, and do not feel comfortable providing data curation for research studies [32]. Lyon terms this the "curation domain disconnect" [33].
Cox, Kennan, Lyon, and Pinfield point directly to data management skills that are lacking among librarians [34]. The researchers developed a maturity model to benchmark current RDM services in academic libraries. The authors use studies led by Carol Tenopir, et al. to highlight areas of significant management and operation concerns [35][36]. This includes the capability of library staff to provide RDM services and technical gaps including the curation of active research data. Cox, et al. provide an international scope, which includes information professionals from Australia, Canada, Germany, Ireland, the Netherlands, and the UK. Librarians may be adequately prepared for advising or consulting but lack technical skills such as data cataloguing and curating, which may be necessary for health sciences librarians to know.
Research from Auckland [37] and Cox and Pinfield [38] highlights several areas that librarians should focus on to reduce RDM skill gaps, including preserving research data, data curation, and advising on funder mandates. These are areas which saw service need and growth since Auckland's report was written and where librarians were currently deficient [38]. The researchers also highlight data curation skills that are lacking among librarians and are needed now and in the future.
Lyon identifies many skills required for information professionals to manage data [19]. She identifies "strong informatic skills," along with "working knowledge of the research practices and workflows...an awareness of the national and international data centres where research data in that domain are deposited, and a good grasp of the data publication requirements of the leading scholarly journals." Delserone documents the need for adequate infrastructure and curation by information professionals [39]. Delserone includes quotations from researchers at the University of Minnesota: "[d]ata storage is fundamental to all of us" and "[t]he Libraries could facilitate the curation and preservation of data by scholars, and teach researchers how to better organize it." Lee and Stvilia list additional skills, including "metadata knowledge particular for research data" and "technical details of repository software, server, and its architecture" [15]. Nicholson and Bennett claim "sound and consistent methodologies" are needed to ensure data is available to access [40].
Auckland, speaking on her research from UK libraries, recognizes the need for preserving research outputs via repositories, data analysis, and knowledge of data manipulation tools [37]. Heidorn suggests grant proposal assistance will be necessary for library staff to provide, especially in light of library staff who are familiar with "digital object access and preservation" [41].
Recent research from Federer shows data librarians need an immense skill set [42]. Somewhat surprisingly, while Federer suggests there is little consensus on what specific skills are needed to manage data, her research shows soft skills such as oral communication and teamwork are rated as very important by data librarians.
Though researchers throughout the literature disagree on precisely what skills information professionals need to provide RDM services, owing to differing levels of service maturity and differing contexts, Table 1 summarizes the author's findings on which skills are needed, and may be missing, for information professionals to provide a wide range of RDM services.

RDM Training Initiatives
Pryor and Donnelly want to establish a clearly defined career path for research data management practitioners [18]. The authors seek to entrench competencies for librarians during academic training, though they recognize restructuring LIS curricula is difficult. Lyon is another proponent of ensuring data management skills are taught in library and information science graduate programs [19,32]. Wang and Fong recognize the lack of RDM training when they write, "research data services are new and few library school programs offer formal training in this area" [23]. Lyon suggests three initiatives where LIS graduate programs could bolster RDM skills: define core components of data management, encourage potential students with science-related backgrounds into LIS graduate programs, and set up an international data informatics working group [19]. Lyon and Brenner see the potential for graduate programs to offer what they call the Capability Ramp Model [43]. This model leverages three areas that library and information science graduate programs excel at: education, research intelligence, and professional practice.
Heidorn [41] echoes Lyon's [19] suggestion for training in graduate programs. He notes the LIS programs at University of Illinois, University of North Carolina, and the University of Arizona offer courses that train students in skills necessary for RDM. In the years following Heidorn's research, many more programs have begun to offer data management and data services courses, as documented by the Research Data Management Librarian Academy [44].
Federer also suggests training information professionals in data management while potential data librarians are in graduate school [42]. Her study focuses on identifying key competencies and skills data librarians require. While Federer is hesitant to identify specific training in graduate programs due to the ever-changing needs of library patrons, her point of bolstered data services in LIS programs is a strong one and echoed by other researchers [2,18,23,32,41,43].
Wang and Fong want data librarians to keep up to date with resources, including scholarly literature and online tools [23]. Brown, Wolski, and Richardson's case study of an academic librarian at Griffith University (Australia) successfully transitioning to a research data support role highlights key training opportunities such as mentorship, background reading, and participation in a massive open online course on metadata [45]. Brown, et al. propose the development of a support network that consists of trained specialists with specific domain knowledge, which information professionals could call upon when needed [45]. This idea has similarities to Lyon's [2] proposed international data informatics working group.
Cox and Pinfield, in their research surveying academic librarians, found the majority of librarians surveyed thought they had adequate RDM skills, with some caveats [38]. Their research found librarians claim development of staff combined with recruitment would fill skill gaps slowly over time. This hints at the bolstered graduate programs suggested by Lyon and Brenner above [43].
Conrad, Shorish, Whitmire, and Hswe [3] support professional development workshops held by the Association of College and Research Libraries (ACRL) [46]. These workshops, called "roadshows," travel to institutions, organizations, and conferences to teach skills in a specific area. One roadshow focuses on data management: "Building Your Research Data Management Toolkit: Integrating RDM into Your Liaison Work," which has its roots in a 2015 ACRL preconference workshop [3]. Read describes success in hosting library workshops for clinical researchers at New York University's Health Sciences Library [47]. These workshops were primarily attended by clinical professionals such as research coordinators, research managers, and faculty, but could be applied to library staff as well.
Another option is online courses. The National Library of Medicine and the National Network of Libraries of Medicine Training Office offer a 7-module course on data management, entitled "Biomedical and Health Research Data Management for Librarians" [48]. This course can be taken online and is an option for Canadian health sciences librarians to upgrade their RDM skillset. Another online course is the Research Data Management Librarian Academy (RDMLA), set to launch in Fall 2019 [44]. RDMLA is a collaboration between Elsevier and several American post-secondary institutions. Canadian health sciences associations would do well to offer similar courses if the programs above do not satisfy current Canadian health information professionals. Another option is mid-career fellowships to upgrade existing skills for already experienced information professionals.
More research is required to determine which RDM skills should be prioritized, as well as which training initiatives are most successful at Canadian health sciences libraries. Training initiatives will be beneficial as health sciences information professionals are increasingly providing some level of data services.

Limitations
The author recognizes this paper is not a full systematic review. Articles were selected according to a predefined set of criteria for a concise review of relevant literature. As well, the focus of this paper is on academic health sciences libraries rather than other health contexts such as hospital libraries, of which little literature currently exists. The author also recognizes he excluded non-English sources in the identified scholarly articles. As such, the results of this paper are intended to be an introduction for Canadian health sciences librarians to ensure research data management initiatives are implemented. More robust research will be required for libraries to implement evidence-based practices with regard to how to train information professionals to provide RDM services.

Conclusion
RDM is poised to be an essential library service across health sciences libraries. Canadian postsecondary institutions are beginning to integrate these services, all the more relevant due to the forthcoming Tri-Agency Research Data Management Policy. The author foresees a trickle-down effect of research data services in health sciences and specialized libraries, regardless of affiliation with a post-secondary institution. In light of these developments, information professionals of all types of libraries need to ensure staff have key competencies, especially if they find themselves in a role where they directly manage data.

Future Research
While the literature is populated with a number of studies from an American, UK, and Australian perspective, Canadian health sciences libraries would benefit from further research. Research exists that highlights specific skill sets, captured in Table 1 above, but work could be completed to confirm whether these skills are relevant for Canadian information professionals. Furthermore, more in-depth research should emerge verifying which initiatives have the greatest success as RDM training is developed across Canada. A future research study can explore current RDM-focused graduate-level courses offered at Canadian post-secondary institutions, complementing the work completed by RDMLA researchers. Using this information, Canadian health sciences libraries and library associations could implement professional development opportunities to bridge skills gaps to provide high-level, mature RDM services.