An Online Community of Data Enthusiasts Collaborates to Seek, Share, and Make Sense of Data
A Review of:
Stvilia, B., & Gibradze, L. (2022). Seeking and sharing datasets in an online community of data enthusiasts. Library & Information Science Research, 44(3). https://doi.org/10.1016/j.lisr.2022.101160
Objective – To understand the major activities, tools, sources, and challenges of online communities focused on datasets.
Design – Content analysis informed by activity theory.
Setting – The r/Datasets subreddit, a web forum for sharing, seeking, and discussing datasets.
Subjects – 1232 “hot” or “top” discussion threads (1232 original posts and 6813 responding comments) first posted between 2010 and 2020.
Methods – The researchers used Reddit’s API to collect their sample of threads. Using a random subset of the sample, the researchers developed a coding scheme for content analysis, which identified major themes in the data. Through this process, they controlled for quality: each researcher coded half the subset independently, then together evaluated their intercoder reliability and discussed and resolved disagreements. The researchers also employed labelled latent Dirichlet allocation to construct topic models corresponding to the themes of the manual content analysis, producing for each topic a profile of the 100 terms most likely to appear in it. Finally, the researchers extracted URLs from threads in the sample to ascertain the types of information and data sources used by the community. Presenting their findings, the researchers discussed notable themes and proposed a metadata model for describing datasets, the Data Q&A metadata (DQAM) model.
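To make the URL-extraction step concrete, the following is a minimal sketch of how links in thread text might be pulled out and tallied by domain. It is an illustration only and not the researchers' code: the sample posts, the regular expression, and the function names (`extract_urls`, `domain_of`, `count_domains`) are all assumptions introduced here, and the study may have identified and classified sources quite differently.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample text standing in for posts and comments.
# These strings are invented for illustration; they are not data from the study.
SAMPLE_POSTS = [
    "Try https://www.kaggle.com/datasets or https://data.gov/ for open data.",
    "Census tables are at https://data.census.gov/table, also see http://kaggle.com/competitions.",
    "No links here, just a question about data quality.",
]

# Assumed pattern: an http(s) scheme followed by characters that are not
# whitespace, angle brackets, square brackets, or parentheses.
URL_PATTERN = re.compile(r"https?://[^\s<>\[\]()]+")


def extract_urls(text):
    """Return all http(s) URLs found in text, trimming trailing punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def domain_of(url):
    """Return the lowercased host name of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def count_domains(posts):
    """Count how often each domain is linked across a list of text posts."""
    counts = Counter()
    for post in posts:
        for url in extract_urls(post):
            counts[domain_of(url)] += 1
    return counts


if __name__ == "__main__":
    # Print each linked domain with the number of times it was linked.
    for domain, n in count_domains(SAMPLE_POSTS).most_common():
        print(domain, n)
```

Grouping links by domain is only a first pass; assigning each domain to a source type (for example, government portal or data repository) would still require manual or rule-based classification.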
Main Results – The r/Datasets community engages in three distinct activities: asking and answering questions, disseminating information, and community building. The closely related Q&A and dissemination activities share themes of obtaining and aggregating data, sensemaking, collaborating and crowdsourcing, and data evaluation. Community members frequently discuss tools, competencies, and sources for data work. Major challenges for members of the community relate to the general themes of data quality, accessibility, ethics, and legality. The proposed 16-element metadata schema is intended to meet the needs of data enthusiasts.
Conclusion – The content analysis reveals a dedicated community engaged in an array of data-seeking and data-sharing activities. Data producers should be mindful of how their data can be accessed and used outside of their original professional or scholarly contexts.
Canadian Institutes of Health Research (CIHR), Natural Sciences and Engineering Research Council of Canada (NSERC), & Social Sciences and Humanities Research Council of Canada (SSHRC). (2021, March 15). Tri-agency research data management policy. Government of Canada. https://science.gc.ca/site/science/en/interagency-research-funding/policies-and-guidelines/research-data-management/tri-agency-research-data-management-policy
Glynn, L. (2006). A critical appraisal tool for library and information research. Library Hi Tech, 24(3), 387-399. https://doi.org/10.1108/07378830610692154
Institute for Quantitative Social Science. (n.d.). About. The Dataverse Project. https://dataverse.org/about
Institute for Quantitative Social Science. (2019, November 21). Dataverse documentation v.4.18.1: User guide: Appendix. The Dataverse Project. https://guides.dataverse.org/en/4.18.1/user/appendix.html
Stvilia, B., & Gibradze, L. (2022). Seeking and sharing datasets in an online community of data enthusiasts. Library & Information Science Research, 44(3). https://doi.org/10.1016/j.lisr.2022.101160
Copyright (c) 2023 Jordan Patterson
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The Creative Commons-Attribution-Noncommercial-Share Alike License 4.0 International applies to all works published by Evidence Based Library and Information Practice. Authors will retain copyright of the work.