# Load readr for read_csv(), then add the dataset of DOAJ Seal journals
library(readr)

doaj_seal <- read_csv("data/DOAJ_Seal.csv")

1 Administrative Data

1.1 ID

Not Applicable

1.2 Funder

This research is being submitted for funding to the J. Bohannon Foundation.

1.3 Grant Reference Number

Not Applicable (Proposal in preparation)

1.4 Project Name

Do Reputable Open Access Journals Require Open Data Sharing?

1.5 Project Description

This study analyzes the submission requirements of the most reputable open access journals to determine the prevalence and characteristics of data sharing policies. This question is an important one for 21st-century authors and readers because open data sharing is seen as a key component of an open and more trusted scientific record.

According to the Research Data Alliance (RDA) Data Policy Standardisation and Implementation group:

“the prevalence of research data policies from institutions and research funders is increasing, so publishers and editors are paying more attention to standardisation and the wider adoption of data sharing policies.”

This study investigates whether the most reputable open access journals have data sharing policies, and what the characteristics of those policies are. These policies require authors, in some fashion, to openly disseminate the data and software underlying their published articles.

Our investigation builds on the recent work of Castro et al. (2017), who assessed the prevalence and characteristics of data sharing policies in randomly selected, English-language, open access journals. Their findings reveal that only a small minority of these journals have data sharing policies. These findings, which are consistent with those of other studies (see, for example, Vasilevsky et al. 2017), may be skewed because of the authors’ rules of inclusion and exclusion.¹

In this study, we will include only the most reputable open access journals in our assessment of journal data sharing policies, regardless of language. We will analyze all journals that have attained the Seal of Approval from the Directory of Open Access Journals, DOAJ (shown below). We will apply the coding framework devised by Castro et al. (2017) to the DOAJ Seal journals. We contend that a more rigorously screened population of open access journals, regardless of language, will yield a more accurate and reproducible set of findings than those published by Castro et al. (2017).

DOAJ Seal of Approval

DOAJ Seal journals are considered the most reputable because they:

“achieve a high level of openness, adhere to Best Practice and high publishing standards. The Seal is awarded to a journal that fulfills a set of criteria related to accessibility, openness, discoverability, reuse and author rights. It acts as a signal to readers and authors that the journal has generous use and reuse terms, author rights and adheres to the highest level of ‘openness’.”²

Moreover, the DOAJ Seal journals include over 200 non-English language journals that merit analysis in this study. Excluding them would introduce a cultural bias that undermines the reliability of the research. The following plot of DOAJ Seal journals by country indicates the problem.
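A plot of journals by country starts from a tally of journals per publishing country. The following is a minimal sketch in base R, assuming the country is held in a column like `PubCountry`; the four values are taken from the sample rows shown in Section 2.1 and stand in for the full data set.

```r
# Country values from the four sample rows in Section 2.1 (not the full data set)
pub_country <- c("Germany", "South Africa", "Germany", "United Kingdom")

# Count journals per country and put the largest group first
counts <- sort(table(pub_country), decreasing = TRUE)

print(counts)
```

Passing a tally like `counts` to a plotting function such as `barplot()` would then give one bar per country.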

Finally, our research group has determined that the following is true when it comes to reputable open access journals.

According to Xie et al. (2018):

Inline LaTeX equations can be written in a pair of dollar signs using the LaTeX syntax, e.g., \(f(k) = {n \choose k} p^{k} (1-p)^{n-k}\)

Math expressions of the display style can be written in a pair of double dollar signs, e.g., \[f(k) = {n \choose k} p^{k} (1-p)^{n-k}\]

1.6 Researcher

Gail Clement, Principal Investigator, California Institute of Technology

1.7 Researcher ID

ORCID: 0000-0001-5494-4806

1.8 Date of First Version

Sunday, June 10, 2018

1.9 Date of Last Update

Thursday, August 09, 2018

2 Data Collection

2.1 Existing Data Being Reused

This study relies on the DOAJ Journal Metadata, available as a CSV file download from the DOAJ website. The CSV file is updated every 30 minutes.

This data will be read into the open source tool OpenRefine to filter it for those journals awarded the DOAJ Seal, and to remove unneeded columns containing the web addresses for journal policies around plagiarism, submission fees, and other URLs not related to this study.

The filtered version of the data set will be exported as a new file named doaj_seal.csv for importing into analytical software.
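The plan performs this filtering in OpenRefine. For readers who prefer a scripted equivalent, the same two steps (keep Seal journals, drop URL columns) can be sketched in base R. This is an illustration only: the small data frame and the column names `PlagiarismURL` and `FeeURL` are hypothetical, and the real DOAJ download uses its own column names.

```r
# Hypothetical stand-in for the raw DOAJ download; real column names may differ
raw <- data.frame(
  JnlTitle      = c("Journal A", "Journal B", "Journal C"),
  DOAJ_Seal     = c("Yes", "No", "Yes"),
  PlagiarismURL = c("https://example.org/a", NA, "https://example.org/c"),
  FeeURL        = c(NA, "https://example.org/b", NA),
  stringsAsFactors = FALSE
)

# Step 1: keep only the journals that hold the Seal
seal_only <- raw[raw$DOAJ_Seal == "Yes", ]

# Step 2: drop every column whose name ends in "URL" (an assumed naming pattern)
url_cols <- grepl("URL$", names(seal_only))
doaj_seal_sketch <- seal_only[, !url_cols, drop = FALSE]

# Step 3: export to a temporary file so no project data is overwritten
out_path <- file.path(tempdir(), "doaj_seal.csv")
write.csv(doaj_seal_sketch, out_path, row.names = FALSE)
```

A scripted filter has the advantage that it can be rerun unchanged whenever the DOAJ file is refreshed.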

A sample of the doaj_seal.csv data set is shown below. The complete data set is available in searchable and browsable format as Annex 8.1 at the end of this document.

knitr::kable(head(doaj_seal, 4), caption = 'A Table of the first 4 rows of the DOAJ Seal data.')

A table of the first 4 rows of the DOAJ Seal data.

| JnlTitle | Publisher | PubCountry | Fee | WaiverPolicy | Identifiers | FirstYear | Language | ReviewProcess | Plagiarism | Sub2Pub | JnlLicense | AuthorCopyright | DOAJ_Seal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Archives Animal Breeding | Copernicus Publications | Germany | No | Yes | DOI | 1999 | English | Peer review | Yes | 13 | CC BY | TRUE | Yes |
| Bothalia: African Biodiversity & Conservation | AOSIS | South Africa | No | NA | DOI | 2014 | English | Double blind peer review | Yes | 12 | CC BY | TRUE | Yes |
| Geographica Helvetica | Copernicus Publications | Germany | No | Yes | DOI | 1946 | English, French, German, Italian | Double blind peer review | Yes | 53 | CC BY | TRUE | Yes |
| Hereditas | BioMed Central | United Kingdom | Yes | Yes | DOI | 2005 | English | Blind peer review | Yes | 6 | CC BY | TRUE | Yes |

2.2 Data being collected

The doaj_seal.csv data set currently includes 1281 reputable open access journals that we will investigate in this study. The doaj_seal.csv data set will be copied and enhanced with additional columns, resulting in the processed data set, doaj_seal_enhanced.csv. The following columns will be added to doaj_seal.csv, in conformance with the Coding Framework of Castro et al. (2017).

‘Data Policy’ (Boolean)
- Yes | No
‘Data Sharing Policy’ (Factor)
- No mention
- Implied
- Mentioned
- Explicitly encouraged
- Required, but not explicitly tied to editorial decisions
- Required as a condition of publication
‘Data Citation Policy’ (Factor)
- No mention
- Implied
- Explicitly encouraged
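The three added columns can be sketched in R as one logical column and two factors whose levels are copied from the lists above. The column names (`DataPolicy`, `DataSharingPolicy`, `DataCitationPolicy`), the two journals, and their coded values are invented for illustration and are not study results.

```r
# Levels copied from the coding framework lists above
sharing_levels <- c(
  "No mention", "Implied", "Mentioned", "Explicitly encouraged",
  "Required, but not explicitly tied to editorial decisions",
  "Required as a condition of publication"
)
citation_levels <- c("No mention", "Implied", "Explicitly encouraged")

# Two made-up journals stand in for doaj_seal.csv
enhanced <- data.frame(JnlTitle = c("Journal A", "Journal B"),
                       stringsAsFactors = FALSE)

# Hypothetical column names; the coded values are invented examples
enhanced$DataPolicy <- c(TRUE, FALSE)
enhanced$DataSharingPolicy <- factor(
  c("Explicitly encouraged", "No mention"), levels = sharing_levels
)
enhanced$DataCitationPolicy <- factor(
  c("Implied", "No mention"), levels = citation_levels
)

# A value outside the framework becomes NA, which exposes coding errors early
bad <- factor("Sometimes", levels = sharing_levels)
is.na(bad)
```

Fixing the levels up front means a mistyped code shows up as a missing value instead of silently creating a new category.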

Investigators will examine the website of each journal listed in the doaj_seal_enhanced.csv file to determine whether a data sharing policy is included in the Instructions to Authors. The Coding Framework published by Castro et al. (2017) will be applied.

2.2.1 Data file formats and standards

All data retrieved for the DOAJ Seal of Approval are downloaded and stored in the open comma-separated values (CSV) file format.

All data policies culled from the websites of the DOAJ Seal journals will be saved as .txt files. The data generated by applying Castro et al.’s (2017) Coding Framework will be stored in CSV file format.

Analysis, visualization, and summarization of the study’s findings will be performed in the open source software R and RStudio using the tidyverse package (Wickham 2017). Reports produced from the study will also be created in RStudio using the open source text format R Markdown and output to HTML documents, slides, and MS Word documents for submission to funders or publishers (Xie, Allaire, and Grolemund 2018).

All files associated with the project will be maintained under the Git version control system and made openly available for download from the Principal Investigator’s GitHub repository.

2.2.2 Expected outputs of the project

| Output # | Digital Output | Type | Format, Duration, Size | Planned access |
|---|---|---|---|---|
| 1 | doaj_seal.csv | raw data set downloaded from DOAJ site | CSV file, plain text format, 2.7 MB | Not retained since duplicative |
| 2 | doaj_seal_enhanced.csv | enhanced data set with new data | CSV file, plain text format, 2.7 MB | GitHub repository; Zenodo repository |
| 3 | Data set documentation | JSON metadata file | plain text file, 0.1 MB | DataCite registry database; GitHub public repository; Zenodo repository |
| 4 | Data Processing steps | R scripts and comments | R Notebook file, 1 MB | GitHub public repository |
| 5 | Data Visualizations | R scripts and documentation; Plots | R Notebook file, .png image files, 4 MB | GitHub public repository |
| 6 | Journal article | Rendered report | R Markdown, 9 MB | Publisher website, community preprint server |

3 Documentation and Metadata

The journal metadata contained in the project’s data sets comes directly from the Directory of Open Access Journals.

The final outputs from the project will be documented in metadata files that follow the schema of the DataCite DOI registration agency; see the [DataCite Metadata Schema 4.1](https://schema.datacite.org/) for specific details. By following this standard metadata format, other researchers (and computers) will be able to find, access, and reuse the outputs from this project by searching the DataCite metadata database.
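As a rough illustration of what such a record must contain, the sketch below holds the mandatory properties of DataCite Metadata Schema 4.1 in an R list and reports which ones are still unfilled. The property labels are simplified, the publication year is an assumption, and the identifier is deliberately left empty because no DOI has been registered yet; this is not the project's actual metadata file.

```r
# Mandatory properties of DataCite Metadata Schema 4.1 (labels simplified)
required <- c("identifier", "creators", "titles", "publisher",
              "publicationYear", "resourceTypeGeneral")

# Illustrative record for the enhanced data set
record <- list(
  identifier          = NA_character_,  # DOI not yet registered
  creators            = "Clement, Gail",
  titles              = "doaj_seal_enhanced.csv",
  publisher           = "Zenodo",
  publicationYear     = 2018L,          # assumed year of deposit
  resourceTypeGeneral = "Dataset"
)

# A property counts as unfilled if it is absent or entirely NA
is_unfilled <- vapply(
  required,
  function(p) is.null(record[[p]]) || all(is.na(record[[p]])),
  logical(1)
)
missing_props <- required[is_unfilled]

print(missing_props)
```

A check like this can be run before each DOI registration to confirm that no mandatory property has been left blank.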

4 Ethics and Legal Compliance

No additional ethical or privacy issues arise in this study because both the DOAJ data and the information about data policies for any published journal are publicly posted online.

The data provided about journals awarded the Directory of Open Access Journals Seal of Approval is distributed under a CC BY-SA license. This license requires that reusers of the data share their derivative data set under the same license. Therefore, the output of this research will be disseminated under the CC BY-SA license. This license adheres to the Principles and Guidelines of the Research Data Alliance Legal Interoperability Group, which recommends the use of Creative Commons Attribution licenses to allow the broadest sharing of data while guaranteeing attribution to the data provider.³

5 Storage and Backup

During the active phase of the project, data will be stored on and backed up to the Research Data Storage Facility (RDSF) at California Institute of Technology. This facility represents 2 million pounds of digital resilient storage, with ongoing capital investment. The RDSF is overseen by a steering group of senior research and support staff, which includes the PVC Research. Backup procedures are robust (overnight backup, copies held remotely on tape) and secured access is in place.

6 Selection and long-term preservation

To be completed by participants

8 Responsibilities and Resources

The Principal Investigator is responsible for implementing the Data Management Plan and ensuring it is reviewed and revised as necessary. (S)he will be responsible for all data collection and recording; for data analysis and visualization; and for maintaining all files under version control using Git and GitHub.

The Data Management Specialist assigned to the project as an in-kind contribution from the California Institute of Technology Library will be responsible for creating the DataCite metadata documentation for all outputs and ensuring timely DOI registration of each final output. (S)he will also deposit all final outputs to the Zenodo repository and update metadata associated with the DOI as necessary.

Annexes

8.1 Complete dataset `doaj_seal.csv`

library(dplyr)  # provides the %>% pipe
library(DT)     # provides datatable()

# Render the full data set as a searchable, filterable table with export buttons
doaj_seal %>%
  datatable(
    rownames = FALSE,
    colnames = c("Title", "Publisher", "Country", "Fees", "Waivers", "Identifiers", "Start Year", "Language(s)", "Review Process", "Plagiarism check", "Time to Press", "License", "Author owns Copyright", "DOAJ Seal"),
    class = "cell-border stripe",
    caption = "Journals with DOAJ Seal",
    filter = list(position = "bottom"),
    extensions = "Buttons",
    options = list(dom = "Bfrtip", buttons = c("colvis", "csv", "pdf"))
  )

8.2 Principal Investigator’s BioSketch

This is auto-populated from your ORCID profile using the rorcid package.

Gail Clement is a Library Administrator and science research librarian at the Caltech Library. She oversees a team of subject librarians, repository, metadata and licensing specialists who develop knowledge management resources, publishing services, and authorship programs for the Caltech community. In this role she also represents Caltech Library on the Organizational Advisory Assembly for the Research Data Alliance, the Committee on Publication Ethics, the Overleaf Steering Committee, and the ORCID Curriculum working group. She is a certified Data Carpentry Instructor, specializing in research data cleaning and enhancement; data sharing; and citation and publication of all research outputs.

As Coordinator of Author Carpentry, a researcher training program in 21st-century authorship and publishing, Gail collaborates with developers and researchers to create practical, hands-on, and useful lessons in responsible, reproducible, and reusable open publication. Author Carpentry builds on a pilot lesson from Software Carpentry on Scientific Authorship. Author Carpentry workshops are offered at Caltech and through professional groups such as the CODATA-RDA Summer School in Research Data Science, the Force11 Scholarly Communication Institute, and the Association for Artificial Intelligence 2017 Tutorial Forum.

Gail’s professional leadership and service include mentoring researchers from developing nations via Author Aid; serving as Co-Chair of the CODATA-Research Data Alliance Group on Legal Interoperability of Research Data; and service on the Editorial Board for the Journal of Librarianship and Scholarly Communication (JLSC), where she co-edited the 2015 special issue on data sharing, data publication, and data citation. She serves on the License Review Committee and the Scholarly Communication Taskforce of the Statewide California Electronic Library Consortium; on the Library Advisory Committee for Ubiquity Press; and as a Collection Editor for “Research Paper of the Future” on ScienceOpen.

References

Castro, Eleni, Mercè Crosas, Alex Garnett, Kasey Sheridan, and Micah Altman. 2017. “Evaluating and Promoting Open Data Practices in Open Access Journals.” Journal of Scholarly Publishing 49 (1). University of Toronto Press Inc. (UTPress): 66–88. doi:10.3138/jsp.49.1.66.

Vasilevsky, Nicole A., Jessica Minnier, Melissa A. Haendel, and Robin E. Champieux. 2017. “Reproducible and Reusable Research: Are Journal Data Sharing Policies Meeting the Mark?” PeerJ 5 (April). PeerJ: e3208. doi:10.7717/peerj.3208.

Wickham, Hadley. 2017. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.

Xie, Yihui, J.J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Chapman and Hall/CRC.

¹ In particular, the choice to include open access journals merely because of their use of the Open Journal Systems (OJS) hosting platform, and the choice to exclude non-English language journals.
² DOAJ selection for the Seal of Approval is explained in the FAQ at https://doaj.org/faq#seal
³ https://www.rd-alliance.org/rda-codata-legal-interoperability-research-data-principles-and-implementation-guidelines-now
⁴ Zenodo policies are available online at http://about.zenodo.org/policies/

Data Management Plan for the Study Do Reputable Open Access Journals Require Open Data Sharing?

Principal Investigator - Gail Clement, California Institute of Technology

Co-Investigator - Thomas Morell, Caltech

Thursday, August 09, 2018

