One of the lines of research of Sustaining the Knowledge Commons (SKC) is a longitudinal study of the minority (about a third) of fully open access journals that use the article processing charge (APC) business model. The original idea was to gather data during an annual two-week census period. The volume of data and the growth in this area make this an impractical goal, so future data gathering and analysis will be conducted on an ongoing basis. In the meantime, we are posting this preliminary dataset in case it might be helpful to others working in this area.

Major sources of data for this dataset include:
• The Directory of Open Access Journals (DOAJ) downloadable metadata; the base set is from May 2014, with some additional data from the 2015 dataset.
• Data on publisher article processing charges and related information gathered from publisher websites by the SKC team in 2015, in 2014 (Morrison, Salhab, Calvé-Genest & Horava, 2015), and in a 2013 pilot.
• DOAJ article content data screen-scraped from DOAJ (caution: this data can be quite misleading due to limitations of the article-level metadata).
• Subject analysis based on DOAJ subject metadata in 2014 for selected journals.
• Data on APCs gathered in 2010 by Solomon and Björk (supplied by the authors). Note that Solomon and Björk use a different method of calculating APCs, so the numbers are not directly comparable.

Note that this full dataset includes some working columns that are meaningful only in the context of very specific calculations, which are not necessarily evident from the dataset itself. Details below.

Significant limitation:
• This dataset does not include new journals added to DOAJ in 2015. A recent publisher size analysis indicates some significant changes. For example, De Gruyter, not listed in the 2014 survey, is now the third largest DOAJ publisher with over 200 titles, and Elsevier is now the 7th largest DOAJ publisher. In both cases, gathering data from the publisher websites will be time-consuming because each title must be looked up individually.
• Some OA APC data for newly added journals was gathered in May 2015 but has not yet been added to this dataset. One of the reasons for gathering this data is to compare the DOAJ "one price listed" approach with the potentially richer data on the publisher's own website.

For full details see the documentation.