HOW TO USE THE NCBI'S NEW API KEYS
The NCBI is one of the most important sources of biological data. The centre provides access to information on 28 million scholarly articles through PubMed and 250 million DNA sequences through GenBank. More importantly, records in the [50 public databases](https://www.ncbi.nlm.nih.gov/guide/all/#databases) maintained by the NCBI are strongly cross-referenced. As a result, it is possible to pinpoint searches using almost 2 million taxonomic names or a controlled vocabulary with 270,000 terms.
Rentrez has been designed to make it easy to search for NCBI records and download them from within an R session.
I thought it might be fun to use this post to find out where papers describing R packages are published these days.
Here we use the `entrez_search` and `entrez_summary` functions to get some information on all of the papers published in 2018 with the term ‘R package’ in their title:
if (!require("rentrez")) install.packages("rentrez")
library(rentrez)
pkg_search <- entrez_search(db="pubmed",
                            term="(R Package[TITLE]) AND (2018[PDAT])",
                            use_history=TRUE)
pkg_summs <- entrez_summary(db="pubmed", web_history=pkg_search$web_history)
pkg_summs
## List of 31 esummary records. First record:
##
## $`29554216`
## esummary result with 42 items:
## [1] uid pubdate epubdate
## [4] source authors lastauthor
## [7] title sorttitle volume
## [10] issue pages lang
## [13] nlmuniqueid issn essn
## [16] pubtype recordstatus pubstatus
## [19] articleids history references
## [22] attributes pmcrefcount fulljournalname
## [25] elocationid doctype srccontriblist
## [28] booktitle medium edition
## [31] publisherlocation publishername srcdate
## [34] reportnumber availablefromurl locationlabel
## [37] doccontriblist docdate bookname
## [40] chapter sortpubdate sortfirstauthor
We are interested in the journals in which these papers appear. We can use the helper function `extract_from_esummary` to isolate the source of each paper, then use `table` to count up the frequency of each journal.
library(ggplot2)
library(ggpomological)
#scales::show_col(ggpomological:::pomological_palette)
journals <- extract_from_esummary(pkg_summs, "source")
journal_freq <- as.data.frame(table(journals, dnn="journal"), responseName="n.papers")
pkg_journal <- ggplot(journal_freq, aes(reorder(journal, n.papers), n.papers)) +
    geom_point(size=2) +
    coord_flip() +
    scale_y_continuous("Number of papers") +
    scale_x_discrete("Journal") +
    theme_bw() +
    ggtitle("Venues for papers describing R Packages in 2018")
pkg_journal + ggpomological::theme_pomological()
So, it looks like Bioinformatics, Plos One and Comput Methods Programs Biomed Resources are popular destinations for papers describing R packages, but such papers appear in journals all the way across the biological sciences.
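If you want to check the counting step without querying the NCBI, the same `table` and `as.data.frame` idiom can be tried on a small vector. This is only a sketch: the journal names and counts in `toy_journals` are made up for illustration and are not the results of the search above.

```r
# Made-up journal names standing in for the output of extract_from_esummary()
toy_journals <- c("Bioinformatics", "PLoS One", "Bioinformatics",
                  "PeerJ", "Bioinformatics", "PLoS One")

# Count how often each journal appears, as in the post
toy_freq <- as.data.frame(table(toy_journals, dnn = "journal"),
                          responseName = "n.papers")

# Sort so the most frequent venues come first
toy_freq <- toy_freq[order(toy_freq$n.papers, decreasing = TRUE), ]
print(toy_freq)
```

Sorting the data frame this way gives a quick console view of the same ranking that the plot shows.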
The NCBI now gives users the opportunity to register for an access key that will allow them to make up to 10 requests per second (non-registered users are limited to 3 requests per second per IP address). For one-off cases, using a key is as simple as adding the `api_key` argument to a given function call.
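As a minimal sketch of the one-off case, the search from earlier in this post could be repeated with a key attached. The environment variable name `NCBI_DEMO_KEY` is a hypothetical choice for this example, not something rentrez or the NCBI defines; the request is only sent if that variable holds a key, so the snippet is safe to run without one.

```r
library(rentrez)

# Read the key from an environment variable rather than hard-coding it.
# NCBI_DEMO_KEY is a hypothetical name used only in this example.
my_key <- Sys.getenv("NCBI_DEMO_KEY")

if (nzchar(my_key)) {
    # Same search as above, with the key passed for this single call
    keyed_search <- entrez_search(db = "pubmed",
                                  term = "(R Package[TITLE]) AND (2018[PDAT])",
                                  api_key = my_key)
    print(keyed_search$count)
} else {
    message("No key found in NCBI_DEMO_KEY; skipping the keyed request.")
}
```

Keeping the key out of the script itself means the code can be shared or committed without exposing it.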