How to identify which packages (and which functions) are actually being used by an R file

31 March 2020

When inheriting someone else’s R code, I always like to check which packages and functions it uses: not just those it loads with library() calls, but those it actually uses. To this end I recently discovered a useful little utility that lists all the packages, and all the functions within those packages, referenced in a given R file.

The package you will need to install is NCmisc, and the function we are going to use within it is the aptly named list.functions.in.file.
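As a minimal sketch, a call might look like the following. The throwaway script written to a temporary file is purely illustrative (in practice you would pass the path to your own file, such as global.R), and the requireNamespace() guard is an assumption added here so that the snippet does nothing, rather than fail, when NCmisc is not installed.

```r
# install.packages("NCmisc")  # one-off install from CRAN

# Illustrative stand-in for a real script: two lines that call base functions.
script <- tempfile(fileext = ".R")
writeLines(c("x <- paste('a', 'b')", "y <- mean(c(1, 2, 3))"), script)

# Only run the analysis if NCmisc is available in this R installation.
if (requireNamespace("NCmisc", quietly = TRUE)) {
  funs <- NCmisc::list.functions.in.file(script)
  print(funs)
}
```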

Example use of NCmisc::list.functions.in.file()

It works by identifying all the function calls in the given file and looking up each name among the packages and environments attached to the current R session (the search path).
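To make that mechanism concrete, here is a rough base-R sketch of the same general idea. It is an approximation written for this article, not NCmisc's own source: the function name list_calls, the "(not found)" label and the example code lines are all made up for illustration.

```r
# Base-R sketch of the general technique (an approximation, not NCmisc's code).
list_calls <- function(code_lines) {
  # Parse the code and keep source references so token data is available.
  parsed <- parse(text = code_lines, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)

  # Keep only tokens the parser marked as the name of a called function.
  calls <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])

  # For each name, ask where it is visible on the current search path.
  origin <- vapply(calls, function(fn) {
    hits <- utils::find(fn)
    if (length(hits) == 0) "(not found)" else paste(hits, collapse = ", ")
  }, character(1))

  # Group the function names by where they were found.
  split(calls, origin)
}

example_code <- c("x <- mean(c(1, 2, 3))", "y <- undefined_helper(x)")
result <- list_calls(example_code)
print(result)
```

Because the lookup relies on the search path, a function whose package is not attached ends up in the "(not found)" group, which is why the state of your session matters.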

This means that before running list.functions.in.file you will need to have loaded into memory all the packages you believe are referenced, and only those packages. The easiest way to do this is to open a fresh instance of RStudio and just execute the code in the file. (In the example above, the “global.R” file is part of a Shiny app which I first ran, then stopped.)
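One way to confirm what the session has attached before you run the analysis is to inspect the search path. This is a small sketch using base R only; the variable names are illustrative.

```r
# Everything currently attached to the session, in lookup order.
attached <- search()

# Keep just the package entries and strip the "package:" prefix.
pkgs <- sub("^package:", "", grep("^package:", attached, value = TRUE))
print(pkgs)
```

If this shows packages the file never mentions, the session is probably not as fresh as you thought.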

The output is a named list. Each item is named after the package that provides the functions, or after several packages if the same function name is defined in more than one of them. Each entry holds the names of the functions found there. Any custom functions defined in the file itself will appear under .GlobalEnv.

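If your R script file loads a package, but that package is not listed in this output, it is probably redundant and a candidate for removal. Check first that the script does not rely on it for datasets, operators, or functions that are passed by name rather than called directly, since none of those count as function calls. Decluttering is good.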
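That comparison can be sketched in a few lines. Both inputs below are hypothetical: funs is a hand-written list shaped like the output described above, and attached stands for the packages you read off the script's library() calls. The sketch also assumes that each list name mentions its packages in the form package:<name>.

```r
# Hypothetical result, shaped like the named list described above.
funs <- list(
  ".GlobalEnv"     = c("my_helper"),
  "package:plotly" = c("plot_ly", "add_trace"),
  "package:base"   = c("library", "paste")
)

# Hypothetical: the packages the script attaches via library() calls.
attached <- c("plotly", "config", "dplyr")

# Pull every "package:<name>" mention out of the list names.
hits <- unlist(regmatches(names(funs),
                          gregexpr("package:[A-Za-z0-9.]+", names(funs))))
used <- unique(sub("^package:", "", hits))

# Packages that are attached but never show up in the output.
unused <- setdiff(attached, used)
print(unused)
```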

In our example there are five functions referenced in this file which hail from the plotly package. But there is another one — the layout function — which is defined in both plotly and graphics.

Finding potential function name clashes

So this can be a tool to spot potential namespace conflicts. I happen to know we don’t need to worry about this particular clash, but that will not always be the case. In this same example, there is a function called get in the config package which clashes with a completely different base function of the same name.

Even with the config package loaded, we should always call its get function with an explicit package reference, using the double-colon notation (config::get), to avoid unintended consequences.
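A short sketch of checking for, and sidestepping, this kind of clash with base R tools. The key name some_key is hypothetical, and the config call is left commented out because it needs the config package and a config.yml file to be present.

```r
# Every attached package (or environment) that defines something called "get".
print(utils::find("get"))

# A bare get() resolves to the first match on the search path;
# this reports the namespace it currently resolves to.
resolved <- environmentName(environment(get))
print(resolved)

# Explicit references are unambiguous regardless of what is attached.
value <- base::get("pi")

# Hypothetical key; needs the config package and a config.yml file:
# settings <- config::get("some_key")
```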

And that’s it. You are of course free to wrap this in your own code if you want it to parse through multiple files, look for specific packages / functions, output results to a text file etc.
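For instance, a wrapper for multiple files could be sketched along these lines. The helper name scan_project and its lister argument are inventions for this example. The default lister is NCmisc's function, but it is only looked up when the helper actually needs it, so you can swap in another function if you prefer.

```r
# Hypothetical helper: run a function lister over every .R file under `dir`.
scan_project <- function(dir, lister = NCmisc::list.functions.in.file) {
  files <- list.files(dir, pattern = "\\.R$", full.names = TRUE,
                      recursive = TRUE)
  results <- lapply(files, lister)
  names(results) <- files
  results
}

# Usage (run in a session where the project's packages are attached):
# results <- scan_project("path/to/project")
# capture.output(print(results), file = "functions_used.txt")
```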
