
Profiling in R

R has a built-in performance and memory profiling facility: Rprof. Type ?Rprof into your console to learn more.

The way the profiler works is as follows:

  1. you start the profiler by calling Rprof, providing a filename where the profiling data should be stored
  2. you call the R functions that you want to analyse
  3. you call Rprof(NULL) to stop the profiler
  4. you analyse the file created by Rprof, typically using  summaryRprof

For example:
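The original example code is not preserved here; a minimal, self-contained run following the four steps above might look like this (the file name prof.out and the profiled workload are arbitrary choices):

```r
# 1. Start the profiler, writing samples to a file
Rprof("prof.out")

# 2. Run the code you want to analyse (an arbitrary numeric workload)
x <- replicate(20, {
  m <- matrix(rnorm(4e5), ncol = 200)
  solve(t(m) %*% m)
})

# 3. Stop the profiler
Rprof(NULL)

# 4. Analyse the file created by Rprof
summaryRprof("prof.out")
```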

The output looks like this:
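The captured output itself was not preserved; in general, summaryRprof returns a list with these components (the exact numbers vary per run, and the file name prof.out is a hypothetical one):

```r
# Components of the list returned by summaryRprof:
#   $by.self         - time spent in each function itself
#                      (columns self.time, self.pct, total.time, total.pct)
#   $by.total        - time spent in each function including its callees
#   $sample.interval - the sampling interval in seconds (0.02 by default)
#   $sampling.time   - total time covered by the profile
str(summaryRprof("prof.out"), max.level = 1)
```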

A lot of information!

As a side note, the sample.interval of 0.02 indicates the frequency with which Rprof samples the call stack and takes measurements. So, Rprof works entirely through polling. As a result, summaryRprof may look different each time you profile code: not only will the numbers differ slightly, but elements may even be missing, for instance because in one run a sample happened to be taken while, say, mode was executing, whereas in another run mode slipped between two samples.
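If you need finer-grained samples, the polling interval can be changed via Rprof's interval argument (0.02 seconds is the default):

```r
# Sample the call stack every 10 ms instead of the default 20 ms
Rprof("prof.out", interval = 0.01)
# ... the code to profile ...
Rprof(NULL)
```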

Stylized profiling example

Let’s look at a stylized example. Assume you have the following functions:
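The original definitions did not survive; a hypothetical reconstruction, consistent with the profile discussed below (a wrapper Do_1 spending most of its time in combn and the rest in paste), could be:

```r
# Hypothetical reconstruction -- the original functions were not preserved
Do_1 <- function() {
  Do_2()
  Do_3()
}

# Heavy on combn (which, internally, also calls matrix)
Do_2 <- function() {
  for (i in 1:10) combn(25, 5)
}

# Heavy on paste
Do_3 <- function() {
  for (i in 1:10) paste(1:1e5, collapse = "+")
}
```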

Ugly and pointless, true, but for the sake of this example they serve their purpose: they take time to execute.

So, again, let’s profile  Do_1 :
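As before, we wrap the call between Rprof and Rprof(NULL); the file name Do_1.out is an arbitrary choice:

```r
# Profile a single call to Do_1
Rprof("Do_1.out")
Do_1()
Rprof(NULL)

summaryRprof("Do_1.out")
```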

Which looks like this:
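The captured output did not make it into this page; schematically, using only the figures quoted in the text (the remaining values varied per run), the by.total section looked something like this:

```r
# $by.total (abbreviated, illustrative):
#            total.pct
# "combn"        73.56
# "paste"        ~25      (about one quarter)
# "matrix"        2.3
```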

Nice. We see that combn uses about three quarters of the computing time, while paste uses only about a quarter.

But hang on: matrix? Where does that come from? It must be either combn or paste calling it internally. No big deal here, as matrix only uses 2.3% of total time. But still, it would be interesting to understand this, right?

Analysis of profiling data with prof.tree

Luckily, the prof.tree package by Artem Klevtsov, which is available from CRAN or from GitHub, provides an alternative way to analyze that data. It displays profiling information as a tree:
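Reading the Rprof output file (the one passed to Rprof earlier, here assumed to be called Do_1.out) is a one-liner:

```r
# install.packages("prof.tree")  # once, from CRAN
library(prof.tree)

# Parse the Rprof output into a data.tree structure
pr <- prof.tree("Do_1.out")
pr
```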

This will print like so:

Surprise! matrix was called from combn, and not from paste!

Note that pr is a data.tree structure, so all data.tree operations are available. For example, we can sum up specific functions by name:
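For instance, we can sum the time share of every node named combn. This sketch assumes, as suggested by prof.tree's printed output, that each node carries a percent field; that field name is an assumption here:

```r
library(data.tree)

# Sum the 'percent' field over all nodes named "combn"
sum(pr$Get("percent", filterFun = function(node) node$name == "combn"))
```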

And, just as above, this gives us 73.56%.

Also, we can limit the nodes that are printed, by pruning out all subtrees whose percent is smaller than, say, 5%:
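With data.tree's Prune, we can keep only nodes that account for more than 5% of total time (again assuming a per-node percent field, here stored as a fraction):

```r
library(data.tree)

# Remove all subtrees whose share of total time is 5% or less
Prune(pr, pruneFun = function(x) x$percent > 0.05)
pr
```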

Et voilà:

Or we can use the data.tree plot facility to visualize this:
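data.tree's plot method, which renders via the DiagrammeR package, works directly on pr:

```r
# Renders the call tree as a diagram in the viewer
plot(pr)
```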

This will plot like so:


If you like what you see, make sure you show some love by starring Artem's package on GitHub.
