Reference semantics in R


I recently got a mail from Václav on reference semantics in data.tree, reading as follows:

Dear Christoph,
I am rather inexperienced when it comes to environments in R and henceforth I apologize if my question is basic; however, my colleagues are no better than me to answer my question.
I would have a question iro the following behavior of your data.tree package. Is it correct that if I create a function which uses some data.tree structure as a parameter, the input value would get changed too?
In the following case I would assume that acme’s values should not get changed.
Thank you, Vaclav
The code he provided was similar to this:
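The snippet itself is not reproduced here. What follows is only a minimal sketch of the situation Václav describes, assuming the acme sample tree that ships with data.tree. The function names zero_costs and total_cost are made up for illustration and are not taken from his mail.

```r
library(data.tree)
data(acme)  # sample tree shipped with the package

# Hypothetical function: sets the cost field of every node to zero.
# Note that it never returns the tree.
zero_costs <- function(tree) {
  tree$Set(cost = 0)
  invisible(NULL)
}

# Hypothetical helper: sum of all cost fields (nodes without a cost yield NA).
total_cost <- function(tree) sum(tree$Get("cost"), na.rm = TRUE)

before <- total_cost(acme)

# Working on a clone leaves the original tree untouched.
acme_copy <- Clone(acme)
zero_costs(acme_copy)
after_clone <- total_cost(acme)   # same as before

# Working on acme itself changes the caller's tree.
zero_costs(acme)
after <- total_cost(acme)         # now zero
```

If the caller's tree must stay intact, passing Clone(acme) instead of acme is one way to get there.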


My answer was as follows:

Well observed, that is indeed the behavior of data.tree. From the manual:

Node and Reference Semantics

The entry point to the package is Node. Each tree is composed of a number of Nodes, referencing each other. One of the most important things to note about data.tree is that it exhibits reference semantics. In a nutshell, this means that you can modify your tree along the way, without having to reassign it to a variable after each modification. By and large, this is a rather exceptional behavior in R, where value-semantics is king most of the time.

Reference Semantics Explained

In a nutshell, reference semantics can be understood by the following analogy: If I give you a URL, I provide you with a reference to a web page. You, I and the owner of the web page can access that web page with that URL. And if the owner changes the content, then you will see these changes next time you connect to the URL.
Contrarily, if I print out the web page and give you that print out, then I provide you with a disconnected copy of the web page. You may modify that copy (e.g. by highlighting passages with a marker), but I will not see these changes, nor will you see changes made in the original page by the owner. This is value semantics.
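The same contrast can be tried in base R without any package. This is a sketch only: the list plays the role of the printout and the environment plays the role of the URL, and all names are made up for illustration.

```r
# Value semantics: a list is copied when a function modifies it.
set_cost_in_list <- function(x) {
  x$cost <- 0        # changes the function's private copy only
  x
}

budget_list <- list(cost = 100)
set_cost_in_list(budget_list)
budget_list$cost     # still 100: the caller's "printout" is untouched

# Reference semantics: an environment is shared, not copied.
set_cost_in_env <- function(e) {
  e$cost <- 0        # changes the one and only object
  invisible(NULL)
}

budget_env <- new.env()
budget_env$cost <- 100
set_cost_in_env(budget_env)
budget_env$cost      # 0: everybody holding the "URL" sees the change
```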

Why data.tree uses reference semantics

The main reason why we chose to do it that way in data.tree is that we treat each Node as a unit. For performance reasons, we do not want to create a deep copy of the entire tree every time a Node is modified or a field is added to it.
Another reason is that it greatly simplifies the API of the package. For example, we can do:
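One sketch of the kind of code this enables, assuming only the Node$new and AddChild calls of data.tree; the tree contents are made up for illustration:

```r
library(data.tree)

# Build a small tree step by step. No result is ever assigned back to acme,
# yet every call below changes the tree that acme refers to.
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
accounting$AddChild("New Software")
accounting$AddChild("New Accounting Standards")

# AddChild returns the new child, so calls can be chained.
acme$AddChild("IT")$AddChild("Outsource")

print(acme)
```

With value semantics, each of these steps would have to return a modified copy of the whole tree, which the caller would then have to reassign.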

Reference Semantics in R

While very common in object-oriented languages (e.g. C++, Java, C#), this paradigm is not very widespread in R, though it is gaining more and more acceptance. Check out, for instance, the := operator in data.table, or search for R Reference Classes or R6.
The downside is, of course, that it might seem confusing at first.
Hope that helps!
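Reference Classes can be tried without installing anything, since setRefClass is part of the methods package that comes with R. A minimal sketch, in which the Account class and the deposit function are made up for illustration:

```r
library(methods)

# A Reference Class with a single numeric field.
Account <- setRefClass("Account", fields = list(balance = "numeric"))

# An ordinary function that modifies the object it receives.
deposit <- function(account, amount) {
  account$balance <- account$balance + amount
  invisible(account)
}

acct <- Account$new(balance = 100)
deposit(acct, 50)
acct$balance          # 150: the caller's object was changed

# copy() gives an independent object, comparable to the printout.
acct_copy <- acct$copy()
deposit(acct_copy, 1)
acct$balance          # still 150
```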


2 thoughts on “Reference semantics in R”

  • Carl Witthoft

    I’m interested in the reason that CRAN allows packages that modify by reference. Their rules explicitly disallow functions with <<- or anything similar which modifies (designated) objects in the parent environment. data.tree objects allow functions (unknowingly) to violate this rule, albeit only to an object passed via the argument list. I guess it's a fine line as to what constitutes "modifying" objects from within a function.

    • gluc Post author

      <<- is not at all the same thing as reference semantics. <<- is about SCOPE, you can use it to assign (and overwrite) a variable in an enclosing environment. This is considered bad practice, especially from packages, because a package writer cannot know ex ante what variables are in the work space of the user, so a package might inadvertently overwrite one of the user's variables. To take the above analogy of the URL: Using <<- is as if I sneak into your room, burn your printout of the web page, and replace it with the printout of another web page. Not nice, but still value semantics.
      Reference semantics is different. The user specifically asks the function to change the object. It’s not entirely different from this:
      x <- c(1, 2)
      names(x) <- c("a", "b")
      Agreed, that's not passing by reference. But the point is: if you ask R nicely to modify your variable x, it will do so. Or, to use the analogy again: If you give me access to your web-page and ask me nicely to write a post for your blog, you will be able to read it next time you go to the URL, and so will everybody else knowing your URL.

      Reference objects have been a part of R for a very long time and are an integral part of R base. Environments have always exposed reference semantics, for instance. Or, type ?ReferenceClasses.
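
The distinction drawn in the reply above can be sketched in base R. Everything here is illustrative: overwrite_x stands in for a misbehaving function, and enable_verbose for one that changes an object it was explicitly handed.

```r
# <<- is about scope: the function overwrites a variable it was never given.
x <- c(1, 2)
overwrite_x <- function() {
  x <<- "something else"   # assigns to x in the enclosing environment
  invisible(NULL)
}
overwrite_x()
x                          # "something else"

# Reference semantics: the caller passes the object and asks for the change.
settings <- new.env()
settings$verbose <- FALSE
enable_verbose <- function(env) {
  env$verbose <- TRUE      # modifies the environment that was passed in
  invisible(NULL)
}
enable_verbose(settings)
settings$verbose           # TRUE
```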