TreeMap World Population visualisation

This example is inspired by the examples of the treemap package.

You’ll learn how to

    • convert a data.frame to a data.tree structure
    • navigate a tree and locate specific nodes
    • use Aggregate and Cumulate
    • manipulate an existing tree, e.g. by using the Prune method
    • use data.tree in connection with the treemap package

This code builds on version 0.2.4 of the data.tree package, which you can get from CRAN or from GitHub. You will also find this example in the package’s applications vignette.

Original treemap Example (to be improved)

The original example, as available in the treemap package documentation, visualises the world population as a tree map.
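For reference, the sketch below shows roughly what that call looks like. It assumes the GNI2014 dataset that ships with the treemap package (columns iso3, country, continent, population and GNI). The exact arguments in the treemap documentation may differ slightly.

```r
# Sketch of the original (unimproved) treemap, assuming the GNI2014
# dataset bundled with the treemap package.
library(treemap)

data(GNI2014)

# One rectangle per country (iso3), nested in its continent.
# Rectangle area is proportional to population; colour encodes GNI
# on a continuous "value" scale.
res <- treemap(GNI2014,
               index  = c("continent", "iso3"),
               vSize  = "population",
               vColor = "GNI",
               type   = "value")
```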

 

There are many countries, so the chart gets cluttered with many very small boxes. In this example, we will limit the number of countries shown and sum the remaining population into a catch-all country called “Other”.

We use the data.tree package to do this aggregation.

Conversion from data.frame

First, let’s convert the population data into a data.tree structure:
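Here is a minimal sketch of this conversion. It uses a small hypothetical data.frame in place of the real population data, so the result is easy to check, and it assumes a recent data.tree release. as.Node() reads the hierarchy from a pathString column whose levels are separated by "/". The root name "world" is an arbitrary choice.

```r
library(data.tree)

# Hypothetical toy stand-in for the population data (millions).
pop <- data.frame(
  continent  = c("Europe", "Europe", "Europe", "Oceania"),
  country    = c("Switzerland", "Germany", "France", "Australia"),
  population = c(8, 80, 66, 23),
  stringsAsFactors = FALSE
)

# as.Node() builds the hierarchy from the "pathString" column,
# splitting each path on "/" into root / continent / country.
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# The remaining columns become fields on the leaf (country) nodes.
print(tree, "population")
```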

We can easily navigate the tree to find the population of a specific country. RStudio’s code completion (CTRL + SPACE) is quite helpful here:
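The sketch below shows two ways to reach a node, using hypothetical toy data and assuming a recent data.tree release. The first is a direct $ path, which is what code completion helps you type. The second is FindNode(), which searches the tree by name when you don’t know the path.

```r
library(data.tree)

# Hypothetical toy data, converted via a pathString column.
pop <- data.frame(
  continent  = c("Europe", "Europe", "Americas"),
  country    = c("Switzerland", "Italy", "Canada"),
  population = c(8, 60, 35)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# Each child is reachable by name with $, so a country is one
# path expression away.
chPop <- tree$Europe$Switzerland$population

# When the path is unknown, FindNode() searches the tree by name.
canada <- FindNode(tree, "Canada")
canada$population
canada$parent$name   # the continent it belongs to
```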

Or, we can look at a sub-tree:
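For example, with hypothetical toy data that contains a "North America" continent, the sub-tree is simply the continent node. It prints like any other tree. The sketch assumes a recent data.tree release.

```r
library(data.tree)

# Hypothetical toy data.
pop <- data.frame(
  continent  = c("North America", "North America", "North America", "Europe"),
  country    = c("Canada", "Mexico", "United States", "France"),
  population = c(35, 125, 319, 66)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# A child node is itself a Node, so it can be printed (or processed)
# as a tree of its own. Names containing spaces need backticks.
northAm <- tree$`North America`
print(northAm, "population")
```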

 

 

Aggregate and Cumulate

We now want to aggregate the population. For non-leaf nodes, Aggregate recursively iterates through the children and caches the result in the population field. We do this not so much to calculate the population of the world, but to store each continent’s total on its node via the cacheAttribute. The following steps rely on those stored totals.
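Here is a minimal sketch of this step on hypothetical toy data. It assumes a recent data.tree release in which Aggregate() is a standalone function. Rather than relying on the cacheAttribute parameter, it writes each result back into the population field with Do(). The post-order traversal processes every node after its children.

```r
library(data.tree)

# Hypothetical toy data; only the leaves (countries) carry a population.
pop <- data.frame(
  continent  = c("Europe", "Europe", "Asia", "Asia"),
  country    = c("Germany", "France", "India", "Japan"),
  population = c(80, 66, 1267, 127)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# Store the sum over each node's descendants in its own population
# field. For a leaf, Aggregate() returns the leaf's own population.
tree$Do(function(x) {
  x$population <- Aggregate(node = x, attribute = "population", aggFun = sum)
}, traversal = "post-order")

print(tree, "population")
```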

 

Next, we sort the children of each node by population:
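The following sketch shows the sorting step on hypothetical toy data, assuming a recent data.tree release where Sort() is a standalone function. The continents are aggregated first, because sorting them requires their totals. Sort() reorders children in place (reference semantics), so no assignment is needed.

```r
library(data.tree)

# Hypothetical toy data, deliberately not in population order.
pop <- data.frame(
  continent  = c("Europe", "Europe", "Europe", "Asia", "Asia"),
  country    = c("Switzerland", "Germany", "France", "Japan", "India"),
  population = c(8, 80, 66, 127, 1267)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# Continent totals are needed so that continents can be sorted too.
tree$Do(function(x) {
  x$population <- Aggregate(node = x, attribute = "population", aggFun = sum)
}, traversal = "post-order")

# Reorder the children of every node, largest population first.
Sort(tree, attribute = "population", decreasing = TRUE, recursive = TRUE)

names(tree$children)          # continents, largest first
names(tree$Europe$children)   # countries within Europe, largest first
```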

 

Finally, we cumulate among siblings, and store the running sum in an attribute called cumPop:
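The next sketch shows the cumulation on hypothetical toy data, assuming a recent data.tree release. To keep it short, the rows are already listed largest-first, so the sibling order matches what the sorting step would produce. Cumulate() combines a node’s value with those of its preceding siblings. With sum, that gives a running total that includes the node itself.

```r
library(data.tree)

# Hypothetical toy data, listed largest-first within each level.
pop <- data.frame(
  continent  = c("Asia", "Asia", "Europe", "Europe", "Europe"),
  country    = c("India", "Japan", "Germany", "France", "Switzerland"),
  population = c(1267, 127, 80, 66, 8)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

tree$Do(function(x) {
  x$population <- Aggregate(node = x, attribute = "population", aggFun = sum)
}, traversal = "post-order")

# Running total of population among siblings, stored as cumPop.
tree$Do(function(x) x$cumPop <- Cumulate(x, "population", sum))

print(tree, "population", "cumPop")
```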

 

The tree now looks as follows. Note the new cumPop attribute, as well as the sort order:

 

Prune

The previous steps prepare our threshold: big countries should be displayed, while small ones should be grouped together. This lets us define a pruning function that allows a maximum of 7 countries per continent. Additionally, it prunes the small countries, i.e. all countries beyond the largest ones that together make up 90% of a continent’s population:
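The sketch below shows one way to write such a pruning function. The name myPruneFun and its cutoff and maxCountries parameters are hypothetical. It keeps all inner nodes. It keeps a country only if the country is among the first 7 siblings and its running total cumPop, which includes the country itself, stays below 90% of the continent’s population. As a result, the country that pushes the running total to 90% or more is pruned as well. The toy continent sums to 100, so cumPop reads as a percentage.

```r
library(data.tree)

# Hypothetical continent with five countries, sorted largest first.
pop <- data.frame(
  continent  = "Europe",
  country    = c("A", "B", "C", "D", "E"),
  population = c(50, 30, 12, 5, 3)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)
tree$Do(function(x) {
  x$population <- Aggregate(node = x, attribute = "population", aggFun = sum)
}, traversal = "post-order")
tree$Do(function(x) x$cumPop <- Cumulate(x, "population", sum))

# Hypothetical prune function: TRUE keeps a node, FALSE prunes it.
myPruneFun <- function(x, cutoff = 0.9, maxCountries = 7) {
  if (isNotLeaf(x)) return(TRUE)                  # keep root and continents
  if (x$position > maxCountries) return(FALSE)    # too many siblings before it
  x$cumPop < x$parent$population * cutoff         # running total below cutoff
}

sapply(tree$Europe$children, myPruneFun)
```

With these numbers, cumPop is 50, 80, 92, 97 and 100, so only A and B survive. Calling it with maxCountries = 1 would keep only A.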

 

We clone the tree before pruning it. Because data.tree uses reference semantics, pruning would otherwise modify the original tree, and we want to keep the original so that we can experiment with different parameters later:
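The following sketch illustrates reference semantics on hypothetical toy data, assuming a recent data.tree release where Clone() and Prune() are standalone functions. Assigning a tree to a second variable does not copy it, but Clone() does, so the pruned copy and the original diverge. The simple prune rule here (keep countries with a population of at least 20) is only a stand-in.

```r
library(data.tree)

# Hypothetical toy data.
pop <- data.frame(
  continent  = "Europe",
  country    = c("A", "B", "C", "D"),
  population = c(60, 25, 10, 5)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# Reference semantics: sameTree points to the same nodes as tree,
# so a change made through one variable is visible through the other.
sameTree <- tree
nodeA <- sameTree$Europe$A
nodeA$population <- 61

# Clone() creates an independent deep copy; pruning the copy leaves
# the original intact.
treeClone <- Clone(tree)
Prune(treeClone, pruneFun = function(x) isNotLeaf(x) || x$population >= 20)

tree$Europe$A$population   # 61, changed via sameTree
treeClone$leafCount        # 2: only A and B survive in the copy
tree$leafCount             # 4: the original keeps all countries
```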


Finally, we sum the population of the countries we pruned away into a new “Other” node for each continent:
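Here is a sketch of this step, assuming a recent data.tree release. It uses a hypothetical tree that has already been pruned. The continent totals were stored before pruning, so they still include the removed countries. The fields set on the new node (country, continent) are assumptions, chosen so that the later conversion back to a data.frame has no gaps.

```r
library(data.tree)

# Hypothetical already-pruned tree.
kept <- data.frame(
  continent  = c("Europe", "Europe", "Asia"),
  country    = c("Germany", "France", "India"),
  population = c(80, 66, 1267)
)
kept$pathString <- paste("world", kept$continent, kept$country, sep = "/")
tree <- as.Node(kept)
tree$Europe$population <- 154   # Germany + France + a pruned 8
tree$Asia$population   <- 1394  # India + a pruned 127

# For every continent (level 2; the root is level 1), add an "Other"
# child holding the population not covered by the remaining countries.
tree$Do(function(x) {
  keptPop <- sum(vapply(x$children, function(ch) ch$population, numeric(1)))
  other <- x$AddChild("Other")
  other$country    <- "Other"
  other$continent  <- x$name
  other$population <- x$population - keptPop
}, filterFun = function(x) x$level == 2)

print(tree, "population")
```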

Plotting the treemap

In order to plot the treemap, we need to convert the data.tree structure back to a data.frame:
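data.tree’s ToDataFrameTable() produces one row per leaf, with a column for each field you request. That is the flat shape treemap() expects. Below is a minimal sketch on hypothetical toy data, assuming a recent data.tree release. A leaf that lacks a requested field should get NA in that column, which is why it pays to set those fields on the “Other” nodes.

```r
library(data.tree)

# Hypothetical toy data.
pop <- data.frame(
  continent  = c("Europe", "Europe", "Asia"),
  country    = c("Germany", "France", "India"),
  population = c(80, 66, 1267)
)
pop$pathString <- paste("world", pop$continent, pop$country, sep = "/")
tree <- as.Node(pop)

# One row per leaf (country), one column per requested field.
df <- ToDataFrameTable(tree, "continent", "country", "population")
df
```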

 

And here we go: our treemap now shows at most 7 countries per continent and groups the remaining countries, beyond the 90% population threshold, into “Other”:
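Putting the steps together, the whole pipeline might look roughly like the sketch below. It assumes a recent data.tree release and the GNI2014 dataset from treemap. The prune rule is a hypothetical reconstruction of the one described in the Prune section. The placeholder values on the “Other” rows (iso3 "Other", GNI 0) are assumptions of this sketch, not necessarily the post’s choices.

```r
library(treemap)
library(data.tree)

data(GNI2014)
chr <- c("iso3", "country", "continent")
GNI2014[chr] <- lapply(GNI2014[chr], as.character)
GNI2014$population <- as.numeric(GNI2014$population)  # avoid integer overflow
GNI2014$pathString <- paste("world", GNI2014$continent, GNI2014$country, sep = "/")
tree <- as.Node(GNI2014)

# Aggregate, sort and cumulate the population.
tree$Do(function(x) {
  x$population <- Aggregate(node = x, attribute = "population", aggFun = sum)
}, traversal = "post-order")
Sort(tree, attribute = "population", decreasing = TRUE, recursive = TRUE)
tree$Do(function(x) x$cumPop <- Cumulate(x, "population", sum))

# Hypothetical prune rule: at most 7 countries per continent, and only
# while the running total stays below 90% of the continent's population.
myPruneFun <- function(x, cutoff = 0.9, maxCountries = 7) {
  if (isNotLeaf(x)) return(TRUE)
  if (x$position > maxCountries) return(FALSE)
  x$cumPop < x$parent$population * cutoff
}

# Clone while pruning, so the full tree stays available.
treeClone <- Clone(tree, pruneFun = myPruneFun)

# One "Other" leaf per continent (level 2) holding the pruned population.
treeClone$Do(function(x) {
  keptPop <- sum(vapply(x$children, function(ch) ch$population, numeric(1)))
  other <- x$AddChild("Other")
  other$iso3       <- "Other"  # placeholder label
  other$country    <- "Other"
  other$continent  <- x$name
  other$GNI        <- 0        # placeholder colour value
  other$population <- x$population - keptPop
}, filterFun = function(x) x$level == 2)

# Back to a flat data.frame, then plot as in the original example.
df <- ToDataFrameTable(treeClone, "iso3", "country", "continent", "population", "GNI")
treemap(df,
        index  = c("continent", "iso3"),
        vSize  = "population",
        vColor = "GNI",
        type   = "value")
```

Setting GNI to 0 colours the “Other” boxes as if they had no income. A more faithful alternative would be a population-weighted average of the pruned countries, computed on the unpruned tree.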

If you have enjoyed this example, I recommend you read the package’s vignettes, or have a look at the other data.tree posts in this blog.
