R in Wandora

Wandora can be used with the R language. R is an environment for statistical computing and graphics. Properties of the topic map and its topics can be accessed from R, and statistics and graphs can be generated from them.

Setting up R

To use R in Wandora you need to install R, then install a few R libraries using the package manager inside R, and finally, if necessary, adjust some environment parameters in the Wandora startup script. These steps are explained in detail below.

Installing R

Download R from the R website at http://www.r-project.org/ and follow the installation instructions there. The default installation installs both 32-bit and 64-bit versions. If you decide to install only one, make sure it matches the architecture (32-bit or 64-bit) of your Java runtime environment.

On Linux, R may also be available from the package repository of your distribution. On Ubuntu the name of the package you need is r-base.
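A quick way to see which architecture a given R installation runs as is to ask R itself. The following is a minimal sketch using only base R; the variable name rBits is chosen here for illustration. The Java side has to be checked separately, for example with java -version on the command line, which on typical 64-bit runtimes mentions "64-Bit" in its output.

```r
# Pointer size in bytes: 4 for a 32-bit R process, 8 for a 64-bit one.
rBits <- 8 * .Machine$sizeof.pointer

# R.version$arch names the CPU architecture R was built for, e.g. "x86_64".
cat("R is running as a", rBits, "bit process, built for", R.version$arch, "\n")
```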

Installing required R libraries

At the very least you must install the rJava library. Do this using the package manager inside R. First run R using the administrator account. On Windows, right-click the icon and select "Run as Administrator". The installation may have created two icons on your desktop or in the start menu, one for 32-bit R and one for 64-bit R. Use the one that matches your Java runtime environment. Note that you might have a 32-bit Java runtime environment even if your Windows is 64-bit.

On Linux run R as root, for example from the console with "sudo R".

When installing a package you will be prompted to select a mirror for the download. Select your country or one close to it. To install the package, issue the following command in R:

 install.packages("rJava")

Most likely you will also want to install the igraph library. It is needed to plot network graphs.

 install.packages("igraph")

Currently Wandora has a problem with the default graphics device on Windows. To be able to plot anything on Windows you will need to install the JavaGD graphics device. This is only needed on Windows.

 install.packages("JavaGD")
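If you are unsure which of these packages are already present, something like the following sketch can list the ones still missing before you install them. The helper name missingPackages is hypothetical; installed.packages and setdiff are standard R functions.

```r
# Return the names in 'needed' that are not found in any R library path.
missingPackages <- function(needed) {
  setdiff(needed, rownames(installed.packages()))
}

# Packages mentioned on this page; JavaGD is only needed on Windows.
toInstall <- missingPackages(c("rJava", "igraph", "JavaGD"))
print(toInstall)

# Then, from an R session with sufficient rights:
# install.packages(toInstall)
```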

Setting up environment variables

Next make sure that the environment variables are set up correctly in the Wandora startup script. In Windows open the bin/SetR.bat file and in Linux the bin/SetR.sh file. If you did a standard installation of R then the Linux start-up script likely needs no changes at all.

The Windows start-up script, however, has two things that may need adjusting. Make sure the first line points to your R installation directory. In particular, the R version number may need to be changed. Also make sure that the processor architecture matches your Java installation. Note that you may have a 32-bit Java even if your system is 64-bit. If you aren't sure which Java version you have, you can simply try both settings and see which one works. The architecture is specified on lines 5 and 6. For 32-bit use

 set R_ARCH=i386
 REM set R_ARCH=x64

And for 64-bit

 REM set R_ARCH=i386
 set R_ARCH=x64

Other parameters should be correct unless you have customized your R installation beyond a standard setup.

You should now be able to use R inside Wandora.

Plotting in Windows

The default graphics device doesn't work correctly in Windows. If you try to plot anything you will get an unresponsive graphics window. If at any time you accidentally open it, you can close it cleanly in the Wandora R console with

 dev.off()

To work around this issue you need to use the JavaGD graphics device. You first need to load the JavaGD library with

 library("JavaGD")

Then initialize the graphics device with

 JavaGD()

This will open an empty graphics window. You can then use plot normally to plot in this window.
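Since the workaround is only needed on Windows, a script shared between platforms could open JavaGD conditionally. The following is a sketch under that assumption. needsJavaGD is a hypothetical helper, while .Platform$OS.type is standard R and is "windows" on Windows and "unix" elsewhere.

```r
# TRUE when the session runs on Windows, where JavaGD is needed for plotting.
needsJavaGD <- function(osType = .Platform$OS.type) {
  identical(osType, "windows")
}

if (needsJavaGD()) {
  library("JavaGD")  # assumes JavaGD was installed as described earlier
  JavaGD()           # opens the empty JavaGD graphics window
}
```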

R console in Wandora

You can open the R console in Wandora by clicking the R console button in the top toolbar. Assuming that you have installed R and set up the environment correctly, you will get the standard R greeting with R version and license information.

 R version 2.11.1 (2010-05-31)
 Copyright (C) 2010 The R Foundation for Statistical Computing
 ISBN 3-900051-07-0
 
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.
 
   Natural language support but running in an English locale
 
 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and
 'citation()' on how to cite R or R packages in publications.
 
 Type 'demo()' for some demos, 'help()' for on-line help, or
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.
 
 >

Otherwise you'll get an error message and instructions about how to set up your R environment.

You can issue R commands in the text area at the bottom part of the window. This includes almost everything you can do in R. One notable exception is that the help system doesn't work properly, so "?plot" and the like don't do anything. A few other functions have also been disabled because they don't work very well when R is run inside Java. These functions include q, quit, demo, contributors and citation.

You can browse the topic map in Wandora while having the R console open. This way you can select topics in the main Wandora window and then get references to those topics in the R environment (see next section).

Using R with topic maps in Wandora

There are a couple of ways to access the topic map in R. Some of these rely heavily on the Java topic map API used in Wandora. You will need to call the Java methods of the topic objects. To find out more about the API look at the javadocs of Wandora. Mostly you will only need to look at the Topic and TopicMap classes and a few classes related to them. The Java methods are accessed with the $ indexing operator. For example if the variable t contains a topic you can get the base name of that with

 t$getBaseName()

The R environment in Wandora is initialized by running the /build/resources/conf/rinit.r file. This file defines some functions that make accessing the topic map slightly easier. You can get a reference to the topic map object itself with the getTopicMap function in R. Alternatively you can get a list of all the topics in the topic map with getAllTopics, or the currently selected topics in Wandora with getContextTopics. For example, to get the base names of the currently selected topics in Wandora use

 lapply(getContextTopics(),function(t) t$getBaseName())
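lapply returns a list, which is often less convenient than a plain vector. To experiment with this calling pattern outside Wandora, the sketch below uses mock topics: ordinary R lists whose elements are functions, so that the same t$getBaseName() syntax works. makeMockTopic and the topic names are made up for illustration and are not part of Wandora or rinit.r.

```r
# A mock topic: a plain list carrying a getBaseName function.
# Real Wandora topics are Java objects; this only imitates the call syntax.
makeMockTopic <- function(name) {
  list(getBaseName = function() name)
}

topics <- list(makeMockTopic("Helsinki"),
               makeMockTopic("Espoo"),
               makeMockTopic("Vantaa"))

# lapply gives a list with one element per topic
asList <- lapply(topics, function(t) t$getBaseName())

# vapply gives a character vector and checks that each result is one string
asVector <- vapply(topics, function(t) t$getBaseName(), character(1))
print(asVector)
```

With real topics, wrapping the lapply call in unlist, or using sapply instead, flattens the result in a similar way.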

To plot something you first need to gather the data you want to plot. For example, you could plot a histogram that visualizes the number of associations the topics have in the topic map. Note that you may have to copy this and other examples on this page one line at a time.

 ts<-getAllTopics() # get a list of all topics
 as<-lapply(ts,function(t) t$getAssociations()$size()) # as has the number of associations in each topic
 fac<-factor(unlist(as)) # make a factor from as
 plot(fac) # plot the factor

In Windows you should open the JavaGD graphics device before the last plot line. Do this with

 library("JavaGD") # loads the JavaGD, only need to do this once per R session
 JavaGD() # opens a JavaGD graphics window
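Plotting a factor draws one bar per distinct value, with the bar height giving how many times that value occurs. The same counts can be inspected as text with table, which can be handy for checking the data before plotting. The association counts below are made-up numbers standing in for the list that the example stores in as.

```r
# Hypothetical association counts for seven topics
assocCounts <- list(0L, 1L, 1L, 2L, 3L, 3L, 3L)

fac <- factor(unlist(assocCounts))  # same conversion as in the example
counts <- table(fac)                # number of topics per association count
print(counts)
```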

There is also a function to set up a graph object that can be used with the igraph library. Use the makeGraphJava function and pass it a list of topic objects. It returns a graph object that can be plotted directly. See the igraph R documentation for more information about how to customize the plot. In particular, the plot function may be passed parameters relating to the layout of the graph. Before using makeGraphJava you have to load the igraph library. For example, to plot a network of the currently selected topics in a circular layout use the following. You must, of course, have selected some topics in Wandora for this to work.

 library("igraph") # loads the library, only need to do this once per R session
 plot(makeGraphJava(getContextTopics()),layout=layout.circle)

Again in Windows you need to remember to use the JavaGD library before plotting.

Instead of getting a list of selected topics you can get a list of selected associations with getContextAssociations. After this you can get the players with getPlayers, which takes as parameters a list of associations and a role. The role can be either a topic object or a string giving the base name of the role. So, for example, you could select some associations and then get the topics playing the role value with

 getPlayers(getContextAssociations(),"value")

You can convert topics to strings or numbers using as.character or as.numeric respectively. These use the topic base name to do the conversion. If you want to use a variant name or get the data from an occurrence, you will have to use the topic map API to get the desired value. But if your numeric data is in the base names, and you can list the topics in a table in Wandora and select them, you can get a simple vector of numbers with something like

 sapply( getPlayers(getContextAssociations(),"value"), as.numeric )

Note that as.numeric is fairly lenient when converting topic base names to numbers and knows how to skip non-numeric characters in the base name.
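The exact conversion rules are defined in rinit.r and are not reproduced here. As an illustration only, a lenient conversion of this kind could be sketched in plain R as follows. lenientNumeric is a hypothetical function operating on strings, not Wandora's implementation, and it simply takes the first number found in each string.

```r
# Extract the first number from each string; NA where no number is found.
# ASSUMPTION: this only imitates the idea of skipping non-numeric characters.
lenientNumeric <- function(s) {
  m <- regexpr("-?[0-9]+(\\.[0-9]+)?", s)    # position of first number, -1 if none
  out <- rep(NA_real_, length(s))
  out[m > 0] <- as.numeric(regmatches(s, m)) # regmatches returns matches only
  out
}

print(lenientNumeric(c("1234", "approx. 56.5 km", "n/a")))
```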

Example

In this example we will use the SPARQL extractor to extract some data relating to the demographics of Helsinki. We will then make a map plot of the districts of Helsinki with colours indicating the total population in the district.

First we use the SPARQL extractor to extract the data. Go to the File menu and select Extract/Other/SPARQL extractor.

Rexample1.png

We are going to use the Helsinki Region Infoshare SPARQL endpoint, so select the HRI tab. Then clear the query text area and replace its contents with the following query. It gets the population of all areas of Helsinki that don't themselves have any sub-areas.

SELECT ?area ?poly ?value
WHERE { 
  ?area rdf:type dimension:Alue;
     geo:polygon ?poly.
  ?item rdf:type scv:Item;
     rdf:value ?value;
     dimension:ikäryhmä ikäryhmä:Väestö_yhteensä;
     dimension:vuosi vuosi:_2009;
     dimension:yksikkö yksikkö:Henkilöä;
     dimension:alue ?area.
  OPTIONAL { ?narrower skos:broader ?area . }
  FILTER ( !BOUND(?narrower) )
}

Then click Extract.

Rexample2.png

After a few seconds you should get a message informing you that one result set was extracted.

Next find the result set topic on the left-hand side (labelled 1 below) and double-click it to open the topic. Then select all the rows in the result set by right-clicking somewhere on the association table (labelled 2). In the context menu choose Select/Select all. Then open the R console by clicking the button in the toolbar (labelled 3).

Rexample3.png

Then give the following commands to R. You can copy and paste all of it at once in the input box at the lower part of the window. If you aren't doing this in Windows then you can remove lines 5 and 6 (the JavaGD part). Press enter or the Evaluate button to evaluate the commands.

associations<-getContextAssociations()
polygons<-getPlayers(associations,"poly")
polygons<-lapply(polygons,function(p){extractPolygon(getDisplayName(p),reverse=TRUE)})
values<-sapply(getPlayers(associations,"value"),as.numeric)
library("JavaGD") # Remove these two lines if you
JavaGD()          # aren't doing this on Windows
plotPolygons(polygons,values)

Rexample4.png

You should get an R plot window containing the final plot.

Rexample5.png

We'll now go through the short R code line by line to see what it does.

1 associations<-getContextAssociations()
2 polygons<-getPlayers(associations,"poly")
3 polygons<-lapply(polygons,function(p){extractPolygon(getDisplayName(p),reverse=TRUE)})
4 values<-sapply(getPlayers(associations,"value"),as.numeric)
5 library("JavaGD") # Remove these two lines if you
6 JavaGD()          # aren't doing this on Windows
7 plotPolygons(polygons,values)

The first line gets the associations we selected in Wandora. The second line gets all the players of those associations that play the role "poly" (strictly speaking "poly" is the base name of the role topic). The third line extracts the polygon data from those topics. This is done by applying the anonymous function to all topics in the polygons list. The extractPolygon function extracts the polygon data from a string, which we get from the display name of the topic with the getDisplayName function. The reverse parameter to extractPolygon swaps the x and y axes. This is done because latitude (the y axis) comes first in the polygon data. The fourth line gets the population for each area. This is done by getting the player topics of role "value" and converting them to numeric. Lines five and six set up the JavaGD display device. Finally, on line seven the polygons and their associated values are plotted with plotPolygons.
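To make the axis swap and the value colouring more concrete, here is a plain R sketch that runs outside Wandora. It is not the code in rinit.r. parsePolygon is a hypothetical stand-in for extractPolygon, the "lat,lon lat,lon ..." string format is an assumption, and the coordinates and population values are made up.

```r
# ASSUMPTION: a polygon string looks like "lat,lon lat,lon ...".
# The real format handled by extractPolygon may differ.
parsePolygon <- function(s, reverse = FALSE) {
  pairs <- strsplit(strsplit(trimws(s), "\\s+")[[1]], ",")
  m <- matrix(as.numeric(unlist(pairs)), ncol = 2, byrow = TRUE)
  if (reverse) m <- m[, 2:1, drop = FALSE]  # swap so that longitude becomes x
  colnames(m) <- c("x", "y")
  m
}

poly <- parsePolygon("60.1,24.9 60.2,24.9 60.2,25.0", reverse = TRUE)
print(poly)

# Map made-up population values to five colour classes
values <- c(100, 2500, 900)
classes <- cut(values, breaks = 5, labels = FALSE)  # class index 1..5 per value
colours <- rev(heat.colors(5))[classes]
print(colours)
```

Each such matrix could then be drawn with the base graphics function polygon, passing the matching colour as col.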

Note that all of getContextAssociations, getPlayers, extractPolygon, getDisplayName and plotPolygons are defined in the rinit.r R script that is loaded when you first start the R console in Wandora. In addition to this, as.numeric is extended to handle topic objects there as well.

R language resources

This short introduction to the R language integration in Wandora does not attempt to teach the R language. If you find the R language interesting and would like to know more, please refer to the available on-line manuals.

Also, the Contributed Documentation page has an excellent list of R language resources.
