## Getting and managing occurrence data from gbif.org

The R package dismo by Hijman et al. offers a very powerful function named **gbif** that allows to get occurence data from the gbif dabase.

For example if we are to map the geographical range of the Mediterranean fruit fly, *Ceratitis capitata* we simply do:

library(dismo)

distr <- gbif(genus="Ceratitis", species="capitata*", geo=TRUE,removeZeros = TRUE)

Note that **“capitata”** and **“capitata*”** won’t lead to similar results because the species name is often associated to author’s name in the database i.e. *Ceratitis capitata* or *Ceratitis capitata* (Wiedemann, 1824).

The results can be plotted:

plot(distr$lon,distr$lat, pch=3, cex=1, col="red", asp=1, xlab="longitude", ylab="latitude")

map("world", resolution = 0.5, add=T)

Alas! many occurrences correspond to the same geographical location i.e. there are many duplicated points which might be a source of trouble in subsequent data analyses and distribution modeling. This problem can be easily solved using the R package spatstat by Baddeley and Turner (2005) and its function **duplicated.ppp**.

We first have to create a spatial point pattern object from the occurrences points using the function **ppp** and a window created with the **ripras** function. Then **duplicated.ppp** will identify duplicated points and we will be able to remove them. The code is:

library(spatstat)

x<-distr$lon ; y<-distr$lat

w <- ripras(x,y)

wp<-ppp(x,y,window=w)

dupv<-duplicated.ppp(wp)

x2<-x[which(dupv==FALSE)] ; y2<-y[which(dupv==FALSE)] # coordinates of points with no duplicates

In the case of *Ceratitis capitata* gbif returned a total of 1279 occurrences that included not less that 982 duplicated points.

References

A. Baddeley and R. Turner (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12 (6), 1-42. ISSN: 1548-7660. URL: http://www.jstatsoft.org

Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith (2012). dismo: Species distribution modeling. R package version 0.7-23. http://CRAN.R-project.org/package=dismo

## Retrieve spatial coordinates from location’s name

It is a common task to retrieve the spatial coordinates (longitude and latitude) from the sole name of a place. This occurs for example when one wants to map spatial occurrences of a species that are reported in the literature in the form of a list of places.

This work is strongly facilitated by the R package titled gooJSON developped by Christopher Steven Marcum.

Let’s say that we want to retrieve the coordinates of a place reported as Fort Myers in Florida where J. C. Denmark reported the presence of *Homalodisca vitripennis* Germar, 1821 in 1957.

library(gooJSON)

a<-gooadd(address = list("Fort Myers","Florida"))

b<-goomap(a)

b$Placemark[[1]]$Point$coordinates

```
```

`[1] -81.87231 26.64063 0.00000`

Here we are, Google API tells us (through R) that Fort Myers in Florida is located at -81.87231 longitude and 26.64063 latitude.

Now imagine that we want to repeat that operation 500 times because we have a bunch of places where our study species has been recorded. The **goomap** function can be put into a loop and the Google API questioned 500 times, but here comes the twist: Google will not answer 500 times in a row and will send no data for some records. This problem can simply be solved by slowing your loop so that Google API won’t receive your calling to quickly and will answer to all queries. This is done by adding a call to the function **Sys.sleep** in the loop to suspend execution of R expressions for a given number of seconds. I used **Sys.sleep(0.2)** and it worked perfectly!

Reference

Christopher Steven Marcum (2012). gooJSON: Google JSON Data Interpreter for R. R package version 1.0.01.http://CRAN.R-project.org/package=gooJSON

## Plot color symbols upon s.class {ade4} outputs

I am used to perform mutlivariate analyses using the R package ade4 by D. Chessel et al. The package features some wonderful graphical functions like s.class or s.label. Unfortunately, these functions not always allow using different colors or pch values when plotting the scatter diagrams with representation of point classes.

The present post deals with that point and will provide some code allowing to get graphical outputs looking like the one below:

The functions s.label {ade4} does not allow plotting different symbols for the classes and for that reason I have modified the function and transformed it into s.label.bis. It is:

s.label.bis <- function (dfxy, xax = 1, yax = 2, label = row.names(dfxy), clabel = 1, pch = 20, cpoint = if (clabel == 0) 1 else 0, boxes = TRUE, neig = NULL, cneig = 2, xlim = NULL, ylim = NULL, grid = TRUE, addaxes = TRUE, cgrid = 1, include.origin = TRUE, origin = c(0, 0), sub = "", csub = 1.25, possub = "bottomleft", pixmap = NULL, contour = NULL, area = NULL, add.plot = FALSE, col="black", bg=NULL,cex=par("cex")) { dfxy <- data.frame(dfxy) opar <- par(mar = par("mar")) on.exit(par(opar)) par(mar = c(0.1, 0.1, 0.1, 0.1)) coo <- scatterutil.base(dfxy = dfxy, xax = xax, yax = yax, xlim = xlim, ylim = ylim, grid = grid, addaxes = addaxes, cgrid = cgrid, include.origin = include.origin, origin = origin, sub = sub, csub = csub, possub = possub, pixmap = pixmap, contour = contour, area = area, add.plot = add.plot) if (!is.null(neig)) { if (is.null(class(neig))) neig <- NULL if (class(neig) != "neig") neig <- NULL deg <- attr(neig, "degrees") if ((length(deg)) != (length(coo$x))) neig <- NULL } if (!is.null(neig)) { fun <- function(x, coo) { segments(coo$x[x[1]], coo$y[x[1]], coo$x[x[2]], coo$y[x[2]], lwd = par("lwd") * cneig) } apply(unclass(neig), 1, fun, coo = coo) } if (clabel > 0) scatterutil.eti(coo$x, coo$y, label, clabel, boxes) if (cpoint > 0 & clabel < 1e-06) points(coo$x, coo$y, pch = pch, cex = cex,col=col, bg=bg) box() invisible(match.call()) }

Similarly I changed the ade4 function s.class into s.class.bis which does not plot the “branches” of the stars but only the centroid of each class. It is:

s.class.bis<-function (dfxy, fac, wt = rep(1, length(fac)), xax = 1, yax = 2, cstar = 1, cellipse = 1.5, axesell = TRUE, label = levels(fac), clabel = 1, cpoint = 1, pch = 20, col = rep(1, length(levels(fac))), xlim = NULL, ylim = NULL, grid = TRUE, addaxes = TRUE, origin = c(0, 0), include.origin = TRUE, sub = "", csub = 1, possub = "bottomleft", cgrid = 1, pixmap = NULL, contour = NULL, area = NULL, add.plot = FALSE) { scatterutil.eti.jp<-function (x, y, label, clabel, boxes = TRUE, coul = rep(1, length(x)), horizontal = TRUE) { if (length(label) == 0) return(invisible()) if (is.null(label)) return(invisible()) if (any(label == "")) return(invisible()) cex0 <- par("cex") * clabel for (i in 1:(length(x))) { cha <- as.character(label[i]) cha <- paste(" ", cha, " ", sep = "") x1 <- x[i] y1 <- y[i] xh <- strwidth(cha, cex = cex0) yh <- strheight(cha, cex = cex0) * 5/3 if (!horizontal) { tmp <- scatterutil.convrot90(xh, yh) xh <- tmp[1] yh <- tmp[2] } if (boxes) { rect(x1 - xh/2, y1 - yh/2, x1 + xh/2, y1 + yh/2, col = "white", border = coul[i]) } if (horizontal) { text(x1, y1, cha, cex = cex0, col = coul[i]) } else { text(x1, y1, cha, cex = cex0, col = coul[i], srt = 90) } } } f1 <- function(cl) { n <- length(cl) cl <- as.factor(cl) x <- matrix(0, n, length(levels(cl))) x[(1:n) + n * (unclass(cl) - 1)] <- 1 dimnames(x) <- list(names(cl), levels(cl)) data.frame(x) } opar <- par(mar = par("mar")) par(mar = c(0.1, 0.1, 0.1, 0.1)) on.exit(par(opar)) dfxy <- data.frame(dfxy) if (!is.data.frame(dfxy)) stop("Non convenient selection for dfxy") if (any(is.na(dfxy))) stop("NA non implemented") if (!is.factor(fac)) stop("factor expected for fac") dfdistri <- f1(fac) * wt coul = col w1 <- unlist(lapply(dfdistri, sum)) dfdistri <- t(t(dfdistri)/w1) coox <- as.matrix(t(dfdistri)) %*% dfxy[, xax] cooy <- as.matrix(t(dfdistri)) %*% dfxy[, yax] if (nrow(dfxy) != nrow(dfdistri)) stop(paste("Non equal row numbers", nrow(dfxy), nrow(dfdistri))) coo <- scatterutil.base(dfxy = dfxy, xax = xax, yax = yax, xlim = xlim, ylim = ylim, grid = grid, addaxes = addaxes, cgrid = cgrid, include.origin = include.origin, origin = origin, sub = sub, csub = csub, possub = possub, pixmap = pixmap, contour = contour, area = area, add.plot = add.plot) if (clabel > 0) scatterutil.eti(coox, cooy, label, clabel, coul = col) box() invisible(match.call()) }

Now let look at the example provided with the ade4 function dudi.pca

library(ade4) data(deug) pca1 <- dudi.pca(deug$tab, scale = FALSE, scan = FALSE)

We now construct different vectors that will be necessary later (this code is necessary because of the data structure of deug)

colv<-rep("black",times=length(as.numeric(deug$result))) pchv<-rep(1,times=length(as.numeric(deug$result))) cc<-topo.colors(length(unique(as.numeric(deug$result))))

and use these vectors to create new color and pch vectors that will be used for plotting:

for (j in 1:(length(unique(ccn))) ){ ww<-which(ccn==unique(ccn)[j]) colv[ww]<-cc[unique(ccn)][j] ; pchv[ww]<-cc2[j] }

We now plot the usual scatter plot using s.class

s.class(pca1$li,deug$result,grid=F,cgrid=0,csub=2,cellipse=0,col=cc,cpoint = 0)

and superimpose the symbols using s.label.bis

s.label.bis(pca1$li, pch=pchv, clabel = 0, col=colv, add.plot = T)

The modified s.class.bis function can be used to redraw the class labels :

s.class.bis(pca1$li, deug$result,csub = 2,cellipse=0, col=cc, cpoint = 0,label=levels(deug$result), add.plot = T)

s.label.bis and s.class.bis can also be used to draw color scatter class graphics such as this:

## The MIGCLIM R package

I just saw the paper by Engler et al (2012) published a few days ago presenting the R version of MIGCLIM. The MIGCLIM R package allows the integration of dispersal constraints into projections of species distribution models. This is a very promising tool and I will test it as soon as possible.

Robin Engler, Wim Hordijk, Antoine Guisan (2012) The MIGCLIM R package – seamless integration of dispersal constraints into projections of species distribution models. Ecography. In press. DOI: 10.1111/j.1600-0587.2012.07608.x Article first published online: 3 AUG 2012

## Import worldclim data into GRASS

This post is about how to import worldclim data sets into a `GRASS`

project.

WorldClim – Global Climate Data WorldClim is a set of global climate layers (climate grids) with a spatial resolution of about 1 square kilometer. The data can be used for mapping and spatial modeling in a GIS or with other computer programs.

http://www.worldclim.org/

First you need to create a location using the EPSG code corrresponding to the worldclim data. It is 4326. You can find a list of the EPSG codes at http://spatialreference.org/

Once the location has been created you can launch GRASS with the location and import worldclim data sets. Worldclim offers different resolutions and you have to choose one of them. Then, define the `g.region`

in GRASS using the selected resolution. For instance, we want data at a resolution of 00:02:30.

In GRASS type:

g.region n=90N s=90S e=180E w=180W res=00:02:30

This matches with the extent of the worldclim data and the desired resolution.

The last step is pretty straightforward. Select the directory where the worldclim data have been downloaded.

For instance:

cd /home/rossi/data/worldclim/

Then import the data (here we import the Altitude layer)

r.in.bin -s input=alt.bil output=altitude bytes=2 north=90 south=-60 east=180 west=-180 rows=3600 cols=8640 anull=-9999

Note that the number of columns and rows as well as the east, west, north and south limits are given in the *.bil file available in the download from http://www.worldclim.org.

It is convenient to import automatically a set of layers using a loop :

for i in $(seq 1 19) do r.in.bin -s input=bio$i.bil output=bio$i bytes=2 north=90 south=-60 east=180 west=-180 rows=3600 cols=8640 anull=-9999 done

In that example, we have imported the 19 layers of the BIOCLIM variables using a loop and the GRASS function `r.in.bin`

Now the worldclim data are available within our GRASS location. These raster layers can be used visualisation, analyses and modelling with GRASS but also using R (see the post “Playing with R within a GRASS environment” from this blog).

Have fun!

## Playing with R within a GRASS environment

Although there are some powerful `GIS`

utilities in `R`

, I prefer using `GRASS`

to manage my `GIS`

data while I use `R`

to perform scientific computing.

In fact `GRASS`

and `R`

can be very simply interfaced by means of the `R`

package `spgrass6`

.

Under linux operating system (actually Kubuntu Natty Narwhal) I open the console, launch GRASS by typing `grass`

and then select a location (see GRASS manual for details).

Once GRASS is running, `R`

can be launched from within the `GRASS`

console…

From now on I am working in `R`

and I can use the library `spgrass6`

in order *e.g.* to read or write rasters into the `GRASS`

system.

As an example we will consider the case of the pine shoot beetle *Tomicus piniperda* (Coleoptera: Curculionidae: Scolytinae). Readers can find the whole story in Horn et al. (2012) available here. *T. piniperda* is present throughout Europe. Interestingly, it has long been assumed to be present in North Africa too although molecular studies have recently shown that *T. piniperda* only rarely occurs in these regions (Horn et al., 2006, 2009).

We have a set of localities where *T. piniperda* was recorded as present or, on the contrary, has been searched and was absent (true absence). GRASS is used to host the data and produce the graphical outputs.

Starting `R`

from the `GRASS`

console and using the `dismo`

package allows us to fit a SDM (Species Distribution Model) and build a raster with the probabilities of presence e.g. using the function `predict`

. This raster layer can be expressed as presence/absence and the resulting data written within the GRASS system using the function `writeRAST6`

from package `spgrass6`

.

The raster can now be plotted using the `GRASS`

interface and its utilities.

Horn, A., Kerdelhué, C., Lieutier, F., Rossi, J.-P. (2012). Predicting the distribution of the two bark beetles

*Tomicus destruens*and

*Tomicus piniperda*in Europe and the Mediterranean region. Agricultural and Forest Entomology, in press, DOI: 10.1111/j.1461-9563.2012.00576.x.

Horn, A., Roux-Morabito, G., Lieutier, F. & Kerdelhué, C. (2006) Phylogeographic structure and past history of the circum-Mediterranean species *Tomicus destruens* Woll. (Coleoptera: Scolytinae). Molecular Ecology, 15, 1603–1615.

Horn, A., Stauffer, C., Lieutier, F. & Kerdelhué, C. (2009) Complex postglacial history of the temperate bark beetle T*omicus piniperda* L. (Coleoptera, Scolytinae). Heredity, 103, 238–247.

## Computing the “Multivariate Environmental Similarity Surfaces” (MESS) index in R

***edit oct 03 2012 ***

The mess function is now part of the dismo package developped by Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith (dismo 0.7-23)***

This post is about the computation of the “Multivariate Environmental Similarity Surfaces” aka MESS in R. The MESS index was proposed by Elith et al (2010) [Methods in Ecology & Evolution 2010, 1, 330–342]. MESS can be computed with the Maxent software but I wanted to perform all my data treatments within R/GRASS environment. For that reason I wrote a `R`

function called `mess.R`

.

Note that the function requires the `R`

package `raster`

developed by R. Hijmans and J. van Etten. Both the function and a tutorial can be downloaded for my site.

The present post provides some explanations, the `mess.R `

function and some examples.

**1- What is MESS?**

MESS stands for Multivariate Environmental Similarity Surfaces. It is an index of similarity reporting the closeness of a point described by a set of environmental attributes to the distribution of these attributes within a population of reference points. The MESS approach has been proposed by Elith et al (2010). It works as BIOCLIM does but provides negative values which allows one to differentiate levels of dissimilarity. This is particularly valuable when considering points associated to values lying outside the range of the reference points.

**2-The mess.R function**

mess<-function(X,V,full=TRUE) { messi<-function(p,v){ niinf<-length(which(v<=p)) f if(f==0) simi<-100*(p-min(v))/(max(v)-min(v)) if(0 if(50<=f & f<100) simi<-2*(100-f) if(f==100) simi<-100*(max(v)-p)/(max(v)-min(v)) return(simi) } E<-extract(x=X,y=1:ncell(X)) r_mess<-X for (i in 1:(dim(E)[2])) { e<-data.frame(E[,i]) ; v<-V[,i] r_mess[[i]][]<-apply(X=e, MARGIN=1, FUN=messi, v=v) } rmess<-r_mess[[1]] E<-extract(x=r_mess,y=1:ncell(r_mess[[1]])) rmess[]<-apply(X=E, MARGIN=1, FUN=min) if(full==TRUE) { out layerNames(out)<-c(layerNames(X),"mess") } if(full==FALSE) out return(out) }

The function `mess.R`

allows the computation of the MESS index for a set of raster objets. raster objects are defined within the package `raster`

developped by Hijmans & van Etten (2011). `mess.R`

therefore requires that the package `raster`

is properly installed and loaded using the command `library(raster)`

.

**3-Examples**

The following example uses the *Bradypus* data hosted in the package `dismo`

(Hijmans et al., 2012). As with `raster`

, `dismo`

must be installed and loaded.

library(dismo);library(raster) filename bradypus bradypus files ’/ex’, sep=’’), pattern=’grd’, full.names=TRUE ) predictors predictors<-dropLayer(x=predictors,i=9) reference_points<-extract(predictors,bradypus) mess.out<-mess(X=predictors,V=reference_points,full=TRUE) plot(mess.out)

This is the result:

**4-References**

Elith J., Kearney M., & Phillips S. 2010. The art of modelling range-shifting species. Met. Ecol. Evol 1 :330-342.

Hijmans R.J. & van Etten J. (2011). raster : Geographic analysis and modeling with raster data. R package version 1.9-58. http://CRAN.R-project.org/package=raster