Getting and managing occurrence data from gbif.org

The R package dismo by Hijman et al. offers a very powerful function named gbif that allows to get occurence data from the gbif dabase.

For example if we are to map the geographical range of the Mediterranean fruit fly, Ceratitis capitata we simply do:

library(dismo)
distr <- gbif(genus="Ceratitis", species="capitata*", geo=TRUE,removeZeros = TRUE)

Note that “capitata” and “capitata*” won’t lead to similar results because the species name is often associated to author’s name in the database i.e. Ceratitis capitata or Ceratitis capitata (Wiedemann, 1824).
The results can be plotted:

plot(distr$lon,distr$lat, pch=3, cex=1, col="red", asp=1, xlab="longitude", ylab="latitude")
map("world", resolution = 0.5, add=T)

gbif occurrences for Ceratitis capitata

gbif occurrences for Ceratitis capitata

Alas! many occurrences correspond to the same geographical location i.e. there are many duplicated points which might be a source of trouble in subsequent data analyses and distribution modeling. This problem can be easily solved using the R package spatstat by Baddeley and Turner (2005) and its function duplicated.ppp.

We first have to create a spatial point pattern object from the occurrences points using the function ppp and a window created with the ripras function. Then duplicated.ppp will identify duplicated points and we will be able to remove them. The code is:

library(spatstat)
x<-distr$lon ; y<-distr$lat
w <- ripras(x,y)
wp<-ppp(x,y,window=w)
dupv<-duplicated.ppp(wp)
x2<-x[which(dupv==FALSE)] ; y2<-y[which(dupv==FALSE)] # coordinates of points with no duplicates

In the case of Ceratitis capitata gbif returned a total of 1279 occurrences that included not less that 982 duplicated points.


References
A. Baddeley and R. Turner (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12 (6), 1-42. ISSN: 1548-7660. URL: http://www.jstatsoft.org

Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith (2012). dismo: Species distribution modeling. R package version 0.7-23. http://CRAN.R-project.org/package=dismo



Categories: R Tags: ,

Retrieve spatial coordinates from location’s name

It is a common task to retrieve the spatial coordinates (longitude and latitude) from the sole name of a place. This occurs for example when one wants to map spatial occurrences of a species that are reported in the literature in the form of a list of places.

This work is strongly facilitated by the R package titled gooJSON developped by Christopher Steven Marcum.

Let’s say that we want to retrieve the coordinates of a place reported as Fort Myers in Florida where J. C. Denmark reported the presence of Homalodisca vitripennis Germar, 1821 in 1957.


library(gooJSON)
a<-gooadd(address = list("Fort Myers","Florida"))
b<-goomap(a)
b$Placemark[[1]]$Point$coordinates

[1] -81.87231 26.64063 0.00000

Here we are, Google API tells us (through R) that Fort Myers in Florida is located at -81.87231 longitude and 26.64063 latitude.

Now imagine that we want to repeat that operation 500 times because we have a bunch of places where our study species has been recorded. The goomap function can be put into a loop and the Google API questioned 500 times, but here comes the twist: Google will not answer 500 times in a row and will send no data for some records. This problem can simply be solved by slowing your loop so that Google API won’t receive your calling to quickly and will answer to all queries. This is done by adding a call to the function Sys.sleep in the loop to suspend execution of R expressions for a given number of seconds. I used Sys.sleep(0.2) and it worked perfectly!


Reference
Christopher Steven Marcum (2012). gooJSON: Google JSON Data Interpreter for R. R package version 1.0.01.http://CRAN.R-project.org/package=gooJSON



Categories: R Tags: , ,

Plot color symbols upon s.class {ade4} outputs

24 August 2012 1 comment

I am used to perform mutlivariate analyses using the R package ade4 by D. Chessel et al. The package features some wonderful graphical functions like s.class or s.label. Unfortunately, these functions not always allow using different colors or pch values when plotting the scatter diagrams with representation of point classes.
The present post deals with that point and will provide some code allowing to get graphical outputs looking like the one below:

A scatter diagram with classes and different symbols & colors

The functions s.label {ade4} does not allow plotting different symbols for the classes and for that reason I have modified the function and transformed it into s.label.bis. It is:

s.label.bis <- function (dfxy, xax = 1, yax = 2, label = row.names(dfxy), clabel = 1,
    pch = 20, cpoint = if (clabel == 0) 1 else 0, boxes = TRUE,
    neig = NULL, cneig = 2, xlim = NULL, ylim = NULL, grid = TRUE,
    addaxes = TRUE, cgrid = 1, include.origin = TRUE, origin = c(0,
        0), sub = "", csub = 1.25, possub = "bottomleft", pixmap = NULL,
    contour = NULL, area = NULL, add.plot = FALSE, col="black", bg=NULL,cex=par("cex"))
{
    dfxy <- data.frame(dfxy)
    opar <- par(mar = par("mar"))
    on.exit(par(opar))
    par(mar = c(0.1, 0.1, 0.1, 0.1))
    coo <- scatterutil.base(dfxy = dfxy, xax = xax, yax = yax,
        xlim = xlim, ylim = ylim, grid = grid, addaxes = addaxes,
        cgrid = cgrid, include.origin = include.origin, origin = origin,
        sub = sub, csub = csub, possub = possub, pixmap = pixmap,
        contour = contour, area = area, add.plot = add.plot)
    if (!is.null(neig)) {
        if (is.null(class(neig)))
            neig <- NULL
        if (class(neig) != "neig")
            neig <- NULL
        deg <- attr(neig, "degrees")
        if ((length(deg)) != (length(coo$x)))
            neig <- NULL
    }
    if (!is.null(neig)) {
        fun <- function(x, coo) {
            segments(coo$x[x[1]], coo$y[x[1]], coo$x[x[2]], coo$y[x[2]],
                lwd = par("lwd") * cneig)
        }
        apply(unclass(neig), 1, fun, coo = coo)
    }
    if (clabel > 0)
        scatterutil.eti(coo$x, coo$y, label, clabel, boxes)
    if (cpoint > 0 & clabel < 1e-06)
        points(coo$x, coo$y, pch = pch, cex = cex,col=col, bg=bg)
    box()
    invisible(match.call())
}

Similarly I changed the ade4 function s.class into s.class.bis which does not plot the “branches” of the stars but only the centroid of each class. It is:

s.class.bis<-function (dfxy, fac, wt = rep(1, length(fac)), xax = 1, yax = 2,
    cstar = 1, cellipse = 1.5, axesell = TRUE, label = levels(fac),
    clabel = 1, cpoint = 1, pch = 20, col = rep(1, length(levels(fac))),
    xlim = NULL, ylim = NULL, grid = TRUE, addaxes = TRUE, origin = c(0,
        0), include.origin = TRUE, sub = "", csub = 1, possub = "bottomleft",
    cgrid = 1, pixmap = NULL, contour = NULL, area = NULL, add.plot = FALSE)
{
scatterutil.eti.jp<-function (x, y, label, clabel, boxes = TRUE, coul = rep(1, length(x)),
    horizontal = TRUE)
{
    if (length(label) == 0)
        return(invisible())
    if (is.null(label))
        return(invisible())
    if (any(label == ""))
        return(invisible())
    cex0 <- par("cex") * clabel
    for (i in 1:(length(x))) {
        cha <- as.character(label[i])
        cha <- paste(" ", cha, " ", sep = "")
        x1 <- x[i]
        y1 <- y[i]
        xh <- strwidth(cha, cex = cex0)
        yh <- strheight(cha, cex = cex0) * 5/3
        if (!horizontal) {
            tmp <- scatterutil.convrot90(xh, yh)
            xh <- tmp[1]
            yh <- tmp[2]
        }
        if (boxes) {
            rect(x1 - xh/2, y1 - yh/2, x1 + xh/2, y1 + yh/2,
                col = "white", border = coul[i])
        }
        if (horizontal) {
            text(x1, y1, cha, cex = cex0, col = coul[i])
        }
        else {
            text(x1, y1, cha, cex = cex0, col = coul[i], srt = 90)
        }
    }
}
    f1 <- function(cl) {
        n <- length(cl)
        cl <- as.factor(cl)
        x <- matrix(0, n, length(levels(cl)))
        x[(1:n) + n * (unclass(cl) - 1)] <- 1
        dimnames(x) <- list(names(cl), levels(cl))
        data.frame(x)
    }
    opar <- par(mar = par("mar"))
    par(mar = c(0.1, 0.1, 0.1, 0.1))
    on.exit(par(opar))
    dfxy <- data.frame(dfxy)
    if (!is.data.frame(dfxy))
        stop("Non convenient selection for dfxy")
    if (any(is.na(dfxy)))
        stop("NA non implemented")
    if (!is.factor(fac))
        stop("factor expected for fac")
    dfdistri <- f1(fac) * wt
    coul = col
    w1 <- unlist(lapply(dfdistri, sum))
    dfdistri <- t(t(dfdistri)/w1)
    coox <- as.matrix(t(dfdistri)) %*% dfxy[, xax]
    cooy <- as.matrix(t(dfdistri)) %*% dfxy[, yax]
    if (nrow(dfxy) != nrow(dfdistri))
        stop(paste("Non equal row numbers", nrow(dfxy), nrow(dfdistri)))
    coo <- scatterutil.base(dfxy = dfxy, xax = xax, yax = yax,
        xlim = xlim, ylim = ylim, grid = grid, addaxes = addaxes,
        cgrid = cgrid, include.origin = include.origin, origin = origin,
        sub = sub, csub = csub, possub = possub, pixmap = pixmap,
        contour = contour, area = area, add.plot = add.plot)
    if (clabel > 0)
        scatterutil.eti(coox, cooy, label, clabel, coul = col)
    box()
    invisible(match.call())
}

Now let look at the example provided with the ade4 function dudi.pca

library(ade4)
data(deug)
pca1 <- dudi.pca(deug$tab, scale = FALSE, scan = FALSE)

We now construct different vectors that will be necessary later (this code is necessary because of the data structure of deug)

colv<-rep("black",times=length(as.numeric(deug$result)))
pchv<-rep(1,times=length(as.numeric(deug$result)))
cc<-topo.colors(length(unique(as.numeric(deug$result))))

and use these vectors to create new color and pch vectors that will be used for plotting:

for (j in 1:(length(unique(ccn))) ){
   ww<-which(ccn==unique(ccn)[j])
   colv[ww]<-cc[unique(ccn)][j] ; pchv[ww]<-cc2[j]
}

We now plot the usual scatter plot using s.class

s.class(pca1$li,deug$result,grid=F,cgrid=0,csub=2,cellipse=0,col=cc,cpoint = 0)

output of the s.class function

and superimpose the symbols using s.label.bis

s.label.bis(pca1$li, pch=pchv, clabel = 0, col=colv, add.plot = T)

output of the s.label.bis function

The modified s.class.bis function can be used to redraw the class labels :

s.class.bis(pca1$li, deug$result,csub = 2,cellipse=0, col=cc, cpoint = 0,label=levels(deug$result), add.plot = T)

s.label.bis and s.class.bis can also be used to draw color scatter class graphics such as this:

The MIGCLIM R package

I just saw the paper by Engler et al (2012) published a few days ago presenting the R version of MIGCLIM. The MIGCLIM R package allows the integration of dispersal constraints into projections of species distribution models. This is a very promising tool and I will test it as soon as possible.

Robin Engler, Wim Hordijk, Antoine Guisan (2012) The MIGCLIM R package – seamless integration of dispersal constraints into projections of species distribution models. Ecography. In press. DOI: 10.1111/j.1600-0587.2012.07608.x Article first published online: 3 AUG 2012

paper abstract
MigClim

Categories: R Tags: ,

Import worldclim data into GRASS

This post is about how to import worldclim data sets into a GRASS project.

WorldClim – Global Climate Data WorldClim is a set of global climate layers (climate grids) with a spatial resolution of about 1 square kilometer. The data can be used for mapping and spatial modeling in a GIS or with other computer programs.
http://www.worldclim.org/

First you need to create a location using the EPSG code corrresponding to the worldclim data. It is 4326. You can find a list of the EPSG codes at http://spatialreference.org/

Once the location has been created you can launch GRASS with the location and import worldclim data sets. Worldclim offers different resolutions and you have to choose one of them. Then, define the g.region in GRASS using the selected resolution. For instance, we want data at a resolution of 00:02:30.
In GRASS type:

g.region n=90N s=90S e=180E w=180W res=00:02:30

This matches with the extent of the worldclim data and the desired resolution.

The last step is pretty straightforward. Select the directory where the worldclim data have been downloaded.
For instance:

cd /home/rossi/data/worldclim/ 

Then import the data (here we import the Altitude layer)

r.in.bin -s input=alt.bil output=altitude bytes=2 north=90 south=-60 east=180 west=-180 rows=3600 cols=8640 anull=-9999 

Note that the number of columns and rows as well as the east, west, north and south limits are given in the *.bil file available in the download from http://www.worldclim.org.

It is convenient to import automatically a set of layers using a loop :

for i in $(seq 1 19)
do
r.in.bin -s input=bio$i.bil output=bio$i bytes=2 north=90 south=-60 east=180 west=-180 rows=3600 cols=8640 anull=-9999
done

In that example, we have imported the 19 layers of the BIOCLIM variables using a loop and the GRASS function r.in.bin

Now the worldclim data are available within our GRASS location. These raster layers can be used visualisation, analyses and modelling with GRASS but also using R (see the post “Playing with R within a GRASS environment” from this blog).

Have fun!

Playing with R within a GRASS environment

14 August 2012 1 comment

Although there are some powerful GIS utilities in R, I prefer using GRASS to manage my GIS data while I use R to perform scientific computing.

In fact GRASS and R can be very simply interfaced by means of the R package spgrass6.

Under linux operating system (actually Kubuntu Natty Narwhal) I open the console, launch GRASS by typing grass and then select a location (see GRASS manual for details).

console

Once GRASS is running, R can be launched from within the GRASS console…

From now on I am working in R and I can use the library spgrass6 in order e.g. to read or write rasters into the GRASS system.

As an example we will consider the case of the pine shoot beetle Tomicus piniperda (Coleoptera: Curculionidae: Scolytinae). Readers can find the whole story in Horn et al. (2012) available here. Tomicus piniperdaT. piniperda is present throughout Europe. Interestingly, it has long been assumed to be present in North Africa too although molecular studies have recently shown that T. piniperda only rarely occurs in these regions (Horn et al., 2006, 2009).

We have a set of localities where T. piniperda was recorded as present or, on the contrary, has been searched and was absent (true absence). GRASS is used to host the data and produce the graphical outputs.

Sites where T. piniperda was either present (black circle) or absent (open circle).

Starting R from the GRASS console and using the dismo package allows us to fit a SDM (Species Distribution Model) and build a raster with the probabilities of presence e.g. using the function predict. This raster layer can be expressed as presence/absence and the resulting data written within the GRASS system using the function writeRAST6 from package spgrass6.
The raster can now be plotted using the GRASS interface and its utilities.

GRASS window showing the occurences of T. piniperda and the associated predicted distribution derived from a SDM run in R


Horn, A., Kerdelhué, C., Lieutier, F., Rossi, J.-P. (2012). Predicting the distribution of the two bark beetles Tomicus destruens and Tomicus piniperda in Europe and the Mediterranean region. Agricultural and Forest Entomology, in press, DOI: 10.1111/j.1461-9563.2012.00576.x.

Horn, A., Roux-Morabito, G., Lieutier, F. & Kerdelhué, C. (2006) Phylogeographic structure and past history of the circum-Mediterranean species Tomicus destruens Woll. (Coleoptera: Scolytinae). Molecular Ecology, 15, 1603–1615.

Horn, A., Stauffer, C., Lieutier, F. & Kerdelhué, C. (2009) Complex postglacial history of the temperate bark beetle Tomicus piniperda L. (Coleoptera, Scolytinae). Heredity, 103, 238–247.


Computing the “Multivariate Environmental Similarity Surfaces” (MESS) index in R

13 August 2012 9 comments

*edit oct 03 2012 *
The mess function is now part of the dismo package developped by Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith (dismo 0.7-23)***

This post is about  the computation of the “Multivariate Environmental Similarity Surfaces” aka MESS in R. The MESS index was proposed by Elith et al (2010) [Methods in Ecology & Evolution 2010, 1, 330–342]. MESS can be computed with the Maxent software but I  wanted to perform all my data treatments within R/GRASS environment. For that reason I wrote a R function called mess.R.

Note that the function requires the R package raster developed by R. Hijmans and J. van Etten. Both the function and a tutorial can be downloaded for my site.

The present post provides some explanations, the mess.R function and some examples.

1- What is MESS?
MESS stands for Multivariate Environmental Similarity Surfaces. It is an index of similarity reporting the closeness of a point described by a set of environmental attributes to the distribution of these attributes within a population of reference points. The MESS approach has been proposed by Elith et al (2010). It works as BIOCLIM does but provides negative values which allows one to differentiate levels of dissimilarity. This is particularly valuable when considering points associated to values lying outside the range of the reference points.

2-The mess.R function

mess<-function(X,V,full=TRUE)
{
messi<-function(p,v){
    niinf<-length(which(v<=p))
    f    if(f==0) simi<-100*(p-min(v))/(max(v)-min(v))
    if(0    if(50<=f & f<100) simi<-2*(100-f)
    if(f==100) simi<-100*(max(v)-p)/(max(v)-min(v))
    return(simi)
    }
E<-extract(x=X,y=1:ncell(X))
r_mess<-X
for (i in 1:(dim(E)[2])) {
    e<-data.frame(E[,i]) ; v<-V[,i]
    r_mess[[i]][]<-apply(X=e, MARGIN=1, FUN=messi, v=v)
    }
rmess<-r_mess[[1]]
E<-extract(x=r_mess,y=1:ncell(r_mess[[1]]))
rmess[]<-apply(X=E, MARGIN=1, FUN=min)
if(full==TRUE) {
    out     layerNames(out)<-c(layerNames(X),"mess")
}
if(full==FALSE) out return(out)
}

The function mess.R allows the computation of the MESS index for a set of raster objets. raster objects are defined within the package raster developped by Hijmans & van Etten (2011). mess.R therefore requires that the package raster is properly installed and loaded using the command library(raster).

3-Examples
The following example uses the Bradypus data hosted in the package dismo (Hijmans et al., 2012). As with raster, dismo must be installed and loaded.

library(dismo);library(raster)
filename bradypus bradypus files ’/ex’, sep=’’), pattern=’grd’, full.names=TRUE )
predictors predictors<-dropLayer(x=predictors,i=9)
reference_points<-extract(predictors,bradypus)
mess.out<-mess(X=predictors,V=reference_points,full=TRUE)
plot(mess.out)

This is the result:


4-References
Elith J., Kearney M., & Phillips S. 2010. The art of modelling range-shifting species. Met. Ecol. Evol 1 :330-342.

Hijmans R.J. & van Etten J. (2011). raster : Geographic analysis and modeling with raster data. R package version 1.9-58. http://CRAN.R-project.org/package=raster

Hijmans R.J., Phillips S., Leathwick J. & Elith J. (2012). dismo : Species distribution modeling. R package version 0.7-17. http://CRAN.R-project.org/package=dismo


Categories: R Tags: ,