Getting and managing occurrence data from gbif.org

The R package dismo by Hijmans et al. offers a very powerful function named gbif that retrieves occurrence data from the GBIF database.

For example, to map the geographical range of the Mediterranean fruit fly, Ceratitis capitata, we simply do:

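library(dismo)
# geo = TRUE: download georeferenced records only
# removeZeros = TRUE: drop records with a zero longitude or latitude
distr <- gbif(genus = "Ceratitis", species = "capitata*", geo = TRUE, removeZeros = TRUE)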

Note that “capitata” and “capitata*” do not return the same results, because the species name is often stored together with the author's name in the database, e.g. Ceratitis capitata or Ceratitis capitata (Wiedemann, 1824).
The results can be plotted:
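To see why the trailing wildcard matters, here is a minimal local sketch. It does not query GBIF. The names in names_in_db are made up for illustration, and base R's glob2rx() only mimics the wildcard idea, so the way the GBIF server actually matches names is an assumption here.

```r
# Hypothetical scientific names as they might be stored in a database
names_in_db <- c("Ceratitis capitata",
                 "Ceratitis capitata (Wiedemann, 1824)",
                 "Ceratitis cosyra")

# Exact match on the bare binomial: misses the record that carries the authority
exact <- names_in_db == "Ceratitis capitata"

# Wildcard match: glob2rx() converts the "*" pattern into a regular expression
wild <- grepl(glob2rx("Ceratitis capitata*"), names_in_db)

sum(exact)  # records found by the exact name
sum(wild)   # records found by the wildcard pattern
```

The wildcard pattern also catches the record that includes the authority, which the exact comparison leaves out.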

library(maps)  # provides map()
plot(distr$lon, distr$lat, pch = 3, cex = 1, col = "red", asp = 1, xlab = "longitude", ylab = "latitude")
map("world", resolution = 0.5, add = TRUE)  # overlay country outlines

gbif occurrences for Ceratitis capitata


Alas! Many occurrences share the same geographical location, i.e. there are many duplicated points, which can cause trouble in subsequent data analyses and distribution modeling. This problem is easily solved with the function duplicated.ppp from the R package spatstat by Baddeley and Turner (2005).

We first create a spatial point pattern object from the occurrence points using the function ppp, with a window estimated by the ripras function. duplicated.ppp then flags the duplicated points so that we can remove them. The code is:

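library(spatstat)
x <- distr$lon ; y <- distr$lat
w <- ripras(x, y)               # window enclosing all points (Ripley-Rasson estimate)
wp <- ppp(x, y, window = w)     # spatial point pattern object
dupv <- duplicated.ppp(wp)      # TRUE for each point that repeats an earlier one
x2 <- x[!dupv] ; y2 <- y[!dupv] # coordinates with duplicates removed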

In the case of Ceratitis capitata, gbif returned a total of 1279 occurrences, of which no fewer than 982 were duplicated points.
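For comparison, the same kind of clean-up can be sketched in base R, assuming that duplicates are exact coordinate matches. The occ table and its coordinates are made up for illustration and are not GBIF data.

```r
# Toy occurrence table; the coordinates are invented for this example
occ <- data.frame(lon = c(2.35, 2.35, -81.87, 2.35, 10.00),
                  lat = c(48.85, 48.85, 26.64, 48.85, 45.00))

# duplicated() on a data frame flags rows identical to an earlier row
dup <- duplicated(occ[, c("lon", "lat")])

# Keep only the first occurrence of each location
occ_unique <- occ[!dup, ]

sum(dup)          # number of duplicated points
nrow(occ_unique)  # number of points kept
```

This avoids building a window and a point pattern object, but the spatstat route is still the better fit when the ppp object is needed for later spatial analyses.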


References
A. Baddeley and R. Turner (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12 (6), 1-42. ISSN: 1548-7660. URL: http://www.jstatsoft.org

Robert J. Hijmans, Steven Phillips, John Leathwick and Jane Elith (2012). dismo: Species distribution modeling. R package version 0.7-23. http://CRAN.R-project.org/package=dismo



Categories: R

Retrieve spatial coordinates from location’s name

A common task is to retrieve the spatial coordinates (longitude and latitude) of a place from its name alone. This happens, for example, when one wants to map occurrences of a species that are reported in the literature as a list of places.

This work is greatly facilitated by the R package gooJSON, developed by Christopher Steven Marcum.

Let’s say that we want to retrieve the coordinates of Fort Myers, Florida, where, in 1957, J. C. Denmark reported the presence of Homalodisca vitripennis Germar, 1821.


library(gooJSON)
a<-gooadd(address = list("Fort Myers","Florida"))
b<-goomap(a)
b$Placemark[[1]]$Point$coordinates

[1] -81.87231 26.64063 0.00000

There we are: the Google API tells us (through R) that Fort Myers, Florida, is located at longitude -81.87231 and latitude 26.64063.

Now imagine that we want to repeat that operation 500 times because we have a long list of places where our study species has been recorded. The goomap function can be put into a loop and the Google API queried 500 times, but here comes the twist: Google will not answer 500 times in a row and will return no data for some records. The problem is simply solved by slowing the loop down, so that the Google API does not receive the calls too quickly and answers every query. This is done by adding a call to the function Sys.sleep inside the loop, which suspends execution of R expressions for a given number of seconds. I used Sys.sleep(0.2) and it worked perfectly!
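The throttled loop described above can be sketched as follows. This is only an illustration: geocode_one is a hypothetical stand-in for the real request (for instance the gooadd/goomap calls shown earlier), and the place names and the lookup table inside it are assumptions so that the example runs without network access.

```r
# Hypothetical stand-in for a geocoding request; replace its body with a real call.
# Returns c(lon, lat), or c(NA, NA) when the place is not found.
geocode_one <- function(place) {
  known <- list("Fort Myers, Florida" = c(-81.87231, 26.64063))
  if (place %in% names(known)) known[[place]] else c(NA_real_, NA_real_)
}

places <- c("Fort Myers, Florida", "Some unknown place")

# One row per place, filled in as the answers come back
coords <- matrix(NA_real_, nrow = length(places), ncol = 2,
                 dimnames = list(places, c("lon", "lat")))

for (i in seq_along(places)) {
  coords[i, ] <- geocode_one(places[i])
  Sys.sleep(0.2)  # pause between requests so the service is not queried too quickly
}

coords
```

Pre-allocating the matrix keeps unanswered queries visible as NA rows, which makes it easy to retry only those places afterwards.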


Reference
Christopher Steven Marcum (2012). gooJSON: Google JSON Data Interpreter for R. R package version 1.0.01. http://CRAN.R-project.org/package=gooJSON



Categories: R