Managing missing values in ggene

Missing values are a big problem because they are so common in microsatellite datasets that removing them sometimes amounts to losing all the information (individuals or alleles).

ggene cannot handle NAs and thus users must remove individuals/alleles with missing data or convert them to something… else!

One common strategy is to replace NAs with the mean value of the corresponding column or with 0 (see the package adegenet). I won’t discuss the merits of these options here; I simply provide a solution for users who cannot remove NAs from their data and have to replace them.

We define below two functions NAtoMean and NAto0 which replace NA by the mean of the column (allele) or by 0, respectively.

NAtoMean <- function(x){
    x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill NAs with the mean of the non-missing values
    return(x)
}

NAto0 <- function(x){
    x[is.na(x)] <- 0  # fill NAs with 0
    return(x)
}
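To see what the two functions do before applying them to a real dataset, here is a minimal sketch on a made-up allele column. The vector `allele` and its values are purely illustrative and not taken from any ggene dataset; the two functions are restated so the snippet runs on its own.

```r
# Hypothetical allele column with two missing entries (illustrative values only)
allele <- c(0, 1, NA, 0.5, NA)

# Same logic as NAtoMean / NAto0, with the modified vector returned
NAtoMean <- function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}
NAto0 <- function(x) {
    x[is.na(x)] <- 0
    x
}

filled_mean <- NAtoMean(allele)  # NAs become the mean of 0, 1 and 0.5
filled_zero <- NAto0(allele)     # NAs become 0

print(filled_mean)
print(filled_zero)
```

Note that the mean is computed on the non-missing values only, so the observed entries are left untouched.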

Now we introduce a missing value in the $tab table of a ggene object:

crypho$tab[1,1]
## [1] 0
crypho$tab[1,1] <- NA
crypho$tab[1,1]
## [1] NA

If we try to compute the variogram with this altered dataset we get an error message:

var <- svariog(crypho, plot=TRUE)
## Error in FUN(X[[i]], ...): NA/NaN/Inf in foreign function call (arg 4)
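Before running an analysis, it can help to check where the missing values are. The sketch below uses only base R on a small made-up matrix; `tab` and its row and column names are hypothetical stand-ins for a table such as crypho$tab, assuming that table is an ordinary numeric matrix with individuals in rows and alleles in columns.

```r
# Hypothetical individuals-by-alleles table (illustrative values only)
tab <- matrix(c(0, 1, NA, 0.5,
                1, 0, 0.5, NA,
                0, 0, 1, 1),
              nrow = 4,
              dimnames = list(paste0("ind", 1:4), c("L1.a", "L1.b", "L2.a")))

has_na            <- anyNA(tab)            # TRUE if at least one NA is present
na_per_allele     <- colSums(is.na(tab))   # number of NAs in each column
na_per_individual <- rowSums(is.na(tab))   # number of NAs in each row
na_positions      <- which(is.na(tab), arr.ind = TRUE)  # row/column index of each NA

print(has_na)
print(na_per_allele)
print(na_per_individual)
print(na_positions)
```

These counts give a quick idea of whether dropping a few individuals or alleles is affordable, or whether replacement is the only realistic option.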

Replacing NAs with the column mean in the data table is done with apply, using MARGIN=2 so that the function works column by column:

crypho$tab <- apply(X=crypho$tab, MARGIN=2, FUN=NAtoMean)
crypho$tab[1,1]
## [1] 0.2872727
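The same apply call can be tried on a small made-up matrix to check that each NA receives the mean of its own column and that the shape and names of the table are kept. The matrix `toy` below is hypothetical and only there for illustration.

```r
# Column-wise mean replacement, same logic as NAtoMean
NAtoMean <- function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}

# Hypothetical table: 4 individuals (rows) x 2 alleles (columns)
toy <- matrix(c(0,   NA,  1,  1,
                0.5, 0.5, NA, 0),
              nrow = 4,
              dimnames = list(paste0("ind", 1:4), c("A", "B")))

# MARGIN = 2 passes each column to NAtoMean in turn
imputed <- apply(X = toy, MARGIN = 2, FUN = NAtoMean)

print(imputed)
print(colMeans(toy, na.rm = TRUE))  # the values used to fill each column
```

Because the fill value is each column's own mean, the column means are the same before and after replacement.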

svariog can now compute the variogram:

var <- svariog(crypho, plot=TRUE)

[Figure: variogram plotted by svariog for the dataset with the NA replaced]

NAtoMean and NAto0 can easily be customised.
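As one possible customisation, the sketch below generalises the idea to any summary statistic and adds a guard for columns that contain only NAs, where mean(x, na.rm = TRUE) would return NaN. The function name `NAtoStat` and its arguments `stat` and `fallback` are hypothetical and not part of ggene.

```r
# Replace NAs by a user-chosen statistic of the non-missing values.
# stat     : function accepting na.rm = TRUE (e.g. mean, median)
# fallback : value used when the whole column is missing
NAtoStat <- function(x, stat = mean, fallback = 0) {
    if (all(is.na(x))) {
        x[] <- fallback   # nothing to summarise, so use the fallback
        return(x)
    }
    x[is.na(x)] <- stat(x, na.rm = TRUE)
    x
}

with_median <- NAtoStat(c(1, NA, 3, 10), stat = median)  # NA becomes median of 1, 3, 10
all_missing <- NAtoStat(c(NA_real_, NA_real_))           # every entry becomes the fallback

print(with_median)
print(all_missing)
```

Extra arguments can be passed through apply, for instance apply(X, MARGIN=2, FUN=NAtoStat, stat=median) for a numeric matrix X.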
