Managing missing values in ggene

Missing values are a big problem: they are so common in microsatellite datasets that removing them sometimes amounts to losing all the information (individuals or alleles).

ggene cannot handle NAs and thus users must remove individuals/alleles with missing data or convert them to something… else!
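To make the "remove" option concrete, here is a minimal sketch in base R on a toy table. The matrix `tab` and its contents are made up for illustration; it only stands in for the `$tab` of a real dataset.

```r
# Toy genotype table (hypothetical data): rows are individuals, columns are alleles
tab <- matrix(c(1, 0, NA, 1,
                0, 1, 1, NA,
                1, 1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("ind", 1:3), paste0("allele", 1:4)))

# Option 1: keep only the individuals (rows) without any missing value
rows_kept <- tab[complete.cases(tab), , drop = FALSE]

# Option 2: keep only the alleles (columns) without any missing value
cols_kept <- tab[, colSums(is.na(tab)) == 0, drop = FALSE]

dim(rows_kept)
dim(cols_kept)
```

With only two NAs in this toy table, dropping incomplete individuals leaves a single row, and dropping incomplete alleles leaves half of the columns. This is the loss of information mentioned above.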

One common strategy is to replace NAs by the mean value of the corresponding column or by 0 (see the package adegenet). I won’t discuss the merits of these options here; I simply provide a solution for users who cannot remove NAs from their data and have to replace them.

We define below two functions NAtoMean and NAto0 which replace NA by the mean of the column (allele) or by 0, respectively.

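NAtoMean <- function(x){
    # replace only the missing entries by the mean of the non-missing ones
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    return(x)
}

NAto0 <- function(x){
    # replace only the missing entries by 0
    x[is.na(x)] <- 0
    return(x)
}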

Now we introduce a missing value in the $tab of a ggene object:

crypho$tab[1,1]
## [1] 0
crypho$tab[1,1] <- NA
crypho$tab[1,1]
## [1] NA

If we try to compute the variogram with this altered dataset we get an error message:

var <- svariog(crypho, plot=TRUE)
## Error in FUN(X[[i]], ...): NA/NaN/Inf in foreign function call (arg 4)
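This error message does not say where the missing values are. A quick way to find them before calling svariog is sketched below with base R only. The small matrix `tab` is a made-up stand-in for `crypho$tab`.

```r
# Toy table (hypothetical data) with one missing value
tab <- matrix(c(0, 1, 1,
                1, NA, 0),
              nrow = 2, byrow = TRUE)

# Is there any missing value at all?
anyNA(tab)

# Row and column index of each missing value
na_pos <- which(is.na(tab), arr.ind = TRUE)
na_pos

# Number of missing values per column (allele)
na_per_col <- colSums(is.na(tab))
na_per_col
```

The same three calls should work on `crypho$tab`, assuming it is an ordinary numeric matrix.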

Replacing NAs by the mean of the column in the data table is achieved using apply:

crypho$tab <- apply(X=crypho$tab, MARGIN=2, FUN=NAtoMean)
crypho$tab[1,1]
## [1] 0.2872727
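For readers without the dataset at hand, the same column-wise replacement can be tried on a toy matrix. This is only a sketch: `tab` is made-up data, and NAtoMean is redefined here so that the chunk runs on its own.

```r
# Same idea as the NAtoMean function of this post, redefined to keep the chunk self-contained
NAtoMean <- function(x){
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    return(x)
}

# Toy table (hypothetical data): 4 individuals, 2 alleles, one NA per column
tab <- matrix(c(1, NA, 0, 1,
                0, 0, NA, 1),
              nrow = 4)

# MARGIN=2 applies the function to each column, so each NA gets the mean of its own column
filled <- apply(X = tab, MARGIN = 2, FUN = NAtoMean)

filled
anyNA(filled)
```

Each NA is replaced by the mean of the three observed values in its column (2/3 in the first column, 1/3 in the second), and the other entries are left untouched.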

svariog can now compute the variogram:

var <- svariog(crypho, plot=TRUE)

NAtoMean and NAto0 can easily be customised.
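As an illustration of such a customisation, here are two possible variants. Both names, NAtoMedian and NAtoStat, are hypothetical and are not part of ggene; take them as a sketch rather than a recommendation.

```r
# Hypothetical variant: replace NAs by the median of the column
NAtoMedian <- function(x){
    x[is.na(x)] <- median(x, na.rm = TRUE)
    return(x)
}

# Hypothetical generic version: 'fun' is any summary function accepting na.rm
NAtoStat <- function(x, fun = mean){
    x[is.na(x)] <- fun(x, na.rm = TRUE)
    return(x)
}

x <- c(0, NA, 1, 1)
NAtoMedian(x)
NAtoStat(x, fun = min)
```

Extra arguments can be passed through apply, for example `apply(X=crypho$tab, MARGIN=2, FUN=NAtoStat, fun=median)`. Note that a column containing only NAs has no observed value to summarise: `mean(x, na.rm = TRUE)` then returns NaN, so such columns are better removed beforehand.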