This past week I wrote an R function that randomly withholds data from a training set (known trait values from the tree database). The function call specifies the proportion of values to omit and how to weight columns and rows; columns contain traits and rows contain individual species. I then tested phylogenetic imputation (using the Rphylopars package in R) at different proportions of missing values. Finally, I plotted the imputed values with their 95% confidence intervals against the known values to see whether the known values fell within those intervals.
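A minimal base-R sketch of the withholding step described above, and of the interval check that follows it. The function names (`withhold_values`, `ci_coverage`), their arguments, and the random example matrix are hypothetical and only illustrate the idea; this is not the actual project code. It assumes the cell weight is the product of the row and column weights.

```r
# Hypothetical helper: set a proportion of the known cells in a trait matrix
# (rows = species, columns = traits) to NA, with optional row/column weights.
withhold_values <- function(traits, prop = 0.2,
                            row_weights = NULL, col_weights = NULL) {
  traits <- as.matrix(traits)
  stopifnot(prop >= 0, prop <= 1)
  if (is.null(row_weights)) row_weights <- rep(1, nrow(traits))
  if (is.null(col_weights)) col_weights <- rep(1, ncol(traits))
  stopifnot(length(row_weights) == nrow(traits),
            length(col_weights) == ncol(traits))

  # Assumption: a cell's sampling weight is row weight times column weight.
  cell_weights <- outer(row_weights, col_weights)

  known  <- which(!is.na(traits))          # linear indices of known cells
  n_omit <- round(prop * length(known))    # number of cells to withhold
  # Note: sampling fails if fewer than n_omit cells have a positive weight.
  picked  <- sample.int(length(known), size = n_omit,
                        prob = cell_weights[known])
  omitted <- known[picked]

  masked <- traits
  masked[omitted] <- NA
  list(masked = masked, omitted = omitted, truth = traits[omitted])
}

# Hypothetical helper: fraction of withheld values that fall inside a
# normal-theory confidence interval built from an estimate and its variance.
ci_coverage <- function(truth, estimate, variance, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sqrt(variance)
  mean(truth >= estimate - half_width & truth <= estimate + half_width)
}

# Made-up example data: 10 species by 5 traits.
set.seed(42)
traits <- matrix(rnorm(50), nrow = 10, ncol = 5,
                 dimnames = list(paste0("sp", 1:10), paste0("trait", 1:5)))

# Withhold 30% of the cells, making the last two traits twice as likely to be hit.
held <- withhold_values(traits, prop = 0.3, col_weights = c(1, 1, 1, 2, 2))
sum(is.na(held$masked))  # 15 of the 50 cells are now NA
```

The `masked` matrix would be passed to the imputation step, and `truth` kept aside so that `ci_coverage` can compare it with the imputed estimates and their variances afterwards.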
This past week, I finished cleaning up the plant/tree dataset that I will be using for my project. I wrote R scripts to run several imputations using predictive mean matching (pmm). The pmm method, which is based on linear regression, worked best on the data. I also started using the R package Rphylopars, which uses the phylogenetic tree of the species to predict the missing traits.
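For readers who have not used pmm before, a minimal sketch of a predictive mean matching run with the mice package might look like the following. It uses `nhanes`, a small example data set that ships with mice, as a stand-in for the plant trait table; the number of imputations and the seed are arbitrary choices, not the settings used in the project.

```r
library(mice)

# nhanes is a small example data set bundled with mice (stand-in for trait data).
colSums(is.na(nhanes))   # missing values per column before imputation

# Run 5 imputations with predictive mean matching; seed fixed for repeatability.
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Extract the first completed data set.
completed <- mice::complete(imp, 1)
colSums(is.na(completed))  # no missing values should remain
```

Because pmm fills each gap with an observed value borrowed from a similar case, the imputed values always stay within the range of the observed data.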
This week I started imputing missing trait values using an R package called mice (Multivariate Imputation by Chained Equations). I also used a package called Rphylopars, which works with phylogenetic trees.
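A rough sketch of phylogenetic imputation with Rphylopars, using data simulated by the package's own `simtraits` function rather than the real trait database. The simulation sizes are arbitrary assumptions, and the 95% intervals are the usual normal approximation built from the reported variances.

```r
library(Rphylopars)

set.seed(1)
# Simulated stand-in data: 15 species, 4 traits, 10 values set to missing.
sim <- simtraits(ntaxa = 15, ntraits = 4, nmissing = 10)

# Fit the default Brownian motion model; trait_data has a "species" column
# followed by the trait columns.
fit <- phylopars(trait_data = sim$trait_data, tree = sim$tree)

# The first rows of anc_recon / anc_var correspond to the tips (species);
# the remaining rows are internal nodes of the tree.
n_tips <- length(sim$tree$tip.label)
est <- fit$anc_recon[seq_len(n_tips), ]
# Guard against tiny negative variances from numerical error.
se  <- sqrt(pmax(fit$anc_var[seq_len(n_tips), ], 0))

# Approximate 95% confidence intervals for each species-by-trait estimate.
lower <- est - 1.96 * se
upper <- est + 1.96 * se
```

Observed values come back with zero variance, so only the imputed cells have intervals with non-zero width.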
Hello all, I did a couple of things in the past week:
- Learned about the dplyr library in R, which helps get data frames and matrices into a usable format, and wrote an R script to merge some data files from a plant trait database.
- Learned about parallelizing matrix operations with mpi4py.
- Learned about remote sensing and geospatial mapping with LIDAR (which may also be where self-driving car technology goes as LIDAR gets cheaper).
- Learned about data visualization with ggplot2, which is built on the Grammar of Graphics.
- Read some cool articles about predicting missing data values from sparsely populated data sets.
If you are using R and need to convert or transform data into a particular format, dplyr is extremely useful. Here's some documentation:
http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
http://r4ds.had.co.nz/transform.html
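As a small illustration of the kind of merge-and-transform step dplyr handles, here is a sketch that joins two trait tables on a shared species column. The data frames, column names, and values are made up for the example and are not from the actual plant trait database.

```r
library(dplyr)

# Made-up example tables; values are illustrative only.
leaf <- data.frame(
  species   = c("Acer rubrum", "Quercus alba", "Pinus strobus"),
  leaf_area = c(45.2, 60.1, NA)
)
wood <- data.frame(
  species      = c("Acer rubrum", "Pinus strobus", "Betula papyrifera"),
  wood_density = c(0.49, 0.34, 0.48)
)

# Keep every species from either table, then sort and add a derived column.
merged <- full_join(leaf, wood, by = "species") %>%
  arrange(species) %>%
  mutate(log_leaf_area = log(leaf_area))

merged
```

A `full_join` keeps species that appear in only one file and fills the gaps with `NA`, which is usually what you want when the next step is imputing the missing traits.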