Data Modeling/Cleanup — Week 5 Update (Jay)

This week I started imputing missing trait values using the mice package (Multivariate Imputation by Chained Equations). I also used a package called Rphylopars, which works with phylogenetic trees. Unfortunately, we found mistakes in a data set that had been extracted from a larger database. When we generated predicted values with mice, we got massive outliers. When we went back to the data, we noticed that the original data already contained outrageous values. The R script used to extract the values did not distinguish between categorical and numerical data. I am now editing that script so that it extracts the correct trait values. This is extremely tedious and the least fun part of my job. Imputing missing values and working with statistical models is much more enjoyable.

The moral of the story: before you start any major statistical analysis, make sure the dataset you are using is accurate and has reasonable values.
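A minimal sketch of that kind of pre-check, written in Python for illustration rather than in the R used for the actual project. Everything specific here is an assumption: the column name `body_mass_g`, the plausible range, the sample rows, and the helper `find_suspect_values` are all hypothetical. The idea is simply to flag entries in a supposedly numeric column that are either not numbers at all (for example, a categorical value that slipped in during extraction) or far outside a believable range, while leaving genuinely missing values alone for the imputation step.

```python
import math


def find_suspect_values(rows, column, low, high):
    """Return (row_index, raw_value, reason) for entries that fail a sanity check.

    Hypothetical helper: `low` and `high` are bounds the analyst considers
    plausible for the trait, not values taken from any real database.
    """
    suspects = []
    for i, row in enumerate(rows):
        raw = row.get(column)
        # Missing values are skipped here; they are what imputation is for.
        if raw is None or str(raw).strip() in ("", "NA"):
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # A categorical label ended up in a numeric column.
            suspects.append((i, raw, "not numeric"))
            continue
        if math.isnan(value) or not (low <= value <= high):
            suspects.append((i, raw, "out of range"))
    return suspects


# Made-up rows standing in for an extracted trait table.
rows = [
    {"species": "sp_a", "body_mass_g": "12.5"},
    {"species": "sp_b", "body_mass_g": "NA"},
    {"species": "sp_c", "body_mass_g": "nocturnal"},
    {"species": "sp_d", "body_mass_g": "9999999"},
]

suspects = find_suspect_values(rows, "body_mass_g", low=0.1, high=100000.0)
for index, raw, reason in suspects:
    print(f"row {index}: {raw!r} ({reason})")
```

Running a check like this before imputation would surface the extraction problem directly, instead of it showing up later as outliers in the predicted values.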

I also made corrections to my abstract, and I am currently working on my annotated bibliography.


