Assembly Assessments – Susanna wk 6

This past week, I have been doing preliminary assessments of the data that I got back from Xander. For every gene, I used R to examine the k-mer abundance distribution and to calculate length statistics (in base pairs), the percent identity to the gene of interest, and the number at 99% identity. I also used R to calculate... Continue Reading →


Xander protein-targeted assembly- Susanna Wk5

As I head into week 5, I can't believe the summer program is almost halfway over. It feels like I just arrived, but in only about a month I have learned a lot. I am also looking forward to the second half of the program to see what results I will have. This past week I... Continue Reading →

Genome sizes and assembly – Week 4

This past week I have worked more with the metagenomes. After quality-trimming the files last week, I have started some analysis with the data. Basically, I want to find out the diversity of different arsenic-resistant genes in these metagenomes. To do this I need to find the size of the metagenome by finding the number... Continue Reading →

Metagenomes and HPCC- Susanna Wk3

This week I am downloading metagenomes, checking the quality of the data, and quality-filtering the data. To do this, I submit jobs to the HPCC because if I did it all individually on my computer it would take too long. A metagenome is "all the genetic material present in an environment sample, consisting of the... Continue Reading →

Data manipulation with R – Week #3 Update (Jay)

Hello all, I did a couple of things in the past week:

- Learned about the dplyr library in R, which helps get data frames and matrices into a usable format, and wrote an R script to merge some data files from a plant trait database.
- Learned about parallelizing matrix operations with mpi4py.
- Learned about remote sensing and geospatial mapping with LIDAR (which may also be where self-driving car technology goes as LIDAR gets cheaper).
- Learned about data visualization with ggplot2, which uses the Grammar of Graphics.
- Read some cool articles about predicting missing data values from sparsely populated data sets.

If you are using R and need to convert or transform data into a particular format, then dplyr is extremely useful. Here's some documentation:

Beginning Research on AsRG Diversity

My research is on the diversity of arsenic-resistant genes (AsRG) in soil bacteria. Questions that my mentor has been thinking about are: What are AsRG concentrations across soils? How does the diversity of AsRGs change between soils? Can this be mostly predicted or explained by what species make up the community in the soil? Metadata of various... Continue Reading →

Introduction and Expectations

This summer I will be working under Dr. Ashley Shade in microbiology and molecular genetics. I will be working with my mentor Taylor Dunivin, who focuses on bacterial resistance to arsenic in both its arsenate and arsenite forms. By testing arsenic-resistant genes from various microbial communities in topsoil around the world, it will be possible to quantify a baseline of... Continue Reading →
