Projects


As part of the work of Dr. Rui Kuang's lab, I extended an algorithm for spatial transcriptomic (ST) data imputation, based on tensor decomposition and graph regularization, from two to three spatial dimensions, making it the first method of its kind. The algorithm is implemented in an easy-to-use, documented command-line tool for 2D and 3D ST data imputation. We also measured the performance of our method against scRNA-seq imputation methods and found that it outperformed every method we compared it against. This work was funded by an NSF Research Experience for Undergraduates (REU) supplement.

During the summer of 2022, I worked in Dr. Julie Segre's skin microbiome lab at the National Human Genome Research Institute. There, I learned how collaboration between biologists and computational folks can lead to insights that would otherwise go unnoticed. At the NIH, I examined the impact of variant calling choices applied to samples of the emerging multidrug-resistant fungal pathogen Candida auris. Drawing on my background in statistics and data visualization, I created a series of diagnostic plots that allowed us to identify errors in variant calling pipelines, which led to a series of discoveries about which steps in our pipeline were necessary. Furthermore, as a statistician, I was uneasy with the lack of experiments using ground truth data in fungal genomics, so I designed an analysis for the lab to perform once we received further sequencing data to validate our pipeline.

Publications:

  • Proctor DM, Atkins TK, Chen Q, Conlan S, Deming C, Samson SE, Hayden MK, Segre JA. Integrating data types to understand the genomic epidemiology of the emerging fungal pathogen Candida auris. Presented at: NHGRI Annual Retreat; October 14-15 2022; Natcher Conference Center, Bethesda, MD.
  • Atkins TK, Proctor DM, Deming C, Chen Q, Conlan SP, Segre JA. Diagnostic Measures for Fungal Genome Variant Callers. Poster session presented at: NIH Summer Poster Day; Aug. 3 2022; Virtual.

Promethease is an online tool that provides reports of genomic variation to users who upload genotyping data from 23andMe or other services. To do this, Promethease uses data from the online open-source database SNPedia, which provides summaries and other metadata on common or high-impact human SNPs.

However, Promethease is not free software (neither free as in speech nor free as in beer). Most concerningly, this limits the user's control over their data. Therefore, to advance open science, I have developed software I call The Modern Promethease, which creates a report from a user's personal genome data and the SNPedia database while keeping all genomic data local.
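The core idea of a local report can be illustrated with a minimal sketch. This is not the actual implementation of The Modern Promethease. It assumes a 23andMe-style raw data file (tab-separated rsid, chromosome, position, and genotype columns, with comment lines starting with `#`) and a hypothetical, already-downloaded `annotations` dictionary standing in for SNPedia content. The function names and the placeholder rsids and summaries are illustrative only.

```python
import io


def parse_genotypes(handle):
    """Read 23andMe-style raw data into a dict mapping rsid -> genotype."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and the commented header block.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # Ignore malformed rows rather than guessing.
        rsid, _chromosome, _position, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes


def build_report(genotypes, annotations):
    """Match genotypes against local annotations.

    `annotations` is a hypothetical structure: rsid -> {genotype: summary}.
    Nothing here touches the network, so the genotypes never leave the machine.
    """
    report = []
    for rsid, genotype in genotypes.items():
        summary = annotations.get(rsid, {}).get(genotype)
        if summary is not None:
            report.append((rsid, genotype, summary))
    return sorted(report)


# Placeholder data for illustration only; these are not real SNPedia entries.
raw_data = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t100\tAG\n"
    "rs0000002\t2\t200\t--\n"
)
annotations = {"rs0000001": {"AG": "placeholder summary"}}

for row in build_report(parse_genotypes(raw_data), annotations):
    print(row)
```

Because the lookup is a plain dictionary match against data already on disk, the only step that needs a network connection is fetching the annotation database itself.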

As part of the course GCD 3485, I analyzed a protein of unknown function, FAM151A, to determine its putative function and other characteristics using purely computational techniques (including but not limited to: sequence conservation analysis, ortholog hunting to reconstruct evolutionary history, mRNA and expression analysis, and protein structure prediction). A summary of the analysis was used to create FAM151A's Wikipedia page, and the full results of the analysis can be found here.