Alan Min
Video by Naveen Kumar and Alan Min

About me

I am a second-year statistics PhD student at the University of Washington, and my research focuses on statistical genetics. This area particularly excites me because I not only get to work on genetics, which has many applications, but I also get to use my computational skills. I completed my undergraduate degree at Purdue University, where I studied computer science, statistics, and math! In my spare time, I do a good amount of photography, music production, and videography.

Research Interests

Heritability Estimation

Heritability can be thought of as the proportion of variation in a phenotype (an observable trait) that can be explained by genetic variation. For example, many scientists have worked on estimating the heritability of height. If the heritability were 0%, then differences in height would not be explained at all by genetic differences, so knowing your parents' genes would tell us nothing about your height. On the other hand, if heritability were 100%, then your height would be completely determined once we knew what genes you have. The answer to this question has important consequences in agriculture and genomics. For example, if we knew that the heritability of height was extremely high, then we would expect differences in environment, such as how much milk people drink, to explain very little of the differences in how tall people end up.
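A common way to make this definition precise is h² = Var(G) / Var(P), where a phenotype P is modeled as a genetic value G plus an environmental part E. The toy Python sketch below is only an illustration, not code from my research: it assumes a purely additive model with random mating and no shared family environment, and the parameter values are made up. It computes heritability directly from the simulated variances and also recovers it the family-based way, as the slope of a child's phenotype regressed on the average of the parents' phenotypes (the midparent value).

```python
import random


def variance(xs):
    """Sample variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_families(n_families=50_000, h2=0.8, seed=0):
    """Toy additive model (an assumption): P = G + E, Var(G) = h2, Var(E) = 1 - h2.

    Each child inherits the average of its parents' genetic values plus
    segregation noise with variance h2 / 2, which keeps Var(G) = h2 in the
    child generation. No shared family environment is simulated.
    """
    rng = random.Random(seed)
    sd_g, sd_e, sd_seg = h2 ** 0.5, (1 - h2) ** 0.5, (h2 / 2) ** 0.5
    child_g, child_p, midparent_p = [], [], []
    for _ in range(n_families):
        g_mom, g_dad = rng.gauss(0, sd_g), rng.gauss(0, sd_g)
        p_mom = g_mom + rng.gauss(0, sd_e)
        p_dad = g_dad + rng.gauss(0, sd_e)
        g_kid = (g_mom + g_dad) / 2 + rng.gauss(0, sd_seg)
        child_g.append(g_kid)
        child_p.append(g_kid + rng.gauss(0, sd_e))
        midparent_p.append((p_mom + p_dad) / 2)
    return child_g, child_p, midparent_p


child_g, child_p, midparent_p = simulate_families()
# Heritability by definition: share of phenotypic variance due to genetics.
h2_true = variance(child_g) / variance(child_p)
# Family-based estimate: regression slope of child phenotype on midparent phenotype.
h2_slope = covariance(midparent_p, child_p) / variance(midparent_p)
print(round(h2_true, 2), round(h2_slope, 2))
```

Both printed numbers should land near the simulated value of 0.8. In real families, shared environment and non-additive genetic effects break the assumptions that make this slope equal to heritability.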

For many years, scientists have used familial structure to estimate heritability, for example, by comparing the heights of siblings or cousins. More recently, however, technology has allowed scientists to collect data on the specific genetic variants carried by individuals. This has led us to try to replicate the earlier family-based heritability studies using this genotype data. However, these modern approaches have consistently produced lower heritability estimates than the family studies. This gap is known as the problem of missing heritability. I am currently working with Dr. Elizabeth Thompson and Dr. Saonli Basu, using simulation studies to understand the problems with these modern methods.
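One frequently discussed contributor to missing heritability is that genotyping does not capture every causal variant (rare variants, for example). The toy Python sketch below illustrates only that single mechanism and is not the simulation design used in this project: the allele frequencies, effect sizes, and the choice of which variants count as "genotyped" are all hypothetical. To isolate the effect of coverage, it uses the true effect sizes, so any shortfall comes purely from unmeasured variants.

```python
import random


def variance(xs):
    """Sample variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def toy_missing_heritability(n=3_000, n_causal=100, frac_genotyped=0.5, h2=0.8, seed=0):
    """Return (heritability from all causal variants, share from genotyped ones)."""
    rng = random.Random(seed)
    # Hypothetical causal variants: allele frequency and additive effect per variant.
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_causal)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_causal)]
    n_typed = int(frac_genotyped * n_causal)  # first n_typed variants are "measured"
    g_all, g_typed = [], []
    for _ in range(n):
        # Genotype = number of copies of the effect allele (0, 1, or 2).
        geno = [int(rng.random() < f) + int(rng.random() < f) for f in freqs]
        contrib = [b * x for b, x in zip(effects, geno)]
        g_typed.append(sum(contrib[:n_typed]))
        g_all.append(sum(contrib))
    # Scale environmental noise so that the full heritability is about h2.
    var_e = variance(g_all) * (1 - h2) / h2
    p = [g + rng.gauss(0.0, var_e ** 0.5) for g in g_all]
    return variance(g_all) / variance(p), variance(g_typed) / variance(p)


total, captured = toy_missing_heritability()
print(f"heritability from all causal variants: {total:.2f}")
print(f"share captured by genotyped variants:  {captured:.2f}")
```

With half of the causal variants unmeasured, the captured share comes out well below the full heritability, even though the genetic model is identical.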

Latent Dirichlet Allocation

A common problem in natural language processing is assigning topics to documents. For instance, it would be nice if computers were able to read an NPR article and figure out that it was about politics and dogs. Latent Dirichlet allocation (LDA) is a Bayesian model that has been developed to this end. It treats each document as a bag of words, ignoring word order, and assumes that each topic generates words according to its own probability distribution. Based on the words in a document, LDA then estimates what proportion of the document comes from each topic.
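LDA is easiest to understand through the generative story it assumes. The minimal Python sketch below uses a made-up two-topic vocabulary (the topic names, words, and probabilities are all hypothetical) and shows how one document would be generated: draw the document's topic proportions from a Dirichlet prior, then for each word pick a topic and draw a word from that topic's distribution.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical vocabulary and topics: each topic is a distribution over words.
VOCAB = ["senate", "vote", "election", "puppy", "leash", "bark"]
TOPICS = {
    "politics": [0.30, 0.30, 0.30, 0.04, 0.03, 0.03],
    "dogs":     [0.03, 0.03, 0.04, 0.30, 0.30, 0.30],
}


def generate_document(n_words, alpha, rng):
    """LDA's generative story for one document, returned as a bag of words."""
    names = list(TOPICS)
    theta = sample_dirichlet(alpha, rng)  # document-specific topic proportions
    words = []
    for _ in range(n_words):
        topic = rng.choices(names, weights=theta)[0]          # topic for this word
        word = rng.choices(VOCAB, weights=TOPICS[topic])[0]   # word from that topic
        words.append(word)
    return dict(zip(names, theta)), words


rng = random.Random(42)
theta, words = generate_document(20, alpha=[0.5, 0.5], rng=rng)
print(theta)
print(sorted(words))  # order carries no information in LDA; only counts matter
```

Fitting LDA runs this story in reverse: given only word counts, inference (for example variational Bayes or Gibbs sampling) estimates each document's topic proportions and each topic's word distribution. Library implementations such as scikit-learn's LatentDirichletAllocation handle that step.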

There has been research applying this same LDA algorithm to cells, assigning cell types across a corpus of cells based on their RNA-seq data. I'm interested in methods for specifying priors for these models and in comparing performance across different priors.
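One concrete prior choice in LDA-style models is the concentration parameter of the symmetric Dirichlet prior on each cell's (document's) mixing proportions. The Python sketch below, with an arbitrary choice of five hypothetical cell types, shows how this single parameter changes what the prior treats as typical: small values favor cells dominated by one type, while large values favor even mixtures. It is an illustration of the prior alone, not a comparison of fitted models.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(concentration, n_types=5, n_draws=5_000, seed=0):
    """Average weight of the dominant component under a symmetric Dirichlet prior."""
    rng = random.Random(seed)
    alpha = [concentration] * n_types
    return sum(max(sample_dirichlet(alpha, rng)) for _ in range(n_draws)) / n_draws


shares = {c: mean_largest_share(c) for c in (0.1, 1.0, 10.0)}
for c, s in shares.items():
    print(f"concentration {c:>4}: average largest share {s:.3f}")
```

For a concentration of 1 (a uniform prior over mixtures), the average largest share is known exactly: it equals (1/5)(1 + 1/2 + 1/3 + 1/4 + 1/5) ≈ 0.457 for five types, which makes a handy check on the simulation.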

Music
Photography

You can check out my work at alanminphotography.weebly.com

Scattergories!