Today we’re introducing Andrew Riha, who was recently awarded one of our project grants for his tool lineage. With lineage, Andrew will make the genetic data you store on Open Humans even more useful by enabling ancestry analyses!
Hey Andrew, please give our blog readers a quick introduction about who you are!
I’m a systems engineer at an aerospace company in Southern California. I studied at Iowa State University, the University of Newcastle, and Delft University of Technology, and I have a B.S. and M.S. in computer engineering. A few years ago, I became interested in direct-to-consumer DNA testing after a friend told me about his experience with 23andMe. This interest developed into a passion, and I’m currently pursuing a graduate certificate in bioinformatics. My hobbies include running, traveling, and backpacking.
When and how did you come to Open Humans?
Bastian, the Director of Research, introduced me to the Open Humans platform in early 2018. I had mentioned to Bastian that I wanted to turn lineage, my open source Python hobby project, into a web app, so he suggested I consider applying for a project grant.
Have you been involved in any projects on Open Humans so far, either as a participant or even running your own?
This is my first project with Open Humans. I’m looking forward to learning from others and further developing and integrating lineage into the Open Humans ecosystem as a great open source web app!
Your project lineage was awarded one of the Open Humans project grants. Can you explain to us what the project is about?
lineage is a framework for analyzing genotype files (e.g., raw data files from 23andMe, Ancestry, etc.), primarily for the purposes of genetic genealogy and ancestry analysis. It can identify DNA and genes shared between individuals, and it provides other useful capabilities such as merging raw data files from different testing companies, identifying discrepant and discordant SNPs, and remapping SNPs to different assemblies / builds.
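To make the merging and discrepancy-checking idea concrete, here is a minimal sketch in plain Python. It is not lineage's actual API or file format: the `merge_genotypes` function, the `{rsid: genotype}` dictionaries, and the sample data are all hypothetical, and the sketch ignores real-world concerns such as no-calls, strand orientation, and differing builds.

```python
# Hypothetical sketch: not lineage's actual API or file format.
def merge_genotypes(first, second):
    """Merge two {rsid: genotype} dicts; return (merged, discrepant)."""
    merged = dict(first)
    discrepant = {}
    for rsid, genotype in second.items():
        if rsid not in merged:
            # SNP only present in the second file
            merged[rsid] = genotype
        elif sorted(merged[rsid]) != sorted(genotype):
            # Same SNP, different calls: record both, keep the first file's call
            discrepant[rsid] = (merged[rsid], genotype)
    return merged, discrepant


# Made-up genotypes standing in for two companies' raw data files
file_a = {"rs1": "AG", "rs2": "CC", "rs3": "TT"}
file_b = {"rs2": "CC", "rs3": "CT", "rs4": "GA"}

merged, discrepant = merge_genotypes(file_a, file_b)
print(merged)
print(discrepant)
```

Comparing sorted alleles treats "AG" and "GA" as the same call, since the order of alleles in an unphased genotype carries no meaning.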
How did you come up with the idea behind lineage?
After my friend told me about his experience with 23andMe, I started researching how to get tested and found the International Society of Genetic Genealogy’s wiki very helpful and informative. The wiki led me to an excellent paper by Whit Athey that discussed using genotype files to phase the chromosomes of a family group and “reverse engineer” the DNA of a missing parent in the process! So, for a CS50 final project, I challenged myself to implement Whit’s algorithm in Python, using scientific libraries and vectorized programming in order to efficiently handle and analyze the large datasets involved.
The initial algorithm implementation was successful, and lineage had begun. But I soon realized the need for other capabilities, such as comparing and merging files from different testing companies and determining what DNA is shared between individuals so that it could be used to guide the phasing algorithm. So, lineage grew into the framework that exists today, and I eventually want to return to implementing Whit’s algorithm, applying the bioinformatics and visualization concepts that I’ve learned along the way.
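For readers unfamiliar with phasing, the basic idea can be sketched for a single SNP in a child whose parents have both been tested. This is only an illustration of simple trio phasing, not Whit Athey's algorithm and not lineage's implementation; the `phase_child` function and its two-letter genotype strings are assumptions made for the example.

```python
# Hypothetical sketch of trio phasing at one SNP; not lineage's implementation.
def phase_child(child, mother, father):
    """Return (maternal_allele, paternal_allele), or None if it cannot be decided.

    Each genotype is an unordered two-letter string such as "AG".
    """
    options = set()
    # Try both ways of assigning the child's two alleles to the parents
    for maternal, paternal in ((child[0], child[1]), (child[1], child[0])):
        if maternal in mother and paternal in father:
            options.add((maternal, paternal))
    # Exactly one consistent assignment means the site can be phased
    return options.pop() if len(options) == 1 else None


print(phase_child("AG", "AA", "AG"))  # mother can only have passed on A
print(phase_child("AG", "AG", "AG"))  # everyone heterozygous: ambiguous
```

When the child and both parents are heterozygous for the same two alleles, both assignments are consistent, so the function returns `None`; such sites need additional relatives or a reference panel to resolve.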
Is there anything important that we didn’t cover so far that you’d like to add?
lineage wouldn’t have been possible without the knowledge and help graciously provided by so many people. It is in that spirit that I’d like to encourage others to create and contribute to open source projects – sharing your ideas and passions with the world can be a very rewarding endeavor!
Oh, and thanks Mom, Dad, grandmas, and grandpas for the genes. 🙂
3 thoughts on “Meet Andrew Riha, our next project grant awardee”
Have you thought of contacting GEDmatch with your software?
What I am thinking is that your software could perhaps be a Tier 1 service on GEDmatch, where one could phase a gene chip file using one’s GEDmatch matches.
The problem with phasing with close relatives is that you can wind up with ambiguous heterozygous calls that cannot be phased. With GEDmatch you could phase off the relatives that match. By using such an extended database, it would seem likely that one could fully phase one’s genome. That would be great!
Hi R, thanks for the note. Since lineage is being developed as an open source project, it would certainly be available to integrate with GEDmatch, and it seems like it could be a good fit as well!
As for the phasing problem, another option could be using Eagle2 (https://data.broadinstitute.org/alkesgroup/Eagle/) and the Michigan Imputation Server (https://imputationserver.sph.umich.edu/index.html) to perform haplotype phasing using a reference panel (e.g., data from the 1000 Genomes Project). I agree that fully phasing one’s data would be great, and it’s a feature that I hope lineage will have someday!