Monthly Archives: June 2019

snps: a new open source project with Open Humans roots

snps is a new open source Python package that aims to help users interact with genetic data from a variety of sources, including direct-to-consumer (DTC) DNA testing companies and whole genome sequencing (WGS) services. Specifically, snps provides tools to help with reading, writing, merging, and remapping SNPs.

The initial snps capability was developed by Andrew Riha as part of lineage, which joined the Open Humans ecosystem as a project in early 2019. Soon thereafter, numerous members from the Open Humans community, including Bastian Greshake Tzovaras, Mad Price Ball, James Turner, Ben Carr, and Beau Gunderson, requested support for VCF files. So, in May 2019, Kevin Arvai (with his VCF experience from Imputer) and Andrew teamed up to add the VCF capability, and wanting to share the work with others, snps began as an open source project to further enable citizen science.

Some of snps capabilities are detailed below, and a Notebook demonstrating usage of `snps` is available on Open Humans has been developed by Kevin & Bastian.

Reading

snps supports reading VCF (variant call format) files, in addition to files from 23andMe, Ancestry, Family Tree DNA, and MyHeritage.

Moreover, snps attempts to detect the assembly, or build, of the data. Commonly, Builds 36, 37, and 38 are used today, and these represent the “version” of the reference genome.

Writing

snps supports writing SNPs to CSV and VCF files for Builds 36, 37, and 38. This also means that snps can be used to essentially convert files from DTC DNA tests to VCF format.

Merging

snps supports merging datasets, e.g., if test results are available from more than one source. When SNPs are merged, any discrepancies are identified.

Remapping

snps supports remapping SNPs from one assembly to another. SNPs can be remapped between Builds 36, 37, and 38.

This guest post was written by Andrew Riha and Kevin Arvai. Andrew & Kevin have both launched projects that make use of genetic data on Open Humans before.