`snps` is a new open source Python package that aims to help users interact with genetic data from a variety of sources, including direct-to-consumer (DTC) DNA testing companies and whole genome sequencing (WGS) services. Specifically, `snps` provides tools to help with reading, writing, merging, and remapping SNPs.
The initial `snps` capability was developed by Andrew Riha as part of lineage, which joined the Open Humans ecosystem as a project in early 2019. Soon thereafter, numerous members of the Open Humans community, including Bastian Greshake Tzovaras, Mad Price Ball, James Turner, Ben Carr, and Beau Gunderson, requested support for VCF files. So, in May 2019, Kevin Arvai (with his VCF experience from Imputer) and Andrew teamed up to add the VCF capability. Because they wanted to share the work with others, `snps` began as an open source project to further enable citizen science.
Some of the capabilities of `snps` are detailed below, and a Notebook demonstrating usage of `snps`, developed by Kevin & Bastian, is available on Open Humans.
Reading
`snps` supports reading VCF (Variant Call Format) files, in addition to files from 23andMe, Ancestry, Family Tree DNA, and MyHeritage.
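To give a feel for what reading a DTC file involves, here is a minimal sketch using only the Python standard library. It is not how `snps` itself is implemented; it assumes a tab-separated layout with `#` comment lines and four columns (rsid, chromosome, position, genotype), and the function name `read_snps` and the sample rows are hypothetical.

```python
import csv
import io


def read_snps(text):
    """Parse tab-separated rsid/chromosome/position/genotype rows into a dict."""
    snps = {}
    # Drop blank lines and '#' comment lines before handing rows to csv
    rows = (
        line
        for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    for rsid, chrom, pos, genotype in csv.reader(rows, delimiter="\t"):
        snps[rsid] = {"chrom": chrom, "pos": int(pos), "genotype": genotype}
    return snps


# Hypothetical sample data, not taken from any real test result
sample = (
    "# hypothetical raw data export\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t1000\tAG\n"
    "rs2\t1\t2000\tCC\n"
)
parsed = read_snps(sample)
```

Real files differ between companies in delimiters, headers, and genotype encoding, which is the kind of variation a dedicated reader has to absorb.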
`snps` also attempts to detect the assembly, or build, of the data. Builds 36, 37, and 38 are in common use today; each represents a “version” of the reference genome.
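One way to think about build detection is to compare the position reported for a well-known SNP against its position in each build. The sketch below illustrates that idea only; it is not a description of the method `snps` uses. The helper `detect_build` is hypothetical, and the positions listed for rs3094315 are given for illustration and should be verified against dbSNP before being relied on.

```python
# Position of one well-known SNP in each build
# (illustrative values; verify against dbSNP before relying on them)
KNOWN_POSITIONS = {
    "rs3094315": {36: 742429, 37: 752566, 38: 817186},
}


def detect_build(positions):
    """Return the first build whose known position matches the data, else None.

    `positions` maps rsid -> observed position.
    """
    for rsid, by_build in KNOWN_POSITIONS.items():
        observed = positions.get(rsid)
        if observed is None:
            continue  # this reference SNP is absent from the data
        for build, pos in by_build.items():
            if observed == pos:
                return build
    return None


detected = detect_build({"rs3094315": 752566})
```

A more robust check would consult several reference SNPs, since any single one may be missing from a given file.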
Writing
`snps` supports writing SNPs to CSV and VCF files for Builds 36, 37, and 38. This also means that `snps` can essentially be used to convert files from DTC DNA tests to VCF format.
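Converting a DTC genotype to VCF hinges on knowing the reference allele at each position, because VCF expresses a genotype as indices into the REF and ALT alleles. The following sketch shows that step for a single diploid SNP. The function `to_vcf_line` is hypothetical, the reference allele is assumed to be supplied by the caller, and QUAL, FILTER, and INFO are left as missing values.

```python
def to_vcf_line(rsid, chrom, pos, genotype, ref):
    """Build one VCF data line for a diploid genotype, given the reference allele."""
    # Alternate alleles are the distinct non-reference alleles, in order seen
    alts = []
    for allele in genotype:
        if allele != ref and allele not in alts:
            alts.append(allele)
    alleles = [ref] + alts
    # GT encodes each allele as an index: 0 = REF, 1 and up = ALT alleles
    gt = "/".join(str(alleles.index(a)) for a in genotype)
    alt = ",".join(alts) if alts else "."
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    return "\t".join([chrom, str(pos), rsid, ref, alt, ".", ".", ".", "GT", gt])


# Hypothetical SNP: genotype AG where the reference allele is A
line = to_vcf_line("rs1", "1", 1000, "AG", ref="A")
```

A complete VCF file also needs header lines (such as the file format and column header), which this sketch omits.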
Merging
`snps` supports merging datasets, for example when test results are available from more than one source. When SNPs are merged, any discrepancies are identified.
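The idea of merging with discrepancy detection can be sketched in a few lines. This is a simplified illustration, not the `snps` implementation: the function `merge_snps` is hypothetical, it compares genotypes only, it treats allele order as irrelevant, and it assumes the first dataset's genotype is kept when the two disagree.

```python
def merge_snps(base, other):
    """Merge two rsid -> genotype dicts; return (merged, discrepant_rsids)."""
    merged = dict(base)
    discrepancies = []
    for rsid, genotype in other.items():
        if rsid not in merged:
            merged[rsid] = genotype  # SNP only present in the second source
        elif sorted(merged[rsid]) != sorted(genotype):
            # Allele order is ignored, so "AG" and "GA" agree;
            # on a real conflict the base genotype is kept (an assumption)
            discrepancies.append(rsid)
    return merged, discrepancies


# Hypothetical results from two different tests
merged, discrepancies = merge_snps(
    {"rs1": "AG", "rs2": "CC"},
    {"rs1": "GA", "rs2": "CT", "rs3": "TT"},
)
```

Here `rs3` is added from the second source and `rs2` is flagged, since the two sources report different genotypes for it.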
Remapping
`snps` supports remapping SNPs from one assembly to another. SNPs can be remapped between Builds 36, 37, and 38.
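Conceptually, remapping translates each position through a table of segments that relate coordinates in one build to coordinates in another. The sketch below shows that lookup for a single chromosome. The segment values are hypothetical, the function `remap_position` is not part of `snps`, and the sketch assumes every segment maps to the same strand, so genotypes need no complementing.

```python
from bisect import bisect_right

# Hypothetical mapping segments for one chromosome:
# (source_start, source_end, target_start), same strand, sorted by source_start
SEGMENTS = [
    (1, 1000, 101),
    (1001, 5000, 1301),
]


def remap_position(pos, segments=SEGMENTS):
    """Translate a position via the segment that contains it, or return None."""
    starts = [segment[0] for segment in segments]
    # Find the last segment whose start is at or before the position
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None  # position lies before the first segment
    src_start, src_end, dst_start = segments[i]
    if pos > src_end:
        return None  # position falls in a gap with no mapping
    return dst_start + (pos - src_start)


new_pos = remap_position(1500)
```

Positions that fall outside every segment come back as `None`, which mirrors the fact that some SNPs have no counterpart in the target assembly.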
This guest post was written by Andrew Riha and Kevin Arvai. Andrew & Kevin have both previously launched projects on Open Humans that make use of genetic data.