snps is a new open source Python package that aims to help users interact with genetic data from a variety of sources, including direct-to-consumer (DTC) DNA testing companies and whole genome sequencing (WGS) services. Specifically, snps provides tools to help with reading, writing, merging, and remapping SNPs.
The snps capability was developed by Andrew Riha as part of lineage, which joined the Open Humans ecosystem as a project in early 2019. Soon thereafter, numerous members of the Open Humans community, including Bastian Greshake Tzovaras, Mad Price Ball, James Turner, Ben Carr, and Beau Gunderson, requested support for VCF files. So, in May 2019, Kevin Arvai (with his VCF experience from Imputer) and Andrew teamed up to add the VCF capability. Because they wanted to share the work with others, snps began as an open source project to further enable citizen science.
snps capabilities are detailed below, and a Notebook demonstrating usage of `snps`, developed by Kevin & Bastian, is available on Open Humans.
snps attempts to detect the assembly, or build, of the data. Builds 36, 37, and 38 are the ones commonly used today, and each represents a “version” of the reference genome.
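One way to picture build detection is to look up a marker SNP whose position is known in each build and see which build the file agrees with. The sketch below only illustrates that idea and is not the snps implementation: the function name `detect_build`, the rsIDs, and every position in the table are hypothetical.

```python
# Hypothetical lookup table: position of each marker SNP in each build.
# The rsIDs and positions are made up for illustration only.
KNOWN_POSITIONS = {
    "rs_example_1": {36: 1000, 37: 1500, 38: 2100},
    "rs_example_2": {36: 5000, 37: 5200, 38: 5900},
}


def detect_build(snp_positions):
    """Return the build whose known positions match the data, or None.

    snp_positions maps rsID -> position as found in the user's file.
    """
    for rsid, by_build in KNOWN_POSITIONS.items():
        observed = snp_positions.get(rsid)
        if observed is None:
            continue  # marker not present in this file; try the next one
        for build, position in by_build.items():
            if position == observed:
                return build
    return None


# A file that contains only the second marker, at its Build 37 position.
file_positions = {"rs_example_2": 5200, "rs_other": 42}
detected = detect_build(file_positions)
```

Checking more than one marker matters because a given file may not include every SNP in the lookup table.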
snps supports writing SNPs to CSV and VCF files for Builds 36, 37, and 38. This also means that snps can be used to essentially convert files from DTC DNA tests to VCF format.
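To make the conversion concrete, here is a minimal sketch of how one DTC-style genotype could become a VCF data line. It is an illustration under stated assumptions, not the code snps uses: the function `to_vcf_line` is hypothetical, and the reference allele is passed in by the caller because a DTC file does not contain it (it has to come from a reference genome for the matching build). No-calls and insertion/deletion codes are not handled.

```python
def to_vcf_line(rsid, chrom, pos, genotype, ref):
    """Build one VCF data line from a DTC-style genotype such as "AG".

    `ref` is the reference allele at this position (assumed to be known).
    """
    # ALT alleles are the observed alleles that differ from the reference,
    # listed once each in order of first appearance.
    alts = []
    for allele in genotype:
        if allele != ref and allele not in alts:
            alts.append(allele)

    # GT indexes: 0 is the reference allele, 1..n are the ALT alleles.
    alleles = [ref] + alts
    gt = "/".join(str(alleles.index(a)) for a in genotype)

    alt_field = ",".join(alts) if alts else "."
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    fields = [chrom, str(pos), rsid, ref, alt_field, ".", ".", ".", "GT", gt]
    return "\t".join(fields)


# Hypothetical SNP: heterozygous A/G where the reference allele is A.
line = to_vcf_line("rs_example_1", "1", 1500, "AG", "A")
```

The unphased separator `/` is used because DTC genotypes do not say which allele came from which parent.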
snps supports merging datasets, e.g., if test results are available from more than one source. When SNPs are merged, any discrepancies are identified.
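As a rough picture of what merging with discrepancy checks involves, the sketch below combines two small datasets keyed by rsID and records where they disagree. The function `merge_snps`, the data layout, and the choice to keep the first source's value on conflict are all assumptions made for illustration; snps may resolve discrepancies differently.

```python
def merge_snps(first, second):
    """Merge two rsID -> (chrom, pos, genotype) mappings.

    Returns (merged, discrepant_positions, discrepant_genotypes).
    Where the two sources disagree, the value from `first` is kept
    (an assumption of this sketch).
    """
    merged = dict(first)
    discrepant_positions = []
    discrepant_genotypes = []

    for rsid, (chrom, pos, genotype) in second.items():
        if rsid not in merged:
            merged[rsid] = (chrom, pos, genotype)
            continue
        first_chrom, first_pos, first_genotype = merged[rsid]
        if (first_chrom, first_pos) != (chrom, pos):
            discrepant_positions.append(rsid)
        # Allele order is not meaningful for unphased data: "AG" == "GA".
        if sorted(first_genotype) != sorted(genotype):
            discrepant_genotypes.append(rsid)

    return merged, discrepant_positions, discrepant_genotypes


# Hypothetical results from two different testing sources.
source_a = {"rs_1": ("1", 100, "AG"), "rs_2": ("1", 200, "CC")}
source_b = {"rs_1": ("1", 100, "GA"), "rs_2": ("1", 200, "CT"),
            "rs_3": ("2", 300, "TT")}
merged, bad_positions, bad_genotypes = merge_snps(source_a, source_b)
```

Here `rs_1` agrees once allele order is ignored, `rs_2` is flagged as a genotype discrepancy, and `rs_3` is simply added.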
snps supports remapping SNPs from one assembly to another. SNPs can be remapped between Builds 36, 37, and 38.
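Conceptually, remapping translates each SNP's position through a table of aligned segments between two assemblies. The sketch below shows that idea for same-strand segments only. It is not how snps is implemented: the function `remap_position`, the segment layout, and the coordinates are hypothetical, and real assembly mappings also include strand changes that this example ignores.

```python
def remap_position(pos, segments):
    """Map a position using (source_start, source_end, target_start) segments.

    Returns None when the position falls outside every segment, i.e. it
    has no counterpart in the target assembly.
    """
    for source_start, source_end, target_start in segments:
        if source_start <= pos <= source_end:
            # Keep the same offset from the start of the aligned segment.
            return target_start + (pos - source_start)
    return None


# Hypothetical aligned segments for one chromosome, source -> target build.
SEGMENTS = [(1, 1000, 501), (1001, 2000, 1601)]

remapped = remap_position(1500, SEGMENTS)
unmapped = remap_position(5000, SEGMENTS)
```

A SNP that falls in a gap between segments cannot be remapped, so a caller has to decide whether to drop it or keep it with a missing position.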
This guest post was written by Andrew Riha and Kevin Arvai. Both Andrew & Kevin have previously launched projects that make use of genetic data on Open Humans.