All posts by Bastian

Doing Personal Science together – in memory of Steven Keating

February 6, 2021Open Humans, Self-researchBastian

Whether you call it Personal Science, Self-Research or Quantified Self, many of us have questions about ourselves and try to find answers for them. It’s a process that can range from structured journaling to rigorous N-of-1 studies. It’s about you – and it can be about nearly anything. From introspection to data analysis, self-research can help people find what habits work for them, manage chronic conditions, and learn more about themselves.

In collaboration between Quantified Self and Open Humans, we want to support self-research as a collective endeavor and provide a framework where a community of self-researchers can help each other. As one of the main support frameworks for this we have been hosting weekly self-research chats. In these chats you can share your currently on-going projects or projects you are currently planning; ask for feedback or advice; get help when you’re stuck; or brainstorm new ideas.

We started these calls last year as part of our collective memorial to Steven Keating. On July 19 2019 Steven – an inspiring advocate for patient data access – passed away. Curiosity was a driving force in Steven’s life. He recorded and shared videos of his brain surgery, explored his cancer’s genetic data, and printed 3D models of his tumor. Steven was also a supporter of Open Humans, and served as a director up until his passing. In his memory and in celebration of his life, we would like to help more people be curious about themselves. Our goal is to share reports about what we learned this July, in honor of him.

Our first memorial to Steven last year culminated in a Show & Tell meeting in which participants of the regular self-research chats presented their projects and what they learned from them. Given the large interest in this we will do the same this year: At the end of July we will organize another Show & Tell meeting in memory of Steven and just like last year we will hold weekly self-research meetings to prepare our projects.

We’re inviting you to join our calls – every Thursday at 10am Pacific time – to share your ideas about self-research projects – questions they have, and potential approaches – and provide support for each other in our efforts. Doing this in a group means you can get help when you’re stuck! This is an opportunity to try self-research for the first time – or to pursue a project you already have – by doing it with a small group of people who have diverse skills, lots of experience, and a desire to support you.

Join us in learning about ourselves. Find more information about the Memorial, the self-research chats and when the next meeting is.

Learn about Quantified Flu and our weekly community calls

April 23, 2020Open HumansBastian

Join Quantified Flu

Do changes in physiological signals that are measured through wearables (such as body temperature and resting heart rate) precede consciously experienced symptoms (such as coughs, sore throats, shortness of breath, etc.)? This question is on many people’s minds right now, and it was a hot topic during an Open Humans community call in early March. Collectively we wondered whether it is possible to explore and potentially improve, through collective self research & an open process, what individual value data has with wearable data and symptom tracking.

As a result of this we have launched Quantified Flu, a collective project that tries to answer this question through both retrospective data analyses and the ongoing reporting of symptoms. It allows you to annotate your existing wearable data by adding past dates on which you fell sick. For the ongoing symptom collection, you can set up a daily check-in time on which you will get an email asking you to either quickly report “no symptoms” or enter symptoms if you experience any.

You can decide how much of this data you want to share: You opt-in to publicly share the data through a random identifier, allowing others to analyze that data and create new and improved data visualizations. In the long-run we might share aggregated and de-identified data as well, as long as we are confident that this respects individual privacy. If you are interested in joining this effort as a participant, head to Quantified Flu and sign up!
All of this was made possible in such a short amount of time thanks to our great community, including Gary Wolf, Ernesto Ramirez, Katarzyna Wac, Chris Ball, Beau Gunderson, Lukasz Baldy, Konstantin Vdovkin and many more. If you want to contribute data visualizations, adding support for more wearables or help with web development, join our Slack and visit the #quantifiedflu channel!

Come to our Community Calls

As mentioned before: The idea to run Quantified Flu came directly out of our community calls, and given the huge interest in them we have ramped up the frequency and are now doing them on a weekly basis: They take place every Tuesday at 10am Pacific / 1pm Eastern / 6 pm GMT / 7pm CEST and you are invited to join us. We always welcome seeing new faces!

The calls are done via Zoom and we have a growing Google Document that contains all the details on how to join the calls, an agenda for the upcoming call as well as notes of the past ones. If you want to get reminders for the meetings in your own calendar, we do have a community calendar that you can subscribe to.

If you want to catch up on the prior community calls, we are recording them and make them available on the Open Humans YouTube channel.

Meet the newest projects on Open Humans

September 4, 2019Open HumansBastian

In the last few weeks our community has launched a plethora of new projects that you can join to collect more data about yourself as well as new research opportunities covering topics from the genetics of personality over blood pressure tracking to cluster headaches. Find out more about those projects below:

QCycle: Tracking ovulatory cycles

QCycle is a participatory research study that follows the spirit of the Quantified Self. As a collaboration between Azure Grant at the University of California, Berkeley and all participants, the study is interested in mapping the diversity of biological rhythms such as the ovulatory cycle through different wearable devices such as the Oura Ring. In the long-term one of the aims could be creating open-source ovulation predictions.

As the study is a collaborative project, you can also bring along your own research questions as a participant. Visit the study website to learn more on the aims and how it works.

Genetics of Personality Type

If you have your own genetic data from 23andMe, AncestryDNA or FamiliyTreeDNA you might be interested in the Genetics of Personality Type study of Dr. Denise Cook of the Ronin Institute. In her study, Denise wants to find out whether it is possible to find genetic variants that are correlated with the personality type as defined by different questionnaires that make use of the Myers-Briggs Type Indicator.

After joining the study you will be asked to fill out 3 surveys, the results of which will be deposited in your Open Humans account as well. You can learn more about the study and join it on the study website.

Nobism: Tracking cluster headaches

The cluster headache patient community around Nobism has been particularly active in the last few weeks. Under the lead of Rogier Koning , the community was awarded one of the Open Humans project grants. The grant allowed them to integrate a data synchronization into their mobile application for tracking symptoms and interventions. Check out their app for iOS and Android.

Thanks to a collaboration with the Ubiqum Code Academy you can already use the data collected by the mobile apps to get personalized reports and data visualizations. In the Nobism Ubiqum Cluster Headache Project a team of data science students will create evolving, monthly reports based on the data you collect.

Last but not least you can also decide to share those individualized reports with the larger Nobism community through the nobism reports for Advocating project. By sharing those reports you allow the larger community to use those reports in outreach materials like presentations.

snps: a new open source project with Open Humans roots

June 25, 2019Open HumansBastian

snps is a new open source Python package that aims to help users interact with genetic data from a variety of sources, including direct-to-consumer (DTC) DNA testing companies and whole genome sequencing (WGS) services. Specifically, snps provides tools to help with reading, writing, merging, and remapping SNPs.

The initial snps capability was developed by Andrew Riha as part of lineage, which joined the Open Humans ecosystem as a project in early 2019. Soon thereafter, numerous members from the Open Humans community, including Bastian Greshake Tzovaras, Mad Price Ball, James Turner, Ben Carr, and Beau Gunderson, requested support for VCF files. So, in May 2019, Kevin Arvai (with his VCF experience from Imputer) and Andrew teamed up to add the VCF capability, and wanting to share the work with others, snps began as an open source project to further enable citizen science.

Some of snps capabilities are detailed below, and a Notebook demonstrating usage of `snps` is available on Open Humans has been developed by Kevin & Bastian.

Reading

snps supports reading VCF (variant call format) files, in addition to files from 23andMe, Ancestry, Family Tree DNA, and MyHeritage.

Moreover, snps attempts to detect the assembly, or build, of the data. Commonly, Builds 36, 37, and 38 are used today, and these represent the “version” of the reference genome.

Writing

snps supports writing SNPs to CSV and VCF files for Builds 36, 37, and 38. This also means that snps can be used to essentially convert files from DTC DNA tests to VCF format.

Merging

snps supports merging datasets, e.g., if test results are available from more than one source. When SNPs are merged, any discrepancies are identified.

Remapping

snps supports remapping SNPs from one assembly to another. SNPs can be remapped between Builds 36, 37, and 38.

This guest post was written by Andrew Riha and Kevin Arvai. Andrew & Kevin have both launched projects that make use of genetic data on Open Humans before.

Meet our latest tools to use your genetic data: Imputer & Lineage

May 6, 2019Open HumansBastian

Two of our project grant awardees – Kevin Arvai and Andrew Riha – have been working tirelessly to build two new web tools that can make use of your genetic data that’s stored in Open Humans in interesting ways. And their hard work has paid off: Kevin’s Imputer and Andrew’s Lineage are now available!

Imputer is designed to fill the gaps in your genetic testing data. Direct-To-Consumer companies like 23andMe usually genotype just a small fraction of your genome, focusing on generating a low-resolution snapshot across your whole genome. Genotype imputation fills in those gaps by looking at reference populations of many individuals who have been fully sequenced in a high resolution, using this data to predict how to fill the gaps in your own data set. Imputer is using the reference data from the 1000 Genomes Project to perform this gap-filling and deposits the filled-up data in your Open Humans account. Kevin also provides two Personal Data Notebooks that you can use to explore your newly imputed data set. If you want to explore the quality of the newly identified variants, you can use this quality control notebook. And if you’re interested to see where your genome falls within a two-dimensional graph of different populations from around the globe, this notebook allows you to explore how closely you relate to other people in the 1000 Genomes data.

Andrew’s Lineage brings some further tools and genetic genealogy methods to Open Humans. If you have been tested by more than one Direct-To-Consumer genetic testing company, Lineage allows you to merge those different datasets into one large file, while also highlighting the variants that came out as different between those tests. You can also lift your files to a newer version of the human reference genome, which might be needed for using your data with other tools. Furthermore, Lineage brings a lot of interesting genetic genealogy tools: It allows you to compute how much shared DNA can be found between your own data and the genetic data of other individuals, using a genetic map. You can then create plots of the shared DNA between those two data sets, determine which genes are shared between them and even find discordant SNPs between the data sets.

Enjoy exploring your DNA!

Why I’ve Joined the Board of Open Humans

April 16, 2019Open HumansBastian

This is a post by Gary Wolf, one of the new Directors of the Open Humans Foundation. The post was originally published on the Quantified Self blog.

I’ve recently joined the board of directors of Open Humans, joining the current board along with two other new directors, Marja Pirttivaara and Alexander (Sasha) Wait Zaranek. I’m honored to be in their company, and I want to take advantage of joining the board to explain how, in my view, Quantified Self and Open Humans fit together. Both communities include many people working in science and technology who take an interest in biometric data. But this isn’t enough to define a common purpose, and in fact a much deeper connection between Open Humans and Quantified Self has developed over the last few years, as each community has approached, from nearly opposite directions, a common problem: How can we make meaningful discoveries with our own personal data?

Sample projects from Open Humans, an open infrastructure for storing and sharing personal data with chosen collaborators.

Open Humans has its roots in the Personal Genome Project, whose purpose was to supply scientists with human genomic data so that they could make discoveries more quickly. The geneticist George Church created a project to sequence the genome of individual volunteers who agreed to donate their genomic data non-anonymously, creating a common data resource. Since many important genomic questions cannot be answered with genome data alone, volunteers also shared other information about themselves. The Personal Genome Project inevitably became a somewhat more general personal data resource for science; however, with its focus on genomic data, much relevant data, including the kind of data that could be collected in daily life, remained out of scope.

When I first met Jason Bobe, who co-founded Open Humans with Mad Price Ball, he was keenly interested in this question of how to connect personal genomes with other personal data sets. Jason had worked with George Church on the Personal Genome Project. He and Mad saw Open Humans as an analogous effort, but one that would allow volunteers to contribute any kind of data. The Personal Genome Project was now a decade old. Perhaps, with deep personal data sets to work with, scientists could deliver on the promise of genomics to revolutionize medicine, a promise that had been long frustrated by the complexity connecting genomic data with real world outcomes.

I understood the goal. A few years earlier, I’d written a long Wired story about the taxonomic collaboration between Daniel Janzen and Paul Hebert. Janzen, along with his other accomplishments, was among the world’s most knowledgeable field biologists. Hebert had developed a genomic assay that promised to identify animals using an extremely small region (about 650 base pairs) of their mitochondrial DNA. Hebert was confident in his technique, but needed to prove its utility. How could the genomic data he was collecting be paired to real world ecological knowledge? At their field station in the Guanacaste Preserve in Costa Rica, Janzen and his partner Winnie Hallwachs, along with their students and colleagues, collected hundreds of butterflies and moths, identified them, snipped off a leg, and shipped it to Guelph, a city in Canada, where Hebert ran the sequence. Slowly, painstakingly, they connected the genomic data to the real world data. More than just proving that Hebert’s technique worked, they also brought a new degree of resolution to the ecological picture; showing, for instance, that individual specimens, though visually almost identical as adults, may belong to distinct evolutionary clades and feed on different plants. In my first conversations with Jason, I saw this as how Open Humans should work. It promised to provide the “field biology” for the genomic studies of the Personal Genome Project.

Handwritten species list from the Patilla field station in the Guanacaste National Park, Costa Rica.

Unfortunately, as attentive readers, link followers, and experts in the history of overconfidence in science may already have realized, there’s a pretty serious flaw in my analogy. Paul Hebert was using the genome to distinguish strands in evolutionary history, mostly at the level of species. He wanted to know, given a leg, what kind of creature it was from. Answering relevant health questions requires understanding the world at a far more detailed level, down to extremely small differences among individuals of the same species. The trick that Hebert used is never going to work; and, for many of the health related questions we care about, nobody knows the tricks that will work. Fifteen years after the launch of the Personal Genome Project, it continues to supply data resources to basic science, but its relevance to medicine remains mostly a promise.

In the Quantified Self community the focus has always been on individual discovery: How can we learn about ourselves using our own data? Many of the questions addressed by people doing their own QS projects relate to health and disease. Browse the archive of Quantified Self Show&Tell presentations and you’ll find projects on Parkinson’s disease, diabetes, cognitive decline, cardiovascular health, depression, hearing loss, and many other health related issues. The kind of “everyday science” practiced in the Quantified Self community can be understood as being the opposite of the genome-wide association studies. Instead of finding small, telling differences among groups of people, the everyday science of the Quantified Self finds large effects within a single person who is both subject and scientist.

This comes with its own kinds of difficulties. People doing Quantified Self projects related to health face a number of discouraging barriers, including lack of access to their own data and medical records, bureaucratic roadblocks and exorbitant costs in ordering their own lab tests, problems in acquiring the requisite domain knowledge to test their ideas and interpret their data, and – perhaps most discouraging to people who are dependent on medical professionals for some aspect of their care – lack of recognition in the health care system that self-collected data can be useful for making decisions about treatment.

In the 11 years since Quantified Self started, participants have tried many different ways to overcome these barriers, both individually for their own projects and systematically through creating tools and advocating for better policies. One of the lessons from this work is that while the focus of self-tracking projects is typically on individual learning, the methods required to make sense of our data often require collaboration. Existing systems are not designed to provide support for the kind of highly individualized reasoning we do; therefore, we have to build a new system. Key requirements of this new system include: private, secure data storage; capacity to integrate data from commercial wearable devices; fine-grained permissions allowing sharing of particular data with particular projects, and withdrawal of permission; capacity for ethical review both to protect individual participants and to enable academic collaborations.

Two years ago, we organized our first participant-led research project in the Quantified Self community. A group of about two dozen of us measured our blood cholesterol as often as once per hour, exploring both individual questions about the patterns and causes of variation in our blood lipids and a common group question about lipid variability. We had a pressing need for some collective study infrastructure, but there was no available tool that worked for our needs. We took a DIY approach and at the end of the project we’d learned a tremendous amount both about our own varying cholesterol and about the process of self-directed research. (Our paper, “Approaches to governance of participant-led research,” has recently been published in BMJ Open; our paper on our collective discovery about lipid variability has been accepted for publication in the Journal of Circadian Biology; we’ll add a URL when we have it.)

Slide detail from one of Azure Grant’s QS Show&Tell Talks

At the conclusion of our study, one of the participant organizers Azure Grant, decided to press ahead with another participant-led study on ovulatory cycling. Azure had already presented a self-study on using continuous body temperature to predict ovulation at a Quantified Self conference. Now, she wanted to organize a group of self-trackers to try something similar, but integrating newer measurement tools to acquire higher resolution data. Among these tools was the new version of the Oura ring, which offered body temperature, heart rate, and sleep data. This idea put new demands on our study infrastructure. Thanks to generous collaboration from Oura engineers, we could offer participants access to detailed data from their rings. But how could this data be stored privately and controlled by each individual, while also being available using fine-grained permissions to their fellow participants and study organizers? How could this data be integrated with other data types they might decide to collect during the project? Where was there infrastructure for a “field biology” of the self?

We turned to Open Humans. The personal reasons were as important as the technical ones. Mad Ball, along with her work leading Open Humans, is a long time participant in the Quantified Self community, who has consistently advocated for non-exploitive approaches to handling personal data, and has contributed the results of her own self-directed research. (See Mad’s recent talk on “A Self-Study Of My Child’s Genetic Risk.”) And Bastian Greshake Tzovaras, the Open Humans research director, quickly proved to be an extremely sensitive and skilled collaborator. Bastian co-founded openSNP, a grassroots effort that outgrew Personal Genome Project by supporting citizen science participation. (Currently, there are more genotyping datasets publicly shared in openSNP than all other projects in the world combined.)

With help from Mad and Bastian and the Open Humans infrastructure, we built our next stage study workflows with encouraging speed and harmony. Fundamentally, we found ourselves aligned on the core idea that research processes designed around personal data sets should be built to protect individual agency, even where this requirement creates friction for academic collaborators. The rarity of this commitment may only be obvious to those few people who have gotten painfully deep into the workflows of study infrastructure. (And I recognize that a post of this length that is this deep in the weeds can have very few readers!) But, in a way, that’s one of the beautiful things about this stage of building a new knowledge infrastructure. We’re far into it enough to have evidence that we’re on the right track. But we’re still close enough to the beginning that each step is a significant contribution and a potential model to build on.

I very much hope that over time – and the sooner the better – our shared ideas about individual agency and everyday reasoning are embodied in tools and policies that are so commonplace that no single organization is responsible for them. But for now, it’s impossible not to recognize that Open Humans is an indispensable resource, defining an approach that needs to be developed and expanded, and managed by a team that has deep insight into the challenges and potential of participatory science. I look forward to building more connections between our two communities.

Meet the latest Open Humans projects

December 27, 2018Open HumansBastian

We got a great selection of new projects and personal data explorations for you as an end-of-year gift. Here is an overview of the data import projects recently launched on Open Humans:

Oura Ring: You can now explore your sleep habits, body temperature and physical activity data as collected by the Oura Ring.
Overland: If you are using an iPhone you can now use Overland to collect your own geo locations along with additional data such as your phone’s battery levels over the day.
Google Location History: As an alternative way to record and import your location data you can now import a full Google Location History data set.
Spotify: Start creating an archive of your listening history through the Spotify integration
RescueTime: Import your computer usage data and productivity records into your account

Read more details about those integrations below:

Connect your Oura Ring

**Explore how your body temperature changes on weekdays and weekends by** **connecting your Oura Ring to Open Humans** **and** **running a Personal Data Notebook**.

The Oura is a wearable device well hidden inside a ring. It measures heart rate, physical activity and body temperature to generate insights into your sleep and activity habits. With Oura Connect you can setup an ongoing import of those data into your Open Humans account. This allows you to explore those data more thanks to already available Personal Data Notebooks!

Map your own locations with Overland

**Explore how you move around. To recreate this with your personal data** **use Overland** **and** **run this Personal Data Notebook**.

Overland is a free and open-source iOS application that keep track of your location through your phone’s GPS along with some metadata like velocity and the WiFi you are connected to. With Overland Connect you can import these data into your Open Humans account. The data can be visualized through Personal Data Notebooks, used to display your current location through a Personal API or to Geo-Tag your photo collection!

Use Google Location History to explore your location data

**Explore where you have been around the world. To recreate this with your personal data,** **import your Google Location History** **and** **run this Personal Data Notebook**.

Thanks to our Outreachy interns we have another new geolocation data source: Google Location History. No matter if you are using an iPhone or an Android phone, you can use the Google or Google Maps app on your phone to record where you have been. Through Google Takeout you can now export this data and then load it into Open Humans and explore it through Personal Data Notebooks.

Explore your music listening behaviour with Spotify data

**Explore when and how you listen to music. To recreate this with your personal data use** **Spotify Connect** **and** **run this Personal Data Notebook**.

Another Outreachy intern project was to collect your Spotify Listening History through Open Humans. Using Spotify Connect will automatically import the songs you listen to along with lots of metadata (e.g. how popular was the song at the time you listened to it?). Once you have collected some data, you can explore these through another Personal Data Notebook!

Learn about your productivity with RescueTime

**Find out whether your computer usage is correlated with how much you walk. Recreate this by** **using RescueTime** **and** **Fitbit**. **Then** **run this Personal Data Notebook**.

RescueTime is a service that collects how you are using your computer through a data collection app on your computer. It keeps track of the apps you use and the websites you visit and classifies these as productive or unproductive time (Hello Facebook!). Thanks to a personal project by Bastian you can import this data into your Open Humans account and explore it through Personal Data Notebooks

With this the whole Open Humans team wishes you a happy personal data exploration, relaxed holidays and a wonderful start of 2019!

The first manuscript describing the Open Humans community

December 18, 2018Open HumansBastian

Open Humans now consists of over 6,000 members that collectively have uploaded over 16,000 data sets!

To share this great community effort as a resource, we wrote our first academic manuscript. In it, we describe the platform, community, and some diverse projects that we’ve all enabled. You can find a pre-print on BioRxiv.

True to the community spirit of Open Humans, we wrote the manuscript completely in public and with an open call for contributions through our Slack. Thanks to this we could gather diverse perspectives of how Open Humans can be utilized for both research as well as personal data exploration. Using these existing projects and studies running on Open Humans as examples, we explore how our community tackles complex issues such as informed consent, data portability, and individual-centric research paradigms. Read more about this in the manuscript.

All of this is only made possible by your contributions to Open Humans, so we want to take this opportunity to thank you for your participation!

Personal Data Notebooks: Explore and analyze your data right in your browser

May 14, 2018UncategorizedBastian

With Open Humans we are not only working to empower you to decide with whom to share your personal data – but also to explore your own data. With our latest project addition – the Personal Data Notebooks – we are taking a further step in that direction. Based on the increasingly popular Jupyter Notebooks they bring together data analysis code, documentation and data visualization. With the added twist that the Personal Data Notebooks also easily provide simple and private access to your personal data that is stored in Open Humans. Which not only makes it easy to write and use a data analysis – it also makes it easy to share your results without having to share your personal data with someone else. That way you can not only learn about yourself and your data, but also about how data analyses are performed.

If you want to write your own data analysis for the notebooks from scratch you can get started in Python, R or Julia. Or if you want to tweak or run existing data analysis you can use and adapt existing notebooks. In the simplest case you don’t even have to write/edit any code, as the input data are standardized according to their Open Humans data source. So for example you can easily run a Fitbit analysis notebook written by someone else right away on your own Fitbit data. To get you started we have a step-by-step guide on how to use the Personal Data Notebooks, along with a set of ready-to-use data analysis notebooks for Fitbit, Apple Health, Moves, 23andMe and Twitter archive data.

But this is just the start. We can’t wait to see what kind of analysis notebooks the community will come up with. To kick off the development of additional notebooks we are running a small competition. Submit your own personal data notebooks until May 27th and our judges will select the most interesting submissions to add them to our example notebooks. For this competition Steven Jonas, Azure Dominique and Gary Wolf of QuantifiedSelf.com have agreed to be our judges! If you need an inspiration for your notebooks you can take a look at already proposed notebook ideas and discuss your ideas on Slack.

Meet Andrew Riha, our next project grant awardee

March 23, 2018Open HumansBastian

Today we’re introducing Andrew Riha who recently was awarded one of our project grants for his tool lineage. With lineage Andrew will make the genetic data you store on Open Humans even more useful, by enabling Ancestry analyses!

Hey Andrew, please give our blog readers a quick introduction about who you are!

I’m a systems engineer at an aerospace company in Southern California. I studied at Iowa State University, the University of Newcastle, and Delft University of Technology, and I have a B.S. and M.S. in computer engineering. A few years ago, I became interested in direct-to-consumer DNA testing after a friend told me about his experience with 23andMe. This interest developed into a passion, and I’m currently pursuing a graduate certificate in bioinformatics. My hobbies include running, traveling, and backpacking.

When and how did you come to Open Humans?

Director of Research, Bastian, introduced me to the Open Humans platform in early 2018. I had mentioned to Bastian that I wanted to turn my hobby open source Python project lineage into a web app, so he suggested I consider applying for a project grant.

Have you been involved in any projects on Open Humans so far, either as a participant or even running your own?

This is my first project with Open Humans. I’m looking forward to learning from others and further developing and integrating lineage into the Open Humans ecosystem as a great open source web app!

Your project lineage was awarded one of the Open Humans project grants. Can you explain us what the project is about?

lineage is a framework for analyzing genotype files (e.g., raw data files from 23andMe, Ancestry, etc.), primarily for the purposes of genetic genealogy and ancestry analysis. It can identify DNA and genes shared between individuals, and it provides other useful capabilities such as merging raw data files from different testing companies, identifying discrepant and discordant SNPs, and remapping SNPs to different assemblies / builds.

How did you come up with the idea behind lineage?

After my friend told me about his experience with 23andMe, I started researching how to get tested and found the International Society of Genetic Genealogy’s wiki very helpful and informative. The wiki led me to an excellent paper by Whit Athey that discussed using genotype files to phase the chromosomes of a family group and “reverse engineer” the DNA of a missing parent in the process! So, for a CS50 final project, I challenged myself to implement Whit’s algorithm in Python, using scientific libraries and vectorized programming in order to efficiently handle and analyze the large datasets involved.

The initial algorithm implementation was successful, and lineage had begun. But, I soon realized the need for other capabilities, such as comparing / merging files from different testing companies and determining what DNA is shared between individuals so that it could be used to guide the phasing algorithm. So, lineage grew into the framework that exists today, and I eventually want to return to implementing Whit’s algorithm, applying the bioinformatics and visualization concepts that I’ve learned along the way.

Is there anything important that we didn’t cover so far that you’d like to add?

lineage wouldn’t have been possible without the knowledge and help graciously provided by so many people. It is in that spirit that I’d like to encourage others to create and contribute to open source projects – sharing your ideas and passions with the world can be a very rewarding endeavor!

Oh, and thanks Mom, Dad, grandmas, and grandpas for the genes. 🙂