Two of our project grant awardees – Kevin Arvai and Andrew Riha – have been working tirelessly to build two new web tools that make interesting use of the genetic data stored in your Open Humans account. And their hard work has paid off: Kevin’s Imputer and Andrew’s Lineage are now available!
Imputer is designed to fill the gaps in your genetic testing data. Direct-to-consumer companies like 23andMe usually genotype just a small fraction of your genome, generating a low-resolution snapshot across the whole genome. Genotype imputation fills in those gaps by drawing on reference populations of many individuals who have been fully sequenced at high resolution, and using this data to predict the missing values in your own data set. Imputer uses reference data from the 1000 Genomes Project to perform this gap-filling and deposits the imputed data in your Open Humans account. Kevin also provides two Personal Data Notebooks that you can use to explore your newly imputed data set. If you want to assess the quality of the newly identified variants, you can use this quality control notebook. And if you’re interested to see where your genome falls within a two-dimensional map of different populations from around the globe, this notebook lets you explore how closely you relate to other people in the 1000 Genomes data.
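To get an intuition for the idea – this is emphatically not Imputer’s actual pipeline, which uses statistical phasing against the 1000 Genomes panel – here is a toy sketch in Python: a missing genotype is filled in from whichever reference haplotype best matches the surrounding typed positions.

```python
# Toy illustration of the idea behind genotype imputation; all data is made up.
reference_panel = [
    "ACGTA",  # each string is one reference haplotype
    "ACGTT",
    "TGCAA",
]
my_haplotype = "AC?TA"  # '?' marks an untyped position


def impute(haplotype, panel):
    """Fill missing sites from the best-matching reference haplotype."""
    best = max(
        panel,
        key=lambda ref: sum(a == b for a, b in zip(haplotype, ref) if a != "?"),
    )
    return "".join(best[i] if a == "?" else a for i, a in enumerate(haplotype))


print(impute(my_haplotype, reference_panel))  # -> "ACGTA"
```

Real imputation works probabilistically over thousands of haplotypes, but the core intuition – borrow the missing values from people whose surrounding variants look like yours – is the same.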
Andrew’s Lineage brings further tools and genetic genealogy methods to Open Humans. If you have been tested by more than one direct-to-consumer genetic testing company, Lineage allows you to merge those different data sets into one large file, while also highlighting the variants that differ between the tests. You can also lift your files over to a newer version of the human reference genome, which might be needed to use your data with other tools. Furthermore, Lineage brings a number of interesting genetic genealogy tools: it allows you to compute how much shared DNA can be found between your own data and the genetic data of other individuals, using a genetic map. You can then plot the shared DNA between the two data sets, determine which genes fall in the shared segments, and even find discordant SNPs between the data sets.
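If you’d rather script these analyses yourself, the underlying lineage Python package can be driven directly. A minimal sketch based on my reading of the project’s documentation – the file paths are placeholders, and call signatures may differ between releases, so check the docs:

```python
# Sketch of using the lineage package directly; file paths are placeholders.
from lineage import Lineage

l = Lineage()

# Merging: passing several raw-data files creates one combined individual.
me = l.create_individual("me", ["23andme.txt", "ancestrydna.txt"])
relative = l.create_individual("relative", "relative_23andme.txt")

# Estimate shared DNA segments between the two individuals (genetic-map based).
results = l.find_shared_dna(me, relative)

# Find SNPs where the two data sets disagree.
discordant = l.find_discordant_snps(me, relative, save_output=True)
```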
I’ve recently joined the board of directors of Open Humans, joining the current board along with two other new directors, Marja Pirttivaara and Alexander (Sasha) Wait Zaranek. I’m honored to be in their company, and I want to take advantage of joining the board to explain how, in my view, Quantified Self and Open Humans fit together. Both communities include many people working in science and technology who take an interest in biometric data. But this isn’t enough to define a common purpose, and in fact a much deeper connection between Open Humans and Quantified Self has developed over the last few years, as each community has approached, from nearly opposite directions, a common problem: How can we make meaningful discoveries with our own personal data?
Open Humans has its roots in the Personal Genome Project,
whose purpose was to supply scientists with human genomic data so that
they could make discoveries more quickly. The geneticist George Church
created a project to sequence the genome of individual volunteers who
agreed to donate their genomic data non-anonymously, creating a common
data resource. Since many important genomic questions cannot be answered
with genome data alone, volunteers also shared other information about
themselves. The Personal Genome Project inevitably became a somewhat
more general personal data resource for science; however, with its focus
on genomic data, much relevant data, including the kind of data that
could be collected in daily life, remained out of scope.
When I first met Jason Bobe, who co-founded Open Humans with Mad Price Ball,
he was keenly interested in this question of how to connect personal
genomes with other personal data sets. Jason had worked with George
Church on the Personal Genome Project. He and Mad saw Open Humans as an
analogous effort, but one that would allow volunteers to contribute any
kind of data. The Personal Genome Project was now a decade old. Perhaps,
with deep personal data sets to work with, scientists could deliver on
the promise of genomics to revolutionize medicine, a promise that had
been long frustrated by the complexity of connecting genomic data with real-world health outcomes.
I understood the goal. A few years earlier, I’d written a long Wired story about the taxonomic collaboration between Daniel Janzen and Paul Hebert.
Janzen, along with his other accomplishments, was among the world’s
most knowledgeable field biologists. Hebert had developed a genomic
assay that promised to identify animals using an extremely small region
(about 650 base pairs) of their mitochondrial DNA. Hebert was confident
in his technique, but needed to prove its utility. How could the genomic
data he was collecting be paired to real world ecological knowledge? At
their field station in the Guanacaste Preserve in Costa Rica, Janzen
and his partner Winnie Hallwachs, along with their students and
colleagues, collected hundreds of butterflies and moths, identified
them, snipped off a leg, and shipped it to Guelph, a city in Canada,
where Hebert ran the sequence. Slowly, painstakingly, they connected the
genomic data to the real world data. More than just proving that
Hebert’s technique worked, they also brought a new degree of resolution
to the ecological picture; showing, for instance, that individual
specimens, though visually almost identical as adults, may belong to distinct evolutionary clades and feed on different plants.
In my first conversations with Jason, I saw this as how Open Humans
should work. It promised to provide the “field biology” for the genomic
studies of the Personal Genome Project.
Unfortunately, as attentive readers, link followers, and experts in
the history of overconfidence in science may already have realized,
there’s a pretty serious flaw in my analogy. Paul Hebert was using the
genome to distinguish strands in evolutionary history, mostly at the
level of species. He wanted to know, given a leg, what kind of creature
it was from. Answering relevant health questions requires understanding
the world at a far more detailed level, down to extremely small
differences among individuals of the same species. The trick that Hebert
used is never going to work here; and, for many of the health-related
questions we care about, nobody knows the tricks that will work. Fifteen
years after the launch of the Personal Genome Project, it continues to
supply data resources to basic science, but its relevance to medicine
remains mostly a promise.
In the Quantified Self community the focus has always been on
individual discovery: How can we learn about ourselves using our own
data? Many of the questions addressed by people doing their own QS
projects relate to health and disease. Browse the archive of Quantified Self Show&Tell
presentations and you’ll find projects on Parkinson’s disease,
diabetes, cognitive decline, cardiovascular health, depression, hearing
loss, and many other health related issues. The kind of “everyday
science” practiced in the Quantified Self community can be understood as
being the opposite of genome-wide association studies. Instead of
finding small, telling differences among groups of people, the everyday
science of the Quantified Self finds large effects within a single
person who is both subject and scientist.
This comes with its own kinds of difficulties. People doing
Quantified Self projects related to health face a number of discouraging
barriers, including lack of access to their own data and medical
records, bureaucratic roadblocks and exorbitant costs in ordering their
own lab tests, problems in acquiring the requisite domain knowledge to
test their ideas and interpret their data, and – perhaps most
discouraging to people who are dependent on medical professionals for
some aspect of their care – lack of recognition in the health care
system that self-collected data can be useful for making decisions about their care.
In the 11 years since Quantified Self started, participants have
tried many different ways to overcome these barriers, both individually
for their own projects and systematically through creating tools and
advocating for better policies. One of the lessons from this work is
that while the focus of self-tracking projects is typically on
individual learning, the methods required to make sense of our data
often require collaboration. Existing systems are not designed to
provide support for the kind of highly individualized reasoning we do;
therefore, we have to build a new system. Key requirements of this new
system include: private, secure data storage; capacity to integrate data
from commercial wearable devices; fine-grained permissions allowing
sharing of particular data with particular projects, and withdrawal of
permission; capacity for ethical review both to protect individual
participants and to enable academic collaborations.
Two years ago, we organized our first participant-led research
project in the Quantified Self community. A group of about two dozen of
us measured our blood cholesterol as often as once per hour, exploring
both individual questions about the patterns and causes of variation in
our blood lipids and a common group question about lipid variability. We
had a pressing need for some collective study infrastructure, but there
was no available tool that worked for our needs. We took a DIY approach
and at the end of the project we’d learned a tremendous amount both
about our own varying cholesterol and about the process of self-directed
research. (Our paper, “Approaches to governance of participant-led research,”
has recently been published in BMJ Open; our paper on our collective
discovery about lipid variability has been accepted for publication in
the Journal of Circadian Biology; we’ll add a URL when we have it.)
At the conclusion of our study, one of the participant organizers,
Azure Grant, decided to press ahead with another participant-led study
on ovulatory cycling. Azure had already presented a self-study on using continuous body temperature to predict ovulation
at a Quantified Self conference. Now, she wanted to organize a group of
self-trackers to try something similar, but integrating newer
measurement tools to acquire higher resolution data. Among these tools
was the new version of the Oura ring,
which offered body temperature, heart rate, and sleep data. This idea
put new demands on our study infrastructure. Thanks to generous
collaboration from Oura engineers, we could offer participants access to
detailed data from their rings. But how could this data be stored
privately and controlled by each individual, while also being available
using fine-grained permissions to their fellow participants and study
organizers? How could this data be integrated with other data types they
might decide to collect during the project? Where was there
infrastructure for a “field biology” of the self?
We turned to Open Humans. The personal reasons were as important as
the technical ones. Mad Ball, along with her work leading Open Humans,
is a long time participant in the Quantified Self community, who has
consistently advocated for non-exploitive approaches to handling
personal data, and has contributed the results of her own self-directed
research. (See Mad’s recent talk on “A Self-Study Of My Child’s Genetic Risk.”) And Bastian Greshake Tzovaras, the Open Humans research director, quickly proved to be an extremely sensitive and skilled collaborator. Bastian co-founded openSNP,
a grassroots effort that outgrew the Personal Genome Project by supporting
citizen science participation. (Currently, there are more genotyping
datasets publicly shared in openSNP than in all other projects in the world combined.)
With help from Mad and Bastian and the Open Humans infrastructure, we
built our next stage study workflows with encouraging speed and
harmony. Fundamentally, we found ourselves aligned on the core idea that
research processes designed around personal data sets should be built
to protect individual agency, even where this requirement creates
friction for academic collaborators. The rarity of this commitment may
only be obvious to those few people who have gotten painfully deep into
the workflows of study infrastructure. (And I recognize that a post of
this length that is this deep in the weeds can have very few readers!)
But, in a way, that’s one of the beautiful things about this stage of
building a new knowledge infrastructure. We’re far enough into it to
have evidence that we’re on the right track. But we’re still close
enough to the beginning that each step is a significant contribution and
a potential model to build on.
I very much hope that over time – and the sooner the better – our
shared ideas about individual agency and everyday reasoning are embodied
in tools and policies that are so commonplace that no single
organization is responsible for them. But for now, it’s impossible not
to recognize that Open Humans is an indispensable resource, defining an
approach that needs to be developed and expanded, and managed by a team
that has deep insight into the challenges and potential of participatory
science. I look forward to building more connections between our two communities.
I’m thrilled to announce the results of our 2019 elections for the Open Humans Foundation Board of Directors!
Our community seat winner is Marja Pirttivaara, and our board-elected seats are Gary Wolf and Sasha Wait Zaranek.
Marja Pirttivaara: When I first met Marja at the MyData conference in 2018, it was wonderful to find a like-minded soul, with her interests in genetics and in empowering individuals with their personal data. Marja generously agreed to be our EU representative for GDPR, and it’s been exciting to see our project become more global.
Gary Wolf: As co-founder and director of Quantified Self Labs, Gary has supported numerous citizen scientists in their quest to use their personal data to understand themselves, and to collectively create new knowledge. His work is strongly aligned with that of Open Humans, and we are very much looking forward to his contributions and leadership.
Sasha Wait Zaranek: Sasha is one of the founders of the Harvard Personal Genome Project and continues to lead in this area. Their focus is on genome data: they want to see that data managed by the people it came from, made more understandable, and made more re-usable for new projects — and they want to help Open Humans make those things happen.
Marja, Gary, and Sasha join our ongoing board members: Mad Price Ball, Karien Bezuidenhout, Steven Keating, Dana Lewis, and James Turner.
We must bid farewell to Misha Angrist and Michelle Meyer — their terms have ended. Both have been involved with the organization for many years, and we hope this is not the last we see of them! We must also bid farewell to Chris Gorgolewski, who has resigned; his 2018 seat is being left vacant for now. Also, we’ve made the voting results from the 2019 Community Seat election available here: http://openhumansfoundation.org/2019-ohf-election-votes.csv
We’re honored by the contribution of every board member, and their collective stewardship of our project. And we’re honored by all candidates for these positions. Not everyone can win — indeed, it would be a poor election if we didn’t have people to choose between. We very much hope the other candidates remain involved — there are so many things to do together!
We’ve got a great selection of new projects and personal data explorations for you as an end-of-year gift. Here is an overview of the data import projects recently launched on Open Humans:
Oura Ring: You can now explore your sleep habits, body temperature and physical activity data as collected by the Oura Ring.
Overland: If you are using an iPhone, you can now use Overland to collect your own geolocation data along with additional data such as your phone’s battery level over the day.
Google Location History: As an alternative way to record and import your location data you can now import a full Google Location History data set.
Spotify: Start creating an archive of your listening history through the Spotify integration.
RescueTime: Import your computer usage data and productivity records into your account.
Read more details about those integrations below:
Connect your Oura Ring
The Oura is a wearable device well hidden inside a ring. It measures heart rate, physical activity, and body temperature to generate insights into your sleep and activity habits. With Oura Connect you can set up an ongoing import of this data into your Open Humans account, and then explore it further thanks to already available Personal Data Notebooks!
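As a sketch of what that looks like inside a notebook: Personal Data Notebooks expose your access token as the OH_ACCESS_TOKEN environment variable, and, to the best of my understanding of the Open Humans API, the exchange-member endpoint lists your connected files, Oura data included.

```python
# Sketch: list your Open Humans data files from inside a Personal Data Notebook.
import os

import requests

token = os.environ["OH_ACCESS_TOKEN"]  # provided by the notebook environment
member = requests.get(
    "https://www.openhumans.org/api/direct-sharing/project/exchange-member/",
    params={"access_token": token},
).json()

# Each entry describes one file: which integration it came from and where to get it.
for f in member["data"]:
    print(f["source"], f["basename"], f["download_url"])
```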
Use Google Location History to explore your location data
Thanks to our Outreachy interns we have another new geolocation data source: Google Location History. Whether you use an iPhone or an Android phone, the Google or Google Maps app on your phone can record where you have been. Through Google Takeout you can export this data, load it into Open Humans, and explore it through Personal Data Notebooks.
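Under the hood, the Takeout export is just JSON. Here is a minimal sketch of reading it directly, based on the export format at the time of writing (Google has changed the format between versions, so treat the field names as an assumption):

```python
# Sketch: peek at a Google Takeout "Location History.json" export.
# Coordinates are stored as integers scaled by 1e7.
import json

with open("Location History.json") as fh:
    locations = json.load(fh)["locations"]

for point in locations[:5]:
    print(
        point["latitudeE7"] / 1e7,   # latitude in degrees
        point["longitudeE7"] / 1e7,  # longitude in degrees
        point.get("timestampMs"),    # milliseconds since the epoch
    )
```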
Explore your music listening behaviour with Spotify data
Another Outreachy intern project collects your Spotify listening history through Open Humans. Using Spotify Connect will automatically import the songs you listen to, along with lots of metadata (e.g. how popular a song was at the time you listened to it). Once you have collected some data, you can explore it through another Personal Data Notebook!
Open Humans now consists of over 6,000 members who have collectively uploaded over 16,000 data sets!
To share this great community effort as a resource, we wrote our first academic manuscript. In it, we describe the platform, community, and some diverse projects that we’ve all enabled. You can find a pre-print on BioRxiv.
True to the community spirit of Open Humans, we wrote the manuscript completely in public and with an open call for contributions through our Slack. Thanks to this we could gather diverse perspectives on how Open Humans can be utilized for both research and personal data exploration. Using existing projects and studies running on Open Humans as examples, we explore how our community tackles complex issues such as informed consent, data portability, and individual-centric research paradigms. Read more about this in the manuscript.
All of this is only made possible by your contributions to Open Humans, so we want to take this opportunity to thank you for your participation!
While developing these features, it has been an absolutely fulfilling experience to refine the work as Dana Lewis tested it as an end user and provided some great feedback 🙂
So far, some of the highlights of my work are:
The dashboard shows a lot of information and allows a variety of actions – this made it crucial to design the application with a focus on user experience. I worked with various aspects of a front-end framework (Bootstrap, in this case) and discovered how easy it makes basic styling of HTML elements such as forms, tables, buttons and icons, as well as the most useful piece of all, the Bootstrap modal. Bootstrap provided a consistent theme for the dashboard and came with good documentation. There is also scope to leverage its grid system later to develop a mobile-first application.
Running a codebase in different environments is always a great learning experience. Working on the file-download feature, I learned that a given request on the web worker may only last 30 seconds on Heroku (our production environment) and will be killed after that period. Since creating a zip file can easily take longer, we did some brainstorming and decided to keep the slow network calls out of the request-response cycle by creating a Celery worker task to do the downloading job. Working with Celery worker tasks and a Redis broker was a new and enriching experience for me.
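For illustration, here is a minimal sketch of that pattern – the names (build_zip, the URLs) are hypothetical, not the project’s actual code. The view enqueues a task and returns immediately, while a Celery worker backed by Redis does the slow work.

```python
# Sketch: move slow zip-building out of the request-response cycle with Celery.
import io
import zipfile

import requests
from celery import Celery

app = Celery("dashboard", broker="redis://localhost:6379/0")


@app.task
def build_zip(file_urls):
    """Download files and assemble a zip archive in the background."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for url in file_urls:
            response = requests.get(url, timeout=60)
            response.raise_for_status()
            # Use the last path segment as the archive member name.
            archive.writestr(url.rsplit("/", 1)[-1], response.content)
    # ...upload the buffer to S3 and notify the member...


# In the Django view: build_zip.delay(urls) returns instantly,
# well within Heroku's 30-second request window.
```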
To support both the version run by Open Humans and versions that developers run themselves, we weighed two options for storing the downloaded files – AWS S3 and transfer.sh – and settled on the former, given the 10GB limit of the latter.
Since the downloading of files happens as a background job, the user could be notified of its completion either through a dashboard notification or via email. We decided to go with emailing the user (easy peasy) by setting up a configurable SMTP server in Django.
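In Django that amounts to a handful of settings plus a send_mail call. A sketch, with illustrative environment variable names:

```python
# settings.py – configurable SMTP backend (env variable names are illustrative)
import os

EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = os.environ.get("EMAIL_HOST", "localhost")
EMAIL_PORT = int(os.environ.get("EMAIL_PORT", "587"))
EMAIL_HOST_USER = os.environ.get("EMAIL_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("EMAIL_PASSWORD", "")
EMAIL_USE_TLS = True

# elsewhere, once the background download finishes:
# from django.core.mail import send_mail
# send_mail("Your download is ready", "Grab it from your dashboard.",
#           "noreply@example.org", [user.email])
```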
The work done so far has been rewarding in terms of experience with Django, Bootstrap and various other modules. More development calls for more feedback and hence more iterations, so I’ll be working on modifications to the UI and features that incorporate the feedback.
We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Manaswini’s post about working with the GitHub API for a new Open Humans project:
I have been working with the GitHub API all this while. I had come across some really cool visualizations built on the GitHub API but hadn’t had the chance to work with it myself. Thanks to the ‘Adding data sources’ project, I was motivated to add GitHub as a data source – and guess what, I discovered that it works in principle! GitHub enables developers to discover and contribute to open source projects, helping them realize their aspirations.
GitHub provides REST API access to a variety of data about your projects, be it repositories, issues, pull requests, and lots more! I chose the GitHub API since the documentation is meticulous and easy to comprehend for anyone getting started. It also provides an excellent source for creating data explorations.
The outputs are available in JSON and other compatible formats, depending on the data to be extracted. One can view public data without authentication, but to access private data, authentication is a must. Both basic and two-factor authentication are supported. Two-factor authentication, as the name suggests, is more secure, as it doesn’t involve sending your password with every request the way basic authentication does.
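As a taste of how approachable it is, here is a minimal sketch that fetches public repository data with the requests library – no authentication needed for public data, though unauthenticated calls get a much lower rate limit:

```python
# Sketch: list public repositories of the OpenHumans GitHub organization.
import requests

response = requests.get(
    "https://api.github.com/users/OpenHumans/repos",
    headers={"Accept": "application/vnd.github+json"},
)
response.raise_for_status()

# Each repository is a JSON object with fields like name and stargazers_count.
for repo in response.json():
    print(repo["name"], repo["stargazers_count"])
```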
This internship period has been truly rewarding. I came across some really cool things that I was unaware of until now, such as rate limiting and the requests-respectful library. Diving deeper, I also came across the types of rate limiting, i.e. user rate limiting, geographic rate limiting, and server rate limiting. All thanks to my mentor, Mike Escalante, for giving me invaluable insight into the above terms and supervising me at each and every step.
The GitHub API has a rate limit of 5,000 requests per hour for authenticated requests. So far, I have been editing the demo template and resolving issues simultaneously. I was experimenting with the output JSON and investigating various ways to structure it more readably.
We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Tarannum’s post about working on creating packages for the Open Humans API:
The Open Humans API provides command line tools to interact with Open Humans. From where I started working with the Open Humans API to where it is now, the transformation has been just magical.
With the help of mentors from the Open Humans organisation, a lot of features were added, like messaging and uploading large files with the help of AWS. From learning the proper way to name functions, to testing those functions, documenting the API using Sphinx, and adding more command line tools, every small piece of learning added up to an avalanche of wisdom.
For legacy code, tests are a necessity. The code is full of logic, it’s alive, and it’s core to the application. We need to keep going back to the code to update it, so tests are crucial.
I mocked the API tests using VCR.py. It’s just like Thor’s Stormbreaker (Marvel fans will understand) in the field of unit testing. It’s apt for applications that make HTTP calls to external services (in my case, Open Humans).
The external service should not be tightly coupled with the application: if the external service fails, the application should still run correctly. One absolute novice way of testing is to hard-code error-prone requests and write the necessary error handling routines by hand. But this is highly discouraged, because once you update even a few lines of code, you have to test everything manually again. So we hop on to the more sophisticated way of testing: unit testing.
Running tests that call the external service every time would be very slow. Apart from that, the tests won’t work offline, and sending too many requests to the external service can be a problem too. So here steps in the protagonist, VCR.py. 🙂
You just need to run the test online once, while the cassette file that stores all the important information about the request and response has not yet been created. Once that is done, the responses of subsequent requests are simply compared with the cassette file and yes, it’s done 🙂. You have succeeded in making your application sturdier.
How to use VCR.py
For each function serving a particular feature, a test class is created. For testing different lines of code of a particular feature, different tests (functions with assertions) are created in the test class. Tests are written for both valid and invalid responses.
The first time a test is run, the request really hits the external service and a cassette file is formed. Once the cassette is formed, containing status codes, responses, and so on, the responses of tests are simply compared against the cassette file when the tests are run again. It really speeds up the testing process and saves us from sending requests to the external service again and again. Once the cassettes are formed, you can test your code anytime, whether you are offline at home or 40,000 feet above the ground.
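A minimal sketch of what such a test looks like – the endpoint and cassette path here are illustrative, not the project’s actual test suite:

```python
# Sketch: record-and-replay an HTTP call with VCR.py.
import unittest

import requests
import vcr


class PublicDataTests(unittest.TestCase):
    # The first run hits the network and records fixtures/public_data.yaml;
    # later runs replay the cassette, so they work offline and stay fast.
    @vcr.use_cassette("fixtures/public_data.yaml")
    def test_public_data_endpoint(self):
        response = requests.get("https://www.openhumans.org/api/public-data/")
        self.assertEqual(response.status_code, 200)


if __name__ == "__main__":
    unittest.main()
```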
I would like to thank Mad Price Ball for mentoring me in the best way possible. The meetings with them are fun; I learn a lot of new stuff and new ways to see things, to tackle an issue with a fresh perspective.
Apart from this, I also worked on testing the public functions of the Open Humans API. Presently, I am working on creating a reusable Django app for Open Humans. A lot of brainstorming goes on while you are designing an application: its use cases, and what kind of audience it will cater to.
Reusable applications are applications that can be used by other applications. After all, reusability is a way of life in Python. I will cover the reusable app topic in my next blog. Till then, stay tuned. 🙂
Soon Moves will be dead – and there is no obvious replacement. But there could be.
OpenPaths is a similar tool, developed seven years ago by a team at the New York Times Research & Development Group. It had an ethos that matched our own: it empowered its users, giving them access to their data and the ability to share it with projects. The NYT team handed OpenPaths to an academic group at UCSD. And late last year, UCSD gave it to us.
OpenPaths was like Open Humans before we existed – and by some amazing act of fate, we have inherited it. It could be something better than Moves: a nonprofit, open source tool that strives to empower the community that maintains it.
But OpenPaths is broken, and we need help to fix it!
We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Manaswini Das’s post about their way to Outreachy and their first two weeks as an Outreachy intern:
Open source… I was oblivious to this term until a year ago, when I was not yet familiar with this new world of outstanding work done by millions across the globe. It’s been a year now and the journey has been more than rewarding.
My journey started with contributions to repositories as part of Hacktoberfest 2017. I got a limited edition Hacktoberfest T-shirt too, as promised. The thought of contributing to something that would be used by the world captivated me and inspired me to dive deep into this. I started looking for other ways to find repositories that kindled my interest.
Going through several blog posts on the internet, I came across Outreachy, an open source internship program for people from marginalized groups. I had applied for the Winter 2017 term (Round 15). But it was already nearing the deadline when I started contributing, so I knew I stood a slim chance of getting accepted.
This time, I didn’t make the same mistake. Once Round 16 was announced, I started exploring organizations and projects. I concentrated on the ‘Adding data sources to Open Humans’ project under the Open Humans Foundation and began my contributions right away! Two months later, I found my name among the accepted interns. I am overwhelmed and look forward to making the most of this internship period.
For those who haven’t come across this open source internship program, let me enlighten you.
What is Outreachy all about?
Well, Outreachy is an open source internship program for individuals belonging to groups traditionally underrepresented in technology. It is similar to Google Summer of Code, except that it is not limited to students and it happens twice a year, with May-to-August and December-to-March cohorts.
For those who want to probe deeper into this, find the details here.
Now, some tips for the ones preparing for the upcoming rounds:
To make it into the internship program, you should begin as soon as possible. It takes time to comprehend a code base, and it may seem intimidating at first, so select your projects wisely.
In case you are not comfortable with a project even after contributing, you still have time and liberty to switch to projects matching your interest. Keep in mind that you can apply to a maximum of two projects.
In case you have questions, don’t hesitate to ask your mentors. Don’t be afraid of being judged.
Asking questions doesn’t reveal your ignorance. It is a sign that you are learning.
Don’t be shy. Shed your cocoon and feel free to ask even the silliest of questions. But remember, do your research too. Try to work out the problem on your own first. If you are still stuck, then reach out to the community. You never know, it might be a bug!
Don’t doubt your abilities
If you think you don’t fit in, then, trust me you are the right person to apply for this internship program. You won’t be able to explore this new you unless you do it.
Don’t aim at a huge last-minute contribution. Make small but consistent contributions till the end of the application period. This creates a good impression.
Another golden tip: In case you are not into contributions for some time, be in touch with your mentors. Discuss your ideas about the project and know more about the organization.
At times, you may feel that you could achieve everything only due to luck and that you lack potential.
You may also feel that you won’t be able to make it even after you get accepted. Well, my friend, you are suffering from the imposter syndrome. This happens when you focus on the big picture of what you are trying to do in a project. To overcome this, follow the divide and conquer rule.
Have faith in yourself. Don’t let the imposter syndrome grip you.
This is the final dash to the race. Discuss your ideas with your mentors and come up with a suitable timeline. Work out your schedule and make sure your proposal is precise. Submit your proposal for review to your mentors. Trust me, your proposal will get better with each review. And yes, don’t wait till the last minute for this.
It’s been more than two weeks into this internship period now. I am working on adding GitHub and Twitter API integrations under the mentorship of Mike Escalante. For the first three weeks, I have been getting familiar with the codebase and the workflow to be followed for the integrations, taking some help from the existing integrations. Apart from that, I have been exploring the GitHub API and setting up the app on Heroku.
My mentor, Mike, has been very supportive and encouraging throughout, checking in almost every day and clearing up all my doubts in a jiffy.
I’m planning to get the GitHub integration up and running by this week, and then I will work on creating data explorations for this integration over the next two weeks.
I’ll be coming up with the technical details of this GitHub integration in upcoming posts.