snps: a new open source project with Open Humans roots

snps is a new open source Python package that aims to help users interact with genetic data from a variety of sources, including direct-to-consumer (DTC) DNA testing companies and whole genome sequencing (WGS) services. Specifically, snps provides tools to help with reading, writing, merging, and remapping SNPs.

The initial snps capability was developed by Andrew Riha as part of lineage, which joined the Open Humans ecosystem as a project in early 2019. Soon thereafter, numerous members of the Open Humans community, including Bastian Greshake Tzovaras, Mad Price Ball, James Turner, Ben Carr, and Beau Gunderson, requested support for VCF files. So, in May 2019, Kevin Arvai (with his VCF experience from Imputer) and Andrew teamed up to add the VCF capability. Wanting to share the work with others, they launched snps as an open source project to further enable citizen science.

Some of snps' capabilities are detailed below, and a Notebook demonstrating usage of snps, developed by Kevin & Bastian, is available on Open Humans.

Reading

snps supports reading VCF (variant call format) files, in addition to files from 23andMe, Ancestry, Family Tree DNA, and MyHeritage.

Moreover, snps attempts to detect the assembly, or build, of the data. Commonly, Builds 36, 37, and 38 are used today, and these represent the “version” of the reference genome.
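To make the idea concrete, here is a minimal sketch of reading and build detection. It uses only the Python standard library and does not use the snps API itself. It assumes a 23andMe-style layout (tab-separated rsid, chromosome, position, genotype, with comment lines starting with "#"). The marker rsids and coordinates in the lookup table are hypothetical placeholders, and the position-lookup approach to build detection is an assumption for illustration.

```python
import csv
import io

# Hypothetical lookup table: position of a marker SNP in each build.
# These rsids and coordinates are placeholders, not real values.
MARKER_POSITIONS = {
    "rs0000001": {36: 1000, 37: 1100, 38: 1250},
    "rs0000002": {36: 5000, 37: 5200, 38: 5300},
}


def read_snps(handle):
    """Parse tab-separated rows of rsid, chromosome, position, genotype.

    Blank lines and lines starting with '#' are skipped.
    """
    snps = {}
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    for rsid, chrom, pos, genotype in csv.reader(rows, delimiter="\t"):
        snps[rsid] = {"chrom": chrom, "pos": int(pos), "genotype": genotype}
    return snps


def detect_build(snps):
    """Return the first build whose marker position matches the data, else None."""
    for rsid, by_build in MARKER_POSITIONS.items():
        if rsid in snps:
            for build, pos in by_build.items():
                if snps[rsid]["pos"] == pos:
                    return build
    return None


raw = "# example file\nrs0000001\t1\t1100\tAG\nrs0000002\t1\t5200\tCC\n"
data = read_snps(io.StringIO(raw))
build = detect_build(data)
print(build)
```

Files with an uncommented header row, or with the two alleles in separate columns, would need extra handling that this sketch leaves out.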

Writing

snps supports writing SNPs to CSV and VCF files for Builds 36, 37, and 38. This also means that snps can be used to essentially convert files from DTC DNA tests to VCF format.
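As a rough illustration of what converting DTC genotypes to VCF involves, the sketch below writes minimal VCF records using only the standard library. It is not the snps implementation. The reference alleles, which in practice would come from the reference genome for the chosen build, are a hypothetical lookup table here, and the rsids are placeholders.

```python
import io

# Hypothetical reference alleles; real ones come from the reference
# genome sequence of the chosen build.
REFERENCE_ALLELES = {"rs0000001": "A", "rs0000002": "C"}


def write_vcf(snps, handle, sample="SAMPLE"):
    """Write SNPs as minimal, unphased VCF records with a GT field."""
    handle.write("##fileformat=VCFv4.2\n")
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
              "INFO", "FORMAT", sample]
    handle.write("\t".join(header) + "\n")
    for rsid, snp in snps.items():
        genotype = snp["genotype"]
        ref = REFERENCE_ALLELES.get(rsid)
        # Skip no-calls, indel codes, and SNPs without a known reference allele.
        if ref is None or not genotype or not set(genotype) <= set("ACGT"):
            continue
        # Collect non-reference alleles in order of first appearance.
        alts = []
        for allele in genotype:
            if allele != ref and allele not in alts:
                alts.append(allele)
        # Allele index 0 is REF; ALT alleles are numbered from 1.
        index = {ref: 0}
        index.update({alt: i + 1 for i, alt in enumerate(alts)})
        gt = "/".join(str(index[allele]) for allele in genotype)
        fields = [snp["chrom"], str(snp["pos"]), rsid, ref,
                  ",".join(alts) or ".", ".", ".", ".", "GT", gt]
        handle.write("\t".join(fields) + "\n")


example_snps = {
    "rs0000001": {"chrom": "1", "pos": 1100, "genotype": "AG"},
    "rs0000002": {"chrom": "1", "pos": 5200, "genotype": "CC"},
    "rs0000003": {"chrom": "1", "pos": 7000, "genotype": "--"},  # no-call
}
buffer = io.StringIO()
write_vcf(example_snps, buffer)
vcf_lines = buffer.getvalue().splitlines()
print("\n".join(vcf_lines))
```

The key step is that VCF describes each genotype relative to a reference allele, which is why the build matters when writing.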

Merging

snps supports merging datasets, e.g., if test results are available from more than one source. When SNPs are merged, any discrepancies are identified.
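The merging idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the snps implementation: datasets are dicts keyed by rsid, genotypes are compared without regard to allele order, and on conflict the first dataset's entry is kept. The rsids and values are hypothetical.

```python
def merge_snps(first, second):
    """Merge two rsid-keyed SNP dicts and report conflicting rsids.

    "AG" and "GA" count as the same call. On conflict the entry from
    `first` is kept, which is an arbitrary choice for this sketch.
    """
    merged = dict(first)
    discrepancies = []
    for rsid, snp in second.items():
        if rsid not in merged:
            merged[rsid] = snp
            continue
        known = merged[rsid]
        same_position = (known["chrom"], known["pos"]) == (snp["chrom"], snp["pos"])
        same_genotype = sorted(known["genotype"]) == sorted(snp["genotype"])
        if not (same_position and same_genotype):
            discrepancies.append(rsid)
    return merged, discrepancies


# Hypothetical results from two different testing companies.
test_a = {
    "rs0000001": {"chrom": "1", "pos": 1100, "genotype": "AG"},
    "rs0000002": {"chrom": "1", "pos": 5200, "genotype": "CC"},
}
test_b = {
    "rs0000001": {"chrom": "1", "pos": 1100, "genotype": "GA"},
    "rs0000002": {"chrom": "1", "pos": 5200, "genotype": "CT"},
    "rs0000003": {"chrom": "2", "pos": 300, "genotype": "TT"},
}
merged, discrepancies = merge_snps(test_a, test_b)
print(sorted(merged), discrepancies)
```

Both datasets need to be on the same build before positions can be compared meaningfully.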

Remapping

snps supports remapping SNPs from one assembly to another. SNPs can be remapped between Builds 36, 37, and 38.
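Conceptually, remapping translates each SNP position through a table of aligned segments between two assemblies. The sketch below shows that idea with a simple interval-offset lookup. It is a simplification and not necessarily how snps performs remapping: the segment table is hypothetical, covers a single chromosome, and ignores strand changes that real assembly mappings can involve.

```python
import bisect

# Hypothetical aligned segments on one chromosome between two builds:
# (source_start, source_end_inclusive, target_start). Placeholder numbers.
SEGMENTS = [
    (1, 1000, 101),
    (1001, 5000, 1201),
    (6000, 9000, 6150),
]
STARTS = [segment[0] for segment in SEGMENTS]


def remap_position(pos):
    """Map a source-build position to the target build, or None if unmapped."""
    # Find the last segment that starts at or before pos.
    i = bisect.bisect_right(STARTS, pos) - 1
    if i < 0:
        return None
    start, end, target_start = SEGMENTS[i]
    if pos > end:
        return None  # pos falls in a gap between segments
    return target_start + (pos - start)


print(remap_position(1000), remap_position(1001), remap_position(5500))
```

Positions that fall in a gap have no counterpart in the target build, so a remapping step has to decide whether to drop or flag those SNPs.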

This guest post was written by Andrew Riha and Kevin Arvai. Andrew & Kevin have both launched projects that make use of genetic data on Open Humans before.

Meet our latest tools to use your genetic data: Imputer & Lineage

Two of our project grant awardees – Kevin Arvai and Andrew Riha – have been working tirelessly to build two new web tools that can make use of your genetic data that’s stored in Open Humans in interesting ways. And their hard work has paid off: Kevin’s Imputer and Andrew’s Lineage are now available!

Imputer is designed to fill the gaps in your genetic testing data. Direct-To-Consumer companies like 23andMe usually genotype just a small fraction of your genome, focusing on generating a low-resolution snapshot across your whole genome. Genotype imputation fills in those gaps by looking at reference populations of many individuals who have been fully sequenced at high resolution and using this data to predict how to fill the gaps in your own data set. Imputer uses the reference data from the 1000 Genomes Project to perform this gap-filling and deposits the filled-in data in your Open Humans account. Kevin also provides two Personal Data Notebooks that you can use to explore your newly imputed data set. If you want to explore the quality of the newly identified variants, you can use this quality control notebook. And if you’re interested in seeing where your genome falls within a two-dimensional graph of different populations from around the globe, this notebook allows you to explore how closely you relate to other people in the 1000 Genomes data.

Andrew’s Lineage brings some further tools and genetic genealogy methods to Open Humans.  If you have been tested by more than one Direct-To-Consumer genetic testing company, Lineage allows you to merge those different datasets into one large file, while also highlighting the variants that came out as different between those tests. You can also lift your files to a newer version of the human reference genome, which might be needed for using your data with other tools. Furthermore, Lineage brings a lot of interesting genetic genealogy tools: It allows you to compute how much shared DNA can be found between your own data and the genetic data of other individuals, using a genetic map. You can then create plots of the shared DNA between those two data sets, determine which genes are shared between them and even find discordant SNPs between the data sets.

Enjoy exploring your DNA!

Why I’ve Joined the Board of Open Humans

This is a post by Gary Wolf, one of the new Directors of the Open Humans Foundation. The post was originally published on the Quantified Self blog.

I’ve recently joined the board of directors of Open Humans, joining the current board along with two other new directors, Marja Pirttivaara and Alexander (Sasha) Wait Zaranek. I’m honored to be in their company, and I want to take advantage of joining the board to explain how, in my view, Quantified Self and Open Humans fit together. Both communities include many people working in science and technology who take an interest in biometric data. But this isn’t enough to define a common purpose, and in fact a much deeper connection between Open Humans and Quantified Self has developed over the last few years, as each community has approached, from nearly opposite directions, a common problem:  How can we make meaningful discoveries with our own personal data?

Sample projects from Open Humans, an open infrastructure for storing and sharing personal data with chosen collaborators.

Open Humans has its roots in the Personal Genome Project, whose purpose was to supply scientists with human genomic data so that they could make discoveries more quickly. The geneticist George Church created a project to sequence the genome of individual volunteers who agreed to donate their genomic data non-anonymously, creating a common data resource. Since many important genomic questions cannot be answered with genome data alone, volunteers also shared other information about themselves. The Personal Genome Project inevitably became a somewhat more general personal data resource for science; however, with its focus on genomic data, much relevant data, including the kind of data that could be collected in daily life, remained out of scope.

When I first met Jason Bobe, who co-founded Open Humans with Mad Price Ball, he was keenly interested in this question of how to connect personal genomes with other personal data sets. Jason had worked with George Church on the Personal Genome Project. He and Mad saw Open Humans as an analogous effort, but one that would allow volunteers to contribute any kind of data. The Personal Genome Project was now a decade old. Perhaps, with deep personal data sets to work with, scientists could deliver on the promise of genomics to revolutionize medicine, a promise that had been long frustrated by the complexity of connecting genomic data with real world outcomes.

I understood the goal. A few years earlier, I’d written a long Wired story about the taxonomic collaboration between Daniel Janzen and Paul Hebert. Janzen, along with his other accomplishments, was among the world’s most knowledgeable field biologists. Hebert had developed a genomic assay that promised to identify animals using an extremely small region (about 650 base pairs) of their mitochondrial DNA. Hebert was confident in his technique, but needed to prove its utility. How could the genomic data he was collecting be paired to real world ecological knowledge? At their field station in the Guanacaste Preserve in Costa Rica, Janzen and his partner Winnie Hallwachs, along with their students and colleagues, collected hundreds of butterflies and moths, identified them, snipped off a leg, and shipped it to Guelph, a city in Canada, where Hebert ran the sequence. Slowly, painstakingly, they connected the genomic data to the real world data. More than just proving that Hebert’s technique worked, they also brought a new degree of resolution to the ecological picture; showing, for instance, that individual specimens, though visually almost identical as adults, may belong to distinct evolutionary clades and feed on different plants. In my first conversations with Jason, I saw this as how Open Humans should work. It promised to provide the “field biology” for the genomic studies of the Personal Genome Project.

Handwritten species list from the Patilla field station in the Guanacaste National Park, Costa Rica.

Unfortunately, as attentive readers, link followers, and experts in the history of overconfidence in science may already have realized, there’s a pretty serious flaw in my analogy. Paul Hebert was using the genome to distinguish strands in evolutionary history, mostly at the level of species. He wanted to know, given a leg, what kind of creature it was from. Answering relevant health questions requires understanding the world at a far more detailed level, down to extremely small differences among individuals of the same species. The trick that Hebert used is never going to work; and, for many of the health related questions we care about, nobody knows the tricks that will work. Fifteen years after the launch of the Personal Genome Project, it continues to supply data resources to basic science, but its relevance to medicine remains mostly a promise.

In the Quantified Self community the focus has always been on individual discovery: How can we learn about ourselves using our own data? Many of the questions addressed by people doing their own QS projects relate to health and disease. Browse the archive of Quantified Self Show&Tell presentations and you’ll find projects on Parkinson’s disease, diabetes, cognitive decline, cardiovascular health, depression, hearing loss, and many other health related issues. The kind of “everyday science” practiced in the Quantified Self community can be understood as being the opposite of the genome-wide association studies. Instead of finding small, telling differences among groups of people, the everyday science of the Quantified Self finds large effects within a single person who is both subject and scientist.

This comes with its own kinds of difficulties. People doing Quantified Self projects related to health face a number of discouraging barriers, including lack of access to their own data and medical records, bureaucratic roadblocks and exorbitant costs in ordering their own lab tests, problems in acquiring the requisite domain knowledge to test their ideas and interpret their data, and – perhaps most discouraging to people who are dependent on medical professionals for some aspect of their care – lack of recognition in the health care system that self-collected data can be useful for making decisions about treatment.

In the 11 years since Quantified Self started, participants have tried many different ways to overcome these barriers, both individually for their own projects and systematically through creating tools and advocating for better policies. One of the lessons from this work is that while the focus of self-tracking projects is typically on individual learning, the methods required to make sense of our data often require collaboration. Existing systems are not designed to provide support for the kind of highly individualized reasoning we do; therefore, we have to build a new system. Key requirements of this new system include: private, secure data storage; capacity to integrate data from commercial wearable devices; fine-grained permissions allowing sharing of particular data with particular projects, and withdrawal of permission; capacity for ethical review both to protect individual participants and to enable academic collaborations.

Two years ago,  we organized our first participant-led research project in the Quantified Self community. A group of about two dozen of us measured our blood cholesterol as often as once per hour, exploring both individual questions about the patterns and causes of variation in our blood lipids and a common group question about lipid variability. We had a pressing need for some collective study infrastructure, but there was no available tool that worked for our needs. We took a DIY approach and at the end of the project we’d learned a tremendous amount both about our own varying cholesterol and about the process of self-directed research. (Our paper, “Approaches to governance of participant-led research,” has recently been published in BMJ Open; our paper on our collective discovery about lipid variability has been accepted for publication in the Journal of Circadian Biology; we’ll add a URL when we have it.)

Slide detail from one of Azure Grant’s QS Show&Tell Talks

At the conclusion of our study, one of the participant organizers, Azure Grant, decided to press ahead with another participant-led study on ovulatory cycling. Azure had already presented a self-study on using continuous body temperature to predict ovulation at a Quantified Self conference. Now, she wanted to organize a group of self-trackers to try something similar, but integrating newer measurement tools to acquire higher resolution data. Among these tools was the new version of the Oura ring, which offered body temperature, heart rate, and sleep data. This idea put new demands on our study infrastructure. Thanks to generous collaboration from Oura engineers, we could offer participants access to detailed data from their rings. But how could this data be stored privately and controlled by each individual, while also being available using fine-grained permissions to their fellow participants and study organizers? How could this data be integrated with other data types they might decide to collect during the project? Where was there infrastructure for a “field biology” of the self?

We turned to Open Humans. The personal reasons were as important as the technical ones. Mad Ball, along with her work leading Open Humans, is a long time participant in the Quantified Self community, who has consistently advocated for non-exploitive approaches to handling personal data, and has contributed the results of her own self-directed research. (See Mad’s recent talk on “A Self-Study Of My Child’s Genetic Risk.”) And Bastian Greshake Tzovaras, the Open Humans research director, quickly proved to be an extremely sensitive and skilled collaborator. Bastian co-founded openSNP, a grassroots effort that outgrew Personal Genome Project by supporting citizen science participation. (Currently, there are more genotyping datasets publicly shared in openSNP than all other projects in the world combined.)

With help from Mad and Bastian and the Open Humans infrastructure, we built our next stage study workflows with encouraging speed and harmony. Fundamentally, we found ourselves aligned on the core idea that research processes designed around personal data sets should be built to protect individual agency, even where this requirement creates friction for academic collaborators. The rarity of this commitment may only be obvious to those few people who have gotten painfully deep into the workflows of study infrastructure. (And I recognize that a post of this length that is this deep in the weeds can have very few readers!) But, in a way, that’s one of the beautiful things about this stage of building a new knowledge infrastructure. We’re far into it enough to have evidence that we’re on the right track. But we’re still close enough to the beginning that each step is a significant contribution and a potential model to build on.

I very much hope that over time – and the sooner the better – our shared ideas about individual agency and everyday reasoning are embodied in tools and policies that are so commonplace that no single organization is responsible for them. But for now, it’s impossible not to recognize that Open Humans is an indispensable resource, defining an approach that needs to be developed and expanded, and managed by a team that has deep insight into the challenges and potential of participatory science. I look forward to building more connections between our two communities.

Our new 2019 Directors

I’m thrilled to announce the results of our 2019 elections for the Open Humans Foundation Board of Directors!

Our community seat winner is Marja Pirttivaara, and our board-elected seats are Gary Wolf and Sasha Wait Zaranek.

Marja Pirttivaara: When I first met Marja at the MyData conference in 2018, it was wonderful to find a like-minded soul — between her interests in genetics and in empowering individuals with their personal data. Marja generously agreed to be our EU representative for GDPR, and it’s been exciting to see our project become more global.

Gary Wolf: As co-founder and director of Quantified Self Labs, Gary has supported numerous citizen scientists in their quest to use their personal data to understand themselves, and to collectively create new knowledge. His work is strongly aligned with that of Open Humans, and we are very much looking forward to his contribution and leadership.

Sasha Wait Zaranek: Sasha is one of the founders of the Harvard Personal Genome Project and continues to lead in this area. Their focus is on genome data: they want to see that data managed by the people it came from, more understandable, and more re-usable for new projects — and they want to help Open Humans make those things happen.

Marja, Gary, and Sasha join our ongoing board members: Mad Price Ball, Karien Bezuidenhout, Steven Keating, Dana Lewis, and James Turner.

We must bid farewell to Misha Angrist and Michelle Meyer — their terms have ended. Both have been involved with the organization for many years, and we hope this is not the last we see of them! We must also bid farewell to Chris Gorgolewski, who has resigned; his 2018 seat is being left vacant for now. Also, we’ve made the voting results from the 2019 Community Seat election available here: http://openhumansfoundation.org/2019-ohf-election-votes.csv

We’re honored by the contribution of every board member, and their collective stewardship of our project. And we’re honored by all candidates for these positions. Not everyone can win — indeed, it would be a poor election if we didn’t have people to choose between. We very much hope the other candidates remain involved — there are so many things to do together!

2019 Board of Directors Candidates

The self-nomination period for our Board of Directors is over and we are excited to share this year’s candidates! We hope to begin the community seat election sometime next week, followed by a board ratification of this vote and election of two additional seats.

Benjamin Carr

Links
https://twitter.com/BenjaminHCCarr
https://github.com/BenjaminHCCarr/
https://www.linkedin.com/in/bencarr/

I have been involved with and contributing to open source software and like-minded communities for over 20 years now. I, like others in OH, am a firm believer in open science, open data, and open access. I was an early enrollee in Harvard-PGP, excited by the promise of enabling precision medicine and an open dataset for researchers to use. I hold a Ph.D. in biology from Boston University and have worked professionally in academia, NGOs, government, and private industry.

My expertise bridges multiple areas of science having worked in oceanography, satellite remote sensing, AUVs, marine biology, and bioinformatics, as well as being involved with the 9/11 impact assessment of the Hudson River. I have also been running the OH Facebook account for the last two years. In 2018 I was lucky enough to have a hand in facilitating and doing QA/QC on a portion of the NIH Data Commons Pilot Phase Consortium, and have high hopes that at least one fully open source stack emerges from that endeavor.

Vero Estrada-Galiñanes

Links
LinkedIn: https://www.linkedin.com/in/veronicaestrada/
MyPage: https://sites.google.com/view/veroeg
Twitter: https://twitter.com/GalinanesVero
DSS workshop paper: https://arxiv.org/abs/1809.01974

I am passionate about trustworthy storage systems and digital archives. I am an active member of Open Humans. My interest is mainly focused on: 1) new storage solutions for OH data and 2) better data visualisations of life-logging data collections. I am also co-author of the Open Humans open collaboration article.

My vision about an open health archive was presented during the Data-Driven Self-Regulating Systems (DSS) Workshop in 2018. The main concept is to preserve the health-related data generated throughout the life of an individual without giving away data ownership while promoting open data and data sharing. I keep working on these ideas.

My recent experience comes from postdoc roles (storage systems / distributed systems).  I am a former postdoc at the Quality of Life Technologies (DIKU). Prior to academic jobs, I had leadership roles in the industry and government. I have experience in making sense of large databases. I collaborate with the SciEd Network (Lectures without borders).

Beau Gunderson

Links
Homepage: https://beaugunderson.com
GitHub: https://github.com/beaugunderson

I am a previous employee of Open Humans (2014-2016). Prior to 2014 I worked at Practice Fusion on the Data Science team, and from 2016 to the present I’ve worked at Canvas Medical building electronic health record software for primary care practices. My recent work at Canvas has focused on security and privacy (I am now the security and privacy officer in addition to my engineering duties).

Since leaving Open Humans as an employee I have been an active user of the project. I’ve also maintained a presence on the OH Slack and GitHub as well as offering my review of projects on the Project Review forum.

I believe I would be most useful in the realms of security and privacy and software development guidance.

Nathaniel Pearson

Links
Twitter: https://www.twitter.com/GenomeNathan
Blog: http://genomena.com/
Slides about various projects: https://www.slideshare.net/NathanielPearson
Talks:
https://www.youtube.com/watch?v=ZX0culYjU_A (whole-genome talk)
https://www.youtube.com/watch?v=NxkL-KtUaJY (HLA talk)

Exploring what inner data say, about our health and history, has long driven my work. And teaming with fellow geeks, caregivers, and layfolk has made that a joy. The chance now, to help guide how we Open Humans bring our big ideas to life, as an anchor cohort for the biodata-informed future, would fulfillingly continue that effort. To that aim, I bring strong grounding in genomics, a passion to learn new stuff (hello microbiomes!…), and team spirit.

Background-wise, I trained in evolutionary genomics at Stanford and U. Chicago, led collaborative science at ships both small (Knome) and big (New York Genome Center), and teach genetic counseling students as guest faculty at Sarah Lawrence. To help folks pool personal biodata to drive crowd discovery, I launched the Empowered Genome Community in 2012 and recently founded the free, good cause-allied personal immunogenomics company, Root, to honor tissue donor volunteers with well grounded insights from their own match-screened genes.

Marja Pirttivaara

Links
Linked In: https://fi.linkedin.com/in/pirttivaaramarja
Twitter: https://twitter.com/marja_p?lang=en
Blog: http://www.dnaguru.fi/
Facebook group: https://www.facebook.com/groups/FinlandDNA/

I’m a Finnish PhD (physics) and MBA (social and healthcare management), working at the Finnish Innovation Fund Sitra and also an unpaid visiting researcher of the University of Helsinki (DNA related issues). I’m a genetic genealogy expert, admin of Finland DNA project with more than 15 000 members, admin of Finland DNA Facebook group, with 7 700 members. I’m also a founding member of MyData Global. I’m a practical and knowledgeable bridge builder, always curious about the future. I’m just waiting for my whole genome results.

My vision of Open Humans is a trusted global platform and actively cooperating community for fair & responsible sharing and utilizing personal data, mydata, tools and creating best practises.

As a Finn and European and a genetic genealogy & genome data expert (etc) I’d like to contribute to the Open Humans community.

Gary Wolf

Links
http://quantifiedself.com

By vocation I’m a journalist but since 2008 I’ve been focused on supporting the Quantified Self community as Director of Quantified Self Labs, a California based social enterprise whose mission is to help people learn from their own data. We’ve been allies and active collaborators with OH. Our most recent collaboration involves using OH to support a participant led research project (PLR) focused on self-tracking of ovulatory cycles. I’m aligned with the Open Humans mission to both support individual agency in using our own personal data to answer our own questions; and, in supporting the formation of new collectivities for shared knowledge making. I’m also closely aligned with the OH approach and cultural roots in the open source community. I look forward to helping.

Alexander (Sasha) Wait Zaranek

Links
Twitter: https://twitter.com/wait_sasha
Google Scholar: https://scholar.google.com/citations?user=Ifj9cY0AAAAJ&hl=en
Orcid: https://orcid.org/0000-0002-0415-9655

I am head of quantified biology at Veritas Genetics, the first company to introduce whole genome sequencing and interpretation to consumers and their physicians for under $1,000. My current research is focused on the delivery of real-time, biomedical insights from massive data sets, spanning millions of individuals across collaborating organizations, eventually encompassing exabytes of data. I am also a co-founder of the Harvard Personal Genome Project.

My hope is that Open Humans becomes a central, global hub for participatory research and participant led data sharing much as Wikipedia has become a hub for sharing facts. Specifically, I will use my relationships with the Global Alliance for Genomics and Health (GA4GH), NIH common fund, the NIST “Genome In a Bottle” reference material consortium, and the global Personal Genome Project (PGP) organizations to further the integration of Open Humans with other local, national and international biomedical data sharing efforts.

Inviting candidates for our board

In upcoming weeks Open Humans Foundation will be electing three new members to our Board of Directors. Two seats are elected within the board — and one is a community seat chosen by Open Humans members!

Anyone may apply to our board. The process involves a self-nomination, and nominees should be seconded by a current member of the Board of Directors. Board seat terms are three years.

At this stage we are inviting self-nominations. Being a director of this organization is a position of trust. It is our highest tier of governance – our ultimate decision-making authority. You can learn more about our organization’s governance by visiting the website: http://openhumansfoundation.org/

Our deadline for self-nominations is March 15. Please self-nominate by completing our self-nomination form: https://goo.gl/forms/P3eCAmExACoJ0P3Z2

About Open Humans: Open Humans is a US-based nonprofit website and community that helps individuals aggregate personal data, explore and analyze it, and choose to contribute data to academic research and community/citizen science projects. Visit the website to learn more: https://www.openhumans.org

You’re also welcome to chat with us and other Open Humans members in our community Slack chatroom! See: http://slackin.openhumans.org

Meet the latest Open Humans projects

We’ve got a great selection of new projects and personal data explorations for you as an end-of-year gift. Here is an overview of the data import projects recently launched on Open Humans:

  • Oura Ring: You can now explore your sleep habits, body temperature and physical activity data as collected by the Oura Ring.
  • Overland: If you are using an iPhone you can now use Overland to collect your own geo locations along with additional data such as your phone’s battery levels over the day.
  • Google Location History: As an alternative way to record and import your location data you can now import a full Google Location History data set.
  • Spotify: Start creating an archive of your listening history through the Spotify integration.
  • RescueTime: Import your computer usage data and productivity records into your account.

Read more details about those integrations below:

Connect your Oura Ring

Explore how your body temperature changes on weekdays and weekends by connecting your Oura Ring to Open Humans and running a Personal Data Notebook.

The Oura is a wearable device well hidden inside a ring. It measures heart rate, physical activity and body temperature to generate insights into your sleep and activity habits. With Oura Connect you can set up an ongoing import of those data into your Open Humans account. This allows you to explore those data further thanks to already available Personal Data Notebooks!

Map your own locations with Overland

Explore how you move around. To recreate this with your personal data use Overland and run this Personal Data Notebook.

Overland is a free and open-source iOS application that keeps track of your location through your phone’s GPS along with some metadata like velocity and the WiFi you are connected to. With Overland Connect you can import these data into your Open Humans account. The data can be visualized through Personal Data Notebooks, used to display your current location through a Personal API or to Geo-Tag your photo collection!

Use Google Location History to explore your location data

Explore where you have been around the world. To recreate this with your personal data, import your Google Location History and run this Personal Data Notebook.

Thanks to our Outreachy interns we have another new geolocation data source: Google Location History. No matter if you are using an iPhone or an Android phone, you can use the Google or Google Maps app on your phone to record where you have been. Through Google Takeout you can now export this data and then load it into Open Humans and explore it through Personal Data Notebooks.

Explore your music listening behaviour with Spotify data

Explore when and how you listen to music. To recreate this with your personal data use Spotify Connect and run this Personal Data Notebook.

Another Outreachy intern project was to collect your Spotify Listening History through Open Humans. Using Spotify Connect will automatically import the songs you listen to along with lots of metadata (e.g. how popular was the song at the time you listened to it?). Once you have collected some data, you can explore these through another Personal Data Notebook!

Learn about your productivity with RescueTime

Find out whether your computer usage is correlated with how much you walk. Recreate this by using RescueTime and Fitbit. Then run this Personal Data Notebook.

RescueTime is a service that collects how you are using your computer through a data collection app on your computer. It keeps track of the apps you use and the websites you visit and classifies these as productive or unproductive time (Hello Facebook!). Thanks to a personal project by Bastian you can import this data into your Open Humans account and explore it through Personal Data Notebooks.

With this the whole Open Humans team wishes you a happy personal data exploration, relaxed holidays and a wonderful start of 2019!

The first manuscript describing the Open Humans community

Open Humans now consists of over 6,000 members that collectively have uploaded over 16,000 data sets!

To share this great community effort as a resource, we wrote our first academic manuscript. In it, we describe the platform, community, and some diverse projects that we’ve all enabled. You can find a pre-print on BioRxiv.

True to the community spirit of Open Humans, we wrote the manuscript completely in public and with an open call for contributions through our Slack. Thanks to this we could gather diverse perspectives of how Open Humans can be utilized for both research as well as personal data exploration. Using these existing projects and studies running on Open Humans as examples, we explore how our community tackles complex issues such as informed consent, data portability, and individual-centric research paradigms. Read more about this in the manuscript.

All of this is only made possible by your contributions to Open Humans, so we want to take this opportunity to thank you for your participation!

OH Project Management App – Going Forward

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Rosy’s post about creating an app to manage your Open Humans project:

The Open Humans Project Management web app allows Open Humans project admins to view and work with their members and data. Since the last time I wrote, plenty of work has been done to make the app more useful:
  1. Members can be filtered based on multiple parameters.
  2. Custom groups can be formed, and members can be conveniently added to or removed from those groups.
  3. Project admins can keep notes about specific project members and can also edit or delete these notes.
  4. Every member usually shares some data with the project, which earlier had to be accessed by downloading individual files. A single-click download of all files of a project is now possible in the form of a zip file. Files for specific project members can also be downloaded together by downloading the files for a particular custom group.
While developing these features, it has been an absolutely fulfilling experience to be able to refine the work as Dana Lewis did some testing as an end-user and provided some great feedback 🙂
So far, some of the highlights of my work are:
  • The dashboard shows a lot of information and allows a variety of actions, which made it crucial to design the application with a focus on user experience. I worked with various aspects of a front-end framework (Bootstrap here) and found how easy it makes basic styling of HTML elements such as forms, tables, buttons, and icons, as well as the most useful piece, the Bootstrap modal. Bootstrap provided a consistent theme for the dashboard, along with good documentation. It also leaves room to use the grid system later to develop a mobile-first application.
  • Running a codebase in different environments is always a great learning experience. While working on the file-download feature, I learned that a request on a web worker may only last 30 seconds on Heroku (our production environment) and is killed after that. Since creating a zip file can easily take longer, we did some brainstorming and decided to keep the network calls out of the request-response cycle by creating a Celery worker task to do the downloading. Working with Celery worker tasks and a Redis broker was a new and enriching experience for me.
  • To support both an OH-run version and versions that developers run themselves, we weighed two options for storing the downloaded files, AWS S3 and transfer.sh, and settled on the first given the 10GB limit on the latter.
  • Since the downloading of files happened as a background job, the user could be notified of the completion of file download either through a dashboard notification or via an email. We decided to go ahead by emailing the user (easy peasy) by setting up a configurable SMTP server in Django.
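
The zipping step itself can be sketched with Python's standard library. This is a minimal illustration, not the app's actual code: the function name `build_zip` and the sample files are hypothetical. In the setup described above, a function like this would run inside a Celery task rather than inside the web request, so the 30-second limit no longer applies.

```python
import io
import zipfile


def build_zip(files):
    """Pack a mapping of {archive_path: bytes} into one zip file, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical member files, keyed by "<member id>/<file name>".
member_files = {
    "member-01/steps.csv": b"date,steps\n2018-06-01,4200\n",
    "member-02/notes.txt": b"example file\n",
}
zip_bytes = build_zip(member_files)
```

Building the archive in memory keeps the sketch short. For large projects, writing to a temporary file on disk would likely be the safer choice.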

The work done so far has been rewarding in terms of experience with Django, Bootstrap and various other modules. More development calls for more feedback and hence, more iterations. Therefore, I’ll be working on making some modifications to the UI and features incorporating the feedback.

More Tango with Django ahead 😀 Cheers!

Working with the GitHub API

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Manaswini’s post about working with the GitHub API for a new Open Humans project:

I have been working with the GitHub API all this while. I had come across some really cool visualizations built with the GitHub API but hadn’t had the chance to work with it myself. Thanks to the project ‘Adding data sources’, I was motivated to add GitHub as a data source and, guess what, I discovered that it will work in principle! GitHub, the online development platform, lets developers discover and contribute to open source projects, thus realizing their aspirations.

GitHub provides REST API access to a variety of data about your projects, be it repositories, issues, pull requests, and lots more! I chose the GitHub API because its documentation is meticulous and easy to follow for anyone getting started. It is also an excellent source for creating data explorations.

The output is available as JSON and in other formats, depending on the data being extracted. You can view public data without authentication, but if you want to work with private data, authentication is a must. Both single-factor and two-factor authentication are supported. Two-factor authentication, as the name suggests, is more secure because it doesn’t rely on your password alone, as basic authentication does.
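
As a rough sketch of the difference between an unauthenticated and an authenticated call, the snippet below builds (but does not send) a request for a user's repositories. The helper name `build_repos_request`, the username `octocat`, and the placeholder token are assumptions for illustration and are not code from the project.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_repos_request(username, token=None):
    """Build (but do not send) a request listing a user's repositories."""
    request = urllib.request.Request(f"{API_ROOT}/users/{username}/repos")
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        # A token authenticates the request; private data is only
        # visible to authenticated requests.
        request.add_header("Authorization", f"Bearer {token}")
    return request


public_request = build_repos_request("octocat")
authenticated_request = build_repos_request("octocat", token="YOUR_TOKEN")

# To actually send a request (requires network access):
# import json
# with urllib.request.urlopen(public_request) as response:
#     repos = json.load(response)
```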

This internship period has been truly rewarding. I came across some really cool concepts, such as rate limiting and making respectful requests, that I was unaware of until now. Diving deeper, I also learned about the types of rate limiting, i.e., user rate limiting, geographic rate limiting, and server rate limiting. All thanks to my mentor, Mike Escalante, for giving me invaluable insight into these terms and supervising me at each and every step.

The GitHub API has a rate limit of 5,000 requests per hour for authenticated requests. So far, I have been editing the demo template and resolving issues along the way. I have also been experimenting with the output JSON and investigating ways to make it look better.
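
One way to make requests respectfully is to read the rate-limit headers that GitHub returns with each response and wait when the quota is used up. The sketch below assumes the headers have already been collected into a plain dictionary. The helper name `seconds_until_reset` and the header values are made up for illustration.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, based on rate-limit headers.

    Returns 0 if requests remain in the current window.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # Unix timestamp in seconds
    current = time.time() if now is None else now
    return max(0, reset_at - current)


# Hypothetical header values, shaped like those GitHub sends with a response.
example_headers = {
    "X-RateLimit-Limit": "5000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1530000600",
}
wait = seconds_until_reset(example_headers, now=1530000000)
```

Note that a plain dictionary is case-sensitive, while HTTP header names are not, so real code would normally read these values from the response object's own header mapping.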

Cheers!