Monthly Archives: February 2018

Interviewing project grant awardee Kevin Arvai

Today we’re interviewing Kevin Arvai. Kevin is a bioinformatician with an interest in personal genetic data and he was awarded a project grant to implement a project that will bring genotype imputation to the Open Humans community.

Kevin, please give our blog readers a quick introduction about who you are!

I am a data scientist at a clinical genetics company in Maryland. My background and formal education are in biology; however, I completed a master’s degree in computational biology and bioinformatics. Like many, I’m riding the wave of data that our generation has found itself immersed in by competing in data science competitions and contributing to “open-” (source, science, data) projects. I’m particularly interested in machine learning and human genetics, but I’m looking forward to learning new skills by building Imputer.

When and how did you come to Open Humans?

I came to Open Humans in February 2018 after working on a project with the Director of Research, Bastian, at a hackathon hosted by NCBI.

Have you been involved in any projects on Open Humans so far, either as a participant or even running your own?

Not only is this my first project working with Open Humans, this is my first project as part of an open source community. Open Humans was a welcoming and collaborative group of people that encouraged my ideas, so it seemed like a perfect fit to start contributing.

Your project Imputer was awarded one of the Open Humans project grants. Can you explain to us what the project is about?

The goal of Imputer is to provide users with a more comprehensive picture of their genome. Direct-to-consumer genetics companies, like 23andMe, only genotype a small fraction of the genome. Researchers are finding new genetic locations associated with traits and diseases at a rapid pace. Users might be interested in knowing their genotype status for these new associations, but the locations may be in regions that direct-to-consumer tests are not genotyping. Imputer leverages the vast amount of genotype data made available by the 1000 Genomes Project and by the Haplotype Reference Consortium to provide Open Humans users with genotype estimates at additional locations in their genome.

How did you come up with the idea behind Imputer?

Imputer grew out of a long conversation over lunch with Bastian.

Is there anything important that we didn’t cover so far that you’d like to add?

I’d like to encourage others who are “interested in, but anxious about” contributing to open source projects to take the leap! If you’ve found this post, Open Humans is a great place to start!

Has Kevin’s encouragement motivated you to take action? The Open Humans project grants are ongoing, and you can apply for one too!

Open Humans, what’s next?

President Bartlet of The West Wing calling out his famous “What’s next?” to his secretary after finishing a task.

I just defended my PhD last week, and one question came from virtually every person who attended and stayed for the after-party: What’s Next? That initially felt a bit weird. After all, I already took my next step three months ago when I joined Open Humans as the Director of Research. But then I realized that this is a nice opportunity to reflect a bit on my first months and think about what my next goals for Open Humans are.

Where is Open Humans so far?

So far I have spent a good part of my time learning the ropes. First of all, I had to find my way into the technical infrastructure of Open Humans: learning the code base, the APIs, server setups and so on. And what better way to do this than starting my own projects? I thus integrated two new projects on Open Humans: First I connected my long-standing project openSNP with Open Humans – allowing users of both platforms to re-use their genetic data more easily. Then I started TwArχiv, which not only brings a new data source but also some data visualization to Open Humans. This integration of Twitter data will hopefully also be a first step towards a more holistic view of personal data that includes non-medical data.

Hand in hand with the technical side of things, I also found my way into the community around Open Humans: learning which projects there are, how to best support them and also how to grow the Open Humans community even more. I not only got to know many of the brilliant individuals inside the Open Humans community, but I also helped them to achieve their goals – be it through bug fixes, relevant connections or finding out how to optimize our website to make it work for their needs. First steps towards further community growth were also taken: We were able to announce the first three successful grant applications, all bringing new data sources to Open Humans. And a fourth grant announcement – enhancing existing data sets – will be out soon!

The Open Humans community is growing nicely and is becoming more and more engaged. So things are on track. But where should we go from here? And what is the larger vision? Traditional academic research – as well as corporate data silos – put themselves at the center of all data collection. Open Humans is very different. As Steph laid out in her blog post: Open Humans is a technological platform, a vibrant community, and a paradigm shift in how research is done, all at the same time. In addition to all these things there is one thing that I always mention when people ask me what Open Humans is: It is empowerment. Putting individuals in control of their own data and of research at large. And to me, this means more than ‘just’ giving people the choice of when and where to share their data.

What should Open Humans be?

Empowerment means giving people the opportunity to explore and understand their own data, be it on their own or in collaboration as a community outside the traditional academic research setting. The growth of the independent Open Artificial Pancreas community – which aggregates its own data through Open Humans – is a stellar example of this empowerment. As stewards of the Open Humans ecosystem it is our responsibility to support people in running projects like these. It is up to us to make it easier to create and run projects on Open Humans – empowering more people, including those who are not highly programming-savvy. Open Humans offers the unique chance to democratize science, enabling people outside academia to do research that has never existed before. To pull this off we have to become more inclusive in our approach. This means getting everybody on board who has great ideas for research.

First steps in this direction have been made already: We now have a first data uploader template that allows everyone to create their own data-collecting Open Humans project while requiring zero programming knowledge. Instead, a web browser is enough to do the complete setup. A similar idea for the administration of projects should become a reality in the near future. Furthermore, we are on the way to creating shareable analysis notebooks. These can be written and run by everyone – facilitating community-driven data analysis. By further increasing our inclusivity we will not only see more projects on Open Humans, we will also see a much wider diversity in how these projects use data. I can’t wait to interact with all of them.

I see this diversity reflected in the kinds of data that will be on Open Humans and the kinds of research that will be done with it. Traditionally, many of the projects on Open Humans have had a focus on health. But I don’t see why this should be the sole kind of research that benefits from being run with and by highly involved participants. After all, while much of the Quantified Self revolves around health, it is far from the only topic: People are interested in their personal finance data, phone usage, emails and more. And so are social scientists, economists and researchers in other academic disciplines. My goal is to get these people on board for Open Humans too, showing them the huge benefit that an engaged study population offers.

Let’s just think of a simple example: Anyone can pay Twitter to get access to their firehose of data or just scrape tweets for keywords from the web. But who but Open Humans can offer potential access to 200 or more full Twitter archives that are available right now? And more importantly, who else offers the possibility to get in touch with these people, and with it a way to collect additional metadata and obtain their consent? The same is true for virtually all kinds of social media data and many other data types. Humans are more than their bodies, and Open Humans should reflect this.

So this is what’s next for Open Humans: Creating an ecosystem that enables the largest possible number of people to do research; that collects and enables the re-use of the most diverse set of data; and that brings together participants and researchers from all disciplines and walks of life – informing each other and creating the most interesting research.

An interview with project grant awardee Anh Nguyet Vu

Today we’re interviewing Anh Nguyet Vu. She is the recipient of one of our Project Grants. With MyFitnessPal Miner she not only brings a new data source to Open Humans, she is also working on visualizing these data and connecting them to genetic data. 

Hey Anh Nguyet, please give our blog readers a quick introduction about who you are! How did you come up with the idea behind MyFitnessPal Miner?

Generally I wouldn’t want to introduce myself by talking about my problems, but in this case it does give you the story behind the project. So when I was a freshman in undergrad, I faced a problem that would eventually lead to the development of MyFitnessPal Miner. This problem, no doubt a familiar one for many others, was weight gain. Since I was (and am) the kind of person who believes that “what cannot be measured cannot be managed”, I started tracking dietary intake. Because I was already tracking what I was eating, I became interested in the quantified self movement, and it wasn’t long before I was convinced that collecting other types of data would be valuable. I experimented with many food logging tools, including MyFitnessPal (which was never my primary app, but it happens to be the most popular one today). I also tried a variety of activity and exercise trackers before Fitbit hit mainstream. Probably my most earnest project was tracking how much time was spent on different activities, down to a minute’s resolution, over a span of three months.

It was inevitable that I would want to incorporate genetic data. To gather all the other kinds of data without considering your personal genetics is to miss out on a crucial part, especially if you want to optimize your health, which I was a little obsessive about. I had a self-defined area of concentration called “personalized medicine” (also known as “precision medicine”) for my undergraduate major at Stanford. I think more people understand personalized medicine as tailoring drug treatments for an individual’s genetic makeup, but if you believe in “food as medicine”, then it should encompass nutrition as well.

Your project MyFitnessPal Miner was awarded one of the Open Humans project grants. Can you explain to us what the project is about?

MyFitnessPal Miner has three goals. The first is about making the data more accessible, allowing you to get your own data in a format useful for other projects, including ones on Open Humans. The app ports your data to standard .csv files and does some additional parsing to create potentially useful tags. For example, it tries to recognize instances of fast food by matching records containing restaurant chain names.
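For readers curious what this kind of tagging could look like in practice, here is a minimal Python sketch. It is not taken from MyFitnessPal Miner itself: the chain list, the record fields (`date`, `food`, `calories`) and the function names `tag_fast_food` and `to_csv` are all assumptions made up for illustration.

```python
import csv
import io
import re

# Hypothetical chain list; the list the real app uses is not given in the interview.
FAST_FOOD_CHAINS = ["McDonald's", "Burger King", "Taco Bell", "Subway"]

# One case-insensitive pattern that matches any chain name as a whole phrase.
_CHAIN_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in FAST_FOOD_CHAINS) + r")\b",
    re.IGNORECASE,
)


def tag_fast_food(records):
    """Return copies of the records with a boolean 'fast_food' tag added."""
    return [
        {**rec, "fast_food": bool(_CHAIN_PATTERN.search(rec["food"]))}
        for rec in records
    ]


def to_csv(records):
    """Serialize tagged records to CSV text (assumed column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "food", "calories", "fast_food"]
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up diary entries, only to show the shape of the data.
diary = [
    {"date": "2018-01-02", "food": "McDonald's - Big Mac", "calories": 540},
    {"date": "2018-01-02", "food": "Homemade lentil soup", "calories": 230},
]
tagged = tag_fast_food(diary)
print(to_csv(tagged))
```

Simple name matching like this will miss misspellings and chains that are not on the list, so a real tagger would probably need a longer list and some fuzzier matching.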

The second goal is integrating that data with current genetic resources. There are some really interesting studies on how your genetics influence and interact with your diet, such as your preferences for salty/sweet/bitter foods, risk for specific food intolerances, and how you’d react to a low-carb versus a high-carb diet. While curating these kinds of studies, I figured that many people must also be curious about how the findings apply to them. So if you have 23andMe data and MyFitnessPal data, the app gives you a kind of integrated dashboard of genetics and nutritional behavior. You might, for instance, be able to see that your fast food consumption is greater than average, and that this seems congruent with what a published study has found given your genetic variants. Or next to the summary of your actual sodium intake, you might notice the relevant finding that your genetics predict that your blood pressure is fairly sensitive to how much salt you’re eating. However, because MyFitnessPal doesn’t contain explicit data for vitamins and minerals, not every related published finding can be connected with your real-life dietary data, unless the app can be made to intelligently infer vitamin and mineral intake from the food records.

Beyond comparing existing information, through the app it should be possible to use your real-life dietary data along with your genetic data to suggest something new. This third goal is kind of a reach goal given the limited time frame I have, but it’s the essence of an Open Humans project. I’d still have to think about the questions that are feasible and the methodology for them. Hopefully it won’t be just me, and there will be people in the Open Humans community who’d want to build upon MyFitnessPal Miner.

I do also hope that there will be interest outside of Open Humans. You may recall that all of this started not with my interest in genetics, but with food tracking. Well, it’s the start of a new year, and there will be a lot of people doing that as they pursue a healthier lifestyle. Some will seek understanding of their calorie and macronutrient patterns and then be hungry for additional value from their collected data. Being shown where the genetics tie in to create that additional value can perhaps entice people to bring their genetic data to the project, and therefore to Open Humans.

When and how did you come to Open Humans?

I consider myself a relatively new member of Open Humans, since I joined in the fall of 2017. Around that time I was doing research for a start-up, where a colleague mentioned Open Humans and said making his genetic data public was something he wouldn’t do. I, on the other hand, was someone who was already quite open, having been to quantified self meetings to hear others share their data and insights and to share mine.

Have you been involved in any projects on Open Humans so far, either as a participant or even running your own?

When I joined Open Humans, I made my data accessible to all studies. I would think more about running my own study after finishing the development of MyFitnessPal Miner.