Category Archives: Open Humans

Meet the latest Open Humans projects

December 27, 2018 | Open Humans | Bastian

We have a great selection of new projects and personal data explorations for you as an end-of-year gift. Here is an overview of the data import projects recently launched on Open Humans:

Oura Ring: You can now explore your sleep habits, body temperature and physical activity data as collected by the Oura Ring.
Overland: If you are using an iPhone you can now use Overland to collect your own geo locations along with additional data such as your phone’s battery levels over the day.
Google Location History: As an alternative way to record and import your location data you can now import a full Google Location History data set.
Spotify: Start creating an archive of your listening history through the Spotify integration.
RescueTime: Import your computer usage data and productivity records into your account.

Read more details about those integrations below:

Connect your Oura Ring

**Explore how your body temperature changes on weekdays and weekends by** **connecting your Oura Ring to Open Humans** **and** **running a Personal Data Notebook**.

The Oura is a wearable device hidden inside a ring. It measures heart rate, physical activity and body temperature to generate insights into your sleep and activity habits. With Oura Connect you can set up an ongoing import of those data into your Open Humans account. This allows you to explore those data further with the Personal Data Notebooks that are already available!

Map your own locations with Overland

**Explore how you move around. To recreate this with your personal data** **use Overland** **and** **run this Personal Data Notebook**.

Overland is a free and open-source iOS application that keeps track of your location through your phone’s GPS, along with some metadata like velocity and the WiFi network you are connected to. With Overland Connect you can import these data into your Open Humans account. The data can be visualized through Personal Data Notebooks, used to display your current location through a Personal API, or used to geotag your photo collection!

Use Google Location History to explore your location data

**Explore where you have been around the world. To recreate this with your personal data,** **import your Google Location History** **and** **run this Personal Data Notebook**.

Thanks to our Outreachy interns we have another new geolocation data source: Google Location History. No matter if you are using an iPhone or an Android phone, you can use the Google or Google Maps app on your phone to record where you have been. Through Google Takeout you can now export this data and then load it into Open Humans and explore it through Personal Data Notebooks.

Explore your music listening behaviour with Spotify data

**Explore when and how you listen to music. To recreate this with your personal data use** **Spotify Connect** **and** **run this Personal Data Notebook**.

Another Outreachy intern project was to collect your Spotify Listening History through Open Humans. Using Spotify Connect will automatically import the songs you listen to along with lots of metadata (e.g. how popular was the song at the time you listened to it?). Once you have collected some data, you can explore these through another Personal Data Notebook!

Learn about your productivity with RescueTime

**Find out whether your computer usage is correlated with how much you walk. Recreate this by** **using RescueTime** **and** **Fitbit**. **Then** **run this Personal Data Notebook**.

RescueTime is a service that records how you use your computer through a data collection app on your computer. It keeps track of the apps you use and the websites you visit and classifies these as productive or unproductive time (Hello Facebook!). Thanks to a personal project by Bastian you can import this data into your Open Humans account and explore it through Personal Data Notebooks.

With this the whole Open Humans team wishes you happy personal data exploration, relaxed holidays and a wonderful start to 2019!

The first manuscript describing the Open Humans community

December 18, 2018 | Open Humans | Bastian

Open Humans now has over 6,000 members who collectively have uploaded over 16,000 data sets!

To share this great community effort as a resource, we wrote our first academic manuscript. In it, we describe the platform, community, and some diverse projects that we’ve all enabled. You can find a pre-print on BioRxiv.

True to the community spirit of Open Humans, we wrote the manuscript completely in public and with an open call for contributions through our Slack. Thanks to this we could gather diverse perspectives of how Open Humans can be utilized for both research as well as personal data exploration. Using these existing projects and studies running on Open Humans as examples, we explore how our community tackles complex issues such as informed consent, data portability, and individual-centric research paradigms. Read more about this in the manuscript.

All of this is only made possible by your contributions to Open Humans, so we want to take this opportunity to thank you for your participation!

OH Project Management App – Going Forward

July 17, 2018 | Outreachy intern updates | Rosy Gupta

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Rosy’s post about creating an app to manage your Open Humans project:

The Open Humans Project Management web app allows Open Humans project admins to view and work with their members and data. Since the last time I wrote, plenty of work has been done to make the app more useful:

Members can be filtered based on multiple parameters.
Custom groups can be formed and members can be conveniently added/removed to/from those groups.
Project Admin can keep notes about specific project members and can also edit/delete these notes.
Every member usually shares some data with the project which earlier had to be accessed by downloading individual files. A single click download for all files of a project is now possible in the form of a zip file. The files can be selectively downloaded together for specific project members by downloading the files for a particular custom-group.

While developing these features, it has been an absolutely fulfilling experience to be able to refine the work as Dana Lewis did some testing as an end-user and provided some great feedback 🙂

So far, some of the highlights of my work are:

The dashboard shows a lot of information and allows a variety of actions, which made it crucial to design the application with a focus on the user experience. I worked with various aspects of a front-end framework (Bootstrap here) and found how easy it makes basic styling of HTML elements such as forms, tables, buttons and icons. The most useful piece was the Bootstrap modal. Bootstrap provided a consistent theme for the dashboard and has good documentation. It also leaves room to use the grid system later for developing a mobile-first application.
Running a codebase in different environments is always a great learning experience. Working on the file-download feature I learned that a given request on the web worker may only last 30 seconds on Heroku (our production environment) and will be killed after that period. Since creating a zip file can easily take longer, we did some brainstorming and decided to keep the network calls out of the request-response cycle by creating a Celery worker task to do the downloading job. Working with Celery worker tasks and a Redis broker was a new and enriching experience for me.
To support both the version run by Open Humans and versions that developers run themselves, we weighed two options for storing the downloaded files: AWS S3 and transfer.sh. We settled on the first option given the 10GB limit on the latter.
Since the downloading of files happens as a background job, the user can be notified of the completion of the file download either through a dashboard notification or via an email. We decided to go ahead with emailing the user (easy peasy) by setting up a configurable SMTP server in Django.
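To make the zip download idea a bit more concrete, here is a minimal sketch using only the Python standard library. The function names and the `{member_id: {filename: bytes}}` layout are assumptions made for illustration and are not taken from the app's code. In the app itself, work like this would run inside a Celery task rather than in the web request, so that the 30 second limit is never hit.

```python
import io
import zipfile


def select_group_files(files_by_member, group):
    """Keep only the files that belong to members of a custom group.

    `files_by_member` maps member id -> {filename: bytes} (hypothetical layout).
    """
    selected = {}
    for member_id, files in files_by_member.items():
        if member_id in group:
            for name, content in files.items():
                # Prefix with the member id so equal filenames do not clash.
                selected[f"{member_id}/{name}"] = content
    return selected


def build_zip(files):
    """Bundle {archive_path: bytes} into one zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

Calling `build_zip(select_group_files(files_by_member, {"m1"}))` returns the bytes of an archive containing only the files of member `m1`, which could then be uploaded to storage and linked in the notification email.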

The work done so far has been rewarding in terms of experience with Django, Bootstrap and various other modules. More development calls for more feedback and hence, more iterations. Therefore, I’ll be working on making some modifications to the UI and features incorporating the feedback.

More Tango with Django ahead 😀 Cheers!

Working with Github API

July 10, 2018 | Outreachy intern updates | Manaswini Das

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Manaswini’s post about working with the Github API for a new Open Humans project:

I have been working with the Github API all this while. I had come across some really cool visualizations made with the Github API but hadn’t had the chance to work on it. Thanks to the project ‘Adding data sources’, I was motivated to add Github as a data source and guess what, I discovered that it will work in principle! The online development tool Github enables developers to contribute to and discover open source projects, thus realizing their aspirations.

Github provides REST API access to a variety of data about your projects on Github, be it repositories, issues, pull requests and lots more! I chose the Github API since the documentation is meticulous and easy to comprehend for anyone who wants to get started. It is also an excellent source for creating data explorations.

The outputs are available in various formats, including JSON and other compatible formats, depending upon the data to be extracted. One can view public data without authentication, but if one wants to store private data, then authentication is a must. Both basic and two-factor authentication are supported. Two-factor authentication, as the name suggests, is more secure, as it doesn’t include sharing your password as in basic authentication.

This internship period has been truly rewarding. I came across some really cool concepts, such as rate limiting and making respectful requests, which I was unaware of until now. Diving deeper, I also came across the types of rate limiting, i.e. user rate limiting, geographic rate limiting and server rate limiting. All thanks to my mentor, Mike Escalante, for giving me invaluable insight regarding the above terms and supervising me at each and every step.

The Github API allows up to 5,000 requests per hour for authenticated requests. Till now, I have been editing the demo template and resolving issues simultaneously. I have also been experimenting with the output JSON and investigating ways to make it look better.
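One way to stay respectful of that limit is to look at the rate-limit headers Github sends back with each response. The sketch below is only an illustration: the helper name `seconds_until_allowed` is made up for this post, and it works on a plain dictionary of headers so it can be tried without any network access. `X-RateLimit-Remaining` and `X-RateLimit-Reset` are the header names Github uses, the latter being a UTC timestamp in seconds.

```python
import time


def seconds_until_allowed(headers, now=None):
    """Return how many seconds to wait before sending the next request.

    `headers` is a plain dict of response headers (a plain dict is
    case-sensitive, so the keys must be spelled as below).
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        # Requests are still left in the current window: no need to wait.
        return 0.0
    # The window is used up: wait until the reset timestamp (epoch seconds).
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

A client could call this after every response and sleep for the returned number of seconds before continuing, instead of hammering the API until it starts rejecting requests.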

Cheers!

Outreachy: The learning phase

July 9, 2018 | Outreachy intern updates | Tarannum Khan

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Tarannum’s post about working on creating packages for the Open Humans API:

The Open Humans API provides command line tools to interact with Open Humans. From where I started working with the Open Humans API to where it is now, the transformation has been just magical.

With the help of the Open Humans mentors, a lot of features were added, like messaging and uploading large files with the help of AWS. From learning the proper way to name functions to testing them, documenting the API using Sphinx and adding more command line tools, every small piece of learning added up to an avalanche of wisdom.

For legacy code, tests are a necessity. The code is full of logic, it’s alive and it’s core to any application. We need to keep going back to the code to update it, so tests are crucial.

I mocked the API tests using VCR.py. It’s just like Thor’s Stormbreaker (Marvel fans will understand) in the field of unit testing. It’s apt for applications which make HTTP calls to external services (in my case Open Humans).

Why VCR.py

The external service should not be tightly coupled with the application. If the external service fails, the application should still run correctly. One absolute novice way of testing is to hard-code error-prone requests and write the necessary error handling routines. But this is not recommended, because once you update even a few lines of code you have to test manually again. So, we hop on to the more sophisticated way of testing: unit testing.

Running tests which call the external service every time will be very slow. Apart from it, the tests won’t work offline and sending too many requests to the external service can be a problem too. So, here steps the protagonist, VCR.py. 🙂

You just need to run the test online once, when the cassette file, which stores all the important information about the requests and responses, has not yet been created. Once that is done, later test runs replay the responses from the cassette file instead of calling the service, and yes, it’s done 🙂. You have become successful in making your application more sturdy.

How to use VCR.py

For each function serving a particular feature, a test class is created. To test different code paths of a particular feature, different tests (functions with assertions) are created in the test class. Tests are written for both valid and invalid responses.

The first time the test is run, the request really hits the external service and a cassette file is created. Once the cassette, which contains status codes etc., exists, the responses are simply replayed from the cassette file when the tests are run again. It really speeds up the testing process and saves us from sending requests to the external service again and again. Whether you are offline at home or 40,000 feet above the ground, you can test your code anytime once the cassettes exist.
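The record-once, replay-later idea can be shown with a toy sketch in plain Python. This is not VCR.py's API: VCR.py works at the HTTP layer and matches individual requests, whereas the hypothetical `record_once` decorator below simply stores one function result as JSON and ignores the arguments. It is only meant to show why the second run no longer needs the network.

```python
import json
import os


def record_once(path):
    """Toy cassette: store a function's result in `path` once, replay it afterwards."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            if os.path.exists(path):
                # Replay: read the stored response instead of calling the service.
                with open(path) as handle:
                    return json.load(handle)
            # Record: call the real function once and save what it returned.
            result = func(*args, **kwargs)
            with open(path, "w") as handle:
                json.dump(result, handle)
            return result
        return wrapper
    return decorator
```

Decorating a function that talks to an external service with `@record_once("cassette.json")` means the service is contacted on the first call only. Deleting the file forces a fresh recording, much like deleting a cassette does.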

Want to try your hand at this cool stuff? You can follow https://github.com/OpenHumans/open-humans-api/blob/master/ohapi/tests/test_api.py. The link contains a lot of examples of tests written using VCR.py. You can see the bigger picture by looking at any function in https://github.com/OpenHumans/open-humans-api/blob/master/ohapi/api.py and how tests and cassettes are created for each function.

I would like to thank Mad Price Ball for mentoring me in the best way possible. The meetings with them are fun where I learn a lot of new stuff and a new way to see things, to tackle an issue with a fresh perspective.

Apart from this, I also worked on testing the public functions of the Open Humans API. Presently, I am working on creating a reusable Django app for Open Humans. A lot of brainstorming goes into designing an application: its use cases and the kind of audience it will cater to.

Reusable applications are applications that are meant to be used by other applications. After all, reusability is the way of life in Python. I will cover the reusable app topic in my next blog post. Till then, stay tuned. 🙂

Help us resurrect OpenPaths!

July 6, 2018 | Hiring, Open Humans | Mad Price Ball

Four days ago Facebook announced it is killing Moves, a smartphone app that lets you collect your continuous GPS data. People could download their location data, donate it to research, and connect it to other apps.

Soon Moves will be dead – and there is no obvious replacement. But there could be.

OpenPaths is a similar tool, developed seven years ago by a team at the New York Times Research & Development Group. It had an ethos that matched our own – it empowered its users, gave them access to their data, and the ability to share with projects. The NYT team handed OpenPaths to an academic group at UCSD. And late last year, UCSD gave it to us.

OpenPaths was like Open Humans before we existed – and by some amazing act of fate, we have inherited it. It could be something better than Moves: a nonprofit, open source tool that strives to empower the community that maintains it.

But OpenPaths is broken, and we need help to fix it!

We need an iOS developer, and an Android developer. We’re also seeking a full time Django developer for our main site, who might also help build a new OpenPaths server.

Please help and spread the word! Come chat with us in our community Slack, and share our jobs page to help us find developers that can help.

Outreachy: My journey so far

June 7, 2018 | Outreachy intern updates | Manaswini Das

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Manaswini Das’s post about their way to Outreachy and their first two weeks as an Outreachy intern:

Open-source… I was a bit oblivious to this term until a year ago, when I was not familiar with this new world of outstanding work done by millions across the globe. It’s been a year now and the journey has been more than rewarding.

My journey started with contributions to repositories as a part of Hacktoberfest 2017. I got a limited edition Hacktoberfest T-shirt too, as promised. The thought of contributing to something that will be utilized by the world intoxicated me and inspired me to dive deep into this. I started looking for other ways to find repositories that kindled my interest.

Going through several blog posts over the internet, I came across Outreachy, an open source internship program for people from marginalized groups. I had applied for Winter term 2017 (Round 15). But then, it was already nearing the deadline when I started contributing. So, I knew I stood a slim chance of getting accepted.

This time, I didn’t commit the same mistake. Once Round 16 was announced, I started exploring organizations and projects. I concentrated on ‘Adding data sources to Open Humans’ project under Open Humans Foundation and began my contributions right away! Two months hence, I found my name among the accepted interns. I am overwhelmed and looking forward to making the most of this internship period.

For those who haven’t come across this open source internship program, let me enlighten you.

What is Outreachy all about?

Well, Outreachy is an open source internship program for individuals belonging to groups traditionally underrepresented in technology. This program is similar to Google Summer of Code, except that it is not limited to students and it happens twice a year, with cohorts running May through August and December through March.

For those who want to probe deeper into this, find the details here.

Now, some tips for the ones preparing for the upcoming rounds:

Start early

To make it into the internship program, you should begin as soon as possible. It takes time to comprehend the code base. It may seem intimidating at first but select your projects wisely.

In case you are not comfortable with a project even after contributing, you still have time and liberty to switch to projects matching your interest. Keep in mind that you can apply to a maximum of two projects.

Subscribe to the announcement mailing list here.

Ask questions

In case you have questions, don’t hesitate to ask questions to your mentors. Don’t be afraid of being judged.

Asking questions doesn’t reveal your ignorance. It is a sign that you are learning.

Don’t be shy. Shed your cocoon and feel free to ask even the silliest of questions. But remember, do your research too. Try to work out the problem on your own first. If you are still stuck, then reach out to the community. You never know, it might be a bug!

Don’t doubt your abilities

If you think you don’t fit in, then, trust me you are the right person to apply for this internship program. You won’t be able to explore this new you unless you do it.

Be consistent

Don’t aim at a huge last-minute contribution. Make small but consistent contributions till the end of the application period. This creates a good impression.

Another golden tip: In case you are not into contributions for some time, be in touch with your mentors. Discuss your ideas about the project and know more about the organization.

Imposter Syndrome

At times, you may feel that you could achieve everything only due to luck and that you lack potential.

You may also feel that you won’t be able to make it even after you get accepted. Well, my friend, you are suffering from the imposter syndrome. This happens when you focus on the big picture of what you are trying to do in a project. To overcome this, follow the divide and conquer rule.

Have faith in yourself. Don’t let the imposter syndrome grip you.

Proposal time

This is the final dash to the race. Discuss your ideas with your mentors and come up with a suitable timeline. Work out your schedule and make sure your proposal is precise. Submit your proposal for review to your mentors. Trust me, your proposal will get better with each review. And yes, don’t wait till the last minute for this.

Updates

It’s been more than two weeks into this internship period now. I am working on adding Github and Twitter API integrations under the mentorship of Mike Escalante. In these first weeks, I have been getting familiar with the codebase and the workflow that is to be followed for the integrations, taking some help from the existing integrations. Apart from that, I have been exploring the Github API and setting up the app on Heroku.

My mentor, Mike, has been very supportive and encouraging throughout, checking in almost every day and clearing all my doubts in a jiffy.

What’s next?

I’m planning to get the Github integration up and running by this week and then, I will be working on creating data explorations of this integration for the next two weeks.

I’ll be coming up with the technical details of this Github integration in the upcoming posts.

Cheers!

Open Humans + Outreachy : Weeks 1-2

June 6, 2018 | Outreachy intern updates | Rosy Gupta

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Rosy Gupta’s post about their first two weeks as an Outreachy intern:

What is Outreachy? What are the do’s and don’ts for your application if you are interested? How did I find such an apt gig? What have I been and will be doing this non-vacation summer? Read on to find it out.

Outreachy is an amazing opportunity for underrepresented folks in Open Source Software Development. It allows you to work with tech organizations through a remote internship. It is somewhat similar to (the more heard of companion) Google Summer of Code (GSOC) but Outreachy happens twice a year and you don’t have to be a student to be eligible for it. Like most people, I didn’t know about Outreachy until I heard about it from former interns at an open source meetup. I was delighted to know that a remote and paid internship existed for non-students – seemed like an interesting way out to spend summers at home before starting my Masters in the fall.

So how do you get in?

**Decide that you really really want to go for it**

Getting onboard with Outreachy isn’t an overnight thing. You need to be involved with the organization that you intend to work with for a couple of months (hard truth). I started making contributions for Open Humans in February itself. Read up about the organization and the project nicely and THINK if you’d actually be able to spend your summer doing that. My fellow intern, Tarannum has some really good points on the organization and project selection in her blog post. Check it out here.

Code Communicate Sleep Repeat

Start with small contributions, maybe even trivial bugs, and you’ll gradually be able to make a major, impactful one as you get the hang of the code and the language. It’s good to raise your doubts in the common group (there’s no such thing as a stupid question). Having said that, it’s equally important to make a sincere effort before poking mentors and the community with low-hanging-fruit kind of questions. The mentors in my case were damn helpful and pretty quick in solving our doubts, reviewing the code and merging the pull requests. Thanks for all the sweet help: Bastian Greshake Tzovaras (my mentor), Mad Price Ball and Mike Escalante.

Show Time – The Proposal

Unless you know about the project well, it will be difficult to come up with improvement suggestions for the project. Last minute stint usually doesn’t work – so it’s good to start with your proposal application ahead of the deadline. Keep it succinct.

After being chosen from a competitive pool of applicants, now, I am working with Bastian Greshake Tzovaras on Creating a stand-alone web application to manage and administer projects on Open Humans using Django. For the next three months, I will be adding some new features and enhancing the user experience for this project management application.

The first two weeks of my internship have flown by. I spent them going through unread pieces of code in the repo. Here, I learned that I need to comment the code a lot and since the project is in its infant stage, this quote would be a handy reminder 😀

“Always code as if the guy who ends up maintaining your code will be a violent psychopath and knows where you live”

The application uses Django framework, so I’ve been trying to get my head around Python lately. One of the initial weird things was HTML with a bunch of curly brackets containing Python code. This turned out to be the templating engine, Jinja. I have also been learning more about designing the dashboards to deliver a good user experience. The work is giving me the opportunity to sharpen my Git skills too and I’m learning to make NEAT git pulls now.

The first few weeks have mostly been trying to fit in the remote work setting and understanding the timeline of the project. Luckily, my wonderful mentor, Bastian has been great putting my nerves at ease. He’s always encouraging me to communicate often (the key to remote work) and is quick with the doubt-solving sessions. Despite our contrasting time zones, it has been a smooth sail so far and his guidance has been really valuable.

My upcoming task is to work on building annotations for the dashboard. This would make the user experience more interactive. I’d also be working on adding a feature to download files in customizable ways.

I’d include more about my first two weeks’ work in the next blog post. Well yes, we need to blog every two weeks as a part of the internship. The good part is I’m writing my first blog post ever! Need more motivation? Hit me up 😀

Good luck!

The road to Outreachy

June 4, 2018 | Outreachy intern updates | Tarannum Khan

We are happy that Open Humans will have four Outreachy interns this summer. Our interns are working on their own Open Humans related projects and will regularly blog about their internship experience. Read Tarannum Khan’s first post about how they came to join Open Humans as an Outreachy intern:

Four months ago, my amazing journey with Outreachy started. Ever since, I have seen tremendous growth in myself, not only in open source development but also in my will to continue and succeed, and in my ability to interact easily with a new bunch of people and learn from them.

Just to clear the fog, Outreachy is a great program for people traditionally underrepresented in tech who are interested in open source development and don’t know where to start or how to try their hand in this area. This is the right place to get started. The Outreachy community provides immense support for beginners like you and me and nurtures our development skills by providing a superb platform to work in a collaborative environment with the mentors and other Outreachy participants.

I am working with the Open Humans organization and my project is “Writing a Python module for the Open Humans API & a self-contained, modular Django app”. The Open Humans community has been very supportive throughout. I would like to thank Mad Price Ball (mentor), Bastian Greshake Tzovaras and Mike Escalante for always being there to clear my doubts. And the best thing about them is that they have been a major support in this process of learning by sharing their development knowledge, which is pretty cool.

I got to know about Outreachy through my friends of my institute and took my first big step in this area.

START(very important)

Organisation and project selection

Select the perfect organisation with the perfect project and the perfect language to work on (but life is not so perfect). So let’s get practical. On the Outreachy website, once the projects are announced, look at each and every project of every organisation. Take a look at the repository they provide. Beginners should check the issues tagged “beginners” or “good first issue”. If an issue looks understandable after you have had a look at the repository, you are good to go and start contributing to it.

Initially you can contribute to two or three repositories, but with time you will know which one to really pick and focus on. Every mentor has a different interaction style, which is mentioned on the Outreachy website, so you will get an idea of how to reach the mentors. I suggest that your decision to choose a project should depend mainly on the project repository and its issues, and then on the language. Don’t hesitate to pick a language in which you are not very comfortable, but you should know the basics at least; rest assured you will learn the rest with this amazing community.

Bug fixing

Once the project is selected, take an issue, if you think you can do it, get it assigned and start working. If you are having any trouble feel free to either raise your doubts in the organisation forum or you can contact mentors. Someone will be always there to git pull you ahead from where you were stuck. You would love the feeling when your first pull request will be merged. 🙂

Patience is the key

Sometimes things might get a little tough as you are a new player in this field. Just keep up with patience, keep reading documentations, keep discussing with your mentors, keep coding and you will solve the issue(BA-AM). After solving each issue from beginner friendly to moderate to hard level, your self-confidence will be boosted immensely. In this great learning curve, you will learn a lot of new and interesting development stuff and your stamina to read documentations will increase drastically. (Life of a developer: Birth, find bug, read documentation, code, death)

Outreachy proposal

Now it’s time for the big show: PROPOSAL. Think clearly and meticulously about the project, the problem it’s trying to solve and make a clear plan of how you will solve the problem with the proper timeline and technical details. You can see a link to my proposal here. Start at least two weeks before the deadline to write the proposal.

Done and dusted. After the Outreachy application period, keep on contributing. Whether you get selected or not, the road to Outreachy is totally amazing. Your knowledge and confidence will get an immense boost. And maybe not immediately, but definitely, you will get a project to work on with the Outreachy community, as the Outreachy programme is held in winter as well as in summer, so just keep on coding. I was lucky enough to get selected on my first attempt and work with Open Humans. Just be ready to give your time and energy to Outreachy and keep on working. 😀

Some pre-internship suggestions will be to learn git. It really helps and will save a lot of your time. The earlier you start contributing, the better your chances will be to spend your summer or winter non-vacation.

Feel free to contact me at tkhaniitr@gmail.com. I will be there to clear the doubts and even interact with you, given that I am not that boring 😀.

Keep your enthusiasm up and keep coding.

Meet Andrew Riha, our next project grant awardee

March 23, 2018 | Open Humans | Bastian

Today we’re introducing Andrew Riha who recently was awarded one of our project grants for his tool lineage. With lineage Andrew will make the genetic data you store on Open Humans even more useful, by enabling Ancestry analyses!

Hey Andrew, please give our blog readers a quick introduction about who you are!

I’m a systems engineer at an aerospace company in Southern California. I studied at Iowa State University, the University of Newcastle, and Delft University of Technology, and I have a B.S. and M.S. in computer engineering. A few years ago, I became interested in direct-to-consumer DNA testing after a friend told me about his experience with 23andMe. This interest developed into a passion, and I’m currently pursuing a graduate certificate in bioinformatics. My hobbies include running, traveling, and backpacking.

When and how did you come to Open Humans?

Director of Research, Bastian, introduced me to the Open Humans platform in early 2018. I had mentioned to Bastian that I wanted to turn my hobby open source Python project lineage into a web app, so he suggested I consider applying for a project grant.

Have you been involved in any projects on Open Humans so far, either as a participant or even running your own?

This is my first project with Open Humans. I’m looking forward to learning from others and further developing and integrating lineage into the Open Humans ecosystem as a great open source web app!

Your project lineage was awarded one of the Open Humans project grants. Can you explain to us what the project is about?

lineage is a framework for analyzing genotype files (e.g., raw data files from 23andMe, Ancestry, etc.), primarily for the purposes of genetic genealogy and ancestry analysis. It can identify DNA and genes shared between individuals, and it provides other useful capabilities such as merging raw data files from different testing companies, identifying discrepant and discordant SNPs, and remapping SNPs to different assemblies / builds.
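To give a feel for what merging files and flagging discrepant SNPs can mean, here is a rough sketch with plain dictionaries. It is not lineage's API: the function name `merge_genotypes`, the `{rsid: genotype}` layout and the choice to treat `AG` and `GA` as the same call are all assumptions made for illustration.

```python
def merge_genotypes(first, second):
    """Merge two {rsid: genotype} dicts and list SNPs whose calls disagree.

    Returns (merged, discrepant_rsids). On a conflict the value from
    `first` is kept and the rsid is reported as discrepant.
    """
    merged = dict(first)
    discrepant = []
    for rsid, genotype in second.items():
        if rsid not in merged:
            # SNP only present in the second file: add it.
            merged[rsid] = genotype
        elif sorted(merged[rsid]) != sorted(genotype):
            # Same SNP, different call (allele order is ignored by assumption).
            discrepant.append(rsid)
    return merged, discrepant
```

For example, merging `{"rs1": "AG", "rs2": "CC"}` with `{"rs1": "GA", "rs2": "CT", "rs3": "TT"}` keeps all three SNPs and reports only `rs2` as discrepant.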

How did you come up with the idea behind lineage?

After my friend told me about his experience with 23andMe, I started researching how to get tested and found the International Society of Genetic Genealogy’s wiki very helpful and informative. The wiki led me to an excellent paper by Whit Athey that discussed using genotype files to phase the chromosomes of a family group and “reverse engineer” the DNA of a missing parent in the process! So, for a CS50 final project, I challenged myself to implement Whit’s algorithm in Python, using scientific libraries and vectorized programming in order to efficiently handle and analyze the large datasets involved.

The initial algorithm implementation was successful, and lineage had begun. But, I soon realized the need for other capabilities, such as comparing / merging files from different testing companies and determining what DNA is shared between individuals so that it could be used to guide the phasing algorithm. So, lineage grew into the framework that exists today, and I eventually want to return to implementing Whit’s algorithm, applying the bioinformatics and visualization concepts that I’ve learned along the way.

Is there anything important that we didn’t cover so far that you’d like to add?

lineage wouldn’t have been possible without the knowledge and help graciously provided by so many people. It is in that spirit that I’d like to encourage others to create and contribute to open source projects – sharing your ideas and passions with the world can be a very rewarding endeavor!

Oh, and thanks Mom, Dad, grandmas, and grandpas for the genes. 🙂