Welcome to the Cancer Immunotherapy Data Science Grand Challenge!

Killer T cells (green and red) surround a cancer cell (blue, center). Credit: Alex Ritter, Jennifer Lippincott-Schwartz and Gillian Griffiths, National Institutes of Health.

The future of cancer care is immunotherapy — using our own body’s immune system to eliminate tumors. While T cells, our immune system's fighter cells, should, in theory, recognize and kill growing tumors, cancer cells send signals to T cells that cause the T cells to malfunction and fail to control tumor growth. But what if we could modify individual genes in T cells to stop this process — and transform T cells into tumor destroyers? While scientists have made breakthroughs in cancer immunotherapy and T cell engineering in the last two decades, the problem is that there are 20,000 individual gene modifications, or “perturbations,” researchers could make to affect T cell function. Experimentally testing so many perturbations — and combinations of perturbations — in the lab would be too costly and time-consuming.

That's why the Eric and Wendy Schmidt Center at the Broad Institute, Harvard Laboratory for Innovation Science, and other partners are holding a data science challenge to bring together the machine learning community to develop algorithms that identify the best genetic changes in T cells to prevent malfunction and enable tumor killing.

Topcoder will host a series of challenges with specific problem statements and acceptance criteria to iterate upon this problem. We will need machine learning specialists to help:

  1. Use a training set of T cells with experimentally characterized perturbations to predict the effects of unseen, held-out perturbations.
  2. Propose the best individual gene perturbations (among all 20,000 possibilities) to prevent T cell malfunction and enable tumor killing.
  3. Propose a quantitative metric for ranking the efficacy of these proposed perturbations.

We will experimentally validate the predictions from question 2, choosing perturbations based on the top-scoring submissions from question 1 and expert discussion.
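
Task 1 above asks for predictions on perturbations that never appear in training. One way to mimic that setup locally is to split cells by perturbation label rather than at random, so that every cell carrying a held-out perturbation lands in the test set. The snippet below is a minimal sketch of that idea, not the challenge's official evaluation procedure. The function name `split_by_perturbation`, the `held_out_fraction` parameter, and the gene labels are hypothetical placeholders and do not reflect the actual dataset format.

```python
import random
from collections import defaultdict


def split_by_perturbation(labels, held_out_fraction=0.2, seed=0):
    """Split cell indices so no perturbation appears in both train and test.

    labels: one perturbation label per cell (hypothetical format).
    Returns (train_indices, test_indices, held_out_perturbations).
    """
    # Group cell indices by their perturbation label.
    cells_by_perturbation = defaultdict(list)
    for index, perturbation in enumerate(labels):
        cells_by_perturbation[perturbation].append(index)

    # Shuffle the unique perturbations reproducibly and hold some out.
    perturbations = sorted(cells_by_perturbation)
    random.Random(seed).shuffle(perturbations)
    n_held_out = max(1, round(len(perturbations) * held_out_fraction))
    held_out = set(perturbations[:n_held_out])

    # Assign each cell to train or test based on its perturbation.
    train = [i for i, p in enumerate(labels) if p not in held_out]
    test = [i for i, p in enumerate(labels) if p in held_out]
    return train, test, held_out


# Toy labels for illustration only; these are not real challenge data.
labels = ["geneA", "geneA", "geneB", "geneC", "geneC", "geneD", "geneE", "geneB"]
train_idx, test_idx, held_out = split_by_perturbation(labels, held_out_fraction=0.4)
```

Splitting by group rather than by cell avoids leaking information about a held-out perturbation into training, which would otherwise make local scores look better than performance on truly unseen perturbations.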

Sponsored by Saturn Cloud


The dataset is hosted by Saturn Cloud and will be available for download. Each participant has the option to use the Saturn Cloud computing environment, which provides a Python environment and 100 free hours of compute per participant. Message Saturn Cloud support and say "I'm competing in the Cancer Immunotherapy Data Science Grand Challenge", and you'll be upgraded from the standard free tier to 100 hours of compute! Click the link below to get started with the competition.

Run in Saturn Cloud