PatientsLikeMe: Data Scientist or Senior Data Scientist - Biostatistics
160 Second Street
Cambridge, MA 02142


PatientsLikeMe is a patient network that improves lives and a real-time research platform that advances medicine. Through the network, patients connect with others who have the same disease or condition and track and share their own experiences. With more than 700,000 members representing hundreds of disease conditions, PatientsLikeMe is a trusted source for real-world disease information and a clinically robust resource that has published more than 100 peer-reviewed research studies.

The PLM Translational Science team is developing a state-of-the-art Advanced Research Platform that supports the modeling and analysis of holistic longitudinal health data derived from blood samples (multi-omics: proteomics, metabolomics, RNA, etc.) and self-reported patient phenotypic data (e.g., symptoms, environmental factors). This is an opportunity for the right candidate to contribute to the early stages of a growing science and technology organization whose work in creating biological models will accelerate the growth of biological knowledge and contribute to an improved health care system by helping patients understand their own biology.

The Senior Data Scientist - Biostatistics will play a key role in developing the methodologies we will incorporate into our Advanced Research Platform to analyze and model our rich set of patient health data. You will report to the Director of Data Science within the Translational Science organization. You will collaborate with other data scientists, computational biologists, researchers, bioinformaticists, and data engineers to determine and understand analysis requirements; discover, develop, and evaluate methods for generating insights and biomarkers from phenotypic and biological data; build, document, and manage our library of statistical and modeling methods; and integrate new methodologies into our Advanced Research Platform.

We are seeking candidates who possess:

  • experience in and passion for generating insights from health data
  • intuition, creativity, and appreciation for the health questions answerable from our data
  • a deep understanding of statistical analysis and modeling techniques as applied to health and biological data
  • proficiency in writing code that supports analytics efforts (e.g., exploratory research, feature selection, data wrangling)

Responsibilities:

Collaborate with scientists to determine analysis requirements

  • Identify health / biological problems of interest
  • Propose methods to apply to each problem
  • Determine performance metrics and proper interpretation of results

Extend our computational biology analysis and modeling library

  • Identify and curate promising new statistical methods from available toolboxes, literature, or original research
  • Evaluate new methods applied to appropriate data in terms of performance metrics and relevance to health / biology
  • Document results and package the methods for use in our Advanced Research Platform

Generate datasets for evaluation/testing

  • Be self-sufficient in wrangling and transforming data acquired from various sources for subsequent analysis and modeling (e.g., our internal data warehouse, publicly available external data, simulated data)
  • Provide feedback for improving data collection and processing

Support processes for model development and deployment

  • Cooperate with the Science team to version and track our work and results: analysis requirements, code, feature definitions, and data
  • Work with engineers to deploy methods to the Advanced Research Platform

Desired Skills and Experience:

  • Advanced degree with significant focus on statistical analysis and modeling, OR
  • 3-5 years relevant professional experience, OR
  • A convincing combination of the two

Scientific Background

  • Experience applying statistical methods to scientific analysis problems, OR
  • Prior experience in a scientific career or field of study, OR
  • Deep personal interest in science and eagerness to collaborate with scientists

Statistics / Mathematics

  • Expertise with standard statistical techniques and analysis methods (classification, regression, clustering, dimension reduction); machine learning experience highly desirable
  • Track record of applying analytical methods to answer real-world questions (health/medical/biology domains strongly preferred)
  • Ability to read and understand statistical research papers, then implement new methods in code
  • Ability to grok the theory behind both the standard biostatistics toolbox and new, unfamiliar algorithms
  • Ability to invent or discover novel methods / algorithms as needed for new applications

Programming / Software

  • Advanced proficiency in R (strongly preferred) or another statistical software package
  • Solid SQL skills
  • Proficiency with Python (very nice to have)