Data Scientist

Soley Therapeutics • Full-time • South San Francisco, CA, US • 1w ago

Data Scientist – Omics & Systems Biology

Soley is looking to hire a talented bioinformatics data scientist who is self-motivated, collaborative, and driven, with a strong discovery mindset. The individual will develop and deploy analytical and bioinformatics pipelines to build the discovery and drug development platform.

Responsibilities

Develop and deploy advanced computational pipelines in Python and R to process and analyze high-throughput biological and chemistry data to support Soley’s proprietary discovery platform.
Establish scalable and reproducible workflows for both proprietary and public datasets, ensuring robustness, efficiency, and traceability.
Perform comprehensive data preprocessing, including cleaning, transformation, normalization, and quality control of diverse omics datasets (e.g., RNA-seq, proteomics, genomics).
Develop and optimize methods to integrate multi-omics data to uncover mechanistic insights, identify biomarkers, and support mechanism-of-action (MOA) hypotheses.
Apply and develop mathematical and statistical models (e.g., classification, regression, clustering, dimensionality reduction) for tasks such as patient stratification and drug response prediction.
Communicate findings effectively to diverse stakeholders, including experimental scientists, executives, and AI/ML team members.
Collaborate cross-functionally with interdisciplinary teams, including automation, science, and machine learning experts, to scale and optimize analytical workflows.
Stay current with advances in bioinformatics, data science, and computational biology, and incorporate innovative methods to improve existing capabilities.
Document analytical workflows.

Required Qualifications

5-7 years of experience in data science, bioinformatics, or biostatistics in life sciences or biotech.
Ph.D. in bioinformatics or computational biology.
Domain expertise in running computational pipelines and bioinformatics tools for at least one of the following omics fields: genomics, transcriptomics, or proteomics.
Demonstrated experience in the analysis of high-throughput biomedical data, including data collection, management, cleaning, and normalization of omics datasets.
Proven ability to apply and develop computational tools for interpreting, analyzing, and visualizing complex, multivariate biological data.
Strong proficiency in statistical and machine learning methods, including parametric and non-parametric statistics, linear and multiple regression, classification algorithms (e.g., Random Forest, SVM, XGBoost), neural networks, and dimensionality reduction techniques.
Experience with systems biology approaches for integrative analysis of multi-omics or other high-dimensional datasets.
Demonstrated experience in algorithm development and prototyping for high-dimensional data analysis.
Familiarity with version control systems (e.g., Git) and workflow management tools (e.g., Snakemake).
Proficiency in Python and R for statistical computing, data manipulation, and visualization.
Experience in developing novel tools or statistical methods for large-scale data analysis is a plus.

Excellent organizational and critical-thinking skills, with clear written documentation and strong verbal communication.