Data Scientist – Omics & Systems Biology
Soley is looking to hire a talented bioinformatics data scientist who is self-motivated, collaborative, and driven, with a strong discovery mindset. The individual will develop and deploy analytical and bioinformatics pipelines to build Soley's discovery and drug development platform.
Responsibilities:
- Develop and deploy advanced computational pipelines in Python and R to process and analyze high-throughput biological and chemical data in support of Soley’s proprietary discovery platform.
- Establish scalable and reproducible workflows for both proprietary and public datasets, ensuring robustness, efficiency, and traceability.
- Perform comprehensive data preprocessing, including cleaning, transformation, normalization, and quality control of diverse omics datasets (e.g., RNA-seq, proteomics, genomics).
- Develop and optimize methods for integrating multi-omics data to uncover mechanistic insights, identify biomarkers, and support mechanism-of-action (MOA) hypotheses.
- Apply and develop mathematical and statistical models (e.g., classification, regression, clustering, dimensionality reduction) for tasks such as patient stratification and drug response prediction.
- Communicate findings effectively to diverse stakeholders, including experimental scientists, executives, and AI/ML team members.
- Collaborate with interdisciplinary teams, including automation, science, and machine learning experts, to scale and optimize analytical workflows.
- Stay current with advancements in bioinformatics, data science, and computational biology, and incorporate innovative methods to improve existing capabilities.
- Document analytical workflows.
Required Qualifications:
- 5-7 years of experience in data science, bioinformatics, or biostatistics in the life sciences or biotech industry.
- Ph.D. in bioinformatics or computational biology.
- Domain expertise in running computational pipelines and bioinformatics tools for at least one of the following ‘omics fields: genomics, transcriptomics, or proteomics.
- Demonstrated experience in the analysis of high-throughput biomedical data, including data collection, management, cleaning, and normalization of omics datasets.
- Proven ability to apply and develop computational tools for interpreting, analyzing, and visualizing complex, multivariate biological data.
- Strong proficiency in statistical and machine learning methods, including parametric and non-parametric statistics, linear and multiple regression, classification algorithms (e.g., Random Forest, SVM, XGBoost), neural networks, and dimensionality reduction techniques.
- Experience with systems biology approaches for integrative analysis of multi-omics or other high-dimensional datasets.
- Demonstrated experience in algorithm development and prototyping for high-dimensional data analysis.
- Familiarity with version control systems (e.g., Git) and workflow management tools (e.g., Snakemake).
- Proficiency in Python and R for statistical computing, data manipulation, and visualization.
- Experience in developing novel tools or statistical methods for large-scale data analysis is a plus.
- Excellent organizational and critical-thinking skills, along with strong written and verbal communication skills, including clear written documentation.