No C2C. Sponsorship is typically not available. This is a direct-hire role with no contract period. You must be eligible and authorized to work in the US; visa transfers are typically an option.
--
A global energy infrastructure company is seeking a highly skilled Data Scientist to design, build, and deploy machine learning and optimization models in a distributed computing environment. This role involves full lifecycle model development—from data preparation to deployment—and requires a strong foundation in data engineering, Python programming, and Spark (PySpark).
You’ll work in close partnership with technical and business stakeholders to ensure that data solutions provide measurable impact. This is an onsite role based in Arlington, VA, reporting to the Director of Business Intelligence, and embedded within the organization’s enterprise applications team.
Key Responsibilities:
- Own and execute end-to-end data science workflows, from requirements gathering to model deployment and evaluation.
- Transform, clean, and join large datasets to make them suitable for modeling—no data, no model.
- Build, test, and deploy machine learning, simulation, and optimization models that drive real business value.
- Deliver production-ready code that is modular, scalable, and minimizes technical debt.
- Engage with subject matter experts to understand and solve targeted operational or strategic problems.
- Contribute to the design of automated data science pipelines and continuous improvement of infrastructure.
Qualifications:
Required:
- Bachelor's degree in a quantitative or technical field with 2+ years of experience, or a Master's degree in Data Science or a related discipline.
- Proficiency in Python and deep familiarity with one or more data manipulation libraries (e.g., Pandas, Polars, Spark).
- Experience preparing data for modeling, including ingestion, cleansing, transformation, aggregation, and profiling.
- Ability to work autonomously, communicate technical concepts clearly, and manage multiple tasks in a fast-paced setting.
Preferred:
- Hands-on expertise in Spark (PySpark) using the Spark SQL API (not Pandas-on-Spark).
- Ability to build data visualizations using libraries such as Seaborn, Plotly, Matplotlib, and Altair.
- Familiarity with functional programming paradigms, unit testing, and software engineering best practices.
- Experience with simulation modeling (e.g., system dynamics, agent-based modeling, or discrete event simulation).
- Exposure to distributed ML/DL libraries (e.g., SparkML, PyTorch).
- Familiarity with open-source Python optimization frameworks.