Job description

Posted 06 March 2024

Data Engineer - Python, Google Cloud

6 months

£700 - £780 per day (via Umbrella)

London (Stanley Building, King's Cross) - hybrid working

We see a world in which advanced applications of Machine Learning and AI will allow us to develop novel therapies for existing diseases and to respond quickly to emerging or changing diseases with personalized drugs, driving better outcomes at reduced cost with fewer side effects. It is an ambitious vision that will require the development of products and solutions at the cutting edge of Machine Learning and AI. We’re looking for a highly skilled data engineer to help us make this vision a reality.

Strong candidates will have a track record of shipping data products derived from complex sources, owning the process from conceptual data pipeline through to production scale. We are committed to quality, so successful candidates will be able to use modern cloud tooling and techniques to deliver reliable data pipelines and to continuously improve them.

This role requires a passion for solving challenging problems aligned to exciting Artificial Intelligence and Machine Learning applications. An educational or professional background in the biological sciences is a plus but is not necessary; a passion for helping to develop therapies for new and existing diseases, and a pattern of continuous learning and development, are mandatory.

Key responsibilities

• Build data pipelines using modern data engineering tools on Google Cloud: Python, Spark, SQL, BigQuery, Cloud Storage

• Deliver high-quality software implementations according to best practices, including automated test suites and documentation

• Develop, measure, and monitor key metrics for all tools and services and consistently seek to iterate on and improve them

• Participate in code reviews, continuously improving your own standards as well as those of the wider team and product

• Liaise with other technical staff and data engineers in the team and across allied teams to build an end-to-end pipeline that consumes other data products

Basic qualifications:

• 2+ years of data engineering experience with a Bachelor’s degree in a relevant field (including computational, numerate or life sciences), or equivalent experience

• Cloud experience (Google Cloud preferred)

• Strong Python and SQL skills, with industry experience

• Unit testing experience (e.g. pytest)

• Knowledge of agile practices and the ability to perform in agile software development environments

• Strong experience with modern software development tools / ways of working (e.g. git/GitHub, DevOps tools for deployment)

Preferred qualifications:

• Demonstrated experience with biological or scientific data (e.g. genomics, transcriptomics, proteomics), or pharmaceutical industry experience

• Knowledge of NLP techniques and experience processing unstructured data, using vector stores and approximate retrieval

• Familiarity with orchestration tooling (e.g. Airflow or Google Workflows)

• Experience with AI/ML powered applications

• Experience with Docker or containerized applications