Senior Data Engineer
£545.00 per day Umbrella
London (Hybrid)
6 Month Contract
Our client is currently searching for a Senior Data Engineer to join their team in London. If you are interested in this role, please do not hesitate to apply!
In this role you will:
• Execute the migration of raw and derived unstructured datasets (images, videos, etc.) between on-prem and cloud data locations (e.g. GCP, Azure, AWS). Dataset sizes range from small scale (GB) up to large scale (TB).
• Ensure consistency between the data ingested and the data manifests.
• Organise raw and derived data into appropriate hierarchies.
• Collaborate with AI/ML engineers and product managers to:
o Develop data pipelines for incoming batch data and update existing pipelines where necessary.
o Design and implement well-decoupled, modular, reusable and scalable scripts and code for the retrieval and pre-processing of large-scale histopathology images (each on the order of gigabytes) into the AI/ML pipeline.
• Document data flows, ingestion pipelines, and data use and re-use.
• Implement data flows connecting operational systems, analytics data and business intelligence (BI) systems (e.g. Power BI).
• Ensure completion of the requisite documentation, i.e. the ingestion form and any related IHD documentation.
• Track and report data migration progress to AIML and Onyx stakeholders, and raise any blockers preventing migration.
[Non-comp path requirements]
• Migrate ML pipelines from on-prem HPC solutions to the cloud.
• Migrate ML pipelines between cloud environments and across cloud computing providers.
• Optimise and parallelise these ML pipelines for scalability, speed and cost efficiency.
We are looking for professionals with the following required skills:
• 5+ years of work experience as a professional data/software engineer.
• Experience with large images and data formats used in computational pathology (e.g. .svs, .tiff, .h5).
• Advanced programming expertise in Python and in developing and delivering robust software solutions.
• Machine learning experience or background.
• CI/CD experience.
• Expert-level industrial experience in the design, development and deployment of data engineering pipelines.
• Experience with cloud platforms such as Google Cloud Platform, Azure or AWS (GCP preferred).
• Experience handling big data at scale.
It would be a plus if you have:
• Expertise in SQL and/or similar database languages.
• Experience with business intelligence platforms, e.g. Power BI.
Please note, due to the high volume of applications we are only able to respond to the successful candidates in the first instance. Thank you for your interest in this role.