Job description Posted 30 June 2020

Data Engineer: Talend, Azure, Power BI, Docker/Kubernetes

Job Purpose

The data engineer, an emerging role in the GSK PSC DDA team, will play a pivotal role in operationalizing data and analytics for GSK’s digital business initiatives. The bulk of the data engineer’s work will be building, managing and optimizing data pipelines and then moving those pipelines effectively into production for key data and analytics consumers (such as business/data analysts, data scientists or any persona that needs curated data for data and analytics use cases).

Data engineers also need to guarantee compliance with data governance and data security requirements while creating, improving and operationalizing these integrated and reusable data pipelines. This enables faster data access, integrated data reuse and vastly improved time-to-solution for GSK’s DDA analytics initiatives. The data engineer will be measured on their ability to integrate analytics and/or data science results into GSK’s business processes.

The newly hired data engineer will be the key interface in operationalizing data and analytics on behalf of the [business unit(s)] to deliver organizational outcomes. This role will require creative and collaborative work with Tech and the wider business. It will involve evangelizing effective data management practices and promoting a better understanding of data and analytics. The data engineer will also be tasked with working with key business stakeholders, Tech experts and subject-matter experts to plan and deliver optimal analytics and data science solutions.

Key Responsibilities

  • Build data pipelines: Managed data pipelines consist of a series of stages through which data flows (for example, from data sources or endpoints of acquisition to integration to consumption for specific use cases). These data pipelines have to be created, maintained and optimized as workloads move from development to production for specific use cases. Architecting, creating and maintaining data pipelines will be the primary responsibility of the data engineer.
  • Drive automation through effective metadata management: The data engineer will be responsible for using innovative and modern tools, techniques and architectures to partially or completely automate the most common, repeatable and tedious data preparation and integration tasks in order to minimize manual, error-prone processes and improve productivity. The data engineer will also need to assist with renovating the data management infrastructure to drive automation in data integration and management.

Technical Knowledge/Skills

  • Strong ability to design, build and manage data pipelines for data structures encompassing data transformation, data models, schemas, metadata and workload management. The ability to work with both IT and business in integrating analytics and data science output into business processes and workflows.
  • Strong experience in working with large, heterogeneous datasets in building and optimizing data pipelines, pipeline architectures and integrated datasets using traditional data integration technologies. These should include [ETL/ELT, data replication/CDC, message-oriented data movement, API design and access] and upcoming data ingestion and integration technologies such as [stream data integration, CEP and data virtualization].
  • Hands-on development exposure to Microsoft Azure cloud services: Spark Structured Streaming on Azure, Spark on HDInsight, Azure Stream Analytics, Azure Data Factory (v1 & v2), Azure Blob Storage, Azure Data Lake Store (Gen1/Gen2) and Azure SQL DW, as well as Denodo, Talend and others.
  • Hands-on Talend experience is a MUST.
  • Basic experience working with popular data discovery, analytics and BI software tools like [Tableau, Qlik, Power BI and others] for semantic-layer-based data discovery.
  • Basic experience in working with [data governance/data quality] and [data security] teams and specifically [information stewards] and [privacy and security officers] in moving data pipelines into production with appropriate data quality, governance and security standards and certification.
  • Demonstrated ability to work across multiple deployment environments including [cloud, on-premises and hybrid], multiple operating systems and through containerization techniques such as [Docker, Kubernetes, and others].

Additional information about the process

Join GSK’s vision to do more, feel better and live longer:

Who will I be working with?