Working as part of the Big Data Engineering team, which is responsible for transforming data into useful information for the data science and product teams. Working with Linux systems and Hadoop databases, extracting data from the Hadoop database and ingesting it using Sqoop. Staging real-time data from the gateway into AWS S3 or Azure Blob Storage. Analyzing data using SQL queries and transforming it through stages such as preprocessed, standardized, and filtered. Implementing Spark jobs in Scala, using DataFrames and the Spark SQL API for faster data processing. Using partitions in the Spark session to improve load-time performance. Creating pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions. Importing data and developing Spark streaming pipelines in Java. Work is performed under supervision. Travel and/or relocation to unanticipated client sites is required.
Master's degree in Computer Science, IT, IS, Engineering (any), or a closely related field.
Standard Company Benefits.
Please see the job description.