The data engineer will work within the Data Management Platform (DMP) team and is responsible for developing and maintaining the movement of data from Endurance, Constant Contact and third-party sources, in both batch and near real time, to analytic systems and customer touch point systems such as Genesys and Salesforce. The DMP is built on the Hortonworks Data Platform distribution of the Hadoop file system and uses the Hortonworks Data Flow distribution, featuring Apache NiFi, to move data from enterprise sources and manage data in motion securely and efficiently. The senior engineer will design the extract layer as well as the target API to move data from brand operational systems in near real time and/or in batch.
We are looking for a Big Data engineer who will work on collecting, storing, processing and analyzing huge datasets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You may also be responsible for integrating them with the architecture used across the company.
The candidate will also support or operationalize the work of the Data Science team.
The candidate must have previous experience with database development and data analysis. Extensive experience with Structured Query Language (SQL) or Hive Query Language (Stinger 2.0 or better) is a must. In addition, a strong understanding of NiFi and Spark is essential. The candidate should hold a bachelor's degree in computer science or a related field; a comparable level of work experience is an appropriate substitute for a degree. Specifically, the individual should be familiar with the Apache Hadoop stack, particularly but not limited to Oozie, Pig, Zeppelin and Ambari. Knowledge of sed/awk shell scripting as well as Scala or Java is highly recommended.
The engineer should be highly organized, with skills that include excellent written and verbal communication, problem solving, data analysis and the ability to work alone or with others as needed. Knowing how to work with multi-database environments is integral to this database development position, so prior experience is a plus. The DMP team works with outside technologies and development programs, so the ability to pick up technical skills quickly and adapt to new technologies as they are introduced is imperative.
Database Engineer Tasks:
- Proficient understanding of distributed computing principles
- Proficiency with Hadoop v2, MapReduce, HDFS
- Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
- Experience with Spark is highly recommended
- Experience with integration of data from multiple data sources
- Document work of operational responsibilities
- Create UNIX shell scripts
- Perform Quality Assurance and Analysis
- Develop Java/Python/Scala code
- Develop efficient code in terms of performance and resource utilization
- Implement ETL processes
- Select and integrate any Big Data tools and frameworks required to provide requested capabilities
Basic Non-functional Skills:
- Familiar with Kanban Development
- Ability to quickly adapt to the changing Apache landscape
- Experienced in leveraging Open Source
- Familiar with Source Control Management