Data Engineer

  • 39099
  • Non-Life - Data Science
  • New York, United States
  • Feb 10, 2021
  • Develop, implement, and deploy custom data pipelines powering machine learning algorithms, insights generation, client benchmarking tools, business intelligence dashboards, reporting and new data products.
  • Innovate new ways to leverage large volumes of varied datasets to drive revenue, both by developing new products with the Data Strategy team and by enhancing the delivery of existing products
  • Consume data from a variety of sources & formats
  • Construct and maintain data pipelines that connect databases and other sources to the data lake, using modern ETL frameworks
  • Own the role of data steward for a variety of high value datasets and implement innovative quality assurance practices
  • Establish and implement metadata management standards and capabilities, including lineage mapping
  • Enforce strong development standards across the team through code reviews, unit testing, and monitoring
  • Perform basic data analysis within Jupyter Notebooks to validate the fulfillment of requirements for data pipelines
  • Evangelize data strategy techniques and best practices throughout global strategic advisory
  • Keep up-to-date on the latest trends and innovation in data technology and how these trends apply to the company’s business and data strategy
  • 3-5 years of relevant experience as a data engineer or in a similar role
  • Bachelor’s or master’s degree in data science, computer science or related quantitative field such as applied mathematics, statistics, engineering, or operations research
  • Extensive experience with Spark, Python, and SQL
  • Extensive experience integrating data from semi-structured
  • Experience deploying/maintaining cloud resources (AWS, Azure, or GCP)
  • Knowledge of various industry-leading SQL and NoSQL database systems
  • Experience working in an Agile environment to facilitate the quick and effective fulfillment of group goals
  • Good interpersonal skills for building and maintaining internal relationships, working well as part of a team, and contributing to presentations and discussions
  • Strong analytical skills and intellectual curiosity as demonstrated through academic experience or work assignments
  • Good ability to prioritize workload according to volume, urgency, etc. and to deliver on required projects in a timely fashion
  • Strong understanding of entity resolution, streaming technologies, and ELT/ETL frameworks
  • Ability to articulate the advantages of various cloud and on-premises deployment options
  • Experience with Master Data Management
  • Experience with web scraping and crowd sourcing technologies
  • Familiarity with modern data productivity frameworks and their alternatives such as Databricks, DataRobot, and Alteryx
  • Experience with the MS Azure cloud environment, including ARM template deployments
  • Strong knowledge of CI/CD principles and practical experience with a CI/CD technology (Azure DevOps, GitLab, Travis, Jenkins)