Amazon

Data Engineer, Prime Video Personalization and Discovery Data Platform

Amazon, New York, New York, US, 10261

Description

Are you interested in shaping the future of entertainment? Prime Video's technology teams are creating a best-in-class digital video experience.

As a Prime Video technologist, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people.

We’ll look for you to bring your diverse perspectives, ideas, and skill sets to make Prime Video even better for our customers. With global opportunities for talented technologists, you can decide where a career in Prime Video Tech takes you!

Prime Video is a premium streaming service that offers a vast collection of digital videos, all in one place, so customers can easily find what they love to watch.

Included with Prime: Watch thousands of popular movies and TV shows, including our critically acclaimed Amazon Originals such as the Emmy Award-winning comedies Fleabag and The Marvelous Mrs. Maisel, Tom Clancy’s Jack Ryan, The Boys, Citadel, Mr. & Mrs. Smith, Reacher, the Academy Award-winning Manchester by the Sea and The Salesman, and the Academy Award-nominated The Big Sick and Cold War, plus exclusives, live sports including Thursday Night Football, and licensed content available in more than 200 countries and territories worldwide.

The Prime Video Personalization and Discovery team is building a new Data Platform. The platform will streamline how ML and engineering teams access and use high-quality, consistent customer-level data, both for offline processing and in real time, while adhering to committed SLAs. We will act as guardians and gatekeepers of data, shielding our customers from data issues while collaborating with data sources to address problems at their root. We will own and maintain "golden" datasets: centralized, reliable data assets commonly used across teams to train ML models and deliver a top-notch personalized storefront experience that is unique for every PV end customer.

We will own the Data Registry and ensure that datasets are enriched with comprehensive classification and metadata, making them easily discoverable and accessible, so that ML and engineering teams can focus on deriving value for end customers rather than resolving data inconsistencies. Our goal is to save science and engineering teams time when working with customer-level data, avoid duplicated boilerplate data transformations and quality inspections, reduce IMR costs, and increase experimentation velocity.

Key job responsibilities

Design and build big data pipelines with Spark/Flink that can handle petabytes of data per month.

Build resilient data pipelines with extensive unit and integration tests within a CI/CD development lifecycle.

Participate in an on-call rotation supporting a large number of production data pipelines.

Manage and orchestrate version migrations across the metadata, transform, and storage layers.

Oversee and continually improve production operations, including optimizing data delivery, redesigning infrastructure for greater scalability, code deployments, bug fixes, and overall release management and coordination.

Work closely with Product teams, Data Scientists, Software Developers, and Business Intelligence Engineers to explore new data sources and deliver the data.

Read, write, and debug data processing and orchestration code written in Python, Scala, etc., following best coding standards (e.g., version controlled, code reviewed).

About the team

Our vision is to build a resilient, centralized data platform that distributes and streamlines access to high-quality, easily discoverable customer-level data for offline processing and in real time, enabling PVPD science and engineering teams to focus on ML and engineering innovations that enhance the personalization of PV end-customer experiences.

Basic Qualifications

3+ years of data engineering experience

Experience with data modeling, warehousing, and building ETL pipelines

Experience with SQL

Preferred Qualifications

Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions

Experience building MPP data transforms that process multiple TB per day

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $118,900/year in our lowest geographic market up to $205,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.