Data Engineer, Amazon AGI, AGI Data Services

Amazon | Boston, Massachusetts, United States | December 19, 2024

Posted:	19 Dec 2024
Company:	Amazon
Category:	Data Engineering
Country:	US - United States
State:	MA - Massachusetts
City:	Boston
Zip code:	None

AI is the most transformational technology of our time, capable of tackling some of humanity’s most challenging problems. Amazon is investing in generative AI and the responsible development and deployment of large language models (LLMs) across all of our businesses. Come build the future of human-technology interaction with us.
We are looking for candidates who don’t just think outside the box, but make the box they are in bigger. The future is now. Do you want to be a part of it? Then read on!

We’re looking for a Data Engineer on Amazon’s AGI team to build world-class data platforms and deploy scalable data ingestion tools, with a commitment to fostering the safe, responsible, and effective development of AI technologies. The ideal candidate is an expert in petabyte-scale data ingestion, data processing, data modeling, ETL/ELT design, and business intelligence tools, and passionately partners with the business to identify strategic opportunities where improvements in data infrastructure create outsized business impact. They are a self-starter, comfortable with ambiguity, able to think big (while paying careful attention to detail), and enjoy working in a fast-paced team. The ideal candidate needs to possess exceptional technical expertise with large-scale lakehouses, distributed computing at a scale of thousands of hosts on multiple clusters, Spark, BI systems, and AWS services.

Core Responsibilities

· Design, implement, and support a platform providing ad hoc access to large datasets
· Interface with other technology teams to extract, transform, and load data from a wide variety of data sources using Spark or other state-of-the-art systems
· Implement data structures using best practices for lakehouses
· Model data and metadata for ad hoc and pre-built reporting, meeting the needs of read-, write-, and summary-optimized storage
· Interface with business customers, gathering requirements and delivering complete reporting solutions
· Build robust and scalable data integration (ETL) pipelines using Kotlin, Python, TypeScript, and Spark
· Build and deliver high quality datasets to support business analyst and customer reporting needs
· Continually improve, automate, and simplify self-service data ingestion at scale for customers
· Participate in strategic & tactical planning discussions

Basic Qualifications

- 3+ years of data engineering experience
- Experience with data modeling, warehousing and building ETL pipelines
- Knowledge of batch and streaming data architectures like Kafka, Kinesis, Flink, Storm, and Beam
- Knowledge of distributed systems as they pertain to data storage and computing
- Experience with SQL

Preferred Qualifications

- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions
- Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit