Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training

Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Neuron is the complete software stack for AWS Trainium (Trn1/Trn2) and Inferentia (Inf1/Inf2), our cloud-scale Machine Learning accelerators. This role is for a Senior Machine Learning Engineer on the Distributed Training team for AWS Neuron, responsible for development, enablement and performance tuning of a wide variety of ML model families, including massive-scale Large Language Models (LLMs) such as GPT and Llama, as well as Stable Diffusion, Vision Transformers (ViT) and many more.
The ML Distributed Training team works side by side with chip architects, compiler engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience training these large models using Python is a must. FSDP (Fully Sharded Data Parallel), DeepSpeed, NeMo and other distributed training libraries are central to this work, and extending them for Neuron-based systems is key.
Key job responsibilities
You will lead efforts to build distributed training support into PyTorch and JAX using XLA, the Neuron compiler, and runtime stacks. You will optimize models to achieve peak performance and maximize efficiency on AWS custom silicon, including Trainium and Inferentia, as well as Trn2, Trn1, Inf1, and Inf2 servers. Strong software development skills, the ability to deep dive and work effectively within cross-functional teams, and a solid foundation in Machine Learning are critical for success in this role.
About the team
Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated.
If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. AWS Nitro, ENA, EFA, Graviton and F1 EC2 Instances, AWS Neuron, Inferentia and Trainium ML Accelerators, and scalable NVMe storage are some of the products we have delivered over the last few years.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony.
Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- Bachelor's degree in computer science or equivalent
- 5+ years of non-internship professional software development experience
- 5+ years of programming experience with at least one software programming language
- 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems
- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience as a mentor, tech lead or leading an engineering team
- Experience in machine learning, data mining, information retrieval, statistics or natural language processing ...

Embedded Software Development Engineer, Annapurna Labs

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Networking Software enhances every aspect of our customers’ networking infrastructure and enables all of the benefits of hosting complex workloads in the cloud.
We are looking for software development engineers with a background in networking and a passion for innovation for a brand new initiative within the AWS Elastic Network Adapter team.
The Elastic Network Adapter team enables enhanced networking on several critical AWS EC2 instances, including Accelerated Computing Instances, Compute Optimized Instances, Memory Optimized Instances, and Storage Optimized Instances running on Linux infrastructure.
As a software development engineer on the Elastic Network Adapter team, you will own the architecture and development of features that will revolutionize the EC2 core network and work with a brilliant team of experienced engineers.
Key job responsibilities
- Advanced development of highly scalable and available embedded networking technology
- Cross-functional work with architecture and hardware teams
- Deep dive into networking protocols
- Help to build a wide-scale AWS service
About the team
We're a team of experienced engineers with diverse backgrounds in chip, firmware and embedded software development and deep networking verticals for host-side networking. We all work together on core AWS network infrastructure. Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences.
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- B.S. in Computer Science or related technical field
- 5+ years of professional engineering experience
- 3+ years of experience with programming language: C or C++
- 3+ years of experience in high speed embedded Linux systems ...

Software Development Manager, Neuron Runtime

Annapurna Labs builds high-performance hardware and software solutions used in AWS data centers globally. We’re looking for a software development manager with a focus on runtime development of the Neuron SDK. As a Software Development Manager for the Neuron Runtime team, you will be responsible for leading a strong team of engineers in the design, development, testing, and deployment of the Neuron runtime software, including drivers, tools and testing of future and past ML platforms (pre- and post-silicon). A successful candidate will have an established background in developing customer-facing experiences, strong technical ability, excellent project skills, great communication skills, and a motivation to achieve results in a fast-paced environment. You will be helping to hire and build your team and systems.
Key job responsibilities
- Own the overall systems development life cycle for our existing and future ML platforms
- Manage and execute against project plans and delivery commitments
- Manage the day-to-day activities of the engineering team within an Agile/Scrum environment
- Manage departmental resources, staffing, and mentoring, and build and maintain a best-in-class engineering team
- Report on status of development, quality, operations, and system performance to management
- Work closely with different teams such as HW Design, HW Verification, and ML Software, and make decisions to bring the next ML platforms to AWS
About the team
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry.
As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Why AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship and Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 3+ years of engineering team management experience
- 7+ years of experience working directly within engineering teams
- 3+ years of experience designing or architecting (design patterns, reliability and scaling) new and existing systems
- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations
- Experience partnering with product or program management teams ...

ASIC Design Engineer, Cloud-Scale Machine Learning Acceleration team

Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture.
When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Custom SoCs (System on Chip) live at the heart of AWS Machine Learning servers. As a member of the Cloud-Scale Machine Learning Acceleration team you’ll be responsible for the design and optimization of hardware in our data centers including AWS Inferentia, our custom designed machine learning inference datacenter server. Our success depends on our world-class server infrastructure; we’re handling massive scale and rapid integration of emergent technologies.
We’re looking for an ASIC Design Engineer to help us trail-blaze new technologies and architectures, while ensuring high design quality and making the right trade-offs.
Key job responsibilities
- Integrate multiple subsystems into the top-level SOC and ensure correct clock/reset/functional/DFT signal routing.
- As a key member of the ASIC design team, implement and deliver high-performance, area- and power-efficient RTL to achieve design targets and specifications.
- Analyze design, microarchitecture or architecture to make trade-offs based on features, power, performance or area requirements.
- Develop micro-architecture, implement SystemVerilog RTL, and deliver synthesis/timing clean design with constraints.
- Perform lint and clock domain crossing quality checks on the design.
- Work with architects, other designers, verification teams, pre- and post-silicon validation teams, synthesis, timing and back-end teams to accomplish your tasks.
You will thrive in this role if you:
- Are familiar with scripting in Python
- Are proficient with assertions
- Have good debug skills to analyze RTL test failures
- Have a "Learn and Be Curious" mindset
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship.
Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- B.S. in Electrical Engineering or related technical field
- 5+ years of experience in RTL design for SOC
- 5+ years of experience in VLSI engineering
- 5+ years of experience with code quality tools including: Spyglass, LINT, or CDC ...

Software Dev Engineer - Machine Learning Apps, Accelerator, Annapurna ML

If you apply to this position, your application will be considered for all locations we hire for in the United States.
Are you excited about Machine Learning, chip acceleration, compilers, storage, systems or EC2? Are you passionate about delivering high quality services that affect hundreds of thousands of users? We have been dubbed the "secret sauce" behind AWS's success. With development centers in Canada and Israel, Annapurna is at the forefront of innovation, combining cloud scale with the world’s most talented engineers.
The Annapurna team hires software and hardware engineers across multiple disciplines, including but not limited to compiler engineers, machine learning engineers, runtime engineers, performance engineers, ML chip accelerator, ASIC, and physical design engineers, and SDEs in Test. Because of our teams’ breadth of talent, we’ve been able to improve AWS cloud infrastructure in networking and security with products such as AWS Nitro, Elastic Network Adapter (ENA), and Elastic Fabric Adapter (EFA), in compute with AWS Graviton and F1 EC2 Instances, in machine learning with AWS Neuron, Inferentia and Trainium ML Accelerators, and in storage with scalable NVMe.
If this sounds exciting to you, come build the future with us!
Key job responsibilities
- Innovating and delivering creative SW designs to develop new services, solve operational problems, drive improvements in developer velocity, or positively impact operational safety
- Writing requirements-capture documents, design documents, integration test plans, and deployment plans
- Communicating status and progress of deliverables against schedule, and sharing learnings and innovations with your team and stakeholders
BASIC QUALIFICATIONS
- Currently enrolled in, or completed, a Bachelor’s degree program or higher in Computer Science, Computer Engineering, Electrical Engineering or related field
- To qualify, applicants should have earned a Bachelor’s or Master’s degree between May 2023 and September 2025.
Possible start dates for this role are between January 2025 and October 2025.
- Programming experience in internship or coursework with a programming language such as Python and/or C or C++.
Candidates with strong interests and academic qualifications/research focus in two of the following:
- Distributed systems, algorithms (MPI, NCCL, or similar)
- Operating System - Linux system programming/services
- Computer architecture
- System Development
- Complexity analysis ...

Sr. Hardware Engineer - ML Acceleration, Annapurna Labs

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
We are seeking a Hardware Design Engineer with a role in the definition, design and validation of AWS next generation ML chips, cards and server integration. As a senior member of our hardware team, you will have the outstanding and meaningful opportunity to participate in the design and execution of all PCIe and SerDes topics, with the goal of creating customized platforms that fit within AWS datacenter’s world-leading technology.
As a member of the Machine Learning Acceleration team you’ll be responsible for the design and optimization of hardware in our data centers. You’ll provide leadership in the application of new technologies to large scale server deployments in a continuous effort to deliver a world-class customer experience. This is a fast-paced, intellectually challenging position, and you’ll work with thought leaders in multiple technology areas.
You’ll have high standards for yourself and everyone you work with, and you’ll be constantly looking for ways to improve your products’ performance, quality and cost. We’re changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture.
When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- Deep knowledge of the PCIe interface, Gen4 or above, both electrical and functional, at the chip level and at the PCB level.
- Deep understanding of transmission line theory and electromagnetics and their application in SerDes, single-ended signal and parallel bus interfaces.
- Work with ODMs, IP silicon vendors, component suppliers and internal design teams on cross-boundary triaging, debugging, and resolving issues.
- Hands-on lab equipment skills (VNA, real-time scope, sampling scope and its accessories) for electrical validation and characterization.
- Scripting skills to automate tests, log parsing and data collection.
- Strong technical communication skills (verbal and written) to interface with cross-functional technical leads within and/or outside of the organization. ...

Serdes PHY Expert, Annapurna Labs

AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
We are seeking a Serdes/PCIe PHY expert with a role in the definition, design and validation of AWS next generation ML chips, cards and server integration. As a senior member of our platform development team, you will have the outstanding and meaningful opportunity to participate in the design and execution of all Serdes/PCIe topics, with the goal of creating customized platforms that fit within AWS datacenter’s world-leading technology. The Serdes/PCIe PHY Expert will need to independently work with vendors, understand the settings, write/modify tests, debug and collect data.
Key job responsibilities
As a senior member of the team, you will join a group of hardworking engineers to design and implement innovative next generation machine learning chips and servers. In this position, you will make a real impact in a dynamic, technology focused team.
Your work will impact the growing field of machine learning.
You will collaborate with architects, design teams, and software engineers to deliver the next generation ML chip. In this position, you will have the opportunity to be responsible for IP integration, 2.5D design, bring-up, characterization and validation.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony.
Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- BS or MS in EE, ECE or CS
- 7+ years of experience in silicon development, with 3+ years in SOC/IO/subsystems
- Deep understanding of Serdes/PCIe at the PHY and controller level, including the inner workings of PHY component blocks
- Familiar with industry standard protocols such as PCIe
- Experience with test chip characterization and compliance testing
- Experience with post-silicon testing, including shmoos covering BER, PRBS, and Eq settings
- Drive the IP integration and design of silicon and 2.5D packaging
- Support the physical design team, review clocking and timing constraints
- Drive cross-functional triage effort on complex functional and performance issues
- Take the leadership role in post-silicon bring-up, including test plans and execution
- Knowledge of channel electrical characteristics and associated tuning parameters, e.g. TX PSET values, RX equalization
- Perform system-level debug and root-cause analysis through the bring-up, characterization, validation and production phases
- Experience working with third-party IP vendors
- Strong firmware development skills within embedded environments ...

ML Compiler Engineer, Annapurna Labs

The AWS Neuron Compiler team is actively seeking skilled compiler engineers to join our efforts in developing a state-of-the-art deep learning compiler stack. This stack is designed to optimize application models across diverse domains, including Large Language and Vision, originating from leading frameworks such as PyTorch, TensorFlow, and JAX. Your role will involve working closely with our custom-built Machine Learning accelerators, including Inferentia/Trainium, which represent the forefront of AWS innovation for advanced ML capabilities, powering solutions like Generative AI. In this role as an ML Compiler Engineer, you'll be instrumental in designing, developing, and optimizing features for our compiler. Your responsibilities will involve tackling crucial challenges alongside a talented engineering team, contributing to leading-edge design and research in compiler technology and deep-learning systems software. Additionally, you'll collaborate closely with cross-functional team members from the Runtime, Frameworks, and Hardware teams to ensure system-wide performance optimization. As part of the Backend team, you'll play a significant role in designing and developing various aspects of our system. This includes but is not limited to instruction scheduling, memory allocation, data transfer optimization, graph partitioning, parallel programming, code generation, Instruction Set Architectures, new hardware bring-up, and hardware-software co-design. AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. 
Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago—even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.Key job responsibilitiesOur engineers collaborate across diverse teams, projects, and environments to have a firsthand impact on our global customer base. You will:Solve challenging technical problems, often ones not solved before, at every layer of the stack. Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.Research implementations that deliver the best possible experiences for customers.A day in the lifeAs you design and code solutions to help our team drive efficiencies in software architecture, you’ll create metrics, implement automation and other improvements, and resolve the root cause of software defects. You’ll also:Build high-impact solutions to deliver to our large customer base.Participate in design discussions, code review, and communicate with internal and external stakeholders.Work cross-functionally to help drive business decisions with your technical input.Work in a startup-like development environment, where you’re always working on the most important stuff.About the teamOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. 
We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future.Diverse ExperiencesAWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. About AWSAmazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Inclusive Team CultureHere at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Work/Life BalanceWe value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. BASIC QUALIFICATIONS- B.S. or M.S. 
in computer science or related field- Proficiency with 1 or more of the following programming languages: C++ (preferred), Python- 3+ years of non-internship professional software development experience- 2+ years of experience developing compiler optimization, graph-theory, hardware bring-up, FPGA placement and routing algorithms, or hardware resource management ...

Lab Engineer, Annapurna MLA Hardware

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.Key job responsibilitiesAs a member of the Machine Learning Acceleration team, you’ll manage the organization’s lab activities and develop solutions to improve lab automation. You will support the Hardware and Software Engineering teams with lab setup and maintenance, as well as work with external and internal groups in the organization, including vendors for lab equipment. In this role, you will provide technical support for product development projects in the machine learning lab, including AWS Server integration and maintenance. We are looking for candidates who thrive in a fast-paced, startup-like environment, and who can work independently to deliver multiple projects in parallel across multiple sites. To be successful you need to be hands-on, highly motivated and detail-oriented while meeting the highest standards, time to market, cost and quality goals.About the teamOur team is dedicated to supporting new members. 
We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future.Diverse ExperiencesAWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.About AWSAmazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Inclusive Team CultureHere at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Work/Life BalanceWe value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. 
That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.BASIC QUALIFICATIONS- Associates degree in Electrical Engineering or related field - 5+ years of related experience in engineering labs- Soldering and rework skills (0402, QFNs, etc.)- Experience in reading schematics and PCB layout - Great problem solving and analytical skills, including organization and communication skills- Manage inventory and procurement of components and equipment - Good at prioritizing tasks and keeping detailed logs of rework, revision history etc. ...

Sr. Software Development Engineer, Graviton, Annapurna Labs

AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio. Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world. As a senior software engineer, you will work on improving Arm software by rewriting or optimizing existing libraries to deliver the best performance on Graviton CPUs. You will get an opportunity to work across Gaming, Rendering engines, Machine Learning, High Performance Computing, and several generic libraries using C, C++, Assembly, Python, and Fortran. If you are already an open source developer or passionate about it, you will be able to continue your passion and contribute back to the community across all those projects. In this role you will use and further develop your deep knowledge in areas including design, implementation, and data analysis. You will have the opportunity to understand and improve how AWS delivers on one of the most important shifts at the core of the data center: our migration to Arm applications. About the teamOur team is dedicated to supporting new members. 
We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future.Diverse ExperiencesAWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.About AWSAmazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Inclusive Team CultureHere at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Work/Life BalanceWe value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. 
That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.BASIC QUALIFICATIONS- At least 7 years of relevant work experience and software development in HPC, MPI, PGAS, OpenMP, C++, Python, and Fortran.- Expert knowledge of computer programming trends and their application to High Performance Computing (HPC).- Detailed knowledge of computer architecture (CPU, GPU, interconnects) technologies.- A demonstrated ability to lead technical efforts in a team environment. ...

Software Engineer AI/ML, AWS Neuron Distributed Training Team

Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. AWS Nitro, ENA, EFA, Graviton and F1 EC2 Instances, AWS Neuron, Inferentia and Trainium ML Accelerators, and scalable NVMe storage are some of the products we have delivered over the last few years. AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron. This role is responsible for development, enablement and performance tuning of a wide variety of ML model families, including massive-scale large language models like GPT2, GPT3 and beyond, as well as Stable Diffusion, Vision Transformers and many more. In this role, you’ll be responsible for developing integrations with the Neuron SDK and frameworks such as TensorFlow, PyTorch, and MXNet. You’ll be planning and implementing new features, working with customers to create innovative solutions, and actively contributing to open source projects.About the teamInclusive Team CultureHere at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. 
Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.Basic qualifications:- B.S. Computer Science or related technical field- Experience with one or more of the following programming languages: C, C++, Java, or Perl- Experience with deep learning frameworks: TensorFlow, PyTorch, and MXNet Preferred qualifications:- Masters or PhD in Computer Science, Deep Learning, Artificial Intelligence, Applied Math, or related field- Experience with distributed training for Deep learning and High Performance Computing- Meets/exceeds Amazon’s leadership principles requirements for this role- Meets/exceeds Amazon’s functional/technical depth and complexity for this role BASIC QUALIFICATIONS- B.S. Computer Science or related technical field- 1+ years of experience in ML infrastructure and systems- Experience with one or more of the following programming languages: C, C++, Java, or Perl- Experience with deep learning frameworks: TensorFlow, PyTorch, and MXNet ...

Silicon Yield and Test Data Analysis Engineer, Annapurna Silicon Operations

We are seeking an experienced Silicon Yield Data Analysis Engineer with expertise in silicon test data analysis, automation and yield debug. AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world. The AWS-Annapurna team develops the silicon used in our most advanced machine learning accelerator servers at cutting-edge process nodes. These SOCs are used in massively scaled server clusters to provide the best hardware platform for our customers to run training and inference workloads. 
Our final product is a server, not just the silicon, so you will find yourself stretching beyond traditional silicon product engineering boundaries and dealing with various system issues and data sets, providing ample opportunities to learn.Key job responsibilitiesThis experienced engineer will be responsible for:- Building our data systems which parse data from various ATE and system level test platforms and generating analysis which provide actionable information impacting key product metrics like yield, performance and test cost. - Developing analysis dashboards that are widely used across the organization and implementing early warning alert systems to warn the test owners about manufacturing excursions. - Interacting with ATE, Systems test teams and Silicon design teams to identify systematic manufacturing issues and work with other product engineers to debug and root cause. - Collaborating with various teams to develop innovative solutions to optimize yield and performance for our products. Strong analytical and problem solving skills, knowledge of semiconductor manufacturing process and expertise in statistical analysis are essential for success in this role. About the teamOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future.Diverse ExperiencesAWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. 
About AWSAmazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Inclusive Team CultureHere at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Work/Life BalanceWe value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. BASIC QUALIFICATIONS- Bachelors or Masters in Electrical or Computer engineering- 5+ years of experience working on semiconductor test data analysis and automation- 3+ years of experience conducting data analysis of foundry WAT data, ATE test data and/or system level test data using tools like JMP, Python etc. ...

Software Development Manager, AWS Neuron Machine Learning Distributed Training, ML Accuracy

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1, Trn2 and Inf1 servers that use them. As the SDM of Software Development for the Machine Learning Distributed Training team, you will be responsible for leading a strong team of engineers to help design and deploy the ML models. You will be responsible for setting up methodologies for accuracy measurement and baselining for the ML models we deliver. Develop generic solutions for training with low precision. Develop accuracy-related reliability/scalability features. Responsible for the full development life cycle of our integrations and extensions for inference and training support in PyTorch, XLA, JAX as well as distributed training libraries like FSDP, DDP and others. Lead the way to ensure support for key ML functionality in a combined chip / software platform. Ensure the right thing is being built and delivered to customers. A successful candidate will have an established background in developing Machine Learning products with direct customer-facing experience, a strong technical ability and a motivation to achieve results. Experience in Machine Learning and software development is also a must.Key job responsibilitiesOur engineers collaborate across diverse teams, projects, and environments to have a firsthand impact on our global customer base. You’ll bring a passion for innovation, data, search, analytics, and distributed systems. You’ll also:- Solve challenging technical problems, often ones not solved before, at every layer of the stack. 
- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.- Build high-quality, highly available, always-on products.- Research implementations that deliver the best possible experiences for customers.A day in the lifeYou will work with the executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions. This organization builds the full stack of software, servers and chips to accelerate at the highest scale.About the teamOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future.Diverse ExperiencesAWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. About AWSAmazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Inclusive Team CultureHere at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. 
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Work/Life BalanceWe value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. BASIC QUALIFICATIONS- 3+ years of engineering team management experience- 7+ years of working directly within engineering teams experience- 3+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience- 8+ years of leading the definition and development of multi tier web services experience- Experience partnering with product or program management teams- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations- 3+ Years of Deep Learning/Machine learning experience ...

Quality & Reliability Engineer, Annapurna Labs USA

AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio. Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world. Annapurna Labs is looking for a Q&R Engineer. As an IC Quality and Reliability Engineer, you will play a crucial role in ensuring Annapurna Labs' semiconductor products meet customer demands. 
Your responsibilities will include establishing Design-for-Quality and Reliability (DFQR) during product design, developing reliability validation (RV) plans and DOEs, conducting product qualification and RV, performing failure analysis (FA), and executing reliability modeling, analysis, and reporting. In this role you will be fully exposed to the semiconductor production flow of all Annapurna Labs products, semiconductor testing methodologies and FAB process yield improvement activity.Key job responsibilities1) Lead DFQR and DFT activities and conduct risk assessment during the product design stage. 2) Develop and execute product Q&R qualification plans, including HTOL, LU and ESD. 3) Drive manufacturing vendors on quality control and improvement, and on the disposition of non-conforming materials due to manufacturing excursions. 4) Work closely with the product and test teams on FT test improvement. 5) Drive device RMA process, FA, disposition, reporting and improvement.A day in the lifeWork closely with the vendors, internal design, hardware and testing teams and the external labs to identify and close the quality and reliability gaps of our products. About the teamOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future. Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. 
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
1) B.Sc. or M.Sc. degree in electrical engineering, applied physics or related fields.
2) Hands-on working experience with HTOL die qualification, including BIB design, test condition setup, pattern generation, failure disposition, etc.
3) 5+ years of working experience in Si device quality and reliability.
4) Knowledge and experience in Q&R fundamentals, data analysis and statistics fundamentals, Si device failure modes, VLSI IC design, fab/assembly manufacturing, RMA, wafer and device testing, and failure analysis. ...

2025 ASIC Design Verification Engineer Intern, Annapurna Labs

Amazon Web Services (AWS) internships are full-time (40 hours/week) for 12 consecutive weeks during summer. By applying to this position, your application will be considered for all locations we hire for in the United States.
In Annapurna Labs we are at the forefront of hardware co-design not just in Amazon Web Services (AWS) but across the industry. The work we do is cutting-edge and internet-scale while also being deeply important to our customers. We design and build every component of our hardware and software to come together into products that our customers use for accelerated computing through Machine Learning acceleration and FPGA acceleration. If you are interested in "building a complete product" from inception to delighted customers, Annapurna is a fantastic choice.
If this sounds exciting to you, come build the future with us!
Responsibilities:
• Develop a deep understanding of end customer requirements including software applications, use models, system architecture and SoC architecture/micro-architecture solutions.
• Participate in logic design activities as part of Amazon's machine learning custom silicon solutions.
• Develop and execute design automation mechanisms and flows.
• Work with physical design teams to achieve performance and area requirements.
Mentorship & Career Growth
Our team is dedicated to supporting new team members in an environment that celebrates knowledge sharing and mentorship. Projects and tasks are assigned in a way that leverages your strengths and helps you further develop your skillset.
Inclusive Team Culture
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally.
We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.
Work/Life Harmony
Our team puts a high value on work-life harmony. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility and encourage you to find your own balance between your work and personal lives.
BASIC QUALIFICATIONS
- Enrolled in a Bachelor’s degree program or higher in Electrical Engineering, Computer Engineering, or a related field with a graduation conferral date between December 2025 and September 2026
- Programming experience in SystemVerilog or UVM ...

Sr. SDM, ML Acceleration, Neuron Inference Apps

Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
AWS Neuron is the complete software stack for the Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1/Inf2 servers that use them. As the Sr. SDM for the Neuron Inference Customer Enablement Team, you will be responsible for leading a strong team of managers and engineers to help optimize customer or open-source models for inference performance (latency, throughput, scale) on various frameworks such as PyTorch, JAX, and TensorFlow. You will be responsible for the full development life cycle of inference performance improvement and reliability/scalability features in our internal Neuronx_Distributed and Transformers_Neuronx inference libraries, and will contribute to other popular open inference libraries. You will strive to enable our customers to adopt Trainium and Inferentia devices and make them first-class citizens for ML acceleration workloads, including both text and multimodal models. You will lead the way to ensure support for key ML functionality in a combined chip/software platform, and ensure the right thing is being built and delivered to customers.
A successful candidate will have an established background in delivering on ML roadmaps for demanding, fast-changing customers while balancing them against the internal product roadmap.
They will have delivered high-performance models using distributed inference libraries and frameworks. The ideal candidate should have a strong technical ability to work and deliver on a vertically integrated system stack that consists of a combinatorial matrix of hardware, frameworks, and workflows. Deep expertise in framework integrations and development using C++ is a must, along with direct customer-facing experience and a strong motivation to achieve results.
A day in the life
You will work with the executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions. This organization builds the full stack of software, servers and chips to accelerate at the highest scale.
About the team
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences.
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 10+ years of engineering experience
- 5+ years of engineering team management experience
- 10+ years of planning, designing, developing and delivering consumer software experience
- Experience partnering with product or program management teams
- Experience managing multiple concurrent programs, projects and development teams in an Agile environment ...

Sr. Software Development Manager, AWS Neuron Machine Learning Distributed Training, Core Technologies and Infra (CoreTex)

Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
AWS Neuron is the complete software stack for the AWS Inferentia and Trainium (Neuron) cloud-scale machine learning accelerators.
As a Sr. SDM for the Machine Learning Distributed Training, Core Technologies and Infra org, you will be responsible for leading strong teams of software engineers and managers to help design and deploy software that enables ML workloads to work seamlessly on these new products.
A successful candidate will have an established background in developing Machine Learning products with direct customer-facing experience, a strong technical ability and a motivation to achieve results.
This leader will manage the Core Technologies and Infra org, directly managing several teams and managers focused on developing training libraries for PyTorch and JAX; tooling for large-scale training debug, development productivity and benchmarking; kernel development; and large-scale training stability. The leader ensures support for key ML functionality in a combined chip/software platform and that the right thing is being built and delivered to customers.
Experience in Machine Learning and software development is a must.
Key job responsibilities
- Responsible for the full development life cycle of our integrations and extensions for training support in PyTorch, XLA, and JAX, as well as distributed training libraries like FSDP and others.
- In charge of characterization, enablement and development of existing and future massive-scale ML models like Claude 3 and GPT-4, as well as ViT, LLaVA, Stable Diffusion 3 and more.
- Lead the way to ensure support for key ML functionality in a combined chip/software platform.
- Ensure the right thing is being built and delivered to customers.
A day in the life
You will work with the executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions. This organization builds the full stack of software, servers and chips to accelerate at the highest scale.
About the team
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious.
Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 10+ years of engineering experience
- 5+ years of engineering team management experience
- 10+ years of planning, designing, developing and delivering consumer software experience
- Experience partnering with product and program management teams
- Experience managing multiple concurrent programs, projects and development teams in an Agile environment ...

Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training - Next Generation Training

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a senior software engineer responsible for driving and enabling the AWS Neuron software stack to support next-generation capabilities such as newer model architectures (like Mamba and Mixture of Experts) and lower-precision training techniques.
This is a cross-functional role where you will be responsible for:
- Influencing the Neuron roadmap to support newer model architectures and training techniques based on your technical assessment of state-of-the-art literature.
- Working side by side with chip architects, applied scientists, compiler and runtime engineers to build performant support for the next-generation models and training techniques (e.g. low-precision training).
This role requires experience on two dimensions:
- Experience training large models using PyTorch/JAX is a must. FSDP, DeepSpeed and other distributed training libraries are central to this, and extending all of this for the Neuron-based system is key.
- Experience with profiling, building an understanding of system bottlenecks, and developing solutions (e.g. custom kernels) to improve performance is a must.
About the team
About Us
Inclusive Team Culture
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.
Work/Life Balance
Our team puts a high value on work-life balance.
It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.
Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience as a mentor, tech lead or leading an engineering team ...

Senior System Mfg Engineer, Annapurna Labs

AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
Annapurna Labs, part of AWS, is seeking highly experienced Hardware Test Engineers, System Test Engineers, Manufacturing Test Engineers, and System Validation Engineers to enable high-quality and efficient testing for the next generation of our cloud server platforms. Our success depends on our world-class infrastructure as we are handling massive scale and rapid integration of emergent technologies. As a member of the Machine Learning Acceleration team you will be responsible for the enablement and improvement of our system-level manufacturing environment.
You will work on developing tests that ensure functionality and capability of our custom hardware used in the AWS server fleet. You will develop expertise in the top-to-bottom functionality of the entire system as well as the intended customer applications and stress the system from a customer perspective. You will work together with other engineering teams to develop, maintain, and improve manufacturing test code for new and existing products.
You’ll work with both high-level and low-level operating system constructs to create first-boot images for products in manufacturing. You will develop and maintain the deployment and distribution system to ensure that our manufacturing partners have access to appropriate versions of our software as soon as it’s available. You will respond to new issues raised by our manufacturing partners, analyze logs and failures, and then develop and deploy solutions to those issues. You will develop documentation as well as testing and debug procedures for our manufacturing partners to follow.
Key job responsibilities
- Enable and maintain mass volume production testing, working with our ODMs and JDMs to verify stable high-quality execution
- Drive ODM and JDM deliveries to ensure production manufacturing quality
- Identify and develop tests needed to enhance coverage and increase failure granularity
- Debug test hardware and software used for system-level and server-level mass production
- Develop manufacturing tests to exercise hardware components and collect data for large-scale analysis
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform.
We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- Bachelor's degree in Electrical Engineering or Computer Engineering
- 4+ years of experience developing embedded systems code and hardware interfaces (I2C, UART, SPI, JTAG, PCIe, etc.)
- Experience with Python, BASH or other scripting language
- Experience analyzing yield and bin pareto
- Experience working with system management components (BMC, BIOS, CPLD, etc.)
- Experience with debugging and root cause investigations using hardware schematics and tools such as logic analyzers
- Strong background working in UNIX environments ...

Manufacturing Engineer - SW Tools/Infrastructure, Annapurna Labs

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
You will work on developing at-scale software solutions to manage the manufacturing environments at board-level and server-level test. You will work together with other engineering teams to unify testing solutions between manufacturing and data center operations groups. You will develop and maintain the test deployment and distribution systems to ensure that our manufacturing partners have access to appropriate versions of our software as soon as it's available. You will respond to new issues raised by our manufacturing partners, analyze logs and failures, and then develop and deploy solutions to those issues.
You will develop documentation as well as testing and debug procedures for our manufacturing partners to follow.
Key job responsibilities
• Develop, validate and deploy test infrastructure mechanisms into manufacturing environments
• Manage scaled fleets of custom test equipment and ensure their maintenance
• Lead and develop data gathering/parsing solutions to ensure manufacturing data is stored on AWS servers in a structured way
• Support internal lab infrastructure and manufacturing unification efforts
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences.
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
• Bachelor's degree in Electrical Engineering, Computer Engineering, Computer Science or related field
• Experience with Python, BASH or other scripting language
• Experience developing network deployment infrastructure such as PXE
• Experience working with system management components (BMC, BIOS, CPLD, etc.)
• Strong background working in UNIX environments ...