System Manufacturing Engineer

Amazon Web Services provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world. AWS has the broadest and deepest set of machine learning and AI services for our customers’ businesses.
Annapurna Labs, part of AWS, is seeking highly experienced Hardware Test Engineers, System Test Engineers, Manufacturing Test Engineers, and System Validation Engineers to enable high-quality and efficient testing for the next generation of our cloud server platforms. Our success depends on our world-class infrastructure as we handle massive scale and rapid integration of emergent technologies. As a member of the Machine Learning Acceleration team, you will be responsible for the enablement and improvement of our system-level manufacturing environment.
You will work on developing tests that ensure functionality and capability of our custom hardware used in the AWS server fleet. You will develop expertise in the top-to-bottom functionality of the entire system as well as the intended customer applications, and stress the system from a customer perspective. You will work together with other engineering teams to develop, maintain, and improve manufacturing test code for new and existing products. You’ll work with both high-level and low-level operating system constructs to create first-boot images for products in manufacturing. You will develop and maintain the deployment and distribution system to ensure that our manufacturing partners have access to appropriate versions of our software as soon as they are available. You will respond to new issues raised by our manufacturing partners, analyze logs and failures, and then develop and deploy solutions to those issues. You will develop documentation as well as testing and debug procedures for our manufacturing partners to follow.
Key job responsibilities
- Enable and maintain mass volume production testing, working with our ODMs and JDMs to verify stable, high-quality execution
- Drive ODM and JDM deliveries to ensure production manufacturing quality
- Identify and develop tests needed to enhance coverage and increase failure granularity
- Debug test hardware and software used for system-level and server-level mass production
- Develop manufacturing tests to exercise hardware components and collect data for large-scale analysis
We are open to hiring candidates to work out of one of the following locations:
Austin, TX, USA
BASIC QUALIFICATIONS
- Bachelor's degree in Electrical Engineering or Computer Engineering
- 4+ years of experience developing embedded systems code and hardware interfaces (I2C, UART, SPI, JTAG, PCIe, etc.)
- Experience with Python, Bash, or another scripting language
- Experience analyzing yield and bin Pareto
- Experience working with system management components (BMC, BIOS, CPLD, etc.)
- Experience with debugging and root cause investigations using hardware schematics and tools such as logic analyzers
- Strong background working in UNIX environments ...

Software Engineer - AI/ML, AWS Neuron

AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. This role is for a senior software engineer in the Machine Learning Applications (ML Apps) team for AWS Neuron. This role is responsible for development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale large language models like GPT2, GPT3 and beyond, as well as Stable Diffusion, Vision Transformers, and many more. The ML Apps team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions with Trn1. Experience training these large models using Python is a must. FSDP, DeepSpeed, and other distributed training libraries are central to this, and extending all of this for the Neuron-based system is key.
Key job responsibilities
This role will help lead the efforts building distributed training and inference support into PyTorch, TensorFlow, and JAX using XLA and the Neuron compiler and runtime stacks. This role will help tune these models to ensure the highest performance and maximize their efficiency when running on customers' AWS Trainium and Inferentia silicon and the Trn1 and Inf1 servers. Strong software development and ML knowledge are both critical to this role.
About the team
About Us
Inclusive Team Culture
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.
Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.
Work/Life Balance
Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.
Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language ...

Software Development Manager - ML Compiler, Annapurna Labs Neuron

The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium delivers best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime and integrates natively into popular ML frameworks such as PyTorch, TensorFlow, and JAX. AWS Neuron, along with the Inferentia and Trainium chips, is used at scale by customers and partners both internal and external to Amazon.
The Team: The Amazon Annapurna Labs team is responsible for building innovation in silicon and software for AWS customers. We are at the forefront of innovation by combining cloud scale with the world’s most talented engineers. Our team covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. With such breadth of talent, there's opportunity to learn all of the time. We operate in spaces that are very large, yet our teams remain small and agile. There is no blueprint. We're inventing. We're experimenting. When you couple that with the ability to work on so many different products and services, it's a unique learning culture.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
Key job responsibilities
You: We are seeking a talented SW Engineering Manager with strong leadership and mentoring skills to join our Deep Learning Compiler Team. As a Manager III, you will be building and leading a team of talented compiler engineers developing graph-level optimizations to map SOTA deep learning models efficiently to our accelerator capabilities. You’ll leverage your technical skills to collaborate with ML applications teams and applied scientists developing the models, accelerating research ideas and techniques and bringing them to production so that our customers reach their performance goals faster. You will partner with PyTorch, OpenXLA, and other open source communities to leverage the work in both directions for the benefit of the machine learning community.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences.
Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 3+ years of engineering team management experience
- 7+ years of working directly within engineering teams experience
- 3+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience
- Experience partnering with product or program management teams ...

Senior SoC Functional Modeling Engineer, Annapurna Labs, Machine Learning Accelerators

Custom SoCs (system-on-chips) are the brains behind AWS’s Machine Learning servers. Our team builds C++ functional models of these accelerator SoCs for use by internal partner teams. We’re looking for a Senior SoC Modeling Engineer to join the team and deliver new functional models, infrastructure, and tooling for our customers.
As part of the ML accelerator modeling team, you will:
- Develop and own SoC functional models end-to-end, including model architecture, integration with other model or infrastructure components, testing, and debug
- Work closely with architecture, RTL design, design verification, emulation, and software teams
- Innovate on the tooling you provide to customers, making it easier for them to use our SoC models
- Drive model and modeling infrastructure performance improvements to help our models scale
- Develop software which can be maintained, improved upon, documented, tested, and reused
Annapurna Labs, our organization within AWS, designs and deploys some of the largest custom silicon in the world, with many subsystems that must all be modeled and tested with high quality. Our SoC model is a critical piece of software used in both our SoC development process and by our partner software teams. You’ll collaborate with many internal customers who depend on your models to be effective themselves, and you'll work closely with these teams to push the boundaries of how we're using modeling to build successful products.
You will thrive in this role if you:
- Are an expert in functional modeling for SoCs, ASICs, TPUs, GPUs, or CPUs
- Are comfortable modeling in C++, and familiar with Python
- Enjoy learning new technologies, building software at scale, moving fast, and working closely with colleagues as part of a small team within a large organization
- Want to jump into an ML-aligned role, or get deeper into the details of ML at the hardware/system level
Although we are building machine learning chips, no machine learning background is needed for this role.
This role spans modeling of the ML and management regions of our chips, and you’ll dip your toes into both. You’ll be able to ramp up on ML as part of this role, and any ML knowledge that’s required can be learned on the job.
This role can be based in either Cupertino, CA or Austin, TX. The team is split between the two sites, with a slight preference for CA, due to colocation with more customer teams.
We're changing an industry. We're searching for individuals who are ready for this challenge, who want to reach beyond what is possible today. Come join us and build the future of machine learning!
About the team
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews.
We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional experience writing functional or performance models
- Experience programming with C++
- Familiarity with SoC, CPU, GPU, and/or ASIC architecture and micro-architecture ...

Sr. Software Development Engineer, ML Infrastructure Team

Want to help drive the success of Machine Learning technologies at AWS? Do you have the skills and motivation to build automation that supports the success of peer teams? We want to talk to you! We seek a Sr. Software Development Engineer for the Machine Learning (ML) Infrastructure team to build the tools that are used to guarantee top performance of AWS ML and High Performance Computing (HPC) technologies developed by our organization. Bring your exceptional knowledge of CI/CD automation and of ML and HPC benchmarks and applications to bear on the cutting-edge software we develop. Join us as we expand the AWS offerings for AI, including Trainium, Graviton and the Elastic Fabric Adapter (EFA).
Key job responsibilities
Be the lead engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale. Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time. Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads. Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders. Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers. Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, and cutting-edge releases, and make it easy to evolve.
A day in the life
You use TypeScript and the CDK to ensure all infrastructure setup is code (IaC), reviewed and committed to automated pipelines. You find innovative ways to schedule work using SLURM and Active Directory, supporting multiple teams of developers while keeping cluster costs down.
You write excellent documents that communicate clearly to peers, stakeholders, and leadership what the team is doing and the plan for future work. You draw on your experience as a software developer to mentor other engineers.
About the team
We are part of Annapurna Labs, a subsidiary in AWS that builds software and hardware that make ML on EC2 work. Our organization is a dedicated group of innovators that have invented new networks, new silicon, and new software suites, and combined those to entice customers to move immense ML and HPC workloads to the cloud. The ML Infrastructure team is laser-focused on making AWS the best and most cost-effective place for customers to do AI at scale.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture.
When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience as a mentor, tech lead or leading an engineering team
- 5+ years of experience coding in Python, TypeScript, and CDK
- Experience developing highly automated CI/CD pipelines (Jenkins preferred)
- Proficiency working with Linux, ideally including containers
- Experience with clustered ML or HPC applications or benchmarks, especially using SLURM or in AWS ...

Software Engineer, Annapurna Labs, ML Accelerator Embedded Firmware

The Machine Learning Platform Software Team is looking for a Software Engineer who wants to develop industry-leading acceleration platforms, with an affinity for efficient, robust, and highly available systems.
AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
Key job responsibilities
You will develop software that initializes machine learning accelerators and monitors server health by collecting sensor data, logs, and device metrics.
- Evaluate and optimize firmware performance
- Develop tests to validate firmware
- Develop systems software, kernel drivers
- Build data collection and aggregation systems at AWS scale
- Build error detection and recovery mitigation systems at AWS scale
A day in the life
The team is focused on our organization's ability to scale. Automation, software best practices, and good architectural abstractions are key to this endeavor.
You will have the opportunity to develop software in a highly cross-functional environment, working side by side with software and hardware teams to optimize customer experience.
You will be responsible for building scalable software systems that can be tested throughout the stages of product development, including manufacturing and production. You will leverage automation, continuous integration, and fleet metrics to deploy and monitor your changes.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture.
When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language ...

Software Development Engineer, NVMe / Storage, Annapurna Labs

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Cloud Storage offers a complete range of hardware and software for customers to store, access, govern, and analyze their data, reducing costs, increasing agility, and accelerating innovation.
The AWS Cloud Storage team is hiring firmware engineers with a background in NVMe memory devices to solve our customers' toughest problems.
As a firmware engineer on the AWS Cloud Storage team, you will be a thought leader at the forefront of consumer storage and networking solutions.
You should feel equally comfortable in server and embedded environments, and possess a deep understanding of computer architecture, Linux OS, and programming sophisticated embedded devices.
Every day you will be working alongside brilliant engineers and leaders who obsess about performance, availability, scalability and durability of customer data, with the ambitious goal of improving AWS' industry-leading product.
Key job responsibilities
- Research, design, and implement firmware to support the NVMe subsystem, DMA, and crypto through specialized HW units in Nitro Cards
- Debug complex, system-level, multi-component issues across multiple layers from kernel to application
- Profile system performance activity and drive optimizations across our software stack
- Deliver production-quality code and support its operation in the production environment
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform.
We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- Experience as a mentor, tech lead or leading an engineering team
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- 5+ years of experience with programming language: C or C++
- 5+ years of experience in embedded Linux systems or NVMe Subsystem or other Storage ...

System Development Engineer, Annapurna Labs Infrastructure

Annapurna Labs is an organization within AWS that is responsible for building innovation in silicon and software for AWS customers. With development centers in the U.S. and Israel, Annapurna is at the forefront of innovation by combining cloud scale with the world’s most talented engineers. The Annapurna team covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. Because of Annapurna's breadth of talent, we’ve been able to improve AWS cloud infrastructure in networking and security with products such as AWS Nitro, Elastic Network Adapter (ENA), and Elastic Fabric Adapter (EFA), in compute with AWS Graviton and F1 EC2 Instances, in machine learning with AWS Neuron, Inferentia and Trainium ML Accelerators, and in storage with scalable NVMe. As part of the Annapurna Labs Infrastructure team, you’ll have the opportunity to invent the next generation of cloud computing infrastructure. You’ll experience what it’s like to work in a fast-paced, innovative, and start-up-like environment filled with some of the brightest minds in the industry. The work we do is not only cutting-edge and internet-scale but also deeply important to our customers. The team's infrastructure is used to design and build every component of our hardware and software to come together into products that our customers use for accelerated computing: either Machine Learning acceleration or FPGA acceleration. As a member of the Cloud-Scale Machine Learning Acceleration Infrastructure team you’ll be responsible for designing and supporting enterprise-scale infrastructure. Infrastructure is the hardware, software, and networks used to develop, test, monitor, control, or support internal engineering teams. You will be responsible for the design, implementation and quality of services you deliver. 
The ideal candidate will draw upon technical background, critical thinking, and problem-solving skills to provide innovative solutions that support development teams. The candidate should be open to new challenges, extremely good at multi-tasking, innovative, creative, self-directed and a great team player. Candidates should drive continuous process improvement, and collaborate effectively with cross-functional teams to solve problems and implement new solutions. You’ll provide leadership in the application of new technologies to large scale deployments in a continuous effort to deliver a world-class customer experience. This is a fast-paced, intellectually challenging position, and you’ll work with thought-leaders in multiple technology areas. You’ll have high standards for yourself and everyone you work with, and you’ll be constantly looking for ways to improve our products' performance, quality and cost. We’re changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today. If you want a career that makes an impact, allows you to invent, and gives you first-hand visibility into how your implementations delight customers, then we have a role for you. If you're interested in being on a team that is "building a complete product" from inception to delighted customers, Annapurna is a fantastic choice.
Join us in creating the most advanced Machine Learning Accelerators in the world!
Key job responsibilities
The Systems Development Engineering role involves developing a broad range of skills. The engineer leverages their Linux skills to troubleshoot, innovate fixes and workarounds, keep software up-to-date and provide data and metrics that help manage our services. They draw on their networking knowledge to design networks, develop network monitoring and troubleshoot network connectivity issues. They communicate clearly and collaborate with others to deliver results. 
They are self-starters, comfortable dealing with ambiguity and change. They are customer-obsessed, always looking to understand customer pain points and find resolutions quickly and completely.
You will need to lead across teams to develop and execute in-depth infrastructure plans that enable your customers, the engineering teams developing the Machine Learning Acceleration product family. You will dive deep to solve critical infrastructure issues involving networking, high performance compute clusters, infrastructure automation of hardware/software/firmware testing, and ASIC/EDA development. You will influence within your team, your customers and AWS service teams to help drive and develop the technical implementation for overall infrastructure designs. You will identify and implement process improvements which improve your team’s agility and operations, including improvements to design, automation, development, test or operations. You will define new mechanisms that execute system health monitoring, diagnostics, repair, and automation. You will develop, document and update operational runbooks as you participate in on-call rotations.
A day in the life
Each day you will work with the best engineers in the industry developing Machine Learning Accelerators.
- Work backwards from your customers to develop cloud and on-premises infrastructure requirements.
- Deliver to your customers the on-premises infrastructure that meets their needs.
- Take ownership for testing, deployments and measuring infrastructure health.
- Support silicon development workflows, including ATE testers, emulators and lab debug equipment.
- Define building infrastructure requirements for labs and server rooms.
- Act as liaison to contractors and vendors for infrastructure.
- Measure your customers’ productivity and take responsibility for the quality of your service.
On-site in Austin, Texas, you will be a part of the team that develops custom silicon as the owner of the infrastructure that enables this innovation. Take a look inside our labs to see what you will learn at Annapurna Labs:
https://www.aboutamazon.com/news/aws/take-a-look-inside-the-lab-where-aws-makes-custom-chips
https://youtu.be/rViVFrQg4Hk
BASIC QUALIFICATIONS
- Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
- 1+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience
- 1+ years of non-internship professional software development experience
- 3+ years of systems development in an IT or data center environment experience
- 3+ years of deploying and operating in a Linux/Unix environment experience
- BS degree in Engineering or related field with 3+ years of IT, DevOps or systems infrastructure experience
- Experience of network fundamentals (DNS, DHCP, TCP/IP, routing, switching, HTTP)
- Experience with debugging complex issues with HW/SW, networking and storage systems ...

Sr. Software Engineer- AI/ML, AWS Neuron Distributed Training

AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Neuron is the complete software stack for AWS Inferentia (Inf1/Inf2) and Trainium (Trn1), our cloud-scale Machine Learning accelerators. This role is for a senior machine learning engineer in the Distributed Training team for AWS Neuron, responsible for development, enablement and performance tuning of a wide variety of ML model families, including massive-scale Large Language Models (LLM) such as GPT and Llama, as well as Stable Diffusion, Vision Transformers (ViT) and many more.
The ML Distributed Training team works side by side with chip architects, compiler engineers and runtime engineers to create, build and tune distributed training solutions with Trainium instances. Experience with training these large models using Python is a must. 
FSDP (Fully-Sharded Data Parallel), DeepSpeed and other distributed training libraries are central to this, and extending all of this for the Neuron-based system is key.
Key job responsibilities
You will help lead the efforts building distributed training support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks. You will help tune these models to ensure the highest performance and maximize their efficiency when running on the custom AWS Trainium and Inferentia silicon and the Trn1 and Inf1/Inf2 servers. Strong software development and Machine Learning knowledge are both critical to this role.
About the team
Annapurna Labs was a startup company acquired by AWS in 2015, and is now fully integrated. If AWS is an infrastructure company, then think of Annapurna Labs as the infrastructure provider of AWS. Our org covers multiple disciplines including silicon engineering, hardware design and verification, software, and operations. AWS Nitro, ENA, EFA, Graviton and F1 EC2 Instances, AWS Neuron, Inferentia and Trainium ML Accelerators, and scalable NVMe storage are some of the products we have delivered over the last few years.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. 
If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.About AWSAmazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Inclusive Team CultureHere at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Work/Life BalanceWe value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. 
That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.BASIC QUALIFICATIONS- Bachelor's degree in computer science or equivalent- 5+ years of non-internship professional software development experience- 5+ years of programming with at least one software programming language experience- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience- Experience as a mentor, tech lead or leading an engineering team- Experience in machine learning, data mining, information retrieval, statistics or natural language processing ...

Software Dev Engineer Intern - ML Chip Architect (Fall), Annapurna ML

"Utility Computing (UC) AWS Utility Computing (UC) provides product innovations — from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (Iot), Platform, and Productivity Apps services in AWS. Within AWS UC, Amazon Dedicated Cloud (ADC) roles engage with AWS customers who require specialized security solutions for their cloud services."Amazon Web Services (AWS) internships are full-time (40 hours/week) for 12 consecutive weeks during summer. By applying to this position, your application will be considered for all locations we hire for in the United States.We are on the lookout for the curious, those who think big and want to define the world of tomorrow. At Amazon, you will grow into the high impact, visionary person you know you’re ready to be. Every day will be filled with exciting new challenges, developing new skills, and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow.Are you a student interested in computer architecture, machine learning, performance optimization, or application-specific silicon design? 
We are looking for engineers capable of using a variety of domain expertise to invent, design, evangelize, and implement state-of-the-art solutions for never-before-solved problems.A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment.Key job responsibilitiesAs a member of the ML chip architecture team, you will be responsible for accelerating large-scale machine learning workloads holistically across algorithms, software, and hardware, as part of our continuous effort to deliver a world-class customer experience. You will be the interface between SW and HW teams, bridging the gap between silicon capabilities and application requirements. Finally, you will have a chance to drive performance improvements on existing AWS hardware platforms, as well as propose, evaluate, and develop hardware optimizations targeting future generations of our products.If this sounds exciting to you - come build the future with us!Internal job descriptionThis requisition is for external candidates or campus employee referrals only, and is not eligible for internal transfers.Due to the volume of referrals and external applicants received, ECT team is unable to provide status updates on individual applicants. Please help us in setting expectations with our candidates and encourage them to reference their application portal for the most up to date information on their application.About the team"Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. 
We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Mentorship and Career growthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. "BASIC QUALIFICATIONS- Currently working towards a Bachelor’s degree, or higher, in Computer Science, Computer Engineering, Electrical Engineering, Machine Learning, or related fields, with an expected conferral date between December 2025 and September 2027.- Knowledge or past experience in computer architecture and silicon design.- Experience with C++, Rust, or other programming languages, as well as with Python, or similar scripting language. ...

2025 ASIC Formal Verification Engineer Intern, Annapurna Labs

Amazon Web Services (AWS) internships are full-time (40 hours/week) for 12 consecutive weeks during summer. By applying to this position, your application will be considered for all locations we hire for in the United States.
In Annapurna Labs we are at the forefront of hardware co-design not just in Amazon Web Services (AWS) but across the industry. The work we do is cutting-edge and internet-scale while also being deeply important to our customers. We design and build every component of our hardware and software to come together into products that our customers use for accelerated computing through Machine Learning acceleration and FPGA acceleration. If you are interested in "building a complete product" from inception to delighted customers, Annapurna is a fantastic choice.
If this sounds exciting to you - come build the future with us!
As a member of the Machine Learning Acceleration team you will be responsible for defining and checking the specification of critical hardware modules using formal methods and industrial model checkers. You will be a part of a world class pre-silicon hardware design team. The job entails understanding requirements of specific hardware blocks and writing functional descriptions of correct behavior. Specifications are written in hardware description languages like Verilog and SystemVerilog Assertions (SVA). Using industrial model checkers you will then learn techniques for proving the hardware being designed matches the modeled specification. Advanced proof techniques, such as modeling abstractions and inductive reasoning, will be utilized. Automation techniques and scripting flows are also leveraged to accelerate proof techniques.
Mentorship & Career Growth
Our team is dedicated to supporting new team members in an environment that celebrates knowledge sharing and mentorship. 
Projects and tasks are assigned in a way that leverages your strengths and helps you further develop your skillset.Inclusive Team CultureHere at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.Work/Life HarmonyOur team puts a high value on work-life harmony. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility and encourage you to find your own balance between your work and personal lives.BASIC QUALIFICATIONS- Currently enrolled in a Bachelor’s degree program or higher in Electrical Engineering, Computer Engineering, Computer Science or related fields with a graduation conferral date between December 2025 and September 2026- Completed coursework or prior internship experience with formal methods (SW/HW)- Coursework or prior internship experience in the basics of computer architecture. ...

Sr. TPM AWS, Annapurna Labs AI Chips GTM, Annapurna Labs

Do you want to help define the future of AWS AI Chips (AWS Inferentia/Trainium) Go to Market (GTM)? You will be part of the core worldwide AWS AI Chips Business and GTM team, driving our most strategic customer and industry partnership engagement programs. Our customers build and deploy GenAI applications on our Chips across many industry segments. You will collaborate with Neuron engineering, Business and product leaders to ensure we meet and exceed our customer expectations. This work includes managing relationship with leading ML frameworks and library providers, and working with cross teams in AWS to accelerate customer adoption of AWS Inferentia and Trainium based instances.At Annapurna, our TPM role is focused on working amongst multiple teams to deliver functionality that those teams are responsible for. The TPM is the “glue” that holds teams together and maintains a bird’s eye view over what those teams are delivering and how those deliverables fit together. This involves two aspects, both equally important: The “Technical” part of “TPM” requires identifying dependencies and technical risks that affect the various teams represented. The “Program Manager” part of “TPM” involves scheduling, creating milestones, and reporting status. As part of your project and program ownership, you focus on the larger business and technology picture (i.e., customer experience, processes, opportunities, and/or problems to be solved). You deeply understand the business and technical requirements of the solutions being built and drive the right outcomes. You take the time to understand the needs of engineers (who have to build what, maintain, and extend features for the life of the project). You help your customers and the engineering teams make appropriate trade-offs by considering the larger picture (e.g., business goals, user experience, dependency impacts, efficiency, availability). 
You partner with technical managers to secure resources, scope technical efforts, set project priorities, milestones, and drive delivery. You determine if success metrics are in place, and if not work to define them. As part of this role, you will work directly with external technology providers, customers, and partners. To be successful, you have a solid understanding of the design approaches and industry technologies utilized. You make connections (to people and/or technologies) and make sure the right people are part of the conversation. For example, you are able to recognize when a proposed design is too complex or risky (and arrange additional reviews by senior engineers). You will need to be adept at interacting, communicating, and partnering with teams within AWS (product, solutions architecture, sales, marketing, and professional services) and externally with customers and 3rd party model providers.Key job responsibilitiesLead internal/external cross-team technical projects to accelerate adoption of AWS AI ChipsManage complex customer and partner deliveriesDrive scale with external partnersProvide technical direction with limited assistance Communicate project status to the executive team About the teamThe Amazon Annapurna Labs team is responsible for building innovation in silicon and software for our AWS customers. We are at the forefront of innovation by combining cloud scale with the world’s most talented engineers. Our team covers multiple disciplines including silicon engineering, hardware design, software and business development. 
Along with the AI Chips Inferentia and Trainium, Annapurna Labs has delivered advancements in Networking with AWS Nitro, Amazon’s first Arm-based instances with AWS Graviton and first FPGA instances in the cloud.
BASIC QUALIFICATIONS
- 5+ years of TPM experience within a large engineering org, or relevant technical partnership management
- 3+ years of proven experience in driving the delivery of large technology programs or products
- 3+ years of software engineering experience
- Design knowledge and expertise to drive technical decisions and anticipate technical risks
- Experience managing programs across cross functional teams, building mechanisms of scale
- Working knowledge of one or more ML Frameworks (e.g., PyTorch, JAX) and ML methods including GenAI foundation models, computer vision models, multimodal techniques
- Bachelor's degree or equivalent ...

SDM, ML Acceleration, Neuron Frameworks

Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers that use them. As the Software Development Manager for the ML Applications - Framework team, you will be responsible for leading a strong team of engineers to help design and deploy ML applications and use cases on various frameworks such as PyTorch, JAX, and TensorFlow. You will be responsible for the full development life cycle of our integrations and extensions for inference and training support in PyTorch, XLA, TensorFlow and JAX. Develop reliability/scalability features and performance updates in the Neuron ML Frameworks, and contribute to other popular open frameworks so that Trainium and Inferentia devices become first-class citizens for ML acceleration. Lead the way to ensure support for key ML functionality in a combined chip / software platform. Ensure the right thing is being built and delivered to customers.
A successful candidate will have an established background in developing ML frameworks using PyTorch on XLA devices and corresponding framework technology components such as Torch-XLA and OpenXLA project integrations using PJRT or StableHLO, and familiarity with OpenXLA compilers. 
The ideal candidate should have a strong technical ability to work and deliver on a vertically integrated system stack that consists of a combinatorial matrix of hardware, frameworks, and workflows. Deep expertise in framework integrations and development using C++ is a must, along with direct customer-facing experience and a strong motivation to achieve results.
A day in the life
You will work with the executive leadership and other senior management and technical leaders to define product directions and deliver them to customers. We build massive-scale distributed training and inference solutions. This organization builds the full stack of software, servers and chips to accelerate at the highest scale.
About the team
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating, which is why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. 
Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.Mentorship & Career GrowthWe’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. BASIC QUALIFICATIONS- 3+ years of engineering team management experience- 7+ years of working directly within engineering teams experience- 3+ years of designing or architecting (design patterns, reliability and scaling) of new and existing systems experience- Experience partnering with product or program management teams ...

Sr. Software Development Engineer, Annapurna Labs

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations, that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
AWS Cloud Storage offers a complete range of hardware and software for customers to store, access, govern, and analyze their data, reducing costs, increasing agility, and accelerating innovation.
The AWS Cloud Storage team is hiring firmware engineers with a background in NVMe memory devices to solve our customers’ toughest problems.
As a firmware engineer on the AWS Cloud Storage team, you will be a thought leader at the forefront of consumer storage and networking solutions. 
You should feel equally comfortable in server and embedded environments, and possess a deep understanding of computer architecture, the Linux OS, and programming of sophisticated embedded devices.
Every day you will be working alongside brilliant engineers and leaders who obsess about performance, availability, scalability and durability of customer data, with the ambitious goal of improving AWS' industry-leading product.
Key job responsibilities
- Research, design, and implement firmware to support the NVMe subsystem, DMA, and crypto through specialized HW units in Nitro Cards.
- Debug complex, system-level, multi-component issues across multiple layers from kernel to application
- Profile system performance activity and drive optimizations across our software stack
- Deliver production-quality code and support its operation in the production environment
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- Experience as a mentor, tech lead or leading an engineering team
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- 5+ years of experience with programming language: C or C++
- 5+ years of experience in embedded Linux systems or NVMe Subsystem ...

Sr. Technical Product Manager - AWS Neuron, Annapurna Labs

AWS Utility Computing (UC) provides product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services. Additionally, this role may involve exposure to and experience with Amazon's growing suite of generative AI services and other cutting-edge cloud computing offerings across the AWS portfolio. Annapurna Labs (our organization within AWS UC) designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.
The Product: AWS Neuron is the software for Trainium and Inferentia, the AWS Machine Learning chips. Inferentia delivers best-in-class ML inference performance at the lowest cost in the cloud to our AWS customers. Trainium is designed to deliver best-in-class ML training performance at the lowest training cost in the cloud, and it’s all being enabled by AWS Neuron. Neuron is cutting-edge software that includes an ML compiler and native integration into popular ML frameworks. Our products are being used at scale with external customers like Anthropic and Databricks as well as internal customers like Alexa, Amazon Bedrock, Amazon Robotics, Amazon Ads, Amazon Rekognition and many more.
The Team: The Amazon Annapurna Labs team is responsible for building innovation in silicon and software for our AWS customers. We are at the forefront of innovation by combining cloud scale with the world’s most talented engineers.
Our team covers multiple disciplines including silicon engineering, hardware design, software and operations. Because of our team’s breadth of talent, we have been able to improve AWS cloud infrastructure in high-performance machine learning with AWS Neuron, Inferentia and Trainium ML chips, in networking and security with products such as AWS Nitro, Elastic Network Adapter (ENA), and Elastic Fabric Adapter (EFA), and in computing with AWS Graviton and F1 EC2 instances.
You: We’re seeking a hands-on product manager who has a passion for machine learning and developer-focused cloud software and hardware products, and is willing to work hard for their customers. Product Management in Annapurna is an opportunity to collaborate with engineering, design, and sales/business development teams to create state-of-the-art machine learning cloud services.
In your role as Neuron product manager, you will be in charge of the customer voice within our team, working closely and tirelessly with multiple internal teams and customers to develop new Neuron features for training and inference, and to support our growing ecosystem. Your mission will be to ensure our customers find our new cutting-edge offerings pleasing and useful in achieving their aggressive business goals.
As a member of the Annapurna team you’ll dive deep into our technology and work closely with our internal teams, engage with leading developers and customers, and help Annapurna's products scale to large deployments. We are looking for self-driven individuals who can collaborate with others, and who will continuously work to deliver a world-class customer experience.
This is a fast-paced, hands-on, intellectually challenging position, and you’ll work with thought leaders in multiple business and technology areas.
You’re a good fit if (a) you can think big and are able to break down the big picture into measurable goals, (b) you have an instinctive understanding of what makes products successful and easy to deploy, and can raise the bar on delivering features beneficial to our customers, (c) you can dive into technical details and ask engineers insightful questions about the services that you own, and finally (d) you can think long-term, can balance conflicting interests and priorities, and converge on outcomes that earn trust and customer loyalty.
In this role you will:
- Work directly with software engineering teams to define and execute on new features.
- Produce clear, concise documents such as functional or technical specifications.
- Write user stories and perform user acceptance testing.
- Anticipate bottlenecks, manage risk and escalations, and balance business needs against technical constraints.
- Find opportunities to innovate on behalf of our customers, design features related to these opportunities, and always push to improve our product user experience.
- Drive feature discussions with customers, engineering, and other stakeholders.
- Stay connected with industry counterparts and gain insights on technology trends.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help our team members develop their engineering expertise so they feel empowered to take on more complex tasks in the future.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- Bachelor's degree in computer science, engineering, analytics, mathematics, statistics, IT or equivalent
- Experience owning/driving roadmap strategy and definition
- Experience with feature delivery and tradeoffs of a product
- Experience contributing to engineering discussions around technology decisions and strategy related to a product
- Experience in representing and advocating for a variety of critical customers and stakeholders during executive-level prioritization and planning
- Experience in technical product management, program management or engineering
- 10+ years of industry experience, with 5+ years in technical product management or customer-facing roles. Knowledge of full product life cycles, including technical specifications, development, go-to-market, pricing, customer-facing presentations and collaboration with engineering and sales teams.
- Solid knowledge of computer architecture fundamentals, operating systems and cloud infrastructure engineering concepts
- Ability to work in a fast-paced and agile work environment with demonstrated collaboration skills and demonstrated strengths in driving through complexity, ambiguity, and unknowns in early-stage programs
- Proven experience in delivering modern software products, preferably collaborative open-source projects ...

Design Verification Engineer

Amazon Web Services provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world. We have data center locations in the U.S., Europe, Singapore, and Japan, and customers across all industries. We are seeking experienced Hardware Design Engineers to build the next generation of our cloud server infrastructure. Our success depends on our world-class server infrastructure; we’re handling massive scale and rapid integration of emergent technologies.
As a member of the Cloud-Scale Machine Learning Acceleration team you’ll be responsible for the design and optimization of hardware in our data centers.
Some of your responsibilities will include verifying/validating that our hardware and software solutions achieve their desired functionality, developing and executing multi-faceted verification/validation plans, and measuring the team’s progress towards our ambitious customer metrics.
This is a fast-paced, intellectually challenging position, and you’ll work with thought-leaders in multiple technology areas. You’ll have high standards for yourself and everyone you work with, and you’ll be constantly looking for ways to improve our products' performance, quality and cost.
We’re changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.
About Us
Inclusive Team Culture
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.
Work/Life Balance
Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.
Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- BS degree or higher in EE, CS, or CE
- 3+ years of design verification experience using SystemVerilog and UVM
- 3+ years of experience in testbench development including: stimulus, checkers, assertions and coverage ...

Sr Software Engineer, Graviton Software, Annapurna Labs

The AWS Graviton Software team is looking for Software Engineers to develop tools to drive the optimization of open source and internal applications. Annapurna Labs, part of AWS, designed Graviton as a strategic initiative to improve how software works at Amazon scale. Graviton is an Arm-based CPU that delivers better performance, a lower price, and a lower carbon footprint than comparable x86-based instances.
Key job responsibilities
As a Graviton Software Developer, you will:
- Build a software framework for tools to analyze the performance of hardware and software components.
- Leverage existing perf tools like sysstat, sysctl, perf, etc.
- Automate the collection and analysis of processor, OS and workload performance data.
- Help external AWS customers and various internal AWS services like AWS Lambda, Elastic Map Reduce, ElastiCache, and RDS to troubleshoot bottlenecks and to optimize the architecture, algorithms, and deployment on Graviton.
- Work on Linux and other open source code, improving it and contributing the changes back to the community.
- Play an instrumental role in driving the AWS roadmap to deliver cost-effective and performant computing systems.
- Use and further develop your deep knowledge in areas including design, implementation, and data analysis.
- Have the opportunity to lead the innovation and deliver software that powers the world’s largest cloud provider.
- If you are already an open source developer or passionate about it, you will be able to continue your passion and contribute back to the community across all those projects.
A day in the life
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences.
Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.
Work/Life Balance
Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.
Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.
About the team
The Graviton Software organization ports, optimizes, and develops software to drive down the cost of adoption and operation for the AWS Graviton instances. We pro-actively improve and upstream open source software, including Linux kernel, operating system, compilers, libraries, and applications. We help internal and external customers to troubleshoot and fix performance bottlenecks that prevent them from using Graviton. We develop tools to automate most of the heavy lifting and maintain publicly available documentation: Graviton Developer Guide on GitHub
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience ...

Machine Learning - Compiler Engineer II, Annapurna Labs

The Product: AWS Machine Learning accelerators are at the forefront of AWS innovation and one of several AWS tools used for building Generative AI on AWS. The Inferentia chip delivers best-in-class ML inference performance at the lowest cost in the cloud. Trainium will deliver best-in-class ML training performance with the most teraflops (TFLOPS) of compute power for ML in the cloud. This is all enabled by a cutting-edge software stack, the AWS Neuron Software Development Kit (SDK), which includes an ML compiler and runtime, and natively integrates into popular ML frameworks such as PyTorch, TensorFlow and MXNet. AWS Neuron and Inferentia are used at scale with customers like Snap, Autodesk, Amazon Alexa, Amazon Rekognition and more customers in various other segments.
The Team: As a whole, the Amazon Annapurna Labs team is responsible for silicon development at AWS. The team covers multiple disciplines including silicon engineering, hardware design and verification, software and operations.
The AWS Neuron team works to optimize the performance of complex neural net models on our custom-built AWS hardware. More specifically, the AWS Neuron team is developing a deep learning compiler stack that takes neural network descriptions created in frameworks such as TensorFlow, PyTorch, and MXNet, and converts them into code suitable for execution. As you might expect, the team is composed of some of the brightest minds in the engineering, research, and product communities, focused on the ambitious goal of creating a toolchain that will provide a quantum leap in performance.
You: As a Machine Learning Compiler Engineer II on the AWS Neuron team, you will be supporting the ground-up development and scaling of a compiler to handle the world's largest ML workloads. Architecting and implementing business-critical features, publishing cutting-edge research, and contributing to a brilliant team of experienced engineers excites and challenges you.
You will leverage your technical communication skills as a hands-on partner to AWS ML services teams and you will be involved in pre-silicon design, bringing new products/features to market, and many other exciting projects. A background in Machine Learning and AI accelerators is preferred, but not required.
About the team
Inclusive Team Culture
Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.
Work/Life Balance
Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.
Mentorship & Career Growth
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded professional and enable them to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of experience architecting and optimizing compilers
- Proficiency with 1 or more of the following programming languages: C++ (preferred), C, Python ...

Sr. Software Development Manager, AWS Neuron ML Frameworks

Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
Join the team that builds AWS Neuron, the software stack that runs all the leading AI models on the AWS Inferentia and Trainium cloud-scale machine learning accelerators.
As the Sr. Software Development Manager of ML Frameworks & Ecosystems you will lead the team that develops and extends Neuron support for leading ML frameworks including PyTorch and JAX. You will develop and deliver the framework plugins and libraries that enable a great user experience for developing and optimizing models on Trainium and Inferentia accelerators, and work closely with the open source ecosystem to drive improvements that enable models to port seamlessly across accelerators.
You will work closely with the Neuron compiler, training and inference optimization teams, ML model developers and users to deliver the best performance on top AI models.
You should have an established background in AI frameworks and Machine Learning infrastructure such as PyTorch, PyTorch/XLA, and JAX. Experience with OpenXLA is a significant plus. You should have a demonstrated ability to work with open source communities to influence future community direction, a strong technical understanding and a motivation to achieve results.
Key job responsibilities
- Take responsibility for the full life cycle of developing and releasing JAX and PyTorch framework support for AWS Neuron.
- Understand current and future directions of ML framework development, with a focus on enabling and optimizing the latest features of ML frameworks.
- Work closely with the PyTorch and JAX communities to actively drive future directions that improve the experience of developing and optimizing ML models across multiple platforms.
- Develop and grow your team to meet the ever-expanding needs of the AI software ecosystem.
A day in the life
You will work with the executive leadership and other senior management and technical leaders to define strategic directions and deliver new capabilities to ML model developers and users. You will work closely with your team to enhance our current framework support for the latest ML models and for top customers and to grow and advance your team's capabilities. You will solve challenges facing current users to enable the best performance on the latest accelerators. You will collaborate with the PyTorch and JAX communities across the AI industry to drive ML framework technology forward.
About the team
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 10+ years of engineering experience
- 5+ years of engineering team management experience
- 10+ years of planning, designing, developing and delivering consumer software experience
- Experience partnering with product or program management teams
- Experience managing multiple concurrent programs, projects and development teams in an Agile environment ...

Senior Runtime Software Development Engineer, Neuron Runtime

At AWS AI our vision is to make deep learning pervasive for everyday developers and to democratize access to cutting-edge infrastructure. In order to deliver on that vision, we’ve created innovative software and hardware solutions that make it possible.
AWS Neuron SDK is the complete software stack for the AWS Inferentia and Trainium machine learning accelerators designed by Annapurna Labs inside AWS. The Neuron SDK consists of the compiler, runtime, frameworks, and tooling customers need. It’s also preinstalled in AWS Deep Learning AMIs and Deep Learning Containers so customers can quickly get started with running high-performance and cost-effective inference and training.
The Neuron team is hiring senior Runtime Software Development Engineers with a background in machine learning and AI accelerators in order to solve our customers’ toughest problems. As a Runtime Software Development Engineer you will have experience with high-performance Linux drivers, with HPC technologies including libfabric and MPI, and with delivering products to customers with a high degree of operational excellence.
This is a fast-paced, intellectually challenging position, where you’ll work with thought-leaders in multiple technology areas. You’ll have high standards for yourself and everyone you work with, and you’ll be constantly looking for ways to improve our products' performance, quality and cost.
We’re changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.
Utility Computing (UC)
AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry.
As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.
About the team
About AWS
Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating; that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Inclusive Team Culture
Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
BASIC QUALIFICATIONS
- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one software programming language experience
- 5+ years of leading design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- 5+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience as a mentor, tech lead or leading an engineering team ...