Careers

Technical Program Manager III, GPU Infrastructure Reliability, Google Cloud

Google
Sunnyvale, CA, USA; Kirkland, WA, USA
In accordance with Washington state law, we are highlighting our comprehensive benefits package, which is available to all eligible US-based employees. Benefits for this role include:
  • Health, dental, vision, life, disability insurance
  • Retirement Benefits: 401(k) with company match
  • Paid Time Off: 20 days of vacation per year, accruing at a rate of 6.15 hours per pay period for the first five years of employment
  • Sick Time: 40 hours/year (increased to 69 hours/year for Seattle) including 5 discretionary sick days per instance
  • Maternity Leave (Short-Term Disability + Baby Bonding): 28-30 weeks
  • Baby Bonding Leave: 18 weeks
  • Holidays: 13 paid days per year
Note: By applying to this position you will have an opportunity to share your preferred working location from the following: Sunnyvale, CA, USA; Kirkland, WA, USA.

Minimum qualifications:

  • Bachelor's degree in a technical field, or equivalent practical experience.
  • 5 years of experience in program management.
  • Experience with infrastructure reliability.
  • Experience with GPUs or GPU Systems.

Preferred qualifications:

  • 5 years of experience managing cross-functional or cross-team projects.
  • 5 years of experience in technical program management, with a focus on software engineering and ML infrastructure projects.
  • Knowledge of software development, distributed systems, and ML infrastructure or GPU systems.
  • Ability to think critically and solve problems.
  • Excellent project management skills, and experience with project planning, execution, and risk management.
  • Excellent communication and collaboration skills, with the ability to build relationships and influence across all levels of the organization.

About the job

A problem isn’t truly solved until it’s solved for all. That’s why Googlers build products that help create opportunities for everyone, whether down the street or across the globe. As a Technical Program Manager at Google, you’ll use your technical expertise to lead complex, multi-disciplinary projects from start to finish. You’ll work with stakeholders to plan requirements, identify risks, manage project schedules, and communicate clearly with cross-functional partners across the company. You're equally comfortable explaining your team's analyses and recommendations to executives as you are discussing the technical tradeoffs in product development with engineers.

The goal of this role is to empower AI innovation by accelerating the delivery of cloud-based accelerator (GPU) NPIs built into large-scale supercomputer clusters, including next-gen cross-functional development, customer and vendor partnerships, and ML workload monitoring and diagnostic tooling.

As a GPU Technical Program Manager for Google Cloud’s AI and Computing Infrastructure team, you will be at the forefront of AI innovation, leading the end-to-end development and delivery of next-generation Cloud GPU products from initial concept to full-scale production. You will take charge of software qualification and release strategies for AI hypercompute clusters, collaborating deeply with engineering, product, and capacity planning teams to align customer and business priorities. Beyond managing critical escalations and mitigating risks, this is a unique opportunity to shape cross-functional initiatives alongside Application Centric Infrastructure (ACI) leadership and Technical Program Managers (TPMs) across the broader organization to streamline customer onboarding and scaled support for our largest, most complex Cloud ML solutions.

The ML, Systems, & Cloud AI (MSCA) organization at Google designs, implements, and manages the hardware, software, machine learning, and systems infrastructure for all Google services (Search, YouTube, etc.) and Google Cloud. Our end users are Googlers, Cloud customers and the billions of people who use Google services around the world.

We prioritize security, efficiency, and reliability across everything we do, from developing our latest TPUs to running a global network, while driving towards shaping the future of hyperscale computing. Our global impact spans software and hardware, including Google Cloud’s Vertex AI, the leading AI platform for bringing Gemini models to enterprise customers.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $163,000 - $237,000 (USD) + 15% bonus target + equity + benefits

Learn more about benefits at Google.

Responsibilities

  • Lead the end-to-end development, project planning, and delivery of next-gen AI Infra GPU products from concept to production.
  • Lead software qualifications, release strategy, and test infrastructure management for AI hypercompute clusters.
  • Manage escalations and critical incidents while proactively identifying and mitigating risks that could impact project success.
  • Coordinate with TPMs in AI2 (e.g., ACI, Platforms, and CSCO) and ACI leadership on cross-functional initiatives related to AI Infra customer onboarding and production support.
  • Participate in the development of core management software, monitoring, and diagnostic tooling for scalable Cloud ML solutions.

Information collected and processed as part of your Google Careers profile, and any job applications you choose to submit, is subject to Google's Applicant and Candidate Privacy Policy.

Google is proud to be an equal opportunity and affirmative action employer. We are committed to building a workforce that is representative of the users we serve, creating a culture of belonging, and providing an equal employment opportunity regardless of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), expecting or parents-to-be, criminal histories consistent with legal requirements, or any other basis protected by law. See also Google's EEO Policy, Know your rights: workplace discrimination is illegal, Belonging at Google, and How we hire.

If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.

Google is a global company and, in order to facilitate efficient collaboration and communication globally, English proficiency is a requirement for all roles unless stated otherwise in the job posting.

To all recruitment agencies: Google does not accept agency resumes. Please do not forward resumes to our jobs alias, Google employees, or any other organization location. Google is not responsible for any fees related to unsolicited resumes.

Equity is granted exclusively and discretionarily by Alphabet Inc. on the basis of an agreement concluded between you and Alphabet Inc. Alphabet Inc. is your sole contractual partner with respect to equity grants. GSU grants are not guaranteed, are discretionary, are subject to approval by the Alphabet Inc. board of directors or its delegate, the terms of the relevant Alphabet Inc. stock plan, and your grant agreement. They have no impact on statutory payments. Current or past grants do not confer an acquired right.
