Principal Machine Learning Engineer, Distributed vLLM
Company: Red Hat
Location: Boston
Posted on: April 3, 2026
Job Description:
Job Summary

At Red Hat, we believe the future of AI is open, and we are on a mission to bring the power of open source LLMs and vLLM to every enterprise. The Red Hat AI Inference Engineering team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As leading developers and maintainers of the vLLM and llm-d projects, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to build, optimize, and scale LLM deployments.

As a Machine Learning Engineer focused on distributed vLLM infrastructure in the llm-d project, you will be at the forefront of innovation, collaborating with our team to tackle the most pressing challenges in scalable inference systems and Kubernetes-native deployments. Your work with machine learning, distributed systems, high-performance computing, and cloud infrastructure will directly shape our software platform and the future of AI deployment. If you want to solve cutting-edge problems at the intersection of deep learning, distributed systems, and cloud-native infrastructure the open source way, this is the role for you. Join us in shaping the future of AI!

What you will do
- Architect and lead the implementation of new features and solutions for Red Hat AI Inference
- Lead and foster a healthy upstream open source community
- Design, develop, and maintain distributed inference infrastructure leveraging Kubernetes APIs, operators, and the Gateway API Inference Extension for scalable LLM deployments
- Design, develop, and maintain system components in Go and/or Rust that integrate with the vLLM project and manage distributed inference workloads
- Design, develop, and maintain KV cache-aware routing and scoring algorithms to optimize memory utilization and request distribution in large-scale inference deployments
- Enhance the resource utilization, fault tolerance, and stability of the inference stack
- Design, develop, and test inference optimization algorithms
- Lead and facilitate technical design discussions and propose innovative solutions to complex challenges on high-impact projects
- Contribute to a culture of continuous improvement by sharing recommendations and technical knowledge with team members
- Collaborate with product management, engineering, and other cross-functional teams to analyze and clarify business requirements
- Communicate effectively with stakeholders and team members to ensure proper visibility of development efforts
- Mentor, influence, and coach a distributed team of engineers
- Provide timely and constructive code reviews
- Represent Red Hat AI in external engagements, including industry events, customer meetings, and open source communities

What you will bring

- Strong proficiency in Python and Go, plus at least one of Rust, C, or C++
- Strong experience with cloud-native Kubernetes service mesh technologies such as Istio, Cilium, Envoy (including WASM filters), and CNI plugins
- A solid understanding of Layer 7 networking, HTTP/2, gRPC, and the fundamentals of API gateways and reverse proxies
- Knowledge of LLM serving runtimes such as vLLM, SGLang, and TensorRT-LLM
- Excellent written and verbal communication skills, with the ability to interact effectively with both technical and non-technical team members
- Experience providing technical leadership on a global team and delivering on a vision
- An autonomous work ethic and the ability to thrive in a dynamic, fast-paced environment

The following is considered a plus
- Knowledge of high-performance networking protocols and technologies, including UCX, RoCE, InfiniBand, and RDMA
- Deep experience with the Kubernetes ecosystem, including core concepts, custom APIs, operators, and the Gateway API Inference Extension for GenAI workloads
- Experience with GPU performance benchmarking and profiling tools such as NVIDIA Nsight, or with distributed tracing libraries and techniques such as OpenTelemetry
- Experience writing high-performance code for GPUs and deep knowledge of GPU hardware
- A strong understanding of computer architecture, parallel processing, and distributed computing concepts
- A bachelor's degree in computer science or a related field is an advantage, though we prioritize hands-on experience
- Active engagement in the ML research community (publications, conference participation, or open source contributions) is a significant advantage

The
salary range for this position is $189,600.00 - $312,730.00. The actual offer will be based on your qualifications.

Pay Transparency

Red Hat determines compensation based on several factors, including but not limited to job location, experience, applicable skills and training, external market value, and internal pay equity. Annual salary is one component of Red Hat’s compensation package. This position may also be eligible for bonus, commission, and/or equity. For positions with Remote-US locations, the actual salary range for the position may differ based on location but will be commensurate with job duties and relevant work experience.

About Red Hat

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40 countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.

Benefits

- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account - healthcare and dependent care
- Health Savings Account - high-deductible medical plan
- Retirement 401(k) with employer match
- Paid time off and holidays
- Paid parental leave plans for all new parents
- Leave benefits including disability, paid family medical leave, and paid military leave
- Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!

Note: These benefits are only applicable to full-time, permanent associates at Red Hat located in the United States.
Inclusion at Red Hat

Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village.

Equal Opportunity Policy (EEO)

Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs, except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.

Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com. General inquiries, such as those regarding the status of a job application, will not receive a reply.