Lead HPC Hardware Engineer

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm, with offices in London and Dallas.

We are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub where we work on the latest technologies in a cutting-edge environment.

The role

As G-Research’s Lead HPC Hardware Engineer, you will play a critical role in managing, scaling and optimizing a large compute infrastructure composed of numerous GPU and CPU nodes.

In this role, you will work closely with Infrastructure Engineers, Data Centre Operations, AI Engineers, Security Experts and Software Engineers to deliver a robust compute platform that supports high-performance computing needs.

Your expertise will be pivotal in ensuring that our compute infrastructure operates efficiently, while also planning for its growth and maintenance.

Our approach is centred on automation, hardware optimisation and infrastructure best practices. You will help drive improvements, mentor junior engineers and ensure our infrastructure is both secure and scalable.

Key responsibilities of the role include:

Designing, configuring and managing a high-performance compute infrastructure
Growing and optimizing our infrastructure to meet business demands
Ensuring the efficient operation of the OpenStack-powered environment, with a primary focus on OpenStack Ironic
Monitoring hardware performance, identifying areas for improvement and implementing solutions
Developing and maintaining hardware management procedures to increase server uptime and minimise failures
Performing diagnostics, tuning and capacity planning to ensure smooth scale-out
Performing analysis of existing hardware lifecycle processes and providing recommendations for improvement and optimization
Collaborating with various teams to integrate hardware improvements aligned to organizational goals
Implementing best practices for security hardening of the platform and associated systems
Mentoring junior engineers and fostering a culture of continuous learning and improvement

Who are we looking for?

The ideal candidate will have the following skills and experience:

Demonstrable experience managing large-scale HPC infrastructure
Strong understanding of server hardware architecture, including processors, memory, storage, networking and power systems
Deep understanding of bare-metal provisioning and infrastructure automation
Proven ability to troubleshoot hardware issues, including diagnostics and repairs for both GPU and CPU nodes in production environments
Experience with hardware monitoring and management tools, and familiarity with hardware automation tools such as Ansible, Puppet and Chef
Knowledge of out-of-band management interfaces, such as the Redfish API and IPMI, and of BMC implementations such as iDRAC and iLO
Experience with hardware diagnostics, optimization, performance tuning and capacity planning
Familiarity with thermal management and optimizing data centre layout for efficiency
Knowledge of security best practices for hardware infrastructure
Strong problem-solving skills with the ability to work under pressure in a fast-paced environment
Excellent communication skills and the ability to work collaboratively with cross-functional teams

The following would be beneficial:

Experience with large compute farms or hyperscale data centres
Familiarity with high-performance networking, such as InfiniBand and Ethernet
Knowledge of server configuration management and software deployment in HPC environments
Understanding of Linux-based environments and proficiency in scripting languages such as Python, Bash or PowerShell for automation
Experience with OpenStack or similar cloud platforms
Experience with nvidia-smi (the NVIDIA System Management Interface) and debugging GPU-related issues
Leadership experience including team management, mentoring and developing engineers

Why should you apply?

Market-leading compensation plus annual discretionary bonus
Lunch provided in the office (via GrubHub)
Informal dress code and excellent work/life balance
Excellent paid time off allowance of 25 days
Sick days, military leave, and family and medical leave
Generous 401(k) plan
16 weeks’ fully paid parental leave
Medical and Prescription, Dental, and Vision insurance
Life and Accidental Death & Dismemberment (AD&D) insurance
Employee Assistance and Wellness programs
Generous relocation allowance and support
Great selection of office snacks, and hot and cold drinks
On-site gym and car parking

Location: Dallas, TX

Mia, Infrastructure Development Software Engineer

"What I appreciate most about working in G-Research is the supportive and knowledgeable environment. Everyone is incredibly helpful and patient, which ensures there’s a good balance between being challenged and your workload."

Interview process

Online Application

Our assessment process kicks off with our Talent Acquisition team, who will review your application and assess your fit for the role.

Stage One: Technical Interview

You will meet with a team member, or take a remote test, so that we can assess your technical abilities.

Stage Two: Behavioural Interview

We will set aside technical skills and focus on you.

Stage Three: Further Technical Interviews

Here, we will take a deeper dive into your technical skills and competencies.

Stage Four: Management Interviews

The final stage of our interview process is where you will meet members of your team, your future manager, and functional leadership.

Latest news

G-Research March 2025 Grant Winners

22 Apr 2025

Each month, we provide up to £2,000 in grant money to early career researchers in quantitative disciplines. Hear from our March grant winners.

Invisible Work of OpenStack: Eventlet Migration

25 Mar 2025

Hear from Jay, an Open Source Software Engineer, on tackling technical debt in OpenStack. As technology evolves, outdated code becomes inefficient and harder to maintain. Jay highlights the importance of refactoring legacy systems to keep open-source projects sustainable and future-proof.

SXSW 2025: Key takeaways from our Engineers

24 Mar 2025

At G-Research we stay at the cutting edge by prioritising learning and development. That’s why we encourage our people to attend events like SXSW, where they can engage with industry experts and explore new ideas. Hear from two Dallas-based Engineers, as they share their key takeaways from SXSW 2025.
