Deloitte Autonomous Vehicle Infrastructure Systems Lead, Manager - Managed AI in Detroit, Michigan

The Team

The Deloitte Connected and Autonomous Vehicle (CAV) team is catalyzing and shaping the Autonomous Vehicle (AV) market through a suite of turnkey, as-a-service solutions that deliver improved performance and a lower total cost of ownership (TCO). These solutions will help automotive customers realize their autonomy ambitions as efficiently as possible.

High-Level Role

We are looking for a seasoned, "hands-on" HPC/AI infrastructure systems leader who will define the scope, detailed design, and deployment of AV infrastructure across on-prem, cloud, and hybrid environments. The key measure of success for this prototype will be delivering Deloitte's offering as a service, in POD configurations, to our customers with guaranteed SLAs and TCO targets.

Specifics:

  • Establish a detailed DGX A100 specification that reflects a representative customer's requirements for planning, deployment, and ongoing operational optimization in terms of TCO, throughput, scalability, and flexibility across their varied workloads

  • Set up the DGX POD/SuperPOD reference environment, including DGX A100 compute nodes, fabrics (storage/compute), management networks and software (DeepOps), key system software for optimizing GPU communication, I/O, and application performance, and user runtime tools for SLURM and Kubernetes containers

  • Design and document the most efficient setup to meet the success metrics (TCO, performance, scale). Specific areas of focus:

      • Network switch and fabric design for non-blocking, scalable bandwidth that delivers the best performance across varying dataset sizes and locations

      • Storage and caching hierarchy implementations tailored to training vs. inference workloads. Establish storage management guidelines for allocating internal storage (RAM/NVMe) and external high-speed storage (DDN, NetApp, etc.) to optimize the performance and cost of running varied datasets and workloads. Establish rules for when to enable GPUDirect Storage (GDS) for lower-latency, faster I/O workloads.

      • Management servers: infrastructure design and setup to enable user logins, provisioning (OS images and other internal infrastructure services for the POD), workload management (resource management and scheduling/orchestration), container management, and system monitoring/logging

      • Operational and runtime optimization of A100 compute resources (MIG partitions) for varying workloads, to maximize the utilization and throughput of jobs scheduled on a given node cluster

  • Validate the commercial model using the MVP operational runbook/playbook

Minimum Qualifications:

  • Bachelor's degree or equivalent experience in Computer Architecture, Computer Science, Electrical Engineering, or a related field; advanced degree preferred

  • 6+ years of proven experience designing, deploying, and operating production-grade HPC environments that use both SLURM and Kubernetes clusters

  • Deep understanding of scale-out compute, networking, and external storage architectures for optimizing and accelerating AI/HPC workloads

  • Proven experience deploying, upgrading, and migrating sophisticated enterprise-scale systems and driving their user adoption

  • Background in software and solutions development, with a proven ability to demonstrate complex new technologies

  • Programming skills to build distributed storage and compute systems, backend services, microservices, and web technologies

  • Well-versed in Agile methodology

  • Comfortable in a customer-focused, fast-paced environment

  • Ability to travel up to 50% on average, based on the work you do and the clients and industries/sectors you serve

  • Limited immigration sponsorship may be available

AI&DE23

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law.