Deloitte Autonomous Vehicle MLOps Lead, Manager - Managed AI in McLean, Virginia
The Deloitte Connected and Autonomous Vehicle (CAV) team is catalyzing and shaping the Autonomous Vehicle (AV) market through a suite of turnkey, as-a-service solutions that deliver improved performance and lower total cost of ownership. These solutions will empower Automotive customers to realize their autonomy ambitions as efficiently as possible.
High-Level Role
We are looking for an MLOps Lead to own the technical development and production release of Deloitte's MLOps-as-a-Service, a disruptive solution that will revolutionize the world of transportation and the growing field of self-driving cars. This solution enables Automotive clients to train their AV models, accelerate DNN development, and improve data scientist productivity.
Specific tasks include:
Develop an ML pipeline and model management environment for building, training, and running inference on models in AV development, simulation, and last-mile testing
Ensure support for multiple open-source frameworks (e.g. TensorFlow, PyTorch) and programming languages in multi-GPU workload scenarios involving model and/or data parallelism
Orchestrate and schedule multiple parallel experiments (for example, AI model training runs) on pooled GPU resources in a Kubernetes cluster to maximize utilization and throughput while honoring priorities
Ensure role-based self-provisioning of infrastructure resources for data scientists with automated workflows (model access, build, train, simulate, last-mile testing)
Integrate with the data pipeline process that supplies target datasets to models during training and simulation
Evaluate MLOps ISVs to determine build vs buy for additional features
Work directly with key AV customers to understand their technology and deliver the best solutions
Qualifications
Experience in HPC/AI distributed computing environments leveraging Kubernetes (K8s) orchestration and Slurm schedulers, including their optimization
Understanding of hybrid cloud considerations for burst capacity and run-time allocation of model training or development in the cloud vs. on-prem
Well-versed in orchestrating and scheduling multiple parallel experiments (for example, AI model training runs) on pooled GPU resources in a Kubernetes cluster to maximize utilization and throughput while honoring priorities
Experience with scalability and operational/run-time considerations, including dynamic provisioning, suspend-resume, monitoring, and troubleshooting of model corruption and change-control issues
Bachelor's degree in CS or IE/Data Science with 6+ years of experience in this field; advanced degree preferred
Ability to travel up to 50% on average, based on the work you do and the clients and industries/sectors you serve
Limited immigration sponsorship may be available
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law.