NextGen Innovation Labs
NextGen Innovation Labs - System Administration/Engineer - HPC
Job Location
Delhi, India
Job Description
Run:
- Workload scheduler management
- HPC tools and middleware management
- HPC application troubleshooting from an infrastructure perspective
- System/cluster monitoring
- Management, monitoring and maintenance of the InfiniBand interconnect, cluster services and cluster hardware
- Root cause analysis of HPC cluster issues

Build:
- System deployment using cluster management tools
- OS repository management, along with a compatibility matrix for various device drivers
- Configuration and maintenance of network services
- Deployment and management of monitoring systems to automate service and hardware alerts
- Internal cluster network connectivity
- Cloud migration of HPC cluster (core HPC system)
- Installation and configuration of HPC workload managers
- HPC application integration with the job scheduler

Skills / Expertise:
- Operating systems: Linux (RHEL, Rocky, CentOS, SUSE), Windows
- Schedulers and resource managers: PBS Pro, LSF, SLURM, Open Grid Scheduler (OGS)
- Provisioning: HP-CMU, xCAT, Bright Cluster Manager
- Monitoring: Ganglia, Nagios, Zabbix, Grafana
- Configuration management: Chef, Puppet, Ansible, CFEngine
- HPC applications: OpenFOAM, STAR-CCM+, Abaqus, Ansys, LS-DYNA and other CAE and CFD applications
- Linux operating system fundamentals, architecture, administration, native service configuration and advanced debugging skills
- Knowledge of x86 hardware, system software and system services
- Experience in HPC cluster configuration, management, upgrade and migration
- Knowledge of managing parallel file systems such as Lustre, BeeGFS and GPFS
- Scripting and automation: Bash, Perl, Python
- Knowledge of ITSM processes

(ref:hirist.tech)
Location: Delhi, IN
Posted Date: 1/11/2025
Contact Information
Contact: Human Resources, NextGen Innovation Labs