Senior Advanced Research Computing Systems Administrator, Victoria
Victoria, Canada
Last edited: less than a week ago
Description
Mandate
Reporting to the Manager and Architect of Advanced Research Computing Infrastructure, the Senior Advanced Research Computing Systems Administrator works as part of a team to design, build, and ensure the operational effectiveness of the university’s research servers and storage. Members of this team maintain systems critical to many research groups on campus and beyond, including web servers, database servers, high‑performance computing (HPC) systems, cloud infrastructure, and container orchestration platforms used by researchers at UVic, at institutions across the country, and in international collaborations. These systems are required to be in operation 24 hours per day, 365 days of the year, and decisions regarding these systems can affect UVic’s obligations to parties beyond the institution.
Objectives
The Senior Advanced Research Computing Systems Administrator’s work includes the design, installation, configuration, and maintenance of hardware and software, problem determination and resolution, resource allocation, performance and security monitoring, and usage reporting. Each position has specialized areas of expertise in multiple domains: storage technologies such as Ceph, dCache, GPFS, Lustre, and IBM Spectrum Protect (TSM); deployment technologies such as xCAT, Cobbler, Ansible, Puppet, and Terraform; compute and virtualization technologies such as Kubernetes and OpenStack; HPC schedulers such as Slurm, HTCondor, and Moab; and systems monitoring. The specific technologies used in this role will change over time, and this position is responsible for helping to guide decisions on how future technologies are selected and deployed.
This position requires the incumbent to have significant problem‑solving skills to analyze and correct software and hardware problems and to automate administration tasks. It also requires effective communication skills to provide technical assistance and advice to peers and the user community, and to inform user areas of the impact and implications of system failures, maintenance, and cybersecurity incidents. The role leads project teams and provides recommendations on the university’s server and storage infrastructure.
System maintenance is usually performed off‑hours, with major issues responded to on a 24/7 basis. This role may need to work outside of normal hours on an emergency or pre‑scheduled basis and may require travel out of town or out of the country.
The position requires a Bachelor’s Degree in Computer Science or another relevant discipline plus at least five years of experience in system administration in a large enterprise or academic/research environment. An equivalent combination of education and experience may be considered.
Required Knowledge, Skills, and Abilities
Expert knowledge of Red Hat Enterprise Linux and/or derivatives (e.g., AlmaLinux, Rocky Linux)
In‑depth experience installing and operating at least one of OpenStack, Kubernetes, or Ceph
In‑depth experience with scripting and revision control (e.g., Bash, Perl, Python, Git, or Subversion)
Working knowledge of provisioning and configuration management tools (e.g., Ansible, Terraform, xCAT, Cobbler)
Experience supporting cloud computing and/or containerized environments
Excellent communication skills, both written and verbal
Ability to build and maintain productive working relationships with all stakeholders
Ability to work collaboratively in a team environment
Proven track record of achieving project goals on time and producing deliverables of high quality
High degree of attention to detail and ability to understand complex technical concepts; requires maintaining broad and in‑depth technical knowledge of all aspects of servers and server operating systems
High level of problem‑solving ability; must effectively identify and resolve unusual and highly complex technical problems
Ability to effectively manage multiple tasks and priorities, and work under pressure to meet time‑sensitive and mission‑critical deadlines in a complex environment
Ability to take initiative and work with limited direction
Ability to mentor and coach technical staff and teams, and act as a resource
Ability to contribute to complex projects by developing project work plans and monitoring and directing the activities of a project team
Excellent written and oral communications skills
Commitment to valuing diversity and contributing to an inclusive and respectful working and learning environment
Assets or Preferences
Working knowledge of load balancers and HA environments
Experience supporting HPC environments
Experience supporting compute and/or storage systems in a research or academic setting
Experience participating with and contributing to open‑source software projects
Working knowledge of GPU acceleration of computational workloads, preferably in a virtualized environment
Working knowledge of KVM/QEMU virtualization, containerd or Docker container runtimes, and Calico, Linux bridge, or Open vSwitch virtual networking
Key Information
Company name: University of Victoria
Job title: Senior Advanced Research Computing Systems Administrator