WHO WE ARE:

EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world's largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.

We are a true International IT provider and are proud to deliver our services through global simplicity with trusted transparency.

WHAT YOU'LL DO:

We are seeking an experienced and technically proficient Collaboration Reliability Engineering Lead to join our team. In this role, you will support advanced collaboration technologies in a fast-paced and industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, operations, networking, scripting, and collaboration technologies, and a passion for continuous learning.

TEAM LEADERSHIP:

  • Lead, mentor, and manage a global team of 8-12 reliability engineers.
  • Foster ownership, accountability, and collaboration within the team.
  • Develop team members' technical and professional skills through coaching and performance reviews.

SYSTEM RELIABILITY AND PERFORMANCE:

  • Oversee maintenance of highly available and scalable architecture, including but not limited to Cisco server templates, endpoints, and edge and proxy appliances.
  • Develop, present, and achieve service-level objectives (SLOs), service-level agreements (SLAs), and key performance indicators (KPIs).
  • Perform quality assurance on video conferencing infrastructure, calendar tooling, touch panel hardware, automation bots, Cisco endpoints, and call center tooling.

INCIDENT MANAGEMENT AND RESOLUTION:

  • Drive incident response, root cause analysis, and post-mortem processes to identify and address reliability issues impacting users.
  • Implement proactive monitoring, alerting, and automation to minimize downtime and improve recovery times in live production environments.
  • Serve as an escalation point for video conferencing infrastructure and network troubleshooting, maintaining up-to-date documentation and on-call runbooks.

RELIABILITY IMPROVEMENTS:

  • Identify opportunities to improve system performance and reduce operational toil.
  • Develop and implement strategies for failure testing and future capacity planning.

CROSS-FUNCTIONAL COLLABORATION:

  • Work closely with engineering, security, networking, and third-party vendors (e.g., Cisco, BrightSign, Arista, Zoom, Webex) to resolve support cases and critical escalations.
  • Provide highly visible communications to hundreds of users regarding large-scale changes and updates.
  • Advocate for reliability-focused initiatives and communicate their value to stakeholders.

TOOLS AND AUTOMATION:

  • Leverage internal tooling to monitor, analyze, and improve system reliability.
  • Lead efforts to automate repetitive tasks, ensuring efficient system operations.

TECHNICAL REQUIREMENTS:

  • 3+ years of experience in Reliability Engineering or similar roles.
  • Health Monitoring: Experience implementing and coordinating telemetry using monitoring tools like Splunk, Grafana, and Prometheus, or similar technologies.
  • VMware Expertise: Hands-on experience with VMware, covering VM deployment, lifecycle management, and API/CLI usage.
  • ITIL Knowledge: Understanding of ITIL processes, service management principles, and IT service delivery best practices.
  • Automation: Experience as an automation advocate with a history of removing operational toil via software.
  • Production Support: Experience supporting internet-facing production services and distributed systems, including deployments, on-call rotations, and incident management.

TECHNICAL SKILLS:

  • Familiarity with Bash, Python, Terraform, and REST APIs.
  • Fundamental understanding of networking protocols (e.g., HTTP, TCP/IP, WebRTC, SIP).
  • Working knowledge of infrastructure components (e.g., load balancers, firewalls, DNS).

ADDITIONAL KEY PRIORITIES:

  • Expertise in disaster recovery and future capacity planning.
  • Excellent communication and interpersonal skills, with the ability to work effectively in a team-oriented environment.
  • Self-motivated and eager to learn new technologies, tools, and methodologies.
  • Experience with collaboration hardware, platforms (e.g., Zoom, Microsoft Teams, Webex), or media delivery networks.

Pay Range: $135,000 to $150,000 USD

EOS is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. We invite you to consider opportunities at EOS regardless of your gender reassignment, philosophical belief, political opinion, marital or civil partnership status, or other non-merit factors.

#LI-ML1
#LI-Hybrid