Senior Site Reliability Engineer

Offer summary

(Summary generated by AI based on the full job description)

The project focuses on monitoring and maintaining the reliability of an infrastructure platform that supports AI services, Java APIs, and frontend applications. Key technologies include Kubernetes (AKS), Terraform, Azure (ACR, Key Vault, Virtual Networks), Prometheus, Grafana, GitHub Actions, and ArgoCD. Main responsibilities cover defining and maintaining SLOs/SLIs, incident response, automation, Kubernetes infrastructure management, and development of observability and CI/CD tooling. The project emphasizes toil reduction and production environment stability.

You can start ASAP

Company: Webellian Sp. z o.o.

from: 26 June 2026

to: 26 July 2026

Salary not specified · B2B contract (full-time)

Salary not specified · contract of employment

Offer parameters

Level: senior

Working mode: hybrid

Warszawa, Mokotów, Domaniewska 45

Requirements

Expected technologies

Microsoft Azure

Kubernetes

Terraform

Prometheus

Grafana

Python

GitHub Actions

ArgoCD

Optional technologies

Bicep

Operating system

Windows

Our requirements

5+ years of professional experience in site reliability engineering, DevOps, or platform engineering roles.
Strong Kubernetes experience: cluster operations, networking (Ingress, network policies), storage, autoscaling, and hands-on troubleshooting across production environments.
Solid Infrastructure as Code experience with Terraform; familiarity with Bicep or ARM templates is a plus.
Production experience with Azure cloud services: AKS, ACR, Key Vault, Azure Monitor, Application Insights, Virtual Networks, and Private Endpoints.
Strong observability experience: Prometheus, Grafana, centralized logging, alerting configuration, and distributed tracing instrumentation.
Working knowledge of SLO/SLI methodology: error budget principles, reliability target setting, and capacity planning.
Structured incident management experience: on-call ownership, blameless post-incident review, and runbook authorship.
Scripting and automation proficiency in Python or Bash for toil elimination and operational tooling.
Strong CI/CD experience: GitHub Actions and ArgoCD or equivalent GitOps tooling.
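The SLO/SLI requirement above rests on simple error-budget arithmetic. The Python sketch below illustrates it under assumed inputs: a request-based availability SLI, a 99.9% target, and a 30-day window. The function names are hypothetical and are not part of any tooling used on the project.

```python
# Hypothetical helpers illustrating error-budget arithmetic for a
# request-based availability SLO (assumed inputs, not project values).


def error_budget(slo_target: float, total_requests: int) -> int:
    """Failed requests the SLO tolerates in the window (0.1% of traffic at 99.9%)."""
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; negative once the SLO is breached."""
    budget = (1.0 - slo_target) * total_requests
    return (budget - failed_requests) / budget


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Time-based view: minutes of full unavailability the SLO tolerates per window."""
    return (1.0 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    target = 0.999  # assumed 99.9% availability SLO
    print(error_budget(target, 1_000_000))                    # 1000
    print(f"{budget_remaining(target, 1_000_000, 250):.0%}")  # 75%
    print(f"{allowed_downtime_minutes(target):.1f}")          # 43.2
```

A common extension is burn-rate alerting, which compares the current error rate with the rate that would exhaust exactly this budget over the window.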

Optional

Kubernetes certifications: CKA or CKAD.
Experience supporting AI or ML infrastructure workloads: GPU scheduling, model serving platforms, or inference pipeline operations.
Exposure to chaos engineering practices and fault injection testing.
FinOps experience: reserved capacity planning, resource right-sizing programs, and cost attribution per team or workload.
Service mesh experience (Istio, Linkerd) for traffic management and reliability patterns.
Experience in regulated industries (insurance, finance, healthcare) where auditability, change traceability, and secure-by-default operations are standard practice.

Your responsibilities

Define, instrument, and maintain SLOs and SLIs for platform components; own error budget tracking and produce regular reliability reports for hub leadership.
Serve on the on-call rotation as the infrastructure escalation tier; lead incident response for cluster-level, network-level, and storage failures; chair blameless post-incident reviews.
Implement and operate Kubernetes infrastructure (AKS): cluster lifecycle management, networking, resource quotas, autoscaling configuration, and multi-tenancy patterns across spoke namespaces.
Develop Infrastructure as Code (Terraform) to provision and manage Azure resources with consistency, auditability, and repeatable rollback capability.
Build and maintain observability infrastructure: Prometheus, Grafana, Azure Monitor, and Application Insights; own alerting rules, dashboards, and distributed tracing coverage across platform components.
Perform capacity planning and cost-aware resource management: right-size node pools, tune vertical and horizontal pod autoscalers, and identify resource waste across namespaces.
Identify and eliminate toil: automate repetitive operational tasks through scripting and tooling; measure and track toil reduction over time.
Maintain platform reliability procedures: rolling upgrades, backup and recovery testing, disaster recovery runbooks, and change freeze coordination.
Contribute to CI/CD pipelines and GitOps tooling (GitHub Actions, ArgoCD) from a reliability and deployment safety perspective; work with the Platform Team on release gates and rollback mechanisms.
Collaborate with the Run & Change team on incident SLA targets and operational procedures; work with Security Engineers on infrastructure hardening and vulnerability remediation.
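The responsibility to tune horizontal pod autoscalers is easier to reason about with the scaling rule in hand. The Kubernetes documentation describes the HPA controller's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and says no scaling happens when that ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below reproduces only that rule plus min/max clamping. The function name and default bounds are illustrative assumptions. The real controller also handles not-ready pods, missing metrics, and the scale-down stabilization window.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: ceil(current * current_metric / target_metric), clamped.

    Illustrative only: ignores pod readiness, missing metrics and the
    scale-down stabilization window that the real controller applies.
    """
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Respect the minReplicas/maxReplicas bounds as hard limits.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Assumed scenario: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))   # 6
    print(desired_replicas(4, 63, 60))   # 4 (within tolerance)
    print(desired_replicas(4, 30, 60))   # 2
    print(desired_replicas(8, 200, 50))  # 10 (capped at max_replicas)
```

Rounding up before clamping biases the sketch toward slight over-provisioning, which is why right-sizing the target metric matters as much as the replica bounds.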

About the project

As a Site Reliability Engineer within the Advanced Analytics Team, you will join the Infra team to own the reliability and operational health of the platform. You will define and maintain service level objectives, drive incident response at the infrastructure layer, and systematically eliminate operational toil through automation. You will work closely with Platform Engineers, Security Engineers, and the Run & Change team to ensure the platform meets its reliability commitments across production workloads spanning AI services, Java APIs, and frontend applications.

Ways of Working

Comfortable in agile, iterative delivery environments with personal ownership and accountability for platform reliability.

Clear communicator across global, cross-functional stakeholders; able to translate technical reliability metrics into business impact for non-technical audiences.

Proactive learner with pragmatic adoption of AI-assisted developer tools (e.g., GitHub Copilot, Claude Code) to improve automation coverage and delivery velocity.

This is how we organize our work

This is how we work

agile

This is how we work on a project

Continuous Deployment
Continuous Integration
DevOps
issue tracking tools
testing environments

Join a growing team of dedicated professionals! We love passing on knowledge to build excellence, speaking our minds without playing politics, and simply enjoying time together. If you share our passions, we want to meet you! So go ahead and apply.

What we offer

Contract under Polish law: B2B or employment contract (umowa o pracę)
Benefits such as private medical care, group insurance, Multisport card
English classes available
Opportunity to work with excellent professionals
High standards of work and focus on the quality of code
New technologies in use
Continuous learning and growth
International team
Pinball, PlayStation & much more (on-site)

Benefits

sharing the costs of sports activities
private medical care
life insurance
remote work opportunities
fruit
video games at work
coffee / tea
drinks
parking space for employees
leisure zone
Pinball, PlayStation & much more
English classes

Recruitment stages

1. 📞 A quick phone call with our recruiter.
2. 📅 Online technical interview to test your skills.
3. 📅 Second, face-to-face interview with your potential supervisor.
4. 🗒️ Feedback.

Webellian Sp. z o.o.

Webellian is a well-established Digital Transformation and IT consulting company committed to creating a positive impact for our clients. We strive to make a meaningful difference in diverse sectors such as insurance, banking, healthcare, retail, and manufacturing. Our passion for cutting-edge and disruptive technologies, as well as our shared values and strong principles, are what motivate us. We are a community of engineers and senior advisors who work with our clients across industries, playing a deep and meaningful role in accelerating and realizing their vision and strategy.

This is how the employer processes your data

Please include the following statement: I hereby authorize Webellian Poland Sp. z o.o. to process the personal data provided in this document for the purpose of the recruitment process, pursuant to the Personal Data Protection Act of 10 May 2018 (Journal of Laws 2018, item 1000) and in accordance with Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
