Machine Learning / AI Engineer (RL)
Offer summary

(Summary generated by AI based on the full job description)

The project focuses on developing advanced RL environments and evaluation systems for AI agents, with the goal of enabling safe AGI. Required skills include LangChain, LangGraph, MCP servers, and experience in RL environment engineering and machine learning. Responsibilities include designing RL environments, building task pipelines, developing reward models, and collaborating with infrastructure teams on scalability and telemetry.

You can start ASAP.

Company: ACAISOFT POLAND Sp. z o.o.

from: 18 May 2026
to: 17 June 2026
160–240 PLN net (+ VAT) / hr., B2B contract (full-time)
Salary details: basic salary
Offer parameters
level: mid • senior
working mode: remote
Warszawa, Mokotów, Aleja Niepodległości 18

Requirements

Expected technologies

Python

Optional technologies

Reinforcement Learning

Operating system

macOS
Linux

Our requirements

  • 5+ years of overall experience in the IT industry.
  • At least 3 years of experience in Machine Learning, Environment Engineering, or as a Data Scientist.
  • Practical knowledge of AI frameworks and tooling (LangChain, LangGraph, MCP servers).
  • Extensive practical experience in working with AI, including prompt engineering and vibe coding.
  • Experience in working with business requirements (analysis, summarizing, responding to changes).
  • Expertise in planning your own work or that of a small team.
  • Availability to work from 2 p.m. to 10 p.m.

Optional

  • Knowledge of Codex or Claude Code.
  • Experience integrating AI into existing systems.
  • Understanding of RL concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
  • Familiarity with instrumentation, metrics, and data pipelines for RL evaluation.

Your responsibilities

  • Design and implement RL environments that support large-scale agent evaluation and reinforcement learning experiments.
  • Build task generation pipelines, dynamic datasets, and scripted environments with controlled complexity and stochasticity.
  • Develop verifiers and reward models to automatically score trajectories and evaluate model reasoning.
  • Collaborate with infrastructure and systems engineers to ensure environments are scalable, reproducible, and instrumented for detailed telemetry.
  • Design APIs and orchestration frameworks for running, resetting, and evaluating agents across environments.
  • Optimize environment performance, logging, and reward reproducibility across distributed setups.

About the project

You will be working with a leading provider of AI evaluation and optimization solutions, trusted by multinational companies to optimize AI agents and detect performance issues in large language models.
In this role, you’ll help develop advanced reinforcement learning (RL) environments and scalable evaluation systems that guide and shape the behavior of cutting-edge AI models. The company’s mission is to enable safe, verifiable, and aligned AGI through rigorous, real-world agent evaluation.
Due to the client’s time zone, we would appreciate a candidate who can work from 2 p.m. to 10 p.m.
Join us and make a real impact!
If you’re ready to broaden your horizons and work with an innovative company at the forefront of AI, we’d love to hear from you. You’ll help build the environments that shape how future AI systems are trained, evaluated, and aligned, and you’ll collaborate with world-class engineers and researchers on one of the most important technical challenges of our time.

This is how we organize our work

This is how we work

  • at the client's site
  • you focus on a single project at a time
  • you can change the project
  • you focus on product development
  • agile

This is how we work on a project

  • documentation
  • issue tracking tools
  • testing environments
Company

Benefits

  • sharing the costs of sports activities
  • private medical care
  • remote work opportunities
  • flexible working time
  • integration events
  • extra social benefits
  • baby layette
  • back-to-school kit
  • employee referral program
  • charity initiatives
  • gift vouchers for kids (birthdays, Christmas, Children's Day)

Recruitment stages

  1. HR call (max. 15 min)
  2. Technical skills assessment through a case study discussion
  3. Technical interview with our client (max. 30 min)*

ACAISOFT POLAND Sp. z o.o.

At Acaisoft we specialize in cloud-native application development and transformations from legacy to cloud-native environments.
We provide end-to-end software solutions, from business analysis and project evaluation to UI/UX, frontend, and backend design and implementation. We apply best practices in both manual and automated QA to make sure the final product is top-notch.
Our customers range from startups to large enterprises based in the US (mainly Silicon Valley) and Western Europe.
Because technology evolves so quickly, we always strive to stay one step ahead of the market and keep up with the latest solutions.
