CV-Multimodality Engineer

We are looking for an experienced researcher to work with computer vision (CV) models and model architectures. The main task at the first stage is to introduce the vision modality into dialog systems; after that, you will work with a team building multimodal dialog systems.

About us:

We aim to help developers all over the world create chatbots, dialog systems, and multiskill assistants faster and more easily. Our technology stack can be used by developers with different levels of experience and training, from beginners to experts in NLP.

In our research, we are working on open-source multimodal dialog systems: systems capable not only of speaking and writing, but also of seeing and being aware of themselves in space. We will first develop such a stack for chatbots, while simultaneously working on a platform for full-fledged, embodied dialog systems.


  • Full-time, office work (remote work is possible);
  • 2+ years of experience with CV models;
  • Familiarity with new methods in CV and a habit of reading CV papers;
  • Experience in applying new methods and implementing papers in CV;
  • Solid experience with Docker, Bash, Git, and PyTorch;
  • The ability to write efficient, clean, and readable code that follows an established code style;
  • The ability to stick to deadlines.

Advantages could be:

  • experience with natural language processing models, in particular NLU models, Transformers, and Vision Transformers;
  • knowledge of frameworks for distributed computing in deep learning, in particular PyTorch Lightning, Ray, and TensorFlow X;
  • experience in explainable machine learning (XAI): the ability to interpret model predictions and analyze errors.

Salary is determined based on the results of the interview.

To apply for this job, send your CV with the subject line: CV-Multimodality Engineer