Interns at DeepPavlov
At DeepPavlov, we are looking for talented people with whom we can share our passion for pushing the boundaries of Conversational AI.

We have a special place for young, ambitious graduate and undergraduate students who are ready to take bold steps on their path to Conversational AI.

Our interns come from the best local and international universities, bringing fresh knowledge and diverse experience to our lab.
Current Interns
Sergey Gorbunov
Project: AI Assistant
Artem Ponomarev
Project: AI Assistant
Mikhail Nikolaev
Project: AI Assistant
Dilya Hannanova
Project: AI Assistant
Alim Adelshin
Project: Russian News Clustering and Headline Selection
Maxim Eliseev
Project: Language Translation
Past Interns
Mariya Kilina
Project: Recommendation Systems
Aleksey Dorkin
Project: DialoGLUE, SuperGLUE
Egor Salnikov
Project: DialoGLUE, SuperGLUE
Evgeny Golovenko
Azat Sultanov
Project: Speech Functions
From February to April, I did an internship at DeepPavlov. DeepPavlov appealed to me as a place for an internship for two reasons. First, I have always been interested in Conversational AI. Second, the internship offered a very interesting research task on discourse and speech functions, which fully matched my research interests. Moreover, creating a speech function classifier for dialogue management is an experimental task in the field of AI assistants, for which there are no ready-made solutions yet. The task consisted of several stages: studying materials on the theory of speech functions, searching for suitable data for classification, annotating it, and, finally, working on the classifier.

First, I studied the available materials on the theory of speech functions and discourse management (Eggins, S. & Slade, D.), as well as articles on similar speech act classifiers for dialogue management. The theory of speech functions defines more than 40 classes that characterize the speaker's intentions, and I had to study them in detail before classifying dialogues manually.

The Santa Barbara Corpus of Spoken American English was chosen as the dataset because it consists mostly of dialogues on everyday topics, which was an important selection criterion. The corpus includes 60 documents in cha format, each containing a dialogue of 700-1500 lines, i.e. phrases. The next task was to normalize the data: remove unnecessary characters and spaces, as well as comments on the dialogues (for example, "the sound of a hair dryer", "barking"). The data was then converted to the JSON format used by the Label Studio annotation tool. As a result, speech functions were annotated in three dialogues of 700-900 lines each, which were later used for training models.

The pre-trained Conversational BERT model by DeepPavlov was used to obtain contextual vectors. Experiments were conducted with several classification algorithms: Logistic Regression, Multinomial Naive Bayes, Support Vector Machines, Bi-LSTM, and BERT. The best accuracy achieved for several classes was 42%. The main difficulty in this task is data imbalance. In addition, a model was built that predicts classes based on statistics about the occurrence of classes in dialogues.

Although the desired results were not achieved, the internship left a very pleasant impression. It was very comfortable working in the DeepPavlov team. During my internship, I received some useful advice on training models.
It was also interesting to go through almost the full development process: from data annotation to creating the classifier.
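
The normalization and conversion steps described above can be illustrated with a minimal Python sketch. It is not the code used during the internship. It assumes that speaker turns look like "*NAME:" followed by the utterance, that non-speech comments are wrapped in double parentheses, and that the Label Studio labeling config reads the utterance from a "text" field; the names normalize and to_label_studio_tasks are hypothetical.

```python
import json
import re

# Assumption: speaker turns look like "*NAME:<tab>utterance"; header (@) and
# annotation (%) lines carry no utterance text and are skipped.
SPEAKER_LINE = re.compile(r"^\*(?P<speaker>[^:]+):\s*(?P<text>.*)$")
# Assumption: non-speech comments are wrapped in double parentheses.
COMMENT = re.compile(r"\(\([^)]*\)\)")
# Anything that is not a word character, whitespace, or basic punctuation.
NOISE = re.compile(r"[^\w\s.,?!'-]")


def normalize(text):
    """Drop comments and stray characters, then collapse repeated spaces."""
    text = COMMENT.sub(" ", text)
    text = NOISE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def to_label_studio_tasks(lines):
    """Turn transcript lines into a list of Label Studio import tasks."""
    tasks = []
    for line in lines:
        match = SPEAKER_LINE.match(line)
        if not match:
            continue  # not a speaker turn
        text = normalize(match.group("text"))
        if text:  # skip turns that contained only a comment
            tasks.append(
                {"data": {"speaker": match.group("speaker"), "text": text}}
            )
    return tasks


# Made-up sample lines, for illustration only.
sample = [
    "@Begin",
    "*JOANNE:\tso ((HAIR_DRYER)) what   did you do today ?",
    "%com:\tbackground noise",
    "*KEN:\t((BARKING))",
    "*KEN:\tnothing much .",
]
tasks = to_label_studio_tasks(sample)
print(json.dumps(tasks, indent=2))
```

Each task wraps its fields in a "data" object, which is the shape Label Studio accepts for JSON import. Turns that contain nothing but a comment are dropped, so the annotator only sees lines with actual speech.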

Since my task was very research-oriented, I decided to make it part of my Master's thesis. I am now continuing to work on this task, improving the classifier in DeepPavlov. Hopefully, I will be able to achieve state-of-the-art results for the classifier, as well as continue to work on dialogue management strategies with the help on.