
DeepPavlov Library 0.11.0 release


Coming Closer To The Dream: Announcing Our New DeepPavlov🎅 0.11.0 Release!


Hello everyone and welcome to the DeepPavlov blog! My name is Daniel Kornev and I'm a Product Manager at DeepPavlov.ai. Today we are super excited to release v0.11.0 of our core product, the DeepPavlov Library.

In our lab we are driven by the big dream of building the next generation of Conversational AI. Getting there is a long road, and along it we will continue shipping individual NLP and NLU components as part of the broader DeepPavlov Library, while the platform is still being developed inside our lab.

This release brings several key Conversational AI components to the market, including updates to the KBQA model (support for running online queries through the WikiData API, and a syntax tree-based parser for Russian-language queries) and a new contributor-driven, DSTC2-based Automated Dataset Generation Tool for the goal-oriented bot (go-bot). Speaking of the latter, it is a privilege for us to ship the first component made by one of our fantastic community contributors.

KBQA Model Updates

KBQA stands for Knowledge Base Question Answering. Answering factoid questions based on data from reputable sources, be they public or private knowledge bases, is essential for modern Conversational AI systems. There are different approaches to this problem; for instance, our library already includes ODQA (Open-Domain Question Answering), which answers questions based on raw information extracted from provided free text. While ODQA is useful when nothing but free text is available, when you want the highest-quality answers based on verified data sources, KBQA is the answer you are looking for.

KBQA combines two subsystems: one links the entities mentioned in the query (entity linking) to those in the provided structured Knowledge Graph(s), and the other transforms the user's natural-language query into its logical representation in the form of a SPARQL query that references the linked entities. This mechanism is extensible via templates (see docs). As a result, queries you run through the KBQA model are structured queries over the recognized entities, minimizing the risk of ambiguity.
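To give a feel for the second step, here is a minimal, self-contained sketch of template-based SPARQL generation. The template, the toy entity-linking table, and the question pattern are all invented for illustration; the actual KBQA model uses learned entity linking and a much richer template set.

```python
# Minimal sketch of template-based SPARQL generation, with a hypothetical
# template and a toy entity-linking table. The real KBQA model in
# DeepPavlov works differently in detail.

SPARQL_TEMPLATES = {
    # question pattern -> SPARQL body with an entity placeholder
    "who wrote {work}": "SELECT ?author WHERE {{ wd:{work} wdt:P50 ?author }}",
}

# Toy entity-linking table: surface form -> Wikidata entity ID.
ENTITY_LINKS = {"war and peace": "Q161531"}

def generate_sparql(question: str) -> str:
    """Link the entity mention, then fill it into the matching template."""
    q = question.lower().rstrip("?")
    for surface, entity_id in ENTITY_LINKS.items():
        if surface in q:
            template = SPARQL_TEMPLATES["who wrote {work}"]
            return template.format(work=entity_id)
    raise ValueError("no entity linked")

print(generate_sparql("Who wrote War and Peace?"))
# SELECT ?author WHERE { wd:Q161531 wdt:P50 ?author }
```

Because the final query references an unambiguous entity ID rather than a name string, two works with the same title can no longer be confused.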

In this release, we bring two updates to the KBQA model.

New KBQA Model With Syntax Tree-Based SPARQL Generation

The first update brings our new (beta) syntax-tree-based Python API for SPARQL query generation. This API can be used both as a standalone module and as part of the updated KBQA model. It enables developers to define custom templates for SPARQL query generation using the syntax tree parser. While our KBQA model is bilingual, supporting both Russian and English, in this release this (beta) API supports Russian only. However, following the success of this release, support for English will be added in future versions of the DeepPavlov Library.
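To illustrate the underlying idea (this is not the library's actual API), the toy sketch below uses a hardcoded dependency parse of a Russian question to select a SPARQL template: the shape of the tree, an interrogative subject plus an object attached to the root verb, determines the shape of the query.

```python
# Toy sketch of syntax-tree-driven template selection. The parse below is
# hardcoded; a real system would obtain it from a syntax parser, and
# DeepPavlov's (beta) API differs in detail.

# Precomputed parse of "Кто написал Войну и мир?" ("Who wrote War and Peace?")
# as (token, dependency relation, head) triples.
parse = [
    ("написал", "root", None),          # root verb "wrote"
    ("Кто", "nsubj", "написал"),        # interrogative subject "who"
    ("Войну и мир", "obj", "написал"),  # object: the work's title
]

def pick_template(parse):
    """Map a (wh-subject + object) tree to a lookup-style SPARQL template."""
    deps = {rel for _, rel, _ in parse}
    if "nsubj" in deps and "obj" in deps:
        # Placeholders are filled later by entity linking and
        # relation detection.
        return "SELECT ?x WHERE { wd:{entity} wdt:{relation} ?x }"
    return None

print(pick_template(parse))
```

The point of routing through the syntax tree is that word order, which is very flexible in Russian, stops mattering: any question with the same tree shape maps to the same template.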

We would also like to remind you that if you feel brave enough to add your own implementation of the syntax tree parser for English, we will be happy to ship it as part of the next version of the DeepPavlov Library (subject to the Contribution Guide).

Online Query Support Using WikiData API

The second KBQA update adds support for running a bare-bones KBQA solution through the WikiData API (subject to the WikiData API Etiquette). While the original release required developers to download the entire (~200 GB) WikiData data dumps (including the index), this update makes it possible to run queries through the live WikiData API. This might not be the best option for high-load production systems, but this mode can be beneficial for research and hobby projects.
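For readers curious what an online lookup involves, here is a small, library-independent sketch that builds (but does not send) a request to Wikidata's public `wbsearchentities` endpoint, a standard MediaWiki API action. This is not how DeepPavlov issues its queries internally; it only shows the kind of call the online mode replaces the local dumps with.

```python
# Build (but do not send) a request URL for Wikidata's entity-search API.
# wbsearchentities is a standard MediaWiki API action on wikidata.org.
# Sending the request and parsing the JSON reply is left to the reader;
# per the WikiData API Etiquette, set a descriptive User-Agent header
# when you actually send it.
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def entity_search_url(query: str, language: str = "en") -> str:
    """Return the GET URL that searches Wikidata entities by name."""
    params = {
        "action": "wbsearchentities",
        "search": query,
        "language": language,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

print(entity_search_url("Leo Tolstoy"))
```

Trading the ~200 GB local dump for per-question HTTP round trips is exactly why this mode suits experimentation better than high-load production use.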

Automated Dataset Generation Tool

Go-bot is our simple system for building goal-oriented bots. It was originally designed around the publicly available DSTC2 datasets. While we are working on the next iteration of our intent understanding and slot filling system, one of our avid users, Eugene2525, experimented with our go-bot model and found it beneficial to automatically generate training data for the intents created in the bot. The result of his contribution (pull request) is the Automated Dataset Generation Tool, along with a Colab notebook showing how to use the tool to generate a training set for go-bot. It is an excellent example of community involvement in building the DeepPavlov Library, and we look forward to seeing more such contributions down the road.
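The general idea behind such a tool can be sketched in a few lines: take per-intent utterance templates, substitute every combination of slot values, and emit (utterance, intent, slots) training examples. The intent names, templates, and slot values below are made up for illustration and do not reproduce the contributed tool or the DSTC2 format.

```python
# Sketch of template-based training-data generation for a goal-oriented
# bot. Intents, templates, and slot values here are invented; the
# contributed tool and the DSTC2 format differ in detail.
from itertools import product

TEMPLATES = {
    "request_restaurant": [
        "show me {food} restaurants in the {area}",
        "I want {food} food in the {area}",
    ],
}
SLOT_VALUES = {"food": ["italian", "thai"], "area": ["center", "north"]}

def generate_examples(intent: str):
    """Yield (utterance, intent, slots) for every slot-value combination."""
    for template in TEMPLATES[intent]:
        for food, area in product(SLOT_VALUES["food"], SLOT_VALUES["area"]):
            slots = {"food": food, "area": area}
            yield template.format(**slots), intent, slots

examples = list(generate_examples("request_restaurant"))
print(len(examples))  # 2 templates x 2 foods x 2 areas = 8 examples
print(examples[0][0])
```

A handful of templates and slot lists can thus expand into a sizable labeled set, which is what makes this approach attractive for bootstrapping a go-bot without hand-annotating every utterance.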

Other Improvements and Wrap Up

In addition to the aforementioned updates, this release also includes smaller improvements and fixes.

Start building your Conversational AI systems with our DeepPavlov Library on GitHub and let us know what you think!

P.S. Keep in mind that DeepPavlov has a dedicated forum where all kinds of questions concerning the framework and the models are welcome.