We identify 33 topics and present full (4.2M samples) and down-sampled (2.2M samples) versions of the "DeepPavlov Topics" dataset. The topics are designed to cover the conversational domain in detail while remaining interpretable. We also release pre-trained models for topic classification, including distilled and multilingual versions. The scores are reported in the original paper (see Citation).
We define the following 33 topics: Animals&Pets, Art&Hobbies, Artificial Intelligence, Beauty, Books&Literature, Celebrities&Events, Clothes, Depression, Disasters, Education, Family&Relationships, Finance, Food, Gadgets, Garden, Health&Medicine, Home&Design, Job, Leisure, MassTransit, Movies&Tv, Music, News, Personal Transport, Politics, Psychology, Religion, Science&Technology, Space, Sports, Toys&Games, Travel, VideoGames.
dp_topics_downsampled_dataset_v0.tar.gz -- down-sampled version of the dataset.
dp_topics_full_data_v0.tar.gz -- full version of the dataset.
Each sample has the following structure:
text - the text of the sample;
topic - the topic labels of the sample, separated by ";".
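
Because a sample may carry several labels in one ";"-separated string, the topic field usually needs to be split before use. The snippet below is a minimal sketch of that step, not an official loader: the function name split_topics and the example records are hypothetical, and stripping whitespace around labels is an assumption. The file format inside the archives is not described here, so the sketch works on records that have already been read into dictionaries with the two fields above.

```python
from collections import Counter
from typing import List


def split_topics(topic_field: str) -> List[str]:
    """Split a ';'-separated topic field into a list of labels.

    Assumption: surrounding whitespace is not meaningful, so it is
    stripped, and empty fragments (e.g. from a trailing ';') are dropped.
    """
    return [label.strip() for label in topic_field.split(";") if label.strip()]


# Hypothetical records for illustration only; not taken from the dataset.
samples = [
    {"text": "My cat loves tuna.", "topic": "Animals&Pets;Food"},
    {"text": "Which phone should I buy?", "topic": "Gadgets"},
]

# Count how often each label occurs across all samples.
label_counts = Counter(
    label for sample in samples for label in split_topics(sample["topic"])
)
```

Treating the result as a list of labels per sample keeps the multi-label nature of the data explicit, which matters when the labels are later converted to a multi-hot target for a classifier.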