This is our fourth tutorial, in which we will guide you through the process of creating your very own assistant using DeepPavlov Dream. In our previous tutorials, we developed a bot capable of engaging in conversations about movies and answering factoid questions, and a generative bot with an enthusiastic and adventurous persona. We accomplished this by utilizing existing Dream components with only minor modifications, i.e., altering the prompt and switching from one generative model to another. We also demonstrated how to use a Dream distribution that generates responses by making calls to various APIs, and showed how to add a new API of your choice to it.
In this tutorial, we will create a dialog system capable of answering questions about one or several long documents with the help of ChatGPT or another large language model. We will once again utilize the existing components with only slight alterations. The distribution is built around the following prompt:
You are a question answering system that can answer the user’s questions based on the text they provide. If the user asks a question, answer it based on the Text, which contains some information about the subject.
If necessary, structure your answer as bullet points. You may also present information in tables.
If the Text does not contain the answer, apologize and say that you cannot answer based on the given text.
Only answer the question asked; do not include additional information. Do not provide sources.
The Text may contain unrelated information. If the user does not ask a question, disregard the Text and just talk to them as if you were a friendly question-answering system designed to help them understand long documents.
Text:
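As we will see later in this tutorial, this prompt is stored in common/prompts/document_qa_instruction.json. A rough sketch of that file's shape (abridged; the actual file may contain additional fields):

{
  "prompt": "You are a question answering system that can answer the user’s questions based on the text they provide. [...] Text:"
}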
First, install deeppavlov_dreamtools and clone the Dream repository:

pip install git+https://github.com/deeppavlov/deeppavlov_dreamtools.git
git clone https://github.com/deeppavlov/dream.git
cd dream
Now create your own distribution from the document_based_qa template:

dreamtools clone dist my_prompted_document_based_qa \
    --template document_based_qa \
    --display-name "Prompted Document-Based QA" \
    --author deepypavlova@email.org \
    --description "This is a primitive dialog system that can answer your questions about the documents. It uses OpenAI ChatGPT model to generate responses." \
    --overwrite
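If the command succeeds, you should see a new distribution directory under assistant_dists/. A quick sanity check (the listing is abridged to the files this tutorial touches):

ls assistant_dists/my_prompted_document_based_qa
# docker-compose.override.yml  pipeline_conf.json  proxy.yml  ...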
Then add the component to your distribution:

dreamtools add component components/sdjkfhaliueytu34ktkrlg.yml \
    --dist my_prompted_document_based_qa
Optional; run only if you will be using GPT-4:

dreamtools add component components/jkdhfgkhgodfiugpojwrnkjnlg.yml \
    --dist my_prompted_document_based_qa
Optional; run only if you will be using text-davinci-003:

dreamtools add component components/lkjkghirthln34i83df.yml \
    --dist my_prompted_document_based_qa
Next, make sure your compose configuration includes the files service, a simple file server used to pass your documents to the other components:

services:
  files:
    image: julienmeerschart/simple-file-upload-download-server
version: '3.7'
In assistant_dists/my_prompted_document_based_qa/docker-compose.override.yml, find the doc-retriever container; the relevant parts of its definition look like this:

doc-retriever:
  build:
    args:
      SERVICE_PORT: 8165
      SERVICE_NAME: doc_retriever
      CONFIG_PATH: ./doc_retriever_config.json
      DOC_PATH_OR_LINK: http://files.deeppavlov.ai/dream_data/documents_for_qa/test_file_dream_repo.html,http://files.deeppavlov.ai/dream_data/documents_for_qa/alphabet_financial_report.txt,http://files.deeppavlov.ai/dream_data/documents_for_qa/test_file_jurafsky_chatbots.pdf
      PARAGRAPHS_NUM: 5
      FILE_SERVER_TIMEOUT: 30
  environment:
    - FLASK_APP=server
    - CUDA_VISIBLE_DEVICES=0
The DOC_PATH_OR_LINK argument lists the documents the assistant will answer questions about. For this tutorial, we replace the default links with two new documents:

DOC_PATH_OR_LINK: http://files.deeppavlov.ai/dream_data/documents_for_qa/test_alpaca.html,http://files.deeppavlov.ai/dream_data/documents_for_qa/test_llama_paper.pdf
If you want to provide your own link(s), simply replace all the default links in DOC_PATH_OR_LINK with your own:

DOC_PATH_OR_LINK: http://link_to_file_1,http://link_to_file_2
If you want to provide local file(s), put them into the documents/ folder and provide their relative paths in DOC_PATH_OR_LINK (a step-by-step sketch follows after the notes below):

DOC_PATH_OR_LINK: documents/your_file_1.txt,documents/your_file_2.pdf,documents/your_file_3.html
Important: in both cases, if you are using several links or files, separate them with a comma and no whitespace!
Important-2: as of now, only txt, pdf, and html formats are supported. We process html documents with BeautifulSoup (MIT license) and pdf documents with pypdfium2 (Apache 2.0 license), keeping the distribution free for potential commercial use.
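For the local-file case, a minimal sequence might look like this (the file name is illustrative):

mkdir -p documents
cp ~/reports/annual_report_2023.pdf documents/
# then, in docker-compose.override.yml:
#   DOC_PATH_OR_LINK: documents/annual_report_2023.pdf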
Optional; If you want, you may change the generative model in use. By default, we are using ChatGPT (based on GPT-3.5). You may change it to GPT-4 or text-davinci-003 (another GPT-3.5 model). In theory, it is also possible to use free open-source models, but they do not perform well on this task. To change the model, go to assistant_dists/my_prompted_document_based_qa/docker-compose.override.yml and replace the following block:
openai-api-chatgpt:
  env_file: [ .env ]
  build:
    args:
      SERVICE_PORT: 8145
      SERVICE_NAME: openai_api_chatgpt
      PRETRAINED_MODEL_NAME_OR_PATH: gpt-3.5-turbo
    context: .
    dockerfile: ./services/openai_api_lm/Dockerfile
  command: flask run -h 0.0.0.0 -p 8145
  environment:
    - CUDA_VISIBLE_DEVICES=0
    - FLASK_APP=server
  deploy:
    resources:
      limits:
        memory: 100M
      reservations:
        memory: 100M
For GPT-4:

openai-api-gpt4:
  env_file: [ .env ]
  build:
    args:
      SERVICE_PORT: 8159
      SERVICE_NAME: openai_api_gpt4
      PRETRAINED_MODEL_NAME_OR_PATH: gpt-4
    context: .
    dockerfile: ./services/openai_api_lm/Dockerfile
  command: flask run -h 0.0.0.0 -p 8159
  environment:
    - FLASK_APP=server
  deploy:
    resources:
      limits:
        memory: 500M
      reservations:
        memory: 100M
For text-davinci-003 (GPT-3.5):
openai-api-davinci3:
  env_file: [ .env ]
  build:
    args:
      SERVICE_PORT: 8131
      SERVICE_NAME: openai_api_davinci3
      PRETRAINED_MODEL_NAME_OR_PATH: text-davinci-003
    context: .
    dockerfile: ./services/openai_api_lm/Dockerfile
  command: flask run -h 0.0.0.0 -p 8131
  environment:
    - FLASK_APP=server
  deploy:
    resources:
      limits:
        memory: 500M
      reservations:
        memory: 100M
Optional; If you replaced the generative model, you will also have to complete this step. In the same file, assistant_dists/my_prompted_document_based_qa/docker-compose.override.yml, find WAIT_HOSTS and replace openai-api-chatgpt:8145 with openai-api-davinci3:8131 (for text-davinci-003) or openai-api-gpt4:8159 (for GPT-4).
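For example, after switching to GPT-4, the WAIT_HOSTS entry would contain something like this (a sketch, assuming WAIT_HOSTS sits in the agent service's environment as in other Dream distributions; your file lists many more services, abridged here with "..."):

agent:
  environment:
    WAIT_HOSTS: "doc-retriever:8165, ..., openai-api-gpt4:8159"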
Optional; If you replaced the generative model with text-davinci-003, you will also have to complete this step. In the same file, assistant_dists/my_prompted_document_based_qa/docker-compose.override.yml, change the GENERATIVE_SERVICE_URL and GENERATIVE_SERVICE_CONFIG fields to http://openai-api-davinci3:8131/respond and openai-text-davinci-003-long.json, respectively. If you replaced the generative model with GPT-4, skip this step.
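For text-davinci-003, the end result would be these two values (a sketch showing only the two changed fields; in the distribution they belong to the configuration of the skill that queries the model):

GENERATIVE_SERVICE_URL: http://openai-api-davinci3:8131/respond
GENERATIVE_SERVICE_CONFIG: openai-text-davinci-003-long.json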
Some heavy components do not have to run on your machine: assistant_dists/my_prompted_document_based_qa/proxy.yml lets them run as lightweight proxies to DeepPavlov-hosted services:

services:
  combined-classification:
    command: ["nginx", "-g", "daemon off;"]
    build:
      context: dp/proxy/
      dockerfile: Dockerfile
    environment:
      - PROXY_PASS=proxy.deeppavlov.ai:8087
      - PORT=8087

  sentseg:
    command: ["nginx", "-g", "daemon off;"]
    build:
      context: dp/proxy/
      dockerfile: Dockerfile
    environment:
      - PROXY_PASS=proxy.deeppavlov.ai:8011
      - PORT=8011

  sentence-ranker:
    command: ["nginx", "-g", "daemon off;"]
    build:
      context: dp/proxy/
      dockerfile: Dockerfile
    environment:
      - PROXY_PASS=proxy.deeppavlov.ai:8128
      - PORT=8128

version: "3.7"
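If you would rather run these three components locally instead of relying on the proxies (for example, on a machine with a GPU), you can simply leave proxy.yml out of the docker-compose command used below:

docker-compose -f docker-compose.yml \
    -f assistant_dists/my_prompted_document_based_qa/docker-compose.override.yml up --build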
Optional; You can change the text of the prompt to alter the way the model answers your questions. To do that, go to common/prompts/document_qa_instruction.json and edit the original text in the "prompt" field. Just for fun, I will add “You must always talk like a pirate.” as the final line of the prompt.
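After this edit, the "prompt" field would end like this (a sketch with the prompt abridged; only the last line changes):

{
  "prompt": "You are a question answering system that can answer the user’s questions based on the text they provide. [...] You must always talk like a pirate."
}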
Before building the distribution, add your OpenAI API key to the .env file (it is loaded via env_file: [ .env ] in the service definitions above):

OPENAI_API_KEY=
Now build and launch the distribution:

docker-compose -f docker-compose.yml \
    -f assistant_dists/my_prompted_document_based_qa/docker-compose.override.yml \
    -f assistant_dists/my_prompted_document_based_qa/proxy.yml up --build
Once all the containers are up, talk to your assistant from the command line:

docker-compose exec agent python -m deeppavlov_agent.run agent.channel=cmd \
    agent.pipeline_config=assistant_dists/my_prompted_document_based_qa/pipeline_conf.json \
    agent.debug=false