Add a simple Question Answering notebook with Haystack #12
Conversation
NOTE: this PR also includes the sample dataset from #11 (same commit) in order to have data to work on.
Great start! The adversarial example was a good addition :D Haystack makes implementing the retriever-reader approach quite intuitive. For the source of the context, it would be good to have a way to point back to the originating file; it seems like a simple string search problem...
For the PR, I think it also includes the commits from the file-addition PR. If we separate them, then this one can be merged independently.
Looking forward to the generative example from Haystack as well!
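For reference, a minimal sketch of that retriever-reader pattern in Haystack 1.x. The file name and sample text are hypothetical, and keeping the source file in each document's `meta` is one possible way to point answers back to their file; the notebook itself uses BM25, but `TfidfRetriever` keeps the sketch self-contained with the in-memory store:

```python
# Sketch of the retriever-reader approach in Haystack 1.x.
# File name and content are hypothetical; storing the source file in
# each document's meta lets every answer point back to its origin.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "ROSA stands for Red Hat OpenShift Service on AWS.",
     "meta": {"name": "docs/rosa-intro.md"}},  # hypothetical file
])

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
pipeline = ExtractiveQAPipeline(reader, retriever)

prediction = pipeline.run(query="What does ROSA stand for?")
for answer in prediction["answers"]:
    # The document meta travels with the answer, so the file can be traced.
    print(answer.answer, "->", answer.meta.get("name"))
```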
Line #24: `details="minimum"  ## Choose from "minimum", "medium", and "all"`
Looks great!! Thanks Pep.
I was looking at some of the contexts, and I have some observations and suggestions:
- Does the "all" choice mean the whole document?
- I also observed that some of the questions got included in the context along with the answers. What if we separate the questions from the answers in the corpus text, and train our model on text containing only the answers and not the questions? Let me know what you think. I will also try to apply it in my QA model testing work.
`all` means: show all the details about each answer (e.g. score, offset, ...) vs `minimum`, which only shows the answer and context. I thought `minimum` would be enough here, but happy to change it to show all the details.
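To illustrate the three levels (a sketch assuming Haystack 1.x, where `prediction` stands for the dict returned by a `pipeline.run(...)` call):

```python
# The three detail levels of Haystack's print_answers utility (1.x API);
# `prediction` is assumed to be the output of a pipeline.run(...) call.
from haystack.utils import print_answers

print_answers(prediction, details="minimum")  # answer and context only
print_answers(prediction, details="medium")   # also shows the relevance score
print_answers(prediction, details="all")      # full Answer objects: score, offsets, meta, ...
```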
About context that includes questions: yes, that comes from some of the source documents that do include questions, most notably the FAQ in the ROSA workshop: https://www.rosaworkshop.io/rosa/14-faq/
If we try to pre-process the content to separate questions from answers, I'm not sure how successful that would be in general, because most of the answers need their respective questions in order to make sense (extreme case: there are answers that are just "Yes"). But for that document in particular (the FAQ) it might make sense to use it for the "squad-like" test... only I have not verified whether all the answers there can be obtained from other documents.
Converted this PR to draft while I'm working to expand it with a Generative QA approach.
Force-pushed from d3100d1 to 56a6b2a
Updated with the current version, which adds 3 generative QA types: RAG, LFQA and OpenAI-based. The context now includes the full ROSA docs (plus the ROSA workshop and the MOBB material in the data/external samples). Results are not great; I'm still trying to see if they can be improved a bit. I also need to elaborate/document.
Force-pushed from 56a6b2a to f1db9ea
Ok, I believe this is ready for another review. The RAG version is not working well for some reason that has so far escaped me. @Shreyanand @suppathak if you have suggestions, especially on that part, they would be most welcome. I have added the retrieval of the whole ROSA docs from S3 storage, and these docs together with the in-repo samples (ROSA workshop and MOBB) are used for context. There are now more comments/docs and the structure has also been updated, hopefully making it easier to follow.
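For context, fetching the docs from S3 could look roughly like this (a sketch only; the bucket name, key prefix and local path are hypothetical, and the notebook's actual client setup may differ):

```python
# Hypothetical sketch of downloading the ROSA docs from S3 with boto3;
# bucket, prefix and local paths are placeholders, not the notebook's values.
import os
import boto3

s3 = boto3.client("s3")
bucket = "example-rosa-docs"   # hypothetical bucket
prefix = "rosa/docs/"          # hypothetical key prefix
local_dir = "data/external/rosa-docs"
os.makedirs(local_dir, exist_ok=True)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        filename = os.path.basename(obj["Key"])
        if filename:  # skip "directory" placeholder keys
            s3.download_file(bucket, obj["Key"], os.path.join(local_dir, filename))
```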
One way to inspect the model would be to observe what the retriever is fetching. Maybe it's not providing enough context to the generator. What happens if we tweak the retriever model parameters?
Also, another possible explanation could be that these embedding models may not be advanced enough to capture the language constructs, leading to poor answers.
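For example, something along these lines could surface what the retriever returns and vary how much context reaches the generator (a sketch assuming Haystack 1.x; `retriever` and `pipeline` stand for the objects built in the notebook, and the query is hypothetical):

```python
# Sketch for inspecting the retriever and tweaking its parameters
# (Haystack 1.x); `retriever` and `pipeline` are assumed to be the
# objects already built in the notebook.
query = "How do I create a ROSA cluster?"  # hypothetical test question

# Look at what the retriever fetches on its own.
for doc in retriever.retrieve(query=query, top_k=5):
    print(doc.score, doc.meta.get("name"), doc.content[:80])

# Feed the generator more (or less) context by varying top_k at run time.
result = pipeline.run(query=query, params={"Retriever": {"top_k": 10}})
print(result["answers"][0].answer)
```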
In essence, this notebook compares the free and open-source model `bart_lfqa` and the paid OpenAI model `text-davinci-003` for the long-form generative QA task. For the extractive QA task it tries `roberta-base-squad2`. For all of these experiments, the retriever is BM25. Also, it has a separate experiment that combines `DPR` as the retriever and `facebook/rag-token-nq` as the generator model.
Could we add this classification in the summary/conclusion? Once we have a validation dataset and metrics defined, we can add the results of these experiments based on those metrics as well.
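For reference, that combined DPR + RAG experiment looks roughly like this in Haystack 1.x (a sketch; `document_store` is assumed to already hold the ROSA documents, and the query is hypothetical):

```python
# Sketch of the DPR retriever + RAG generator combination (Haystack 1.x);
# `document_store` is assumed to already contain the ROSA documents.
from haystack.nodes import DensePassageRetriever, RAGenerator
from haystack.pipelines import GenerativeQAPipeline

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
)
# DPR needs dense embeddings for every stored document.
document_store.update_embeddings(retriever)

generator = RAGenerator(model_name_or_path="facebook/rag-token-nq")
pipeline = GenerativeQAPipeline(generator, retriever)

result = pipeline.run(query="What is ROSA?")  # hypothetical question
print(result["answers"][0].answer)
```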
> In essence, this notebook compares free and OS model `bart_lfqa` and paid OpenAI model `text-davinci-003` for the long form generative QA task.

While it does have a free model for LFQA and an OpenAI version, the main goal of the notebook is to explore Haystack as a framework.

> For all of these experiments, the retriever is BM25.

Dense vectors are used for generative QA - and a dense retriever accordingly. I expanded the introduction a bit to hopefully explain that (although I am not getting into details - should I?)
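To make that retriever distinction concrete, a sketch of a dense setup on the LFQA side, assuming Haystack 1.x (the embedding model here is just a common sentence-transformers choice, not necessarily the notebook's actual configuration):

```python
# Sketch of a dense retriever feeding the bart_lfqa generator (Haystack 1.x);
# `document_store` is assumed from earlier, and the embedding model is a
# common choice for QA retrieval, not necessarily the one the notebook uses.
from haystack.nodes import EmbeddingRetriever, Seq2SeqGenerator
from haystack.pipelines import GenerativeQAPipeline

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    model_format="sentence_transformers",
)
# Dense retrieval needs an embedding per stored document.
document_store.update_embeddings(retriever)

generator = Seq2SeqGenerator(model_name_or_path="vblagoje/bart_lfqa")
pipeline = GenerativeQAPipeline(generator, retriever)

result = pipeline.run(query="How do I delete a ROSA cluster?")  # hypothetical
print(result["answers"][0].answer)
```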
I think the detail level is appropriate now.

> While it does have a free model for LFQA and an OpenAI version, the main goal of the notebook is to explore Haystack as a framework

That makes sense.
Signed-off-by: Pep Turró Mauri <[email protected]>
Force-pushed from f1db9ea to 0f82abb
Another update:
/lgtm
In #9 we are exploring various QA systems. This PR provides a simple experiment with Extractive and Generative QA using Haystack.