I'm using Elasticsearch 7.11.1 and Python 3.7.13.
In the "Build QA engine" section, when I respond to the query as follows:
Enter your query here: what does covid-19 cause
It outputs an error:
WARNING:allennlp.data.fields.sequence_label_field:Your label namespace was 'pos'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary.
See documentation for `non_padded_namespaces` parameter in Vocabulary.
INFO:elasticsearch:GET http://localhost:9200/ [status:200 request:0.520s]
INFO:elasticsearch:POST http://localhost:9200/elastic_index/_search [status:200 request:0.353s]
The number of datapacks(including query) is 1
Traceback (most recent call last):
  File "./examples/pipeline/inference/search_cord19.py", line 97, in <module>
    data_pack = next(nlp.process_dataset()).get_pack_at(1)
  File "/home/ubuntu/.pyenv/versions/3.7.13/lib/python3.7/site-packages/forte/data/multi_pack.py", line 491, in get_pack_at
    return self.packs[index]
IndexError: list index out of range
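The log line "The number of datapacks(including query) is 1" suggests the MultiPack holds only the query pack and no retrieved document packs, so `get_pack_at(1)` indexes past the end. As a minimal illustration (plain list standing in for the forte API, not the actual `MultiPack` implementation):

```python
# With only the query pack present, get_pack_at(1) is effectively
# indexing position 1 of a one-element list.
packs = ["query_pack"]  # no retrieved document packs were added

try:
    packs[1]  # what get_pack_at(1) boils down to here
except IndexError as e:
    print(f"IndexError: {e}")
```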
It seems I'm not reading the datasets at all, even though I tried to index the sample datasets provided in the previous step with
python examples/pipeline/indexer/cordindexer.py --data-dir ./data/document_parses/sample_pdf_json
which finished really quickly and only output the following, so it doesn't seem to have indexed any data...
WARNING:root:Re-declared a new class named [ConstituentNode], which is probably used in import.
INFO:elasticsearch:GET http://localhost:9200/ [status:200 request:0.008s]
/home/ubuntu/.pyenv/versions/3.7.13/lib/python3.7/site-packages/elasticsearch/connection/base.py:200: ElasticsearchWarning: [types removal] Specifying types in bulk requests is deprecated.
warnings.warn(message, category=ElasticsearchWarning)
INFO:elasticsearch:POST http://localhost:9200/_bulk?refresh=true [status:200 request:0.338s]
and that directory contains three dataset files:
- 55736408816d3f956d830854659f24109444a36c.json
- aadc3e716b6cb0e898953dff056124378b31483c.json
- ffff73d17bc392ee68f3f16ef37d25579cb99322.json
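For what it's worth, CORD-19 `pdf_json` files nest these fields rather than keeping them flat: the title sits under `metadata` and the text/section pairs inside `body_text` entries. A rough sketch of pulling those out of one record (schema assumed from the general CORD-19 layout, not verified against these exact three files):

```python
import json

# A trimmed record mimicking the CORD-19 pdf_json layout (assumed schema).
record = json.loads("""
{
  "paper_id": "55736408816d3f956d830854659f24109444a36c",
  "metadata": {"title": "Example title"},
  "body_text": [
    {"text": "First paragraph.", "section": "Introduction"},
    {"text": "Second paragraph.", "section": "Methods"}
  ]
}
""")

title = record["metadata"]["title"]
text = " ".join(entry["text"] for entry in record["body_text"])
sections = [entry["section"] for entry in record["body_text"]]
print(title, sections)
```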
I also noticed that the Indexer's config.yml has the fields `doc_id` and `content` (https://github.com/petuum/composing_information_system/blob/main/examples/pipeline/indexer/config.yml#L3). However, the above dataset files don't contain those fields at all; most of the content is in the fields `title`, `text`, and `section`. But if I update that config.yml to the following, I get the same outcome:
create_index:
  batch_size: 10000
  fields:
    # - doc_id
    # - content
    - title
    - text
    - section
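My guess is that if the configured fields don't exist in the parsed documents, the indexer ends up sending empty (or near-empty) documents in the `_bulk` request. A toy sketch of that failure mode, where `build_doc` is a hypothetical stand-in and not the indexer's actual code:

```python
def build_doc(record, fields):
    # Keep only the configured fields that actually exist in the record.
    return {f: record[f] for f in fields if f in record}

record = {"title": "Example", "text": "Some body text", "section": "Intro"}

print(build_doc(record, ["doc_id", "content"]))         # {} -> nothing useful indexed
print(build_doc(record, ["title", "text", "section"]))  # all three fields survive
```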