
Commit 5d4b22c: Initial documentation setup
1 parent 50c9ad4
1 file changed: 3 additions & 123 deletions

README.md
A library and corpus for checking/benchmarking LLMs with regard to properties relevant to their use in RAG systems.

### Conda

* create a conda environment, e.g. by sourcing `conda-create.sourceme`
    * e.g. with bash: `. conda-create.sourceme`
* if necessary, (re-)install the requirements: `pip install -r requirements.txt` (the conda creation code already does this too)
* if necessary, install/update the ragability package: `pip install -e .` (the conda creation code already does this too)
* NOTE: as long as the package gets developed/changed, the re-installation steps make sure that changed CLI programs are made available.

## Usage

* Activate the conda environment: `conda activate ragability`
* After installation, the following commands are available:
    * `ragability_info`: show the versions of all relevant Python packages installed
    * `ragability_cc_wc1`: convert the wiki-contradict based corpus to ragability format
    * `ragability_query`: given an input file with facts and queries, a list of candidate LLMs, and a prompt template, produce an output file that contains the LLM answers to the queries
    * `ragability_check`: given a query result file and a judge LLM, evaluate the answers received against the pre-defined answers and create an output file that contains the evaluation scores and meta-information for each example
    * `ragability_eval`: given the data created with `ragability_check`, calculate detailed performance statistics
    * `ragability_hjson_info`: show some info about the number of entries and keys present in an hjson, json, or jsonl file
    * `ragability_hjson_cat`: concatenate several hjson, json, or jsonl files into one hjson or json file
    * `llms_wrapper_test`: for the configured/specified LLMs, test if they are working and returning an answer. This is useful to check a config file and to test if all the API keys are correctly set and working
* All commands take the `--help` option to get usage information

LLMs currently supported: support is based on the [LiteLLM](https://github.com/BerriAI/litellm) backend via the [llms_wrapper](https://github.com/OFAI/python-llms-wrapper/) package. The supported LLMs are listed [here](https://docs.litellm.ai/docs/providers/).

### Current usage

Example usage with the converted wiki-contradict dataset:

* (optional conversion step; the current converted dataset is already part of the repo) convert the dataset tsv file to ragability format:
    * `ragability_cc_wc1 --input corpus/wikicontradict1/Dataset_v0.2_short.tsv --output corpus/wikicontradict1/v0d2.hjson`
* create, or copy and modify, one of the conf*.hjson files to contain the LLMs and LLM configs wanted for the experiment
* run the base LLMs on the corpus. The following will also create a log file:
    * `ragability_query -i corpus/wikicontradict1/v0d2.hjson -o experiments-wc1/v0d2.out1.hjson --config experiments-wc1/conf-all.hjson --promptfile experiments-wc1/prompt.hjson --logfile experiments-wc1/v0d2.log1.txt --verbose`
* run the checker LLM on the output of the previous step:
    * `ragability_check -i experiments-wc1/v0d2.out1.hjson -o experiments-wc1/v0d2.out2.hjson --config experiments-wc1/conf-ollama.hjson --promptfile experiments-wc1/prompt.hjson --logfile experiments-wc1/v0d2.log2.txt --verbose`
* run the evaluation program:
    * `ragability_eval -i experiments-wc1/v0d2_small.out2.hjson -o experiments-wc1/v0d2_small.eval.tsv --verbose`
* the generated tsv file can be loaded into a pandas dataframe or some spreadsheet app

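The generated tsv can also be inspected without pandas; a minimal standard-library sketch (the column names and values below are made up for illustration, not the actual `ragability_eval` output schema):

```python
import csv
import io

# Stand-in for a tiny eval tsv; a real file would be read with open(path, newline="").
tsv_text = "llm\taccuracy\nollama:llama3\t0.75\nopenai:gpt-4o\t0.90\n"

# Parse the tab-separated rows into dicts keyed by the header line.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Pick the row with the highest accuracy.
best = max(rows, key=lambda r: float(r["accuracy"]))
print(best["llm"])  # openai:gpt-4o
```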
### Files/File formats

NOTE: tools to convert between jsonl, json, yaml:

* https://github.com/spatialcurrent/go-simple-serializer
* the `hjson` command (already available from the package installation) can be used to convert between json and hjson

Query file:

* Either a json or hjson file that contains an array of dicts, a jsonl file that contains one json dict per line, or a yaml file that contains an array of dicts
* Each dict must contain the following fields (fields marked as '(output)' are written to the output file):
    * `qid`: the id of the query; should be a short reminder, e.g. `kwoks-are-vertebrates01`
    * `facts`: a string or an array of strings giving the knowledge snippets we want to query; these simulate the RAG document snippets included in a RAG query
    * `query`: the query to ask about the facts
    * `pids`: the prompt ids from the configured prompts to use; if not given, all configured prompts are used
    * `tags`: a comma-separated list of tags which identify the kind, purpose, etc. of the corresponding instance. The presence or absence of a tag can be used in the eval program for breaking down the LLM performances.
    * `response` (output): the response as received from the base LLM if there was no error
    * `error` (output): the error if there was an error
    * `llm` (output): the llm alias / name used
    * `checks`: (optional, but required if checking and evaluation should get performed later) a list of checks, where each check is a dict which contains:
        * `query`: if present, a query to ask the checker LLM about the response from the base LLM. If missing, the checking process will directly analyse the base-LLM response (e.g. when the base-LLM query was a yes/no question)
        * `pid`: the prompt id of the configured prompt to use for the checking LLM. If this is missing, a default prompt is used.
        * `func`: the name of a checking function, which will be called with the response of the checking or base LLM and the additional parameters specified with `args`. Each function definition internally knows about the kind of evaluation (binary, multiclass, score). See the `checks.py` module.
        * `args`: optional additional positional parameters for the checking function, e.g. some value or values to compare against. The meaning of the parameters depends on the concrete checking function. Some checking functions do not need any `args`, in which case this field can be omitted.
        * `check_for`: optional value to insert into the checker query and all prompt strings using the variable name `${check_for}`
        * `kwargs`: optional additional keyword arguments to provide to the checker function
        * `response` (output): the response as received from the checking LLM (if no error)
        * `error` (output): the error if an error occurred during checking
        * `result` (output): the result of the checking function, either a response label that will get compared to a target label for evaluation, or a score

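Putting the fields above together, a query-file entry might look like this in hjson (the values, tag names, and the `exact_match` function name are made up for illustration; see the `checks.py` module for the checking functions that actually exist):

```hjson
[
  {
    qid: kwoks-are-vertebrates01
    facts: [
      "Kwoks are vertebrates."
    ]
    query: "Are kwoks vertebrates? Answer yes or no."
    tags: "madeup,yesno"
    checks: [
      {
        # hypothetical checking function comparing the response to "yes"
        func: exact_match
        args: ["yes"]
      }
    ]
  }
]
```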
Prompt file:

* Either a json file that contains an array of dicts, a jsonl file that contains one json dict per line, or a yaml file that contains an array of dicts
* Each dict must contain the following fields:
    * `pid`: the id of the prompt; should be a short reminder of what it does
    * at least one of `system`, `assistant`, `user`: a string to use for creating the actual final prompt for the LLM. The string can contain the placeholders `${query}` and `${facts}` in order to insert the current query and facts (if facts is an array of strings, these will get concatenated with newlines)
    * `fact`: how to format a single fact if there are several. This supports the variables `${fact}` and `${n}` (1-based fact index)
    * The placeholder `${check_for}` can be used in prompts for the checking LLM to insert some value to check for in the response
    * The placeholder `${answer}` can be used in prompts for the checking LLM to insert the response from the base LLM to check

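The `${...}` placeholders follow the usual Python `string.Template` substitution syntax; a minimal sketch of how such a prompt could be filled in (the prompt entry and the fact/query texts are made up for illustration, and this is not the package's actual rendering code):

```python
from string import Template

# A hypothetical prompt-file entry.
prompt = {
    "pid": "answer-from-facts",
    "user": "Facts:\n${facts}\n\nQuestion: ${query}",
    "fact": "${n}. ${fact}",
}

facts = ["Kwoks are vertebrates.", "Kwoks live in trees."]
query = "Are kwoks vertebrates?"

# Format each fact with the `fact` template (1-based index), then join with newlines.
formatted = "\n".join(
    Template(prompt["fact"]).substitute(n=i, fact=f)
    for i, f in enumerate(facts, start=1)
)
# Insert facts and query into the user prompt.
final = Template(prompt["user"]).substitute(facts=formatted, query=query)
print(final)
```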
Query Output file / Checker Input file:

* Either a json, hjson, or jsonl file
* each dictionary contains the same fields as the input, plus the fields added (marked with '(output)' above)
* NOTE: if there were transient errors during processing, the output file can be re-used as an input file; by default, only those entries which do not already have a response will get re-processed

Checker Output file / Eval Input file:

* Either a json, hjson, or jsonl file
* each dictionary contains the same fields as the input, plus the fields added (marked with '(output)' above)
* NOTE: if there were transient errors during processing, the output file can be re-used as an input file; by default, only those entries which do not already have a response will get re-processed

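The re-use rule in the NOTEs above amounts to a simple filter; a sketch (not the tool's actual code) of which entries would get re-processed:

```python
# Hypothetical query-output entries: q1 succeeded, q2 failed transiently
# (error but no response), q3 was never processed at all.
entries = [
    {"qid": "q1", "response": "Yes."},
    {"qid": "q2", "error": "connection timed out"},
    {"qid": "q3"},
]

# Only entries without a response get re-processed.
todo = [e["qid"] for e in entries if "response" not in e]
print(todo)  # ['q2', 'q3']
```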
Config file:

* a json, hjson, or yaml file containing a dictionary with the following keys:
    * `llms`: a list of strings or dictionaries describing the LLMs to use. A dictionary can contain the following keys:
        * `llm`: the name/id of the LLM. This should be in the form provider:llmmodel, where "provider" must be a known provider or something defined in the `providers` part of the config. The "llmmodel" is the provider-specific way to specify a model.
        * `api_key`: the API key to use for the model
        * `api_key_env`: the name of an environment variable containing the API key
        * `api_url`: the URL to use. In this URL the placeholders `${model}`, `${user}`, `${password}` and `${api_key}` can be used and get replaced with the actual values
        * `user`: the user name to use for basic authentication
        * `password`: the password to use for basic authentication
        * Any specification that is present in the corresponding provider config is overridden by the value provided in the llm config
    * `providers`: a dict with provider names as the keys and a dict of provider settings as the values, where each dict can contain the following keys:
        * `api_key`: the API key to use for the model
        * `api_key_env`: the name of an environment variable containing the API key
        * `api_url`: the URL to use. In this URL the placeholders `${model}`, `${user}`, `${password}` and `${api_key}` can be used and get replaced with the actual values
        * `user`: the user name to use for basic authentication
        * `password`: the password to use for basic authentication
    * `prompts`: a list of prompts in the same way as in a separate prompts file

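A small config sketch in hjson; the provider name, model names, URL, and environment variable below are purely illustrative, not values the package ships with:

```hjson
{
  llms: [
    # plain string form: provider:llmmodel
    openai:gpt-4o-mini
    # dictionary form, taking the API key from an environment variable
    {
      llm: myprovider:some-model
      api_key_env: MYPROVIDER_API_KEY
      api_url: "https://example.com/v1/${model}"
    }
  ]
  providers: {
    myprovider: {
      user: alice
      password: secret
    }
  }
}
```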
### Documentation

* [https://ofai.github.io/python-ragability/](https://ofai.github.io/python-ragability/)
* PythonDoc: [https://ofai.github.io/python-ragability/pythondoc/ragability/](https://ofai.github.io/python-ragability/pythondoc/ragability/)
