5 | 5 | A library and corpus for checking/benchmarking LLMs with regard to properties relevant to their use in RAG |
6 | 6 | systems. |
7 | 7 |
8 | | -### Conda |
9 | | - |
10 | | -* create a conda environment, e.g. by sourcing `conda-create.sourceme` |
11 | | - * e.g. with bash: `. conda-create.sourceme` |
12 | | -* if necessary (re-)install the requirements `pip install -r requirements.txt` (the conda creation code already does this too) |
13 | | -* if necessary install/update the ragability package `pip install -e .` (the conda creation code already does this too) |
14 | | -* NOTE: as long as the package gets developed/changed, the re-installation steps make sure that changed CLI programs |
15 | | - are being made available. |
16 | | - |
17 | | -## Usage |
18 | | - |
19 | | -* Activate conda environment: `conda activate ragability` |
20 | | -* After installation, the following commands are available: |
21 | | -* `ragability_info` : show the versions of all relevant Python packages installed |
22 | | -* `ragability_cc_wc1` : convert the wiki-contradict based corpus to ragability format
23 | | -* `ragability_query` : given an input file with facts and queries, a list of candidate LLMs and a prompt template, produce an output file that contains
24 | | - LLM answers to the queries |
25 | | -* `ragability_check` : given a query result file and a judge LLM, evaluate the answers received against the pre-defined answers and create an output file that contains the evaluation scores and meta-information for each example |
26 | | -* `ragability_eval` : given the data created with `ragability_check`, calculate detailed performance statistics
27 | | -* `ragability_hjson_info` : show some info about the number of entries and keys present in a hjson, json, or jsonl file
28 | | -* `ragability_hjson_cat` : concatenate several hjson, json or jsonl files into one hjson or json file |
29 | | -* `llms_wrapper_test`: for the configured/specified LLMs, test if they are working and returning an answer. This is useful to check a config file and |
30 | | - test if all the API keys are correctly set and working |
31 | | -* All commands take the `--help` option to get usage information |
32 | | - |
33 | | -LLMs currently supported: support is based on the [LiteLLM](https://github.com/BerriAI/litellm) backend via the [llms_wrapper](https://github.com/OFAI/python-llms-wrapper/) package. The supported LLMs are listed [here](https://docs.litellm.ai/docs/providers/) |
34 | | - |
35 | | -### Current usage |
36 | | - |
37 | | -Example usage with the converted wiki-contradict dataset: |
38 | | - |
39 | | -* (optional conversion step, the current converted dataset is already part of the repo): convert the dataset tsv file to ragability format |
40 | | - * `ragability_cc_wc1 --input corpus/wikicontradict1/Dataset_v0.2_short.tsv --output corpus/wikicontradict1/v0d2.hjson` |
41 | | -* create or copy and modify one of the conf*.hjson files to contain the LLMs and LLM configs wanted for the experiment |
42 | | -* Run the base LLMs on the corpus. The following will also create a log file:
43 | | - * `ragability_query -i corpus/wikicontradict1/v0d2.hjson -o experiments-wc1/v0d2.out1.hjson --config experiments-wc1/conf-all.hjson --promptfile experiments-wc1/prompt.hjson --logfile experiments-wc1/v0d2.log1.txt --verbose` |
44 | | -* Run the checker LLM on the output of the previous step |
45 | | - * `ragability_check -i experiments-wc1/v0d2.out1.hjson -o experiments-wc1/v0d2.out2.hjson --config experiments-wc1/conf-ollama.hjson --promptfile experiments-wc1/prompt.hjson --logfile experiments-wc1/v0d2.log2.txt --verbose` |
46 | | -* run the evaluation program: |
47 | | - * `ragability_eval -i experiments-wc1/v0d2_small.out2.hjson -o experiments-wc1/v0d2_small.eval.tsv --verbose` |
48 | | - * the generated tsv file can be loaded into a pandas dataframe or some spreadsheet app |
49 | | - |
50 | | -### Files/File formats |
51 | | - |
52 | | -NOTE: tools to convert between jsonl, json, yaml: |
53 | | - |
54 | | -* https://github.com/spatialcurrent/go-simple-serializer |
55 | | -* `hjson` command (already available from the package installation) can be used to convert between json and hjson
56 | | - |
57 | | -Query file: |
58 | | - |
59 | | -* Either a json or hjson file that contains an array of dicts, or a jsonl file that contains one json dict per line or a yaml file that |
60 | | - contains an array of dicts |
61 | | -* Each dict must contain the following fields (fields marked as '(output)' are written to the output file): |
62 | | - * `qid`: the id of the query should be a short reminder, e.g. `kwoks-are-vertebrates01` |
63 | | - * `facts`: a string or an array of strings giving the knowledge snippets we want to query, these simulate the RAG document snippets included in a RAG query |
64 | | - * `query`: the query to ask about the facts |
65 | | - * `pids` : the prompt ids from the configured prompts to use, if not given all configured prompts are used |
66 | | - * `tags`: a comma-separated list of tags which identify the kind, purpose, etc. of the corresponding instance. The presence or absence of a tag |
67 | | - can be used in the eval program for breaking down LLM performance.
68 | | - * `response` (output) : the response as received from the base LLM if there was no error |
69 | | - * `error` (output) : the error if there was an error |
70 | | - * `llm` (output) : the llm alias / name used |
71 | | - * `checks`: (optional, but required if checking and evaluation should get performed later) a list of checks where each check is a dict which contains: |
72 | | - * `query`: if present, a query to ask the checker LLM about the response from the base-LLM. If missing, the checking process will directly analyse |
73 | | - the base-LLM response (e.g. when the base LLM query was a yes/no question) |
74 | | - * `pid`: the prompt id of the configured prompt to use for the checking LLM. If this is missing a default prompt is used. |
75 | | - * `func`: the name of a checking function, which will be called with the response of the checking or base LLM and the additional parameters specified with "args". Each function definition internally knows about the kind of evaluation (binary, multiclass, score). See the `checks.py` module |
76 | | - * `args` : optional additional positional parameters for the checking function, e.g. some value or values to compare against. The meaning of the parameters depends on the concrete checking function. Some checking functions do not need any `args` in which case this field can be omitted. |
77 | | - * `check_for` : optional value to insert into the checker query and all prompt strings using variable name "${check_for}" |
78 | | - * `kwargs` : optional additional keyword-arguments to provide to the checker function |
79 | | - * `response` (output) : the response as received from the checking LLM (if no error) |
80 | | - * `error` (output) : the error if an error during checking occurred |
81 | | - * `result` (output) : the result of the checking function, either a response label that will get compared to a target label for evaluation, or a score |
82 | | - |
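The query file fields described above can be illustrated with a single entry. This is a minimal sketch, not taken from the repository: the field names follow the list above, but all values are made up, `my_check_function` is a hypothetical placeholder for a function from `checks.py`, and treating `qid`, `facts` and `query` as the minimal required set is an assumption.

```python
import json

# Field names come from the description above; all values are made up.
entry = {
    "qid": "kwoks-are-vertebrates01",
    "facts": ["Kwoks are birds.", "All birds are vertebrates."],
    "query": "Are kwoks vertebrates? Answer yes or no.",
    "tags": "inference,yesno",  # comma-separated list of tags
    "checks": [
        {
            # No "query" key here, so the base-LLM response is analysed directly.
            "func": "my_check_function",  # hypothetical name, see checks.py for real ones
            "args": ["yes"],              # value the check compares against
        }
    ],
}

# Assumption: the minimal set of fields needed for a query run.
REQUIRED = ("qid", "facts", "query")


def missing_fields(item):
    """Return the required field names that are absent from one entry."""
    return [name for name in REQUIRED if name not in item]


# A jsonl query file holds one JSON dict per line.
jsonl_line = json.dumps(entry)
print(missing_fields(entry))
print(jsonl_line)
```

An entry like this would be one line of a jsonl query file, or one element of the array in a json or hjson query file.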
83 | | -Prompt file: |
84 | | - |
85 | | -* Either a json file that contains an array of dicts, or a jsonl file that contains one json dict per line or a yaml file that |
86 | | - contains an array of dicts |
87 | | -* Each dict must contain the following fields: |
88 | | - * `pid`: the id of the prompt, should be a short reminder of what it does |
89 | | - * at least one of `system`, `assistant`, `user`: a string to use for creating the actual final prompt for the LLM. The string can contain |
90 | | - the placeholders `${query}` and `${facts}` in order to insert the current query and facts (if facts is an array of strings, these will get |
91 | | - concatenated with newlines) |
92 | | - * `fact`: how to format a single fact if there are several. This supports the variables `${fact}` and `${n}` (1-based fact index) |
93 | | -* The placeholder `${check_for}` can be used for prompts to be used by the checking LLM to insert some value to check for in the response to check |
94 | | -* The placeholder `${answer}` can be used for prompts to be used by the checking LLM to insert the response from the base-LLM to check |
95 | | - |
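The placeholder syntax described above happens to match Python's `string.Template`, so the substitution can be sketched with the standard library. This is only an illustration under that assumption: the prompt entry is hypothetical, the helper names `render_facts` and `render_user_message` are invented here, and the package's own rendering code may work differently.

```python
from string import Template

# Hypothetical prompt entry; only the field names are from the description above.
prompt = {
    "pid": "plain01",
    "system": "Answer using only the facts provided.",
    "user": "Facts:\n${facts}\n\nQuestion: ${query}",
    "fact": "${n}. ${fact}",
}


def render_facts(facts, fact_template):
    """Format each fact with its 1-based index and join the results with newlines."""
    if isinstance(facts, str):
        # Assumption: a single string fact is inserted unchanged.
        return facts
    tmpl = Template(fact_template)
    return "\n".join(
        tmpl.substitute(fact=fact, n=index)
        for index, fact in enumerate(facts, start=1)
    )


def render_user_message(prompt_entry, query, facts):
    """Fill ${query} and ${facts} in the user part of a prompt entry."""
    facts_text = render_facts(facts, prompt_entry.get("fact", "${fact}"))
    return Template(prompt_entry["user"]).substitute(query=query, facts=facts_text)


message = render_user_message(
    prompt,
    "Are kwoks vertebrates?",
    ["Kwoks are birds.", "All birds are vertebrates."],
)
print(message)
```

Prompts for the checking LLM could be filled the same way by also passing `check_for` and `answer` to `substitute`.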
96 | | -Query Output file / Checker Input file: |
97 | | - |
98 | | -* Either a json, hjson or a jsonl file |
99 | | -* each dictionary contains the same fields as the input, plus the fields added (marked with '(output)' above) |
100 | | -* NOTE: if there were transient errors during processing, the output file can be re-used as an input file and by default, |
101 | | - only those entries which do not already have a response will get re-processed |
102 | | - |
103 | | -Checker Output file / Eval Input file: |
104 | | - |
105 | | -* Either a json, hjson or a jsonl file |
106 | | -* each dictionary contains the same fields as the input, plus the fields added (marked with '(output)' above) |
107 | | -* NOTE: if there were transient errors during processing, the output file can be re-used as an input file and by default, |
108 | | - only those entries which do not already have a response will get re-processed |
109 | | - |
110 | | -Config file:
111 | | - |
112 | | -* a json or hjson or yaml file containing a dictionary with the following keys |
113 | | -* `llms`: a list of strings or dictionaries describing the LLMs to use. A dictionary can contain the following keys: |
114 | | - * `llm`: the name/id of the LLM. This should be in the form provider:llmmodel where "provider" must be a known provider or something defined in the |
115 | | - `providers` part of the config. The "llmmodel" is the provider-specific way to specify a model. |
116 | | - * `api_key`: the API key to use for the model |
117 | | - * `api_key_env`: the name of an environment variable containing the API key |
118 | | - * `api_url`: the URL to use. In this URL the placeholders `${model}`, `${user}`, `${password}` and `${api_key}` can be used to get replaced |
119 | | - with the actual values |
120 | | - * `user`: the user name to use for basic authentication |
121 | | - * `password`: the password to use for basic authentication |
122 | | - * Any specification that is present in the corresponding provider config is overridden with the value provided in the llm config |
123 | | -* `providers`: a dict with provider names as the key and a dict of provider settings as the values where each dict can contain the following keys:
124 | | - * `api_key`: the API key to use for the model |
125 | | - * `api_key_env`: the name of an environment variable containing the API key |
126 | | - * `api_url`: the URL to use. In this URL the placeholders `${model}`, `${user}`, `${password}` and `${api_key}` can be used to get replaced |
127 | | - with the actual values |
128 | | - * `user`: the user name to use for basic authentication |
129 | | - * `password`: the password to use for basic authentication |
130 | | -* `prompts` : a list of prompts in the same way as in a separate prompts file |
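The override rule described above (values in an `llms` entry take precedence over the matching `providers` entry) can be sketched as a dictionary merge. This is a minimal illustration, not the package's implementation: the provider and model names, the URL and the environment variable names are all made up, and `resolve_llm` is a hypothetical helper.

```python
from string import Template

# Hypothetical config; key names follow the description above, values are made up.
config = {
    "llms": [
        "myprovider:small-model",  # a plain string is also allowed
        {"llm": "myprovider:large-model", "api_key_env": "MY_LLM_KEY"},
    ],
    "providers": {
        "myprovider": {
            "api_url": "https://example.invalid/v1/${model}",
            "api_key_env": "MY_PROVIDER_KEY",
        },
    },
}


def resolve_llm(spec, providers):
    """Merge provider defaults with one llm entry; llm-level values win."""
    if isinstance(spec, str):
        spec = {"llm": spec}
    # The llm id has the form provider:llmmodel.
    provider, _, model = spec["llm"].partition(":")
    merged = {**providers.get(provider, {}), **spec}
    if "api_url" in merged:
        # Fill the ${model} placeholder; other placeholders are left untouched.
        merged["api_url"] = Template(merged["api_url"]).safe_substitute(model=model)
    return merged


resolved = [resolve_llm(spec, config["providers"]) for spec in config["llms"]]
for item in resolved:
    print(item)
```

The second entry keeps its own `api_key_env` while inheriting `api_url` from the provider, which is the behaviour the override rule describes.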
| 8 | +### Documentation |
131 | 9 |
| 10 | +* [https://ofai.github.io/python-ragability/](https://ofai.github.io/python-ragability/) |
| 11 | +* PythonDoc: [https://ofai.github.io/python-ragability/pythondoc/ragability/](https://ofai.github.io/python-ragability/pythondoc/ragability/) |
132 | 12 |