...
...
...
...
...
- Time conversion issue when creating and sending `BenchmarkExecution`s to Studio.
- `StudioClient` raises a `ValueError` when instantiated with a project name that belongs to multiple projects.
- `QdrantInMemoryRetriever` and `HybridQdrantInMemoryRetriever` now use `pharia-1-embedding-4608-control` as embedding model with an embedding size of 4608.
- Introduced `AsyncDocumentIndexClient` and `AsyncDocumentIndexRetriever` as drop-in replacements for their blocking counterparts, enabling coroutine-based, non-blocking document indexing and retrieval.
- `InMemoryDatasetRepository` now has a more descriptive error message when creating a dataset fails due to an ID clash.
- `StudioClient` now deserializes and serializes examples while maintaining type information, which was previously dropped.
- `RunRepository` and `EvaluationRepository` now more accurately reflect their actual return types in their signatures. Previously, it was not obvious that failed examples could be returned.
- `FileTracer.traces` now uses "utf-8" encoding to read the persisted trace file. Previously, the machine's default encoding was used, which could lead to a mismatch between the encoding used for writing and reading.
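The `FileTracer.traces` fix boils down to pairing explicit encodings on both sides of the file round-trip. A minimal, self-contained sketch of the pattern (not the SDK's actual code):

```python
import pathlib
import tempfile

# Illustration of the fix, not the SDK's code: the writer and the reader must
# agree on an explicit encoding instead of relying on the platform default.
trace_file = pathlib.Path(tempfile.mkdtemp()) / "trace.jsonl"
trace_file.write_text('{"task": "Frühstück"}', encoding="utf-8")
content = trace_file.read_text(encoding="utf-8")  # explicit, matches the writer
print(content)
```

On platforms whose default encoding is not UTF-8 (e.g. cp1252 on Windows), omitting the `encoding` argument on the read side would garble or reject the non-ASCII characters written above.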
- `StudioClient` now handles `project_id` as a string instead of an integer. This is only relevant if you handle project IDs (not names) manually.
- `InMemoryDatasetRepository` now returns the exact types given by users when retrieving `Example`s. Previously, it disregarded the types it was given and returned what was saved.
  - This is in line with how the other repositories work.
- `EloQaEvaluationLogic` now has an expected output type of `None` instead of `SingleChunkQaOutput`. The information was unused.
  - If you have pipelines that define data to be processed by this logic, or if you subclass from this specific logic, you may need to adapt it.
- `InMemoryDatasetRepository`, `InMemoryRunRepository`, `InMemoryEvaluationRepository`, and `InMemoryAggregationRepository` now either return the exact types given by users when retrieving example-related data or fail. Specifically, this means that passing the wrong type when retrieving data will now fail with a `ValidationError`. Previously, the repositories disregarded the types they were given and returned whatever object was saved.
  - This is in line with how the other repositories work.
- `EloQaEvaluationLogic` now has an expected output type of `None` instead of `SingleChunkQaOutput`. The information was unused.
  - If you have pipelines that define data to be processed by this logic, or if you subclass from this specific logic, you may need to adapt it.
- `log_probs` in the `CompletionInput` of the `do_run` method has been set to 20 instead of the prior value of 30.
- The legacy Trace Viewer has now been removed along with all references to it.
...
- New Pharia Kernel connector (`KernelTask`) for calling Skills from a Task
- Add `HybridQdrantInMemoryRetriever`, enabling hybrid search for in-memory Qdrant collections
- Add warning to `PromptBasedClassify` and `PromptBasedClassifyWithDefinitions` to be cautious when using them with model families other than luminous
...
...
- Bump dependency versions
- Fixes an incompatibility where models whose tokenizer has no whitespace prefix could not be used for QA examples. Now, no error will be thrown.
- Introduce `Benchmark` and `StudioBenchmark`
  - `Benchmark` allows you to evaluate and compare the performance of different `Task`s with a fixed evaluation logic, aggregation logic and `Dataset`.
- Add `how_to_execute_a_benchmark.ipynb` to how-tos
- Add `studio.ipynb` to notebooks to show how one can debug a `Task` with Studio
- Introduce `BenchmarkRepository` and `StudioBenchmarkRepository`
- Add `create_project` bool to `StudioClient.__init__()` to enable users to automatically create their Studio projects
- Add progress bar to the `Runner` to be able to track the `Run`
- Add `StudioClient.submit_benchmark_lineages` function and include it in `StudioClient.submit_benchmark_execution`
- Add method `DocumentIndexClient.chunks()` for retrieving all text chunks of a document.
- Add metadata filter `FilterOps.IS_NULL`, which allows filtering fields based on whether their value is null.
- The Document Index `SearchQuery` now correctly allows searches with a negative `min_score`.
...
- The env variable `POSTGRES_HOST` is split into `POSTGRES_HOST` and `POSTGRES_PORT`. This affects all classes interacting with Studio and the `InstructionFinetuningDataRepository`.
- The following env variables now need to be set (they previously pointed to defaults):
  - `CLIENT_URL` - URL of your inference stack
  - `DOCUMENT_INDEX_URL` - URL of the Document Index
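A hedged sketch of how the split variables might be consumed: the variable names come from the entry above, while the example URLs and the DSN format are invented for illustration.

```python
import os

# Hypothetical values; only the variable names are from the changelog.
os.environ["POSTGRES_HOST"] = "db.example.com"  # previously carried host and port together
os.environ["POSTGRES_PORT"] = "5432"            # new, separate variable
os.environ["CLIENT_URL"] = "https://inference.example.com"
os.environ["DOCUMENT_INDEX_URL"] = "https://document-index.example.com"

# A consumer now joins host and port itself instead of parsing one value.
dsn = f"postgresql://{os.environ['POSTGRES_HOST']}:{os.environ['POSTGRES_PORT']}/finetuning"
print(dsn)
```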
- You can now customise the embedding model when creating an index using the `DocumentIndexClient`.
- You can now use the `InstructableEmbed` embedding strategy when creating an index using the `DocumentIndexClient`. See the `document_index.ipynb` notebook for more information and an example.
- The way you configure indexes in the `DocumentIndexClient` has changed. See the `document_index.ipynb` notebook for more information.
  - The `EmbeddingType` alias has been renamed to `Representation` to better align with the underlying API.
  - The `embedding_type` field has been removed from the `IndexConfiguration` class. You now configure embedding-related parameters via the `embedding` field.
  - You now always need to specify an embedding model when creating an index. Previously, this was always `luminous-base`.
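To make the restructuring concrete, here is a hypothetical before/after sketch using plain dicts; the exact field names of `IndexConfiguration` are assumptions drawn from this entry, not the verified schema.

```python
# Field names below are assumptions based on the changelog entry above,
# not the verified IndexConfiguration schema -- treat this as a sketch.
old_style = {"chunk_size": 512, "embedding_type": "asymmetric"}  # no longer valid

new_style = {
    "chunk_size": 512,
    "embedding": {                       # embedding parameters now live here
        "representation": "asymmetric",  # EmbeddingType was renamed to Representation
        "model_name": "luminous-base",   # must now always be given explicitly
    },
}
print(new_style["embedding"]["model_name"])
```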
- Dependency updates
- Add support for `Llama3InstructModel` in `PromptBasedClassify`
- Add `TextControl` to `to_instruct_prompt` for instruct models
- Add `attention_manipulation_with_text_controls.ipynb` to tutorial notebooks
- Introduced `InstructionFinetuningDataHandler` to provide methods for storing, retrieving and updating finetuning data samples given an `InstructionFinetuningDataRepository`. Also has methods for filtered sample retrieval and for dataset formatting.
- Introduced `InstructionFinetuningDataRepository` for storing and retrieving finetuning samples. Comes in two implementations:
  - `PostgresInstructionFinetuningDataRepository` to work with data stored in a Postgres database.
  - `FileInstructionFinetuningDataRepository` to work with data stored in the local file system.
- Compute precision, recall and F1-score by class in `SingleLabelClassifyAggregationLogic`
- Add `submit_dataset` function to `StudioClient`
- Add `how_to_upload_existing_datasets_to_studio.ipynb` to how-tos
- Fixed some docstring inconsistencies across the codebase and switched the docstring checker to pydoclint.
- Add support for stages and files in Data client.
- Add more in-depth descriptions for `MultipleChunkRetrieverQaOutput` and `ExpandChunks`
- Data repository media types are now validated with a function instead of an Enum.
- Update names of `pharia-1` models to lowercase, aligning with fresh deployments of the api-scheduler.
- Add Catalan and Polish support to `DetectLanguage`.
- Add utility function `run_is_already_computed` to `Runner` to check if a run with the given metadata has already been computed.
  - The `parameter_optimization` notebook describes how to use the `run_is_already_computed` function.
- The default `max_retry_time` for the `LimitedConcurrencyClient` is now set to 3 minutes, down from a day. If you have long-running evaluations that need it, you can set a longer retry time in the constructor.
- You can now specify a `hybrid_index` when creating an index for the Document Index to use hybrid (semantic and keyword) search.
  - `min_score` and `max_results` are now optional parameters in `DocumentIndexClient.SearchQuery`.
  - `k` is now an optional parameter in `DocumentIndexRetriever`.
- List all indexes of a namespace with `DocumentIndexClient.list_indexes`.
- Remove an index from a namespace with `DocumentIndexClient.delete_index`.
- `ChatModel` now inherits from `ControlModel`. Although we recommend using the new chat interface, you can now use the `Pharia1ChatModel` with tasks that rely on `ControlModel`.
- `DocumentIndexClient` now properly sets `chunk_overlap` when creating an index configuration.
- The default model for `Llama3InstructModel` is now `llama-3.1-8b-instruct` instead of `llama-3-8b-instruct`. We also removed the Llama 3.0 models from the recommended models of the `Llama3InstructModel`.
- The default value of `threshold` in the `DocumentIndexRetriever` has changed from `0.5` to `0.0`. This accommodates fusion scoring for searches over hybrid indexes.
- Remove cap for `max_concurrency` in `LimitedConcurrencyClient`.
- Introduce abstract `LanguageModel` class to integrate with LLMs from any API
  - Every `LanguageModel` supports echo to retrieve log probs for an expected completion given a prompt
- Introduce abstract `ChatModel` class to integrate with chat models from any API
  - Introducing `Pharia1ChatModel` for usage with pharia-1 models.
  - Introducing `Llama3ChatModel` for usage with llama models.
- Upgrade `ArgillaWrapperClient` to use Argilla v2.x
- (Beta) Add `DataClient` and `StudioDatasetRepository` as connectors to Studio for submitting data.
- Add the optional argument `generate_highlights` to `MultiChunkQa`, `RetrieverBasedQa` and `SingleChunkQa`. This makes it possible to disable highlighting for performance reasons.
- Increase the number of returned `log_probs` in `EloQaEvaluationLogic` to avoid missing a valid answer
- Removed `DefaultArgillaClient`
- Deprecated `Llama2InstructModel`
- We needed to upgrade the argilla-server image version from `argilla-server:v1.26.0` to `argilla-server:v1.29.0` to maintain compatibility.
  - Note: We also updated our Elasticsearch Argilla backend to `8.12.2`
- Updated `DocumentIndexClient` with support for metadata filters.
  - Add documentation for filtering to `document_index.ipynb`.
- Add `StudioClient` as a connector for submitting traces.
- You can now specify a `chunk_overlap` when creating an index in the Document Index.
- Add support for monitoring progress in the Document Index connector when embedding documents.
- `TaskSpan` now properly sets its status to `Error` on crash.
- Deprecate the old Trace Viewer as the new `StudioClient` replaces it. This affects `Tracer.submit_to_trace_viewer`.
- Update docstrings for `calculate_bleu` in `BleuGrader` to correctly reflect the float range from 0 to 100 for the return value.
- Reverted a change that introduced a bug in `MultipleChunkRetrieverQa` text highlighting.
- Serialization and deserialization of `ExportedSpan` and its `attributes` now work as expected.
- `PromptTemplate.to_rich_prompt` now always returns an empty list for prompt ranges that are empty.
- `SingleChunkQa` no longer crashes if given an empty input and a specific prompt template. This did not affect users who used the models provided in `core`.
- Added default values for `labels` and `metadata` for `EvaluationOverview` and `RunOverview`
- In `MultipleChunkRetrieverQa`, text-highlight start and end points are now restricted to within the text length of the respective chunk.
- `RunRepository.example_output` now returns `None` and prints a warning when there is no associated record for the given `run_id`, instead of raising a `ValueError`.
- `RunRepository.example_outputs` now returns an empty list and prints a warning when there is no associated record for the given `run_id`, instead of raising a `ValueError`.
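A minimal stand-in illustrating the new warn-and-return contract (not the SDK's implementation; the stored data and warning text are invented):

```python
import warnings

# Toy in-memory store standing in for a RunRepository's persisted outputs.
_outputs = {"run-1": {"ex-1": "some output"}}

def example_output(run_id, example_id):
    """Warn and return None for unknown run ids instead of raising."""
    if run_id not in _outputs:
        warnings.warn(f"No record for run id: {run_id}")
        return None
    return _outputs[run_id].get(example_id)

def example_outputs(run_id):
    """Warn and return an empty list for unknown run ids instead of raising."""
    if run_id not in _outputs:
        warnings.warn(f"No record for run id: {run_id}")
        return []
    return list(_outputs[run_id].values())

print(example_output("missing-run", "ex-1"), example_outputs("missing-run"))
```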
- `Runner.run_dataset` can now be resumed after failure by setting the `resume_from_recovery_data` flag to `True` and calling `Runner.run_dataset` again.
  - For `InMemoryRunRepository`-based `Runner`s this is limited to runs that failed with an exception that did not crash the whole process/kernel.
  - For `FileRunRepository`-based `Runner`s even runs that crashed the whole process can be resumed.
- `DatasetRepository.examples` now accepts an optional parameter `examples_to_skip` to enable skipping of `Example`s with the provided IDs.
- Add `how_to_resume_a_run_after_a_crash` notebook.
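The `examples_to_skip` behaviour amounts to filtering by ID while iterating the dataset. A hypothetical stand-in for `DatasetRepository.examples` (the real method works on `Example` objects, not dicts):

```python
# Hypothetical stand-in: skipping is just filtering out the provided ids
# while iterating the dataset's examples.
def examples(dataset, examples_to_skip=frozenset()):
    return [ex for ex in dataset if ex["id"] not in examples_to_skip]

dataset = [{"id": "ex-1"}, {"id": "ex-2"}, {"id": "ex-3"}]
remaining = examples(dataset, examples_to_skip={"ex-2"})
print([ex["id"] for ex in remaining])
```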
- Remove unnecessary dependencies from IL
- Added default values for `labels` and `metadata` for `PartialEvaluationOverview`
- Add `eot_token` property to `ControlModel` and derived classes (`LuminousControlModel`, `Llama2InstructModel` and `Llama3InstructModel`) and let `PromptBasedClassify` use this property instead of a hardcoded string.
- Introduce a new Argilla client `ArgillaWrapperClient`. This uses the `argilla` package as a connection to Argilla and supports all question types that Argilla supports in their `FeedbackDataset`. This includes text and yes/no questions. For more information about the questions, check their official documentation.
  - Changes to switch:
    - `DefaultArgillaClient` -> `ArgillaWrapperClient`
    - `Question` -> `argilla.RatingQuestion`; `options` -> `values`, and it takes only a list
    - `Field` -> `argilla.TextField`
- Add `description` parameter to `Aggregator.aggregate_evaluation` to allow individual descriptions without the need to create a new `Aggregator`. This was missing from the previous release.
- Add optional field `metadata` to `Dataset`, `RunOverview`, `EvaluationOverview` and `AggregationOverview`
  - Update `parameter_optimization.ipynb` to demonstrate usage of metadata
- Add optional field `label` to `Dataset`, `RunOverview`, `EvaluationOverview` and `AggregationOverview`
- Add `unwrap_metadata` flag to `aggregation_overviews_to_pandas` to enable inclusion of metadata in the pandas export. Defaults to `True`.
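As an illustration of what unwrapping metadata means conceptually (plain dicts stand in for the real overview types and the pandas frame):

```python
# Illustrative only: "unwrapping" metadata flattens each overview's metadata
# dict into top-level columns before the table is built.
overviews = [
    {"id": "agg-1", "metadata": {"model": "m1", "chunk_size": 256}},
    {"id": "agg-2", "metadata": {"model": "m2", "chunk_size": 512}},
]

def unwrap_metadata(rows):
    return [
        {**{k: v for k, v in row.items() if k != "metadata"}, **row["metadata"]}
        for row in rows
    ]

flat = unwrap_metadata(overviews)
print(flat[0])
```

With the flag enabled, each metadata key becomes its own column, which makes side-by-side comparison of parameter combinations straightforward.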
- Reinitializing different `AlephAlphaModel` instances and retrieving their tokenizers should now consume a lot less memory.
- Evaluations now raise errors if the IDs of examples and outputs no longer match. If this happens, continuing the evaluation would only produce incorrect results.
- Performing evaluations on runs with a different number of outputs now raises errors. Continuing the evaluation in this case would only lead to an inconsistent state.
- Remove the `Trace` class, as it was no longer used.
- Renamed `example_trace` to `example_tracer` and changed its return type to `Optional[Tracer]`.
- Renamed `example_tracer` to `create_tracer_for_example`.
- Replaced langdetect with lingua as the language detection tool. This means that old thresholds for detection might need to be adapted.
- `Lineage`s now contain a `Tracer` for individual `Output`s.
- `convert_to_pandas_data_frame` now also creates a column containing the `Tracer`s.
- `run_dataset` now has a flag `trace_examples_individually` to create `Tracer`s for each example. Defaults to `True`.
- Added optional `metadata` field to `Example`.
- `ControlModel`s throw a warning instead of an error in case a non-recommended model is selected.
- The `LimitedConcurrencyClient.max_concurrency` is now capped at 10, which is its default, as the underlying `aleph_alpha_client` currently does not support more.
- `ExpandChunks` now works properly if the chunk of interest is not at the beginning of a very large document. As a consequence, `MultipleChunkRetrieverQa` now works better with larger documents and should return fewer `None` answers.
- We removed the `trace_id` as a concept from various tracing-related functions and moved it to a `context`. If you did not directly use the `trace_id`, there is nothing to change.
  - `Task.run` no longer takes a trace ID. This was a largely unused feature, and we revamped the trace IDs for the traces.
  - Creating `Span`, `TaskSpan` or logs no longer takes `trace_id`. This is handled by the spans themselves, which now have a `context` that identifies them.
  - `Span.id` is therefore also removed. It can be accessed via `span.context.trace_id`, but has a different type.
  - The `OpenTelemetryTracer` no longer logs a custom `trace_id` into the attributes. Use the existing IDs from its context instead.
  - Accessing a single trace from `PersistentTracer.trace()` is no longer supported, as the user does not have access to the `trace_id` anyway. The function is now called `traces` and returns all available traces for a tracer.
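A minimal sketch of the new access pattern; the stand-in `Span` and `SpanContext` types below are simplifications of the library's real types, but `span.context.trace_id` is the access path described above:

```python
from dataclasses import dataclass

# Simplified stand-ins for the context-based identification scheme.
@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str

@dataclass
class Span:
    name: str
    context: SpanContext  # replaces the removed Span.id / trace_id arguments

span = Span(name="qa-task", context=SpanContext(trace_id="trace-1", span_id="span-7"))
print(span.context.trace_id)  # instead of the removed span.id / trace_id
```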
- `InMemoryTracer` and derivatives are no longer `pydantic.BaseModel`. Use the `export_for_viewing` function to export a serializable representation of the trace.
- We updated the graders to support Python 3.12 and moved away from the `nltk` package:
  - `BleuGrader` now uses the `sacrebleu` package.
  - `RougeGrader` now uses the `rouge_score` package.
- When using the `ArgillaEvaluator`, attempting to submit to a dataset that already exists will no longer append to that dataset. This makes it more in line with other evaluation concepts.
  - Instead of appending to an active Argilla dataset, you now need to create a new dataset, retrieve it and then finally combine both datasets in the aggregation step.
- The `ArgillaClient` now has methods `create_dataset` for less fault-ignoring dataset creation and `add_records` for performant uploads.
- Add support for Python 3.12
- Add `skip_example_on_any_failure` flag to `evaluate_runs` (defaults to `True`). This allows you to configure whether to keep an example for evaluation even if it failed for some run.
- Add `how_to_implement_incremental_evaluation`.
- Add `export_for_viewing` to tracers to be able to export traces in a unified format similar to OpenTelemetry.
  - This is not supported for the `OpenTelemetryTracer` because of technical incompatibilities.
- All exported spans now contain the status of the span.
- Add `description` parameter to `Evaluator.evaluate_runs` and `Runner.run_dataset` to allow individual descriptions without the need to create a new `Evaluator` or `Runner`.
- All models raise an error during initialization if an incompatible `name` is passed, instead of only when they are used.
- Add `aggregation_overviews_to_pandas` function to allow for easier comparison of multiple aggregation overviews.
- Add `parameter_optimization.ipynb` notebook to demonstrate the optimization of tasks by comparing different parameter combinations.
- Add `convert_file_for_viewing` in the `FileTracer` to convert the trace file format to the new (OpenTelemetry-style) format and save it as a new file.
- All tracers can now call `submit_to_trace_viewer` to send the trace to the Trace Viewer.
- The document index client now correctly URL-encodes document names in its queries.
- The `ArgillaEvaluator` now properly supports `dataset_name`.
- Update outdated `how_to_human_evaluation_via_argilla.ipynb`.
- Fix bug in `FileSystemBasedRepository` causing spurious mkdir failures if the file already exists.
- Update broken README links to Read the Docs.
- Fix a broken multi-label classify example in the `evaluation` tutorial.
- Changed the behavior of `IncrementalEvaluator::do_evaluate` such that it now sends all `SuccessfulExampleOutput`s to `do_incremental_evaluate` instead of only the new `SuccessfulExampleOutput`s.
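A hypothetical sketch of this behavioral change, with stand-in functions rather than the real evaluator classes:

```python
# Stand-ins for the evaluator hooks; names mirror the changelog entry but
# the data shapes are invented for illustration.
def do_incremental_evaluate(outputs):
    return sorted(o["run"] for o in outputs)

def do_evaluate(previous_outputs, new_outputs):
    # Before the change, only `new_outputs` reached do_incremental_evaluate;
    # now every successful output is forwarded.
    return do_incremental_evaluate(previous_outputs + new_outputs)

result = do_evaluate([{"run": "run-1"}], [{"run": "run-2"}])
print(result)
```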
- Add generic `EloEvaluationLogic` class for the implementation of Elo evaluation use cases.
- Add `EloQaEvaluationLogic` for Elo evaluation of QA runs, with optional later addition of more runs to an existing evaluation.
- Add `EloAggregationAdapter` class to simplify using the `ComparisonEvaluationAggregationLogic` for different Elo use cases.
- Add `elo_qa_eval` tutorial notebook describing the use of an (incremental) Elo evaluation use case for QA models.
- Add `how_to_implement_elo_evaluations` how-to as a skeleton for implementing Elo evaluation cases
- The `ExpandChunks` task is now fast even for very large documents
We did a major revamp of the `ArgillaEvaluator` to separate an `AsyncEvaluator` from the normal evaluation scenario.
This comes with easier-to-understand interfaces, more information in the `EvaluationOverview` and a simplified aggregation step for Argilla that is no longer dependent on specific Argilla types.
Check the how-to for detailed information here
- rename: `AggregatedInstructComparison` to `AggregatedComparison`
- rename: `InstructComparisonArgillaAggregationLogic` to `ComparisonAggregationLogic`
- remove: `ArgillaAggregator` - the regular aggregator now does the job
- remove: `ArgillaEvaluationRepository` - `ArgillaEvaluator` now uses `AsyncRepository`, which extends the existing `EvaluationRepository` for the human-feedback use case
- `ArgillaEvaluationLogic` now uses `to_record` and `from_record` instead of `do_evaluate`. The signature of `to_record` stays the same. The `Field` and `Question` are now defined in the logic instead of being passed to the `ArgillaRepository`
- `ArgillaEvaluator` now takes the `ArgillaClient` as well as the `workspace_id`. It inherits from the abstract `AsyncEvaluator` and no longer has `evaluate_runs` and `evaluate`. Instead, it has `submit` and `retrieve`.
- `EvaluationOverview` gets attributes `end_date`, `successful_evaluation_count` and `failed_evaluation_count`
- rename: `start` is now called `start_date` and is no longer optional
- We refactored the internals of `Evaluator`. This is only relevant if you subclass from it. Most of the typing and data handling has moved to `EvaluatorBase`
- Add `ComparisonEvaluation` for the Elo evaluation to abstract from the Argilla record
- Add `AsyncEvaluator` for human-feedback evaluation. `ArgillaEvaluator` inherits from this.
  - `.submit` pushes all evaluations to Argilla to label them
  - Add `PartialEvaluationOverview` to store the submission details.
  - `.retrieve` then collects all labelled records from Argilla and stores them in an `AsyncRepository`.
  - Add `AsyncEvaluationRepository` to store and retrieve `PartialEvaluationOverview`s. Also added `AsyncFileEvaluationRepository` and `AsyncInMemoryEvaluationRepository`
- Add `EvaluatorBase` and `EvaluationLogicBase` as base classes for both async and synchronous evaluation.
- Improve description of using artifactory tokens for installation of IL
- Change `confusion_matrix` in `SingleLabelClassifyAggregationLogic` such that it can be persisted in a file repository
- `AlephAlphaModel` now supports a `context_size` property
- Add new `IncrementalEvaluator` for easier addition of runs to existing evaluations without repeated evaluation.
  - Add `IncrementalEvaluationLogic` for use in `IncrementalEvaluator`
Initial stable release
With the release of version 1.0.0, some new features have been introduced, but also some breaking changes you should be aware of. Apart from these changes, we also had to reset our commit history, so please be aware of this fact.
- The TraceViewer has been exported to its own repository and can be accessed via the artifactory here
- `HuggingFaceDatasetRepository` now has a parameter `caching`, which caches the examples of a dataset once loaded.
  - `True` as default value
  - set to `False` for a non-breaking change
- Introduction of `LLama2InstructModel` allows support of the Llama 2 models:
  - llama-2-7b-chat
  - llama-2-13b-chat
  - llama-2-70b-chat
- Introduction of `LLama3InstructModel` allows support of the Llama 3 models:
  - llama-3-8b-instruct
  - llama-3-70b-instruct
- `DocumentIndexClient` has been enhanced with the following set of features:
  - `create_index`
  - `index_configuration`
  - `assign_index_to_collection`
  - `delete_index_from_collection`
  - `list_assigned_index_names`
- The `ExpandChunks` task now caches chunked documents by ID
- `DocumentIndexRetriever` now supports `index_name`
- `Runner.run_dataset` now has a configurable number of workers via `max_workers` and defaults to the previous value, which is 10.
- In case a `BusyError` is raised during a `complete`, the `LimitedConcurrencyClient` will retry until `max_retry_time` is reached.
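The retry behaviour can be sketched generically; in this stand-alone sketch, `RuntimeError` stands in for the client's `BusyError`, and the timings and function names are illustrative only:

```python
import time

# Generic retry-until-deadline pattern described in the entry above.
def retry_until(fn, max_retry_time=1.0, interval=0.01):
    deadline = time.monotonic() + max_retry_time
    while True:
        try:
            return fn()
        except RuntimeError:  # stand-in for the client's BusyError
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)

attempts = {"count": 0}

def flaky_complete():
    # Fails twice with "busy", then succeeds, mimicking a loaded endpoint.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("busy")
    return "completion"

result = retry_until(flaky_complete)
print(result, attempts["count"])
```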
- `HuggingFaceRepository` is no longer a dataset repository. This also means that `HuggingFaceAggregationRepository` is no longer a dataset repository.
- The input parameter of the `DocumentIndex.search()` function has been renamed from `index` to `index_name`