Commit f8c64a5

Update thesis.md, undo wrong marking
1 parent ef803e0 commit f8c64a5

1 file changed

Lines changed: 1 addition & 1 deletion

_pages/thesis.md

@@ -469,7 +469,7 @@ lower scope in the domain of ATS are also possible).
- *Data Mining and LLM-as-a-Judge to better understand LLM behavior:* While the behavior of LLMs and their nuanced and complex output data is challenging to evaluate, data mining approaches can be leveraged to explain model behavior, to bring structure into evaluation and to gain new insights, e.g. on cultural biases or task failure [1]. In this thesis project, we want to take this approach further by evaluating the use of newly proposed data mining algorithms and/or the combination of LLM-as-a-Judge with data mining processes. The project offers the possibility to work on a technical evaluation of methods as well as develop and evaluate a new method. **References:** [1] [https://aclanthology.org/2025.acl-long.985/](https://aclanthology.org/2025.acl-long.985/)
**Level: MSc.**

-- :hourglass_flowing_sand: *Understanding Post-Training Effects Through Model Behavior Analysis and Interpretability:* Post-training has become an essential technique to adapt pretrained language models, e.g. to improve instruction following [1] or abilities for underrepresented languages [2], or to align model behavior with safety standards [3]. Correctly adapting models through post-training is, however, a complex and difficult process which can e.g. trigger broad misalignments and unexpected effects like safety failures [4]. To better control post-training, it is crucial to better understand how models change during the process.
+- *Understanding Post-Training Effects Through Model Behavior Analysis and Interpretability:* Post-training has become an essential technique to adapt pretrained language models, e.g. to improve instruction following [1] or abilities for underrepresented languages [2], or to align model behavior with safety standards [3]. Correctly adapting models through post-training is, however, a complex and difficult process which can e.g. trigger broad misalignments and unexpected effects like safety failures [4]. To better control post-training, it is crucial to better understand how models change during the process.
This thesis will study the effects of post-training through a dual lens. Through model behavior analysis tools like Spotlight [5], it will explore how a model changes with respect to non-performance metrics like gender [6] and cultural biases [7]. Using probing, logit lens or other interpretability techniques, it will then go one step further and also start explaining how these changes occur within the model. Depending on scope and resource availability, this thesis can either work with existing model (checkpoints) or post-train specific model aspects.
**References:**
[1] [Ouyang et al. (2022): Training language models to follow instructions with human feedback. arXiv 2203.02155.](https://arxiv.org/pdf/2203.02155)
