Commit 47eb321

Update documentation
1 parent 873a6cd commit 47eb321

17 files changed

Lines changed: 3538 additions & 936 deletions

_sources/index.md

Lines changed: 11 additions & 5 deletions
@@ -1,6 +1,6 @@
----
-edit_url: null
----
+<head><meta property="og:image" content="https://raw.githubusercontent.com/interpretingdl/speech-interpretability-tutorial/refs/heads/main/book/images/tutorial-overview.png"></head>
+
+
 # Interpretability Techniques for Speech Models
 
 Pre-trained foundation models have revolutionized speech technology like many other adjacent fields. The combination of their capability and opacity has sparked interest in researchers trying to interpret the models in various ways. While interpretability in fields such as computer vision and natural language processing has made significant progress towards understanding model internals and explaining their decisions, speech technology has lagged behind despite the widespread use of complex, black-box neural models. Recent studies have begun to address this gap, marked by a growing body of literature focused on interpretability in the speech domain. This tutorial provides a structured overview of interpretability techniques, their applications, implications, and limitations when applied to speech models, aiming to help researchers and practitioners better understand, evaluate, debug, and optimize speech models while building trust in their predictions. In hands-on sessions, participants will explore how speech models encode distinct features (e.g., linguistic information) and utilize them in their inference. By the end, attendees will be equipped with the tools and knowledge to start analyzing and interpreting speech models in their own research, potentially inspiring new directions.

@@ -32,9 +32,15 @@ We will present our tutorial about _Interpretability Techniques for Speech Model
 
 ## Tutorial contents
 
-> **Representational Analysis methods** for speech model interpretability: <br> Probing <br> {cite}`bentumProcessingStressEndEnd2024,shen-etal-2024-encoding,deheerklootsWhatSelfsupervisedSpeech2025,cormacenglishDomainInformedProbingWav2vec2022` <br> Representation space comparisons <br> RSA {cite}`chrupala-etal-2020-analyzing,shenWaveSyntaxProbing2023a`, CCA {cite}`Pasad2021`, CKA {cite}`pmlr-v97-kornblith19a` <br> CTC & Decoder lenses <br> {cite}`deheerklootsHumanlikeLinguisticBiases2024,langedijkDecoderLensLayerwiseInterpretation2024` <br> Embedding similarities (ABX tests) <br> {cite}`schatz2016,algayresDPParseFindingWord2022,seysselDiscriminatingFormMeaning2025`
+**Representational Analysis methods**:
+- Probing {cite}`cormacenglishDomainInformedProbingWav2vec2022,choEvidenceVocalTract2023,bentumProcessingStressEndEnd2024,shen-etal-2024-encoding,bentum25_interspeech,deheerklootsWhatSelfsupervisedSpeech2025`
+- Representation space comparisons: RSA {cite}`chrupala-etal-2020-analyzing,shenWaveSyntaxProbing2023a`, CCA {cite}`Pasad2021`, CKA {cite}`pmlr-v97-kornblith19a`
+- CTC & Decoder lenses {cite}`deheerklootsHumanlikeLinguisticBiases2024,langedijkDecoderLensLayerwiseInterpretation2024`
+- Embedding similarities (ABX tests) {cite}`schatz2016,algayresDPParseFindingWord2022,seysselDiscriminatingFormMeaning2025`
 
-> **Feature Importance Scoring methods** for speech model interpretability, including: <br> Context-mixing <br> Attention {cite}`yangUnderstandingSelfAttentionSelfSupervised2020a,shimUNDERSTANDINGROLESELF2022,alastrueyLocalityAttentionDirect2022,audhkhasiAnalysisSelfAttentionHead2022,kobayashiAttentionNotOnly2020a`, Value-Zeroing {cite}`mohebbiHomophoneDisambiguationReveals2023a` <br> Feature attribution {cite}`fucciExplainabilitySpeechModels2024,shenReliabilityFeatureAttribution2025` (Gradient-based {cite}`prasadHowAccentsConfound2020,guptaPhonemeDiscretizedSaliency2024` & Perturbation-based {cite}`wuExplanationsforASR2023,pastor-etal-2024-explaining`)
+**Feature Importance Scoring methods**:
+- Context-mixing: Attention {cite}`yangUnderstandingSelfAttentionSelfSupervised2020a,shimUNDERSTANDINGROLESELF2022,alastrueyLocalityAttentionDirect2022,audhkhasiAnalysisSelfAttentionHead2022`, Attention Norm {cite}`kobayashiAttentionNotOnly2020a`, Value-Zeroing {cite}`mohebbiHomophoneDisambiguationReveals2023a`
+- Feature attribution {cite}`fucciExplainabilitySpeechModels2024,shenReliabilityFeatureAttribution2025` (Gradient-based {cite}`prasadHowAccentsConfound2020,guptaPhonemeDiscretizedSaliency2024` & Perturbation-based {cite}`wuExplanationsforASR2023,pastor-etal-2024-explaining`)
 
 ## References
 
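To make the "Representation space comparisons" entry above concrete, here is a minimal sketch of linear CKA (Kornblith et al., 2019) between two activation matrices. The function and variable names, and the random matrices standing in for real speech-model activations, are purely illustrative and not from the tutorial's materials.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_samples, d1) and Y: (n_samples, d2) hold representations of the
    same n inputs, e.g. from two layers of a speech model.
    """
    X = X - X.mean(axis=0, keepdims=True)  # center each feature dimension
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 64))   # stand-in for layer activations
print(round(linear_cka(acts_a, acts_a), 6))                 # -> 1.0
# Permuting and rescaling the feature axes leaves CKA unchanged:
print(round(linear_cka(acts_a, acts_a[:, ::-1] * 3.0), 6))  # -> 1.0
```

CKA's invariance to orthogonal transformations and isotropic scaling (demonstrated by the second call) is what makes it suitable for comparing layers whose coordinate bases differ arbitrarily.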

_sources/interspeech2025/feature-importance-scoring/context_mixing.ipynb

Lines changed: 11 additions & 24 deletions
@@ -5495,29 +5495,25 @@
 {
 "cell_type": "markdown",
 "source": [
-"# **Quantifying Context-Mixing in Speech Transformers**\n",
+"# Quantifying Context-Mixing in Speech Transformers\n",
 "\n",
 "---\n",
 "\n",
 "Author: Hosein Mohebbi\n",
 "\n",
+"_Thanks to Martijn Bentum, Tom Lentz, and Willem Zuidema for their helpful feedback during the notebook's preparation._\n",
+"\n",
 "---\n",
 "\n",
-"This notebook is based on EMNLP 2023 paper: [\"Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers\"](https://aclanthology.org/2023.emnlp-main.513.pdf) and part of the [_Interspeech2025_](https://www.interspeech2025.org/home) tutorial on [_\"Interpretability Techniques for Speech Models.\"_](https://interpretingdl.github.io/speech-interpretability-tutorial/) The notebook explores the ways to quantify patterns of *context-mixing* in speech ASR models at word-level.\n"
+"This notebook is based on the EMNLP 2023 paper [\"Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers\"](https://aclanthology.org/2023.emnlp-main.513.pdf) and is part of the [Interspeech 2025](https://www.interspeech2025.org/home) tutorial on [_Interpretability Techniques for Speech Models_](https://interpretingdl.github.io/speech-interpretability-tutorial/).\n",
+"\n",
+"The notebook explores ways to quantify patterns of *context-mixing* in speech ASR models at the word level.\n",
+"\n"
 ],
 "metadata": {
 "id": "HEEmFEbkeS3K"
 }
 },
-{
-"cell_type": "markdown",
-"source": [
-"****"
-],
-"metadata": {
-"id": "YVAmPe4Aq6Fo"
-}
-},
 {
 "cell_type": "markdown",
 "source": [
@@ -5572,7 +5568,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"# **Dataset**"
+"# **Homophone Dataset**"
 ],
 "metadata": {
 "id": "eoiDquxxkA8K"
@@ -5782,7 +5778,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"# **Models**"
+"# **ASR Models**"
 ],
 "metadata": {
 "id": "q2CMbl_FlLk9"
@@ -6273,7 +6269,7 @@
 "\n",
 "- **Attention**\n",
 "\n",
-"- **Value Zeroing**"
+"- **Value Zeroing** [(Mohebbi et al., 2023)](https://aclanthology.org/2023.eacl-main.245.pdf)"
 ],
 "metadata": {
 "id": "vM3PE030g1Sg"
@@ -6379,7 +6375,7 @@
 "S_{i \leftarrow j}=\frac{1}{|\mathcal{I}|} \sum_{n \in \mathcal{I}} \cos \left(\tilde{\boldsymbol{x}}_{n}, \tilde{\boldsymbol{x}}_{n}^{\neg j}\right)\n",
 "$$\n",
 "\n",
-"__Note #1__: Unlike generic perturbation approaches, our proposed method does not remove the input token representations $x_i$ from the input of a transformer layer! Since any changes in the input vectors will lead to changes in the query and key vectors in the multi-head attention module, resulting in a change in the attention distribution. So, there will be a discrepancy between the alternative attention weights that we analyze and those we initially had for the original context. So, basically, we won't analyze the same model anymore! By zeroing only the value vector in the weighted sum, the token representation maintains its identity within the layer, but it does not contribute to forming other token representations.\n",
+"__Note #1__: Unlike generic perturbation approaches, our proposed method does not remove the input token representations $x_i$ from the input of a transformer layer! Any change in the input vectors would change the query and key vectors in the multi-head attention module, and hence the attention distribution, so there would be a discrepancy between the alternative attention weights we analyze and those we initially had for the original context; basically, we would no longer be analyzing the same model! By zeroing only the value vector in the weighted sum, the token representation maintains its identity within the layer, avoiding out-of-distribution inputs, while not contributing context-mixing to form other token representations.\n",
 "\n",
 "__Note #2__: Since Value Zeroing is computed from the layer's outputs, it incorporates all the components inside a Transformer layer, including feed-forwards, layer normalization, and residual connections.\n"
 ],
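As a toy illustration of the formula in this hunk (not the tutorial's actual implementation), the sketch below applies Value Zeroing to a single-head self-attention layer in plain NumPy: token $j$'s value vector is zeroed while queries, keys, and hence the attention weights stay untouched, and the score is the cosine distance between the original and alternative output representations. All names and the random weights are illustrative; per Note #2, a real implementation would compare full Transformer-layer outputs (feed-forward, layer normalization, residuals), not bare attention outputs.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv, zero_value=None):
    """Single-head self-attention; optionally zero one token's value vector."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if zero_value is not None:
        V = V.copy()
        V[zero_value] = 0.0  # queries/keys untouched -> attention weights unchanged
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return A @ V

def value_zeroing(X, Wq, Wk, Wv):
    """S[i, j]: how much token i's output changes when token j's value is zeroed."""
    n = X.shape[0]
    base = attention(X, Wq, Wk, Wv)
    S = np.zeros((n, n))
    for j in range(n):
        alt = attention(X, Wq, Wk, Wv, zero_value=j)
        for i in range(n):
            cos = base[i] @ alt[i] / (np.linalg.norm(base[i]) * np.linalg.norm(alt[i]) + 1e-12)
            S[i, j] = 1.0 - cos  # cosine distance: larger -> token j matters more for i
    return S

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
S = value_zeroing(X, Wq, Wk, Wv)
print(S.shape)  # -> (5, 5)
```

Row-normalizing `S` then gives, for each target token, a distribution over which context tokens its representation depends on.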
@@ -6925,15 +6921,6 @@
 }
 }
 ]
-},
-{
-"cell_type": "code",
-"source": [],
-"metadata": {
-"id": "5nuh51ZqxuQa"
-},
-"execution_count": null,
-"outputs": []
 }
 ]
}

_sources/interspeech2025/feature-importance-scoring/feature_attribution.ipynb

Lines changed: 2266 additions & 1 deletion
Large diffs are not rendered by default.
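The feature-attribution notebook's diff is not rendered, but the perturbation-based branch of feature attribution listed in the tutorial contents can be sketched in a few lines: occlude spans of the input and record the drop in the model's score. The function, the toy linear "model", and all names below are illustrative assumptions, not the notebook's code.

```python
import numpy as np

def occlusion_saliency(model, x, window=1):
    """Perturbation-based attribution: zero out sliding windows of the
    input and attribute the resulting score drop to the occluded positions."""
    base = model(x)
    sal = np.zeros_like(x)
    for start in range(0, len(x) - window + 1):
        xp = x.copy()
        xp[start:start + window] = 0.0   # occlude this span
        drop = base - model(xp)          # how much the score falls
        sal[start:start + window] += drop / window
    return sal

# Toy "model": a weighted sum, so saliency should recover the weights.
w = np.array([0.0, 2.0, 0.0, -1.0, 0.0])
model = lambda x: float(x @ w)
x = np.ones(5)
print(occlusion_saliency(model, x, window=1))  # -> [ 0.  2.  0. -1.  0.]
```

For speech inputs the same idea applies with windows over waveform samples or spectrogram frames, and the occluded span is typically replaced by silence or noise rather than zeros.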
