<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <!-- Meta tags for social media banners; these should be filled in appropriately, as they are your "business card" -->
  <!-- Replace the content tag with appropriate information -->
  <meta name="description" content="Image Reconstruction as a Tool for Feature Analysis">
  <meta property="og:title" content="Image Reconstruction as a Tool for Feature Analysis" />
  <meta property="og:description" content="A novel approach for interpreting vision features via image reconstruction" />
  <meta property="og:url" content="https://fusionbrainlab.github.io/feature_analysis" />
  <!-- Path to the banner image; it should be at the path listed below. Optimal dimensions are 1200×630 -->
  <meta property="og:image" content="static/images/v1_vs_v2.png" />
  <meta property="og:image:width" content="1200" />
  <meta property="og:image:height" content="630" />


  <meta name="twitter:title" content="Image Reconstruction as a Tool for Feature Analysis">

  <title>Image Reconstruction as a Tool for Feature Analysis</title>
  <link rel="icon" type="image/x-icon" href="static/images/favicon.ico">
  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="static/css/bulma.min.css">
  <link rel="stylesheet" href="static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script src="static/js/bulma-slider.min.js"></script>
  <script src="static/js/index.js"></script>
</head>

<body>

          <h1 class="title is-1 publication-title">Image Reconstruction as a Tool for Feature Analysis</h1>
          </span>
        </div>

        <div class="column has-text-centered">
          <div class="publication-links">

            <!-- Github link -->
            <span class="link-block">
              <a href="https://github.com/FusionBrainLab/feature_analysis" target="_blank"
                class="external-link button is-normal is-rounded is-dark">
                <span class="icon">
                  <i class="fab fa-github"></i>
                </span>
                <span>Code</span>
              </a>
            </span>

            <!-- ArXiv abstract Link -->
            <span class="link-block">
              <a href="https://arxiv.org/abs/<ARXIV PAPER ID>" target="_blank"
                class="external-link button is-normal is-rounded is-dark">
                <span class="icon">
                  <i class="ai ai-arxiv"></i>
                </span>
                <span>arXiv</span>
              </a>
            </span>
          </div>
        </div>
      </div>
    </div>
  </section>


  <!-- Paper abstract -->
  <section class="section hero is-light">
    <div class="container is-max-desktop">
      <div class="columns is-centered has-text-centered">
        <div class="column is-four-fifths">
          <h2 class="title is-3">Abstract</h2>
          <div class="content has-text-justified">
            <p>
              Vision encoders are increasingly used in modern applications, from vision-only models to multimodal
              systems such as vision-language models. Despite their remarkable success, it remains unclear how these
              architectures represent features internally. Here, we propose a novel approach for interpreting vision
              features via image reconstruction. We compare two related model families, SigLIP and SigLIP2, which
              differ only in their training objective, and show that encoders pre-trained on image-based tasks retain
              significantly more image information than those trained on non-image tasks such as contrastive learning.
              We further apply our method to a range of vision encoders, ranking them by the informativeness of their
              feature representations. Finally, we demonstrate that manipulating the feature space yields predictable
              changes in reconstructed images, revealing that orthogonal rotations, rather than spatial
              transformations, control color encoding. Our approach can be applied to any vision encoder, shedding
              light on the inner structure of its feature space.
            </p>
          </div>
        </div>
      </div>
    </div>
  </section>
  <!-- End paper abstract -->
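
  <!--
    Illustrative note (kept as a comment so it does not render): the abstract above describes
    decoding images from frozen encoder features and probing the feature space with orthogonal
    rotations. Below is a minimal Python sketch of that probing step, assuming hypothetical
    encode()/decode() callables and a feature dimension d; these names are placeholders,
    not the paper's released API (see the linked repository for the actual code).

      import numpy as np

      rng = np.random.default_rng(0)
      d = 768                               # assumed feature dimension
      features = rng.standard_normal(d)     # stand-in for an encoder's output vector

      # Sample a random orthogonal matrix Q via QR decomposition of a Gaussian matrix.
      Q, R = np.linalg.qr(rng.standard_normal((d, d)))
      Q = Q * np.sign(np.diag(R))           # sign fix so Q is drawn uniformly (Haar measure)

      rotated = Q @ features                # orthogonal rotation in feature space
      # Comparing decode(rotated) with decode(features) shows which image
      # attributes (e.g., color) the rotation controls.
  -->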


  <!-- Image carousel -->
  <section class="hero is-small">
    <div class="hero-body">
      <div class="container">
        <div id="results-carousel" class="carousel results-carousel">
          <div class="item">
            <!-- Your image here -->
            <img src="static/images/v1_vs_v2.png" alt="Comparison of SigLIP and SigLIP2 reconstructions" />
            <h2 class="subtitle has-text-centered">
              Comparison of image reconstructions from SigLIP and SigLIP2 feature spaces.
            </h2>
          </div>
          <div class="item">
            <!-- Your image here -->
            <img src="static/images/rb_swap.png" alt="Red-blue channel swap visualization" />
            <h2 class="subtitle has-text-centered">
              Visualization of feature-space manipulation through a red-blue channel swap.
            </h2>
          </div>
        </div>
      </div>
    </div>
  </section>
  <!-- End image carousel -->


  <!-- BibTeX citation -->
  <section class="section" id="BibTeX">
    <div class="container is-max-desktop content">
      <h2 class="title">BibTeX</h2>
      year={2024}
}</code></pre>
  </div>
  </section>
  <!-- End BibTeX citation -->


  <footer class="footer">
    <div class="container">
      <div class="columns is-centered">
        <div class="column is-8">
          <div class="content">

            <p>
              This page was built using the <a href="https://github.com/eliahuhorwitz/Academic-project-page-template"
                target="_blank">Academic Project Page Template</a>, which was adapted from the <a
                href="https://nerfies.github.io" target="_blank">Nerfies</a> project page.
              You are free to borrow the source code of this website; we just ask that you link back to this page in
              the footer. <br> This website is licensed under a <a rel="license"
                href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative
                Commons Attribution-ShareAlike 4.0 International License</a>.
            </p>

          </div>
        </div>
      </div>
    </div>
  </footer>

  <!-- Statcounter tracking code -->

  <!-- You can add a tracker to track page visits by creating an account at statcounter.com -->
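
  <!--
    A hedged example of where the tracker would go: the snippet below mirrors Statcounter's
    standard install code, but every value here is a placeholder, not a real account. Replace
    sc_project and sc_security with the values from your own statcounter.com account and
    remove the surrounding comment markers to enable it.

      <script type="text/javascript">
        var sc_project = 0;          // placeholder: your Statcounter project ID
        var sc_invisible = 1;        // hide the visible counter badge
        var sc_security = "REPLACE"; // placeholder: your security code
      </script>
      <script type="text/javascript" src="https://www.statcounter.com/counter/counter.js" async></script>
  -->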

  <!-- End of Statcounter Code -->
</body>

</html>