Decoder residual setting seems incorrect. #1

Hello! Nice, simple version of data2vec2. Thank you for sharing! I realized the decoder network is not applying the residual connections correctly. Because each layer (conv, LayerNorm, GELU) is appended to the list individually, the residual connection is also added after every LayerNorm and every GELU. In the paper, each "stage" of the decoder's convolutional network consists of input -> conv -> LayerNorm -> GELU, after which the residual (the input at the beginning of the stage) is added. I believe the code that creates the `self.convs` variable should look something like this:


```python
# Inside the decoder's __init__ (requires `import torch.nn as nn`).
# Create a list of stages; each stage is conv -> LayerNorm -> GELU.
self.convs = nn.ModuleList()

# First stage: project to the decoder dimension (b x embed_dim x h x w -> b x decoder_dim x h x w).
self.convs.append(
    nn.Sequential(
        nn.Conv2d(embed_dim, decoder_dim, kernel_size=kernel_size, padding=padding, groups=groups),
        nn.LayerNorm((decoder_dim, self.h, self.w)),
        nn.GELU(),
    )
)

# Remaining stages keep decoder_dim channels.
for _ in range(depth - 1):
    self.convs.append(
        nn.Sequential(
            nn.Conv2d(decoder_dim, decoder_dim, kernel_size=kernel_size, padding=padding, groups=groups),
            nn.LayerNorm((decoder_dim, self.h, self.w)),
            nn.GELU(),
        )
    )
```
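Written out per stage, with $x_\ell$ the input to stage $\ell$, the proposed layout applies a single residual around the whole conv/LayerNorm/GELU block. For comparison, the second line shows what happens when the three layers are appended individually and the forward pass adds the running input after every list element (an assumption about the current forward pass, based on the description above). In both cases a residual can only be added where the input and output shapes match:

```latex
\begin{aligned}
\text{stage-wise (paper):}\quad & x_{\ell+1} = x_\ell + \mathrm{GELU}\big(\mathrm{LN}(\mathrm{Conv}_\ell(x_\ell))\big) \\
\text{layer-wise:}\quad & a_\ell = x_\ell + \mathrm{Conv}_\ell(x_\ell), \quad b_\ell = a_\ell + \mathrm{LN}(a_\ell), \quad x_{\ell+1} = b_\ell + \mathrm{GELU}(b_\ell)
\end{aligned}
```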

The same forward pass works unchanged with this configuration; the residual is now added after each stage instead of after each individual layer.
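The forward pass itself isn't shown in this issue, so here is a minimal, self-contained sketch of a decoder that adds the residual once per stage. Assumptions: the class name `ConvDecoder` and its constructor arguments are hypothetical, the input is a `(B, embed_dim, H, W)` feature map, `padding` is derived as `kernel_size // 2`, and the residual is skipped whenever a stage changes the tensor shape (as the first stage does when `embed_dim != decoder_dim`).

```python
import torch
import torch.nn as nn


class ConvDecoder(nn.Module):
    """Hypothetical decoder: `depth` stages of conv -> LayerNorm -> GELU, residual after each stage."""

    def __init__(self, embed_dim, decoder_dim, h, w, depth=3, kernel_size=3, groups=1):
        super().__init__()
        padding = kernel_size // 2  # assumption: "same" padding for odd kernel sizes
        self.h, self.w = h, w
        self.convs = nn.ModuleList()
        for i in range(depth):
            in_dim = embed_dim if i == 0 else decoder_dim
            self.convs.append(
                nn.Sequential(
                    nn.Conv2d(in_dim, decoder_dim, kernel_size=kernel_size,
                              padding=padding, groups=groups),
                    nn.LayerNorm((decoder_dim, self.h, self.w)),
                    nn.GELU(),
                )
            )

    def forward(self, x):
        # x: (B, embed_dim, H, W)
        residual = x
        for stage in self.convs:
            x = stage(x)  # conv -> LayerNorm -> GELU as one unit
            # Add the stage input back only when shapes match; the first stage
            # changes the channel count when embed_dim != decoder_dim.
            if residual.shape == x.shape:
                x = x + residual
            residual = x
        return x


decoder = ConvDecoder(embed_dim=8, decoder_dim=16, h=4, w=4, depth=3)
out = decoder(torch.randn(2, 8, 4, 4))  # shape (2, 16, 4, 4)
```

A quick sanity check for where the skip connection sits: with `embed_dim == decoder_dim` and every conv weight and bias zeroed, each stage outputs zeros (LayerNorm and GELU both map zeros to zeros), so the decoder returns its input unchanged. A layer-wise residual would fail this check, because the LayerNorm and GELU of a nonzero input are generally nonzero.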
