Decoder residual setting seems incorrect. #1

Description

@jhairgallardo

Hello! Nice simple version of data2vec2. Thank you for sharing! I noticed that the decoder network does not apply the residual connections correctly. Because each layer is appended to the list individually, the residual is also added after every LayerNorm and every GELU. In the paper, each "stage" of the convolutional decoder consists of input -> conv -> layernorm -> gelu, and only then is the residual (the input at the beginning of the stage) added. I believe the code that creates the 'self.convs' variable should look something like this:

# create a list of stages
self.convs = nn.ModuleList()

# add the first stage, converting to the decoder dimension
# (b x embed_dim x h x w -> b x decoder_dim x h x w)
self.convs.append(
    nn.Sequential(
        nn.Conv2d(embed_dim, decoder_dim, kernel_size=kernel_size, padding=padding, groups=groups),
        nn.LayerNorm((decoder_dim, self.h, self.w)),
        nn.GELU(),
    )
)

# add the remaining stages
for i in range(depth - 1):
    self.convs.append(
        nn.Sequential(
            nn.Conv2d(decoder_dim, decoder_dim, kernel_size=kernel_size, padding=padding, groups=groups),
            nn.LayerNorm((decoder_dim, self.h, self.w)),
            nn.GELU(),
        )
    )

Your same forward pass works for this configuration. Now the residual is added after each stage instead of after each single layer.
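
For reference, here is a minimal self-contained sketch of how the stage-wise residual could behave end to end. The class name `StageDecoder`, the default sizes, and the forward pass are my own assumptions for illustration, not the repository's actual code. In particular, I assume the residual is skipped whenever the stage changes the tensor shape (as the first stage does when embed_dim != decoder_dim).

```python
import torch
import torch.nn as nn


class StageDecoder(nn.Module):
    """Hypothetical decoder where each stage is conv -> layernorm -> GELU."""

    def __init__(self, embed_dim=16, decoder_dim=8, depth=3, h=4, w=4,
                 kernel_size=3, padding=1, groups=1):
        super().__init__()
        # Input channels per stage: embed_dim for the first, decoder_dim afterwards.
        in_dims = [embed_dim] + [decoder_dim] * (depth - 1)
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_dim, decoder_dim, kernel_size=kernel_size,
                          padding=padding, groups=groups),
                nn.LayerNorm((decoder_dim, h, w)),
                nn.GELU(),
            )
            for in_dim in in_dims
        ])

    def forward(self, x):
        for stage in self.convs:
            residual = x          # input at the beginning of the stage
            x = stage(x)          # conv -> layernorm -> GELU
            # Assumption: add the residual only when shapes match.
            if x.shape == residual.shape:
                x = x + residual
        return x


decoder = StageDecoder(embed_dim=16, decoder_dim=8, depth=3, h=4, w=4)
out = decoder(torch.randn(2, 16, 4, 4))
print(out.shape)  # torch.Size([2, 8, 4, 4])
```

With this layout the residual is added once per stage, so the LayerNorm and GELU outputs are never summed with their own inputs. If your forward pass already guards against shape mismatches, it should work unchanged.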
