Dataset for tutorial missing #1497

@atmelino

The following tutorial and guide use a dataset that is missing:

Neural machine translation with a Transformer and Keras
https://www.tensorflow.org/text/tutorials/transformer

Subword tokenizers
https://www.tensorflow.org/text/guide/subwords_tokenizer

Both load the dataset with the following line:
examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True, as_supervised=True)
This call attempts to download a file from the following URL, which is a broken link:
http://www.phontron.com/data/qi18naacl-dataset.tar.gz

The following page also refers to the file qi18naacl-dataset.tar.gz:
https://github.com/neulab/word-embeddings-for-nmt
Its download link leads to the same broken URL.
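
To confirm the broken link independently of tfds, a quick reachability check may help. This is a minimal sketch using only the Python standard library. The helper name `url_status` and its `opener` parameter are hypothetical names introduced here for illustration and are not part of TensorFlow Datasets. It also assumes the server answers HEAD requests.

```python
import urllib.error
import urllib.request


def url_status(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code for `url`, or a short error string.

    `opener` is a hypothetical injection point so the check can be
    exercised without network access; it defaults to urlopen.
    """
    # A HEAD request avoids downloading the whole archive if the link works.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout, and similar problems.
        return f"unreachable: {err}"
```

Calling `url_status("http://www.phontron.com/data/qi18naacl-dataset.tar.gz")` returns an integer status code when the server responds, and a string starting with `unreachable:` when the host cannot be contacted at all.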
