🚀 Feature
Add a TokensLoaderWithMeta class that stores additional parallel metadata alongside the tokens. It can be used to store data with a bit more structure than a flat sequence, like image tokens. Here's an example:
```json
{
  "token": [1, 2, 3, 4, 5],
  "token_x": [0, 0, 0, 1, 1],
  "token_y": [0, 1, 2, 0, 1]
}
```
Notice they all have the same length.
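To illustrate the idea, here is a minimal sketch of how samples with parallel metadata fields could be concatenated and packed into fixed-length blocks while keeping every field aligned with the token stream. The function name `pack_with_meta` and its signature are hypothetical, not part of the litdata API:

```python
def pack_with_meta(samples, block_size):
    """Concatenate samples into blocks of `block_size` tokens,
    keeping every metadata field aligned with the token stream.

    Hypothetical illustration only -- not the actual litdata API.
    """
    keys = list(samples[0].keys())
    # Every parallel field within a sample must share one length.
    for s in samples:
        lengths = {len(s[k]) for k in keys}
        assert len(lengths) == 1, "all fields must have the same length"
    # Flatten each field across samples, then slice into blocks.
    flat = {k: [v for s in samples for v in s[k]] for k in keys}
    n_blocks = len(flat["token"]) // block_size
    return [
        {k: flat[k][i * block_size:(i + 1) * block_size] for k in keys}
        for i in range(n_blocks)
    ]

samples = [
    {"token": [1, 2, 3, 4, 5], "token_x": [0, 0, 0, 1, 1], "token_y": [0, 1, 2, 0, 1]},
    {"token": [6, 7, 8], "token_x": [0, 0, 1], "token_y": [0, 1, 0]},
]
blocks = pack_with_meta(samples, block_size=4)
# blocks[0]["token"] -> [1, 2, 3, 4]; blocks[0]["token_x"] -> [0, 0, 0, 1]
```

The point is that packing slices all parallel fields with the same indices, so the per-token metadata stays in lockstep with the tokens across sample boundaries.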
Motivation
I've been using TokensLoader to train models and find it to be really handy. But it's unfortunately a bit difficult to use when I want to experiment with different positional encoding schemes.
Alternatives
The alternative is to create a normal LitDataset with these fields. But that is less efficient to store and load, and it makes packing samples harder.