Retrieval-Enhanced Transformer

Problems to solve: scale down the model size while maintaining performance, and incorporate external memory retrieval into large language modeling. How? Data construction. Training & evaluation set: \(\text{MassiveText}\) serves as both the training and the retrieval data (it contains 5 trillion tokens), tokenized with SentencePiece using a vocabulary of \(128K\) tokens. During training, we retrieve from \(600B\) tokens of the training data. At evaluation, the retrieval database contains \(1.75T\) tokens. Test set leakage: because the retrieval database is so large, the test set may have appeared in the training set....

June 19, 2022 · 718 words