Large Language Models (LLMs) are best known for predicting the next token (a word or a piece of a word) in a sequence, but a recent research paper from DeepMind, Google’s AI subsidiary, argues that LLMs can also be seen as strong data compressors. The authors make the case for viewing prediction through the lens of compression: a model that predicts data well can be used to compress it, and LLMs turn out to compress information as well as or better than traditional compression algorithms. In their experiments, the DeepMind researchers used the LLMs’ predicted probabilities to drive arithmetic coding, a lossless compression algorithm, and found that the models achieved strong compression rates on text, image, and audio data, in some cases outperforming compressors designed specifically for those data types. However, LLMs are far larger and slower than classical compression algorithms, which makes them impractical as general-purpose data compressors in their current state.
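To make the link between prediction and compression concrete, here is a minimal Python sketch of arithmetic coding driven by a predictive model. The `predict` function is a hypothetical stand-in for an LLM (an adaptive byte-frequency model with add-one smoothing), not anything taken from the paper; any model that returns a probability for every possible next symbol could be plugged in instead. The better the predictions, the narrower the final interval and the fewer bits are needed to pin it down. Exact `Fraction` arithmetic keeps the sketch short and correct at the cost of speed; practical coders use fixed-precision integer arithmetic.

```python
from fractions import Fraction
from math import ceil, log2

ALPHABET = 256  # byte-level symbols


def predict(history: bytes) -> list[Fraction]:
    """Hypothetical stand-in for an LLM: an adaptive byte-frequency model with
    add-one smoothing. Returns P(next byte = b | history) for every b."""
    counts = [1] * ALPHABET
    for b in history:
        counts[b] += 1
    total = sum(counts)
    return [Fraction(c, total) for c in counts]


def encode(data: bytes) -> str:
    """Arithmetic-code `data` into a bit string using the model's predictions."""
    low, width = Fraction(0), Fraction(1)
    for i, symbol in enumerate(data):
        probs = predict(data[:i])
        # Shrink [low, low + width) to the sub-interval assigned to `symbol`.
        low += width * sum(probs[:symbol], Fraction(0))
        width *= probs[symbol]
    # Emit the shortest k-bit string m whose interval [m/2^k, (m+1)/2^k)
    # lies entirely inside the final [low, low + width).
    k = 0
    while True:
        m = ceil(low * 2**k)
        if Fraction(m + 1, 2**k) <= low + width:
            return format(m, "b").zfill(k) if k else ""
        k += 1


def decode(bits: str, n: int) -> bytes:
    """Recover `n` bytes by replaying the same predictions as the encoder."""
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    out = bytearray()
    low, width = Fraction(0), Fraction(1)
    for _ in range(n):
        probs = predict(bytes(out))
        cum = Fraction(0)
        for symbol, p in enumerate(probs):
            # Pick the symbol whose sub-interval contains `value`.
            if value < low + width * (cum + p):
                break
            cum += p
        out.append(symbol)
        low, width = low + width * cum, width * p
    return bytes(out)


def ideal_bits(data: bytes) -> float:
    """Information content -sum(log2 p) of `data` under the model."""
    return sum(-log2(predict(data[:i])[b]) for i, b in enumerate(data))


msg = b"abracadabra abracadabra abracadabra"
bits = encode(msg)
assert decode(bits, len(msg)) == msg
print(f"{len(bits)} bits (ideal {ideal_bits(msg):.1f}) vs {8 * len(msg)} raw bits")
```

The output length stays within about two bits of the model’s information content, `ideal_bits`, which is why a sharper predictor translates directly into a better compression rate.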
A significant finding from the research is that compression performance depends on both the scale of the model and the size of the dataset. When the size of the model itself is counted as part of the compressed output, larger models achieve better compression rates on large datasets but fall behind smaller models on smaller datasets. This challenges the widely held belief in the field that bigger LLMs are always better. The researchers suggest that compression can serve as a metric for choosing an appropriate model size for a given dataset, since it provides a quantitative measure of how well a model has learned the information in its data. These insights into the effect of scale could shape how LLMs are evaluated in the future.
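The scale result hinges on counting the model’s own size. The minimal Python sketch below assumes that the model’s weights must be stored alongside the compressed data and that each parameter takes 2 bytes (16-bit weights); under those assumptions it shows how the ranking of a small and a large model can flip with dataset size. The model names, parameter counts, and compression rates are made up for illustration and are not results from the paper.

```python
def adjusted_compression_rate(compressed_bytes: float, raw_bytes: float,
                              num_params: float, bytes_per_param: float = 2.0) -> float:
    """(compressed data + model weights) / raw data.

    `bytes_per_param=2.0` assumes 16-bit weights. A value above 1.0 means that
    storing the model with the data costs more than storing the raw data."""
    return (compressed_bytes + num_params * bytes_per_param) / raw_bytes


# Hypothetical models, NOT figures from the paper: (name, parameter count,
# compression rate on the data when the weights are not counted).
MODELS = [("small", 1e6, 0.30), ("large", 1e9, 0.10)]


def best_model(raw_bytes: float) -> str:
    """Return the model with the lowest adjusted rate on a dataset of this size."""
    rates = {name: adjusted_compression_rate(rate * raw_bytes, raw_bytes, params)
             for name, params, rate in MODELS}
    return min(rates, key=rates.get)


for size in (1e9, 1e12):  # hypothetical 1 GB and 1 TB datasets
    print(f"{size:.0e} bytes -> {best_model(size)}")
```

On the small dataset the large model’s 2 GB of weights outweigh everything it saves, so the small model wins; on the large dataset the fixed cost of the weights is amortized and the large model’s better predictions dominate.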
This research also addresses test set contamination, which occurs when a model is evaluated on data that already appeared in its training set, inflating its scores. The researchers propose evaluating models with compression-based approaches that account for the model’s own complexity, following the Minimum Description Length (MDL) principle, as a way to avoid test set contamination. Because MDL charges for describing the model as well as the data, it penalizes models that simply memorize their training data, providing an evaluation framework that goes beyond traditional benchmark tests. Although LLMs remain impractical as data compressors, this fresh perspective on their capabilities opens up new possibilities for how they are developed and evaluated.
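The MDL idea can be illustrated with a toy two-part code: the total description length is the bits needed to describe the model plus the bits needed to describe the data given the model. The sketch below is a textbook example, not the paper’s method: it fits order-k Markov models to a string and charges each free parameter the common asymptotic MDL cost of (1/2)·log2(n) bits, a crude approximation that is only sensible when n is not tiny. Higher orders can drive the data cost toward zero by effectively memorizing the string, but the model cost grows with every new context.

```python
from collections import Counter, defaultdict
from math import log2


def two_part_code(text: str, k: int) -> tuple[float, float]:
    """Return (model_bits, data_bits) for an order-k Markov model of `text`,
    fit by maximum likelihood. Requires 0 <= k < len(text)."""
    alphabet_size = len(set(text))
    contexts = defaultdict(Counter)
    for i in range(k, len(text)):
        contexts[text[i - k:i]][text[i]] += 1

    # Data cost: the first k symbols are sent uniformly, the rest with the
    # model's maximum-likelihood conditional probabilities.
    data_bits = k * log2(alphabet_size)
    for counter in contexts.values():
        total = sum(counter.values())
        data_bits -= sum(c * log2(c / total) for c in counter.values())

    # Model cost: (alphabet_size - 1) free probabilities per observed context,
    # each charged (1/2) * log2(n) bits, the usual asymptotic MDL approximation.
    n = len(text) - k
    num_params = len(contexts) * (alphabet_size - 1)
    model_bits = 0.5 * num_params * log2(n)
    return model_bits, data_bits


def best_order(text: str, max_k: int) -> int:
    """Pick the order with the shortest total description length."""
    return min(range(max_k + 1), key=lambda k: sum(two_part_code(text, k)))


for text in ("ab" * 50, "abcdefghij"):
    for k in range(3):
        model_bits, data_bits = two_part_code(text, k)
        print(f"{text[:12]!r} k={k}: model {model_bits:6.1f} + data {data_bits:6.1f} bits")
    print("  best order:", best_order(text, max_k=5))
```

On the periodic string, order 1 captures the real structure and wins. On the short string of distinct letters, an order-1 model "memorizes" every transition, so its data cost collapses to the first symbol alone, but its model cost dwarfs that saving and the plain order-0 model has the shorter total description.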