Key technologies for fine-tuning LLMs
5 min read · Jun 20, 2023
Attention-based transformers are the basic building blocks of generative pre-training (GPT) models and are central to achieving state-of-the-art performance. Increasing the number of parameters in a language model can potentially improve performance on large datasets in several ways (a minimal sketch of the attention mechanism itself follows the list below):
- Enhanced Representation Power: More parameters allow the model to capture and represent a greater variety of complex patterns and relationships in the data. With a larger capacity, the model can learn intricate features and nuances that may be crucial for understanding the language data. This increased representation power can lead to improved performance on tasks such as language generation, comprehension, translation, or sentiment analysis.
- Better Generalization: While it might seem counterintuitive, increasing the number of parameters can sometimes improve generalization performance, especially on large datasets. With a larger model capacity, the language model can capture a wider range of patterns and generalize from them, effectively learning the underlying structure of the data. This ability to generalize well enables the model to perform better on new, unseen examples that share patterns with the training data.
- Reduced Bias: Larger language models tend to have more parameters, which means they can learn more diverse and complex representations of the data. This increased diversity can help mitigate biases present in the training data. By having more parameters available to learn from a variety of examples, the model can better represent different perspectives, reduce over-reliance on specific biases, and make predictions that are fairer and more inclusive.
- Improved Parameter Estimation: When combined with big data, a higher number of parameters can leverage the extensive dataset to improve parameter estimation, leading to more reliable and accurate model representations.
- Support for Higher Model Complexity: Big data supports higher model complexity by providing ample examples for the increased number of parameters, allowing the model to learn more nuanced representations and capture complex language patterns effectively. It also improves robustness to outliers and rare events: with abundant data, the model can focus on essential features and disregard noise, leading to more reliable and accurate representations and, in turn, better performance.
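For concreteness, here is a minimal sketch of the scaled dot-product self-attention that transformer blocks are built from. It is a single-head toy example with made-up sizes (`seq_len`, `d_model`, `d_head` are arbitrary), not the implementation of any particular GPT model; it only illustrates how each token forms a weighted mixture of the values of itself and earlier tokens under a causal mask.

```python
# Minimal single-head causal self-attention sketch (illustrative sizes only).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_q/w_k/w_v: (d_model, d_head)."""
    q = x @ w_q                        # queries
    k = x @ w_k                        # keys
    v = x @ w_v                        # values
    d_head = q.size(-1)
    scores = q @ k.T / d_head ** 0.5   # scaled dot-product similarities
    # Causal mask: each token attends only to itself and earlier tokens,
    # as in autoregressive (GPT-style) pretraining.
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v                 # attention-weighted mixture of values

# Toy usage with arbitrary sizes.
seq_len, d_model, d_head = 8, 16, 16
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([8, 16])
```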
As a result, the recent trend has been to train larger and larger models on ever-larger datasets.
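To make that scale trend concrete, a rough back-of-the-envelope calculation shows how quickly parameter counts grow with depth and width. The sketch below uses the common approximation that a decoder-only transformer has roughly 12 · n_layers · d_model² weights in its blocks plus a token-embedding matrix; the GPT-2 small and GPT-3 configurations used here are the publicly reported ones, and the result is an estimate, not an exact count.

```python
# Back-of-the-envelope parameter count for a decoder-only transformer.
def approx_params(n_layers, d_model, vocab_size):
    blocks = 12 * n_layers * d_model ** 2   # attention + MLP weights per layer, summed
    embeddings = vocab_size * d_model       # token embedding matrix
    return blocks + embeddings

print(f"GPT-2 small: ~{approx_params(12, 768, 50257) / 1e6:.0f}M parameters")
print(f"GPT-3:       ~{approx_params(96, 12288, 50257) / 1e9:.0f}B parameters")
# Prints roughly 124M and 175B, close to the published model sizes.
```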