09.12.2024
Law, Higher education and research, Innovation, Scientific and technical resources
Pleias 1.0 models | They Said It Couldn’t Be Done
« Training large language models required copyrighted data until it did not. Today we release the Pleias 1.0 models, a family of fully open small language models. Pleias 1.0 includes three base models: 350M, 1.2B, and 3B parameters. It also features two specialized models for knowledge retrieval, Pleias-Pico (350M parameters) and Pleias-Nano (1.2B parameters), with unprecedented performance for their size on multilingual Retrieval-Augmented Generation.
These are the first models ever trained exclusively on open data, meaning data that is either non-copyrighted or published under a permissive license. They are also the first models fully compliant with the EU AI Act. In fact, Pleias sets a new standard for safety and openness. (…) »
source > huggingface.co/blog, Pierre-Carl Langlais, Anastasia Stasenko, Catherine Arnett, 5 December 2024