F2LLM-v2 introduces a new family of general-purpose multilingual embedding models, ranging from 80M to 14B parameters. These models support over 200 languages, with a significant emphasis on improving performance for previously underserved mid- and low-resource languages. Trained with a two-stage LLM-based pipeline that incorporates matryoshka representation learning and knowledge distillation, F2LLM-v2 aims to deliver inclusive, performant, and efficient embeddings across languages.
Impact: 8/10
Read Article →
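The matryoshka training mentioned in the summary means leading prefixes of an embedding are themselves usable embeddings. A minimal sketch of how a consumer would exploit that at inference time, assuming a hypothetical full dimension of 1024 and using a random vector in place of real model output:

```python
import numpy as np

# Matryoshka-style embeddings: the model is trained so that leading
# prefixes of the full vector remain useful embeddings on their own.
# FULL_DIM is a hypothetical size chosen for illustration.
FULL_DIM = 1024

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and L2-renormalize,
    so cosine similarity still works on the shortened vector."""
    prefix = vec[:dim]
    norm = np.linalg.norm(prefix)
    return prefix / norm if norm > 0 else prefix

# A random unit vector standing in for an actual model embedding.
rng = np.random.default_rng(0)
full = truncate_embedding(rng.standard_normal(FULL_DIM), FULL_DIM)

# Shrink storage/compute 8x by keeping only the first 128 dims.
small = truncate_embedding(full, 128)  # shape (128,), unit norm
```

This lets one index store embeddings at several sizes from a single forward pass, trading a little retrieval quality for memory and speed.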