X-ELM: Cross-Lingual Expert Language Models

This page hosts the intermediate pretraining checkpoints from the paper Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models.

Each model below is a metaseq checkpoint (3.3 GB each) from the higher compute budget experimental setting (40k updates and 21B training tokens). Models that share a file prefix but have different cluster IDs are different experts within the same experimental setting. We also provide a Python script to convert these checkpoints into the Hugging Face format. If you use these resources in your own work, please cite the corresponding paper.
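
Once a checkpoint has been converted with the provided script, the expert should load like any other Hugging Face causal language model. The snippet below is a minimal sketch, assuming the converted checkpoint was written to a local directory; the path `converted/xelm-expert` is hypothetical, and the exact model class depends on what the conversion script emits.

```python
# Minimal usage sketch: load a converted X-ELM expert with Hugging Face
# transformers. The path below is hypothetical -- point it at the output
# directory produced by the conversion script.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "converted/xelm-expert"  # hypothetical output directory

tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)

# Generate a short continuation to sanity-check the converted weights.
inputs = tokenizer("The curse of multilinguality", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```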

- Typological Clustering k=8 Models
- TF-IDF k=8 Models
- Dense k=1 Model

Contact

For any questions or comments about the checkpoints or other aspects of this project, please contact Terra Blevins at blvns@cs.washington.edu.