Materials collection of Qi language of ancient and modern Chinese

Last updated on Apr 17, 2019 · NLP

We create a new large-scale Ancient-Modern Chinese parallel corpus containing 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality Ancient-Modern Chinese dataset. It includes 984,611 pairs in the training set, 48,980 pairs in the validation set, and 50,000 pairs in the test set.

Academic Open Source

Jiancheng Lv

Dean and Professor of Computer Science at Sichuan University
