
Sudachi Language Resources

We, the Works Applications Tokushima Laboratory of AI and NLP, release language resources for natural language processing. The following resources are currently available.

SudachiDict is a lexicon for use with the Japanese morphological analyzer Sudachi.

chiVe is a set of Japanese pretrained word embeddings (word vectors), trained on the ultra-large-scale web corpus NWJC from the National Institute for Japanese Language and Linguistics, as analyzed by Sudachi.

chiTra is a library for using large-scale pre-trained language models with the Japanese tokenizer SudachiPy.

License

Apache-2.0

Download

SudachiDict
SudachiDict Synonym
chiVe
chiTra

Documentation

https://worksapplications.github.io/Sudachi/

Contact

sudachi@worksap.co.jp