We, the Works Applications Tokushima Laboratory of AI and NLP, release language resources for natural language processing. The following resources are currently available.
SudachiDict is a lexicon for use with the Japanese morphological analyzer Sudachi.
chiVe is a set of pretrained Japanese word embeddings (word vectors), trained on the ultra-large-scale web corpus NWJC from the National Institute for Japanese Language and Linguistics, with the text analyzed by Sudachi.
chiTra is a library for using large-scale pretrained language models with the Japanese tokenizer SudachiPy.
Apache-2.0
https://worksapplications.github.io/Sudachi/