Nepali Spelling Correction Using Transformer Models
Introduction
Nepali, being a low-resource language, lacks robust spelling correction tools. This project leverages transformer-based sequence-to-sequence models to build an accurate Nepali spelling correction system.
Summary
This project explores the effectiveness of transfer learning for Nepali spelling correction by fine-tuning transformer-based sequence-to-sequence models. It compares three models: Varta-T5, mT5-small, and mBART. Training data is prepared by applying pseudo-random synthetic error generation to a large-scale Nepali text corpus. The results show the strengths and weaknesses of each model across domains: mT5-small achieves superior domain-specific accuracy, while mBART generalizes better.
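To make the data preparation concrete, here is a minimal sketch of pseudo-random synthetic error generation, assuming simple character-level edits (deletion, transposition, substitution, insertion) over Devanagari text. The project's actual error model, character inventory, and corruption rate are not reproduced here, so every constant below is an illustrative assumption:

```python
import random

# Illustrative Devanagari character pool for substitutions/insertions;
# the project's actual character inventory is an assumption here.
DEVANAGARI = [chr(c) for c in range(0x0915, 0x0940)]

def corrupt_word(word: str, rng: random.Random) -> str:
    """Apply one pseudo-random character-level edit to a word."""
    if len(word) < 2:
        return word
    op = rng.choice(["delete", "swap", "substitute", "insert"])
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "swap":
        i = min(i, len(word) - 2)  # keep the swapped pair in bounds
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "substitute":
        return word[:i] + rng.choice(DEVANAGARI) + word[i + 1:]
    return word[:i] + rng.choice(DEVANAGARI) + word[i:]  # insert

def make_pair(sentence: str, error_rate: float = 0.15, seed: int = 0):
    """Turn one clean corpus sentence into a (noisy, clean) training pair."""
    rng = random.Random(seed)
    noisy = [corrupt_word(w, rng) if rng.random() < error_rate else w
             for w in sentence.split()]
    return " ".join(noisy), sentence

noisy, clean = make_pair("नेपाली भाषा धेरै राम्रो छ", seed=42)
```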
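Fine-tuning on these pairs would follow the standard HuggingFace sequence-to-sequence recipe. The sketch below uses mT5-small with a toy two-column dataset and hypothetical hyperparameters; the project's actual training configuration is not shown here:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Toy placeholder data; the real project trains on a large synthetically
# corrupted Nepali corpus.
pairs = Dataset.from_dict({
    "noisy": ["नपाली भाषा"],   # corrupted input
    "clean": ["नेपाली भाषा"],  # reference correction
})

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

def preprocess(batch):
    enc = tokenizer(batch["noisy"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(text_target=batch["clean"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

tokenized = pairs.map(preprocess, batched=True,
                      remove_columns=["noisy", "clean"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="mt5-nepali-spell",  # hypothetical
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```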
Features
- Fine-tuned transformer models for Nepali spelling correction
- Synthetic dataset generation for training and evaluation
- Performance comparison of multiple transformer architectures
- Deployed via HuggingFace for easy inference (see the sketch below)
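Since the fine-tuned checkpoints are hosted on HuggingFace, inference can be as simple as a text2text-generation pipeline. The model ID below is a hypothetical placeholder for the project's published repository name:

```python
from transformers import pipeline

# "your-username/mt5-nepali-spell" is a placeholder model ID; replace it
# with the project's actual HuggingFace repository.
corrector = pipeline("text2text-generation",
                     model="your-username/mt5-nepali-spell")

# Pass a misspelled Nepali sentence and print the corrected output.
print(corrector("नपाली भाषा धरै रामरो छ", max_length=64)[0]["generated_text"])
```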