
Bailarn
Thai NLP Library & Interactive Demonstrations
About
The Bailarn project aims to develop a Thai NLP library and web interface based on state-of-the-art deep learning techniques. The library also lets users and developers train NLP models on their own datasets using the provided model architectures and utilities.
In 2017, when this project began, NLP tools were mature and widely used for major languages such as English. For Thai, a low-resource language, the available tools and community support were far more limited. This project was created to help close that gap.
The library provides pre-trained models that process Thai sentences out of the box, with no training required. Each pre-trained model was evaluated against several deep learning methods proposed in previous research.
NLP Tasks
Explore the different NLP tasks implemented in this project. Each task includes detailed explanations and interactive demonstrations at bailarn.ammarinjtk.com.
Foundation Tasks
Tokenization
Identify word boundaries and split input text into words. Thai is written without spaces between words, so this step comes before most other tasks.
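Bailarn's models are deep learning based, so the following is not its tokenizer. It is a minimal sketch of a classical dictionary-based baseline, greedy longest matching, that illustrates the problem a tokenizer has to solve. The function name `longest_matching_tokenize` and the three-word dictionary are hypothetical, chosen only for illustration.

```python
def longest_matching_tokenize(text, dictionary, max_word_len=20):
    """Greedy longest-matching word segmentation (a classical baseline, not Bailarn's model)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until a dictionary word matches.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary: "I", "eat", "rice".
words = {"ฉัน", "กิน", "ข้าว"}
print(longest_matching_tokenize("ฉันกินข้าว", words))  # → ['ฉัน', 'กิน', 'ข้าว']
```

A greedy matcher always takes the longest dictionary word at each position. That choice can go wrong on ambiguous strings and on words missing from the dictionary, which is one reason learned models are used for this task instead.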
Word Embedding
Map each word to a dense vector so that words with similar meanings have similar vectors.
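"Similar representations" is usually measured with cosine similarity between word vectors. The sketch below assumes that measure. The three-dimensional vectors are made up for illustration and are not output from any Bailarn model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (real ones have hundreds of dimensions).
embeddings = {
    "แมว": [0.9, 0.1, 0.0],   # cat
    "สุนัข": [0.8, 0.2, 0.1],  # dog
    "รถ": [0.0, 0.2, 0.9],    # car
}

print(cosine_similarity(embeddings["แมว"], embeddings["สุนัข"]))  # high: both are animals
print(cosine_similarity(embeddings["แมว"], embeddings["รถ"]))     # low: unrelated meanings
```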
Named Entity Recognition
Locate named entities in text and classify them into pre-defined categories.
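NER models commonly tag each token using the BIO scheme. `B-` marks the first token of an entity, `I-` marks a continuation, and `O` marks a token outside any entity. The tags are then grouped into entity spans. The sketch below assumes that scheme. The tag names `PER` and `LOC` are hypothetical, since the categories Bailarn uses are not listed here. Tokens are joined without spaces because Thai does not separate words with spaces.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; a stray I- tag of a different type is treated as a start.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": close any open entity.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, "".join(current_tokens)))
    return spans


# "Somchai goes to Chiang Mai", with hypothetical tags.
print(bio_to_spans(["สมชาย", "ไป", "เชียง", "ใหม่"], ["B-PER", "O", "B-LOC", "I-LOC"]))
```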
Part of Speech Tagging
Label each word with its part of speech (for example, noun or verb) based on both its definition and its context.
Application Tasks
Sentiment Analysis
Classify the sentiment of a given sentence as positive, negative, or neutral.
Multi-label Text Classification
Assign one or more pre-defined tags or categories to each sentence according to its content.
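Single-label tasks such as sentiment analysis pick exactly one class. Multi-label classification can return several labels at once. A common way to do this, assumed here, is to give each label an independent sigmoid score and keep every label at or above a threshold. The label names, scores, and the 0.5 threshold below are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a raw model score (logit) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_names, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


# Hypothetical per-label logits for one sentence.
print(predict_labels([2.0, -1.5, 0.3], ["politics", "sports", "economy"]))  # → ['politics', 'economy']
```

Because each label is scored independently, a sentence can receive zero, one, or several labels.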
Keyword Expansion
Discover related terms and phrases to improve search and content discovery.
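One common way to expand a keyword, assumed here rather than taken from Bailarn, is to return the words whose embeddings are nearest to the query's embedding. The function `expand_keyword` and the toy vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity using math.hypot for the vector norms (Python 3.8+)."""
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


def expand_keyword(query, embeddings, k=2):
    """Return the k words most cosine-similar to the query, excluding the query itself."""
    q = embeddings[query]
    candidates = ((cosine(q, vec), word) for word, vec in embeddings.items() if word != query)
    return [word for _, word in heapq.nlargest(k, candidates)]


# Hypothetical toy embeddings.
embeddings = {
    "ฟุตบอล": [0.9, 0.8, 0.0],  # football
    "กีฬา": [0.8, 0.9, 0.1],    # sport
    "ประตู": [0.7, 0.5, 0.2],   # goal
    "หุ้น": [0.0, 0.1, 0.9],     # stock (finance)
}

print(expand_keyword("ฟุตบอล", embeddings))  # → ['กีฬา', 'ประตู']
```

A search system could then match documents containing any of the expanded terms, not only the original keyword.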
Acknowledgments
This project was supported by the Department of Computer Engineering, Chulalongkorn University and Thailand's National Electronics and Computer Technology Center (NECTEC).