Bailarn

Thai NLP Library & Interactive Demonstrations

Open Source · Research · Deep Learning

Interactive Thai NLP demos — tokenization, NER, word embedding, and more

About

The Bailarn project aims to develop a Thai NLP library and web interface based on state-of-the-art deep learning techniques. With this project, users and developers can train NLP models on their own datasets using the provided model structures and utilities.

In 2017, when this project began, NLP tooling was already mature for major languages such as English. For Thai, a low-resource language, tools and community support were far more limited. This project was created to bridge that gap.

The library also provides pre-trained models that can process Thai sentences out of the box. Each pre-trained model was evaluated against and compared with various deep learning methods proposed in previous research.

Project Context

Timeline

2017 - 2018

Institution

Chulalongkorn University

Degree

Bachelor's (Computer Engineering)

Support

NECTEC

NLP Tasks

Explore the different NLP tasks implemented in this project. Each task includes detailed explanations and interactive demonstrations at bailarn.ammarinjtk.com.

Foundation Tasks

Tokenization

Identify word boundaries and split the input text into words. Thai is written without spaces between words, so this step underpins every other task.

Live ↗
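
To make the idea of word boundaries concrete, the sketch below shows one common way to frame tokenization: label every character as either starting a word or not, then cut the text at those labels. This is an illustration only and is not confirmed to be how Bailarn works. The function name `split_by_boundaries` and the hand-written labels are hypothetical; in practice a trained model would predict the labels.

```python
from typing import List


def split_by_boundaries(text: str, starts: List[int]) -> List[str]:
    """Cut `text` into words, given one label per character.

    starts[i] is 1 if character i begins a new word, otherwise 0.
    The labels are assumed to come from a model; here they are hand-written.
    """
    if len(text) != len(starts):
        raise ValueError("need exactly one label per character")

    words: List[str] = []
    current = ""
    for ch, is_start in zip(text, starts):
        # A start label closes the word collected so far.
        if is_start and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words


# "I love cats", written without spaces as is normal in Thai.
text = "ฉันรักแมว"
# Hypothetical labels: a new word starts at characters 0, 3, and 6.
starts = [1, 0, 0, 1, 0, 0, 1, 0, 0]

words = split_by_boundaries(text, starts)
print(words)
```

Framing the task per character keeps the output vocabulary tiny (two labels), which is one reason this formulation is popular for languages without whitespace between words.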

Word Embedding

Learn vector representations in which words with similar meanings have similar vectors.

Live ↗
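
As a rough illustration of what "similar representations" means, the sketch below compares word vectors with cosine similarity. The three-dimensional vectors are made up for this example and do not come from any Bailarn model; real embeddings are learned from text and usually have hundreds of dimensions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, chosen by hand for illustration only.
toy_vectors: Dict[str, List[float]] = {
    "แมว": [0.9, 0.1, 0.0],     # cat
    "สุนัข": [0.8, 0.2, 0.1],    # dog
    "รถยนต์": [0.0, 0.1, 0.9],   # car
}

sim_cat_dog = cosine_similarity(toy_vectors["แมว"], toy_vectors["สุนัข"])
sim_cat_car = cosine_similarity(toy_vectors["แมว"], toy_vectors["รถยนต์"])

# With these toy values, the two animals score much closer to each other
# than the cat does to the car.
print(f"cat vs dog: {sim_cat_dog:.3f}")
print(f"cat vs car: {sim_cat_car:.3f}")
```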

Named Entity Recognition

Locate named entities in text and classify them into pre-defined categories.

Live ↗
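
The sketch below shows how per-token predictions are typically turned into entities, assuming the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity). Whether Bailarn uses this exact scheme is an assumption, and the function name, tokens, tags, and category names are hypothetical examples rather than model output.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Thai words are joined without spaces.
        if current and current_type is not None:
            entities.append(("".join(current), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" tags and stray "I-" tags end the current entity.
            flush()
            current, current_type = [], None
    flush()
    return entities


# "Somchai works at Chulalongkorn University" (hand-tagged example).
tokens = ["สมชาย", "ทำงาน", "ที่", "จุฬาลงกรณ์", "มหาวิทยาลัย"]
tags = ["B-PER", "O", "O", "B-ORG", "I-ORG"]

entities = extract_entities(tokens, tags)
print(entities)
```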

Part of Speech Tagging

Label each word with its part of speech, based on its definition and context.

Pending

Application Tasks

Sentiment Analysis

Classify the sentiment (positive, negative, or neutral) of a given sentence.

Pending

Multi-label Text Classification

Assign one or more pre-defined tags or categories to each sentence according to its content.

Pending
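
Multi-label classification differs from sentiment analysis in that a sentence can receive several tags at once. A common way to achieve this, sketched below, is to score each label independently with a sigmoid and keep every label above a threshold. This is a generic illustration and not a description of the Bailarn model; the label names, scores, and the `predict_labels` helper are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose independent probability reaches the threshold."""
    return [label for label, logit in logits.items() if sigmoid(logit) >= threshold]


# Hypothetical raw scores for one sentence, one score per label.
logits = {"sports": 2.0, "politics": -1.5, "technology": 0.3}

labels = predict_labels(logits)
print(labels)
```

Because each label is decided on its own, raising the threshold trades recall for precision without forcing the labels to compete, as they would under a softmax.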

Keyword Expansion

Discover related terms and phrases to improve search and content discovery.

Pending

Resources

Live Demo

bailarn.ammarinjtk.com — interactive NLP demos

Bailarn Python Library

View source code and documentation

Website Source Code

React 16 + TensorFlow.js in-browser inference

Acknowledgments

This project was supported by the Department of Computer Engineering, Chulalongkorn University, and by Thailand's National Electronics and Computer Technology Center (NECTEC).