Jacky Ma

Jacky has worked in the finance industry for over 10 years, with experience including quantitative research and FinTech management. He also led a team of engineers to build a data-science-driven marketing platform that serves a number of international brands. Jacky holds a BSc and an M.Phil in Computer Science and Engineering from The Chinese University of Hong Kong. He is now moving to apply data science (and his empathy) to a more human area: matchmaking.

Chinese Natural Language Processing in Practice

Instructor: Jacky Ma

Understand machine learning techniques that are specifically applied to the Chinese language

  • Gain essential knowledge and practical skills for working with Chinese textual content.
  • Instructor: Data scientist with substantial expertise in digital marketing and fintech. Led a team of engineers that built a data-science-driven marketing platform serving a number of international brands.

Course Description

Text mining is one of the fastest-growing areas of data science, letting data scientists work with textual content. However, some common text-mining practices, such as stopword removal and stemming, are not applicable to Chinese text because of differences in language structure. At the same time, a study from Internet World Stats showed that Chinese-language Internet users accounted for 23.2% of the world's Internet users (as of December 31, 2013), making them the second-largest group of users (native English users are the largest group at 28.6%). The business world clearly has strong demand for text-mining skills for Chinese text, so it is important to provide the knowledge and tools that extend a data scientist's text-mining capability to Chinese content.
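
One structural difference is that written Chinese has no spaces between words, so splitting on whitespace, the usual first step for English, returns the whole sentence as a single token. The sketch below is a minimal illustration in Python, assuming a tiny hand-made dictionary (TOY_DICT) and a simple forward maximum matching routine (forward_max_match). Both names are hypothetical and introduced here only for illustration; this is not the algorithm used by Jieba, nor necessarily the approach taught in the course.

```python
# Hypothetical toy dictionary, for illustration only.
# Real segmenters rely on lexicons with hundreds of thousands of entries.
TOY_DICT = {"我", "喜欢", "自然", "语言", "自然语言", "处理", "自然语言处理"}


def forward_max_match(text, vocab, max_len=6):
    """Greedy left-to-right segmentation: at each position, take the
    longest dictionary word; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


english = "I like natural language processing"
chinese = "我喜欢自然语言处理"  # the same sentence in Chinese

# Whitespace tokenization works for English ...
english_tokens = english.split()
# ... but leaves the Chinese sentence as one unsegmented token.
chinese_whitespace_tokens = chinese.split()
# Dictionary-based matching recovers word boundaries.
chinese_tokens = forward_max_match(chinese, TOY_DICT)

print(english_tokens)
print(chinese_whitespace_tokens)
print(chinese_tokens)
```

Greedy matching is easy to follow but can pick the wrong boundary when words overlap, which is one reason practical segmenters add statistical models on top of a dictionary.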

What am I going to get from this course?
  • Know the basics of Chinese text structures: characters, vocabulary types, sentences
  • Understand how Chinese text is represented on computers, including encodings and conventions: Unicode, GB, HZ, Big5
  • Understand the theory behind Chinese text segmentation and apply it using the Jieba library
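
To make the encoding objective above concrete, here is a minimal sketch using only Python's built-in codecs. It assumes "gb2312" as a representative of the GB family, and the helper name encoded_sizes is hypothetical. It shows that the same two-character word takes a different number of bytes in each encoding, and that the simplified-oriented and traditional-oriented character sets do not cover each other's characters.

```python
SIMPLIFIED = "汉字"   # "Chinese characters" in Simplified Chinese
TRADITIONAL = "漢字"  # the same word in Traditional Chinese

CODECS = ["utf-8", "gb2312", "hz", "big5"]


def encoded_sizes(text, codecs):
    """Return {codec: byte length}, or None where the codec
    cannot represent the text."""
    sizes = {}
    for name in codecs:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


simplified_sizes = encoded_sizes(SIMPLIFIED, CODECS)
traditional_sizes = encoded_sizes(TRADITIONAL, CODECS)

# HZ wraps GB2312 code points in escape sequences so that the
# result stays within 7-bit ASCII.
hz_bytes = SIMPLIFIED.encode("hz")

print(simplified_sizes)
print(traditional_sizes)
print(hz_bytes)
```

In practice this means the encoding of an input file has to be known (or detected) before the bytes are decoded to Unicode strings for any further processing.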

Prerequisites and Target Audience

What will students need to know or do before starting this course?
  • Basic knowledge of Python development
  • Basic knowledge of text mining
  • Knowledge of machine learning and statistics
  • Interest in learning to apply their data science skills to Chinese text documents
Who should take this course? Who should not?
This course targets data scientists who are working on natural language processing
and would like to extend their work to textual content in Chinese. Students are assumed
to have basic knowledge of Python and text mining. Knowledge of the Chinese
language is not required, but an interest in it will make the course easier.


Module 1: Introduction to the Basic Structures of Chinese
Lecture 1 Course Overview and Objectives
Lecture 2 Chinese Grammar
Lecture 3 Traditional and Simplified Chinese
Lecture 4 Jian-Fan Conversion
Lecture 5 Chinese Vocabulary
Lecture 6 Chinese Pinyin
Lecture 7 History of Chinese Characters
Module 2: Deep Dive into Text Segmentation
Lecture 8 Chinese Text Segmentation
Lecture 9 Jieba Part-of-Speech Tagging
Lecture 10 Chinese NLP in Action
Lecture 11 Jieba Text Segmentation