Code-Mixed Speech Recognition
Project Description
Speech recognition is the ability to understand human speech and convert it to readable text. Many home assistant products, such as Amazon Alexa, Google Home, and Apple Siri, now help with daily routines. These assistants, which are built on cognitive services, are designed primarily for monolingual speech. As the world develops and the internet grows, more and more people speak more than one language, and some speak more than three. For example, in a language competition on social media in 2023, one participant used four languages within a single spoken paragraph. It is therefore necessary to study how to train a machine to recognize code-mixed speech.
Code-mixed speech can be divided into two kinds:
- intra-sentential: the speaker switches languages within a sentence.
- inter-sentential: the speaker switches languages from one sentence to the next.
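The two kinds of switching listed above can be illustrated on transcribed text. The following is a minimal sketch, not the project's method: it assumes a Chinese-English language pair, uses the writing script of each character (Han vs. Latin) as a rough stand-in for language identity, and splits sentences on punctuation only. The function names are hypothetical.

```python
import re


def script_of(ch):
    """Return 'han' or 'latin' for a character, or None for anything else.

    Assumption: script is used as a proxy for language (Chinese vs. English).
    """
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "han"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None


def scripts_in(sentence):
    """Collect the set of scripts that appear in one sentence."""
    return {s for s in map(script_of, sentence) if s is not None}


def split_sentences(text):
    """Split on Western and Chinese sentence-final punctuation."""
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def classify_switching(text):
    """Label a transcript by the kind of language switching it contains."""
    per_sentence = [scripts_in(s) for s in split_sentences(text)]
    per_sentence = [s for s in per_sentence if s]
    # A switch inside any single sentence takes priority.
    if any(len(s) > 1 for s in per_sentence):
        return "intra-sentential"
    # Otherwise, look for different scripts across sentences.
    if len(set().union(*per_sentence)) > 1:
        return "inter-sentential"
    return "monolingual"


print(classify_switching("我今天要去 meeting。"))
print(classify_switching("I am tired. 我想睡觉。"))
```

A transcript that contains both kinds of switching is labeled by its within-sentence switch here. Language pairs that share a script (for example Spanish-English) would need a real language identifier in place of `script_of`.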
Our goal for this project is to collect natural code-mixed speech data, release it as open source, and build a model that can be trained to recognize this speech.
Keywords
machine learning, automatic speech recognition, code-mixed, artificial intelligence, cognitive services
Methods
Discuss approaches and current open research problems that are part of this work. Cite third-party research as appropriate and put references to the background and related work below.
Current Team Members
- Yihong Theis - Team Leader
- Tinashe Sekabanja - Undergraduate Research Programmer
- William H. Hsu - Professor, Computer Science, Kansas State University
Affiliates
- Natasha Jacques - Ph.D. candidate, MIT
Alumni
Data Sets
We are collecting data from YouTube, WeChat channels, Facebook stories, and similar sources. Currently, all of the data are in video format, and we will need to preprocess them once collection is finished.
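One likely first preprocessing step is pulling the audio track out of each video. The sketch below is an assumption, not the project's pipeline: it assumes the `ffmpeg` command-line tool is installed, and it assumes 16 kHz mono 16-bit WAV as the target format, a common choice for speech recognition input. The function names and file paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, sample_rate=16000):
    """Build the ffmpeg argument list that extracts audio as mono WAV.

    Assumption: 16 kHz mono PCM is the desired format; adjust as needed.
    """
    video = Path(video_path)
    wav = Path(out_dir) / (video.stem + ".wav")
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", str(video),      # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to one channel
        "-ar", str(sample_rate),
        "-c:a", "pcm_s16le",   # 16-bit signed little-endian PCM
        str(wav),
    ]


def extract_audio(video_path, out_dir):
    """Run ffmpeg for one video; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir), check=True)


# Inspect the command without running ffmpeg.
print(" ".join(build_ffmpeg_command("clips/talk01.mp4", "audio")))
```

Keeping the command builder separate from the call that runs it makes the step easy to check on a machine where ffmpeg is not installed.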
Trello Board
Every KDD Lab project must have a Trello Team and Trello Board, which must be private. Link to the Trello Board for the project here.
Source Code
Every KDD Lab project must have a Bitbucket repository, which may be public or private. Link to the repository or repositories for the project here.
References
Background and Related Work
- Yihong Theis, Master's thesis: Learning to detect named entities in bilingual code-mixed open speech corpora
KDD Lab Publications
Use APA citation format and make sure citations are synchronized with the pages listing conference papers, journal articles, book chapters, posters, and student publications.
- De La Torre, M. F., Aguirre, C. A., Anshutz, B., & Hsu, W. (2018). MATESC: Metadata-Analytic Text Extractor and Section Classifier for Scientific Publications. Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018): International Conference on Knowledge Discovery and Information Retrieval (KDIR 2018), Seville, Spain, September 18-20, 2018.
- Yates, H., Chamberlain, B., Healey, J., & Hsu, W. (2018). Binary Classification of Arousal in Built Environments using Machine Learning. Working Notes of the 2nd International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Artificial Intelligence in Affective Computing, Stockholm, Sweden, July 15, 2018.
Last updated by vinnysun1 on Nov 30, 2023