Topical Knowledge Maps
Project Description
Topic modeling is a form of unsupervised machine learning used in natural language processing. It has many applications, such as text mining, social media analytics, and information retrieval and discovery, because it helps organize large amounts of unstructured text and surface the knowledge it contains.
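Latent Dirichlet allocation (LDA) is listed among the project keywords, so here is a minimal sketch of LDA trained by collapsed Gibbs sampling on a toy corpus. It is an illustration only, not the project's pipeline. The function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, and the toy documents are all assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists.
    Returns (doc_topic, topic_word): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_word[k] maps each word to P(word | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [defaultdict(int) for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional: p(z=t | rest) ∝ (n_dt + α)(n_tw + β)/(n_t + Vβ).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word


# Toy "abstracts" (hypothetical), already tokenized.
docs = [
    "neural network training gradient network".split(),
    "gradient descent neural network layers".split(),
    "soil wheat crop yield irrigation".split(),
    "crop irrigation soil moisture wheat".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

On a real collection, a maintained library implementation would be the practical choice. The sampler above only shows what such a library estimates: a topic mixture per document and a word distribution per topic.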
Our goal is to create and visualize a knowledge map of existing theses and dissertations using K-State's current ETDR collection. We will build a front-end system that automatically extracts, processes, and visualizes scientific papers, clustering them by topic. This will help students discover related papers in the same "knowledge space" and become aware of their peers' work.
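The page does not say how papers will be grouped or compared. One plausible approach assumes that each thesis already has a topic mixture from a model such as LDA. Each paper is assigned to its most probable topic, and neighbors in the "knowledge space" are ranked by Hellinger distance between mixtures. The function names, the choice of Hellinger distance, and the thesis IDs and proportions below are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))) / math.sqrt(2)


def cluster_by_topic(theta):
    """Group papers by their most probable topic."""
    clusters = {}
    for paper, dist in theta.items():
        k = max(range(len(dist)), key=dist.__getitem__)
        clusters.setdefault(k, []).append(paper)
    return clusters


def similar_papers(theta, query, n=3):
    """Return the n papers whose topic mixtures are closest to the query paper's."""
    others = [p for p in theta if p != query]
    others.sort(key=lambda p: hellinger(theta[query], theta[p]))
    return others[:n]


# Hypothetical topic mixtures for four theses over three topics.
theta = {
    "thesis-A": [0.80, 0.15, 0.05],
    "thesis-B": [0.70, 0.20, 0.10],
    "thesis-C": [0.05, 0.10, 0.85],
    "thesis-D": [0.10, 0.60, 0.30],
}
print(cluster_by_topic(theta))                 # {0: ['thesis-A', 'thesis-B'], 2: ['thesis-C'], 1: ['thesis-D']}
print(similar_papers(theta, "thesis-A", n=2))  # ['thesis-B', 'thesis-D']
```

Hellinger distance is used here because it is a bounded metric on probability distributions. Cosine similarity or Jensen-Shannon divergence would be equally reasonable choices for a knowledge-map visualization.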
We hope to build on this work in the future by implementing several variations of topic models and visualizations. These include hierarchical topic models, dynamic topic models, and a version of earlier Ph.D. work in our lab that combines a continuous-time dynamic topic model with an online hierarchical Dirichlet process (HDP) model.
Keywords
machine learning, topic modeling, information extraction, latent Dirichlet allocation
Methods
Current Project Personnel
- Wesley Baldwin
- Huichen Yang
- Timothy Tucker
- William H. Hsu, Professor, Computer Science, Kansas State University
Affiliates
- Carol Sevin, Associate Professor, KSU Libraries
Alumni
- Marissa Shivers
Data Sets
To be posted
Trello Board
Source Code
Project background and documentation (in progress)
Topic Modeling and Visualization of Scientific Theses and Dissertations
References
Background and Related Work
- De La Torre, M. F., Aguirre, C. A., Anshutz, B., & Hsu, W. (2018). MATESC: Metadata-Analytic Text Extractor and Section Classifier for Scientific Publications. Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018): International Conference on Knowledge Discovery and Information Retrieval (KDIR 2018), Seville, Spain, September 18-20, 2018.
KDD Lab Publications
Last updated by vinnysun1 on Nov 30, 2023