
Topical Knowledge Maps


Project Description

Topic modeling is a form of unsupervised machine learning used in natural language processing. It has many applications, such as text mining, social media analytics, and information retrieval and discovery. By inferring the latent themes (topics) that run through a collection of documents, it helps organize large amounts of unstructured text and discover knowledge within them.
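To make "topic model" concrete, here is a minimal sketch of latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling, using only the Python standard library. It is an illustration under stated assumptions, not this project's implementation: the helper name `lda_gibbs`, the symmetric priors `alpha=0.1` and `beta=0.01`, the iteration count, and the four-document toy corpus are all hypothetical choices made for the example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] estimates the share of
    topic k in document d, and topic_word[k][v] estimates p(vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


# Toy corpus (invented for illustration): two machine-learning and two agronomy "abstracts".
docs = [
    "neural network training gradient".split(),
    "gradient descent neural network".split(),
    "soil crop yield wheat".split(),
    "wheat crop irrigation soil".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
```

Collapsed Gibbs sampling is used here only because it fits in a few lines of standard-library code; for a corpus the size of an institutional thesis collection, an optimized library implementation would likely be the more practical choice.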

Our goal is to create and visualize a knowledge map of the existing theses and dissertations in K-State's current ETDR collection. We will build a front-end system that automatically extracts, processes, and visualizes scientific papers, clustering them by topic. This will let other students discover similar papers in the same "knowledge space" and become aware of their peers' work.
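One plausible way to find "similar papers in the same knowledge space" is to compare the per-document topic mixtures that a topic model produces. The sketch below ranks documents by Hellinger distance between those mixtures; the choice of Hellinger distance, the helper names `hellinger` and `most_similar`, and the thesis IDs and topic vectors are all hypothetical placeholders, not the project's actual method or data.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def most_similar(query, doc_topics, top_n=3):
    """Return the top_n documents whose topic mixtures are closest to doc_topics[query]."""
    target = doc_topics[query]
    scored = [
        (hellinger(target, theta), name)
        for name, theta in doc_topics.items()
        if name != query
    ]
    # Smaller distance means more similar, so sort ascending.
    return [name for _, name in sorted(scored)[:top_n]]


# Hypothetical document-topic mixtures over 3 topics for four placeholder theses.
doc_topics = {
    "thesis_a": [0.80, 0.15, 0.05],
    "thesis_b": [0.70, 0.20, 0.10],
    "thesis_c": [0.05, 0.10, 0.85],
    "thesis_d": [0.10, 0.80, 0.10],
}
print(most_similar("thesis_a", doc_topics, top_n=2))  # → ['thesis_b', 'thesis_d']
```

Hellinger distance is a common choice for comparing probability vectors because it is symmetric and bounded between 0 and 1; cosine similarity or Jensen-Shannon divergence would be reasonable alternatives.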

We hope to build on this work in the future by implementing several variations of topic models and visualizations, such as hierarchical topic models, dynamic topic models, and a version of previous Ph.D. work in our lab that combines a continuous-time dynamic topic model with an online Hierarchical Dirichlet Process model.

Keywords

machine learning, topic modeling, information extraction, latent Dirichlet allocation

Methods

Current Project Personnel

  • Wesley Baldwin
  • Huichen Yang
  • Timothy Tucker
  • William H. Hsu - Professor, Computer Science, Kansas State University

Affiliates

Alumni

  • Marissa Shivers

Data Sets

To be posted

Trello Board

Source Code

Project background and documentation (in progress)

Topic Modeling and Visualization of Scientific Theses and Dissertations

References

  • De La Torre, M. F., Aguirre, C. A., Anshutz, B., & Hsu, W. (2018). MATESC: Metadata-Analytic Text Extractor and Section Classifier for Scientific Publications. Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2018): International Conference on Knowledge Discovery and Information Retrieval (KDIR 2018), Seville, Spain, September 18-20, 2018.

KDD Lab Publications

Last updated by vinnysun1 on Nov 30, 2023