LLM Overview
LLM Setup for Inference and Finetuning
All models are downloaded from https://huggingface.co/
- We have many models saved on volare:
https://volare.kdd.cs.ksu.edu/llm/
- Feel free to upload more models to volare
- We have 5 stock Dockerfiles for running these models, located here:
https://volare.kdd.cs.ksu.edu/dockerfiles/dockerfilesKDD/llm/
- Models can be run anywhere, but they perform best when they fit in GPU memory
- llama 2: Meta's flagship LLM
- gpt-j: developed by EleutherAI, designed to provide a powerful open-source alternative for natural language processing tasks
- falcon: TII's flagship series of large language models
- flant5: a variant of the T5 (Text-to-Text Transfer Transformer) language model, which was developed by Google
- pythia: a model suite developed by EleutherAI, designed to promote scientific research on large language models
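Pulling one of these models from the Hub can be sketched with `huggingface_hub` (the repo id, target root `/data/models`, and the `local_dir_for` helper below are illustrative assumptions, not lab conventions):

```python
from pathlib import Path


def local_dir_for(repo_id: str, root: str = "/data/models") -> str:
    """Map a Hub repo id to a local target directory (naming is an assumption)."""
    return str(Path(root) / repo_id.split("/")[-1])


def download(repo_id: str) -> str:
    """Fetch a full model snapshot; needs network access and huggingface_hub installed."""
    from huggingface_hub import snapshot_download

    # Downloads all files of the repo into the chosen local directory.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))
```

Call `download("meta-llama/Llama-2-7b-hf")` (for example) on a machine with network access; gated models like Llama 2 also require an access token.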
- Use the Transformers library for inference and finetuning
- Use the Datasets library to handle datasets
- Train your model using the TRL (Transformer Reinforcement Learning) library
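The Datasets + TRL workflow can be sketched as below. This is a minimal supervised-finetuning outline, not a definitive recipe: the prompt template, file name, and argument-free `SFTTrainer` call are assumptions, and exact `SFTTrainer` arguments vary across TRL versions.

```python
def to_text(example: dict) -> dict:
    """Format one record into the single 'text' field SFTTrainer reads
    (this question/answer template is an illustrative assumption)."""
    return {"text": f"### Question:\n{example['question']}\n### Answer:\n{example['answer']}"}


def finetune(model_path: str, data_file: str) -> None:
    """Supervised finetuning sketch; requires a GPU plus datasets/trl installed."""
    from datasets import load_dataset
    from trl import SFTTrainer

    # Load a local JSONL file and map each record to the prompt template.
    dataset = load_dataset("json", data_files=data_file, split="train").map(to_text)

    trainer = SFTTrainer(
        model=model_path,  # e.g. a local copy downloaded from volare
        train_dataset=dataset,
        dataset_text_field="text",
    )
    trainer.train()
    trainer.save_model(model_path + "-sft")
```

A call like `finetune("/data/models/llama-2-7b-hf", "train.jsonl")` would run on a GPU machine; both paths are hypothetical.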
- Tutorials
Deploy the model as an API using the Text Generation Inference (TGI) toolkit.
Having finetuned/trained your model, you can utilize this open-source toolkit from Hugging Face to deploy the LLM as an API.
- Documentation can be found here
- Quantization makes the model smaller so it can run on less capable hardware
- Note that quantization can degrade model quality
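As a back-of-the-envelope illustration of why quantization helps: weight memory scales with bits per parameter, so a 7B-parameter model's weights drop from roughly 13 GiB at fp16 to about 3.3 GiB at 4-bit (activations and KV cache add more on top):

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


fp16_gib = weight_memory_gib(7e9, 16)  # ~13.0 GiB: typically needs a large GPU
int4_gib = weight_memory_gib(7e9, 4)   # ~3.3 GiB: fits on much smaller cards
```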
- How to consume the API here
- List of explicitly supported models here
- Deploy on gpu2 if you want to use the LLM UI to chat with the LLM
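Once TGI is serving, its `/generate` endpoint can be consumed over plain HTTP. A stdlib-only sketch follows; the host and port in the default `base_url` are assumptions — use whatever your deployment actually exposes:

```python
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int = 64) -> bytes:
    """Encode a TGI /generate request body as JSON bytes."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")


def generate(prompt: str, base_url: str = "http://gpu2.kdd.cs.ksu.edu:8080") -> str:
    """POST a prompt to TGI's /generate endpoint and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```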
Sign into the K-State VPN and interact with the deployed LLMs via the LLM UI
LLM UI is deployed and hosted at http://llm-service-ui.kdd.cs.ksu.edu
- There is no firewall between the llm-service-ui.kdd.cs.ksu.edu server and gpu2.kdd.cs.ksu.edu. This means that if you want to be able to use the inference UI, you must serve your LLM from gpu2
- Must be on the K-State Network to access http://llm-service-ui.kdd.cs.ksu.edu
Last updated by rotclanny on Apr 17, 2024