LLM Overview
LLM Setup for Inference and Finetuning
All models are downloaded from https://huggingface.co/
- We have many models saved on volare:
https://volare.kdd.cs.ksu.edu/llm/
- Feel free to upload more models to volare
- We have 5 stock Dockerfiles for running these models, located here:
https://volare.kdd.cs.ksu.edu/dockerfiles/dockerfilesKDD/llm/
- Models can be run anywhere, but they perform best when they fit in GPU memory
- llama 2: Meta's flagship LLM
- gpt-j: developed by EleutherAI, designed to provide a powerful open-source alternative for natural language processing tasks
- falcon: TII's flagship series of large language models
- flant5: a variant of the T5 (Text-to-Text Transfer Transformer) language model, which was developed by Google
- pythia: a model suite developed by EleutherAI, designed to promote scientific research on large language models
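Pulling one of these models from the Hub can be sketched with `huggingface_hub` (the repo id, target root `/data/models`, and the `local_dir_for` helper below are illustrative assumptions, not lab conventions):

```python
from pathlib import Path


def local_dir_for(repo_id: str, root: str = "/data/models") -> str:
    """Map a Hub repo id to a local target directory (naming is an assumption)."""
    return str(Path(root) / repo_id.split("/")[-1])


def download(repo_id: str) -> str:
    """Fetch a full model snapshot; needs network access and huggingface_hub installed."""
    from huggingface_hub import snapshot_download

    # Downloads all files of the repo into the chosen local directory.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))
```

Call `download("meta-llama/Llama-2-7b-hf")` (for example) on a machine with network access; gated models like Llama 2 also require an access token.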
- Use the Transformers library for inference and finetuning
- Use the Datasets library to handle datasets
- Train your model using the TRL (Transformer Reinforcement Learning) library
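The Datasets + TRL workflow can be sketched as below. This is a minimal supervised-finetuning outline, not a definitive recipe: the prompt template, file name, and argument-free `SFTTrainer` call are assumptions, and exact `SFTTrainer` arguments vary across TRL versions.

```python
def to_text(example: dict) -> dict:
    """Format one record into the single 'text' field SFTTrainer reads
    (this question/answer template is an illustrative assumption)."""
    return {"text": f"### Question:\n{example['question']}\n### Answer:\n{example['answer']}"}


def finetune(model_path: str, data_file: str) -> None:
    """Supervised finetuning sketch; requires a GPU plus datasets/trl installed."""
    from datasets import load_dataset
    from trl import SFTTrainer

    # Load a local JSONL file and map each record to the prompt template.
    dataset = load_dataset("json", data_files=data_file, split="train").map(to_text)

    trainer = SFTTrainer(
        model=model_path,  # e.g. a local copy downloaded from volare
        train_dataset=dataset,
        dataset_text_field="text",
    )
    trainer.train()
    trainer.save_model(model_path + "-sft")
```

A call like `finetune("/data/models/llama-2-7b-hf", "train.jsonl")` would run on a GPU machine; both paths are hypothetical.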
- Tutorials
Deploy the model as an API using the Text Generation Inference (TGI) toolkit.
Having finetuned/trained your model, you can utilize this open-source toolkit from Hugging Face to deploy the LLM as an API.
- Documentation can be found here
- Quantization makes the model smaller so it can run on less capable hardware
- Note that quantization can degrade model quality
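As a back-of-the-envelope illustration of why quantization helps: weight memory scales with bits per parameter, so a 7B-parameter model's weights drop from roughly 13 GiB at fp16 to about 3.3 GiB at 4-bit (activations and KV cache add more on top):

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


fp16_gib = weight_memory_gib(7e9, 16)  # ~13.0 GiB: typically needs a large GPU
int4_gib = weight_memory_gib(7e9, 4)   # ~3.3 GiB: fits on much smaller cards
```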
- How to consume the API here
- List of explicitly supported models here
- Deploy on gpu2 if you want to use the LLM UI to chat with the LLM
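Once TGI is serving, its `/generate` endpoint can be consumed over plain HTTP. A stdlib-only sketch follows; the host and port in the default `base_url` are assumptions — use whatever your deployment actually exposes:

```python
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int = 64) -> bytes:
    """Encode a TGI /generate request body as JSON bytes."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")


def generate(prompt: str, base_url: str = "http://gpu2.kdd.cs.ksu.edu:8080") -> str:
    """POST a prompt to TGI's /generate endpoint and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```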
Sign into the K-State VPN and interact with the deployed LLMs via the LLM UI
LLM UI is deployed and hosted at http://llm-service-ui.kdd.cs.ksu.edu
- There is no firewall between the llm-service-ui.kdd.cs.ksu.edu server and gpu2.kdd.cs.ksu.edu. This means that if you want to be able to use the inference UI, you must serve your LLM from gpu2
- Must be on the K-State Network to access http://llm-service-ui.kdd.cs.ksu.edu
Last updated by rotclanny on Apr 17, 2024