Package LLM Inference Libraries
Description of the project: Package Large Language Model (LLM) inference libraries, in particular vLLM. It is needless to explain how important LLMs are. Currently, the Debian archive only ships PyTorch; downstream applications are still missing. One of the most promising downstream applications is LLM inference. People are already working on llama.cpp and Ollama, but vLLM still lacks many dependencies before it can land in Debian. For multi-GPU inference and concurrency, vLLM has advantages over llama.cpp. The missing packages include, for instance, transformers and huggingface-hub. We would like to trim the dependency tree a little at the beginning, until we get a minimum working instance of vLLM. As such, this project involves Debian packaging work for vLLM and its dependencies that are missing from Debian, as well as fixing issues (if any) in existing packages to make vLLM work.
Confirmed Mentor: Mo Zhou
How to contact the mentor: lumin@debian.org
Confirmed co-mentors: Christian Kastner (ckk@debian.org), Xuanteng Huang (xuanteng.huang@outlook.com). In addition, the Debian Deep Learning Team (debian-ai@lists.debian.org) can offer help.
Difficulty level: Medium (there might be some hard bits; some of the packages we are going to deal with are clearly more difficult than the average Debian package).
Project size: 350 hours (large). This rough estimate comes from looking at the pipdeptree output for the vllm package; the tree is a little deep (see the sketch below).
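A rough way to reproduce that estimate, as a minimal sketch: it assumes vllm and pipdeptree are both installed in the current Python environment (e.g. via pip install vllm pipdeptree) and uses pipdeptree's --json-tree output to count the distinct packages that vllm pulls in.

{{{#!python
# Minimal sketch: count the distinct packages in vllm's dependency tree.
# Assumes `vllm` and `pipdeptree` are installed in the current environment.
import json
import subprocess

# `pipdeptree --packages vllm --json-tree` prints the dependency tree
# rooted at vllm as nested JSON.
tree_json = subprocess.run(
    ["pipdeptree", "--packages", "vllm", "--json-tree"],
    capture_output=True, text=True, check=True,
).stdout

def collect(nodes, seen):
    """Walk the nested tree and record every package key once."""
    for node in nodes:
        seen.add(node["key"])
        collect(node.get("dependencies", []), seen)

packages = set()
collect(json.loads(tree_json), packages)
print(f"vllm's dependency tree contains {len(packages)} distinct packages")
}}}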
Deliverables of the project: Eventually, I hope we can get vLLM into the Debian archive, on top of which we can deliver something for LLM inference out of the box. If the amount of work turns out to be beyond my expectations, I am still happy to see how far we can get towards this goal. If the amount of work required for vLLM is less than expected, we can also look at something else, such as SGLang, another open-source LLM inference library.
Desirable skills: Long-term Linux user (familiarity with the Debian family is preferred), Python, PyTorch, and experience running Large Language Models locally.
What the intern will learn: Through this project, the intern will learn about the Debian development process and gain more experience running LLMs locally, including inference performance tuning.
Application tasks: Analyze how PyTorch is packaged in Debian, including how the CUDA variant of PyTorch is prepared; those details are very important for the whole reverse-dependency tree. The intern also needs to set up vLLM locally using pip or uv and run LLM inference locally for reference (see the sketch below).
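For the last task, a minimal local smoke test with vLLM's offline inference API might look like the following. This is only a sketch: it assumes vLLM is already installed (e.g. pip install vllm, or uv pip install vllm inside a virtual environment), and facebook/opt-125m is just an example choice of a small model for a quick local test.

{{{#!python
# Minimal sketch of local LLM inference with vLLM's offline API.
# Assumes vLLM is installed (e.g. `pip install vllm` or `uv pip install vllm`);
# facebook/opt-125m is just a small example model for a quick test.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Debian is", "vLLM is"], params)
for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
}}}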
Related projects: The PyTorch packaging repository is here: https://salsa.debian.org/deeplearning-team/pytorch