

What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting the num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
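For example, a minimal Modelfile might look like this (the base model, the values, and the name `mymodel` are just placeholders, adjust for whatever you're running):

```
FROM llama3
PARAMETER num_ctx 8192
PARAMETER num_predict 512
```

Then `ollama create mymodel -f Modelfile` and `ollama run mymodel` to use it.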
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default, and it looks like vllm does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization.
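If you want to compare like for like, check what quant ollama actually pulled and grab a full-precision tag instead if one exists (the model name and tag below are just examples, check the library page for the real tags):

```
ollama show llama3                     # details include the quantization used
ollama pull llama3:8b-instruct-fp16    # example of an unquantized tag
```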
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input to the next layer in the sequence), so more GPUs do not offer any kind of performance benefit.
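Rough sketch of what I mean (PyTorch, a made-up two-GPU layer split, not ollama's actual internals): each layer has to wait for the previous layer's output, so for a single request only one GPU is ever busy at a time.

```python
import torch
import torch.nn as nn

# Toy "model": 8 layers split across 2 GPUs
layers = [nn.Linear(4096, 4096).to(f"cuda:{i % 2}") for i in range(8)]

def forward(x: torch.Tensor) -> torch.Tensor:
    for i, layer in enumerate(layers):
        x = x.to(f"cuda:{i % 2}")  # move activations to the GPU holding this layer
        x = layer(x)               # next layer can't start until this one is done
    return x

out = forward(torch.randn(1, 4096, device="cuda:0"))
```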
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
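i.e. something like this in an interactive session (model name and values are just examples):

```
$ ollama run llama3
>>> /set parameter num_ctx 8192
>>> /set parameter num_predict 256
```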
My guess is an x86 32bit machine
4690k was solid! Mine is retired, though. Now I selfhost on ARM
Well, configuring the kernel is where things get tricky and is the major difference between the Gentoo and Arch installation, so that makes sense.
To be annoyingly pedantic, you did get past partitioning the drives, then!
As a Gentoo user who has used Arch in the past, I have no clue what problems this commenter could have run into, because partitioning the drives is exactly the same for both distributions… if they were able to figure it out for Arch, then they can do it for Gentoo.
…partitioning the drives is exactly the same for Arch as it is for Gentoo lol. If you did it for Arch, why can’t you do it for Gentoo?
You can overwrite the model by using the same name instead of creating one with a new name, if it bothers you. Either way there is no duplication of the LLM model file.
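E.g. (names are placeholders), re-running create with the same name just replaces that tag in place, and the underlying weights blob is reused rather than copied:

```
ollama create mymodel -f Modelfile
# edit the Modelfile, then overwrite the same tag:
ollama create mymodel -f Modelfile
```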