tehnomad@lemm.ee to Selfhosted@lemmy.world • Using Mac M2 Ultra 192GB to Self-Host LLMs?
I found a VRAM calculator for LLMs here: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Wow, it seems like for a 128K context you really do need a lot of VRAM (~55 GB for the context alone). Qwen 72B will take up ~39 GB on top of that, so you would either need 4x 24 GB Nvidia cards or an M2 Ultra Mac with 192 GB of unified memory. Probably the cheapest option would be to rent GPU instances on a service like RunPod; I think you would have to do a lot of processing before you reach the break-even point of buying your own machine.
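For a back-of-the-envelope check (the calculator above does this properly), the two big terms are the quantized weights and the KV cache. The sketch below uses illustrative values for a ~72B model with grouped-query attention (~4.5 bits per weight, 80 layers, 8 KV heads, head dim 128, fp16 cache); these are assumptions, not Qwen's exact config, so the real numbers will differ:

```python
# Rough VRAM estimate for serving an LLM: quantized weights + KV cache.
# All model parameters below are illustrative assumptions, not exact
# figures for Qwen 72B -- use the linked calculator for real numbers.

def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory for the quantized weights."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache: two tensors (K and V) per layer, per cached token."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem) / 1e9

# Assumed values for a ~72B model with grouped-query attention:
w = weights_gb(72e9, bits_per_weight=4.5)            # ~40 GB at ~4.5 bpw
kv = kv_cache_gb(n_layers=80, n_kv_heads=8,
                 head_dim=128, context_len=128_000)  # ~42 GB at fp16
print(f"weights ~{w:.0f} GB, KV cache ~{kv:.0f} GB, total ~{w + kv:.0f} GB")
```

With those assumptions you land in the same ballpark: roughly 40 GB of weights plus ~40 GB of KV cache at 128K tokens, which is why a single 24 GB card doesn't come close.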
One thing I would do differently is to set up LDAP and OIDC so you can use the same authentication credentials across different apps (at least the ones that support them). I use LLDAP and Authelia for this.