I do a lot of coding, and ideally I’d like a long-context model (128k tokens) that I can throw my whole codebase into.
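A rough way to check whether a codebase would even fit into a 128k window is to estimate its token count. The sketch below assumes about 4 characters per token, which is only a rule of thumb (real tokenizers differ by model and by programming language). The function names, the file extensions, and the 8k-token reserve for the answer are all hypothetical choices for illustration.

```python
import os

CHARS_PER_TOKEN = 4  # rough assumption; real tokenizers vary by model and language


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def estimate_codebase_tokens(root: str, extensions=(".py", ".js", ".ts", ".md")) -> int:
    """Walk a directory tree and sum the estimated tokens of matching files."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


def fits_in_context(n_tokens: int, context_window: int = 128_000, reserve: int = 8_000) -> bool:
    """True if the prompt plus a reserve for the model's answer fits the window."""
    return n_tokens + reserve <= context_window


if __name__ == "__main__":
    tokens = estimate_codebase_tokens(".")
    print(f"~{tokens} tokens, fits in 128k: {fits_in_context(tokens)}")
```

If the estimate lands well above the window, attaching everything is not an option no matter how fast the hardware is.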

I’ve been experimenting with Claude, for example. What usually works well is attaching the whole architecture of a CRUD app along with the most recent docs of the framework I’m using, and it’s okay for menial tasks. But I am very uncomfortable sending any kind of data to these providers.

Unfortunately I don’t have a lot of space, so I can’t build a proper desktop. My options are either renting a VPS or going for something small like a Mac Studio. I know speeds aren’t great, but I was wondering whether using e.g. RAG for documentation could help me get decent speeds.
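To make the RAG idea concrete: instead of attaching the full framework docs, you split them into chunks and send only the few chunks that match the question, which keeps the prompt short. The sketch below is a stand-in, not a real RAG stack. It ranks chunks by bag-of-words cosine similarity using only the standard library, where an actual setup would typically use an embedding model and a vector store. The sample chunks and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


# Hypothetical documentation chunks, for illustration only.
docs = [
    "Routing: define URL routes with the router and map them to controllers.",
    "Database: run migrations to create tables and update the schema.",
    "Auth: protect routes with session middleware and login handlers.",
]

if __name__ == "__main__":
    for chunk in top_k_chunks("how do I run database migrations", docs, k=1):
        print(chunk)
```

Only the selected chunks go into the prompt, so prompt processing scales with the size of the retrieved text rather than with the size of the whole documentation.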

I’ve read that especially on larger contexts Macs become very slow. I’m not very convinced but I could get a new one probably at 50% off as a business expense, so the Apple tax isn’t as much an issue as the concern about speed.

Any ideas? Are there other mini PCs available that might have a better architecture? I tried researching but couldn’t find much.

Edit: I found some stats on GitHub on different models: https://github.com/ggerganov/llama.cpp/issues/10444

Based on that I also conclude that you’re gonna wait forever if you work with a large codebase.

  • just_another_person@lemmy.world
    4 days ago

    I’ve not run such things on Apple hardware, so can’t speak to the functionality, but you’d definitely be able to do it cheaper with PC hardware.

    The problem with this kind of setup is going to be heat. There are definitely cheaper mini PCs, but I wouldn’t expect them to have the space for this much memory AND a GPU, so you’d maybe be looking at an AMD APU/NPU combo. You could easily build something about the size of a game console that does this for maybe $1.5k.

    • awesomesauce309@midwest.social
      4 days ago

      For context length, VRAM is important. You can’t split a context across memory pools, so it would be limited to maybe 16 GB. With the M series you can have a lot more space since RAM and VRAM are the same, but it’s RAM at Apple prices. You can get a 24 GB+ setup way cheaper than some NVIDIA server card, though.
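To put a number on why context length eats memory: for a standard transformer the KV cache grows linearly with context, at 2 (keys and values) × layers × KV heads × head dimension × bytes per value for every token. The sketch below applies that formula. The model shape in the example (32 layers, 8 KV heads, head dimension 128, 16-bit cache) is an illustrative assumption, not a measurement of any specific model, and this memory comes on top of the model weights.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes a 16-bit cache; a quantized cache would be smaller.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


# Illustrative (assumed) model shape at a 128k-token context.
example = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=128 * 1024)

if __name__ == "__main__":
    print(f"KV cache: {example / GIB:.1f} GiB")
```

With these assumed numbers the cache alone comes to 16 GiB at full context, and a model with more layers or more KV heads needs proportionally more.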

      • shaserlark@sh.itjust.worksOP
        4 days ago

        Yeah, the VRAM of the Mac M series is very attractive for running models at full context length, and the memory bandwidth is quite good for token generation compared to the price, power consumption and heat generation of NVIDIA GPUs.

        Since I’ll have to put this in my kitchen/living room that’d be a big plus but idk how well prompt processing would work if I send over like 80k tokens.
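A back-of-the-envelope way to look at both halves of this: prompt processing time is roughly prompt tokens divided by the prompt-processing speed, and token generation on a dense model at batch size 1 is bounded above by memory bandwidth divided by the size of the weights, since every generated token has to read all of them once. The sketch below just does that arithmetic. All numbers in it (500 tokens/s prompt speed, 800 GB/s bandwidth, a 40 GB model) are placeholder assumptions, so plug in measured values for real hardware.

```python
def prefill_seconds(prompt_tokens: int, prompt_tokens_per_second: float) -> float:
    """Approximate time to process the prompt before the first output token."""
    return prompt_tokens / prompt_tokens_per_second


def decode_tokens_per_second_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling for generation speed.

    Assumes a dense model at batch size 1, where each generated token
    reads all weights from memory once. Real speeds will be lower.
    """
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Placeholder numbers, not benchmarks.
    wait = prefill_seconds(80_000, prompt_tokens_per_second=500)
    ceiling = decode_tokens_per_second_upper_bound(bandwidth_gb_s=800, model_size_gb=40)
    print(f"prefill: {wait:.0f} s, generation ceiling: {ceiling:.0f} tokens/s")
```

The point of splitting it this way is that high memory bandwidth helps the second number but not the first, which depends on compute.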

    • BorgDrone@lemmy.one
      3 days ago

      “you’d definitely be able to do it cheaper with PC hardware.”

      You can get a GPU with 192GB VRAM for less than a Mac? Sign me up please.

    • shaserlark@sh.itjust.worksOP
      4 days ago

      I’d honestly be open to that, but wouldn’t an AMD setup take up a lot of space, consume lots of power, and be loud?

      It seems like in terms of price & speed, the Macs suck compared to other options, but if you don’t have a lot of space and don’t want to hear an airplane engine constantly I’m wondering if there are options.

      • just_another_person@lemmy.world
        3 days ago

        I just looked, and the Mac Mini maxes out at 24 GB anyway. Not sure where you got the idea of 192 GB from. NVM, you said M2 Ultra.

        Look, you have two choices. Just pick one. Whichever is more cost effective and works for you is the winner. Talking it down to the Nth degree here isn’t going to help you with the actual barriers to entry you’ve put in place.