Ask HN: Any way to borrow compute from Apple M1
49 points by 2Gkashmiri 8 months ago | 45 comments
Hi.

I have a friend who owns an M1 Max. I would like to "borrow" his GPU for Llama 3 or SD. Is there a way for me to use his compute when it's idle? I do not want to remote into his machine; an easy local API would be fine (I could use Tailscale/ZeroTier) and then access the API that way.




For Llama 3, just ask him to install Ollama and serve the model. Ollama manages memory automatically and frees the model when it isn't being used; whenever you make a call to the API (do let your friend know before you do this), Ollama will load the model back into memory.

Not sure whether there's anything similar for SD, though.
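
For the Llama side, calling the API remotely is just an HTTP POST. A minimal client-side sketch in Python, assuming Ollama is reachable over Tailscale (the hostname below is a placeholder, and the friend may need to start the server with OLLAMA_HOST=0.0.0.0 since Ollama binds to localhost by default):

    import requests

    # Placeholder Tailscale hostname; Ollama's default port is 11434.
    url = "http://friends-m1-max:11434/api/generate"
    payload = {
        "model": "llama3",  # whatever tag the friend has pulled
        "prompt": "Explain unified memory in one paragraph.",
        "stream": False,    # return a single JSON object instead of a stream
    }

    resp = requests.post(url, json=payload, timeout=300)
    resp.raise_for_status()
    print(resp.json()["response"])

The model gets loaded on the first request after being idle, so expect that first call to take noticeably longer.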


This, plus connect via Tailscale and you can access it from anywhere (assuming your friend's laptop is online).


There are tons of options - https://github.com/anderspitman/awesome-tunneling. I will advocate for zrok.io as I work on its parent project, OpenZiti. zrok is open source and has a free SaaS.


zrok (https://zrok.io/), an alternative to ngrok, does access management too. It's like Tailscale but can give access to a specific service.


> i do not want to remote into his machine

> tailscale/zerotier

Same thing, isn't it?

In any case it wouldn't be hard for you to just have an account on his machine, tailscale being perhaps the simplest setup. SSH in, cook his laptop at your leisure.


With Tailscale you could access just the port serving the model's API (presumably via Ollama or similar), so the friend wouldn't have to grant any access beyond that.


I had to do this fairly recently to make krita-diffusion available for my friends and family who don't have a 3090 Ti lying around. Probably the simplest way would be to run a local HTTP service on your friend's M1 that is SSH-tunneled to a server that you'll access over HTTP. On the server you'll need to reverse-proxy the tunneled port to a public address and port.

You make HTTP requests to the shared server, those get proxied via the SSH tunnel to his machine, and the client on his machine can decide when/whether to run the workload; a rough sketch of that client is below.
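
Here's that client piece sketched in Python (everything here is a placeholder: the port, the load threshold, and run_workload standing in for the actual SD/LLM call), using a plain stdlib HTTP server bound to localhost so only the SSH tunnel can reach it:

    import json
    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    MAX_LOAD = 2.0  # assumed "machine is busy" threshold

    def machine_is_idle() -> bool:
        # 1-minute load average; os.getloadavg() works on macOS/Linux.
        return os.getloadavg()[0] < MAX_LOAD

    def run_workload(request_body: bytes) -> dict:
        # Placeholder: forward to a local Ollama/SD endpoint, shell out to a script, etc.
        return {"status": "ok"}

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            if not machine_is_idle():
                self.send_response(503)  # busy right now; caller should retry later
                self.end_headers()
                return
            result = run_workload(body)
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps(result).encode())

    if __name__ == "__main__":
        # Bind to localhost only; the SSH reverse tunnel exposes it to the shared server.
        HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()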


The M1 can't really handle SD: inference times are closer to a minute, and with SDXL you can feel the machine straining under it. The battery depletes quickly, and the machine often freezes up completely for a second if you're trying to do other things at the same time (M1 Max, 32GB).

Think you'd be way better off just paying for a service designed for this, or renting a GPU from a provider set up for it; the cost won't be that significant.


Use Ollama's API.


Or OpenWebUI on top of it if you want an acceptable UI experience.


cool


This. Works fine. The M1 can run most small models (phi3, gemma, etc.) at usable speeds even with just 8GB of RAM.


This! Tailscale plus Ollama API will definitely do the job


An off-topic question: are Apple's M-series chips any good at current AI/ML work? How do they compare with dedicated GPUs?


I have an M3 chip in my laptop; it has more memory than my 4090, but it's still way slower at inference. So as long as the model fits in memory, Nvidia GPUs are going to be way faster just because they have more/faster compute cores.

Of course, if the model fits in memory on your M chip and doesn't on your Nvidia card, the M chip wins by default. However, I would say, if you load a 70B model on your M chip, while it WILL work, the tokens/sec will be slow as fuck... so it kinda doesn't matter anyway.
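
Back-of-the-envelope, since decoding is mostly memory-bandwidth-bound: every generated token has to stream the whole model through memory, so bandwidth divided by model size gives a rough ceiling on tokens/sec. The figures below are assumptions for illustration, not benchmarks:

    model_size_gb = 40    # ~70B parameters at 4-bit quantization (assumed)
    m_chip_bw_gbs = 400   # M1 Max-class unified memory bandwidth, approx.
    gpu_bw_gbs = 1000     # 4090-class GDDR6X bandwidth, approx.

    print(f"M chip ceiling: ~{m_chip_bw_gbs / model_size_gb:.0f} tokens/s")
    print(f"GPU ceiling:    ~{gpu_bw_gbs / model_size_gb:.0f} tokens/s")

Real throughput lands below those ceilings, which is roughly why a 70B model "works" on the M chip but crawls.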


The latest Nvidia drivers offer an option to start using system memory when VRAM is insufficient. It certainly slows things down, but it does work. It's not perfect in my experience, but it is an alternative for large models.


Sounds great! Do you have a source, maybe a wiki?



How does LLM inference impact battery life? For SD it can be 5% battery per image at times.


Mid-tier gaming GPU performance, but (potentially) access to gobs of memory (if running on a host with gobs of memory) owing to the unified memory design. For certain use cases which require loading huge datasets but don't necessarily require massive compute (i.e. inference on large models) they can be cost-competitive relative to something like an H100.


Compared to a GPU like a 4090 with equal VRAM it probably won't fare well, as others point out, but it far outperforms any CPU. On an M1 Ultra MacBook Pro I was seeing around 40 tokens/second with llama3:7B vs 9 tokens/second on various Intel servers/desktops with sufficient RAM.


I've only tried it on my M1, running Llama-3 via Ollama. It works, but it's slow to the point where it's not really usable. Maybe there are smaller models you can run that will perform better.


What size model did you try, and how much memory does your M1 have? See my other comment; my experience has been that llama3 was very fast on an M1.


I was just running `ollama run llama3`, so that would be 8B parameters, on an 8GB M1 Air.

Maybe it's just my expectations, but it seems rather slow to process queries: somewhere between 10 and 40 tokens per second, though that very much depends on the prompt.

My main complaint is the time between when the prompt is entered and when output starts.
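
If you want to separate that initial wait (prompt processing) from generation speed, a minimal sketch against Ollama's streaming API can time the two, assuming the default local port and the llama3 tag:

    import json
    import time

    import requests

    url = "http://localhost:11434/api/generate"  # assumed: default local Ollama
    payload = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": True}

    start = time.time()
    first_token_at = None
    chunks = 0  # each streamed chunk is roughly one token

    with requests.post(url, json=payload, stream=True) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if not line:
                continue
            part = json.loads(line)
            if part.get("response"):
                chunks += 1
                if first_token_at is None:
                    first_token_at = time.time()
            if part.get("done"):
                break

    if first_token_at is not None:
        print(f"time to first token: {first_token_at - start:.2f}s")
        print(f"~{chunks / (time.time() - first_token_at):.1f} tokens/s after that")

On an 8GB machine, part of that initial delay can also just be the model being loaded (or swapped back in), which Ollama does lazily.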


If Apple offers these in a data center that's open access, it's game over for NVDA.


Based on which fantasy premise?


The premise where these are readily available for mass purchase, have a hardware and software stack that already works reliably, and have a lower energy footprint than other offerings

and somewhat competitive on cost, but that won't be the main selling point, just availability.


For the purposes we're discussing, they're nowhere near competitive with Nvidia.


Lower energy, but they're also slow. You'd need to stack up enough of them to reach the same performance level and then consider the purchase and running costs. I bet it's not close on that.

Availability, maybe, but as you've noted, there's zero availability for data center environments. Those volumes would also then fall onto TSMC/Samsung/whoever, where Nvidia is stuck as well.


I'm aware Apple is beholden to TSMC's capacity too,

sadly


They aren't even in the same league, computing-wise.


This was bouncing around the last few days and could work if you have a few devices as well as the M1 (though I'm not sure it's able to work over the internet as opposed to a local network): https://github.com/exo-explore/exo

Otherwise set up Ollama's API


You mean the llama.cpp server API, right? Ollama keeps taking credit for things it puts a thin wrapper around, and it's seriously annoying.


I don't, in the same way I don't say that I'm taking an internal-combustion-engine bus to get somewhere; what powers the bus is not relevant to the solution.


What’s the llama.cpp CLI equivalent of `ollama pull`?


wget <huggingface link>?


I feel like Ollama just came out, and now y'all are doing model-based laptop resource sharing.

Should I take this as an indicator that embedded GenAI is moving quite quickly?

(Also just wanted to say I find this thread incredibly cool generally, some very interesting stuff going on!!! :D )


The entire point of embedded models is that you can run them locally. If it's going to take an internet round trip anyway, then what's the point of connecting to your friend's laptop over a cloud GPU or a managed service like ChatGPT-4o?


Presumably a cloud GPU is not $0?


It won't cost actual money?


If you are on the same network, try https://pinokio.computer


With "borrow" in scare quotes, do you intend for him to be aware of his generosity?


Probably just because in the full phrasing, "i would like to 'borrow' his gpu", omitting the scare quotes would paint the picture of their friend unplugging/unsoldering their GPU and lending it to the author.


My friend Kevin was once building a graphics engine, and to test it on various GPUs he'd borrow them from Best Buy and return them within the 30-day window. Since there was a restocking fee, it seemed like a non-harmful and clever way to test a bunch of configurations.


so Kevin is why we can't have nice things....



