So I decided to build my own AI server. Why? Well, a few reasons.
1. As part of de-googling and hosting things away from big-corp servers, this felt like the natural next step as we start relying on those servers more for AI services. Keeping your data close to home, especially your conversations with LLMs, is worth it, IF you have the right security in place.
2. Cost-wise, you can't beat cloud pricing for the larger models (at the moment). But, strangely, some workloads that are quite expensive in their cloud versions can be run cheaply at home (see more below).
3. Plus, who doesn't love a good hardware project? 🤖
The goal was simple: build a powerful AI inference server that could handle local LLM serving, fine-tuning experiments, and general ML workloads without breaking the bank.
After plenty of research and part swapping, here's the final configuration:
After these steps, the system finally booted up! But I discovered another issue: the machine refused to boot without a monitor attached.
Fortunately, I was able to resolve this by entering the BIOS and disabling the monitor connection requirement. This lets the server boot without a display attached, which was crucial for my intended use case.
After installing the base Ubuntu Server, the first critical step was installing the NVIDIA drivers natively on the OS. This is absolutely essential before attempting to set up any Docker containers, as the containers need to access the GPU through the host system's drivers.
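For reference, the driver and container-toolkit setup boils down to something like the following. This is a sketch, not my exact command history; the driver version that `ubuntu-drivers` picks will depend on your GPU and Ubuntu release.

```bash
# Install the recommended NVIDIA driver for the detected GPU
sudo ubuntu-drivers install

# Reboot so the kernel module loads, then verify the driver sees the card
sudo reboot
nvidia-smi

# Add NVIDIA's repo and install the Container Toolkit so Docker can use the GPU
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Wire the toolkit into Docker's runtime and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

If `nvidia-smi` prints your GPU and driver version at this point, containers started with `--gpus all` will be able to see the card too.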
What I love about Open WebUI is how it offers a ChatGPT-like interface but for my local models. The setup was quick, and it automatically discovered my Ollama instance thanks to the container networking.
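In case it helps, the core of that setup is just two containers on a shared Docker network, roughly like this. It's a sketch rather than my exact config; the image tags and port mappings shown are the projects' documented defaults.

```bash
# Shared network so Open WebUI can reach Ollama by container name
docker network create ai

# Ollama with GPU access; downloaded models persist in a named volume
docker run -d --name ollama --network ai --gpus all \
  -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Open WebUI, pointed at the Ollama container over the shared network
docker run -d --name open-webui --network ai \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -v open-webui:/app/backend/data -p 3000:8080 \
  ghcr.io/open-webui/open-webui:main
```

Because both containers sit on the same network, Open WebUI can resolve `ollama` by name, which is why the discovery "just works".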
This thing is hungry. According to my Netdata dashboard, the GPU alone can draw over 400W under full load when running the larger models. The RTX 3090 is definitely power-hungry, but considering the computational work it's doing, it's impressively efficient compared to running these workloads in the cloud.
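If you want to sanity-check Netdata's numbers, `nvidia-smi` can poll the draw directly. A quick sketch:

```bash
# Report GPU power draw, temperature, and utilization every 5 seconds
nvidia-smi --query-gpu=power.draw,temperature.gpu,utilization.gpu \
  --format=csv -l 5
```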
I've found that different models have different power profiles. For instance, I'm using Granite specifically for meeting summarization tasks, and it offers a nice balance between power consumption and capability for this specific use case.
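For the curious, trying a model like that through Ollama is a one-liner. A sketch, with the caveat that the Granite tag below is illustrative; check the Ollama registry for the current model names and sizes.

```bash
# Pull a Granite model and ask it to summarize a meeting transcript
# (model tag is an assumption; run `ollama list` or browse the registry for real tags)
ollama pull granite3.1-dense:8b
ollama run granite3.1-dense:8b \
  "Summarize the key decisions and action items in this meeting transcript: $(cat transcript.txt)"
```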
So far, temperatures are looking good! The combination of large case fans and the substantial cooling solution on the RTX 3090 itself is handling the thermal output effectively. Since the AI workloads I'm running tend to be bursty rather than sustained, the system has plenty of time to cool down between intensive operations.
This is definitely a topic I'll revisit in a future blog post; the ethical considerations alone deserve their own write-up.
* Local AI is incredibly satisfying when it works
Here's a closer look at those 180° power adapters that saved the day: