Running Qwen 3 Coder for Free: The Future of AI is Here

TL;DR

This video explores running a powerful AI coding model, Alibaba's Qwen 3 Coder, locally on a personal machine as a replacement for paid cloud-based AI assistants. The experiment uses LM Studio to manage the model, works around Cursor's connectivity limitations with the Cline extension, and tests the AI on real coding tasks. The conclusion weighs the privacy and cost benefits against the performance limits of local hardware.

  • LM Studio simplifies the process of downloading and running open-source language models.
  • The Cline extension provides a seamless connection between local AI models and code editors like VS Code and Cursor.
  • Running AI models locally offers enhanced privacy and control over your data.

Intro: Can You Run a Powerful AI Locally? [0:00]

The video introduces the idea of running Alibaba's Qwen 3 Coder model locally on a machine, specifically an M4 Max MacBook with 48GB of RAM and 40 GPU cores. The goal is to determine if a locally run AI model can replace paid cloud AI assistants, like the one built into Cursor, offering a private and powerful coding partner. The video will walk through the entire process, including successes, roadblocks, and solutions.

The Tool: Setting Up Qwen 3 with LM Studio [0:53]

To run the 32-billion-parameter model locally, LM Studio is used: a desktop application that simplifies discovering, downloading, and running open-source large language models. The Qwen 3 Coder model is downloaded within LM Studio, and a quantized version is selected for compatibility with consumer hardware. After downloading, the model is loaded into memory and the context window is extended to its maximum. Finally, LM Studio's local API server is started, exposing a local web address that other applications can connect to.
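
LM Studio's local server speaks an OpenAI-compatible API, so any tool that can call that API can use the local model. The sketch below builds such a request in Python; the port (1234 is LM Studio's usual default) and the model identifier are assumptions, so check the server tab in LM Studio for the actual values.

```python
import json
from urllib import request

# LM Studio's local server exposes OpenAI-compatible endpoints.
# The port and model name below are assumptions; verify them in
# LM Studio's server tab before use.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt, model="qwen3-coder-32b"):
    """Build an HTTP request for a chat completion against the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Write a function that reverses a string.")
# Sending it requires the LM Studio server to be running:
# with request.urlopen(req) as resp: reply = json.load(resp)
```

With the server running, the JSON response follows the OpenAI schema, so the model's reply sits in `choices[0].message.content`.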

The Problem: Why Cursor Won't Connect to Localhost [2:00]

The goal was to integrate the local model into Cursor, the primary code editor. The local server address from LM Studio is pasted into Cursor's model settings along with a dummy API key. However, Cursor fails to connect: investigation reveals that Cursor verifies the API server from an external address, which a plain localhost connection cannot satisfy.

The Unstable Workaround: Using Ngrok [2:34]

As a workaround, Ngrok is used to create a public internet URL that tunnels back to the local machine. After installing and configuring Ngrok, a public URL is obtained for the local server. Cursor connects, and autocompletion starts working, but it's unclear if the responses are from the local Qwen Coder model or Cursor's default cloud models. The inline editor produces inconsistent responses, and the agent feature fails. This setup is deemed unstable and not a real solution.

The Real Solution: The Cline Extension [3:20]

The Cline extension, compatible with VS Code, is installed. Cline has first-class support for local models. LM Studio is selected from a dropdown list, and Cline automatically fills in the correct local server address and settings, without requiring the Ngrok public URL.

The Test: A Real Coding Task for the Local AI [4:01]

With Cline and the Qwen 3 model connected, a real coding task is performed on the code of a Slack bot, The Gray Cat. Responses are slower than with paid cloud tools, but the answer quality is good for a locally running model. The model is asked to modify the bot to generate images of different sizes. It initially suggests outdated image sizes but quickly corrects the code once the error is pointed out, and it successfully implements a --square flag for a 1024x1024 image size. Overall, the coding ability is impressive.
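
The video doesn't show The Gray Cat's actual source, but the change it describes (a --square flag that forces a 1024x1024 image) can be sketched with argparse. The function name and the non-square default size below are hypothetical, not the bot's real code.

```python
import argparse

# Minimal sketch of the change described in the video: a --square flag
# that forces a 1024x1024 image. The parser/function names are
# hypothetical; only the flag and the 1024x1024 size come from the video.
def parse_image_args(argv):
    parser = argparse.ArgumentParser(description="Generate an image")
    parser.add_argument("prompt", help="text prompt for the image")
    parser.add_argument("--square", action="store_true",
                        help="use a square 1024x1024 canvas")
    args = parser.parse_args(argv)
    # Hypothetical default; the video doesn't state the non-square size.
    args.size = "1024x1024" if args.square else "1536x1024"
    return args

args = parse_image_args(["a gray cat", "--square"])
```

Calling `parse_image_args(["a gray cat", "--square"])` yields `size == "1024x1024"`; without the flag, the hypothetical default applies.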

The Verdict: Is It Worth It? (Privacy vs. Performance) [5:14]

Running Qwen 3 Coder locally is a trade-off. The main advantage is ownership and privacy: the code never leaves the machine, and a free local model proves intelligent and useful for real coding tasks. The main constraint is hardware. The 32-billion-parameter model runs on a system with 48GB of RAM, but more slowly than the cloud, and larger models would require more powerful hardware. The Cline extension, at least, keeps the setup itself simple.
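
Whether a model fits in 48GB can be sanity-checked with a rough rule of thumb: weight memory is roughly parameter count times bits per weight. This is a sketch under assumptions; the video doesn't state which quantization level was downloaded, and the estimate ignores KV-cache overhead, which grows with the extended context window.

```python
# Back-of-envelope memory estimate for quantized model weights:
# params (billions) * bits-per-weight / 8 gives GB (1 GB = 1e9 bytes).
# Ignores KV-cache/context overhead, which can add several GB.
def approx_weights_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * bits_per_weight / 8

q4 = approx_weights_gb(32, 4)  # ~16 GB of weights at 4-bit
q8 = approx_weights_gb(32, 8)  # ~32 GB at 8-bit: tight on a 48 GB machine
```

At an assumed 4-bit quantization the 32B weights take roughly 16GB, which is consistent with the model fitting on 48GB of RAM while still leaving room for the OS and context.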

Summary of the Experiment [6:14]

The experiment successfully demonstrates how to use LM Studio to download and serve the Qwen 3 Coder model. The initial attempt to connect directly to Cursor hit the editor's limitations, leading first to the Ngrok workaround and then to the better solution, the Cline extension. The quality of the AI's answers was good, though the speed could be improved, and VRAM usage climbed during testing, requiring other applications to be closed. Qwen 3 Coder will see more use in the daily workflow; the future of AI involves running it on personal machines.

Date: 8/23/2025 Source: www.youtube.com