A small breakthrough for running local LLMs

Since the release of ChatGPT back in late 2022, I have always been interested in what the future would look like for local LLMs. I think we have achieved it, at least somewhat.

Especially now that DeepSeek has been released and OpenAI is no longer that dominant, I would even suggest that people with enough hardware run local LLMs.

I had left the local LLM niche for a while, but when Gemma 3 was released, I put it on my workstation and even on my ThinkPad X12. It was really good at answering simple questions, summarizing texts, or serving as a little helper for general tasks. Sure, it does not come near the big LLMs such as Claude, ChatGPT, and so on, but heck, back in 2023 we could not even get a single sentence out of local models that made sense. Now they are able to replace searching Google. Still, I was not able to get Gemma to run some code only when needed. So I left that topic behind again.

But now, being in my last semester, I had the opportunity to dive into any topic I found interesting. And oh man, I dove into it.

So to keep it short: the developers behind the Qwen LLMs have made it really easy to hook Ollama up to your Python code and expose your own functions as tools.

You can find it here: GitHub - QwenLM/Qwen-Agent: Agent framework and applications built upon Qwen

And here is a small snippet of how I embedded it into my university project. I built this RSS agent that can actually understand when someone asks about RSS feeds and then automatically scrapes them. The cool part is that the LLM doesn't just tell you "yeah, you should check that RSS feed" - it actually calls my RSS parsing function, fetches all the entries, and gives you back a structured JSON response with titles, links, and summaries. It's integrated into a FastAPI backend with session management, so each user gets their own persistent chat agent that remembers the conversation. The whole thing runs locally with Ollama, so no API costs and full privacy. What used to require manual parsing and multiple steps now just works by asking "hey, can you check this RSS feed for me?" in natural language.

import json5
from qwen_agent.agents import Assistant
from qwen_agent.utils.output_beautify import typewriter_print
from typing import List, Dict, Any
from qwen_agent.tools.base import BaseTool, register_tool
import feedparser


def rss(feed_url: str) -> List[Dict[str, str]]:
    print("got following feed url:", feed_url)
    d = feedparser.parse(feed_url)
    collect_entries: List[Dict[str, str]] = []
    # feedparser exposes the feed items directly under d.entries
    for entry in d.entries:
        collect_entries.append(
            {
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "summary": entry.get("summary", ""),
            }
        )
    return collect_entries


@register_tool("rss_tool")
class RSSAgent(BaseTool):
    description = "RSS feed scraper. Requires an RSS feed URL as input; all entries of the feed are then returned as a JSON structure."
    parameters = [
        {
            "name": "rss_link",
            "type": "string",
            "description": "The URL of the RSS feed",
            "required": True,
        }
    ]

    def call(self, params: str, **kwargs) -> str:
        # Use the inherited helper to validate the JSON parameters
        params_dict = self._verify_json_format_args(params)
        rss_link = params_dict["rss_link"]
        print(f"RSS Link: {rss_link}")

        rss_feed = rss(rss_link)
        print(rss_feed)
        # Convert the result into a JSON string for the model
        return json5.dumps(rss_feed, ensure_ascii=False, indent=2)


class ChatAgent:
    """Base agent class for chat functionality with history support"""

    def __init__(
        self,
        llm_cfg: Dict[str, Any],
        system_message: str = None,
        tools: List[str] = None,
        files: List[str] = None,
    ):
        self.llm_cfg = llm_cfg
        self.system_message = system_message or "You are a helpful assistant."
        self.tools = tools or []
        self.files = files or []
        self.chat_history: List[Dict[str, str]] = []

        # Create the underlying Qwen-Agent Assistant
        self.bot = Assistant(
            llm=self.llm_cfg,
            system_message=self.system_message,
            function_list=self.tools,
            files=self.files,
        )

    def send_message(self, message: str) -> str:
        """Sends a message to the agent and returns the response"""
        # Add the user message to the history
        self.chat_history.append({"role": "user", "content": message})

        response: List[Dict[str, str]] = []
        response_text = ""
        print("Bot Response:")

        # bot.run streams intermediate message lists; keep the last one
        for response in self.bot.run(messages=self.chat_history):
            response_text = typewriter_print(response, response_text)

        # Add the bot's messages (including tool calls) to the history
        self.chat_history.extend(response)

        return response_text

    def get_chat_history(self) -> List[Dict[str, str]]:
        """Returns the complete chat history"""
        return self.chat_history

    def clear_history(self):
        """Clears the chat history"""
        self.chat_history = []

    def get_last_bot_response(self) -> str:
        """Returns the last bot response (useful for API integration)"""
        for message in reversed(self.chat_history):
            if message.get("role") == "assistant":
                return message.get("content", "")
        return ""


# LLM configuration
llm_cfg = {
    "model": "qwen3:8b",
    "model_server": "http://localhost:11434/v1",  # Ollama
    "api_key": "EMPTY",
}


def create_base_agent() -> ChatAgent:
    """Creates a simple chat agent"""
    system_message = (
        "Always answer in German. If the user asks about anything RSS-related, "
        "use the rss_tool function. Otherwise, assist the user."
    )
    return ChatAgent(llm_cfg=llm_cfg, system_message=system_message, tools=["rss_tool"])


# For FastAPI integration
class ChatAgentManager:
    """Manager class for integrating the agents into FastAPI"""

    def __init__(self):
        self.agents: Dict[str, ChatAgent] = {}

    def get_or_create_agent(self, session_id: str) -> ChatAgent:
        """Fetches or creates the agent for a given session"""
        if session_id not in self.agents:
            self.agents[session_id] = create_base_agent()
        return self.agents[session_id]

    def send_message(self, session_id: str, message: str) -> str:
        """Sends a message to the agent of a specific session"""
        agent = self.get_or_create_agent(session_id)
        agent = self.get_or_create_agent(session_id)
        agent.send_message(message)
        return agent.get_last_bot_response()