Metronisys LLM & AI Model Selection

Metronisys uses Ollama as its core local inference engine, creating a highly modular, "Human-First" AI appliance. Running LLMs locally keeps data private, avoids network round trips to a remote API, and eliminates the recurring costs of cloud-based APIs.

The system doesn't rely on a single "jack-of-all-trades" model. Instead, it employs a specialized multi-model strategy, where specific LLMs are assigned to tasks that match their architectural strengths.

To improve speed, the Reasoning & Intelligence model (layer A, currently Llama 3.1: 8B) is kept permanently loaded in memory, so this large model is not loaded and unloaded between tasks.
The same can optionally be done for the Vision & Web Nav model (layer C, currently Qwen3-VL: 4B), depending on how often it is called.
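
One way to keep a model resident is Ollama's `keep_alive` request field: a generate request with no prompt loads the model, and a value of `-1` asks Ollama to keep it loaded indefinitely. The following is a minimal sketch under assumptions, not the Metronisys implementation: the helper names are hypothetical, and the tag `llama3.1:8b` is assumed to correspond to the Llama 3.1: 8B model named above.

```python
import json
import urllib.request

# Default Ollama endpoint; the port matches the localhost:11434 address
# mentioned in this document.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_preload_request(model: str, keep_alive: int = -1) -> urllib.request.Request:
    """Build a request that loads `model` without generating any text.

    Omitting the prompt asks Ollama to load the model only;
    keep_alive=-1 asks it to keep the model in memory indefinitely.
    """
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str, timeout: float = 120.0) -> int:
    """Send the preload request and return the HTTP status code."""
    request = build_preload_request(model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `preload("llama3.1:8b")` once at startup (with Ollama running) would pin the reasoning model; the same call with the vision model's tag would cover the optional case.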

1. The Core Architecture: Specialized Roles

Metronisys breaks down AI operations into distinct functional layers to ensure peak performance and reliability:

Layer	Model Assigned	Purpose & Function	Why It’s the Best Choice
A. Reasoning & Intelligence	Llama 3.1: 8B	Primary "brain" for complex chat, RAG answering, and the ReAct loop.	High-tier reasoning balanced with local efficiency.
B. Intent & Validation	Qwen 2.5: 1.5B	High-speed Intent Routing, Skill Selection, and Hallucination Checking.	Ultra-fast execution for "gatekeeping" logic.
C. Vision & Web Nav	Qwen3-VL: 4B	Powers Object Detection, Image Description, and autonomous Browser Navigation.	Built for spatial awareness and visual element interaction.
D. Knowledge Retrieval	Nomic-Embed-Text	Non-chat model for RAG Vector Search and document indexing.	High-performance semantic accuracy for local data.
E. OCR Extraction From Documents	glm-ocr	Extracts data from documents and images to JSON or text.	Highly specialized for layout-aware text recognition in complex PDFs.
F. Non-Document Data Extraction	qwen2.5:1.5b	Extracts required data from web articles and similar sources.	Low latency ensures scraping workflows remain fast and iterative.
G. Tool Skill Selection	qwen2.5:1.5b	Determines the most suitable tool/skill for the agent to use.	Exceptional precision in mapping text prompts to JSON tool definitions.
H. Answer Validation	qwen2.5:1.5b	Checks whether the answer is relevant to the initial user query.	Acts as an objective critic without the bias of the generating model.
I. Hallucination Check	qwen2.5:1.5b	Determines whether the agent has hallucinated; if so, the task is re-processed.	Maintains strict grounding by cross-referencing output with source context.
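
The table can be read as a lookup from layer to model, plus a gate around the main model's answer (layers H and I). The sketch below illustrates that reading only. The tag strings are assumptions derived from the model names in the table, and `model_for`, `answer_with_checks`, and the three callables are hypothetical names; real versions of the callables would each call the model assigned to their layer.

```python
from typing import Callable, Dict, Optional

# Layer letter -> model tag. Tag spellings are assumed from the table.
MODEL_FOR_LAYER: Dict[str, str] = {
    "A": "llama3.1:8b",       # Reasoning & Intelligence
    "B": "qwen2.5:1.5b",      # Intent & Validation
    "C": "qwen3-vl:4b",       # Vision & Web Nav
    "D": "nomic-embed-text",  # Knowledge Retrieval (embeddings, not chat)
    "E": "glm-ocr",           # OCR Extraction From Documents
    "F": "qwen2.5:1.5b",      # Non-Document Data Extraction
    "G": "qwen2.5:1.5b",      # Tool Skill Selection
    "H": "qwen2.5:1.5b",      # Answer Validation
    "I": "qwen2.5:1.5b",      # Hallucination Check
}


def model_for(layer: str) -> str:
    """Return the model tag assigned to a layer letter (case-insensitive)."""
    return MODEL_FOR_LAYER[layer.upper()]


def answer_with_checks(
    generate: Callable[[str], str],
    is_relevant: Callable[[str, str], bool],
    is_grounded: Callable[[str], bool],
    query: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-process the task until an answer passes both checks.

    Returns the first answer that is relevant (layer H) and grounded
    (layer I), or None if every attempt fails. The attempt limit is a
    hypothetical safeguard, not a documented Metronisys setting.
    """
    for _ in range(max_attempts):
        answer = generate(query)
        if is_relevant(query, answer) and is_grounded(answer):
            return answer
    return None
```

Passing the checks in as callables keeps the loop independent of how each model is invoked, so the small validation model can be swapped without touching the retry logic.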

2. Why This Multi-Model Approach Is Superior

Efficiency: By using a tiny model for routing and a medium model for complex thinking, Metronisys optimizes CPU/GPU usage without "burning" resources on simple checks.
Privacy: All models are served by Ollama on localhost:11434. No data ever leaves the local machine, keeping documents and logs strictly private.
Resilience: Specific timeouts and batch processing limits (e.g., 300s vision timeout) ensure the system remains stable during heavy ingestion.
Local Image Generation: Integration with Stable Diffusion provides a full-spectrum multimodal experience, from text reasoning to photorealistic image creation.
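
The timeouts and batch limits mentioned under Resilience could be expressed as a small lookup and a chunking helper. This is a sketch under assumptions: only the 300 s vision timeout comes from this document, while the default timeout, the role keys, and the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-role request timeouts in seconds. Only the vision value (300 s)
# is stated in this document; the default is a hypothetical placeholder.
TIMEOUTS_S: Dict[str, float] = {"vision": 300.0}
DEFAULT_TIMEOUT_S = 60.0


def timeout_for(role: str) -> float:
    """Return the timeout for a role, falling back to the default."""
    return TIMEOUTS_S.get(role, DEFAULT_TIMEOUT_S)


def batched(items: Sequence[T], limit: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `limit` items.

    Capping the batch size bounds how much work is in flight at once
    during heavy ingestion.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(items), limit):
        yield list(items[start:start + limit])
```

For example, `list(batched(["a.pdf", "b.pdf", "c.pdf"], 2))` yields two batches, the first with two files and the second with one.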