LLM & AI Model Selection

Strategic deployment of specialized local intelligence layers.

Metronisys LLM & AI Model Selection

Metronisys uses Ollama as its core local inference engine, creating a highly modular, "Human-First" AI appliance. By running LLMs locally, Metronisys keeps data on the device, avoids network round-trips to a remote API, and eliminates the recurring costs of cloud-based APIs.

The system doesn't rely on a single "jack-of-all-trades" model. Instead, it employs a specialized multi-model strategy, where specific LLMs are assigned to tasks that match their architectural strengths.

To improve speed, we keep the A. Reasoning & Intelligence model (currently Llama 3.1: 8B) permanently pre-loaded in memory so that this large model is not loaded and unloaded between tasks.
We can optionally do the same for the C. Vision & Web Nav model (currently Qwen3-VL: 4B), depending on how often it is called.
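
As a rough illustration of the pre-loading step, the sketch below uses Ollama's generate endpoint: a request that names a model but carries no prompt only loads the model, and a negative keep_alive asks the server to keep it resident. The tag llama3.1:8b and the helper names build_preload_request and preload are assumptions made for this example, not Metronisys code.

```python
import json
import urllib.request

# Local Ollama endpoint; the port matches the one mentioned on this page.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_preload_request(model: str, keep_alive=-1) -> dict:
    """Build a request body that loads `model` without generating any text.

    With no prompt, Ollama only loads the model into memory. A negative
    keep_alive asks the server to keep the model loaded indefinitely.
    """
    return {"model": model, "keep_alive": keep_alive, "stream": False}


def preload(model: str, keep_alive=-1, timeout: float = 120.0) -> dict:
    """Send the load-only request to the local Ollama server (hypothetical helper)."""
    body = json.dumps(build_preload_request(model, keep_alive)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling preload("llama3.1:8b") once at startup would keep the reasoning model resident; the same call with the vision model's tag covers the optional second case.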

1. The Core Architecture: Specialized Roles

Metronisys breaks down AI operations into distinct functional layers to ensure peak performance and reliability:

| Layer | Model Assigned | Purpose & Function | Why It’s the Best Choice |
| A. Reasoning & Intelligence | Llama 3.1: 8B | Primary "brain" for complex chat, RAG answering, and the ReAct loop. | High-tier reasoning balanced with local efficiency. |
| B. Intent & Validation | Qwen 2.5: 1.5B | High-speed Intent Routing, Skill Selection, and Hallucination Checking. | Ultra-fast execution for "gatekeeping" logic. |
| C. Vision & Web Nav | Qwen3-VL: 4B | Powers Object Detection, Image Description, and autonomous Browser Navigation. | Built for spatial awareness and visual element interaction. |
| D. Knowledge Retrieval | Nomic-Embed-Text | Non-chat model for RAG Vector Search and document indexing. | High-performance semantic accuracy for local data. |
| E. OCR Extraction From Documents | glm-ocr | Extracts data from documents and images to JSON/text. | Highly specialized for layout-aware text recognition in complex PDFs. |
| F. Non-Document Data Extraction | qwen2.5:1.5b | Extracts required data from web articles, etc. | Low latency ensures scraping workflows remain fast and iterative. |
| G. Tool Skill Selection | qwen2.5:1.5b | Determines the most suitable tool/skill for the agent to use. | Exceptional precision in mapping text prompts to JSON tool definitions. |
| H. Answer Validation | qwen2.5:1.5b | Checks whether the answer is relevant to the initial user query. | Acts as an objective critic without the bias of the generating model. |
| I. Hallucination Check | qwen2.5:1.5b | Determines whether the agent has hallucinated (if yes, the task is re-processed). | Maintains strict grounding by cross-referencing output with source context. |
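
One way to picture the layers in the table is as a role-to-model lookup, with the small model checking the large model's output and triggering a re-run when the check fails. The sketch below is illustrative only: the role keys, the lowercase model tags, the prompts, the GROUNDED verdict convention, and the answer_with_checks helper are all assumptions, and the generate callable stands in for whatever actually calls the local models.

```python
from typing import Callable, Tuple

# Role -> model tag, following the table above. Role keys and the lowercase
# "name:size" tag spelling are assumptions made for this sketch.
MODEL_FOR_ROLE = {
    "reasoning": "llama3.1:8b",
    "intent": "qwen2.5:1.5b",
    "vision": "qwen3-vl:4b",
    "embedding": "nomic-embed-text",
    "ocr": "glm-ocr",
    "extraction": "qwen2.5:1.5b",
    "skill_selection": "qwen2.5:1.5b",
    "answer_validation": "qwen2.5:1.5b",
    "hallucination_check": "qwen2.5:1.5b",
}

# (model tag, prompt) -> generated text; injected so the sketch runs offline.
Generate = Callable[[str, str], str]


def answer_with_checks(
    query: str, context: str, generate: Generate, max_attempts: int = 3
) -> Tuple[str, int]:
    """Answer with the reasoning model, then re-process if the check fails.

    Returns the accepted answer and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        answer = generate(
            MODEL_FOR_ROLE["reasoning"],
            f"Context:\n{context}\n\nQuestion: {query}",
        )
        # Hypothetical prompt: the checker replies GROUNDED or HALLUCINATED.
        verdict = generate(
            MODEL_FOR_ROLE["hallucination_check"],
            f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
            "Reply GROUNDED if the answer is supported by the context, "
            "otherwise reply HALLUCINATED.",
        )
        if verdict.strip().upper().startswith("GROUNDED"):
            return answer, attempt
    raise RuntimeError(f"no grounded answer after {max_attempts} attempts")
```

Passing generate in as a parameter keeps the retry logic separate from the transport, so the loop can be exercised with a stub before any model is installed.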

2. Why This Multi-Model Approach Is Superior

  • Efficiency: By using a tiny model for routing and a medium model for complex thinking, Metronisys optimizes CPU/GPU usage without "burning" resources on simple checks.
  • Privacy: All models run on localhost:11434. No data ever leaves the local machine, keeping documents and logs strictly private.
  • Resilience: Specific timeouts and batch processing limits (e.g., 300s vision timeout) ensure the system remains stable during heavy ingestion.
  • Local Image Generation: Integration with Stable Diffusion provides a full-spectrum multimodal experience, from text reasoning to photorealistic image creation.
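
The resilience point above (per-task timeouts and batch limits) could be expressed roughly as follows. Only the 300-second vision timeout comes from this page; the default timeout, the batch size, and the helper names timeout_for and batches are hypothetical values chosen for the example.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# The 300 s vision timeout is quoted on this page; everything else here
# is a hypothetical placeholder.
TIMEOUT_SECONDS = {"vision": 300.0}
DEFAULT_TIMEOUT_SECONDS = 60.0  # hypothetical default for other roles
INGEST_BATCH_SIZE = 16  # hypothetical batch limit for ingestion


def timeout_for(role: str) -> float:
    """Return the timeout for a role, falling back to the default."""
    return TIMEOUT_SECONDS.get(role, DEFAULT_TIMEOUT_SECONDS)


def batches(items: Sequence[T], size: int = INGEST_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Processing documents one bounded batch at a time, each call capped by timeout_for(role), keeps a single slow item from stalling a long ingestion run.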

