1. The Core Architecture: Specialized Roles
Metronisys splits AI operations into distinct functional layers, each assigned a model matched to its task:
| Layer | Model Assigned | Purpose & Function | Why It’s the Best Choice |
|---|---|---|---|
| A. Reasoning & Intelligence | Llama 3.1: 8B | Primary "brain" for complex chat, RAG answering, and the ReAct loop. | High-tier reasoning balanced with local efficiency. |
| B. Intent & Validation | Qwen 2.5: 1.5B | High-speed Intent Routing, Skill Selection, and Hallucination Checking. | Ultra-fast execution for "gatekeeping" logic. |
| C. Vision & Web Nav | Qwen3-VL: 4B | Powers Object Detection, Image Description, and autonomous Browser Navigation. | Built for spatial awareness and visual element interaction. |
| D. Knowledge Retrieval | Nomic-Embed-Text | Non-chat model for RAG Vector Search and document indexing. | High-performance semantic accuracy for local data. |
| E. OCR Extraction From Documents | glm-ocr | Extracts data from documents and images as JSON or plain text. | Highly specialized for layout-aware text recognition in complex PDFs. |
| F. Non-Document Data Extraction | Qwen 2.5: 1.5B | Extracts required data from web articles and similar non-document sources. | Low latency keeps scraping workflows fast and iterative. |
| G. Tool Skill Selection | Qwen 2.5: 1.5B | Determines the most suitable tool or skill for the agent to use. | Exceptional precision in mapping text prompts to JSON tool definitions. |
| H. Answer Validation | Qwen 2.5: 1.5B | Checks whether the answer is relevant to the initial user query. | Acts as an objective critic without the bias of the generating model. |
| I. Hallucination Check | Qwen 2.5: 1.5B | Determines whether the agent has hallucinated; if so, the task is re-processed. | Maintains strict grounding by cross-referencing output with source context. |
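
The layered design above can be sketched as a role-to-model registry plus a generate-then-verify loop. This is a minimal illustration rather than Metronisys's actual code. The lowercase model tags are assumed spellings of the models in the table. The `generate` callable, the YES/NO reply protocol, the prompt wording, and the `max_attempts` limit are all hypothetical stand-ins for whatever backend and prompts the real system uses.

```python
from dataclasses import dataclass
from typing import Callable

# Role -> model mapping taken from the table above.
# The tag spellings are assumptions, not confirmed by the source.
MODEL_FOR_ROLE = {
    "reasoning": "llama3.1:8b",
    "intent": "qwen2.5:1.5b",
    "vision": "qwen3-vl:4b",
    "embedding": "nomic-embed-text",
    "ocr": "glm-ocr",
    "extraction": "qwen2.5:1.5b",
    "skill_selection": "qwen2.5:1.5b",
    "answer_validation": "qwen2.5:1.5b",
    "hallucination_check": "qwen2.5:1.5b",
}

# Hypothetical backend signature: (model_name, prompt) -> generated text.
Generate = Callable[[str, str], str]


@dataclass
class Result:
    answer: str
    attempts: int
    accepted: bool


def _is_yes(reply: str) -> bool:
    """Interpret a checker reply under the assumed YES/NO protocol."""
    return reply.strip().upper().startswith("YES")


def answer_with_checks(query: str, context: str, generate: Generate,
                       max_attempts: int = 3) -> Result:
    """Draft an answer with the reasoning model, then gate it with the
    small model (layers H and I). A failed check triggers a re-process."""
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer = generate(
            MODEL_FOR_ROLE["reasoning"],
            f"Context:\n{context}\n\nQuestion: {query}",
        )
        relevant = generate(
            MODEL_FOR_ROLE["answer_validation"],
            f"Question: {query}\nAnswer: {answer}\n"
            "Is the answer relevant to the question? Reply YES or NO.",
        )
        grounded = generate(
            MODEL_FOR_ROLE["hallucination_check"],
            f"Context:\n{context}\nAnswer: {answer}\n"
            "Is every claim supported by the context? Reply YES or NO.",
        )
        if _is_yes(relevant) and _is_yes(grounded):
            return Result(answer, attempt, True)
    # All attempts failed a check: return the last draft, marked unaccepted.
    return Result(answer, max_attempts, False)


if __name__ == "__main__":
    # Stub backend so the sketch runs without any model server.
    def stub(model: str, prompt: str) -> str:
        return "Paris" if model == MODEL_FOR_ROLE["reasoning"] else "YES"

    print(answer_with_checks("Capital of France?",
                             "Paris is the capital of France.", stub))
```

Keeping the checks on the small model means each retry costs one large-model call plus two cheap ones, which is the trade-off the "gatekeeping" rows in the table describe.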