Augmented Reality LLM Training System for Complex Machine Operations
Our AR-MLLM system integrates augmented reality with multimodal language models (MLLMs; e.g., ChatGPT) to provide context-aware guidance and activity recognition in complex machine tasks, reducing errors and training time for non-experts.
Key Components
- Context-Aware Guidance: Converts technical manuals and machine feedback into step-by-step AR overlays anchored to physical equipment, using the TARCO prompt framework for precise, deterministic outputs.
- Hardware and Software: Microsoft HoloLens 2 for immersive AR, the Unity engine for development, and Azure OpenAI for MLLM integration, with real-time image capture and spatial anchoring.
- Algorithm: Dual validation (a heuristic check for image quality and a semantic check via the MLLM) screens each input; the system extracts features and actions from validated inputs and activates the corresponding prefabs for interactive training.
- Simple Operation: The user captures images of manuals or machine states; the system interprets them via prompts and renders the guidance. Supported tasks include stylus qualification and feature measurement.
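The dual validation step described above can be illustrated with a minimal Python sketch. This is not the system's implementation (which runs in Unity on HoloLens 2); every function name, threshold, and the order of the two checks are assumptions chosen for illustration. The heuristic stage uses mean brightness and pixel standard deviation as crude stand-ins for whatever image-quality measures the system applies, and the semantic stage is a caller-supplied placeholder for the MLLM call rather than a real Azure OpenAI request.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Grayscale image as rows of pixel values in 0..255 (illustrative representation).
Image = List[List[int]]


@dataclass
class ValidationResult:
    accepted: bool
    reason: str
    label: Optional[str] = None  # task label from the semantic stage, if accepted


def mean_brightness(image: Image) -> float:
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def contrast(image: Image) -> float:
    """Standard deviation of pixel values; a crude proxy for image quality."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5


def heuristic_check(
    image: Image,
    min_brightness: float = 40.0,   # hypothetical thresholds, not from the source
    max_brightness: float = 220.0,
    min_contrast: float = 10.0,
) -> Optional[str]:
    """Return a rejection reason, or None if the image passes."""
    if not image or not image[0]:
        return "empty image"
    brightness = mean_brightness(image)
    if brightness < min_brightness:
        return "too dark"
    if brightness > max_brightness:
        return "overexposed"
    if contrast(image) < min_contrast:
        return "low contrast"
    return None


def dual_validate(
    image: Image,
    semantic_check: Callable[[Image], Optional[str]],
) -> ValidationResult:
    """Run the heuristic check, then the semantic check only if the first passes.

    `semantic_check` stands in for the MLLM: it returns a recognized task
    label, or None when the image content is not recognized.
    """
    reason = heuristic_check(image)
    if reason is not None:
        return ValidationResult(False, reason)
    label = semantic_check(image)
    if label is None:
        return ValidationResult(False, "content not recognized")
    return ValidationResult(True, "ok", label)
```

Running the heuristic stage first in this sketch means a poor capture is rejected before any model request is made; whether the actual system orders the checks this way is not stated in the description above.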
Benefits
- Enhances efficiency by reducing workload and shortening task completion time.
- Improves accuracy in activity recognition and measurements.
- Adapts to broader industrial applications and minimizes supervision needs.