Why can't you see your agent? Can your agent see you?

Agents can't see what we see unless we share the same space. Robotics isn't here yet, and won't be for a while. Until then, embodied AI, agentic avatars, AI NPCs, or "meta humans" (there are so many names!) are going to be the next frontier for agents.

Right now your AI assistant lives inside a text box. It processes words, returns words, and has no idea what's happening around you. You can describe a scene to it, but it cannot observe one. You can tell it you're frustrated, but it cannot read your face. By default, the interface carries text and nothing else.

The missing layer

Vision models exist. Voice models exist. Emotion-detection models exist. What doesn't exist yet is a coherent, local-first stack that ties them together into something that actually lives with you — an agent that can look through a camera, hear your voice, watch your screen, and respond with the full context of your environment.

That gap is exactly what we're working on at Miniloader.
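As a rough illustration of what "tying them together" could mean, here is a minimal Python sketch. It is not Miniloader's implementation. The names Observation, ContextBuffer, and max_age_seconds are hypothetical, and the sketch assumes that separate vision, speech, and screen models have already turned their inputs into short text summaries. It only shows the last step: merging the freshest summary from each source into one block of context an agent could read.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
import time


@dataclass
class Observation:
    """One text summary from one input source (hypothetical structure)."""
    modality: str   # e.g. "camera", "microphone", "screen"
    summary: str    # assumed to be produced by some upstream model
    timestamp: float = field(default_factory=time.time)


class ContextBuffer:
    """Keeps the most recent observation per modality and drops stale ones."""

    def __init__(self, max_age_seconds: float = 10.0):
        self.max_age_seconds = max_age_seconds
        self._latest: Dict[str, Observation] = {}

    def add(self, obs: Observation) -> None:
        # Only replace an entry if the new observation is at least as recent.
        current = self._latest.get(obs.modality)
        if current is None or obs.timestamp >= current.timestamp:
            self._latest[obs.modality] = obs

    def build_context(self, now: Optional[float] = None) -> str:
        # Emit one line per modality, skipping anything older than max_age_seconds.
        now = time.time() if now is None else now
        lines = []
        for modality in sorted(self._latest):
            obs = self._latest[modality]
            if now - obs.timestamp <= self.max_age_seconds:
                lines.append(f"[{modality}] {obs.summary}")
        return "\n".join(lines)


if __name__ == "__main__":
    buf = ContextBuffer(max_age_seconds=10.0)
    buf.add(Observation("camera", "user is waving", timestamp=100.0))
    buf.add(Observation("microphone", "user said hello", timestamp=101.0))
    buf.add(Observation("screen", "code editor is open", timestamp=80.0))
    print(buf.build_context(now=102.0))
    # [camera] user is waving
    # [microphone] user said hello
```

The screen observation is left out of the output because it is older than the ten-second window. In a design like this, dropping stale observations is what keeps the agent's context tied to what is happening now rather than to what it saw a minute ago.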

Embodied AI isn't just robotics

When most people think about embodied AI they think about robot arms and delivery drones. That's one path. But the more immediate path is software-native embodiment: an AI agent that has eyes through your webcam, ears through your microphone, and presence through an animated interface — something that can see you wave hello and say hi back.

The image at the top of this post shows an early prototype: an agent with a visual presence, a speech bubble, real-time speech recognition, and a live camera feed of the person talking to it. It's rough. But it works.

What's next

This kind of interface is the natural next step past the chat box. The technologies are ready. The workflow just hasn't been assembled yet, and assembling it is exactly what Miniloader is designed to do.

Join our early access to try it first.