Agent_Studio

English | 한국어 | 简体中文



🔨 AgentStudio - Pseudo-Lab 11th AI Agent Project
“Bridging the intergenerational knowledge gap with AI and sharing positive influence.”


🤖 Kiosk Agent

Vision-Language-Action (VLA) Agent for Automated Kiosk Interaction

Kiosk Agent is an AI system that uses Vision-Language Models (VLMs) to automatically control Android kiosk applications. It interprets the visual interface and executes precise actions, assisting users who find digital kiosks challenging.


✨ Features


🧠 Model Configuration

AgentStudio allows you to switch between different Vision-Language Models depending on your needs.

| Provider | Model | Status | Key Advantage |
|-----------|----------------|--------------|------------------------------------------|
| Google | gemini-3-flash | ✅ Supported | Low latency and cost-efficient |
| Google | gemini-3-pro | ✅ Supported | Advanced reasoning for complex UIs |
| OpenAI | gpt-4o-mini | ✅ Supported | Robust performance across various tasks |
| Google | gemma-3-27b | 🔜 Roadmap | Optimized for on-device/local privacy |
| Microsoft | Fara-7B | 🔜 Roadmap | Compact on-device computer-use agent |

To switch models, update your .env file:

```bash
MODEL_PROVIDER=gemini
GEMINI_MODEL=gemini-3-flash  # Options: gemini-3-flash, gemini-3-pro
```
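
For reference, a minimal sketch of how this switch could be wired in Python, assuming python-dotenv plus the official google-genai and openai SDKs; the variable names here are illustrative, not AgentStudio's actual code:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads MODEL_PROVIDER, GEMINI_MODEL, and API keys from .env

provider = os.getenv("MODEL_PROVIDER", "gemini")
if provider == "gemini":
    from google import genai  # google-genai SDK

    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    model = os.getenv("GEMINI_MODEL", "gemini-3-flash")
elif provider == "openai":
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    model = "gpt-4o-mini"
else:
    raise ValueError(f"Unsupported MODEL_PROVIDER: {provider}")
```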


📐 Architecture

🔄 VLA Workflow

The VLA paradigm is a continuous cycle in which the agent observes the screen, reasons about it, and executes an action.

```mermaid
flowchart LR
    A[Screen Capture] --> B[VLM Reasoning]
    B --> C[Action Decode]
    C --> D[Execute ADB]
    D --> E{Done?}
    E -->|No| A
    E -->|FINISH| F[Complete]
    E -->|INTERRUPT| G[Human Input]
    G --> A
```

| Phase | Description |
|----------------|--------------------------------------------------------------------|
| Screen Capture | Captures the Android device screen via ADB |
| VLM Reasoning | The VLM (Gemini by default) analyzes the screen to decide the next action |
| Action Decode | Parses VLM output into structured, executable commands |
| Execute ADB | Controls the device using ADB (tap, swipe, input) |
| INTERRUPT | Triggers human-in-the-loop (HITL) when user intervention is required |
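
To make the cycle concrete, here is a hedged Python sketch of one iteration. Only the `adb` invocations are standard; `query_vlm` is a placeholder for the project's actual reasoning and decoding steps:

```python
import subprocess

def capture_screen() -> bytes:
    # Screen Capture: `adb exec-out screencap -p` streams the screen as PNG bytes.
    return subprocess.run(
        ["adb", "exec-out", "screencap", "-p"],
        capture_output=True, check=True,
    ).stdout

def query_vlm(png: bytes) -> tuple[str, list[str]]:
    # VLM Reasoning + Action Decode: placeholder stub for the real model call.
    return "FINISH", []

def execute(name: str, params: list[str]) -> None:
    # Execute ADB: map a decoded action onto `adb shell input` commands.
    # Note: `input text` requires spaces in the payload escaped as %s.
    cmd = {"CLICK": "tap", "SWIPE": "swipe", "INPUT": "text"}[name]
    subprocess.run(["adb", "shell", "input", cmd, *params], check=True)

while True:
    name, params = query_vlm(capture_screen())
    if name == "FINISH":
        break
    if name == "INTERRUPT":
        input(f"[HITL] {' '.join(params)} (press Enter to resume) ")
        continue
    execute(name, params)
```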

🔀 LangGraph State Machine

We manage the agent's control flow with LangGraph for stable, explicit state transitions.

```mermaid
flowchart TD
    START([Start]) --> VLM[VLM Node]
    VLM --> EXEC[Execute Node]
    EXEC --> ROUTER{Router}
    ROUTER -->|LOOP| VLM
    ROUTER -->|INTERRUPT| HUMAN[Human Node]
    ROUTER -->|FINISH| END([End])
    HUMAN -->|Resume| VLM
    HUMAN -->|Abort| END
```
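
As a minimal sketch, the diagram above could be wired with LangGraph's `StateGraph` API roughly as follows; the state fields and node bodies are placeholders rather than the project's actual implementation (the Abort branch is omitted for brevity):

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    screenshot: bytes  # latest screen capture
    action: str        # decoded action string
    status: str        # "LOOP" | "INTERRUPT" | "FINISH"

def vlm_node(state: AgentState) -> dict:
    # Placeholder: call the VLM on the screenshot and decode the next action.
    return {"action": "FINISH"}

def execute_node(state: AgentState) -> dict:
    # Placeholder: run the action over ADB, then report how to route next.
    return {"status": "FINISH"}

def human_node(state: AgentState) -> dict:
    # Placeholder: collect user input (HITL) before resuming the loop.
    return {"status": "LOOP"}

graph = StateGraph(AgentState)
graph.add_node("vlm", vlm_node)
graph.add_node("execute", execute_node)
graph.add_node("human", human_node)
graph.add_edge(START, "vlm")
graph.add_edge("vlm", "execute")
graph.add_conditional_edges(
    "execute",
    lambda state: state["status"],  # Router
    {"LOOP": "vlm", "INTERRUPT": "human", "FINISH": END},
)
graph.add_edge("human", "vlm")  # Resume
app = graph.compile()
```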


🚀 Installation

Prerequisites

- [uv](https://github.com/astral-sh/uv) for Python environment management
- ADB (Android Debug Bridge) with a connected Android device or emulator
- A `GOOGLE_API_KEY` for the default Gemini provider (see Step 3)

Step 1: Clone Repository

```bash
git clone https://github.com/Pseudo-Lab/Agent_Studio.git
cd Agent_Studio
```

Step 2: Environment Setup (using uv)

```bash
# Create and activate virtual environment
uv venv .venv
source .venv/bin/activate

# Install dependencies in editable mode
uv pip install -e backend/
```

Step 3: Configure Environment Variables

```bash
cp .env.example .env
# Edit .env with your GOOGLE_API_KEY
```


🎯 Supported Actions

| Action | Parameters | Description |
|-----------|----------------|----------------------------------|
| CLICK | x, y | Tap specific coordinates |
| INPUT | text | Type text into a field |
| SWIPE | x1, y1, x2, y2 | Scroll or navigate |
| INTERRUPT | question | Ask the user for guidance (HITL) |
| FINISH | - | Task completed successfully |
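
As an illustration of how a raw VLM reply might be decoded into one of these actions, here is a small parser; the `Action` dataclass and the `NAME(arg, ...)` output format are assumptions for the example, not the project's actual schema:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str                       # CLICK / INPUT / SWIPE / INTERRUPT / FINISH
    params: list[str] = field(default_factory=list)

def parse_action(raw: str) -> Action:
    # "CLICK(540, 1200)" -> Action(name="CLICK", params=["540", "1200"])
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", raw.strip())
    if match is None:
        raise ValueError(f"Unparseable action: {raw!r}")
    name, args = match.groups()
    params = [a.strip() for a in (args or "").split(",") if a.strip()]
    return Action(name=name, params=params)

assert parse_action("FINISH").name == "FINISH"
assert parse_action("CLICK(540, 1200)").params == ["540", "1200"]
```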

🗓️ Roadmap

✅ v1.0.0 (Current)

🔜 v1.1.0 (Scheduled Jan 2026)


👥 Team: Agent Studio (Pseudo-Lab)

| Name | Role | Focus |
|----------------|---------|-------------------------------------------|
| Jaehyun Kim | Builder | Frontend (Next.js), Backend (FastAPI) |
| Seunghyeok Kim | Runner | LangGraph, Reasoning, Prompt Engineering |
| Gyumin Lee | Runner | VLA Mechanism, LangGraph Architecture |
| Minjung Jeon | Runner | Voice (TTS/STT), Google ADK |

🗞 License

This project is licensed under the Apache License 2.0.


Developed with ❤️ by Pseudo-Lab