
✨LightAgent✨: Lightweight and Cost-Effective Mobile Agents

Demo animation
Hugging Face model

LightAgent is a mobile agentic framework designed for efficient smartphone task execution. It features lightweight 3B-scale Vision-Language Models that can run directly on devices. The system combines these compact models with a dynamic device-cloud collaboration approach to optimize both performance and resource usage.

The framework uses a two-stage training methodology combining SFT and GRPO reinforcement learning with synthetic data generation. This approach enables the 3B models to achieve performance comparable to much larger 7B-9B models. Through intelligent task orchestration and structured memory mechanisms, LightAgent reduces cloud dependency by approximately 10% while maintaining robust performance across over 25 mobile applications in real-world scenarios.




🌟 Key Features of LightAgent

🤖 Lightweight Agentic Foundation Models

• Compact Architecture: Specialized 3B-scale Vision-Language Models optimized for mobile GUI tasks with minimal computational footprint.
• On-Device Deployment: True smartphone-compatible models that maintain competitive performance while running locally without cloud dependency.

☁️ Device-Cloud Collaboration Framework

• Dynamic Orchestration: Real-time task complexity assessment that intelligently switches between device and cloud models based on execution requirements.
• Cost-Performance Optimization: Strategic resource allocation that leverages cost-efficient on-device models while compensating for their limitations through selective cloud model usage.

🎯 Comprehensive Mobile Agent Evaluation Playground

• Extended Benchmark Suite: Beyond AndroidLab, incorporating 25+ additional tasks across popular mobile applications for real-world validation.
• Multi-Dimensional Assessment: Comprehensive evaluation covering performance metrics, computational efficiency, and practical deployment scenarios.


🌟 Core Solutions of LightAgent

🧠 Model Training: SFT+RL

• Synthetic Data Generation: Leverages advanced MLLMs to create high-quality reasoning chain training data, addressing the scarcity of manual annotations.
• Two-Stage Training: SFT injects GUI foundational knowledge, while GRPO reinforcement learning optimizes task completion accuracy.
• Small Model Enhancement: Enables 3B models to achieve performance comparable to 7B-9B models on GUI tasks through structured training (a rough reward sketch follows this list).
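As a rough illustration of the reinforcement-learning stage, the snippet below sketches a hypothetical action-level reward that a GRPO-style trainer could optimize; the action schema, field names, and partial-credit values are assumptions for illustration, not the project's actual reward function.

```python
# Hypothetical GRPO-style reward for GUI actions (illustrative only; the
# action schema and scoring are assumptions, not LightAgent's actual reward).
def action_reward(predicted: dict, reference: dict, tol: float = 0.05) -> float:
    """Score a predicted action against the reference action for one step."""
    if predicted.get("type") != reference.get("type"):
        return 0.0  # wrong action type (e.g. tap vs. type_text vs. swipe)
    if predicted["type"] == "tap":
        # Compare normalized screen coordinates within a small tolerance.
        close = (abs(predicted["x"] - reference["x"]) <= tol
                 and abs(predicted["y"] - reference["y"]) <= tol)
        return 1.0 if close else 0.2  # partial credit for a near miss
    if predicted["type"] == "type_text":
        return 1.0 if predicted["text"].strip() == reference["text"].strip() else 0.0
    return 1.0  # other action types: matching the type is enough in this sketch
```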

☁️ Device-Cloud Collaboration Framework

• Dynamic Task Assessment: Real-time complexity evaluation determines when and how frequently to monitor device model performance.
• Intelligent Orchestration: Seamlessly switches between device and cloud models based on execution progress and failure patterns (see the sketch after this list).
• Cost-Performance Optimization: Reduces cloud invocations by ~10% while maintaining high task success rates through strategic resource allocation.
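A minimal sketch of this switching idea is shown below; the `task`, `device_model`, and `cloud_model` interfaces, the thresholds, and the escalation rule are illustrative assumptions rather than the framework's actual API.

```python
# Illustrative device-cloud orchestration loop (hypothetical interfaces and thresholds).
def run_task(task, device_model, cloud_model, max_steps=30, max_device_failures=2):
    device_failures = 0
    history = []
    for _ in range(max_steps):
        # Prefer the cheap on-device model; escalate to the cloud model once the
        # task looks complex or the device model has failed repeatedly.
        use_cloud = device_failures >= max_device_failures or task.is_complex(history)
        model = cloud_model if use_cloud else device_model
        action = model.predict(task.screenshot(), history)
        ok = task.execute(action)
        history.append((action, ok))
        if ok and task.done():
            return True
        if not ok and model is device_model:
            device_failures += 1
    return False
```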

💾 Efficient Memory Mechanism for Mobile Agents

• Long-Horizon Reasoning: Multi-step chain-of-thought reasoning with reflective error correction to enhance decision-making capabilities.
• Text-Based Summarization: Compresses high-resolution screenshots into compact textual representations for efficient memory management.
• Structured Context Retention: Maintains 10-20 steps of historical context in resource-constrained environments through optimized token usage (a minimal sketch follows).
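The snippet below is a minimal sketch of such a text-based memory buffer, assuming each screenshot has already been compressed into a short textual summary; the class and field names are placeholders, not the project's implementation.

```python
# Sketch of a text-based step memory (assumed structure, not LightAgent's actual code).
from collections import deque

class StepMemory:
    """Keep compact textual summaries of recent steps instead of raw screenshots."""

    def __init__(self, max_steps: int = 20):
        # Roughly the 10-20 step window described above.
        self.steps = deque(maxlen=max_steps)

    def add(self, action: str, screen_summary: str, outcome: str) -> None:
        # screen_summary is a short description derived from the screenshot,
        # e.g. "Settings page, Wi-Fi toggle visible, keyboard closed".
        self.steps.append(f"action={action}; screen={screen_summary}; result={outcome}")

    def as_context(self) -> str:
        # Plain-text history that can be appended to the model prompt.
        return "\n".join(self.steps)
```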



🚀 Quick Start

This project comprises three core components designed for comprehensive mobile agent development and evaluation:

  • ⚡ For model training, please refer to the training guide README for comprehensive setup and execution instructions.
  • 🔧 For the data generation pipeline, please refer to the data preparation guide README for detailed implementation steps.

Below, we focus on evaluation using the AndroidLab benchmark framework.

📱 AndroidLab Benchmark Setup

Installation: Follow the official AndroidLab documentation for complete setup instructions.

Environment Configuration:

  • Recommended Mode: AVD on Mac (arm64) - validated in our experiments.
  • App Setup: Manual installation and task-specific configuration required.
  • Compatibility Note: Original Docker images are not compatible with AVD environments.

🚀 Model Deployment & Inference

vLLM Integration:

  • Inference scripts available in ./vllm_script/ directory
  • Optimized for efficient small model serving

Model Access:

  • LightAgent Weights: 3B parameter model hosted on HuggingFace
  • Deployment Process: Download weights → Deploy via vLLM → Configure inference service (an example request against the local service is sketched below)
  • Service Ready: Seamless integration with evaluation pipeline
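
Assuming the weights are served through vLLM's OpenAI-compatible API server, a minimal text-only request against the local endpoint might look like the sketch below; the model name, port, and prompt are placeholders, and a real agent step would also attach the current screenshot as an image input.

```python
# Query a locally served LightAgent model via vLLM's OpenAI-compatible API.
# Model name, port, and prompt are placeholders; adjust them to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="LightAgent-3B",  # must match the model name the vLLM server exposes
    messages=[{"role": "user",
               "content": "Open the Zoom app and start a new meeting."}],
)
print(response.choices[0].message.content)
```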

⚙️ Pre-Testing Configuration

  • API Setup Required: Configure cloud model credentials in ./evaluation/evaluation.py: Line 63, Line 75, Line 81
  • Coming Soon: Streamlined configuration interface in development

🧪 Testing & Evaluation

Single Task Testing

Test individual tasks using the following command structure:

python eval.py -n <test_name> -c <path/to/config.yaml> --task_id <task_id>

Example Usage:

python eval.py -n all_cloud_v1_hyper -c ./configs/example_xml_cloud_hyper.yaml --task_id zoom_1

Batch Evaluation Scripts

Convenient batch testing scripts are available in ./test_script:

• all_test_cloud_v1_hyper.sh: Evaluates all 138 AndroidLab benchmark tasks
• all_test_cloud_v1_hyper_add.sh: Evaluates tasks for four additional mobile apps

Additional App Documentation

For comprehensive details about the four additional app tasks, refer to the documentation: Additional Apps Documentation


📊 Result Generation

LLM Evaluator Setup

Required Configuration: Set up LLM service credentials in ./evaluation/tasks/llm_evaluator.py:

• Line 10: API configuration
• Line 12: Service URL

💡 Enhancement: Our implementation replaces AndroidLab's rule-based evaluation with LLM-powered assessment, providing more nuanced and accurate task completion evaluation.

Generate Evaluation Results

Execute result generation with the following command:

python generate_result.py --input_folder ./logs/evaluation/ --output_folder ./logs/evaluation/ --output_excel ./logs/evaluation/test_name.xlsx

Batch Testing File Management

⚠️ Important: When using batch scripts from ./test_script/:
• Manual Transfer Required: Move the generated evaluation files from the script directory to ./logs/ (see the sketch below)
• Then Execute: Run the result generation command above
• Error Prevention: This step prevents file path conflicts and ensures proper result compilation
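
As a rough sketch of that transfer step, the snippet below moves evaluation folders produced by the batch scripts into ./logs/; the glob pattern and directory layout are assumptions, so adjust them to match the paths your scripts actually write and the --input_folder you pass to generate_result.py.

```python
# Move batch-script outputs into ./logs/ before running generate_result.py.
# The source pattern and destination are assumptions; adjust to your setup.
import shutil
from pathlib import Path

src = Path("./test_script")
dst = Path("./logs")
dst.mkdir(parents=True, exist_ok=True)

for item in src.glob("*hyper*"):  # evaluation outputs produced by the batch scripts
    if item.is_dir():
        shutil.move(str(item), str(dst / item.name))
```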

🎯 Evaluation Results

The key findings from our online evaluation on AndroidLab are summarized as follows:

  • LightAgent, when deployed in a device-cloud collaborative setting, incurs only a relatively small performance drop while effectively reducing the number of cloud model invocations.
  • Notably, prompting large models for extended reasoning does not always yield better results; this benefit depends on the capability of the cloud model, and only sufficiently strong models can take advantage of such strategies.
  • We also report a comparison between LightAgent-3B and both similar-sized and larger models (such as 9B models), showing that LightAgent-3B achieves performance close to that of 9B models, making it a true "small powerhouse."
  • Furthermore, when compared with closed-source models, LightAgent-3B's performance is comparable to previous or lightweight versions of these proprietary models.

For each MLLM, we measure the average total steps required to complete tasks, the proportion of steps handled by the on-device model versus the cloud model, and the average steps when using only the cloud model to quantify the reduction in cloud calls. The main results are as follows:

  • The cloud model is still responsible for about 65% of the steps, mainly due to the limited capacity of the smaller on-device model.
  • Introducing the on-device model leads to approximately a 10% reduction in cloud calls.
  • Stronger cloud models (such as GLM-4.5V) experience a smaller reduction in cloud calls, as they are capable of solving more tasks independently without relying on the on-device model.

We evaluate the average inference time per step using vLLM under different GPU setups. GLM-4.1V-9B-Thinking could not run on a single 3090 GPU due to context length limits, so only two-GPU results are shown.

LightAgent, thanks to its lightweight architecture, demonstrates a clear advantage in inference speed, making it more suitable for real-world on-device scenarios. This advantage becomes even more pronounced as computational resources become constrained. In contrast, although GLM-4.1V-9B-Thinking achieves higher performance, its inference time on two 3090s is 3.5 times that of LightAgent on a single 3090, and 4 times that of LightAgent on two 3090s. Its inability to run on a single 3090 further limits its feasibility for on-device deployment.

| Model | GPUs | Size | SR | Time Cost / Step |
| --- | --- | --- | --- | --- |
| Qwen2.5-VL-7B-Instruct | Single 3090 | 7B | 10.1 | 6289.15 ms |
| LightAgent | Single 3090 | 3B | 15.2 | 4170.63 ms |
| GLM-4.1V-9B-Thinking | Two 3090s | 9B | 24.6 | 14584.89 ms |
| Qwen2.5-VL-7B-Instruct | Two 3090s | 7B | 10.1 | 4587.79 ms |
| LightAgent | Two 3090s | 3B | 15.2 | 3524.25 ms |

🔗 Related Projects

LightAgent builds upon excellent open-source projects. We sincerely thank their authors and contributors:

  • AndroidLab - The benchmark framework.
  • R1-V - Implementation details for the GRPO training methodology.
  • LLaMA Factory - The unified training framework enabling efficient model fine-tuning.

📜 License

This project is released under the MIT License.


❤️ Thanks for visiting ✨ LightAgent!
