LightAgent is a mobile agentic framework designed for efficient smartphone task execution. It features lightweight 3B-scale Vision-Language Models that can run directly on devices. The system combines these compact models with a dynamic device-cloud collaboration approach to optimize both performance and resource usage.
The framework uses a two-stage training methodology that combines supervised fine-tuning (SFT) and GRPO (Group Relative Policy Optimization) reinforcement learning with synthetic data generation. This approach enables the 3B models to achieve performance comparable to much larger 7B-9B models. Through intelligent task orchestration and structured memory mechanisms, LightAgent reduces cloud model invocations by approximately 10% while maintaining robust performance across over 25 mobile applications in real-world scenarios.
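For intuition on the GRPO stage, such training typically optimizes the policy against a verifiable, rule-based reward computed from the model's output (the R1-V project acknowledged at the end of this README provides the implementation details we build on). The snippet below is only a hypothetical sketch of what such a reward might look like for GUI action prediction; the actual reward design used for LightAgent may differ.

```python
# Hypothetical GRPO-style reward for GUI action prediction.
# This only illustrates the idea of a verifiable, rule-based reward used during
# RL fine-tuning; it is not the exact reward used in LightAgent's training.
import json
import re

def action_reward(completion: str, reference: dict) -> float:
    """Score a model completion against a reference GUI action."""
    reward = 0.0
    # Format reward: the completion should contain a JSON action inside <action> tags.
    match = re.search(r"<action>(.*?)</action>", completion, re.DOTALL)
    if match is None:
        return 0.0
    reward += 0.2  # well-formed output
    try:
        predicted = json.loads(match.group(1))
    except json.JSONDecodeError:
        return reward
    # Accuracy reward: same action type (tap, swipe, type, ...) and same target element.
    if predicted.get("type") == reference.get("type"):
        reward += 0.4
        if predicted.get("target") == reference.get("target"):
            reward += 0.4
    return reward
```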
- LightAgent: Mobile Agentic Foundation Models
- Compact Architecture: Specialized 3B-scale Vision-Language Models optimized for mobile GUI tasks with minimal computational footprint.
- On-Device Deployment: True smartphone-compatible models that maintain competitive performance while running locally without cloud dependency.
- Dynamic Orchestration: Real-time task complexity assessment that intelligently switches between device and cloud models based on execution requirements.
- Cost-Performance Optimization: Strategic resource allocation that leverages cost-efficient on-device models while compensating for their limitations through selective cloud model usage.
- Extended Benchmark Suite: Beyond AndroidLab, incorporates 25+ additional tasks across popular mobile applications for real-world validation.
- Multi-Dimensional Assessment: Comprehensive evaluation covering performance metrics, computational efficiency, and practical deployment scenarios.
- Synthetic Data Generation: Leverages advanced MLLMs to create high-quality reasoning-chain training data, addressing the scarcity of manual annotations.
- Two-Stage Training: SFT injects GUI foundational knowledge, while GRPO reinforcement learning optimizes task completion accuracy.
- Small Model Enhancement: Enables 3B models to achieve performance comparable to 7B-9B models on GUI tasks through structured training.
- Dynamic Task Assessment: Real-time complexity evaluation determines when and how frequently to monitor device model performance.
- Intelligent Orchestration: Seamlessly switches between device and cloud models based on execution progress and failure patterns (see the sketch following this list).
- Cost-Performance Optimization: Reduces cloud invocations by ~10% while maintaining high task success rates through strategic resource allocation.
- Long-Horizon Reasoning: Multi-step chain-of-thought reasoning with reflective error correction to enhance decision-making capabilities.
- Text-Based Summarization: Compresses high-resolution screenshots into compact textual representations for efficient memory management.
- Structured Context Retention: Maintains 10-20 steps of historical context in resource-constrained environments through optimized token usage.
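The following is a minimal sketch of how the orchestration and memory mechanisms listed above could fit together in a single control loop. Every name in it (`assess_complexity`, `summarize_screen`, `device_model`, `cloud_model`, and the environment interface) is an illustrative placeholder rather than LightAgent's actual API; it only demonstrates complexity-based routing, failure-triggered escalation to the cloud model, and a compact text-based memory of recent steps.

```python
# Minimal, illustrative device-cloud orchestration loop.
# All names below (assess_complexity, summarize_screen, device_model, cloud_model,
# and the env interface) are hypothetical placeholders, not LightAgent's real APIs.
from collections import deque

MAX_MEMORY_STEPS = 20       # keep 10-20 steps of compact textual history
MAX_DEVICE_FAILURES = 2     # escalate to the cloud model after repeated on-device failures

def run_task(task, env, device_model, cloud_model,
             assess_complexity, summarize_screen, max_steps=30):
    memory = deque(maxlen=MAX_MEMORY_STEPS)   # text summaries instead of raw screenshots
    device_failures = 0
    for step in range(max_steps):
        screenshot = env.screenshot()
        screen_text = summarize_screen(screenshot)   # compress the screen to text
        # Route the step: hard steps or repeated device failures go to the cloud model.
        use_cloud = (assess_complexity(task, screen_text, memory) == "hard"
                     or device_failures >= MAX_DEVICE_FAILURES)
        model = cloud_model if use_cloud else device_model
        action = model.predict(task, screenshot, list(memory))
        ok = env.execute(action)
        if not ok and not use_cloud:
            device_failures += 1   # remember on-device failures for later escalation
        memory.append({"step": step, "screen": screen_text,
                       "action": action, "success": ok})
        if env.task_finished():
            return True
    return False
```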
This project comprises three core components designed for comprehensive mobile agent development and evaluation:
- For model training, please refer to the training guide README for comprehensive setup and execution instructions.
- For the data generation pipeline, please refer to the data preparation guide README for detailed implementation steps.
Below, we focus on evaluation using the AndroidLab benchmark framework.
Installation: Follow the official AndroidLab documentation for complete setup instructions.
Environment Configuration:
- Recommended Mode: AVD on Mac (arm64) - validated in our experiments.
- App Setup: Manual installation and task-specific configuration required.
- Compatibility Note: Original Docker images are not compatible with AVD environments.
vLLM Integration:
- Inference scripts available in ./vllm_script/ directory
- Optimized for efficient small model serving
Model Access:
- LightAgent Weights: 3B parameter model hosted on HuggingFace
- Deployment Process: Download weights → Deploy via vLLM → Configure inference service (an example request is shown below)
- Service Ready: Seamless integration with evaluation pipeline
- API Setup Required: Configure cloud model credentials in ./evaluation/evaluation.py: Line 63, Line 75, Line 81
- Coming Soon: Streamlined configuration interface in development
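Once the weights are served with vLLM, the service exposes an OpenAI-compatible API that the evaluation pipeline can call. The snippet below is an illustrative sanity check; the base URL, port, and served model name are assumptions and must match the options you passed when launching vLLM.

```python
# Illustrative client-side check that the vLLM-served model is reachable.
# The base_url, port, and model name are assumptions: adjust them to match
# how you launched vLLM (it exposes an OpenAI-compatible API by default).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="LightAgent-3B",  # must match the model name/path passed to vLLM
    messages=[{"role": "user", "content": "Describe the current screen."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```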
Test individual tasks using the following command structure:
```bash
python eval.py -n test_name -c <your path to config.yaml> --task_id task_id
```

Example Usage:

```bash
python eval.py -n all_cloud_v1_hyper -c ./configs/example_xml_cloud_hyper.yaml --task_id zoom_1
```

Convenient batch testing scripts are available in ./test_script:
- all_test_cloud_v1_hyper.sh: Evaluates all 138 AndroidLab benchmark tasks
- all_test_cloud_v1_hyper_add.sh: Evaluates tasks for four additional mobile apps
For comprehensive details about the four additional app tasks, refer to the documentation: Additional Apps Documentation
Required Configuration: Set up LLM service credentials in ./evaluation/tasks/llm_evaluator.py:
- Line 10: API configuration
- Line 12: Service URL
Enhancement: Our implementation replaces AndroidLab's rule-based evaluation with LLM-powered assessment, providing more nuanced and accurate task completion evaluation.
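For illustration only, the sketch below shows the general shape of such an LLM-powered judge; the actual prompt, model, and client configuration live in ./evaluation/tasks/llm_evaluator.py and may differ.

```python
# Hypothetical sketch of an LLM-based task-completion judge.
# The real prompt and client setup live in ./evaluation/tasks/llm_evaluator.py.
from openai import OpenAI

client = OpenAI(base_url="YOUR_SERVICE_URL", api_key="YOUR_API_KEY")

def judge_completion(task_description: str, final_state: str) -> bool:
    """Ask an LLM whether the agent completed the task, given the final device state."""
    prompt = (
        "You are evaluating a mobile GUI agent.\n"
        f"Task: {task_description}\n"
        f"Final device state (textual summary): {final_state}\n"
        "Answer with a single word: SUCCESS or FAILURE."
    )
    reply = client.chat.completions.create(
        model="YOUR_JUDGE_MODEL",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=8,
    )
    return "SUCCESS" in reply.choices[0].message.content.upper()
```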
Execute result generation with the following command:
```bash
python generate_result.py --input_folder ./logs/evaluation/ --output_folder ./logs/evaluation/ --output_excel ./logs/evaluation/test_name.xlsx
```
- Manual Transfer Required: Move the generated evaluation files from the script directory to ./logs/
- Then Execute: Run the result generation command above
- Error Prevention: This step prevents file path conflicts and ensures proper result compilation
The key findings from our online evaluation on AndroidLab are summarized as follows:
- LightAgent, when deployed in a device-cloud collaborative setting, incurs only a relatively small performance drop while effectively reducing the number of cloud model invocations.
- Notably, prompting large models for extended reasoning does not always yield better results; this benefit depends on the capability of the cloud model, and only sufficiently strong models can take advantage of such strategies.
- We also report a comparison between LightAgent-3B and both similar-sized and larger models (such as 9B models), showing that LightAgent-3B achieves performance close to that of 9B models, making it a true "small powerhouse."
- Furthermore, when compared with closed-source models, LightAgent-3B's performance is comparable to previous or lightweight versions of these proprietary models.
For each MLLM, we measure the average total steps required to complete tasks, the proportion of steps handled by the on-device model versus the cloud model, and the average steps when using only the cloud model to quantify the reduction in cloud calls. The main results are as follows:
- The cloud model is still responsible for about 65% of the steps, mainly due to the limited capacity of the smaller on-device model.
- Introducing the on-device model leads to approximately a 10% reduction in cloud calls.
- Stronger cloud models (such as GLM-4.5V) experience a smaller reduction in cloud calls, as they are capable of solving more tasks independently without relying on the on-device model.
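For reference, the step-routing statistics described above can be derived from per-task execution logs along the following lines. The log structure and the exact definition of the cloud-call reduction are assumptions for illustration, not necessarily the bookkeeping used in our evaluation.

```python
# Illustrative computation of the step-routing statistics.
# `episodes` is a hypothetical log: one list of steps per task, where each step
# records which model ("device" or "cloud") produced the action.
def routing_stats(episodes, cloud_only_avg_steps):
    num_tasks = len(episodes)
    total_steps = sum(len(ep) for ep in episodes)
    cloud_steps = sum(1 for ep in episodes for step in ep if step["model"] == "cloud")

    avg_total_steps = total_steps / num_tasks      # average steps per task
    cloud_share = cloud_steps / total_steps        # fraction of steps handled by the cloud model
    avg_cloud_calls = cloud_steps / num_tasks      # cloud invocations per task
    # Reduction relative to a cloud-only baseline needing `cloud_only_avg_steps` calls per task.
    cloud_call_reduction = 1 - avg_cloud_calls / cloud_only_avg_steps

    return {
        "avg_total_steps": avg_total_steps,
        "cloud_share": cloud_share,
        "cloud_call_reduction": cloud_call_reduction,
    }
```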
We evaluate the average inference time per step using vLLM under different GPU setups. GLM-4.1V-9B-Thinking could not run on a single 3090 GPU due to context length limits, so only two-GPU results are shown.
LightAgent, thanks to its lightweight architecture, demonstrates a clear advantage in inference speed, making it more suitable for real-world on-device scenarios. This advantage becomes even more pronounced as computational resources become constrained. In contrast, although GLM-4.1V-9B-Thinking achieves higher performance, its inference time on two 3090s is 3.5 times that of LightAgent on a single 3090, and 4 times that of LightAgent on two 3090s. Its inability to run on a single 3090 further limits its feasibility for on-device deployment.
| Model | GPUs | Size | SR (%) | Time Cost / Step |
|---|---|---|---|---|
| Qwen2.5-VL-7B-Instruct | Single 3090 | 7B | 10.1 | 6289.15 ms |
| LightAgent | Single 3090 | 3B | 15.2 | 4170.63 ms |
| GLM-4.1V-9B-Thinking | Two 3090s | 9B | 24.6 | 14584.89 ms |
| Qwen2.5-VL-7B-Instruct | Two 3090s | 7B | 10.1 | 4587.79 ms |
| LightAgent | Two 3090s | 3B | 15.2 | 3524.25 ms |
LightAgent builds upon excellent open-source projects. We sincerely thank their authors and contributors:
- AndroidLab - The benchmark framework.
- R1-V - Implementation details for the GRPO training methodology.
- LLaMA Factory - The unified training framework enabling efficient model fine-tuning.
This project is released under the MIT License.




