The rapid adoption of agentic AI is turning large language model inference into a large-scale operational system rather than purely a software product. A massive population of devices, including personal terminals, enterprise clients, and autonomous agents, continuously generates inference requests for search, coding, monitoring, summarization, and workflow automation. In practice, these requests are offloaded to cloud or edge inference centers and billed by the token. Major providers currently use posted token-based pricing schedules, typically charging separately for input, output, and sometimes cached tokens, while system capacity is managed through quotas and regional rate limits rather than explicit local congestion pricing.
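As a concrete illustration of this posted token-based billing model, the minimal Python sketch below meters input, cached, and output tokens separately against a per-million-token price schedule. The class and function names (`PriceSchedule`, `request_cost`), the price values, and the token counts are all hypothetical and chosen for illustration; they are not taken from the paper or from any specific provider's rate card.

```python
from dataclasses import dataclass

# Hypothetical posted per-token prices, quoted in USD per 1M tokens.
# Real provider schedules differ in both structure and magnitude.
@dataclass
class PriceSchedule:
    input_per_mtok: float   # uncached input (prompt) tokens
    cached_per_mtok: float  # cache-hit input tokens, typically discounted
    output_per_mtok: float  # generated (completion) tokens

def request_cost(n_input: int, n_cached: int, n_output: int,
                 p: PriceSchedule) -> float:
    """Posted token-based billing: each token class is metered separately."""
    return (n_input * p.input_per_mtok
            + n_cached * p.cached_per_mtok
            + n_output * p.output_per_mtok) / 1_000_000

# Example: a 4,000-token prompt of which 3,000 tokens hit the cache,
# plus a 500-token completion, under an illustrative price schedule.
prices = PriceSchedule(input_per_mtok=3.0, cached_per_mtok=0.3,
                       output_per_mtok=15.0)
print(f"${request_cost(1_000, 3_000, 500, prices):.4f}")  # -> $0.0114
```

Under this illustrative schedule the request costs about $0.0114; the quotas and regional rate limits mentioned above then cap how many such requests a tenant may issue, independently of the posted price.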
Recommended citation: Pengyu Wang and Shi Chen. (2026). "A Two-Layer Hybrid Graph Mean-Field Game for Distributed LLM Inference Demand." Work in progress.

Download Paper | Download Slides