Abstract: Mobile-edge large language model (LLM) deployments face inherent constraints, such as limited computational resources and network bandwidth. Although retrieval-augmented generation (RAG) ...