Large Language Model Inference on Mobile Phones: Solving the Resource Shortage via Processing-in-Memory
Running large language models (LLMs) on mobile phones is a promising direction in both industry and academia. On-device LLMs enhance data privacy and security because sensitive data never leaves the device. Running LLMs locally also enables real-time responses with lower latency, improving performance and user experience. However, LLMs require enormous memory and power resources, making them difficult to deploy on mobile devices. This proposal applies processing-in-memory (PIM) techniques to cut energy cost and improve inference speed by performing computation where the data resides, reducing the data movement between processor and memory that dominates the resource demands of LLM inference.
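As a rough illustration of the resource shortage, the sketch below estimates the weight footprint and per-token data traffic of a decoder-only LLM during autoregressive inference. The model size (7B parameters), decoding behavior, and energy-per-operation figures are illustrative assumptions chosen for order-of-magnitude reasoning, not measurements of any particular device.

```python
# Back-of-envelope estimate of why LLM inference is memory-bound on mobile.
# All constants below are illustrative assumptions, not measured values.

PARAMS = 7e9          # assumed model size: 7 billion parameters
BYTES_PER_PARAM = 2   # FP16 weights

# Assumed order-of-magnitude energy costs: moving a byte from off-chip
# DRAM costs far more than the arithmetic performed on it.
DRAM_PJ_PER_BYTE = 100.0  # assumed off-chip DRAM access energy (pJ/byte)
MAC_PJ = 1.0              # assumed energy of one FP16 multiply-accumulate (pJ)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weight footprint: {weights_gb:.1f} GB")  # ~14 GB, beyond typical phone RAM

# Autoregressive decoding reads every weight roughly once per generated
# token, so per-token DRAM traffic approximately equals the weight footprint.
bytes_per_token = PARAMS * BYTES_PER_PARAM
dram_energy_j = bytes_per_token * DRAM_PJ_PER_BYTE * 1e-12
mac_energy_j = PARAMS * MAC_PJ * 1e-12  # ~one MAC per parameter per token

print(f"Per-token DRAM energy:    {dram_energy_j:.3f} J")
print(f"Per-token compute energy: {mac_energy_j:.3f} J")
print(f"Data movement / compute:  {dram_energy_j / mac_energy_j:.0f}x")
```

Under these assumptions, moving the weights out of memory costs far more energy than the arithmetic performed on them, which is precisely the cost PIM targets: by executing the multiply-accumulate operations inside or next to the memory arrays, most of this off-chip traffic is eliminated.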