Google has introduced a new AI compression method called TurboQuant that can reduce the memory footprint of large language models (LLMs) by up to six times. The reduction promises lower energy consumption in data centers and opens the door to running powerful AI models directly on smartphones, a significant leap at a time when RAM […]
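To put a "six times" memory reduction in perspective, a rough back-of-envelope calculation helps. The model size and bit widths below are illustrative assumptions, not figures from the TurboQuant announcement:

```python
# Back-of-envelope: memory footprint of LLM weights at different precisions.
# The 7B parameter count and 16-bit baseline are assumptions for illustration.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

params = 7e9  # a hypothetical 7-billion-parameter model

fp16_gb = weight_memory_gb(params, 16)       # 16-bit float baseline
compressed_gb = fp16_gb / 6                  # a 6x reduction of that baseline

print(f"fp16 weights: {fp16_gb:.2f} GB")
print(f"after 6x compression: {compressed_gb:.2f} GB")
```

At these assumed numbers, the weights drop from 14 GB to roughly 2.3 GB, which is the difference between needing a data-center GPU and fitting within the RAM of a high-end smartphone.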

