- Alibaba's Zerosearch can generate training material for its AI
- Cost savings of up to 88% are possible
- The technique requires additional GPUs
Alibaba’s Tongyi Lab has found a way to train AI search models without using real search engines, which it says can reduce search training costs by up to 88% compared with commercial APIs like Google’s.
In a paper entitled “Incentivize the Search Capability of LLMs without Searching,” Alibaba explains how the approach uses simulated, AI-generated documents to mimic real search engine outputs.
Interestingly, Alibaba’s researchers also note that using simulated documents can actually improve the quality of training because “the quality of documents returned by search engines is often unpredictable” and risks introducing noise into the training process.
Alibaba will train AI search models on AI-generated documents
“The primary difference between a real search engine and a simulation LLM lies in the textual style of the returned content,” the researchers wrote. Zerosearch can also gradually degrade the quality of documents to simulate increasingly challenging retrieval scenarios.
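The idea of gradually degrading document quality can be pictured as a schedule that raises the chance of a low-quality document as training goes on. The sketch below is only an illustration: the linear ramp, the function names and the 0.0 to 0.5 range are assumptions made here, not details taken from Alibaba's paper.

```python
import random

def noise_probability(step, total_steps, p_start=0.0, p_end=0.5):
    """Hypothetical linear schedule: the chance of serving a noisy
    document rises from p_start to p_end over the training run."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)

def pick_document_style(step, total_steps, rng):
    """Decide whether the simulated search result for this query
    should be 'useful' or 'noisy' (both labels are illustrative)."""
    if rng.random() < noise_probability(step, total_steps):
        return "noisy"
    return "useful"

rng = random.Random(0)
for step in (0, 50, 100):
    print(step, noise_probability(step, 100), pick_document_style(step, 100, rng))
```

Early in training the model sees mostly helpful documents. Later steps mix in more noise, which makes retrieval harder.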
Of course, the most important advantage of this technology is the significant cost saving. Training with Zerosearch’s 14B model costs about $70.80 per 64,000 queries, compared with about $586.70 via Google’s APIs. The cost is even lower for the 7B and 3B models, at $35.40 and $17.70 per 64,000 queries respectively, and yet all three Zerosearch models and the Google API method take the same amount of time.
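As a quick check on the arithmetic, the savings can be worked out from the per-64,000-query figures quoted above. This is a minimal sketch: the constant and function names are made up for illustration, and only the dollar amounts come from the article.

```python
# Cost per 64,000 queries in US dollars, as quoted in the article.
GOOGLE_API_COST = 586.70
ZEROSEARCH_COSTS = {"14B": 70.80, "7B": 35.40, "3B": 17.70}

def savings_percent(simulated_cost, api_cost=GOOGLE_API_COST):
    """Percentage saved by using the simulated search model instead of the API."""
    return (1 - simulated_cost / api_cost) * 100

for model, cost in ZEROSEARCH_COSTS.items():
    print(f"{model}: {savings_percent(cost):.1f}% cheaper than the Google API")
```

On these numbers the 14B model comes out at roughly 88%, in line with the headline figure. The smaller models work out cheaper still.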
However, Alibaba acknowledged that one, two or four A100 GPUs are required for its Zerosearch method, compared with no GPU requirement for the Google API method, which could have a negative impact on sustainability through higher energy consumption and emissions.
“Our approach has certain limitations. Deploying the simulated search LLM requires access to GPU servers. While more cost-effective than commercial API usage, this introduces additional infrastructure costs,” the researchers concluded.
Challenging the dependence on expensive, closed platforms such as Google’s search APIs and reducing costs could further help democratize AI development.