Prompt-Based Length Controlled Generation with Reinforcement Learning
Researchers propose a method for controlling output length in large language models, improving length-control accuracy and reducing inference cost.
In a study titled "Prompt-Based Length Controlled Generation with Reinforcement Learning," researchers Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, and Qun Liu propose a method for controlling the length of content generated by large language models (LLMs) like ChatGPT and GPT-4. Length control lets users request answers or essays of a desired length, which broadens the real-world applications of LLMs.
The researchers adopt a prompt-based length control method built on reinforcement learning and sample filtering. The method includes a standard prompt extractor that parses length control instructions from user inputs into standard control prompts. The team designed a rule-based reward model so that both reinforcement fine-tuning and inference of LLMs can be implemented quickly and accurately. They also applied a modified Proximal Policy Optimization (PPO) algorithm to improve length-controlled generation.
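As a rough illustration of how a prompt extractor and a rule-based reward could fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the regular expressions, the control types ("less", "more", "equal"), the "[length:kind:N]" prompt format, and the reward of zero when the constraint is met and the negative violation size otherwise.

```python
import re
from typing import Optional, Tuple

# Hypothetical phrasings; a real extractor would need to cover far more cases.
_PATTERNS = [
    ("less", re.compile(r"\b(?:at most|no more than|up to)\s+(\d+)\s+(?:words|tokens)\b", re.I)),
    ("more", re.compile(r"\b(?:at least|no fewer than|no less than)\s+(\d+)\s+(?:words|tokens)\b", re.I)),
    ("equal", re.compile(r"\b(?:exactly|in)\s+(\d+)\s+(?:words|tokens)\b", re.I)),
]


def extract_control(instruction: str) -> Optional[Tuple[str, int]]:
    """Parse a free-form instruction into (control type, target length)."""
    for kind, pattern in _PATTERNS:
        match = pattern.search(instruction)
        if match:
            return kind, int(match.group(1))
    return None  # no length instruction found


def to_standard_prompt(kind: str, target: int) -> str:
    """Render the parsed control in an assumed, made-up standard format."""
    return f"[length:{kind}:{target}]"


def length_reward(kind: str, target: int, generated_len: int) -> float:
    """Assumed rule-based reward: 0 if satisfied, minus the violation otherwise."""
    if kind == "equal":
        return float(-abs(generated_len - target))
    if kind == "less":
        return float(-max(0, generated_len - target))
    if kind == "more":
        return float(-max(0, target - generated_len))
    raise ValueError(f"unknown control type: {kind}")


if __name__ == "__main__":
    control = extract_control("Summarize the article in no more than 50 words.")
    if control is not None:
        kind, target = control
        print(to_standard_prompt(kind, target), length_reward(kind, target, 57))
```

Because a reward of this kind is computed by a fixed rule rather than a learned network, it is cheap to evaluate during fine-tuning and can also be used to score or filter candidate outputs.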
The method significantly improves the accuracy of prompt-based length control: it reduces the control error relative to a baseline model that relies on the prompt-based strategy alone. The researchers report automatic metrics such as BLEU, ROUGE, and METEOR in their evaluation on summarization tasks.
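This summary does not define the control error, so the following Python sketch assumes one plausible reading: the mean absolute difference between generated and requested lengths over a test set. The function name and the example numbers are hypothetical.

```python
from typing import Sequence


def mean_control_error(generated_lens: Sequence[int], target_lens: Sequence[int]) -> float:
    """Mean absolute gap between generated and target lengths (assumed definition)."""
    if len(generated_lens) != len(target_lens):
        raise ValueError("both sequences must have the same length")
    if not generated_lens:
        raise ValueError("at least one sample is required")
    total = sum(abs(g - t) for g, t in zip(generated_lens, target_lens))
    return total / len(generated_lens)


if __name__ == "__main__":
    # Made-up lengths for four summaries that were each asked to be 50 tokens long.
    print(mean_control_error([48, 55, 50, 53], [50, 50, 50, 50]))
```

Under this reading, a lower value means the model tracks the requested length more closely, which is the direction of improvement described above.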
Length control can also reduce inference cost. Autoregressive generation produces one token per decoding step, so limiting the length of the output directly reduces the number of decoding steps required.
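The link between output length and cost can be seen in a toy decoding loop. The Python sketch below uses a stub in place of a real model (the stub, its token IDs, and the function names are all hypothetical) and simply counts how many decoding steps are spent before the end-of-sequence token appears.

```python
from typing import Callable, List, Tuple


def greedy_decode(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int,
) -> Tuple[List[int], int]:
    """Generate tokens one at a time and count the decoding steps used."""
    tokens = list(prompt)
    steps = 0
    for _ in range(max_new_tokens):
        token = next_token(tokens)  # stands in for one forward pass of the model
        steps += 1
        tokens.append(token)
        if token == eos_id:
            break
    return tokens[len(prompt):], steps


def make_stub_model(prompt_len: int, summary_len: int, eos_id: int = 0) -> Callable[[List[int]], int]:
    """Stub 'model' that emits filler tokens, then EOS as token number summary_len."""
    def next_token(tokens: List[int]) -> int:
        generated = len(tokens) - prompt_len
        return eos_id if generated >= summary_len - 1 else 1
    return next_token


if __name__ == "__main__":
    prompt = [5, 6, 7]
    for summary_len in (200, 50):
        model = make_stub_model(len(prompt), summary_len)
        _, steps = greedy_decode(model, prompt, eos_id=0, max_new_tokens=512)
        print(summary_len, steps)
```

In this toy setup the step count equals the number of generated tokens, so a model that reliably stops at a shorter target length uses proportionally fewer decoding steps. A hard cap such as max_new_tokens also bounds the cost, but it truncates the output rather than producing a complete answer of the requested length.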
The method was tested on two popular summarization datasets, CNNDM (CNN/Daily Mail) and NYT (New York Times). The experiments show that it is effective for GPT-style models of three different sizes on both datasets.
In conclusion, the method offers a promising way to improve the accuracy of prompt-based length control and to reduce inference cost in large language models, which could make these models more efficient and more versatile in practice.
Read the whole article here: http://arxiv.org/abs/2308.12030v1