Improving Language Model Accuracy with Reinforcement Learning from Human Feedback


Client X, an IT company based in Europe with subsidiary branches across West Africa, approached us to help train a language model that could assist with writing code as well as answering queries from prompts. The existing language model had been trained using supervised learning, but its outputs were not sufficiently accurate.

We would be provided with prompts and the model's answers to them, which we would need to correct and refine.




London, UK



The Challenge

  1. To reach the desired accuracy of the outputs, the language model would need to be retrained using reinforcement learning from human feedback (RLHF).
  2. The scale of and time frame for the project would necessitate the engagement of a full-time team of data annotation experts and programmers.
  3. The language model would need to respond to prompts in several different languages, with English as the primary language.



Aya Data engaged its in-house programmers for the coding outputs and full-time NLP experts for the written prompts and answers that were provided to us. English and other-language prompts were worked on concurrently. The outputs from the language model were corrected where appropriate, and the remaining outputs were ranked.

The feedback from our programmers and NLP experts was collected and used to update the model. Aya Data then undertook two additional cycles of gathering data, collecting human feedback, and using it to update the language model.
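For readers unfamiliar with how ranked outputs can feed into training, the following is a minimal sketch of one common approach, not a description of the pipeline used on this project. It assumes each annotator ranking lists outputs from best to worst, expands it into chosen/rejected pairs, and scores a pair with a Bradley-Terry style pairwise loss of the kind often used to train a reward model. The function names, the record fields, and the reward values are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(prompt, ranked_outputs):
    """Expand one ranking (best output first) into chosen/rejected pairs.

    Hypothetical helper: the field names are illustrative only.
    """
    # combinations() preserves input order, so the first item of each
    # pair is always the higher-ranked output.
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_outputs, 2)
    ]


def pairwise_loss(reward_chosen, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the chosen output
    well above the rejected one, and large when it prefers the wrong one.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


# Usage with made-up data: three outputs ranked by an annotator.
pairs = ranking_to_pairs(
    "Write a function that reverses a string.",
    ["output_a", "output_b", "output_c"],
)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])

# Made-up reward scores, purely to show the direction of the loss.
print(pairwise_loss(2.0, 0.0) < pairwise_loss(0.0, 2.0))
```

A ranking of n outputs yields n(n-1)/2 pairs, so a single ranked prompt provides several training comparisons. Corrected outputs could be treated the same way by pairing each correction (chosen) with the original output (rejected), though that is also an assumption here.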

The Results

The retraining of the language model using reinforcement learning from human feedback was completed one week before the deadline. The accuracy of the outputs improved by 34% across all languages and by 41% for English. We are currently in discussions with the Client about adding further languages to the model.