Ethical Data Sourcing in Africa


Machine learning projects require data; that goes without saying. However, obtaining high-quality data for training and testing is one of the most pressing challenges facing AI teams today.

For example, Cognilytica found that some 25% of the time spent on machine learning projects goes to data sourcing and labeling.

Data sourcing involves retrieving data from public and open sources, collecting data from cloud and relational databases, and creating new, unique data. Creating new training and testing data can mean taking photos and videos of real-world sources for computer vision projects, extracting and preparing data from business chat logs, and creating usable data from databases and other assets.

Aya Data has worked with several clients to source training data for their ML projects. We can obtain data from databases in accordance with strict jurisdictional privacy laws and regulations, or create new data using images, videos or text.





Company Size

5 to 250+

The Challenge

Sourcing data for ML and AI projects is one of the greatest bottlenecks to the industry's development. Creating data for NLP, CV and other ML models is an ongoing challenge, especially when high-variance, high-dimensionality data is needed to train complex models.

Many unique and novel ML models also require custom-made data, especially when there are no existing open datasets available for the use case. 

Furthermore, ML and AI data is often subject to international and national laws regarding consent, use and privacy. While synthetic data provides a viable solution in some cases, obtaining real, high-quality data tops the priority list of many data science teams today.

The Solution

Aya Data has assisted clients in obtaining, processing and preparing training and testing data across multiple sectors and industries. We obtained real photos of diseased maize plants to help a client successfully build a maize disease classification model, photos of people’s faces for facial recognition tasks, and images of car damage to train an insurance claims verification model.

In our other case studies, Aya Data cleaned and processed data to prepare it for labeling. Our skilled team of data labelers collaborates with both clients and domain specialists to maintain tight quality control over labels and ground truth.

Moreover, Aya Data understands the role of regulation in ML data sourcing, and we can obtain compliant data across a wide range of industries. Our West African data experts know which public data we can retrieve and use, whether that be permitted medical records, medical scans and diagnostic images, or photos and video of real-world environments.


Aya Data has sourced, obtained and created quality data to help multiple clients build successful models. We are able to assist clients with sourcing data for their projects, in addition to the labeling and annotation services themselves.

Explore Case Studies

Medical Imaging

Medical imaging is a time-intensive task that is being revolutionized by AI and ML.

Content Moderation For Advertising and Marketing

Reducing the risk of reputational damage for brands and businesses.

Image Segmentation in Satellite Imagery

Using satellite images to measure land-use changes in South America.

Real-Time Transcription of US Police Radios

Private companies are turning to technology to help combat crime and reduce the burden on society.

Identifying Shoplifting Events in Retail Stores

Companies are looking for innovative ways to curb their losses with the use of technology.

Preparing For The Growing Popularity of E-Scooters

The global electric scooters market size is expected to reach USD 34.7 billion by 2028.

Detecting Disease in Ghana’s Maize Plants

As global demand for maize continues to rise, the opportunity for technology to improve yields is more important than ever.