Data Cleaning
Data cleaning requirements vary from case to case. Aya Data identifies and corrects errors and inconsistencies by fixing typos, handling missing values, removing duplicates, and standardizing formats. Raw data is then converted into a format that models can readily consume through encoding, scaling, and normalization.
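As a rough illustration of these cleaning steps, here is a minimal Python sketch using only the standard library. The record fields (`name`, `city`, `age`), the median imputation rule, and the helper names `clean_records`, `min_max_scale`, and `one_hot` are assumptions made for the example, not a description of Aya Data's actual tooling.

```python
from statistics import median


def clean_records(records):
    """Standardize text fields, drop duplicates, and impute missing ages."""
    seen = set()
    cleaned = []
    for rec in records:
        # Standardize formats: trim whitespace and normalize capitalization.
        name = rec["name"].strip().title()
        city = rec["city"].strip().title()
        key = (name, city)
        if key in seen:
            # Remove duplicates (same name and city after standardization).
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": rec.get("age")})

    # Handle missing values: fill gaps with the median of the known ages.
    known = [r["age"] for r in cleaned if r["age"] is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


def min_max_scale(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categorical values as one-hot vectors (categories sorted)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical sample data for illustration only.
raw = [
    {"name": " alice ", "city": "accra", "age": 30},
    {"name": "Alice", "city": "Accra ", "age": 30},
    {"name": "bob", "city": "kumasi", "age": None},
    {"name": "Carol", "city": "Accra", "age": 50},
]
cleaned = clean_records(raw)
scaled_ages = min_max_scale([r["age"] for r in cleaned])
encoded_cities = one_hot([r["city"] for r in cleaned])
```

In practice the imputation rule and the duplicate key depend on the dataset, so both would be chosen case by case.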
Data Splitting and Sampling
Datasets are split into training, validation, and testing sets. Sampling techniques, such as random or stratified sampling, help ensure the model is trained and evaluated on a representative sample.
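One way to keep each split representative is stratified sampling, where every class keeps roughly the same share in each set. The sketch below assumes a 70/15/15 split and a fixed random seed; the function name `stratified_split` and its parameters are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, train_pct=70, val_pct=15, seed=0):
    """Split labeled items into train/validation/test sets per class.

    Percentages are integers so the split sizes are computed without
    floating-point rounding; whatever remains goes to the test set.
    """
    rng = random.Random(seed)

    # Group items by label so each class is split independently.
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)  # random sampling within the class
        n_train = len(group) * train_pct // 100
        n_val = len(group) * val_pct // 100
        train += [(x, label) for x in group[:n_train]]
        val += [(x, label) for x in group[n_train:n_train + n_val]]
        test += [(x, label) for x in group[n_train + n_val:]]
    return train, val, test


# Hypothetical imbalanced dataset: 80 items of class "a", 20 of class "b".
items = list(range(100))
labels = ["a"] * 80 + ["b"] * 20
train, val, test = stratified_split(items, labels)
```

Splitting per class means a rare class is not accidentally left out of the validation or testing set, which a plain random split over a small dataset can do.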
Data Augmentation and Feature Engineering
In cases where data is limited, we can augment datasets to artificially increase their size, dimensionality, and variance. New data can be generated through techniques such as rotation, flipping, scaling, noise injection, and pitch shifting.
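To make a few of these techniques concrete, the following sketch applies flipping, rotation, and noise injection to a tiny grayscale image stored as a nested list of pixel values in [0, 1]. The image representation, the noise level `sigma`, and the function names are assumptions for the example; real pipelines would typically rely on an image or audio library instead.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_noise(image, sigma=0.05, seed=0):
    """Add Gaussian noise to each pixel, clamping results to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def augment(image, seed=0):
    """Return the original image plus three augmented variants."""
    return [
        image,
        flip_horizontal(image),
        rotate_90_clockwise(image),
        add_noise(image, seed=seed),
    ]


# Hypothetical 2x2 grayscale image for illustration only.
image = [[0.0, 0.5], [1.0, 0.25]]
variants = augment(image)
```

Each source image here yields four training examples, which is how augmentation increases dataset size without collecting new data.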