Pokemon images spanning eight generations were collected from three online databases:
https://www.kaggle.com/datasets/vishalsubbiah/Pokemon-images-and-types
https://www.kaggle.com/datasets/hlrhegemony/Pokemon-image-dataset
https://github.com/jackw-ai/CNN-Pokemon-Classifier
From these three sources, 9885 images were organized into their proper classes and then split into training, validation, and test datasets in a 70%-15%-15% ratio. This was done once for primary types and once for secondary types. The images were then resized to 120x120 pixels and converted to 3-channel RGB, discarding any alpha channel. To reduce the chance of overfitting, data augmentation was applied, including flipping, cropping, colour jitter, and rotation. For feature extraction, the images were resized to 3x224x224 and features were extracted from them, producing the same number of 256x6x6 tensors.
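The 70%-15%-15% split described above can be sketched as follows. This is a minimal illustration, not the code used in the project: it assumes the split was done class by class (stratified), which the text does not state, and the function name `stratified_split` and the toy file names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(samples, percents=(70, 15, 15), seed=0):
    """Split (path, label) pairs into train/val/test lists, class by class.

    Assumption: splitting per class keeps every type represented in all
    three subsets. Integer percentages avoid floating-point rounding.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    train, val, test = [], [], []
    for label, paths in sorted(by_label.items()):
        rng.shuffle(paths)
        n_train = len(paths) * percents[0] // 100
        n_val = len(paths) * percents[1] // 100
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains after rounding down goes to the test set.
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test


# Toy data: made-up file names standing in for the real images.
samples = [(f"water_{i}.png", "Water") for i in range(100)]
samples += [(f"fire_{i}.png", "Fire") for i in range(40)]

train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))
```

Running the split once with primary-type labels and once with secondary-type labels would mirror the two passes described above.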
The image to the right displays the augmented Pokemon images and their primary types that we fed into the CNN model.
One challenge was collecting enough samples of each type. For example, only 3 of the 905 existing Pokemon have ‘Flying’ as their primary type. To avoid training on unbalanced data and reporting a biased accuracy, we duplicated images from classes with fewer samples and also created augmented copies. In addition, because the ‘None’ and ‘Flying’ secondary types were so heavily populated, we randomly deleted 75% of the ‘None’ and 50% of the ‘Flying’ secondary images. After this process, the most common primary type, ‘Water’, made up 7.29% of all primary type samples, and the least common primary type, ‘Flying’, made up 2.06%. Among secondary types, the most common, ‘None’, made up 11.06% of all secondary samples, ‘Flying’ made up 7.25%, and the least common, ‘Normal’, made up 2.68%. The classes were left slightly imbalanced because duplicating too many images risks producing an overfitted model, and augmenting too many times can cause the images to lose their underlying patterns. After these changes, the primary type dataset had a total of 19982 images, of which 13810 were training images, and the secondary type dataset had a total of 11933 images, of which 7682 were training images.
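The balancing steps above (randomly deleting a fraction of the over-populated classes and duplicating images in the sparse ones) can be sketched roughly as below. This is an illustration under stated assumptions, not the project's code: the function `rebalance`, its `min_per_class` threshold, and the toy class counts are all hypothetical, and plain duplication stands in for the augmented copies mentioned in the text.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, drop_fraction=None, min_per_class=0, seed=0):
    """Undersample selected classes, then oversample small ones.

    drop_fraction: label -> fraction of that class to delete at random.
    min_per_class: hypothetical floor; smaller classes are padded with
                   duplicates until they reach it.
    """
    drop_fraction = drop_fraction or {}
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    balanced = []
    for label, paths in sorted(by_label.items()):
        # Randomly delete the requested fraction of this class.
        n_drop = int(len(paths) * drop_fraction.get(label, 0.0))
        kept = rng.sample(paths, len(paths) - n_drop)
        # Pad under-represented classes with randomly chosen duplicates.
        n_extra = max(0, min_per_class - len(kept)) if kept else 0
        kept += [rng.choice(kept) for _ in range(n_extra)]
        balanced += [(p, label) for p in kept]
    return balanced


# Toy secondary-type data with made-up counts and file names.
samples = [(f"none_{i}.png", "None") for i in range(400)]
samples += [(f"flying_{i}.png", "Flying") for i in range(200)]
samples += [(f"normal_{i}.png", "Normal") for i in range(20)]

balanced = rebalance(samples, {"None": 0.75, "Flying": 0.50}, min_per_class=50)
counts = Counter(label for _, label in balanced)
shares = {label: n / len(balanced) for label, n in counts.items()}
print(counts, shares)
```

The `shares` dictionary gives each class's fraction of the balanced dataset, which is the kind of figure quoted above for ‘Water’, ‘None’, ‘Flying’, and ‘Normal’.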
In the figure below, the left panel shows the unbalanced data for secondary types and the right panel shows the data after balancing.