Pokemon AI Classifier

Deep Learning Class Project
Project Overview
I took a deep learning course offered by UofT (APS360) with a couple of my friends in Summer 2023, and we completed a very fun project on Pokemon type classification using deep learning. The goal of this project was to develop a model that, when given an image of a Pokemon, could accurately predict its primary and secondary types.
My Contributions
I was primarily responsible for data collection and data processing. This included searching for image databases, web scraping, image processing, and image augmentation. I also helped train and tune the hyperparameters of our CNN model.

Check out our GitHub: https://github.com/GoldFishGod/Pokemon-Type-Classifier
Data Collection + Processing
Pokemon images from eight generations were sourced from three different online databases:
https://www.kaggle.com/datasets/vishalsubbiah/Pokemon-images-and-types
https://www.kaggle.com/datasets/hlrhegemony/Pokemon-image-dataset
https://github.com/jackw-ai/CNN-Pokemon-Classifier
9885 images from these three sources were organized into the proper classes and then split into training, validation, and test datasets in a 70%-15%-15% ratio. This was done once for primary types and once for secondary types. The images were resized to 120x120 pixels and converted to 3-channel RGB, discarding any alpha channels. To reduce the chance of overfitting, data augmentation was applied, including flipping, cropping, colour jitter, and rotations. For feature extraction, the images were resized to 3x224x224 and passed through a pretrained feature extractor (AlexNet, described below) to produce one 256x6x6 tensor per image.
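As a rough illustration, here is a minimal sketch of how this preprocessing and splitting could be done with torchvision. The folder layout, random seed, and exact augmentation parameters are assumptions for the sketch, not our exact pipeline.

```python
import torch
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import random_split

# Augmentations roughly matching those described above;
# the exact parameters here are illustrative assumptions.
augment = T.Compose([
    T.Lambda(lambda img: img.convert("RGB")),  # drop any alpha channel
    T.Resize((224, 224)),                      # AlexNet's expected input size
    T.RandomHorizontalFlip(),
    T.RandomRotation(15),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),
])

# Hypothetical folder layout: one subfolder per primary type.
dataset = ImageFolder("pokemon_primary/", transform=augment)

# 70%-15%-15% train/validation/test split.
n = len(dataset)
n_train, n_val = int(0.70 * n), int(0.15 * n)
train_ds, val_ds, test_ds = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(42))
```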

The image to the right displays the augmented Pokemon images and their primary types that we fed into the CNN model.
A challenge we came across was collecting enough samples for each type. For example, only 3 of the 905 existing Pokemon have ‘Flying’ as their primary type. To avoid training on unbalanced data and producing a biased accuracy, we duplicated images from under-represented classes and also created augmented copies. In addition, since the ‘None’ and ‘Flying’ secondary types were so heavily populated, we randomly deleted 75% of ‘None’ and 50% of ‘Flying’ secondary images.

After this process, the most common primary type, ‘Water’, made up 7.29% of all primary-type samples, while the least common, ‘Flying’, made up 2.06%. Among secondary types, the most common, ‘None’, made up 11.06%, ‘Flying’ made up 7.25%, and the least common, ‘Normal’, made up 2.68% of all secondary samples. The classes remained slightly imbalanced, since duplicating too many images risks creating an overfitted model, and augmenting too aggressively can cause the images to lose their underlying patterns. After these changes, the primary-type dataset totalled 19982 images (13810 for training) and the secondary-type dataset totalled 11933 images (7682 for training).
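The rebalancing itself was straightforward file manipulation. Here is a minimal sketch, assuming one folder per class; the paths and helper names are hypothetical.

```python
import os
import random
import shutil

random.seed(0)

def downsample(class_dir, keep_fraction):
    # Randomly delete a fraction of images from an over-represented class.
    files = os.listdir(class_dir)
    drop = random.sample(files, int(len(files) * (1 - keep_fraction)))
    for f in drop:
        os.remove(os.path.join(class_dir, f))

def duplicate(class_dir, copies):
    # Duplicate every image in an under-represented class `copies` times;
    # random augmentation at load time keeps the copies from being identical.
    for f in list(os.listdir(class_dir)):
        stem, ext = os.path.splitext(f)
        for i in range(copies):
            shutil.copy(os.path.join(class_dir, f),
                        os.path.join(class_dir, f"{stem}_dup{i}{ext}"))

# e.g. for the secondary-type dataset described above:
downsample("secondary/None", keep_fraction=0.25)    # delete 75%
downsample("secondary/Flying", keep_fraction=0.50)  # delete 50%
duplicate("secondary/Normal", copies=2)             # copy count is an assumption
```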

The figure below on the left shows the unbalanced data for secondary types and the right figure shows the data after balancing.
Deep Learning Model
We coded a CNN because of its strong performance on image tasks, and we thoroughly tested several models before finding our optimal hyperparameters.

Due to the scale and complexity of our project, not to mention our computational limitations, we chose to use the feature-learning portion of AlexNet to produce a competent final model. While we used a pre-trained network to extract features, we still created our own CNN to produce the final outputs.
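For reference, grabbing AlexNet's convolutional feature extractor from torchvision looks roughly like this (the weights-loading API varies with the torchvision version):

```python
import torch
from torchvision import models

# Load AlexNet pretrained on ImageNet and keep only its convolutional
# feature extractor; the classifier head is discarded.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
feature_extractor = alexnet.features
feature_extractor.eval()  # frozen: used only to extract features

@torch.no_grad()
def extract_features(batch):
    # (N, 3, 224, 224) images -> (N, 256, 6, 6) feature tensors
    return feature_extractor(batch)
```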

First, AlexNet is used to extract features from our training, validation, and testing data. Next, the training tensors derived from those features are fed into our CNN model, which uses two convolutional layers to build hierarchical representations, followed by two fully connected layers that classify Pokemon into their respective types.
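A minimal sketch of what such a classifier head could look like in PyTorch; only the 256x6x6 input shape comes from the feature extractor, while the channel widths and hidden size here are illustrative assumptions.

```python
import torch.nn as nn

class TypeClassifier(nn.Module):
    # Two conv layers + two fully connected layers on top of the
    # 256x6x6 AlexNet feature tensors.
    def __init__(self, num_types):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 256),
            nn.ReLU(),
            nn.Linear(256, num_types),
        )

    def forward(self, x):
        return self.fc(self.conv(x))

# 18 Pokemon types for the primary model; the secondary model adds a 'None' class.
model = TypeClassifier(num_types=18)
```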

The model was trained on the 13810 samples in the primary-type training dataset using a batch size of 256 and a learning rate of 0.01, and on the 7682 samples in the secondary-type training dataset using a batch size of 512 and a learning rate of 0.005.
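Putting it together, a training loop with the primary-type settings might look like the sketch below; the optimizer choice, epoch count, and the `train_features_ds` dataset of precomputed feature tensors are assumptions.

```python
import torch
from torch.utils.data import DataLoader

# train_features_ds: pairs of (256x6x6 feature tensor, type label)
train_loader = DataLoader(train_features_ds, batch_size=256, shuffle=True)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer choice is an assumption

for epoch in range(30):  # epoch count not specified in our report
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
```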

The figure to the left shows the architecture of the CNN model that produced our final results.

To see how well our model was performing, we performed a quantitative comparison against a baseline random forest model. To begin, we reviewed the training and testing accuracies of both models. The baseline model, trained on the same dataset, had the following accuracies: Training: 29.7% primary & 30% secondary. Testing: 29.8% primary & 6.9% secondary.

Due to the nature of the random forest model, an average of the different training accuracies was taken; these are the values at which the model plateaued. In comparison, our model, which combined transfer learning from AlexNet with additional training, achieved the following accuracies: Training: 98.7% primary & 99.4% secondary. Testing: 63.3% primary & 66.7% secondary.
Video Presentation
Part of the project was creating a presentation. I was responsible for the video editing and I had a lot of fun experimenting with different visual options!

I'd also like to give special thanks to my team for being so amazing for this entire project - it was a blast!