Analysing the Pokedex: Type Effectiveness and Experience Classification

‘Gotta Catch ’Em All!’

The goal of this project is exploration - to find insights in the pokedex data while also practicing the creation and customisation of visualisations in R.

More specifically, we will answer these questions:

  1. How do the various Pokemon types perform against each other (considering only Pokemon with single types)?
  2. Can the type effectiveness be shown in a visually appealing table?
  3. Can the experience growth be classified, or grouped, into a new variable for easy summarisation?
  4. What are the average stats (attack, defense, speed, hp) per experience classification? What can we conclude from this?

Data for this project was scraped from Serebii and posted on Kaggle by Rounak Banik.

According to the documentation, this dataset contains information on all 802 Pokemon from all seven generations of Pokemon, although the file loaded below has 801 rows. The information in this dataset includes base stats, performance against other types, height, weight, classification, egg steps, experience points, abilities, etc.

Data Preparation

First, we download the pokedex dataset directly from Kaggle and store it in the data folder.

Import Data

library(tidyverse)  # %>%, glimpse(), str_to_title(), fct_relevel(), ggplot()

pokedexRawData <- read.csv("data/pokemon.csv", sep = ",")

First Look at the Data

pokedexRawData %>%
  glimpse()
## Rows: 801
## Columns: 41
## $ abilities         <chr> "['Overgrow', 'Chlorophyll']", "['Overgrow', 'Chlor…
## $ against_bug       <dbl> 1.00, 1.00, 1.00, 0.50, 0.50, 0.25, 1.00, 1.00, 1.0…
## $ against_dark      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ against_dragon    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ against_electric  <dbl> 0.5, 0.5, 0.5, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1…
## $ against_fairy     <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0, 1…
## $ against_fight     <dbl> 0.50, 0.50, 0.50, 1.00, 1.00, 0.50, 1.00, 1.00, 1.0…
## $ against_fire      <dbl> 2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 2.0, 2…
## $ against_flying    <dbl> 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2…
## $ against_ghost     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, …
## $ against_grass     <dbl> 0.25, 0.25, 0.25, 0.50, 0.50, 0.25, 2.00, 2.00, 2.0…
## $ against_ground    <dbl> 1.0, 1.0, 1.0, 2.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.5, 0…
## $ against_ice       <dbl> 2.0, 2.0, 2.0, 0.5, 0.5, 1.0, 0.5, 0.5, 0.5, 1.0, 1…
## $ against_normal    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ against_poison    <dbl> 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1…
## $ against_psychic   <dbl> 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, …
## $ against_rock      <dbl> 1, 1, 1, 2, 2, 4, 1, 1, 1, 2, 2, 4, 2, 2, 2, 2, 2, …
## $ against_steel     <dbl> 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1…
## $ against_water     <dbl> 0.5, 0.5, 0.5, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 1.0, 1…
## $ attack            <int> 49, 62, 100, 52, 64, 104, 48, 63, 103, 30, 20, 45, …
## $ base_egg_steps    <int> 5120, 5120, 5120, 5120, 5120, 5120, 5120, 5120, 512…
## $ base_happiness    <int> 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70,…
## $ base_total        <int> 318, 405, 625, 309, 405, 634, 314, 405, 630, 195, 2…
## $ capture_rate      <chr> "45", "45", "45", "45", "45", "45", "45", "45", "45…
## $ classfication     <chr> "Seed Pokémon", "Seed Pokémon", "Seed Pokémon", "Li…
## $ defense           <int> 49, 63, 123, 43, 58, 78, 65, 80, 120, 35, 55, 50, 3…
## $ experience_growth <int> 1059860, 1059860, 1059860, 1059860, 1059860, 105986…
## $ height_m          <dbl> 0.7, 1.0, 2.0, 0.6, 1.1, 1.7, 0.5, 1.0, 1.6, 0.3, 0…
## $ hp                <int> 45, 60, 80, 39, 58, 78, 44, 59, 79, 45, 50, 60, 40,…
## $ japanese_name     <chr> "Fushigidaneフシギダネ", "Fushigisouフシギソウ", "Fushigibana…
## $ name              <chr> "Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "…
## $ percentage_male   <dbl> 88.1, 88.1, 88.1, 88.1, 88.1, 88.1, 88.1, 88.1, 88.…
## $ pokedex_number    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ sp_attack         <int> 65, 80, 122, 60, 80, 159, 50, 65, 135, 20, 25, 90, …
## $ sp_defense        <int> 65, 80, 120, 50, 65, 115, 64, 80, 115, 20, 25, 80, …
## $ speed             <int> 45, 60, 80, 65, 80, 100, 43, 58, 78, 45, 30, 70, 50…
## $ type1             <chr> "grass", "grass", "grass", "fire", "fire", "fire", …
## $ type2             <chr> "poison", "poison", "poison", "", "", "flying", "",…
## $ weight_kg         <dbl> 6.9, 13.0, 100.0, 8.5, 19.0, 90.5, 9.0, 22.5, 85.5,…
## $ generation        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ is_legendary      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

Tidy Data

Our data tidying steps are minimal. We start by converting blanks to NA. Then, if the same Pokemon type appears in both type1 and type2, we set type2 to NA.

Lastly, each Pokemon is classified as having a single or dual type. We will use this later when evaluating the type effectiveness of each Pokemon.

pokedexTidyData <- pokedexRawData %>%
  # Find all blanks in the character columns and replace them with NA
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # If the same type appears in both type1 and type2 then set type2 to NA
  mutate(type2 = ifelse(type1 == type2, NA, type2)) %>%
  # Classify Pokemon as 'Single' or 'Dual' type depending on whether a type is listed under `type2`
  mutate(type_category = ifelse(is.na(type2), "Single", "Dual"))
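
To make the three tidying steps concrete, here is a small base R sketch on a made-up four-row data frame. The rows and the name toy are illustrative and not taken from the dataset; the logic mirrors the pipeline above without needing dplyr.

```r
# Toy data: hypothetical rows, not from pokemon.csv
toy <- data.frame(
  name  = c("A", "B", "C", "D"),
  type1 = c("grass", "fire", "water", "bug"),
  type2 = c("poison", "", "water", "flying"),
  stringsAsFactors = FALSE
)

# Step 1: blanks become NA
toy$type2[toy$type2 == ""] <- NA

# Step 2: a type repeated in type2 is treated as missing
# (which() drops the comparisons that evaluate to NA)
same_type <- which(toy$type1 == toy$type2)
toy$type2[same_type] <- NA

# Step 3: no second type means "Single", otherwise "Dual"
toy$type_category <- ifelse(is.na(toy$type2), "Single", "Dual")

print(toy)
```

Rows B and C end up as "Single": B because its type2 was blank, C because its type2 repeated type1.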

Type Effectiveness

Pokemon of certain types tend to do better against Pokemon of some types and worse against others. For example, fire-type Pokemon do double damage against grass-type Pokemon but only half damage against water-type Pokemon.
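
One reason to look at single-type Pokemon only: for a dual-type Pokemon, the against_* value is the product of the two single-type multipliers, so it mixes two effects. The sketch below checks this against the sixth row of the glimpse output (fire/flying, with against_rock = 4 and against_ground = 0). The single-type values are copied from the Fire and Flying rows of the summary table in this post; the list single and the function dual_multiplier are hypothetical names used for illustration.

```r
# Damage multipliers taken by single-type Fire and Flying Pokemon
# from rock and ground attacks (values from the summary table)
single <- list(
  fire   = c(rock = 2.0, ground = 2.0),
  flying = c(rock = 2.0, ground = 0.0)
)

# Damage taken by a dual-type Pokemon is the product of both multipliers
dual_multiplier <- function(type_a, type_b, attack) {
  single[[type_a]][[attack]] * single[[type_b]][[attack]]
}

dual_multiplier("fire", "flying", "rock")    # 4
dual_multiplier("fire", "flying", "ground")  # 0
```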

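Below, we create a summary table of the effectiveness of single-type Pokemon. Each row is a Pokemon type and each column is the type of the incoming attack, so the values are the damage multipliers taken by the row type. For example, 2.0 in the Fire row under Water means fire-type Pokemon take double damage from water attacks.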

pokedexTypeEff_tbl <- pokedexTidyData %>%
  mutate(type1 = str_to_title(type1)) %>%
  filter(type_category == "Single") %>%
  group_by(type1) %>%
  summarise(Bug = mean(against_bug),
            Dark = mean(against_dark),
            Dragon = mean(against_dragon),
            Electric = mean(against_electric),
            Fairy = mean(against_fairy),
            Fight = mean(against_fight),
            Fire = mean(against_fire),
            Flying = mean(against_flying),
            Ghost = mean(against_ghost),
            Grass = mean(against_grass),
            Ground = mean(against_ground),
            Ice = mean(against_ice),
            Normal = mean(against_normal),
            Poison = mean(against_poison),
            Psychic = mean(against_psychic),
            Rock = mean(against_rock),
            Steel = mean(against_steel),
            Water = mean(against_water))
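
The eighteen mean() calls all follow one pattern, so they can also be generated from the column names. Below is a minimal base R sketch of that idea on a made-up data frame with only two against_* columns; the toy values and the helper name capitalise are assumptions for illustration.

```r
# Toy data: hypothetical values, two against_* columns only
toy <- data.frame(
  type1         = c("Fire", "Fire", "Water"),
  against_water = c(2.0, 2.0, 0.5),
  against_grass = c(0.5, 0.5, 2.0),
  stringsAsFactors = FALSE
)

# Pick every against_* column by name instead of listing them one by one
against_cols <- grep("^against_", names(toy), value = TRUE)

# Mean of each selected column within each type1 group
type_means <- aggregate(toy[against_cols],
                        by = list(type1 = toy$type1),
                        FUN = mean)

# Turn "against_water" into "Water" to match the table headings
capitalise <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))
names(type_means)[-1] <- capitalise(sub("^against_", "", names(type_means)[-1]))

print(type_means)
```

In a dplyr pipeline the same idea is usually written with across(starts_with("against_"), mean) inside summarise(), followed by a rename step.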

A visually appealing table is created below using the formattable package. Damage is highlighted on a gradient from 0.0 to 2.0.

library(formattable)

# customBlue0 and customBlue are the colours for the lowest and highest values;
# they are assumed to be colour strings defined earlier in the session.
formattable(pokedexTypeEff_tbl,
            width = "100px",
            align = c("l", rep("c", 18)),
            list(`type1` = formatter("span", style = ~ style(color = "grey", font.weight = "bold")),
                 area(col = 2:19) ~ color_tile(customBlue0, customBlue)))
type1     Bug  Dark  Dragon  Electric  Fairy  Fight  Fire  Flying  Ghost  Grass  Ground  Ice  Normal  Poison  Psychic  Rock  Steel  Water
Bug       1.0  1.0   1.0     1.0       1.0    0.5    2.0   2.0     1.0    0.5    0.5     1.0  1.0     1.0     1.0      2.0   1.0    1.0
Dark      2.0  0.5   1.0     1.0       2.0    2.0    1.0   1.0     0.5    1.0    1.0     1.0  1.0     1.0     0.0      1.0   1.0    1.0
Dragon    1.0  1.0   2.0     0.5       2.0    1.0    0.5   1.0     1.0    0.5    1.0     2.0  1.0     1.0     1.0      1.0   1.0    0.5
Electric  1.0  1.0   1.0     0.5       1.0    1.0    1.0   0.5     1.0    1.0    2.0     1.0  1.0     1.0     1.0      1.0   0.5    1.0
Fairy     0.5  0.5   0.0     1.0       1.0    0.5    1.0   1.0     1.0    1.0    1.0     1.0  1.0     2.0     1.0      1.0   2.0    1.0
Fighting  0.5  0.5   1.0     1.0       2.0    1.0    1.0   2.0     1.0    1.0    1.0     1.0  1.0     1.0     2.0      0.5   1.0    1.0
Fire      0.5  1.0   1.0     1.0       0.5    1.0    0.5   1.0     1.0    0.5    2.0     0.5  1.0     1.0     1.0      2.0   0.5    2.0
Flying    0.5  1.0   1.0     2.0       1.0    0.5    1.0   1.0     1.0    0.5    0.0     2.0  1.0     1.0     1.0      2.0   1.0    1.0
Ghost     0.5  2.0   1.0     1.0       1.0    0.0    1.0   1.0     2.0    1.0    1.0     1.0  0.0     0.5     1.0      1.0   1.0    1.0
Grass     2.0  1.0   1.0     0.5       1.0    1.0    2.0   2.0     1.0    0.5    0.5     2.0  1.0     2.0     1.0      1.0   1.0    0.5
Ground    1.0  1.0   1.0     0.0       1.0    1.0    1.0   1.0     1.0    2.0    1.0     2.0  1.0     0.5     1.0      0.5   1.0    2.0
Ice       1.0  1.0   1.0     1.0       1.0    2.0    2.0   1.0     1.0    1.0    1.0     0.5  1.0     1.0     1.0      2.0   2.0    1.0
Normal    1.0  1.0   1.0     1.0       1.0    2.0    1.0   1.0     0.0    1.0    1.0     1.0  1.0     1.0     1.0      1.0   1.0    1.0
Poison    0.5  1.0   1.0     1.0       0.5    0.5    1.0   1.0     1.0    0.5    2.0     1.0  1.0     0.5     2.0      1.0   1.0    1.0
Psychic   2.0  2.0   1.0     1.0       1.0    0.5    1.0   1.0     2.0    1.0    1.0     1.0  1.0     1.0     0.5      1.0   1.0    1.0
Rock      1.0  1.0   1.0     1.0       1.0    2.0    0.5   0.5     1.0    2.0    2.0     1.0  0.5     0.5     1.0      1.0   2.0    2.0
Steel     0.5  1.0   0.5     1.0       0.5    2.0    2.0   0.5     1.0    0.5    2.0     0.5  0.5     0.0     0.5      0.5   0.5    1.0
Water     1.0  1.0   1.0     2.0       1.0    1.0    0.5   1.0     1.0    2.0    1.0     0.5  1.0     1.0     1.0      1.0   0.5    0.5

Experience Classification

Experience growth is the total amount of experience a Pokemon needs to reach level 100. A lower value means the Pokemon reaches level 100 faster; a higher value means it levels up more slowly, because it needs more experience to get there.

Experience growth is classified into categories provided by the website Serebii, which derives them from a formula explained on its site.

These are the categories:

  • Erratic - 600,000 EXP
  • Fast - 800,000 EXP
  • Medium-Fast - 1,000,000 EXP
  • Medium-Slow - 1,059,860 EXP
  • Slow - 1,250,000 EXP
  • Fluctuating - 1,640,000 EXP
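
The totals in this list match the growth-rate formulas as they are commonly documented for the main-series games, evaluated at level 100. The sketch below rests on that assumption: the formulas are not quoted in this post and were not taken from Serebii's page, and for the two piecewise curves (Erratic and Fluctuating) only the branch that applies at level 100 is shown.

```r
# Experience needed to reach level n under each growth curve.
# Assumption: commonly documented formulas, not quoted from the source.
# Erratic and Fluctuating are piecewise; only their top-level branch is used.
n <- 100

exp_at_100 <- c(
  "Erratic"     = n^3 * (160 - n) / 100,
  "Fast"        = 4 * n^3 / 5,
  "Medium-Fast" = n^3,
  "Medium-Slow" = 6 * n^3 / 5 - 15 * n^2 + 100 * n - 140,
  "Slow"        = 5 * n^3 / 4,
  "Fluctuating" = n^3 * (n %/% 2 + 32) / 50
)

print(exp_at_100)
```

The Medium-Slow curve is the only one with lower-order terms, which is why its total (1,059,860) is not a round number.
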
# Create experience classification
# Note: each total is used as the lower bound of the next label, so 600,000 is
# labelled "Fast" and both 1,250,000 and 1,640,000 are labelled "Fluctuating";
# "Erratic" is only assigned below 600,000.
pokedexExpClass <- pokedexTidyData %>%
  mutate(experience_class = case_when(
    experience_growth < 600000   ~ "Erratic",
    experience_growth < 800000   ~ "Fast",
    experience_growth < 1000000  ~ "Medium Fast",
    experience_growth < 1059860  ~ "Medium Slow",
    experience_growth < 1250000  ~ "Slow",
    experience_growth >= 1250000 ~ "Fluctuating"
  ))

# Reorder factor levels
pokedexExpClass <- pokedexExpClass %>%  
  mutate(experience_class = fct_relevel(experience_class, c("Fast", "Medium Fast", "Medium Slow", "Slow", "Fluctuating")))
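
The thresholds in the code above treat each total as the start of the next bin, so a Pokemon at exactly 600,000 is labelled "Fast" and one at 1,250,000 is labelled "Fluctuating". If the goal is instead to reproduce the six category names from the list, one alternative is an exact-value lookup. This is a sketch, not the approach used for the results in this post; exp_lookup and classify_growth are hypothetical names.

```r
# Level-100 totals from the category list, keyed by category name
exp_lookup <- c(
  "Erratic"     = 600000,
  "Fast"        = 800000,
  "Medium Fast" = 1000000,
  "Medium Slow" = 1059860,
  "Slow"        = 1250000,
  "Fluctuating" = 1640000
)

# Map each experience_growth value to its category; unknown values give NA
classify_growth <- function(experience_growth) {
  labels <- names(exp_lookup)[match(experience_growth, exp_lookup)]
  factor(labels, levels = names(exp_lookup))
}

classify_growth(c(1059860, 600000, 1250000, 123))
```

Applying this to the data would change the group sizes and therefore the averages, so the summary table and plots would need to be regenerated before drawing conclusions from them.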

The four core stats of a Pokemon are:

  • Attack - the amount of physical attack damage a Pokemon can do
  • Defense - the amount of physical attack damage a Pokemon can take
  • Speed - determines when a Pokemon can attack in a turn
  • HP - the number of health points a Pokemon has

expClass_tbl <- pokedexExpClass %>%
  group_by(experience_class) %>%
  summarise(avg_attack = mean(attack),
            avg_defense = mean(defense),
            avg_speed = mean(speed),
            avg_hp = mean(hp))
## `summarise()` ungrouping output (override with `.groups` argument)

In the plot below, the average stat is shown by experience classification.

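library(ggthemes)  # theme_pander()
library(ggpubr)    # ggarrange(); assumed, the post does not name the package

# Attack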
attack <- ggplot(data = expClass_tbl, aes(x = experience_class, y = avg_attack)) +
  geom_bar(stat = "identity", fill = customBlue, color = customBlue) +
  geom_text(aes(label = round(avg_attack, 0)), vjust = 1.6, color = "white") +
  labs(title = "Attack", x = "", y = "") +
  theme_pander() +
  theme(axis.title.x = element_text(margin = margin(10, 5, 10, 5)),
        plot.title = element_text(hjust = 0.5, size = 20, margin = margin(10, 5, 10, 5)))

# Defense
defense <- ggplot(data = expClass_tbl, aes(x = experience_class, y = avg_defense)) +
  geom_bar(stat = "identity", fill = customBlue, color = customBlue) +
  geom_text(aes(label = round(avg_defense, 0)), vjust = 1.6, color = "white") +
  labs(title = "Defense", x = "", y = "") +
  theme_pander() +
  theme(axis.title.x = element_text(margin = margin(10, 5, 10, 5)),
        plot.title = element_text(hjust = 0.5, size = 20, margin = margin(10, 5, 10, 5)))

# Speed
speed <- ggplot(data = expClass_tbl, aes(x = experience_class, y = avg_speed)) +
  geom_bar(stat = "identity", fill = customBlue, color = customBlue) +
  geom_text(aes(label = round(avg_speed, 0)), vjust = 1.6, color = "white") +
  labs(title = "Speed", x = "", y = "") +
  theme_pander() +
  theme(axis.title.x = element_text(margin = margin(10, 5, 10, 5)),
        plot.title = element_text(hjust = 0.5, size = 20, margin = margin(10, 5, 10, 5)))

# HP
hp <- ggplot(data = expClass_tbl, aes(x = experience_class, y = avg_hp)) +
  geom_bar(stat = "identity", fill = customBlue, color = customBlue) +
  geom_text(aes(label = round(avg_hp, 0)), vjust = 1.6, color = "white") +
  labs(title = "HP", x = "", y = "") +
  theme_pander() +
  theme(axis.title.x = element_text(margin = margin(10, 5, 10, 5)),
        plot.title = element_text(hjust = 0.5, size = 20, margin = margin(10, 5, 10, 5)))

# Plot all          
ggarrange(attack, defense, speed, hp,
                    ncol = 2, nrow = 2)

Based on this plot, Pokemon labelled ‘Fluctuating’ (i.e. experience growth of 1,250,000 or more, which covers both the Slow and Fluctuating totals listed above) have the highest average on all four core stats.

So it seems that patience wins here. Hang on to the Pokemon that take longer to reach level 100; it will surely pay off!


    Joleen Belinda
    Data Science Enthusiast

    Statistics graduate paving my own way through the world of data science
