Ripping dex sprites from Pokémon Sun/Moon

Recently I've been getting back into pokémon datamining. One of the things i wanted to find was the dex sprites in Sun & Moon: the charismatic poses that slowly fill the Alola dex as you encounter and catch each pokémon species. While most of the game uses animated 3D models or small icons of the pokémon, the pokédex is one of the few exceptions.

[screenshot of pichu and pikachu in the alola dex]
Source: pokemon.co.jp

Veekun has sprites for Sun & Moon but they aren't the dex sprites. They look like they're just a static capture of the first frame of the pokémon's idle animation.

[boring wailord.jpg]

As i later learned, these are actually the images that show up on the screen behind the nurse when you heal your party in a pokémon center.

They're Very Boring. The dex sprites are much more characterful:

[wailord with a big dorky grin]

The remainder of this post will be a sort of stream-of-consciousness explanation of how i went about finding and extracting these sprites. Partly as a reminder for my future self, but also for any aspiring ROM hackers who might be interested.

Ok. The first step is to mount a copy of the ROM using ctrfuse or ninfs.

./ctrfuse -s Moon.3ds ./moon

Next we have to find the file that the sprites are in. They aren't listed in this list of files from the demo (just the icons, in a/0/6/2). Either they aren't labeled or they are missing. The full games added a bunch of files, so that could be the case.

I guess I'll have to search for them myself. Sprites are usually one of the largest files, so let's sort all the new files by file size.

% ll 2/[456789]/* 3/*/* -S
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/3
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/4
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/5
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/6
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/7
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/8
-r--r--r--. 1 root root  17M Dec 31  1969 2/9/9
-r--r--r--. 1 root root  17M Dec 31  1969 3/0/0
-r--r--r--. 1 root root 6.7M Dec 31  1969 2/4/0
-r--r--r--. 1 root root 5.1M Dec 31  1969 2/6/9
-r--r--r--. 1 root root 2.0M Dec 31  1969 2/4/9
-r--r--r--. 1 root root 2.0M Dec 31  1969 2/8/1
...

Ignore the 17MB files at the top. They're too large and regular to be what we're looking for. That leaves a/2/4/0 and a/2/6/9. a/2/4/0 is where the other sprites were, so let's try a/2/6/9.
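For anyone who would rather script the search than eyeball `ls` output, here is a minimal Python sketch of the same idea: walk a directory and list the biggest files first. The function name `largest_files` is made up for illustration, and unlike the command above it walks everything under the given root rather than only the directories that are new in the full game.

```python
import os


def largest_files(root, limit=12):
    """Return (size_in_bytes, path_relative_to_root) pairs, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            found.append((os.path.getsize(path), os.path.relpath(path, root)))
    # Sort by descending size; break ties by path so the order is stable.
    found.sort(key=lambda item: (-item[0], item[1]))
    return found[:limit]
```

Calling `largest_files("a")` from inside the mounted romfs should give roughly the same ranking as the listing, assuming the mount exposes ordinary file sizes.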

Viewing a/2/6/9 with viewgarc.go shows 854 records of 0x10028 bytes each. That's enough records to plausibly be pokémon sprites, and they're large enough to be images (0x10000 bytes is a 256x256 image at one byte per pixel). Let's extract them and take a closer look.

% go run viewgarc.go -z a/2/6/9 | most
% go run extractgarc.go a/2/6/9 ~/pokemon/dumps/a269/

Examining one of the records shows that it is definitely some sort of image:

% xxd 0.0 | tail
0000ff90: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
0000ffa0: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
0000ffb0: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
0000ffc0: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
0000ffd0: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
0000ffe0: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
0000fff0: 0000 0000 0000 0000 ffff 0000 6fd0 d0d0  ............o...
00010000: 464c 494d fffe 1400 0001 0207 2800 0100  FLIM........(...
00010010: 0100 0000 696d 6167 1000 0000 0001 0001  ....imag........
00010020: 8000 0b08 0000 0100                      ........

See the tell-tale FLIM and imag? That's a BFLIM header. (Or footer, as it were.)
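The footer is small enough to decode by hand, but a short Python sketch makes the fields explicit. The layout below is an assumption pieced together from the hexdump above and community notes on BFLIM; the field names are illustrative labels, not official ones, and only the byte values come from the dump.

```python
import struct
from collections import namedtuple

# Assumed little-endian layout; field names are illustrative labels.
# magic, byte-order mark, header size, version, file size, block count, padding
FLIM_HEADER = struct.Struct("<4sHHIIHH")
# magic, block size, width, height, alignment, format, flags, data size
IMAG_BLOCK = struct.Struct("<4sIHHHBBI")
FOOTER_SIZE = FLIM_HEADER.size + IMAG_BLOCK.size  # 0x14 + 0x14 = 0x28

Footer = namedtuple(
    "Footer", "version file_size width height alignment format flags data_size"
)


def parse_footer(record):
    """Decode the trailing FLIM/imag footer of one extracted record."""
    tail = record[-FOOTER_SIZE:]
    magic, bom, _hsize, version, file_size, _blocks, _pad = FLIM_HEADER.unpack_from(tail, 0)
    if magic != b"FLIM" or bom != 0xFEFF:
        raise ValueError("no little-endian FLIM footer found")
    imag, _bsize, width, height, alignment, fmt, flags, data_size = IMAG_BLOCK.unpack_from(
        tail, FLIM_HEADER.size
    )
    if imag != b"imag":
        raise ValueError("no imag block found")
    return Footer(version, file_size, width, height, alignment, fmt, flags, data_size)


# The last 0x28 bytes of record 0.0, copied from the xxd output above.
SAMPLE = bytes.fromhex(
    "464c494dfffe14000001020728000100"
    "01000000696d61671000000000010001"
    "80000b0800000100"
)
footer = parse_footer(SAMPLE)
print(footer.width, footer.height, hex(footer.format), hex(footer.data_size))
# prints: 256 256 0xb 0x10000
```

Read this way, the footer says the image is 256x256 with 0x10000 bytes of pixel data, which plus the 0x28-byte footer accounts for the whole 0x10028-byte record. Community documentation appears to list format 0x0b as ETC1 with 4-bit alpha, which would fit the one-byte-per-pixel size, but treat that mapping as unverified.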

I went through a couple of dead ends trying to decode these: first 3dstex and then etc1tool, neither of which worked. Eventually i broke down and used eevee's code from pokedex.extract.lib, which i extracted into a standalone package.

% for file in *.0; do  ~/src/flim/bin/python -m flim "$file" >"png/${file%.0}.png"; done

Decoding is super slow — which was why i had been trying to avoid python — but it works.
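Since each record is decoded by its own process, one possible way to claw back some time is to run several decoders at once. The sketch below is only that, a sketch: `decode_one` and `decode_all` are hypothetical helpers, and it assumes the decoder command takes a record path as its last argument and writes the PNG to standard output, as the loop above does.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def decode_one(command, src, dst):
    """Run `command src` and save its standard output to dst."""
    with open(dst, "wb") as out:
        subprocess.run([*command, str(src)], stdout=out, check=True)
    return dst


def decode_all(command, src_dir, out_dir, workers=4):
    """Decode every *.0 record in src_dir into out_dir/<name>.png."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(src, out_dir / (src.stem + ".png")) for src in sorted(src_dir.glob("*.0"))]
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(decode_one, command, src, dst) for src, dst in jobs]
        return [future.result() for future in futures]
```

With the setup above, the command would be something like `["/path/to/flim/bin/python", "-m", "flim"]`. How much this helps depends on how many cores are free; it does nothing about the per-file cost of the decoder itself.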

And hey! Looks like i was right and these are indeed the dex sprites.

Last step is to rename them to something useful. The sprites are in national dex order, but only pokémon in the Alola dex have a sprite, so the first sprite is Caterpie. All the forms for a given pokémon are included in sequence with that pokémon, and each sprite is followed by its shiny version. For pokémon with gender differences, the female sprite comes before the male one. Only the base form can have gender differences; alternate forms cannot.

For example, Rattata's sprites are in the following order:

  1. Rattata-f
  2. Rattata-f (shiny)
  3. Rattata-m
  4. Rattata-m (shiny)
  5. Alolan Rattata
  6. Alolan Rattata (shiny)
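The ordering rules can be written down as a tiny Python function. This is just a restatement of the rules as described here, not anything taken from the game data; `sprite_order` and its parameters are names invented for the example.

```python
def sprite_order(name, has_gender_differences=False, alternate_forms=()):
    """List one pokémon's sprites in the order they appear in the archive."""
    # Only the base form splits by gender, female first.
    base = [name + "-f", name + "-m"] if has_gender_differences else [name]
    order = []
    for sprite in base + list(alternate_forms):
        order.append(sprite)               # normal sprite
        order.append(sprite + " (shiny)")  # immediately followed by its shiny
    return order


print(sprite_order("Rattata", True, ["Alolan Rattata"]))
# ['Rattata-f', 'Rattata-f (shiny)', 'Rattata-m', 'Rattata-m (shiny)',
#  'Alolan Rattata', 'Alolan Rattata (shiny)']
```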

I'm sure there's a data file somewhere that maps each pokémon to its sprite, but i'm too lazy to go looking for it and veekun already has most of the data anyway, so instead I cooked up an awful SQL query to do it. (For US/UM, change the version group id to 18 and the pokedex id to 21.)

with pokemon_forms_with_genders as (
    select id, identifier, form_identifier, pokemon_id, introduced_in_version_group_id, 1 as gender_order from pokemon_forms
    union select id, identifier||'-f', 'female', pokemon_id, introduced_in_version_group_id, 0 as gender_order from pokemon_forms pf
          where true = (select has_gender_differences from pokemon_species ps join pokemon p on p.species_id = ps.id where p.id = pokemon_id and p.is_default) or pf.identifier = 'spinda'
    union values (10220, 'zygarde-10-power-construct', '10-power-construct', 10118, 17, 2))
select (row_number() over (order by ps.id, p.id, gender_order, pf.id))*2 - 1 as "file",
       ps.id, pf.identifier, coalesce(pf.form_identifier, '')
    from pokemon_species ps
    join pokemon_dex_numbers dex on ps.id = dex.species_id and dex.pokedex_id = 16
    join pokemon p on p.species_id = ps.id
    join pokemon_forms_with_genders pf on pf.pokemon_id = p.id
    join pokemon_form_generations pfg on pfg.pokemon_form_id = pf.id and pfg.generation_id=7
    where pf.introduced_in_version_group_id <= 17;

From here, it's a simple matter to get postgres to write out a TSV file and then use standard unix tools to convert that into a bunch of cp commands to do the renaming.

% psql -tA -F $'\t' -o /tmp/dex-sprites.tsv pokedex <sm-dex-sprites-query.sql
% cut -f 1,2,4 /tmp/dex-sprites.tsv | while read i n form; do
>  echo cp -i "png/$i.png" "normal/$n${form:+-$form}.png";
>  echo cp -i "png/$((i+1)).png" "shiny/$n${form:+-$form}.png";
> done >/tmp/name-sprites.sh
% mkdir normal shiny
% bash /tmp/name-sprites.sh
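The same renaming logic can be sketched in Python, which makes the numbering explicit: the query's first column is the normal sprite's file number, and the shiny is the file right after it. `copy_plan` is a hypothetical helper that only builds the list of copies, so it can be reviewed before anything touches the disk. It assumes the four tab-separated columns the query above produces.

```python
def copy_plan(tsv_text):
    """Turn the query's TSV output into (source, destination) copy pairs."""
    plan = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        # Columns: file number, species id, identifier, form identifier.
        fields = line.split("\t")
        file_no, species_id, form = int(fields[0]), fields[1], fields[3]
        name = f"{species_id}-{form}" if form else species_id
        plan.append((f"png/{file_no}.png", f"normal/{name}.png"))
        plan.append((f"png/{file_no + 1}.png", f"shiny/{name}.png"))
    return plan


# The first row of the query output is Caterpie: file 1, species 10, no form.
print(copy_plan("1\t10\tcaterpie\t"))
# [('png/1.png', 'normal/10.png'), ('png/2.png', 'shiny/10.png')]
```

Each pair can then be handed to `shutil.copy` once the plan looks right.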

Some of the sprites are blank (mostly totem pokémon); we can delete those.

rm -i 20-totem-alola.png 718-10-power-construct.png 718-50.png 735-totem.png 738-totem.png 754-totem.png 758-totem.png 778-totem-busted.png 778-totem-disguised.png 784-totem.png

Now just optimize them and they're pretty much ready to add to pokedex-media!

% optipng *.png
% advdef -z3 *.png