Development Timeline
PG-Pad-Aug d50-200e
Model 1: This model uses a Progressive Growing (PG) architecture with padding and data augmentation, trained for 200 epochs.
PGv4-RefPad d50-300e
Model 2: The fourth version of the PG model, using reflection padding. It was trained for 300 epochs.
PGv5-RefPad d50-300e
Model 3: The fifth version of the PG model, also using reflection padding and trained for 300 epochs.
PG-Aug d50-200e
Model 4: This iteration returns to the original PG architecture with augmentation, trained for 200 epochs.
Std d25-150e
Model 5: A standard (non-progressive) architecture with a 25% dataset split, trained for 150 epochs.
Std-VLR-L d48-250e
Model 6: This standard model uses a very low learning rate (VLR) and a larger dataset split (48%), trained for 250 epochs.
Project Overview
This case study chronicles a multi-month project to build and refine a Generative Adversarial Network (GAN) for creating high-resolution j-card “jacket” designs. The project was inspired by frustration with the cost and labor of manually reformatting game art, especially given the high prices such pieces can fetch.
The project addresses a specific, monotonous, and labor-intensive problem: the translation of three-part Nintendo 3DS game case covers into j-card layouts. When done manually, this process can take 20-45 minutes per cover. As a case in point, creating just 37 covers for the mainline Pokémon games took roughly two weeks of on-and-off work. The goal was to build a deep learning pipeline that could automate this task intelligently.
The core solution is an ETL pipeline powered by Pix2Pix, a conditional GAN framework known for image-to-image translation. The process involved scraping 157 high-quality 3DS cover scans using game serial IDs from databases like 3dsdb and GameTDB. These source images were paired with 50 manually created "target" images, which served as the ground truth for training. The project was intentionally scoped to 3DS covers to ensure a manageable and high-quality dataset could be sourced.
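The original Pix2Pix implementation expects each training example as a single side-by-side "A|B" image, with the input on one half and the ground truth on the other. The sketch below shows one plausible way to assemble such pairs from the scraped covers and the manual targets; the folder names, file format, and 256x256 working size are illustrative assumptions, not the project's actual settings.

```python
from pathlib import Path
from PIL import Image

# Hypothetical layout: scraped 3DS cover scans in covers/, hand-made
# j-card targets with matching filenames in jcards/.
SIZE = (256, 256)  # assumed working resolution

def make_pair(cover_path: Path, target_dir: Path, out_dir: Path) -> None:
    target_path = target_dir / cover_path.name
    if not target_path.exists():
        return  # only 50 of the 157 scraped covers have a manual target
    a = Image.open(cover_path).convert("RGB").resize(SIZE)
    b = Image.open(target_path).convert("RGB").resize(SIZE)
    pair = Image.new("RGB", (SIZE[0] * 2, SIZE[1]))
    pair.paste(a, (0, 0))        # input cover on the left
    pair.paste(b, (SIZE[0], 0))  # ground-truth j-card on the right
    pair.save(out_dir / cover_path.name)

out = Path("train_pairs")
out.mkdir(exist_ok=True)
for cover in sorted(Path("covers").glob("*.png")):
    make_pair(cover, Path("jcards"), out)
```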
Key challenges included developing a semi-automated GUI tool using OpenCV for consistent image segmentation and navigating the iterative process of model training and refinement. The project ultimately trained eight specialized sub-models in a human-in-the-loop cycle, underscoring a core lesson: robust data preparation is often more critical than complex architectural changes. Currently, the top-performing model achieves about 80% of the desired output, with a clear path to improvement by expanding the training set.
Methodology Highlights
- Pix2Pix Architecture: Utilized a U-Net based Generator and a PatchGAN Discriminator for the image translation task (see the PatchGAN sketch after this list).
- Structured Data Acquisition: Sourced an initial dataset of 157 covers from GameTDB and 3dsdb.
- Semi-Automated Segmentation: Built an interactive tool with OpenCV for edge detection, segmenting images into 6 sections (see the segmentation sketch after this list).
- Specialized Sub-Models: Trained 8 distinct sub-models to handle different components of the j-card design.
- Accelerated Training: Leveraged Google Colab Pro with A100 GPUs and enabled mixed-precision for significant speed gains (see the mixed-precision sketch after this list).
- Human-in-the-Loop Feedback: Used an iterative HITL cycle for continuous model refinement and improvement.
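For reference, here is a minimal PyTorch sketch of the PatchGAN discriminator named above. The layer widths follow the common 70x70 PatchGAN configuration from the Pix2Pix paper, not anything confirmed about this project's exact models:

```python
import torch
import torch.nn as nn

class PatchGANDiscriminator(nn.Module):
    """70x70 PatchGAN: scores overlapping patches as real/fake
    rather than classifying the whole image at once."""
    def __init__(self, in_channels: int = 6):  # input + target concatenated
        super().__init__()
        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4,
                                stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.model = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),
            *block(64, 128, 2),
            *block(128, 256, 2),
            *block(256, 512, 1),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # Condition the discriminator on the source image, per Pix2Pix.
        return self.model(torch.cat([src, tgt], dim=1))
```

The output is a grid of real/fake scores, one per receptive-field patch, which is what lets the discriminator concentrate on local texture rather than global composition.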
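The actual segmentation tool was interactive, but its automatic first pass might look like the following sketch: Canny edge detection to find the cover's outer boundary, then a split into six sections. The function name and the assumption of equal-width vertical sections are illustrative; in practice, boundaries would be adjusted by hand.

```python
import cv2
import numpy as np

def split_cover(path: str, n_sections: int = 6) -> list[np.ndarray]:
    """Find the cover's outer edge, crop to it, and slice into sections."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Take the largest contour as the cover's outer boundary.
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    cover = img[y:y + h, x:x + w]
    step = w // n_sections  # assumes equal-width vertical sections
    return [cover[:, i * step:(i + 1) * step] for i in range(n_sections)]
```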
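In PyTorch, enabling mixed precision typically comes down to an autocast context plus a gradient scaler. A self-contained toy sketch follows; the one-layer stand-in model and dummy batch are placeholders, not the project's networks:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"

model = nn.Conv2d(3, 3, 3, padding=1).to(device)  # stand-in generator
opt = torch.optim.Adam(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

x = torch.randn(4, 3, 256, 256, device=device)  # dummy batch
y = torch.randn(4, 3, 256, 256, device=device)

with torch.cuda.amp.autocast(enabled=use_amp):  # run ops in float16 where safe
    loss = nn.functional.l1_loss(model(x), y)
scaler.scale(loss).backward()  # scaled loss avoids float16 gradient underflow
scaler.step(opt)               # unscales gradients, then steps
scaler.update()
```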
ETL Process
Quantitative Results
PSNR Comparison
Peak Signal-to-Noise Ratio (PSNR) measures image quality by comparing the maximum possible power of a signal to the power of corrupting noise. Higher PSNR values generally indicate higher quality reconstructions.
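Concretely, for 8-bit images PSNR = 10 · log10(MAX² / MSE) with MAX = 255. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def psnr(reference: np.ndarray, generated: np.ndarray,
         max_value: float = 255.0) -> float:
    """PSNR in decibels; higher means the reconstruction is closer."""
    mse = np.mean((reference.astype(np.float64) -
                   generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```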
SSIM Comparison
The Structural Similarity Index Measure (SSIM) assesses the perceptual similarity between two images and aligns more closely with human visual perception than PSNR. A value closer to 1 indicates higher similarity.
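scikit-image ships a reference SSIM implementation, so a comparison script might compute the score along these lines. The random placeholder images stand in for a ground-truth j-card and a model output, and the channel_axis argument requires scikit-image 0.19 or newer:

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
# Placeholder images; in practice these would be the manual target
# and the corresponding model output.
reference = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
generated = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)

score = structural_similarity(reference, generated,
                              channel_axis=2, data_range=255)
print(f"SSIM: {score:.3f}")  # 1.0 means structurally identical
```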