
Storybook

AI-powered photorealistic illustration pipeline for children with visual disabilities

The Problem

Children with Cerebral Visual Impairment (CVI) struggle with abstract visual representations such as cartoons, illustrations, and stylized art. Their brains cannot reliably process these simplified visual forms, making traditional picture books ineffective for literacy development. Their perception of photorealistic imagery, however, is unaffected.

There are no photorealistically illustrated children’s books on the market. The few resources that exist require manual photo shoots for every scene — impractical for the hundreds of illustrations a full book series demands.

Storybook solves this with a fully automated pipeline: feed it a story URL, and it produces a photorealistically illustrated book with consistent character appearances across every page.

Visual Demo

Screenshots coming soon — the web viewer displays a library of illustrated books with a page-by-page reader.

The Solution

The pipeline begins with ingestion. A Cheerio-based scraper strips site-specific boilerplate from Project Gutenberg, Wikisource, and Lit2Go, handling the idiosyncrasies of each source’s HTML structure. The raw text then passes through two sequential LLM cleaning stages: the first strips publisher metadata, forewords, and licensing blocks from the head and tail of the document; the second corrects OCR artifacts — misrecognized characters, broken ligatures, phantom line breaks — while deliberately preserving archaic spelling and period-appropriate punctuation. The result is clean, faithful source text ready for illustration planning.
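The head/tail stripping can be illustrated for the most common source. This is a minimal sketch, not the pipeline's actual LLM-based cleaning stage: it assumes Project Gutenberg's conventional `*** START OF ... ***` / `*** END OF ... ***` markers, and the function name is illustrative.

```typescript
// Heuristic sketch of the head/tail stripping stage for Project Gutenberg
// sources. The real pipeline uses an LLM pass; this shows the same goal
// using Gutenberg's conventional boilerplate markers.
function stripGutenbergBoilerplate(raw: string): string {
  const start = raw.search(/\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\*\*\*/i);
  const end = raw.search(/\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\*\*\*/i);
  if (start === -1 || end === -1 || end <= start) return raw.trim();
  // Drop everything up to and including the start-marker line,
  // and everything from the end marker onward.
  const afterStart = raw.indexOf("\n", start);
  return raw.slice(afterStart + 1, end).trim();
}
```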

Text chunking splits the cleaned document at approximately 800 characters, respecting paragraph boundaries with a 400-character minimum and 1,200-character maximum. This sounds straightforward until you encounter 18th-century prose. Dialogue fragments like “Alas!” or “Indeed!” create false sentence boundaries that would produce meaninglessly short chunks. The splitter detects these by checking whether the next fragment begins with a lowercase letter — a strong signal that the sentence continues — and re-merges them. Once chunks are stable, GPT-4.1-mini performs scene identification, selecting one to three sentences per chunk that would make a compelling illustration. The instructions explicitly bias toward warm, joyful moments and away from peril, even when the source material — classic literature being what it is — describes it in detail.

Image generation targets photorealism against pure black backgrounds (#000000), a deliberate CVI accommodation that eliminates visual noise and maximizes figure-ground contrast. Maintaining character consistency across dozens of pages is the hardest problem in the pipeline. A persistent JSONB document per book tracks every character’s established appearance — hair color, clothing, facial features, age — and each generation prompt includes these details with [MUST MATCH EXACTLY] prefixes. Reference images from the two surrounding pages feed OpenAI’s images/edits endpoint, giving the model visual anchors alongside the text description. The results are not perfect, but they are remarkably consistent for a fully automated system.
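The consistency-prompt assembly might look roughly like this. The `CharacterSheet` shape and field names are assumptions for illustration; the source only specifies that a per-book JSONB document tracks established appearances and that prompts carry `[MUST MATCH EXACTLY]` prefixes:

```typescript
// Illustrative sketch of building the character-consistency section of a
// generation prompt from a per-book appearance record.
interface CharacterSheet {
  name: string;
  age: string;
  hair: string;
  face: string;
  clothing: string;
}

function consistencyBlock(chars: CharacterSheet[]): string {
  // Each established attribute is prefixed so the image model treats it
  // as a hard constraint rather than a stylistic suggestion.
  return chars
    .map(
      (c) =>
        `[MUST MATCH EXACTLY] ${c.name}: ${c.age}, ${c.hair} hair, ` +
        `${c.face}, wearing ${c.clothing}.`
    )
    .join("\n");
}
```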

Classic literature inevitably triggers content safety filters — a scene where a wolf menaces a child, a villain brandishing a weapon, a character in distress. The pipeline handles this with a three-attempt fallback system. The first attempt uses the original prompt verbatim. If rejected, the second attempt rewrites the prompt at low temperature, preserving all technical directives (black background, character descriptions, aspect ratio) while softening the scene description. The third attempt reimagines the scene from an entirely different visual angle, with specific reframing strategies provided to the model — shifting from a wide shot of conflict to a close-up of a character’s hopeful expression, for example.

The job queue backing all of this runs on Upstash Redis, which lacks blocking pop operations. The worker polls at a 5-second interval, processing one chunk at a time. Rate limit responses trigger a sorted-set delayed job mechanism: failed jobs are inserted with a timestamp score representing when they should next be attempted, and the polling loop promotes them back to the active queue once their wait time expires. Exponential backoff on repeated failures prevents runaway API costs.
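The three-attempt escalation can be sketched with the generation and rewrite calls injected, so the control flow stands alone. `generate`, `soften`, and `reframe` are assumed names: `generate` stands in for the images/edits call, the rewrites for the two low-temperature LLM passes.

```typescript
// Sketch of the three-attempt safety fallback. Each failed attempt is
// assumed to be a safety rejection and escalates to the next strategy.
type Generate = (prompt: string) => Promise<string>;
type Rewrite = (prompt: string) => Promise<string>;

async function generateWithFallback(
  prompt: string,
  generate: Generate,
  soften: Rewrite,
  reframe: Rewrite
): Promise<string> {
  const strategies: (() => Promise<string>)[] = [
    async () => prompt,         // attempt 1: original prompt verbatim
    async () => soften(prompt), // attempt 2: soften scene, keep directives
    async () => reframe(prompt) // attempt 3: reimagine from a new angle
  ];
  let lastError: unknown;
  for (const buildPrompt of strategies) {
    try {
      return await generate(await buildPrompt());
    } catch (err) {
      lastError = err; // rejected: fall through to the next strategy
    }
  }
  throw lastError;
}
```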

Architecture

Frontend (Vercel, static HTML) → API (Fly.io, Hono/TypeScript) → Supabase (Postgres) + Upstash Redis (job queue) → Worker (Fly.io) → OpenAI API + Cloudflare R2

Tech Stack

TypeScript · Hono · Supabase · Upstash Redis · OpenAI GPT-4.1-mini · GPT-image-1-mini · Cloudflare R2 · Fly.io · Vercel · Playwright · Docker

By the Numbers

46 fairy tales catalogued across 2 public domain collections

5 classic novels with chapter-by-chapter processing plans

3-attempt safety fallback with semantic prompt escalation

9 database migrations tracking schema evolution

Full E2E test suite with Playwright (14 test files)

Key Technical Decisions

Separate API and Worker services

Cost control pattern — the deployed API is intentionally read-only. The AI pipeline runs locally where costs can be monitored. The public endpoint only serves pre-processed content.

Upstash Redis polling with sorted-set delayed jobs

Upstash has no blocking pop (BLPOP). The worker polls with a 5-second interval and uses a sorted set with timestamp scores for rate limit backoff, promoting delayed jobs when their wait time expires.
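An in-memory sketch of this mechanism, with local structures standing in for the Redis primitives (the real worker would issue ZADD / ZRANGEBYSCORE / ZREM / LPUSH against Upstash; class and method names here are illustrative):

```typescript
// In-memory sketch of the delayed-job queue: an array mimics the active
// Redis list, a Map mimics the sorted set of job -> retry-at timestamps.
class DelayedQueue {
  private active: string[] = [];
  private delayed = new Map<string, number>(); // job -> retry-at (ms)

  // On a rate limit response: schedule the job with exponential backoff
  // (5s, 10s, 20s, ...) to cap runaway retry costs.
  delay(job: string, retries: number, now: number): void {
    const backoffMs = 5_000 * 2 ** retries;
    this.delayed.set(job, now + backoffMs);
  }

  // Called on each 5-second poll tick: promote every delayed job whose
  // wait time has expired back onto the active queue.
  promoteDue(now: number): void {
    for (const [job, retryAt] of this.delayed) {
      if (retryAt <= now) {
        this.delayed.delete(job);
        this.active.push(job);
      }
    }
  }

  pop(): string | undefined {
    return this.active.shift();
  }
}
```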

Hand-rolled image regeneration UX

The admin can click any thumbnail to open a lightbox, edit the AI prompt, regenerate, then approve or reject in a side-by-side comparison view. Rejected images are deleted from R2; approved ones replace the original.