StyleBabel: Artistic Style Tagging and Captioning

Dan Ruta*, Andrew Gilbert, Pranav Aggarwal, Ajinkya Kale, Jo Briggs, Chris Speed, Halin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin, John Collomosse

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review


We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. StyleBabel was collected via an iterative method, inspired by ‘Grounded Theory’: a qualitative approach that enables annotation while co-evolving a shared language for fine-grained artistic style attribute description. We demonstrate several downstream tasks for StyleBabel, adapting the recent ALADIN architecture for fine-grained style similarity to train cross-modal embeddings for: 1) free-form tag generation; 2) natural language description of artistic style; 3) fine-grained text search of style. To do so, we extend ALADIN with recent advances in Visual Transformer (ViT) and cross-modal representation learning, achieving state-of-the-art accuracy in fine-grained style retrieval.
Original language: English
Title of host publication: Computer Vision – ECCV 2022
Subtitle of host publication: 17th European Conference
Place of Publication: Cham, Switzerland
Publication status: Accepted/In press - 3 Jun 2022
Event: ECCV 2022: European Conference on Computer Vision (ECCV) - Expo Tel Aviv, Tel Aviv, Israel
Duration: 23 Oct 2022 to 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: ECCV 2022
City: Tel Aviv

