Developing Domain-Specific Generative Models
Speaker: Kathleen Lewis, CSAIL MIT
Host: John Guttag, CSAIL MIT
Abstract:
As generative AI research shifts toward large-scale foundation models, careful thought must go into adapting these models to new domains and tasks. Our work focuses on this challenge and proposes novel methods for adapting large-scale generative models to domain-specific tasks. We demonstrate these methods on three applications: virtual try-on, conceptual art, and fine-grained object classification. Beyond the technical contributions, my thesis explores broader open questions about domain-specific generative models: How can we carefully construct training data to mitigate bias? What do human-in-the-loop methods for creative generative AI look like in practice? To what extent are large-scale vision-language models useful for traditionally image-only tasks?
This talk will focus on two of our generative methods, TryOnGAN and GIST. In the TryOnGAN work, we present a modified StyleGAN2 architecture and introduce a layered latent space interpolation method for photorealistic virtual try-on. GIST combines existing foundation models in a novel way to generate image-specific fine-grained text descriptions from image-only datasets. We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification. We evaluate our learned representation space in full-shot and few-shot scenarios across four diverse fine-grained classification datasets and demonstrate state-of-the-art classification performance.
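To give a flavor of the layered latent space interpolation idea, the sketch below mixes two per-layer latent codes of a StyleGAN2-style generator with a separate blending coefficient for each layer. It is an illustrative sketch only, not code from the thesis: the layer count, latent dimensionality, and the particular person/garment layer split are assumptions for illustration.

import numpy as np

# Assumed sizes for a StyleGAN2-style W+ latent (not taken from the thesis).
NUM_LAYERS = 18      # per-layer latent codes fed to the generator
LATENT_DIM = 512     # dimensionality of each layer's latent vector

def layered_interpolation(w_person, w_garment, alphas):
    """Blend two per-layer latent codes with one coefficient per layer.

    w_person, w_garment: arrays of shape (NUM_LAYERS, LATENT_DIM)
    alphas: array of shape (NUM_LAYERS,); 0 keeps the person code for
            that layer, 1 takes the garment code for that layer.
    """
    alphas = np.asarray(alphas).reshape(-1, 1)  # broadcast over the latent dimension
    return (1.0 - alphas) * w_person + alphas * w_garment

# Hypothetical usage: keep the coarse (pose/identity) layers from the person
# and swap the mid/fine (clothing appearance) layers toward the garment.
w_person = np.random.randn(NUM_LAYERS, LATENT_DIM)
w_garment = np.random.randn(NUM_LAYERS, LATENT_DIM)
alphas = np.concatenate([np.zeros(6), np.ones(12)])  # illustrative per-layer choice
w_mixed = layered_interpolation(w_person, w_garment, alphas)
print(w_mixed.shape)  # (18, 512)

In practice the per-layer coefficients (and which layers control pose versus clothing) would be chosen or learned for the try-on task; the hard 0/1 split above is only meant to show the mechanism.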
Committee: John Guttag (MIT), Frédo Durand (MIT), Guha Balakrishnan (Rice University), Adrian Dalca (MIT/HMS/MGH)