Project Name
Learning from Cross-Modality Data for Image Semantics Understanding, Description, Synthesis, & Manipulation
Project Goal
- Learning from cross-modality data such as images and text typically requires proper data and label supervision. Learning AI models from cross-modality data without observing such supervision is a challenging yet practical problem to tackle.
- As a core technology project, we target four distinct yet mutually related vision-and-language tasks: novel object captioning, text-to-image manipulation, scene graph understanding & expansion, and semantics-guided image completion.
Project Description
1. Scene Graph (SG) Expansion
* Unknown semantic inference
* Self-attention and masked language modelling
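The bullets above suggest treating scene graph expansion as masked-token prediction. Below is a minimal, hypothetical PyTorch sketch (class and variable names are illustrative, not from the project code): scene-graph triplets are flattened into a token sequence, one symbol is masked, and a self-attention encoder infers it, mirroring masked language modelling.

```python
import torch
import torch.nn as nn

class SGExpander(nn.Module):
    """Illustrative sketch: masked-token prediction over scene-graph triplets.

    A scene graph is flattened into a token sequence
    [subj, pred, obj, subj, pred, obj, ...]; one token is replaced by a
    [MASK] symbol and a Transformer encoder (self-attention) infers it,
    analogous to masked language modelling.
    """
    def __init__(self, vocab_size, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))  # contextualize via self-attention
        return self.head(h)                   # per-position vocabulary logits

# Toy usage: vocabulary of 100 graph symbols; MASK id = 0.
model = SGExpander(vocab_size=100)
seq = torch.tensor([[5, 0, 9, 5, 17, 23]])    # predicate of 1st triplet masked
logits = model(seq)
pred = logits[0, 1].argmax().item()           # inferred relation id (untrained here)
```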
2. Semantics-Guided Image Completion
* Scene graph to layout and layout to image
* Conditional graph convolutional network (GCN)
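A minimal sketch of the scene-graph-to-layout half, under assumptions of our own (all names are hypothetical): a few graph-convolution steps refine object embeddings along scene-graph edges, and each node regresses a normalized layout box. The layout-to-image stage, and the conditioning signal that would make the GCN "conditional", are omitted here.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution step: each object node aggregates the
    features of its neighbours (defined by scene-graph edges)."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x, adj):
        # adj: (N, N) binary adjacency built from scene-graph relations
        neigh = adj @ x / adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

class SG2Layout(nn.Module):
    """Sketch of the scene-graph-to-layout stage: node embeddings are
    refined by GCN layers, then each node regresses a box (x, y, w, h)."""
    def __init__(self, n_objects, dim=64, n_layers=3):
        super().__init__()
        self.embed = nn.Embedding(n_objects, dim)
        self.gcn = nn.ModuleList([GraphConv(dim) for _ in range(n_layers)])
        self.box_head = nn.Linear(dim, 4)  # normalized layout boxes

    def forward(self, obj_ids, adj):
        x = self.embed(obj_ids)
        for layer in self.gcn:
            x = layer(x, adj)
        return torch.sigmoid(self.box_head(x))

# Toy usage: 3 objects with edges person<->dog and dog<->frisbee.
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
boxes = SG2Layout(n_objects=10)(torch.tensor([1, 4, 7]), adj)
```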
3. Novel Object Captioning
* Pseudo-captions and self-retrieval cycle consistency
* Self-attention
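The self-retrieval cycle consistency bullet can be read as: a pseudo-caption generated for an image should, in turn, retrieve that image. A minimal contrastive sketch of such an objective (InfoNCE-style, with hypothetical names; the project's actual loss may differ):

```python
import torch
import torch.nn.functional as F

def self_retrieval_loss(img_emb, cap_emb, temperature=0.07):
    """Sketch of a self-retrieval cycle-consistency objective.

    Each pseudo-caption embedding should retrieve the image it was
    generated from; a contrastive loss over the batch enforces this."""
    img_emb = F.normalize(img_emb, dim=-1)
    cap_emb = F.normalize(cap_emb, dim=-1)
    logits = cap_emb @ img_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))        # caption i <-> image i
    return F.cross_entropy(logits, targets)

# Toy usage: batch of 8 image / pseudo-caption embedding pairs.
loss = self_retrieval_loss(torch.randn(8, 256), torch.randn(8, 256))
```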
4. Text-to-Image Manipulation
* Learning how to modify images
* Learning where to modify images
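A minimal sketch of the how/where split, under assumptions of our own (names and architecture are illustrative, not the project's model): a text-conditioned branch predicts a spatial mask (where to modify), a second branch proposes new features (how to modify), and unmasked regions are copied through unchanged.

```python
import torch
import torch.nn as nn

class WhereHowEditor(nn.Module):
    """Illustrative sketch of text-guided manipulation split into
    "where" and "how": a text-conditioned spatial mask selects regions
    to edit (where), a text-conditioned residual proposes the new
    content (how), and unselected regions keep the input features."""
    def __init__(self, img_ch=64, txt_dim=128):
        super().__init__()
        self.where = nn.Conv2d(img_ch + txt_dim, 1, kernel_size=1)
        self.how = nn.Conv2d(img_ch + txt_dim, img_ch, kernel_size=3, padding=1)

    def forward(self, feat, txt):
        # Broadcast the sentence embedding to every spatial location.
        txt_map = txt[:, :, None, None].expand(-1, -1, *feat.shape[2:])
        h = torch.cat([feat, txt_map], dim=1)
        mask = torch.sigmoid(self.where(h))     # where to modify
        edit = self.how(h)                      # how to modify
        return mask * edit + (1 - mask) * feat  # edit only masked regions

# Toy usage: one 64-channel 32x32 feature map and a 128-d text embedding.
out = WhereHowEditor()(torch.randn(1, 64, 32, 32), torch.randn(1, 128))
```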