Using Deep Learning to Better Understand Vision and Language
January 17 @ 17:30 - 19:00
Co-sponsored by: IS&T Rochester
Deep learning has enabled major advances in computer vision, natural language processing, and general image and video understanding. Recurrent neural networks have demonstrated the ability to generate text from visual stimuli, while image generative networks have demonstrated a remarkable ability to create photorealistic images. To explore these methods, this talk is divided into two parts. First, we introduce a general-purpose Steered Gaussian Attention Model for video understanding. The use of an attention-based hierarchical approach along with automatic boundary detection delivers state-of-the-art results on popular video captioning datasets. In the second part of the talk, we discuss four modality transformations: visual-to-text, text-to-visual, visual-to-visual, and text-to-text. In addition to reviewing recent techniques, we introduce improvements in all four transformations. To conclude, we show results illustrating how generative methods can seamlessly integrate written and visual modalities in both directions.
Speaker(s): Shagan Sah
Room: 1st floor fish bowl room
Bldg: CIS, bldg 76
Rochester Institute of Technology
One Lomb Memorial Drive
Rochester, New York