Abstract: While RNNs have proven themselves beyond doubt in NLP, training them can be painfully slow due to their sequential nature. In this talk we'll leverage the inherent parallelism of convolutional architectures and explore their applications to NLP. We'll focus on capturing multi-time-scale dependencies while maximizing gradient flow in our network, and conclude by investigating "upsampling" techniques for text.
Description: RNNs work great for text, but convolutions can do it faster. Any part of a sentence can influence the semantics of a word, so we want our network to see the entire input at once. Getting that big a receptive field can make gradients vanish and our networks fail. We can solve the vanishing gradient problem with DenseNets or dilated convolutions. Sometimes we need to generate text; we can use “deconvolutions” to generate arbitrarily long outputs.
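The receptive-field idea above can be sketched with a small calculation (an illustrative example, not code from the talk; the function name is ours): stacking 1-D convolutions whose dilation doubles at each layer grows the receptive field exponentially with depth, which is how a shallow convolutional stack can see an entire sentence.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated 1-D convolutions:
    each layer with dilation d adds (kernel_size - 1) * d positions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four layers of kernel-size-3 convolutions with doubling dilations
# already cover 31 tokens of context.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

With linearly stacked undilated layers the receptive field grows only linearly, which is why dilation matters for long sentences.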
Bio: Tal Perry is an entrepreneur and data scientist. Tal worked at Citi, applying deep learning methodologies to various NLP tasks within the bank. Previously, Tal was CTO of Superfly, a provider of alternative data to the financial industry, as well as founder of an algorithmic trading fund. Tal holds a B.Sc. in mathematics from Tel Aviv University.