There are 1011 accepted papers at this year's conference: 30 orals, 168 spotlights, and 813 posters, out of 4856 submissions, for an acceptance rate of 20.8% (source).

I wanted to read all the abstracts in the 24 waking hours I had before the conference started. That gave me 1440 minutes for 1011 abstracts, an average of 1.42 minutes per abstract. Foolishly, I also wanted to condense each abstract into a mini-abstract that would be easy to follow when I came back to it later or wanted to share it.

I started with the first 20 abstracts from the first poster session of the conference, ‘Tue Poster Session A’ (it has 168 papers). It is really intimidating and overwhelming to read even one concentrated abstract of a solid research investigation, and I had to read 20 of them and then keep going. Still, I found it easier, when reading an abstract, to focus on the problem the authors are solving and on the novelty, validity, and impact of their solution to the field.

Overall, I am really happy that I made myself read an unusually large number of abstracts, even though it seemed fatal in many ways! I still want to read all the abstracts from the conference, but that could take maybe a week. I will keep you posted.

These are the must-reads from the papers I have gone through (18 papers short of the entire ‘Tue Poster Session A’). The sort-of-tags are not an efficient way to represent these papers; they are merely a human's latent perceptual overhead, sometimes experienced as feelings.

Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions
FUNDAMENTALS
A novel framework for embeddings that are numerically flexible and that extend point embeddings: elliptical embeddings in Wasserstein space. Wasserstein elliptical embeddings are more intuitive and yield tools that are better behaved numerically than the alternative choice of Gaussian embeddings with the Kullback-Leibler divergence. The paper demonstrates the advantages of
elliptical embeddings by using them for visualization, to compute embeddings of words, and to reflect entailment or hypernymy.

Are GANs Created Equal?

Generally, network structures designed specifically for image classification are directly used as default backbone structure for other tasks including detection and segmentation, but there is seldom backbone structure designed under the consideration of unifying the advantages of networks designed for pixel-level or region-level predicting tasks, which may require very deep features with high resolution.

A Faster R-CNN detection model trained on MNIST detection showed 24% better IOU when using CoordConv, and in the Reinforcement Learning (RL) domain agents playing Atari games benefit significantly from the use of CoordConv layers.

Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?
FUNDAMENTALS, UNDERSTANDING
We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. From this point of view, we rigorously compute finite width corrections to the statistics of gradients at the edge of chaos.

A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
PRACTICAL
The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Our evaluation validates our theoretical results and shows that our PQASGD can converge as fast as full-communication SGD with only 3%−5% communication data size.

Regularizing by the Variance of the Activations’ Sample-Variances
FUNDAMENTALS, NORMALIZATION
Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks.