November 7, 2019 - November 8, 2019

MSRA Academic Day 2019

Location: Beijing, China

Technology Showcase by Microsoft Research Asia

  • Presenter: Mao Yang, Microsoft Research

    As computer systems and networking get increasingly complicated, optimizing them manually with explicit rules and heuristics becomes harder than ever, and sometimes impossible. At Microsoft Research Asia, our AutoSys project applies learning to large-scale system performance tuning. The AutoSys framework (1) defines interfaces to expose system features for learning, (2) introduces monitors to detect learning-induced failures, and (3) runs resource management to support the heterogeneous requirements of learning-related tasks. Based on AutoSys, we have built tools that serve many crucial system scenarios within Microsoft, including multimedia search for Bing (e.g., tail latency reduced by up to ~40% and capacity increased by up to ~30%) and job scheduling for Bing Ads (e.g., tail latency reduced by up to ~13%).

  • Presenter: Yingce Xia and Xu Tan, Microsoft Research

    Many AI tasks emerge in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. speech synthesis, question answering vs. question generation, and image classification vs. image generation. Dual learning is a new learning framework that leverages the primal-dual structure of AI tasks to obtain effective feedback or regularization signals that enhance the learning/inference process. In this demo, we will show two applications of dual learning: machine translation and speech synthesis.
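
    The primal-dual feedback loop can be sketched with a toy round-trip check. This is an illustrative sketch only: real dual learning uses neural translation models and a language-model reward, while the word tables below are hypothetical stand-ins.

```python
# Hypothetical word-level "translators" standing in for the primal (f) and
# dual (g) models; a real system would use learned seq2seq models.
EN2FR = {"hello": "bonjour", "hi": "bonjour", "world": "monde"}
FR2EN = {"bonjour": "hello", "monde": "world"}

def translate(words, table):
    return [table.get(w, w) for w in words]

def round_trip_reward(words):
    # Dual-learning feedback: how well does g(f(x)) reconstruct x?
    # In training, this reward (plus a language-model score) would be used
    # to update both models, e.g. via policy gradient.
    reconstructed = translate(translate(words, EN2FR), FR2EN)
    return sum(a == b for a, b in zip(words, reconstructed)) / len(words)

print(round_trip_reward(["hello", "world"]))  # 1.0: perfect round trip
print(round_trip_reward(["hi", "world"]))     # 0.5: "hi" -> "bonjour" -> "hello"
```

    Note how the ambiguous mapping ("hi" and "hello" both map to "bonjour") lowers the round-trip reward; this reconstruction signal is exactly the kind of feedback the primal-dual structure provides.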

  • Presenter: Tao Ge, Microsoft Research

    Neural sequence-to-sequence (seq2seq) approaches have proven successful in grammatical error correction (GEC). Based on the seq2seq framework, we propose a novel fluency boost learning and inference mechanism. Fluency boost learning generates diverse error-corrected sentence pairs during training, enabling the error correction model to learn how to improve a sentence’s fluency from more instances, while fluency boost inference allows the model to correct a sentence incrementally over multiple inference steps. Combining fluency boost learning and inference with conventional seq2seq models, our approach achieves state-of-the-art performance on GEC benchmarks.
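
    The multi-round inference idea can be sketched as a loop that keeps re-running a corrector on its own output while a fluency score improves. The rule table and fluency score below are hypothetical stand-ins for the seq2seq model and the language-model fluency used in the actual work.

```python
# Hypothetical correction rules standing in for a learned seq2seq corrector.
RULES = {"He have": "He has", "a apple": "an apple"}

def correct_once(sentence):
    for wrong, right in RULES.items():  # one round of corrections
        sentence = sentence.replace(wrong, right)
    return sentence

def fluency(sentence):
    # Stand-in fluency score: fewer known error patterns -> higher fluency.
    return -sum(sentence.count(w) for w in RULES)

def fluency_boost_inference(sentence, max_rounds=5):
    # Incremental inference: stop once another round no longer boosts fluency.
    for _ in range(max_rounds):
        candidate = correct_once(sentence)
        if fluency(candidate) <= fluency(sentence):
            return sentence
        sentence = candidate
    return sentence

print(fluency_boost_inference("He have a apple"))  # "He has an apple"
```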

  • Presenter: Qiang Huo, Microsoft Research

    At Microsoft, we have been developing a new-generation OCR engine (aka OneOCR), which can detect both printed and handwritten text in an image captured by a camera or mobile phone, and recognize the detected text for follow-up actions. Our unified OneOCR engine can recognize mixed printed and handwritten English text lines with arbitrary orientations (even flipped), significantly outperforming other leading industrial OCR engines across a wide range of application scenarios. Empowered by the OneOCR engine, the Computer Vision Read capability and the Cognitive Search capability of Azure Search are generally available, and a Form Recognizer with Receipt Understanding capability is available in preview, all in Azure Cognitive Services, which can power enterprise workflows and Robotic Process Automation (RPA) to spur digital transformation. In this presentation, I will demonstrate the capabilities of Microsoft’s latest OneOCR engine, highlight its core component technologies, and explain the roadmap ahead.

  • Presenter: Shi Han, Microsoft Research

    Ideas in Excel aims at one-click intelligence: when a user clicks the Ideas button on the Home tab of Excel, the intelligent service empowers the user to understand his or her data via automatic recommendation of visual summaries and interesting patterns. The user can then insert the recommendations into the spreadsheet to aid further analysis or to serve directly as analysis results. Enabling such one-click intelligence poses underlying technical challenges. At the Data, Knowledge and Intelligence group of Microsoft Research Asia, we have conducted long-term research on spreadsheet intelligence and automated insights, and via close collaboration with Excel product teams, we transferred a suite of technologies and shipped Ideas in Excel together. In this demo presentation, we will show this intelligent feature and introduce the corresponding technologies.

Technology Showcase by Academic Collaborators

  • Presenter: Yucheol Jung, Wonjong Jang, and Seungyong Lee, POSTECH

    A 3D caricature can be defined as a 3D mesh with cartoon-style shape exaggeration of a face. We present a novel deep learning based framework that generates a 3D caricature for a given real face image. Our approach exploits 3D geometry information in the caricature generation process and produces more convincing 3D shape exaggerations than 2D caricature-based approaches.

  • Presenter: Minlie Huang, Tsinghua University

    A Co-Training Method towards Machine Reading Comprehension

  • Presenter: Hiroki Watanabe, Hokkaido University

    A Method for Controlling Human Hearing by Editing the Frequency of the Sound in Real Time

  • Presenter: Gunhee Kim, Seoul National University

    We address the problem of abstractive summarization in two directions: proposing a novel dataset and a new model. First, we collect the Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit. We use such informal crowd-generated posts as the text source, in contrast with existing datasets that mostly use formal documents, such as news articles, as the source. Thus, our dataset suffers less from biases whereby key sentences are usually located at the beginning of the text and favorable summary candidates already appear in the text in similar forms. Second, we propose a novel abstractive summarization model named multi-level memory networks (MMN), equipped with multi-level memory to store the information of text at different levels of abstraction. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show that the Reddit TIFU dataset is highly abstractive and that the MMN outperforms state-of-the-art summarization models.

  • Presenter: Tianzhu Zhang, University of Science and Technology of China

    We adapt the attention mechanism for representing visual and semantic elements.

    We adaptively construct graphs and update the features for objects and words, making good use of both intra-modality and inter-modality relationships.

    We incorporate structure information across different graphs by imposing a constraint on the semantic elements, forcing each semantic element to align with its corresponding visual element.

    The proposed model obtains promising results on the Flickr30K and MS-COCO datasets.

  • Presenter: Yinpeng Dong, Tsinghua University

    Adversarial Attacks and Defenses in Deep Learning

  • Presenter: Huamin Qu, The Hong Kong University of Science and Technology

    Existing visualization designs are often produced manually and require substantial human effort. How can we apply deep learning techniques to automatically generate visualization products? We report two recent advances in this direction:

    Automated Graph Drawing: We propose a graph-LSTM-based model to directly generate graph drawings with desirable visual properties similar to the training drawings, without requiring users to tune algorithm-specific parameters.

    Automated Design of Timeline Infographics: We contribute an end-to-end approach to automatically extract an extensible template from a bitmap timeline image. The output can be used to generate new timelines with updated data.

  • Presenter: Ai-Chun Pang, National Taiwan University

    We ensure data effectiveness with blockchain technology, preserving data properties such as immutability and credibility throughout the transaction process.

  • Presenter: Sangwoo Ji and Jong Kim, POSTECH; Bin Zhu, Microsoft Research

    We bypass two backdoor detection methods: suspicious data instance detection and backdoor trigger detection.

  • Presenter: Chuck Yoo, Korea University

    • Existing network optimizations suffer from poor stability, low resource efficiency, and a need for API changes
    • Solution: Kernel-based optimization for high-performance networking
    • L3 forwarding achieves performance similar to DPDK
    • A virtual switch achieves 67.5% of DPDK-OVS performance with three times greater resource efficiency
  • Presenter: Xiangyang Ji, Tsinghua University

    CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation

  • Presenter: Hongming Zhang, The Hong Kong University of Science and Technology

    Understanding human language requires complex commonsense knowledge. However, existing large-scale knowledge graphs mainly focus on knowledge about entities while ignoring commonsense knowledge about activities, states, or events, which describe how entities or things act in the real world. To fill this gap, we develop ASER (activities, states, events, and their relations), a large-scale eventuality knowledge graph extracted from more than 11 billion tokens of unstructured textual data. ASER contains 15 relation types belonging to five categories, 194 million unique eventualities, and 64 million unique edges among them. Both human and extrinsic evaluations demonstrate the quality and effectiveness of ASER.

  • Presenter: Zizhao Zhang, Tsinghua University

    A well-constructed hypergraph structure can represent data correlations accurately, leading to better performance. How can we construct a good hypergraph to fit complex data?
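
    One common construction (a sketch of the usual k-nearest-neighbor approach, not necessarily the presenters' method) groups each vertex with its k nearest neighbors in feature space into one hyperedge, so that correlations beyond pairs are captured:

```python
def knn_hyperedges(points, k=2):
    """Return one hyperedge (a frozenset of vertex indices) per vertex:
    the vertex itself plus its k nearest neighbours."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    edges = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        edges.append(frozenset([i] + others[:k]))
    return edges

# Two well-separated clusters yield two groups of hyperedges.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(knn_hyperedges(points, k=1))
```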

  • Presenter: Youyou Lu, Tsinghua University

    Our design divides coherence responsibility between the switch and servers. The switch serializes conflicting requests and forwards them to the correct destinations via a lock-check-forward pipeline. Servers execute requester-driven coherence control to reach coherence and transition states.
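
    The serialization step can be illustrated with a toy model. Everything below (the class name, the queueing policy) is a hypothetical Python sketch of the lock-check-forward idea; the real design runs in a programmable switch data plane.

```python
class Switch:
    """Toy lock-check-forward pipeline for one coherence domain."""
    def __init__(self, owner_of):
        self.owner_of = owner_of  # address -> owning server
        self.locked = set()       # addresses with a request in flight
        self.queues = {}          # address -> queued conflicting requests

    def handle(self, addr, req):
        # Lock: serialize conflicting requests to the same address.
        if addr in self.locked:
            self.queues.setdefault(addr, []).append(req)
            return None
        self.locked.add(addr)
        # Check + forward: route to the correct destination; the server
        # then runs requester-driven coherence control.
        return (self.owner_of[addr], req)

    def done(self, addr):
        # Server acknowledges; serve the next queued request or unlock.
        queue = self.queues.get(addr, [])
        if queue:
            return (self.owner_of[addr], queue.pop(0))
        self.locked.discard(addr)
        return None

sw = Switch({"x": "s1"})
print(sw.handle("x", "read-1"))  # forwarded to ('s1', 'read-1')
print(sw.handle("x", "read-2"))  # None: queued behind the in-flight request
print(sw.done("x"))              # ('s1', 'read-2'): served in order
```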

  • Presenter: Sung Ju Hwang, KAIST

    • Perform effective knowledge transfer from earlier tasks to later tasks.
    • Prevent catastrophic forgetting, where the earlier task performance gets negatively affected by semantic drift of the representations as the model adapts to later tasks.
    • Obtain maximal performance with minimal increase in the network capacity.
  • Presenter: Chao Liao, Shanghai Jiao Tong University

    Counting Hypergraph Colorings in the Local Lemma Regime

  • Presenter: Chenhui Chu, Osaka University

    Cross-Lingual Visual Grounding and Multimodal Machine Translation

  • Presenter: Gunhee Kim, Seoul National University

    Exploration based on state novelty has brought great success in challenging reinforcement learning problems with sparse rewards. However, existing novelty-based strategies become inefficient in real-world problems where observation contains not only the task-dependent state novelty of our interest but also task-irrelevant information that should be ignored. We introduce an information-theoretic exploration strategy named Curiosity-Bottleneck that distills task-relevant information from observation. Based on the information bottleneck principle, our exploration bonus is quantified as the compressiveness of observation with respect to the learned representation of a compressive value network. With extensive experiments on static image classification, grid-world, and three hard-exploration Atari games, we show that Curiosity-Bottleneck learns an effective exploration strategy by robustly measuring state novelty in distractive environments where state-of-the-art exploration methods often degenerate.

  • Presenter: Dong-Ok Won and Seong-Whan Lee, Korea University

    Recently, deep reinforcement learning (DRL) has enabled real-world applications such as robotics. Here we teach a robot to succeed in curling (an Olympic discipline), a highly complex real-world application where a robot needs to carefully learn to play the game on the slippery ice sheet in order to compete well against human opponents. This scenario encompasses fundamental challenges: uncertainty, non-stationarity, infinite state spaces and, most importantly, scarce data. One fundamental objective of this study is thus to better understand and model the transfer from simulation to real-world scenarios with uncertainty. We demonstrate our proposed framework and show videos, experiments and statistics about Curly, our AI curling robot, being tested on a real curling ice sheet. Curly performed well both in classical game situations and when interacting with human opponents.

  • Presenter: Rui Yan, Peking University

    Deep Text Generation: Conversation and Application

  • Presenter: Ryo Furukawa, Hiroshima City University

    Development of 3D capsule endoscopic system

  • Presenter: Hiroshi Kawasaki, Kyushu University

    Our project aims to research human representation and the understanding of human motion with vision-based approaches, and to develop new applications.

  • Presenter: Jingwen Leng, Shanghai Jiao Tong University

    The proposed graph instrumentation framework can observe and modify neural networks using user-defined analysis code without changes in source code.

  • Presenter: Lei Chen, The Hong Kong University of Science and Technology

    Our Goal in Domain-Specific KB Construction

    • Entity Extraction, Entity Typing and Relation Extraction related to the target domain.
    • Training data generation based on distant-supervision without human annotation.
  • Presenter: Shijie Cao, Harbin Institute of Technology

    Efficient and Effective Sparse DNNs with Bank-Balanced Sparsity

  • Presenter: Huanjing Yue, Tianjin University

    We propose an end-to-end noise estimation and removal network, in which the estimated noise map is weighted and concatenated with the noisy input to improve denoising performance.

    The proposed noise estimation network takes advantage of the Bayer pattern prior of the noise maps, which not only improves the estimation accuracy but also reduces the memory cost.

    We propose an RSD block to take full advantage of the spatial and channel correlations of realistic noise. The ablation study demonstrates the effectiveness of the proposed module.

  • Presenter: Zhenpeng Chen, Peking University

    Emoji-Powered Representation Learning for Cross-Lingual Sentiment Analysis

  • Presenter: Muoi Tran, National University of Singapore

    We present the Erebus attack, which allows large malicious Internet Service Providers (ISPs) to isolate any targeted public Bitcoin nodes from the Bitcoin peer-to-peer network. The Erebus attack does not require routing manipulation (e.g., BGP hijacks) and hence is virtually undetectable to any control-plane and even typical data-plane detectors.

  • Presenter: Shou-de Lin, National Taiwan University

    We propose transforming word embeddings into interpretable representations that disentangle explainable factors.

    Examples of factors: a) topical factors: food, location, animal, etc.; b) part-of-speech factors: noun, adjective, verb, etc.

    We define four desirable properties of our disentangled word vectors: a) modularity, b) compactness, c) explicitness, d) feature preservation.

  • Presenter: Winston Hsu, National Taiwan University

    Free-form Video Inpainting with 3D Gated Conv, TPD, and LGTSM

  • Presenter: Lei Chen, The Hong Kong University of Science and Technology

    Fluid: A Blockchain based Framework for Crowdsourcing

  • Presenter: Insik Shin, KAIST

    Key idea: separation between app logic & UI parts

    1) Distributing target UI objects to remote devices and rendering them
    2) Giving an illusion as if app logic and UI objects were in the same process

  • Presenter: Youngjoo Ko and Jong Kim, POSTECH; Bin Zhu, Microsoft Research

    We increase the performance of fuzzing to discover more bugs in multi-threaded programs using interleaving coverage.

  • Presenter: Jinyoung Lee and Hong-Goo Kang, Yonsei University

    • Remove ambient noise to improve automatic speech recognition performance
    • Overcome the problems of conventional masking-based speech enhancement algorithms, e.g. speech signal distortion
    • Propose a generative and adversarial model-based approach that effectively utilizes spectro-temporal characteristics of speech and noise components
  • Presenter: Shiliang Zhang, Peking University

    • Propose Dilated Temporal Convolution (DTC) to learn short-term temporal cues
    • Propose Temporal Self Attention (TSA) to learn the long-term temporal cues
    • DTC and TSA learn complementary temporal features
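
    The dilation mechanism behind DTC can be sketched in a few lines. This single-channel, unweighted version is a hypothetical illustration of how dilation widens the temporal receptive field, not the authors' implementation:

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1-D convolution whose taps are spaced `dilation` steps apart,
    enlarging the temporal receptive field without adding parameters."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [sum(kernel[k] * signal[t + k * dilation]
                for k in range(len(kernel)))
            for t in range(len(signal) - span)]

x = [1, 2, 3, 4, 5, 6]
print(dilated_conv1d(x, [1, 1], dilation=1))  # sums of adjacent samples
print(dilated_conv1d(x, [1, 1], dilation=2))  # sums of samples two steps apart
```

    Stacking such layers with growing dilation lets a model cover long temporal spans cheaply, which is why DTC targets short-term cues while TSA handles the long-term ones.
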
  • Presenter: Liwei Wang, Peking University

    Gradient Descent Finds Global Minima of DNNs

  • Presenter: Wei Hu and Gusi Te, Peking University

    This project aims to explore emerging graph neural networks (GNNs) based on texture plus depth features to address the problem of 3D face anti-spoofing. Various spoofing attacks are growing by presenting fake or copied facial evidence to obtain valid authentication. While anti-spoofing techniques using 2D facial data have matured, 3D face anti-spoofing has not been studied much, leaving advanced spoofing techniques such as 3D masking at large. Hence, we propose to address this problem based on texture plus depth cues acquired from RGBD cameras, within the framework of GNNs.

  • Presenter: Hongzhi Wang, Harbin Institute of Technology

    Graph-structured Knowledge Base Management and Applications

  • Presenter: Yingcai Wu, Zhejiang University

    This study characterizes the problem of reachability-centric multi-criteria decision-making for choosing ideal homes. The system can also be adopted in other location selection scenarios in which the reachability of locations is considered (e.g., selecting a location for a convenience store).

  • Presenter: Wensheng Dou, Chinese Academy of Sciences

    Identifying Structures in Spreadsheets

  • Presenter: Jaegul Choo, Korea University

    Recently, unsupervised exemplar-based image-to-image translation has accomplished substantial advancements. To transfer the information from an exemplar to an input image, existing methods often use a normalization technique, e.g., adaptive instance normalization, that controls the channel-wise statistics of an input activation map at a particular layer, such as the mean and the variance. Meanwhile, style transfer, a task similar to image translation by nature, has demonstrated superior performance by using higher-order statistics, such as the covariance among channels, to represent a style. However, applying this approach to image translation is computationally intensive and error-prone due to its expensive time complexity and non-trivial backpropagation. In response, this paper proposes an end-to-end approach tailored for image translation that efficiently approximates this transformation with our novel regularization methods. We further extend our approach to a group-wise form for memory and time efficiency as well as image quality. Extensive qualitative and quantitative experiments demonstrate that our proposed method is fast, both in training and inference, and highly effective in reflecting the style of an exemplar.

  • Presenter: Jaewoo Jung, Kyungwon Lee and Seung Ah Lee, Yonsei University

    We developed a new hybrid digital-biological system that provides interactive and immersive experiences between humans and biological objects for applications in life science education and research. The scope of this work includes:

    • Construction of an automated optical stimulation microscope, which uses light to both image and interface with light-sensitive cells.
    • Use of human interaction modalities to convert natural human input into stimuli for the microscopic biological objects.
    • Comparative user study as a public installation that evaluated user behaviors, user engagement and learning outcomes.

    We expect that this platform will transform microscopes from a passive observation tool to an active interaction medium, assisting scientific research, life science education and clinical interventions.

  • Presenter: TaiNing Wang and Chee-Yong Chan, National University of Singapore

    Improving Join Reorderability with Compensation Operators

  • Presenter: Hai Truong and Rajesh Krishna Balan, Singapore Management University

    Automatic analysis of the behaviour of large groups of people is a key requirement for a large class of important applications such as crowd management, traffic control, and surveillance. For example, attributes such as the number of people, how they are distributed, which groups they belong to, and what trajectories they are taking can be used to optimize the layout of a mall to increase overall revenue. A common way to obtain these attributes is to use video camera feeds coupled with advanced video analytics solutions. However, solely utilizing video feeds is challenging in high people-density areas, such as a typical mall in Asia, as the high density significantly reduces the effectiveness of video analytics due to factors such as occlusion. In this work, we propose to combine video feeds with WiFi data to achieve better classification of the number of people in an area and of their trajectories. In particular, we believe that our approach will combine the strengths of the two different sensors, WiFi and video, while reducing the weaknesses of each. This work started fairly recently, and we will present our thoughts and current results.

  • Presenter: Jiaying Liu, Peking University

    Intelligent Action Analytics

  • Presenter: Changjian Chen, Tsinghua University

    Interactive Methods to Improve Data Quality

  • Presenter: Nobuaki Minematsu, University of Tokyo

    Inter-learner shadowing framework for comprehensibility-based assessment of learners’ speech

  • Presenter: Heejo Lee, Korea University

    An open platform for feedback-based fuzzing improves testing performance using two factors: binary feedback and user feedback.

  • Presenter: Xueming Qian, Xi’an Jiaotong University

    1. We proposed the Attention Fusion Network (AFN). It attends to discriminative food regions against unstructured backgrounds and generates feature embeddings jointly aware of the ingredients and the food.

    2. We proposed the balance focal loss (BFL) to enhance the joint learning of ingredients and food and to optimize the feature expression ability for multi-label ingredients.

    3. The effectiveness is demonstrated through comparative experiments. In particular, using the balance focal loss improves the Micro-F1, Macro-F1 and accuracy of ingredients by 5.76%, 12.62% and 5.78%, respectively.
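
    For reference, the standard focal loss that BFL builds on can be sketched as follows. This is a generic per-label sketch, not the authors' BFL: the balancing term specific to BFL is described in their paper, and alpha/gamma below are the usual illustrative defaults, not their settings.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-label focal loss: down-weights easy, confident predictions so
    that rare ingredient labels contribute more to a multi-label objective."""
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the truth
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction is penalized far less than an uncertain one:
print(focal_loss(0.9, 1))  # small loss
print(focal_loss(0.5, 1))  # much larger loss
```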

  • Presenter: Insu Han, KAIST

    MAP Inference for Customized Determinantal Point Processes via Maximum Inner Product Search

  • Presenter: Hong Xu, City University of Hong Kong

    Minimizing Network Footprint in Distributed Deep Learning

  • Presenter: Hirofumi Inaguma, Kyoto University

    Directly translate source speech to target languages with a single sequence-to-sequence (S2S) model

    • Many-to-many (M2M)
    • One-to-many (O2M)

    Outperformed the bilingual end-to-end speech translation (E2E-ST) models

    Shared representations obtained from multilingual E2E-ST were more effective than those from the bilingual one for transfer learning to a very low-resource ST task: Mboshi->French (4.4h)

  • Presenter: Mingkui Tan, South China University of Technology

    • We propose a novel MWGAN to optimize the multi-marginal distance among different domains.
    • We define and analyze the generalization performance of MWGAN for the multiple domain translation task.
    • Extensive experiments demonstrate the effectiveness of MWGAN on balanced and imbalanced translation tasks.
  • Presenter: Mingkui Tan, South China University of Technology

    • Propose a novel Neural Architecture Transformer (NAT) to optimize any arbitrary architecture.
    • Cast the problem into a Markov Decision Process.
    • Employ Graph Convolution Network to learn the policy.
  • Presenter: Wenfei Wu, Tsinghua University

    We propose a new NF development framework named NFD which consists of an NF abstraction layer to develop NF behavior models and a compiler to adapt NF models to specific runtime environments.

  • Presenter: Seung-won Hwang, Yonsei University

    Question Answering (QA) has mostly been studied in the context of factoid questions, which have concise facts as answers. In contrast, we study non-factoid QA, extending coverage to more realistic questions, such as how- or why-questions with long answers, drawn from long texts or videos. This demo and poster address the following questions:

    • Non-factoid QA for text, combining the complementary strengths of representation- and interaction-focused approaches [EMNLP19]. Extending this task to video brings both opportunities and challenges, arising from multimodality and the absence of pre-divided answer candidates (e.g., paragraphs); this is our ongoing MSRA collaboration.
    • Human-in-the-loop debugging for QA Demo [SIGIR19]
  • Presenter: Chuhan Wu, Tsinghua University

    • Different users usually have different interests in news.
    • Different users may click the same news article due to different interests.
    • We need personalized news and user representation!
  • Presenter: Hiroaki Yamane, The University of Tokyo

    We construct methods for converting contextual language to numerical variables for quantitative/numerical common sense in natural language processing.

  • Presenter: Shiyin Lu, Nanjing University

    Online Convex Optimization in Non-stationary Environments

  • Presenter: Yonggang Wen, Nanyang Technological University

    QoE depends on multiple families of Influential Factors (IFs), which must be optimized jointly for the best user experience.

    How to develop a unified and scalable framework to optimize QoE for multimedia communications, in the presence of system dynamics?

  • Presenter: Tadashi Nomoto, National Institute of Japanese Literature

    This work explores the impact of the subword representation on paraphrasing and text simplification. Experiments found that when combined with REINFORCE, the subword scheme boosted performance beyond the current state of the art both in paraphrasing and text simplification.

  • Presenter: Jun Takamatsu, Nara Institute of Science and Technology

    Pick-Carry-Place Household Tasks Using Labanotation for Learning-from-Observation Robots

  • Presenter: Wei-Shi Zheng, Sun Yat-sen University

    Predicting Future Instance Segmentation

    • Given several frames in a video, this task is to predict future instance segmentation before the corresponding frames are observed.
    • It is challenging due to the uncertainty in appearance variation caused by object movement, occlusion between objects, and viewpoint changes in videos.
  • Presenter: Yaoan Jin and Atsuko Miyaji, Graduate School of Engineering, Osaka University

    A side-channel attack is any attack based on information, such as timing or power consumption, gained from the implementation of a cryptosystem.

    • Simple Power Analysis (SPA)
    • Safe Error Attack
  • Presenter: Hang Su, Tsinghua University

    In this work, we find that pre-training an over-parameterized model is not necessary for obtaining an efficient pruned structure. We propose a novel network pruning pipeline which allows pruning from scratch.

  • Presenter: Jun Du, University of Science and Technology of China

    Recent Progress of Handwritten Mathematical Expression Recognition

  • Presenter: Dahun Kim, KAIST

    • To remove unwanted object from a video
    • Frame-by-frame image inpainting
  • Presenter: Wonpyo Park, Dongju Kim, and Minsu Cho, POSTECH; Yan Lu, Microsoft Research

    Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic the teacher’s output activations on individual data examples. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that instead transfers mutual relations of data examples. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves student models by a significant margin. In particular, for metric learning, it allows students to outperform their teachers, achieving the state of the art on standard benchmark datasets.
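
    The distance-wise loss can be sketched as follows: match the pairwise-distance *structure* of teacher and student embeddings rather than the embeddings themselves. This toy version uses squared error where the paper uses a smooth-L1 (Huber) penalty, and the embeddings are made-up values.

```python
def pairwise_dists(embs):
    """Pairwise Euclidean distances, normalized by their mean so the loss
    compares relational structure rather than absolute scale."""
    d = [sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
         for i, x in enumerate(embs) for y in embs[i + 1:]]
    mu = sum(d) / len(d)
    return [v / mu for v in d]

def rkd_distance_loss(teacher, student):
    t, s = pairwise_dists(teacher), pairwise_dists(student)
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)

teacher = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
student = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]  # same shape, different scale
print(rkd_distance_loss(teacher, student))      # ~0: relations are preserved
```

    Because the distances are mean-normalized, a student that reproduces the teacher's geometry at a different scale incurs no penalty; only structural differences in relations are punished.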

  • Presenter: Yu Zhang, Yuxiang Zhang, Yitong Huang, and Xing Guo, University of Science and Technology of China

    Research on Deep Learning Framework for Julia

  • Presenter: Ting Liu, Xi’an Jiaotong University

    SARA: Self-Replay Augmented Record and Replay for Android in Industrial Cases

  • Presenter: Fengyuan Xu, Nanjing University

    Video transformation needs to meet new requirements in actual use, such as privacy protection under surveillance scenarios:

    • The transformed video can be restored to the original one.
    • The transformed video can only be restored by an authorized party.

    We need a unified translation style and a unique steganography scheme.

  • Presenter: Shintami Chusnul Hidayati, Institut Teknologi Sepuluh Nopember; Wen-Huang Cheng, National Chiao Tung University; Jianlong Fu, Microsoft Research

    StyleMe: An AI Fashion Consultant for Personal Shopping and Style Advice

  • Presenter: Cheng Li, University of Science and Technology of China

    System support for designing efficient gradient compression algorithms for distributed DNN training

  • Presenter: Tackgeun You, POSTECH and Bohyung Han, Seoul National University

    • Introduce a benchmark for temporal cause and effect localization on car crash videos.
    • Propose a multi-task baseline for simultaneously conducting temporal cause and effect localization.
    • Propose a multi-task neural architecture search that decides to share or separate building blocks
  • Presenter: Chaoyu Guan, Shanghai Jiao Tong University

    A unified information-based measure: quantify the information of each input word that is encoded in an intermediate layer of a deep NLP model.

    The information-based measure serves as a tool for:

    • Evaluating different explanation methods.
    • Explaining different deep NLP models.

    This measure enriches the capability of explaining DNNs.

  • Presenter: Ting Liu, Xi’an Jiaotong University

    Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

  • Presenter: Seungmoon Choi and Seungjae Oh, POSTECH

    • Recognize contact finger(s) on any rigid surfaces by decoding transmitted frequencies
    • Identify a grasped object by visualizing the propagation dynamics of vibration
  • Presenter: Kibeom Hong and Hyeran Byun, Yonsei University

    • A video can be created by separating background and foreground, and the foreground can be divided into object and action.
    • We can obtain background and foreground information for video generation from text.
    • In the image domain, previous works [1,2,3] have studied text-to-image generation extensively, and [4,5,6] expanded this idea to the video domain.
    • In this work, we want to create a video from these three components in order to control more realistic and fine-grained parts.
  • Presenter: Zhou Zhao, Zhejiang University

    Video dialog is a new and challenging task that requires the agent to answer questions by combining video information with dialog history. Unlike single-turn video question answering, the additional dialog history is important for video dialog, as it often includes contextual information for the question. Existing visual dialog methods mainly use RNNs to encode the dialog history as a single vector representation, which can be rough and simplistic. Some more advanced methods utilize hierarchical structure, attention and memory mechanisms, but still lack an explicit reasoning process. In this paper, we introduce a novel progressive inference mechanism for video dialog, which progressively updates query information based on dialog history and video content until the agent considers the information sufficient and unambiguous. To tackle the multi-modal fusion problem, we propose a cross-transformer module, which can learn more fine-grained and comprehensive interactions both within and between the modalities. Besides answer generation, we also consider question generation, which is more challenging but significant for a complete video dialog system. We evaluate our method on two large-scale datasets, and extensive experiments show the effectiveness of our method.

  • Presenter: Zheng Yang, Tsinghua University

    Widar 3.0: Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi

  • Presenter: Min Zhang, Tsinghua University

    The key to solving this problem is to conduct better user profiling.

    How about off-topic features from other platforms, such as tweets?

        • On-topic features are helpful in understanding users’ interests and preferences.
        • Off-topic features are able to describe users too.

    We will try to introduce these off-topic features (tweets) into different rating prediction algorithms.