November 7, 2019 - November 8, 2019

MSRA Academic Day 2019

Location: Beijing, China

Technology Showcase by Microsoft Research Asia

  • Presenter: Mao Yang, Microsoft Research

    As computer systems and networking get increasingly complicated, optimizing them manually with explicit rules and heuristics becomes harder than ever, and sometimes impossible. At Microsoft Research Asia, our AutoSys project applies learning to large-scale system performance tuning. The AutoSys framework (1) defines interfaces to expose system features for learning, (2) introduces monitors to detect learning-induced failures, and (3) runs resource management to support the heterogeneous requirements of learning-related tasks. Based on AutoSys, we have built tools that serve many crucial system scenarios within Microsoft, including multimedia search for Bing (e.g., tail latency reduced by up to ~40% and capacity increased by up to ~30%) and job scheduling for Bing Ads (e.g., tail latency reduced by up to ~13%).

  • Presenter: Yingce Xia and Xu Tan, Microsoft Research

    Many AI tasks emerge in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. speech synthesis, question answering vs. question generation, and image classification vs. image generation. Dual learning is a new learning framework that leverages the primal-dual structure of AI tasks to obtain effective feedback or regularization signals that enhance the learning/inference process. In this demo, we will show two applications of dual learning: machine translation and speech synthesis.
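
    The primal-dual feedback loop can be sketched with a toy round-trip check. This is an illustrative sketch only: real dual learning uses neural translation models and a language-model reward, while the word tables below are hypothetical stand-ins.

```python
# Hypothetical word-level "translators" standing in for the primal (f) and
# dual (g) models; a real system would use learned seq2seq models.
EN2FR = {"hello": "bonjour", "hi": "bonjour", "world": "monde"}
FR2EN = {"bonjour": "hello", "monde": "world"}

def translate(words, table):
    return [table.get(w, w) for w in words]

def round_trip_reward(words):
    # Dual-learning feedback: how well does g(f(x)) reconstruct x?
    # In training, this reward (plus a language-model score) would be used
    # to update both models, e.g. via policy gradient.
    reconstructed = translate(translate(words, EN2FR), FR2EN)
    return sum(a == b for a, b in zip(words, reconstructed)) / len(words)

print(round_trip_reward(["hello", "world"]))  # 1.0: perfect round trip
print(round_trip_reward(["hi", "world"]))     # 0.5: "hi" -> "bonjour" -> "hello"
```

    Note how the ambiguous mapping ("hi" and "hello" both map to "bonjour") lowers the round-trip reward; this reconstruction signal is exactly the kind of feedback the primal-dual structure provides.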

  • Presenter: Tao Ge, Microsoft Research

    Neural sequence-to-sequence (seq2seq) approaches have proven successful in grammatical error correction (GEC). Based on the seq2seq framework, we propose a novel fluency boost learning and inference mechanism. Fluency boost learning generates diverse error-corrected sentence pairs during training, enabling the error correction model to learn how to improve a sentence’s fluency from more instances, while fluency boost inference allows the model to correct a sentence incrementally over multiple inference steps. Combining fluency boost learning and inference with conventional seq2seq models, our approach achieves state-of-the-art performance on GEC benchmarks.
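
    The multi-round inference idea can be sketched as a loop that keeps re-running a corrector on its own output while a fluency score improves. The rule table and fluency score below are hypothetical stand-ins for the seq2seq model and the language-model fluency used in the actual work.

```python
# Hypothetical correction rules standing in for a learned seq2seq corrector.
RULES = {"He have": "He has", "a apple": "an apple"}

def correct_once(sentence):
    for wrong, right in RULES.items():  # one round of corrections
        sentence = sentence.replace(wrong, right)
    return sentence

def fluency(sentence):
    # Stand-in fluency score: fewer known error patterns -> higher fluency.
    return -sum(sentence.count(w) for w in RULES)

def fluency_boost_inference(sentence, max_rounds=5):
    # Incremental inference: stop once another round no longer boosts fluency.
    for _ in range(max_rounds):
        candidate = correct_once(sentence)
        if fluency(candidate) <= fluency(sentence):
            return sentence
        sentence = candidate
    return sentence

print(fluency_boost_inference("He have a apple"))  # "He has an apple"
```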

  • Presenter: Qiang Huo, Microsoft Research

    At Microsoft, we have been developing a new-generation OCR engine (aka OneOCR), which can detect both printed and handwritten text in an image captured by a camera or mobile phone, and recognize the detected text for follow-up actions. Our unified OneOCR engine can recognize mixed printed and handwritten English text lines with arbitrary orientations (even flipped), significantly outperforming other leading industrial OCR engines across a wide range of application scenarios. Empowered by the OneOCR engine, the Computer Vision Read capability and the Cognitive Search capability of Azure Search are generally available, and a Form Recognizer with Receipt Understanding capability is available in preview, all in Azure Cognitive Services, which can power enterprise workflows and Robotic Process Automation (RPA) to spur digital transformation. In this presentation, I will demonstrate the capabilities of Microsoft’s latest OneOCR engine, highlight its core component technologies, and explain the roadmap ahead.

  • Presenter: Shi Han, Microsoft Research

    Ideas in Excel aims at one-click intelligence: when a user clicks the Ideas button on the Home tab of Excel, the intelligent service empowers the user to understand his or her data via automatic recommendation of visual summaries and interesting patterns. The user can then insert the recommendations into the spreadsheet to aid further analysis or to serve directly as analysis results. Enabling such one-click intelligence poses underlying technical challenges. At the Data, Knowledge and Intelligence group of Microsoft Research Asia, we have conducted long-term research on spreadsheet intelligence and automated insights, and via close collaboration with Excel product teams, we transferred a suite of technologies and shipped Ideas in Excel together. In this demo presentation, we will show this intelligent feature and introduce the corresponding technologies.

Technology Showcase by Academic Collaborators

  • Presenter: Yucheol Jung, Wonjong Jang, and Seungyong Lee, POSTECH

    A 3D caricature can be defined as a 3D mesh with cartoon-style shape exaggeration of a face. We present a novel deep learning based framework that generates a 3D caricature for a given real face image. Our approach exploits 3D geometry information in the caricature generation process and produces more convincing 3D shape exaggerations than 2D caricature-based approaches.

  • Presenter: Minlie Huang, Tsinghua University

    A Co-Training Method towards Machine Reading Comprehension

  • Presenter: Hiroki Watanabe, Hokkaido University

    A Method for Controlling Human Hearing by Editing the Frequency of the Sound in Real Time

  • Presenter: Gunhee Kim, Seoul National University

    We address the problem of abstractive summarization in two directions: proposing a novel dataset and a new model. First, we collect the Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit. We use such informal crowd-generated posts as the text source, in contrast with existing datasets that mostly use formal documents, such as news articles, as the source. Thus, our dataset suffers less from biases whereby key sentences are usually located at the beginning of the text and favorable summary candidates already appear in the text in similar forms. Second, we propose a novel abstractive summarization model named multi-level memory networks (MMN), equipped with multi-level memory to store the information of text at different levels of abstraction. With quantitative evaluation and user studies via Amazon Mechanical Turk, we show that the Reddit TIFU dataset is highly abstractive and that the MMN outperforms state-of-the-art summarization models.

  • Presenter: Tianzhu Zhang, University of Science and Technology of China

    We adapt the attention mechanism for representing visual and semantic elements.

    We adaptively construct graphs and update the features for objects and words, making good use of both intra-modality and inter-modality relationships.

    We incorporate structure information across different graphs by imposing a constraint on the semantic elements, forcing each semantic element to align with its corresponding visual element.

    The proposed model obtains promising results on the Flickr30K and MS-COCO datasets.

  • Presenter: Yinpeng Dong, Tsinghua University

    Adversarial Attacks and Defenses in Deep Learning

  • Presenter: Huamin Qu, The Hong Kong University of Science and Technology

    Existing visualization designs are often produced manually and require substantial human effort. How can we apply deep learning techniques to automatically generate visualization products? We report two recent advances in this direction:

    Automated Graph Drawing: We propose a graph-LSTM-based model to directly generate graph drawings with desirable visual properties similar to the training drawings, without requiring users to tune algorithm-specific parameters.

    Automated Design of Timeline Infographics: We contribute an end-to-end approach to automatically extract an extensible template from a bitmap timeline image. The output can be used to generate new timelines with updated data.

  • Presenter: Ai-Chun Pang, National Taiwan University

    We ensure data effectiveness with blockchain technology, preserving data properties such as immutability and credibility throughout the transaction process.

  • Presenter: Sangwoo Ji and Jong Kim, POSTECH; Bin Zhu, Microsoft Research

    We bypass two backdoor detection methods: suspicious data instance detection and backdoor trigger detection.

  • Presenter: Chuck Yoo, Korea University

    • Existing network optimizations suffer from poor stability, low resource efficiency, and a need for API changes
    • Solution: Kernel-based optimization for high-performance networking
    • L3 forwarding achieves performance similar to DPDK
    • A virtual switch achieves 67.5% of DPDK-OVS performance with three times greater resource efficiency
  • Presenter: Xiangyang Ji, Tsinghua University

    CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation

  • Presenter: Hongming Zhang, The Hong Kong University of Science and Technology

    Understanding human language requires complex commonsense knowledge. However, existing large-scale knowledge graphs mainly focus on knowledge about entities while ignoring commonsense knowledge about activities, states, or events, which describe how entities or things act in the real world. To fill this gap, we develop ASER (activities, states, events, and their relations), a large-scale eventuality knowledge graph extracted from more than 11 billion tokens of unstructured textual data. ASER contains 15 relation types belonging to five categories, 194 million unique eventualities, and 64 million unique edges among them. Both human and extrinsic evaluations demonstrate the quality and effectiveness of ASER.

  • Presenter: Zizhao Zhang, Tsinghua University

    A well-constructed hypergraph structure can represent data correlations accurately, leading to better performance. How can we construct a good hypergraph to fit complex data?
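
    One common construction (a sketch of the usual k-nearest-neighbor approach, not necessarily the presenters' method) groups each vertex with its k nearest neighbors in feature space into one hyperedge, so that correlations beyond pairs are captured:

```python
def knn_hyperedges(points, k=2):
    """Return one hyperedge (a frozenset of vertex indices) per vertex:
    the vertex itself plus its k nearest neighbours."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    edges = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        edges.append(frozenset([i] + others[:k]))
    return edges

# Two well-separated clusters yield two groups of hyperedges.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(knn_hyperedges(points, k=1))
```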

  • Presenter: Youyou Lu, Tsinghua University

    Our design divides coherence responsibility between the switch and servers. The switch serializes conflicting requests and forwards them to the correct destinations via a lock-check-forward pipeline. Servers execute requester-driven coherence control to reach coherence and transition states.
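
    The serialization step can be illustrated with a toy model. Everything below (the class name, the queueing policy) is a hypothetical Python sketch of the lock-check-forward idea; the real design runs in a programmable switch data plane.

```python
class Switch:
    """Toy lock-check-forward pipeline for one coherence domain."""
    def __init__(self, owner_of):
        self.owner_of = owner_of  # address -> owning server
        self.locked = set()       # addresses with a request in flight
        self.queues = {}          # address -> queued conflicting requests

    def handle(self, addr, req):
        # Lock: serialize conflicting requests to the same address.
        if addr in self.locked:
            self.queues.setdefault(addr, []).append(req)
            return None
        self.locked.add(addr)
        # Check + forward: route to the correct destination; the server
        # then runs requester-driven coherence control.
        return (self.owner_of[addr], req)

    def done(self, addr):
        # Server acknowledges; serve the next queued request or unlock.
        queue = self.queues.get(addr, [])
        if queue:
            return (self.owner_of[addr], queue.pop(0))
        self.locked.discard(addr)
        return None

sw = Switch({"x": "s1"})
print(sw.handle("x", "read-1"))  # forwarded to ('s1', 'read-1')
print(sw.handle("x", "read-2"))  # None: queued behind the in-flight request
print(sw.done("x"))              # ('s1', 'read-2'): served in order
```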

  • Presenter: Sung Ju Hwang, KAIST

    • Perform effective knowledge transfer from earlier tasks to later tasks.
    • Prevent catastrophic forgetting, where the earlier task performance gets negatively affected by semantic drift of the representations as the model adapts to later tasks.
    • Obtain maximal performance with minimal increase in the network capacity.
  • Presenter: Chao Liao, Shanghai Jiao Tong University

    Counting Hypergraph Colorings in the Local Lemma Regime

  • Presenter: Chenhui Chu, Osaka University

    Cross-Lingual Visual Grounding and Multimodal Machine Translation

  • Presenter: Gunhee Kim, Seoul National University

    Exploration based on state novelty has brought great success in challenging reinforcement learning problems with sparse rewards. However, existing novelty-based strategies become inefficient in real-world problems where observation contains not only the task-dependent state novelty of our interest but also task-irrelevant information that should be ignored. We introduce an information-theoretic exploration strategy named Curiosity-Bottleneck that distills task-relevant information from observation. Based on the information bottleneck principle, our exploration bonus is quantified as the compressiveness of observation with respect to the learned representation of a compressive value network. With extensive experiments on static image classification, grid-world, and three hard-exploration Atari games, we show that Curiosity-Bottleneck learns an effective exploration strategy by robustly measuring state novelty in distractive environments where state-of-the-art exploration methods often degenerate.

  • Presenter: Dong-Ok Won and Seong-Whan Lee, Korea University

    Recently, deep reinforcement learning (DRL) has enabled real-world applications such as robotics. Here we teach a robot to succeed in curling (an Olympic discipline), a highly complex real-world application where a robot needs to carefully learn to play the game on the slippery ice sheet in order to compete well against human opponents. This scenario encompasses fundamental challenges: uncertainty, non-stationarity, infinite state spaces and, most importantly, scarce data. One fundamental objective of this study is thus to better understand and model the transfer from simulation to real-world scenarios with uncertainty. We demonstrate our proposed framework and show videos, experiments and statistics about Curly, our AI curling robot, being tested on a real curling ice sheet. Curly performed well both in classical game situations and when interacting with human opponents.

  • Presenter: Rui Yan, Peking University

    Deep Text Generation: Conversation and Application

  • Presenter: Ryo Furukawa, Hiroshima City University

    Development of 3D capsule endoscopic system

  • Presenter: Hiroshi Kawasaki, Kyushu University

    Our project aims to research human representation and the understanding of human motion with vision-based approaches, and to develop new applications.

  • Presenter: Jingwen Leng, Shanghai Jiao Tong University

    The proposed graph instrumentation framework can observe and modify neural networks using user-defined analysis code without changes in source code.

  • Presenter: Lei Chen, The Hong Kong University of Science and Technology

    Our Goal in Domain-Specific KB Construction

    • Entity Extraction, Entity Typing and Relation Extraction related to the target domain.
    • Training data generation based on distant-supervision without human annotation.
  • Presenter: Shijie Cao, Harbin Institute of Technology

    Efficient and Effective Sparse DNNs with Bank-Balanced Sparsity

  • Presenter: Huanjing Yue, Tianjin University

    We propose an end-to-end noise estimation and removal network, in which the estimated noise map is weighted and concatenated with the noisy input to improve denoising performance.

    The proposed noise estimation network takes advantage of the Bayer pattern prior of the noise maps, which not only improves the estimation accuracy but also reduces the memory cost.

    We propose an RSD block to take full advantage of the spatial and channel correlations of realistic noise. The ablation study demonstrates the effectiveness of the proposed module.

  • Presenter: Zhenpeng Chen, Peking University

    Emoji-Powered Representation Learning for Cross-Lingual Sentiment Analysis

  • Presenter: Muoi Tran, National University of Singapore

    We present the Erebus attack, which allows large malicious Internet Service Providers (ISPs) to isolate any targeted public Bitcoin nodes from the Bitcoin peer-to-peer network. The Erebus attack does not require routing manipulation (e.g., BGP hijacks) and hence is virtually undetectable to any control-plane and even typical data-plane detectors.

  • Presenter: Shou-de Lin, National Taiwan University

    We propose transforming word embeddings into interpretable representations that disentangle explainable factors.

    Examples of factors: a) topical factors: food, location, animal, etc.; b) part-of-speech factors: noun, adjective, verb, etc.

    We define four desirable properties of our disentangled word vectors: a) modularity, b) compactness, c) explicitness, d) feature preservation.

  • Presenter: Winston Hsu, National Taiwan University

    Free-form Video Inpainting with 3D Gated Conv, TPD, and LGTSM

  • Presenter: Lei Chen, The Hong Kong University of Science and Technology

    Fluid: A Blockchain based Framework for Crowdsourcing

  • Presenter: Insik Shin, KAIST

    Key idea: separation between app logic & UI parts

    1) Distributing target UI objects to remote devices and rendering them
    2) Giving an illusion as if app logic and UI objects were in the same process

  • Presenter: Youngjoo Ko and Jong Kim, POSTECH; Bin Zhu, Microsoft Research

    We increase the performance of fuzzing to discover more bugs in multi-threaded programs using interleaving coverage.

  • Presenter: Jinyoung Lee and Hong-Goo Kang, Yonsei University

    • Remove ambient noise to improve automatic speech recognition performance
    • Overcome the problems of conventional masking-based speech enhancement algorithms, e.g. speech signal distortion
    • Propose a generative and adversarial model-based approach that effectively utilizes spectro-temporal characteristics of speech and noise components
  • Presenter: Shiliang Zhang, Peking University

    • Propose Dilated Temporal Convolution (DTC) to learn short-term temporal cues
    • Propose Temporal Self Attention (TSA) to learn the long-term temporal cues
    • DTC and TSA learn complementary temporal features
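
    The dilation mechanism behind DTC can be sketched in a few lines. This single-channel, unweighted version is a hypothetical illustration of how dilation widens the temporal receptive field, not the authors' implementation:

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1-D convolution whose taps are spaced `dilation` steps apart,
    enlarging the temporal receptive field without adding parameters."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [sum(kernel[k] * signal[t + k * dilation]
                for k in range(len(kernel)))
            for t in range(len(signal) - span)]

x = [1, 2, 3, 4, 5, 6]
print(dilated_conv1d(x, [1, 1], dilation=1))  # sums of adjacent samples
print(dilated_conv1d(x, [1, 1], dilation=2))  # sums of samples two steps apart
```

    Stacking such layers with growing dilation lets a model cover long temporal spans cheaply, which is why DTC targets short-term cues while TSA handles the long-term ones.
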
  • Presenter: Liwei Wang, Peking University

    Gradient Descent Finds Global Minima of DNNs

  • Presenter: Wei Hu and Gusi Te, Peking University

    This project aims to explore emerging graph neural networks (GNNs) based on texture plus depth features to address the problem of 3D face anti-spoofing. Various spoofing attacks are growing by presenting fake or copied facial evidence to obtain valid authentication. While anti-spoofing techniques using 2D facial data have matured, 3D face anti-spoofing has not been studied much, leaving advanced spoofing techniques such as 3D masking at large. Hence, we propose to address this problem based on texture plus depth cues acquired from RGBD cameras, within the framework of GNNs.

  • Presenter: Hongzhi Wang, Harbin Institute of Technology

    Graph-structured Knowledge Base Management and Applications

  • Presenter: Yingcai Wu, Zhejiang University

    This study characterizes the problem of reachability-centric multi-criteria decision-making for choosing ideal homes. The system can also be adopted in other location selection scenarios in which the reachability of locations is considered (e.g., selecting a location for a convenience store).

  • Presenter: Wensheng Dou, Chinese Academy of Sciences

    Identifying Structures in Spreadsheets

  • Presenter: Jaegul Choo, Korea University

    Recently, unsupervised exemplar-based image-to-image translation has accomplished substantial advancements. To transfer the information from an exemplar to an input image, existing methods often use a normalization technique, e.g., adaptive instance normalization, that controls the channel-wise statistics of an input activation map at a particular layer, such as the mean and the variance. Meanwhile, style transfer, a task similar to image translation by nature, has demonstrated superior performance by using higher-order statistics, such as the covariance among channels, to represent a style. However, applying this approach to image translation is computationally intensive and error-prone due to its expensive time complexity and non-trivial backpropagation. In response, this paper proposes an end-to-end approach tailored for image translation that efficiently approximates this transformation with our novel regularization methods. We further extend our approach to a group-wise form for memory and time efficiency as well as image quality. Extensive qualitative and quantitative experiments demonstrate that our proposed method is fast, both in training and inference, and highly effective in reflecting the style of an exemplar.

  • Presenter: Jaewoo Jung, Kyungwon Lee and Seung Ah Lee, Yonsei University

    We developed a new hybrid digital-biological system that provides interactive and immersive experiences between humans and biological objects for applications in life science education and research. The scope of this work includes:

    • Construction of an automated optical stimulation microscope, which uses light to both image and interface with light-sensitive cells.
    • Use of human interaction modalities to convert natural human input into stimuli for the microscopic biological objects.
    • Comparative user study as a public installation that evaluated user behaviors, user engagement and learning outcomes.

    We expect that this platform will transform microscopes from a passive observation tool to an active interaction medium, assisting scientific research, life science education and clinical interventions.

  • Presenter: TaiNing Wang and Chee-Yong Chan, National University of Singapore

    Improving Join Reorderability with Compensation Operators

  • Presenter: Hai Truong and Rajesh Krishna Balan, Singapore Management University

    Automatic analysis of the behaviour of large groups of people is a key requirement for a large class of important applications such as crowd management, traffic control, and surveillance. For example, attributes such as the number of people, how they are distributed, which groups they belong to, and what trajectories they are taking can be used to optimize the layout of a mall to increase overall revenue. A common way to obtain these attributes is to use video camera feeds coupled with advanced video analytics solutions. However, solely utilizing video feeds is challenging in high people-density areas, such as a typical mall in Asia, as the high density significantly reduces the effectiveness of video analytics due to factors such as occlusion. In this work, we propose to combine video feeds with WiFi data to achieve better classification of the number of people in an area and of their trajectories. In particular, we believe that our approach will combine the strengths of the two different sensors, WiFi and video, while reducing the weaknesses of each. This work started fairly recently, and we will present our thoughts and current results.

  • Presenter: Jiaying Liu, Peking University

    Intelligent Action Analytics

  • Presenter: Changjian Chen, Tsinghua University

    Interactive Methods to Improve Data Quality

  • Presenter: Nobuaki Minematsu, University of Tokyo

    Inter-learner shadowing framework for comprehensibility-based assessment of learners’ speech

  • Presenter: Heejo Lee, Korea University

    An open platform for feedback-based fuzzing improves testing performance using two factors: binary feedback and user feedback.

  • Presenter: Xueming Qian, Xi’an Jiaotong University

    1. We proposed the Attention Fusion Network (AFN). It attends to discriminative food regions against unstructured backgrounds and generates feature embeddings jointly aware of the ingredients and the food.

    2. We proposed the balance focal loss (BFL) to enhance the joint learning of ingredients and food and to optimize the feature expression ability for multi-label ingredients.

    3. The effectiveness is demonstrated through comparative experiments. In particular, using the balance focal loss improves the Micro-F1, Macro-F1 and accuracy of ingredients by 5.76%, 12.62% and 5.78%, respectively.
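
    For reference, the standard focal loss that BFL builds on can be sketched as follows. This is a generic per-label sketch, not the authors' BFL: the balancing term specific to BFL is described in their paper, and alpha/gamma below are the usual illustrative defaults, not their settings.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-label focal loss: down-weights easy, confident predictions so
    that rare ingredient labels contribute more to a multi-label objective."""
    p_t = p if y == 1 else 1.0 - p      # probability assigned to the truth
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction is penalized far less than an uncertain one:
print(focal_loss(0.9, 1))  # small loss
print(focal_loss(0.5, 1))  # much larger loss
```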

  • Presenter: Insu Han, KAIST

    MAP Inference for Customized Determinantal Point Processes via Maximum Inner Product Search

  • Presenter: Hong Xu, City University of Hong Kong

    Minimizing Network Footprint in Distributed Deep Learning

  • Presenter: Hirofumi Inaguma, Kyoto University

    Directly translate source speech to target languages with a single sequence-to-sequence (S2S) model

    • Many-to-many (M2M)
    • One-to-many (O2M)

    Outperformed the bilingual end-to-end speech translation (E2E-ST) models

    Shared representations obtained from multilingual E2E-ST were more effective than those from the bilingual one for transfer learning to a very low-resource ST task: Mboshi->French (4.4h)

  • Presenter: Mingkui Tan, South China University of Technology

    • We propose a novel MWGAN to optimize the multi-marginal distance among different domains.
    • We define and analyze the generalization performance of MWGAN for the multiple domain translation task.
    • Extensive experiments demonstrate the effectiveness of MWGAN on balanced and imbalanced translation tasks.
  • Presenter: Mingkui Tan, South China University of Technology

    • Propose a novel Neural Architecture Transformer (NAT) to optimize any arbitrary architecture.
    • Cast the problem into a Markov Decision Process.
    • Employ Graph Convolution Network to learn the policy.
  • Presenter: Wenfei Wu, Tsinghua University

    We propose a new NF development framework named NFD which consists of an NF abstraction layer to develop NF behavior models and a compiler to adapt NF models to specific runtime environments.

  • Presenter: Seung-won Hwang, Yonsei University

    Question Answering (QA) has mostly been studied in the context of factoid questions, which have concise facts as answers. In contrast, we study non-factoid QA, extending coverage to more realistic questions, such as how- or why-questions with long answers, drawn from long texts or videos. This demo and poster address the following questions:

    • Non-factoid QA for text, combining the complementary strengths of representation- and interaction-focused approaches [EMNLP19]. Extending this task to video brings both opportunities and challenges, arising from multimodality and the absence of pre-divided answer candidates (e.g., paragraphs); this is our ongoing MSRA collaboration.
    • Human-in-the-loop debugging for QA Demo [SIGIR19]
  • Presenter: Chuhan Wu, Tsinghua University

    • Different users usually have different interests in news.
    • Different users may click the same news article due to different interests.
    • We need personalized news and user representation!
  • Presenter: Hiroaki Yamane, The University of Tokyo

    We construct methods for converting contextual language to numerical variables for quantitative/numerical common sense in natural language processing.

  • Presenter: Shiyin Lu, Nanjing University

    Online Convex Optimization in Non-stationary Environments

  • Presenter: Yonggang Wen, Nanyang Technological University

    QoE depends on multiple families of Influential Factors (IFs), which must be optimized jointly for the best user experience.

    How to develop a unified and scalable framework to optimize QoE for multimedia communications, in the presence of system dynamics?

  • Presenter: Tadashi Nomoto, National Institute of Japanese Literature

    This work explores the impact of the subword representation on paraphrasing and text simplification. Experiments found that when combined with REINFORCE, the subword scheme boosted performance beyond the current state of the art both in paraphrasing and text simplification.

  • Presenter: Jun Takamatsu, Nara Institute of Science and Technology

    Pick-Carry-Place Household Tasks Using Labanotation for Learning-from-Observation Robots

  • Presenter: Wei-Shi Zheng, Sun Yat-sen University

    Predicting Future Instance Segmentation

    • Given several frames in a video, this task is to predict future instance segmentation before the corresponding frames are observed.
    • It is challenging due to the uncertainty in appearance variation caused by object movement, occlusion between objects, and viewpoint changes in videos.
  • Presenter: Yaoan Jin and Atsuko Miyaji, Graduate School of Engineering, Osaka University

    A side-channel attack is any attack based on information, such as timing or power consumption, gained from the implementation of a cryptosystem.

    • Simple Power Analysis (SPA)
    • Safe Error Attack
  • Presenter: Hang Su, Tsinghua University

    In this work, we find that pre-training an over-parameterized model is not necessary for obtaining an efficient pruned structure. We propose a novel network pruning pipeline which allows pruning from scratch.

  • Presenter: Jun Du, University of Science and Technology of China

    Recent Progress of Handwritten Mathematical Expression Recognition

  • Presenter: Dahun Kim, KAIST

    • To remove unwanted object from a video
    • Frame-by-frame image inpainting
  • Presenter: Wonpyo Park, Dongju Kim, and Minsu Cho, POSTECH; Yan Lu, Microsoft Research

    Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic the teacher’s output activations on individual data examples. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that instead transfers mutual relations of data examples. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves student models by a significant margin. In particular, for metric learning, it allows students to outperform their teachers, achieving the state of the art on standard benchmark datasets.
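
    The distance-wise loss can be sketched as follows: match the pairwise-distance *structure* of teacher and student embeddings rather than the embeddings themselves. This toy version uses squared error where the paper uses a smooth-L1 (Huber) penalty, and the embeddings are made-up values.

```python
def pairwise_dists(embs):
    """Pairwise Euclidean distances, normalized by their mean so the loss
    compares relational structure rather than absolute scale."""
    d = [sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
         for i, x in enumerate(embs) for y in embs[i + 1:]]
    mu = sum(d) / len(d)
    return [v / mu for v in d]

def rkd_distance_loss(teacher, student):
    t, s = pairwise_dists(teacher), pairwise_dists(student)
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)

teacher = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
student = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]  # same shape, different scale
print(rkd_distance_loss(teacher, student))      # ~0: relations are preserved
```

    Because the distances are mean-normalized, a student that reproduces the teacher's geometry at a different scale incurs no penalty; only structural differences in relations are punished.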

  • Presenter: Yu Zhang, Yuxiang Zhang, Yitong Huang, and Xing Guo, University of Science and Technology of China

    Research on Deep Learning Framework for Julia

  • Presenter: Ting Liu, Xi’an Jiaotong University

    SARA: Self-Replay Augmented Record and Replay for Android in Industrial Cases

  • Presenter: Fengyuan Xu, Nanjing University

    Video transformation needs to meet new requirements in actual use, such as privacy protection under surveillance scenarios:

    • The transformed video can be restored to the original one.
    • The transformed video can only be restored by an authorized party.

    We need a unified translation style and a unique steganography scheme.

  • Presenter: Shintami Chusnul Hidayati, Institut Teknologi Sepuluh Nopember; Wen-Huang Cheng, National Chiao Tung University; Jianlong Fu, Microsoft Research

    StyleMe: An AI Fashion Consultant for Personal Shopping and Style Advice

  • Presenter: Cheng Li, University of Science and Technology of China

    System support for designing efficient gradient compression algorithms for distributed DNN training

  • Presenter: Tackgeun You, POSTECH and Bohyung Han, Seoul National University

    • Introduce a benchmark for temporal cause and effect localization on car crash videos.
    • Propose a multi-task baseline for simultaneously conducting temporal cause and effect localization.
    • Propose a multi-task neural architecture search that decides to share or separate building blocks
  • Presenter: Chaoyu Guan, Shanghai Jiao Tong University

    A unified information-based measure: quantify the information of each input word that is encoded in an intermediate layer of a deep NLP model.

    The information-based measure serves as a tool for:

    • Evaluating different explanation methods.
    • Explaining different deep NLP models.

    This measure enriches the capability of explaining DNNs.

  • Presenter: Ting Liu, Xi’an Jiaotong University

    Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

  • Presenter: Seungmoon Choi and Seungjae Oh, POSTECH

    • Recognize contact finger(s) on any rigid surfaces by decoding transmitted frequencies
    • Identify a grasped object by visualizing the propagation dynamics of vibration
  • Presenter: Kibeom Hong and Hyeran Byun, Yonsei University

    • A video can be created by separating background and foreground, and the foreground can be divided into object and action.
    • We can obtain background and foreground information for video generation from text.
    • In the image domain, previous works [1,2,3] have studied text-to-image generation extensively, and [4,5,6] expanded this idea to the video domain.
    • In this work, we want to create a video from these three components in order to control more realistic and fine-grained parts.
  • Presenter: Zhou Zhao, Zhejiang University

    Video dialog is a new and challenging task that requires the agent to answer questions by combining video information with dialog history. Unlike single-turn video question answering, the additional dialog history is important for video dialog, as it often includes contextual information for the question. Existing visual dialog methods mainly use RNNs to encode the dialog history as a single vector representation, which can be rough and simplistic. Some more advanced methods utilize hierarchical structure, attention and memory mechanisms, but still lack an explicit reasoning process. In this paper, we introduce a novel progressive inference mechanism for video dialog, which progressively updates query information based on dialog history and video content until the agent considers the information sufficient and unambiguous. To tackle the multi-modal fusion problem, we propose a cross-transformer module, which can learn more fine-grained and comprehensive interactions both within and between the modalities. Besides answer generation, we also consider question generation, which is more challenging but significant for a complete video dialog system. We evaluate our method on two large-scale datasets, and extensive experiments show the effectiveness of our method.

  • Presenter: Zheng Yang, Tsinghua University

    Widar 3.0: Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi

  • Presenter: Min Zhang, Tsinghua University

    The key to solving this problem is to conduct better user profiling.

    How about off-topic features from other platforms, such as tweets?

        • On-topic features are helpful in understanding users’ interests and preferences.
        • Off-topic features are able to describe users too.

    We will try to introduce these off-topic features (tweets) into different rating prediction algorithms.