Topic Tag: video


 Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue

 

We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the de…


 Cooperating with Machines

 

Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major driving force behind technical progress has been competition with human cognition. Historical milestones have been frequently associated with computers matching or outperforming humans in difficult cognitive tasks (e.g. face rec…


 Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it in…


 Feasibility Study: Moving Non-Homogeneous Teams in Congested Video Game Environments

 

Multi-agent path finding (MAPF) is a well-studied problem in artificial intelligence, where one needs to find collision-free paths for agents with given start and goal locations. In video games, agents of different types often form teams. In this paper, we demonstrate the usefulness of MAPF algorit…
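In the common textbook formulation of MAPF (an assumption here, not necessarily the exact model in this paper), time is discrete, agents move on a grid, and a set of paths is "collision-free" when no two agents occupy the same cell at the same step (a vertex conflict) and no two agents swap cells in a single step (an edge conflict). The sketch below checks a candidate set of paths for these two conflicts. It also assumes agents wait at their goal after their path ends. `first_conflict` and `position_at` are hypothetical helper names, and this only validates paths; it is not the planning algorithm used in the paper.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
Path = List[Cell]


def position_at(path: Path, t: int) -> Cell:
    """Agent's cell at time t; assumes the agent waits at its goal once its path ends."""
    return path[t] if t < len(path) else path[-1]


def first_conflict(paths: List[Path]) -> Optional[Tuple[str, int, int, int]]:
    """Return (kind, agent_i, agent_j, t) for the earliest conflict, or None if collision-free."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                a_now, b_now = position_at(paths[i], t), position_at(paths[j], t)
                # Vertex conflict: both agents in the same cell at time t.
                if a_now == b_now:
                    return ("vertex", i, j, t)
                # Edge conflict: the two agents swap cells between t and t + 1.
                if t + 1 < horizon:
                    a_next = position_at(paths[i], t + 1)
                    b_next = position_at(paths[j], t + 1)
                    if a_now == b_next and b_now == a_next:
                        return ("edge", i, j, t)
    return None
```

For example, two agents trading places in one step, `[[(0, 0), (1, 0)], [(1, 0), (0, 0)]]`, are reported as an edge conflict at step 0, while two agents moving along parallel rows produce `None`.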


 SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control

 

In this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. Unlike prior work, our dynamics model is structured…


 A Brief Survey of Deep Reinforcement Learning

  

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, …


 A Simple Reinforcement Learning Mechanism for Resource Allocation in LTE-A Networks with Markov Decision Process and Q-Learning

  

Resource allocation is still a difficult issue to deal with in wireless networks. Unstable channel conditions and traffic demand for Quality of Service (QoS) raise barriers that interfere with the process. It is important that an optimal policy take into account some resources available …
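The title names Q-learning over a Markov Decision Process. The standard tabular Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. Below is a minimal sketch of that update together with epsilon-greedy action selection. The state labels (channel quality such as "good"/"bad") and actions (number of resource blocks to allocate) are hypothetical placeholders, not the state or action space defined in the paper.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state, action, reward: float, next_state,
             actions: Sequence, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning step and return the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)  # max_a' Q(s', a')
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q: QTable, state, actions: Sequence, epsilon: float,
                   rng: random.Random):
    """Explore with probability epsilon; otherwise pick the action with the highest Q."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: states are channel-quality labels, actions are resource-block counts.
q: QTable = defaultdict(float)
blocks = [1, 2, 3]
q_update(q, "good", 2, reward=1.0, next_state="bad", actions=blocks, alpha=0.5)
```

A plain dictionary table only works when the state and action spaces are small and discrete. That is a reasonable fit for a "simple" mechanism like the one the title describes, but it would need function approximation for larger spaces.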


 Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

 

We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, we introduce additional sparse depth samples, which are either collected from a low-resolution…
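To make the input concrete: one common way to simulate "sparse depth samples" during training is to keep a small random subset of valid pixels from a dense ground-truth depth map, zero out the rest, and feed the result alongside the RGB image as an extra channel. The sketch below assumes uniform random sampling and a four-channel RGB-D stacking. Both are illustrative assumptions rather than a description of the paper's exact pipeline, and `sample_sparse_depth` and `stack_rgbd` are hypothetical names.

```python
import random
from typing import List

Depth = List[List[float]]
Image = List[List[List[float]]]  # rows x cols x channels


def sample_sparse_depth(dense: Depth, num_samples: int, rng: random.Random) -> Depth:
    """Keep num_samples randomly chosen valid (> 0) pixels and set every other pixel to 0."""
    valid = [(r, c) for r, row in enumerate(dense) for c, d in enumerate(row) if d > 0]
    keep = set(rng.sample(valid, min(num_samples, len(valid))))
    return [[d if (r, c) in keep else 0.0 for c, d in enumerate(row)]
            for r, row in enumerate(dense)]


def stack_rgbd(rgb: Image, sparse: Depth) -> Image:
    """Append the sparse depth value as a fourth channel to each RGB pixel."""
    return [[list(pixel) + [d] for pixel, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, sparse)]
```

Zero is used as the "no measurement" marker here because a valid depth is always positive. A real pipeline would typically work on arrays or tensors rather than nested lists.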


 Temporal Multimodal Fusion for Video Emotion Classification in the Wild

  

This paper addresses the question of emotion classification. The task consists of predicting the emotion labels (taken from a set of possible labels) that best describe the emotions contained in short video clips. Building on a standard framework — lying in describing videos by audio and visual fea…


 Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

   

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection per…


 Depression Scale Recognition from Audio, Visual and Text Analysis

      

Depression is a major mental health disorder that is rapidly affecting lives worldwide. Depression affects not only the emotional but also the physical and psychological state of a person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight and eve…


 Continuous Multimodal Emotion Recognition Approach for AVEC 2017

  

This paper reports the analysis of audio and visual features in predicting the emotion dimensions under the seventh Audio/Visual Emotion Subchallenge (AVEC 2017). For visual features we used the HOG (Histogram of Oriented Gradients) features, Fisher encodings of SIFT (Scale-Invariant Feature Transform) feat…
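For readers unfamiliar with HOG, its core step builds, for each small image cell, a histogram of gradient orientations weighted by gradient magnitude. The following is a simplified sketch of that step for a single grayscale cell. It assumes central-difference gradients, unsigned orientations in [0, 180) degrees, 9 bins, and hard binning. Full HOG implementations typically also interpolate votes between neighbouring bins and normalize over blocks of cells, and both steps are omitted here. `cell_orientation_histogram` is a hypothetical name, not part of any library used in the paper.

```python
import math
from typing import List


def cell_orientation_histogram(img: List[List[float]], num_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations over the
    interior pixels of one cell (central differences, hard binning, no normalization)."""
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # horizontal gradient
            gy = img[r + 1][c] - img[r - 1][c]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into [0, 180)
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist


# A vertical edge (dark left half, bright right half) votes into the 0-degree bin.
vertical_edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(cell_orientation_histogram(vertical_edge))
```

Folding orientations into [0, 180) makes a dark-to-bright edge and a bright-to-dark edge vote into the same bin, which is the usual "unsigned" HOG convention.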


 A Causal And-Or Graph Model for Visibility Fluent Reasoning in Human-Object Interactions

 

Tracking humans who are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the human of interest in videos is unknown and might vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover compl…


 Joint Parsing of Cross-view Scenes with Spatio-temporal Semantic Parse Graphs

 

Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing method that takes view-centric proposals from pre-trained computer vision models and produces spatio-temporal parse graphs that represent a coherent scene-centric …


 Multi-Label Zero-Shot Human Action Recognition via Joint Latent Embedding

 

Human action recognition refers to automatically recognizing human actions from a video clip, which is one of the most challenging tasks in computer vision. In reality, a video stream is often weakly annotated with a set of relevant human action labels at a global level rather than assigning each label…
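The general idea behind zero-shot recognition with a joint latent embedding is that videos and action labels are mapped into the same vector space. An unseen label can then be scored by its similarity to a video's embedding, and in the multi-label case every label above a threshold is predicted. The sketch below shows only that scoring step, using cosine similarity and a fixed threshold. Both choices are assumptions, and the embeddings are hypothetical toy vectors, not outputs of the paper's learned model. `predict_labels` is likewise a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_labels(video_emb: Sequence[float],
                   label_embs: Dict[str, Sequence[float]],
                   threshold: float) -> List[str]:
    """Return every label whose embedding is similar enough to the video, best first."""
    scores = {label: cosine(video_emb, emb) for label, emb in label_embs.items()}
    hits = [label for label, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda label: -scores[label])


# Hypothetical 3-d joint space; "swim" is orthogonal to the video and is rejected.
labels = {"run": [1.0, 0.0, 0.0], "jump": [0.0, 1.0, 0.0], "swim": [0.0, 0.0, 1.0]}
print(predict_labels([2.0, 1.0, 0.0], labels, threshold=0.4))  # ['run', 'jump']
```

Because the labels are scored independently against a threshold rather than by a single arg-max, one clip can receive several action labels at once, which matches the weakly annotated, multi-label setting described above.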


 Deep Reinforcement Learning for Conversational AI

  

Deep reinforcement learning is revolutionizing the artificial intelligence field. Currently, it serves as a good starting point for constructing intelligent autonomous systems that offer a better understanding of the visual world. It is possible to scale deep reinforcement learning with the use of dee…


 ClickBAIT: Click-based Accelerated Incremental Training of Convolutional Neural Networks

  

Today’s general-purpose deep convolutional neural networks (CNN) for image classification and object detection are trained offline on large static datasets. Some applications, however, will require training in real-time on live video streams with a human-in-the-loop. We refer to this class of…


 Shared Learning: Enhancing Reinforcement in Q-Ensembles

  

Deep Reinforcement Learning has been able to achieve amazing successes in a variety of domains from video games to continuous control by trying to maximize the cumulative reward. However, most of these successes rely on algorithms that require a large amount of data to train in order to obtain resu…


 Robust Physical-World Attacks on Deep Learning Models

   

Although deep neural networks (DNNs) perform well in a variety of applications, they are vulnerable to adversarial examples resulting from small-magnitude perturbations added to the input data. Inputs modified in this way can be mislabeled as a target class in targeted attacks or as a random class …


 End-to-End United Video Dehazing and Detection

 

The recent development of CNN-based image dehazing has revealed the effectiveness of end-to-end modeling. However, extending the idea to end-to-end video dehazing has not been explored yet. In this paper, we propose an End-to-End Video Dehazing Network (EVD-Net), to exploit the temporal consistency…


 Build your own Machine Learning Visualizations with the new TensorBoard API

  

Posted by Chi Zeng and Justine Tunney, Software Engineers, Google Brain Team

When we open-sourced TensorFlow in 2015, it included TensorBoard, a suite of visualizations for inspecting and understanding your TensorFlow models and runs. TensorBoard included a small, predetermined set of visualization…


 Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

 

Emotion recognition from facial expressions is tremendously useful, especially when coupled with smart devices and wireless multimedia applications. However, inadequate network bandwidth often limits the spatial resolution of the transmitted video, which will heavily degrade the recognition rel…


 Recurrent Ladder Networks

  

We propose a recurrent extension of the Ladder networks whose structure is motivated by the inference required in hierarchical latent variable models. We demonstrate that the recurrent Ladder is able to handle a wide variety of complex learning tasks that benefit from iterative inference and tempor…


 A multi-agent reinforcement learning model of common-pool resource appropriation

  

Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based …


 Human Pose Forecasting via Deep Markov Models

 

Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving. Usually, forecasting algorithms use 3D skeleton sequences and are trained to forecast for a few milliseconds into the future. Long-range forec…