Topic Tag: video


 Frame-Recurrent Video Super-Resolution

    

Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate…
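The merging step described here can be sketched in miniature. Assuming integer motion vectors are already known (real systems estimate sub-pixel flow and use learned warping), frames aligned to the target can simply be averaged; the function names and the wrap-around warp below are illustrative only.

```python
import numpy as np

def warp(frame, dx, dy):
    """Shift a frame by an integer motion vector (dx, dy).
    Toy stand-in for real sub-pixel warping; np.roll wraps at the borders."""
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

def merge_frames(frames, motions):
    """Warp each low-resolution frame into the target frame's coordinates
    and average the aligned stack, merging detail from several frames."""
    aligned = [warp(f, dx, dy) for f, (dx, dy) in zip(frames, motions)]
    return np.mean(aligned, axis=0)
```

With the camera shifted one pixel between frames, warping the earlier frame back by one pixel aligns it exactly with the target, and the average reproduces the target scene.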


 Introducing the CVPR 2018 Learned Image Compression Challenge

  

Posted by Michele Covell, Research Scientist, Google Research Image compression is critical to digital photography — without it, a 12 megapixel image would take 36 megabytes of storage, making most websites prohibitively large. While the signal-processing community has significantly improved imag…
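The storage arithmetic quoted above is easy to verify: an uncompressed 8-bit RGB image needs three bytes per pixel, so 12 megapixels come to 36 megabytes before any compression.

```python
def raw_size_mb(megapixels, bytes_per_pixel=3):
    """Uncompressed size in megabytes of an 8-bit-per-channel RGB image."""
    return megapixels * bytes_per_pixel

print(raw_size_mb(12))  # 36
```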


 TFGAN: A Lightweight Library for Generative Adversarial Networks

     

Posted by Joel Shor, Senior Software Engineer, Machine Perception (Crossposted on the Google Open Source Blog) Training a neural network usually involves defining a loss function, which tells the network how close or far it is from its objective. For example, image classification networks are often…
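The truncated sentence points at classification losses. As an illustration of "how close or far the network is from its objective" (a generic sketch, not TFGAN's API), the standard cross-entropy loss is small when the network is confident and correct, and large when it is confidently wrong:

```python
import numpy as np

def cross_entropy(probs, true_label):
    """Classification loss: negative log-probability the network assigns
    to the correct class."""
    return -np.log(probs[true_label])

confident_right = cross_entropy(np.array([0.05, 0.9, 0.05]), true_label=1)
confident_wrong = cross_entropy(np.array([0.9, 0.05, 0.05]), true_label=1)
```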


 SLAC: A Sparsely Labeled Dataset for Action Classification and Localization

   

This paper describes a procedure for the creation of large-scale video datasets for action classification and localization from unconstrained, realistic web data. The scalability of the proposed procedure is demonstrated by building a novel video benchmark, named SLAC (Sparsely Labeled ACtions), co…


 Multi-shot Pedestrian Re-identification via Sequential Decision Making

   

The multi-shot pedestrian re-identification problem is at the core of surveillance video analysis: it matches two tracks of pedestrians from different cameras. In contrast to existing works that aggregate single-frame features with a time-series model such as a recurrent neural network, in this paper, we pr…


 Scale-invariant temporal history (SITH): optimal slicing of the past in an uncertain world

 

In both the human brain and any general artificial intelligence (AI), a representation of the past is necessary to predict the future. However, perfect storage of all experiences is not possible. One possibility, utilized in many applications, is to retain information about the past in a buffer. A …
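A toy sketch of the "slicing of the past" idea (an illustration only, not the paper's SITH operator): sampling a history at exponentially growing lags keeps recent events in detail and older ones coarsely, using memory that grows only logarithmically in the time span covered.

```python
def log_spaced_slices(history):
    """Sample the past at exponentially growing lags (1, 2, 4, ... steps
    back from the present), a scale-invariant alternative to keeping a
    fixed-size buffer of the most recent observations."""
    out, lag = [], 1
    while lag <= len(history):
        out.append(history[-lag])
        lag *= 2
    return out
```

For a history of 16 time steps this retains only five samples, with spacing that coarsens as events recede into the past.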


 Introducing Appsperiments: Exploring the Potentials of Mobile Photography

  

Posted by Alex Kauffmann, Interaction Researcher, Google Research Each of the world’s approximately two billion smartphone owners is carrying a camera capable of capturing photos and video of a tonal richness and quality unimaginable even five years ago. Until recently, those cameras behaved …


 Facebook SOSP papers present real-world solutions to complex system challenges

SVE: Distributed Video Processing at Facebook Scale. Video is a growing part of the experience of the billions of people […] by Kelly Berschauer


 Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model

 

A fundamental challenge in machine learning today is to build a model that can learn from few examples. Here, we describe a reservoir-based spiking neural model for learning to recognize actions with a limited number of labeled videos. First, we propose a novel encoding, inspired by how microsaccad…


 Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue

 

We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the de…


 Cooperating with Machines

 

Since Alan Turing envisioned Artificial Intelligence (AI) [1], a major driving force behind technical progress has been competition with human cognition. Historical milestones have been frequently associated with computers matching or outperforming humans in difficult cognitive tasks (e.g. face rec…


 Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it in…


 Feasibility Study: Moving Non-Homogeneous Teams in Congested Video Game Environments

 

Multi-agent path finding (MAPF) is a well-studied problem in artificial intelligence, where one needs to find collision-free paths for agents with given start and goal locations. In video games, agents of different types often form teams. In this paper, we demonstrate the usefulness of MAPF algorit…
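For context, the single-agent grid path finding that MAPF generalizes can be sketched with breadth-first search; MAPF then additionally forbids two agents from occupying the same cell at the same time step. A minimal sketch (the grid encoding is illustrative):

```python
from collections import deque

def shortest_path(grid, start, goal):
    """BFS over a 4-connected grid; grid[r][c] == 1 marks an obstacle.
    Returns the length of a shortest path, or -1 if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return -1
```

In the multi-agent setting, each agent's search must also treat the time-indexed positions of other agents as obstacles, which is where the coordination cost arises.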


 SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control

 

In this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. Unlike prior work, our dynamics model is structured…


 A Brief Survey of Deep Reinforcement Learning

  

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, …


 A Simple Reinforcement Learning Mechanism for Resource Allocation in LTE-A Networks with Markov Decision Process and Q-Learning

  

Resource allocation remains a difficult issue in wireless networks. Unstable channel conditions and traffic demands for Quality of Service (QoS) raise barriers that interfere with the process. It is important that an optimal policy take into account the resources available …
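The Q-learning mechanism named in the title rests on the standard tabular update rule. The encoding of channels and QoS classes into states and actions is specific to the paper, so the sketch below is generic:

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target reward + gamma * max_a' Q(s', a').

    Q is a table indexed as Q[state][action]; alpha is the learning rate
    and gamma the discount factor (values here are illustrative).
    """
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
```

Repeated over many allocation decisions and observed rewards, the table converges toward action values from which a resource-allocation policy can be read off greedily.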


 Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

 

We consider the problem of dense depth prediction from a sparse set of depth measurements and a single RGB image. Since depth estimation from monocular images alone is inherently ambiguous and unreliable, we introduce additional sparse depth samples, which are either collected from a low-resolution…
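As a crude classical baseline for the sparse-to-dense idea (not the learned, RGB-guided model the paper proposes), missing depths can be filled from the nearest measured sample:

```python
import numpy as np

def nearest_fill(sparse_depth):
    """Fill a sparse depth map (zeros = unmeasured) with the value of the
    nearest measured pixel. Brute force; assumes at least one measurement."""
    ys, xs = np.nonzero(sparse_depth)
    dense = np.empty_like(sparse_depth, dtype=float)
    for r in range(sparse_depth.shape[0]):
        for c in range(sparse_depth.shape[1]):
            d2 = (ys - r) ** 2 + (xs - c) ** 2   # squared distance to samples
            k = d2.argmin()
            dense[r, c] = sparse_depth[ys[k], xs[k]]
    return dense
```

This ignores the RGB image entirely, which is exactly the ambiguity the paper's learned model exploits the image to resolve.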


 Temporal Multimodal Fusion for Video Emotion Classification in the Wild

  

This paper addresses the question of emotion classification. The task consists of predicting emotion labels (taken from a set of possible labels) that best describe the emotions contained in short video clips. Building on a standard framework of describing videos by audio and visual fea…
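The simplest fusion scheme such pipelines build on is late fusion: combine per-modality emotion scores with a weighted average and take the argmax. A hedged sketch (the weighting is illustrative, not the paper's temporal model):

```python
import numpy as np

def late_fusion(audio_scores, visual_scores, w_audio=0.5):
    """Weighted average of per-emotion scores from two modality-specific
    models; the predicted label is the argmax of the fused scores."""
    fused = (w_audio * np.asarray(audio_scores)
             + (1 - w_audio) * np.asarray(visual_scores))
    return fused, int(fused.argmax())
```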


 Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

   

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection per…
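Localization quality in object detection is conventionally scored by intersection-over-union (IoU) between a predicted box and the ground-truth box, which can be computed directly:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    1.0 for identical boxes, 0.0 for disjoint ones."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

Detectors in the YOLO family also use IoU-style overlap tests during non-maximum suppression to discard duplicate boxes for the same object.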


 Depression Scale Recognition from Audio, Visual and Text Analysis

      

Depression is a major mental health disorder that is rapidly affecting lives worldwide. It impacts not only the emotional but also the physical and psychological state of the person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight and eve…


 Continuous Multimodal Emotion Recognition Approach for AVEC 2017

  

This paper reports the analysis of audio and visual features in predicting the emotion dimensions under the seventh Audio/Visual Emotion Challenge (AVEC 2017). For visual features we used HOG (Histogram of Oriented Gradients) features, Fisher encodings of SIFT (Scale-Invariant Feature Transform) feat…
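The core statistic behind HOG features is a histogram of gradient orientations weighted by gradient magnitude. A whole-image toy version (real HOG descriptors compute such histograms per cell and block-normalize them):

```python
import numpy as np

def gradient_histogram(img, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations,
    normalized to sum to 1. Toy, whole-image variant of the HOG statistic."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi          # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)
```

An image containing a single vertical edge concentrates all histogram mass in the horizontal-gradient bin, which is what makes the statistic informative about local shape.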


 A Causal And-Or Graph Model for Visibility Fluent Reasoning in Human-Object Interactions

 

Tracking humans who are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and may vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover compl…


 Joint Parsing of Cross-view Scenes with Spatio-temporal Semantic Parse Graphs

 

Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing method that takes view-centric proposals from pre-trained computer vision models and produces spatio-temporal parse graphs that represent a coherent scene-centric …


 Multi-Label Zero-Shot Human Action Recognition via Joint Latent Embedding

 

Human action recognition refers to automatically recognizing human actions from a video clip, which is one of the most challenging tasks in computer vision. In reality, a video stream is often weakly annotated with a set of relevant human action labels at a global level rather than assigning each label…
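In a joint latent embedding, videos and action labels map into one vector space, so even labels unseen during training can be ranked by their similarity to a video's embedding. A minimal cosine-similarity sketch (the embeddings here are placeholders, not the paper's learned ones):

```python
import numpy as np

def rank_labels(video_emb, label_embs):
    """Rank action labels by cosine similarity to a video embedding.
    Returns (indices from most to least similar, similarity scores)."""
    v = video_emb / np.linalg.norm(video_emb)
    L = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = L @ v
    return np.argsort(-sims), sims
```

Because ranking needs only label embeddings, adding a new action class requires embedding its label, not retraining on videos of it, which is what makes the zero-shot setting possible.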


 Deep Reinforcement Learning for Conversational AI

  

Deep reinforcement learning is revolutionizing the artificial intelligence field. Currently, it serves as a good starting point for constructing intelligent autonomous systems that offer a better understanding of the visual world. It is possible to scale deep reinforcement learning with the use of dee…