I am a Research Scientist at Covariant, where I invent deep learning / reinforcement learning methods for robots to perceive and act intelligently in the physical world.
I completed my PhD at Berkeley AI Research (BAIR) at UC Berkeley with Pieter Abbeel, where I pursued novel exploration techniques for deep reinforcement learning. During my internship at DeepMind with Oriol Vinyals, I contributed hierarchical reinforcement learning methods to AlphaStar. Previously, I performed research in mathematics at The Chinese University of Hong Kong, University of Waterloo, and UC Berkeley.
Long-horizon sequential decision making is important to generative models (RLHF in ChatGPT), robotics (PaLM-SayCan), strategic games (Go, StarCraft II), and many other fields. Reinforcement learning (RL) is key to unlocking the full potential of models trained with supervised or unsupervised learning on such problems. My research focuses on how RL can drive a model to explore and find the truly optimal solution.
Reinforcement Learning with Deep Energy-Based Policies
We propose to learn maximum-entropy policies that follow a Boltzmann distribution of the "soft" Q-value, which has benefits including improved exploration and skill transfer via compositionality. To solve the maximum-entropy problem, we propose a new algorithm, Soft Q-learning, and confirm its strengths in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed as performing approximate inference on the corresponding energy-based model.
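The core idea can be sketched for the discrete-action case: the policy assigns each action a probability proportional to the exponentiated soft Q-value, with a temperature controlling the entropy bonus. A minimal illustration (the function name and temperature value are mine, not from the paper):

```python
import numpy as np

def soft_q_policy(q_values, alpha=1.0):
    """Boltzmann (maximum-entropy) policy over discrete actions:
    pi(a|s) proportional to exp(Q_soft(s, a) / alpha), where alpha
    is the temperature weighting the entropy term."""
    logits = q_values / alpha
    logits = logits - logits.max()   # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# With a high temperature the policy stays broad (better exploration);
# as alpha -> 0 it collapses onto the greedy action.
q = np.array([1.0, 2.0, 0.5])
print(soft_q_policy(q, alpha=1.0))
print(soft_q_policy(q, alpha=0.1))
```

In the continuous-action setting of the paper, sampling from this distribution is itself intractable and is handled with an amortized sampler; the sketch above only conveys the shape of the policy.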
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Haoran Tang*, Rein Houthooft*, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
Advances in neural information processing systems (NeurIPS), 2017
[PDF] [OpenAI Website] [Videos] [Code]
We describe a surprising finding: a simple generalization of the classic (tabular) count-based exploration method can reach near state-of-the-art performance on various deep RL benchmarks. We propose simple heuristic/random/learned hash functions that map states into hash codes. Bonus rewards are given to under-visited states to facilitate exploration. Our method can serve as a simple yet powerful baseline for solving MDPs that require considerable exploration.
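The mechanism can be sketched as follows: a fixed random projection (in the SimHash style) maps each state to a short binary code, visit counts are kept per code, and a bonus inversely proportional to the square root of the count is added to the reward. The class name, dimensions, and bonus coefficient below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from collections import defaultdict

class HashingBonus:
    """Count-based exploration bonus over hashed states: project the
    state with a fixed Gaussian matrix, take the signs as a k-bit code,
    and reward under-visited codes with beta / sqrt(n(code))."""
    def __init__(self, state_dim, k=16, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))  # fixed random projection
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        # Hash the state into a k-bit code and update its visit count.
        code = tuple((self.A @ state > 0).astype(int))
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])
```

During training the bonus is simply added to the environment reward, so frequently visited regions of state space contribute less and the agent is pushed toward novelty.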
Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?
We isolate and evaluate the claimed benefits of hierarchical RL (learning/exploring over multiple transitions, semantically meaningful action space, etc.) on locomotion, navigation, and manipulation tasks. Surprisingly, most observed benefits of hierarchy can be attributed to improved exploration. Given this insight, we present simple exploration techniques inspired by hierarchy that achieve competitive performance with hierarchical RL.
Convolutional Neural Networks for Grazing Incidence X-Ray Scattering Patterns: Thin Film Structure Identification
Shuai Liu, Charles N Melton, Singanallur Venkatakrishnan, Ronald J Pandolfi, Guillaume Freychet, Dinesh Kumar, Haoran Tang, Alexander Hexemer, Daniela M Ushizima
MRS Communications, 2019
This paper highlights the design of multiple Convolutional Neural Networks (CNN) to classify nanoparticle orientation in a thin film by learning patterns of Grazing Incidence Small Angle X-ray Scattering images, achieving a success rate of 94%. We demonstrate the CNNs' robustness under different noise conditions and show the potential of our approach to reduce scattering-pattern analysis time.
Modular Architecture for StarCraft II with Deep Reinforcement Learning
Dennis Lee*, Haoran Tang*, Jeffrey Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel
Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2018
We present a modular architecture for StarCraft II AI, which splits responsibilities between multiple modules that each control one aspect of the game. Modules can be optimized independently or jointly. We present the first result of applying deep reinforcement learning techniques to training a modular agent with self-play, achieving a win rate of 92%/86%, with/without fog-of-war, against the "Harder" (level 5) built-in Blizzard bot.
Trajectory Optimization Using Neural Networks
Haoran Tang, Xi Chen, Yan Duan, Nikhil Mishra, Shiyao Wu, Maximilian Sieb, Yide Shentu
U.S. Patent Application 17/193,820
Various embodiments of the technology described herein generally relate to systems and methods for trajectory optimization with machine learning techniques. More specifically, certain embodiments relate to using deep neural networks to quickly predict optimized robotic arm trajectories in a variety of scenarios, subject to certain constraints. Optimization, in accordance with some embodiments of the present technology, may include optimizing trajectory geometry and dynamics while satisfying a number of constraints, including staying collision-free and minimizing the time it takes to complete the task.
Training Artificial Networks for Robotic Picking
Yan Duan, Haoran Tang, Yide Shentu, Nikhil Mishra, Xi Chen
U.S. Patent Application 17/014,558
Various embodiments of the present technology generally relate to robotic devices and artificial intelligence. More specifically, some embodiments relate to an artificial neural network training method that does not require extensive training data or time expenditure. The few-shot training model disclosed herein includes attempting to pick up items and, in response to a failed pick-up attempt, transferring and generalizing information to similar regions to improve the probability of success in future attempts. In some implementations, the training method is used to train a robotic device for picking items from a bin and perturbing items in a bin. When no picking strategies with a high probability of success exist, the robotic device may perturb the contents of the bin to create new available pick-up points. In some implementations, the device may include one or more computer-vision systems.
Systems and methods for robotic picking
Yan Duan, Xi Chen, Mostafa Rohaninejad, Nikhil Mishra, Yu Xuan Liu, Andrew Amir Vaziri, Haoran Tang, Yide Shentu, Ian Rust, and Carlos Florensa
U.S. Patent Application 17/014,545
Various embodiments of the present technology generally relate to robotic devices and artificial intelligence. More specifically, some embodiments relate to a robotic device for picking items from a bin and perturbing items in a bin. In some implementations, the device may include one or more computer-vision systems. A computer-vision system, in accordance with the present technology, may use at least two two-dimensional images to generate three-dimensional (3D) information about the bin and items in the bin. Based on the 3D information, a strategy for picking up items from the bin is determined. When no strategies with high probability of success exist, the robotic device may perturb the contents of the bin to create new available pick-up points and re-attempt to pick up an item.
This video, produced by the Covariant marketing team to celebrate National Robotics Week (2023), features some projects that I have participated in.
00:00 ~ 00:03: Robotic kitting
00:03 ~ 00:07: Robotic putwall
00:07 ~ 00:18: Dense packing
I am currently working on the Robotic Kitting project at Covariant. This complex system presents many AI challenges: diverse objects, high throughput, and near-zero tolerance for error. I have contributed multiple innovations to address these challenges.