Deep reinforcement learning

Continuous Control using DRL

Deep Reinforcement Learning can be used to train AI agents to perform continuous control tasks. The projects demonstrated here use several Unity ML-Agents environments to train single and multiple agents in both independent and competitive settings. The Deep Deterministic Policy Gradient (DDPG) policy-optimization technique is used for all of the tasks.
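Since the same DDPG learner drives all of the continuous-control tasks below, the core update step is sketched here. This is a minimal sketch assuming hypothetical PyTorch `Actor`/`Critic` modules and an already-sampled replay mini-batch; the hyperparameters are illustrative rather than the exact values used in these projects.

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 1e-3  # illustrative hyperparameters

def ddpg_update(actor, actor_target, critic, critic_target,
                actor_opt, critic_opt, batch):
    """One DDPG learning step on a sampled mini-batch (sketch)."""
    states, actions, rewards, next_states, dones = batch  # tensors, shapes (B, ...) / (B, 1)

    # Critic: regress Q(s, a) toward y = r + gamma * Q_target(s', mu_target(s')).
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_targets = rewards + GAMMA * (1 - dones) * critic_target(next_states, next_actions)
    critic_loss = F.mse_loss(critic(states, actions), q_targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, mu(s)).
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft-update the target networks toward the online networks.
    for target, online in ((actor_target, actor), (critic_target, critic)):
        for t_param, param in zip(target.parameters(), online.parameters()):
            t_param.data.copy_(TAU * param.data + (1.0 - TAU) * t_param.data)
```

The separate target networks and the slow soft updates (controlled by `TAU`) keep the bootstrapped critic targets from chasing a moving estimate, which is the main stabilizer DDPG borrows from DQN.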

[Crawler] - A creature with 4 arms and 4 forearms

In this environment, each agent must move its body toward the goal direction without falling. The environment contains 12 agents; each observes a state of length 129 and outputs an action vector of size 12. The task is episodic.
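A quick way to confirm those dimensions is to query the environment's default brain. The snippet below is a sketch that assumes the Udacity-packaged `unityagents` wrapper; the build path is a placeholder.

```python
from unityagents import UnityEnvironment

# The path to the local Crawler build is a placeholder.
env = UnityEnvironment(file_name="Crawler.app")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
print("Number of agents:", len(env_info.agents))              # expected: 12
print("State size:", env_info.vector_observations.shape[1])   # expected: 129
print("Action size:", brain.vector_action_space_size)         # expected: 12
env.close()
```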

Untrained agents in the Tennis environment

Testing the agents after training for 1800 episodes

[Tennis] environment for training two agents to play against each other in a competitive manner.

In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.

The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.
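A test-time episode loop for the two-agent setting might look like the sketch below, again assuming the `unityagents` wrapper and a placeholder build path; a random policy stands in for the trained DDPG actors.

```python
import numpy as np
from unityagents import UnityEnvironment

# The path to the local Tennis build is a placeholder.
env = UnityEnvironment(file_name="Tennis.app")
brain_name = env.brain_names[0]
action_size = env.brains[brain_name].vector_action_space_size  # 2 continuous actions

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)        # 2 competing agents
states = env_info.vector_observations    # one local observation per agent
scores = np.zeros(num_agents)

while True:
    # Random actions stand in for the trained actors here.
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    states = env_info.vector_observations
    if np.any(env_info.local_done):      # episode ends when either agent is done
        break

print("Scores this episode:", scores)
env.close()
```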

Testing on a single agent after training multiple agents for 50 episodes

Testing on multiple agents after training multiple agents for 50 episodes

[Reacher] environment for training single and multiple agents.

In this environment, a double-jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent’s hand is in the goal location. Thus, the goal of the agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.

Two separate versions of the Unity environment are used (a minimal interaction sketch follows the list):

  • The first version contains a single agent.

  • The second version contains 20 identical agents, each with its own copy of the environment.
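A minimal interaction loop for the 20-agent version is sketched below, under the same `unityagents` assumption with a placeholder build path. Random torques stand in for the trained DDPG actor; the point to note is that every entry of each agent's action vector is clipped to [-1, 1] before being sent to the environment, and that all 20 agents' transitions can be pooled into one shared replay buffer during training.

```python
import numpy as np
from unityagents import UnityEnvironment

# The path to the 20-agent Reacher build is a placeholder.
env = UnityEnvironment(file_name="Reacher20.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)        # 20 in the multi-agent version
states = env_info.vector_observations    # shape (20, 33)
scores = np.zeros(num_agents)

while True:
    # Random torques in place of the trained policy.
    actions = np.clip(np.random.randn(num_agents, 4), -1, 1)  # keep every entry in [-1, 1]
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    states = env_info.vector_observations
    if np.any(env_info.local_done):
        break

print("Mean score across the 20 agents:", scores.mean())
env.close()
```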

Robot navigation

An A* Curriculum Approach to Reinforcement Learning for RGBD Indoor Robot Navigation

Authors: Kaushik Balakrishnan, Punarjay Chakravarty, and Shubham Shrivastava

Training robots to navigate diverse environments is a challenging problem as it involves the confluence of several different perception tasks such as mapping and localization, followed by optimal path-planning and control. Recently released photo-realistic simulators such as Habitat allow for the training of networks that output control actions directly from perception: agents use Deep Reinforcement Learning (DRL) to regress directly from the camera image to a control output in an end-to-end fashion. This is data-inefficient and can take several days to train on a GPU. Our paper tries to overcome this problem by separating the training of the perception and control neural nets and increasing the path complexity gradually using a curriculum approach. Specifically, a pre-trained twin Variational AutoEncoder (VAE) is used to compress RGBD (RGB & depth) sensing from an environment into a latent embedding, which is then used to train a DRL-based control policy. A*, a traditional path-planner, is used as a guide for the policy, and the distance between start and target locations is incrementally increased along the A* route as training progresses. We demonstrate the efficacy of the proposed approach, both in terms of increased performance and decreased training times for the PointNav task in the Habitat simulation environment. This strategy of improving the training of direct-perception based DRL navigation policies is expected to hasten the deployment of robots of particular interest to industry such as co-bots on the factory floor and last-mile delivery robots.
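The curriculum itself is simple to picture: start each training episode only a short distance from the goal along the planned A* route, and push that distance out as the policy improves. The sketch below is only an illustration of that schedule, not the paper's actual Habitat pipeline; `env`, `astar_path`, and `train_episode` are hypothetical callables supplied by the caller.

```python
import numpy as np

def route_length(route):
    """Total length of a piecewise-linear route given as an (N, 2) array of waypoints."""
    return float(np.sum(np.linalg.norm(np.diff(route, axis=0), axis=1)))

def start_at_distance_from_goal(route, dist):
    """Walk backward from the goal (last waypoint) and return the waypoint
    roughly `dist` metres from the goal along the route."""
    travelled = 0.0
    for i in range(len(route) - 1, 0, -1):
        travelled += np.linalg.norm(route[i] - route[i - 1])
        if travelled >= dist:
            return route[i - 1]
    return route[0]  # route shorter than dist: use the true start

def curriculum_training(env, astar_path, train_episode, num_episodes=10000, step=0.5):
    """Illustrative A*-guided curriculum (hypothetical APIs, not the paper's code)."""
    max_dist = step                                  # begin with very short paths
    for _ in range(num_episodes):
        start, goal = env.sample_start_and_goal()    # hypothetical environment call
        route = astar_path(env, start, goal)         # waypoints along the A* route
        curriculum_start = start_at_distance_from_goal(route, max_dist)
        success = train_episode(env, curriculum_start, goal)
        if success and max_dist < route_length(route):
            max_dist += step                         # widen the curriculum on success
```

How aggressively `max_dist` grows (here, a fixed increment after every successful episode) is a design choice; a success-rate threshold over a window of episodes is a more conservative alternative.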

Deep Q-learning

Lunar Landing in OpenAI’s Gym Environment

This project demonstrates training a DRL agent with Deep Q-Learning to perform the lunar-landing task in OpenAI’s Gym environment.
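A condensed version of the Deep Q-Learning loop is sketched below, assuming the classic gym API (`LunarLander-v2`, where `env.step` returns a 4-tuple). For brevity it omits the target network and epsilon decay that a full implementation would use, and the hyperparameters are illustrative.

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA, EPS, BATCH = 0.99, 0.1, 64                  # illustrative hyperparameters

env = gym.make("LunarLander-v2")
state_size = env.observation_space.shape[0]        # 8
action_size = env.action_space.n                   # 4

# Small fully connected Q-network: state -> one Q-value per discrete action.
q_net = nn.Sequential(nn.Linear(state_size, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, action_size))
optimizer = torch.optim.Adam(q_net.parameters(), lr=5e-4)
replay = deque(maxlen=100_000)

for episode in range(500):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPS:
            action = env.action_space.sample()
        else:
            with torch.no_grad():
                action = q_net(torch.tensor(state, dtype=torch.float32)).argmax().item()

        next_state, reward, done, _ = env.step(action)
        replay.append((state, action, reward, next_state, done))
        state = next_state

        if len(replay) >= BATCH:
            s, a, r, s2, d = map(np.array, zip(*random.sample(replay, BATCH)))
            s = torch.tensor(s, dtype=torch.float32)
            a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
            r, d = (torch.tensor(x, dtype=torch.float32) for x in (r, d))
            s2 = torch.tensor(s2, dtype=torch.float32)

            # TD target: r + gamma * max_a' Q(s', a') for non-terminal transitions.
            with torch.no_grad():
                target = r + GAMMA * (1 - d) * q_net(s2).max(dim=1).values
            loss = F.mse_loss(q_net(s).gather(1, a).squeeze(1), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```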

Navigation of an Agent in Unity-ML Environment

In this project, an AI agent is trained to navigate (and collect bananas!) in a large, square world.

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of the agent is to collect as many yellow bananas as possible while avoiding blue bananas.

The state space has 37 dimensions and contains the agent’s velocity, along with ray-based perception of objects around the agent’s forward direction. Given this information, the agent has to learn how to best select actions; a minimal interaction sketch follows the action list below. Four discrete actions are available, corresponding to:

0 - move forward.

1 - move backward.

2 - turn left.

3 - turn right.
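A minimal test-time loop for this environment might look like the sketch below (again assuming the `unityagents` wrapper and a placeholder build path); a random choice among the four actions above stands in for the trained DQN agent's greedy `argmax` over Q-values.

```python
import numpy as np
from unityagents import UnityEnvironment

# The path to the local Banana build is a placeholder.
env = UnityEnvironment(file_name="Banana.app")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]   # 37-dimensional state
score = 0

while True:
    # A trained agent would use np.argmax over the Q-network's output; random here.
    action = np.random.randint(4)         # 0: forward, 1: backward, 2: left, 3: right
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]          # +1 for a yellow banana, -1 for a blue banana
    if env_info.local_done[0]:
        break

print("Score:", score)
env.close()
```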