DMLab-30 is a set of environments designed for DeepMind Lab. These environments enable a researcher to develop agents for a large spectrum of interesting tasks, either individually or in a multi-task setting. A minimal sketch of loading and stepping a level follows the list below.
rooms_collect_good_objects_{test,train}
rooms_exploit_deferred_effects_{test,train}
rooms_select_nonmatching_object
rooms_watermaze
rooms_keys_doors_puzzle
language_select_described_object
language_select_located_object
language_execute_random_task
language_answer_quantitative_question
lasertag_one_opponent_small
lasertag_three_opponents_small
lasertag_one_opponent_large
lasertag_three_opponents_large
natlab_fixed_large_map
natlab_varying_map_regrowth
natlab_varying_map_randomized
skymaze_irreversible_path_hard
skymaze_irreversible_path_varied
psychlab_arbitrary_visuomotor_mapping
psychlab_continuous_recognition
psychlab_sequential_comparison
psychlab_visual_search
explore_object_locations_small
explore_object_locations_large
explore_obstructed_goals_small
explore_obstructed_goals_large
explore_goal_locations_small
explore_goal_locations_large
explore_object_rewards_few
explore_object_rewards_many
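All of these levels are loaded by name through the DeepMind Lab Python API, under the contributed/dmlab30/ path prefix. The following is a minimal loading-and-stepping sketch, assuming the standard deepmind_lab Python module; the render size and the 4-frame action repeat are illustrative choices, not requirements:

    import numpy as np
    import deepmind_lab

    # Any level name from the list above, prefixed with the DMLab-30 directory.
    env = deepmind_lab.Lab(
        'contributed/dmlab30/rooms_watermaze',
        ['RGBD'],                                # matches the Observation Spec entries below
        config={'width': '96', 'height': '72'})  # illustrative render size

    env.reset()
    total_reward = 0.0
    # A no-op action: one integer per action component (look, strafe, move, etc.).
    noop = np.zeros(len(env.action_spec()), dtype=np.intc)
    while env.is_running():
        obs = env.observations()                 # dict: observation name -> value
        rgbd = obs['RGBD']                       # planar 4 x height x width image
        total_reward += env.step(noop, num_steps=4)
    print('Episode return:', total_reward)

In a multi-task setting, the same loop can simply be instantiated once per level name in the list above.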
The agent must learn to collect good objects and avoid bad objects in two environments. During training, only some object/environment combinations are shown, so the agent could infer from this correlational structure that the environment matters to the task. It does not, however, and relying on it is detrimental in a transfer setting. We verify this explicitly by testing transfer performance on held-out object/environment combinations. For more details, see: Higgins, Irina et al. "DARLA: Improving Zero-Shot Transfer in Reinforcement Learning" (2017).
Test Regime: Test set consists of held-out combinations of objects/environments never seen during training.
Observation Spec: RGBD
Level Name: rooms_collect_good_objects_{test,train}
This task requires the agent to make a conceptual leap from picking up a special object to gaining access to more rewards later on, even though this link is never demonstrated within a single environment and picking up the object is costly. It is expected to be hard for model-free agents to learn, but should be simple for agents using a model-based or predictive strategy.
Test Regime: Tested in a room configuration never seen during training, where picking up a special object suddenly becomes useful.
Observation Spec: RGBD
Level Name: rooms_exploit_deferred_effects_{test,train}
This task requires the agent to choose and collect an object that is different from the one it is shown. The agent is placed into a small room containing an out-of-reach object and a teleport pad. Touching the pad awards the agent with 1 point, and teleports them to a second room. The second room contains two objects, one of which matches the object in the previous room.
- Collect matching object: -10 points.
- Collect non-matching object: +10 points.
Once either object is collected the agent is returned to the first room, with the same initial object being shown.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: rooms_select_nonmatching_object
The agent must find a hidden platform which, when found, generates a reward. The platform is difficult to find the first time, but in subsequent trials the agent should remember where it is and go straight back to it. Tests episodic memory and navigation ability.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: rooms_watermaze
A procedural planning puzzle. The agent must reach the goal object, located in a position that is blocked by a series of coloured doors. Single-use coloured keys can be used to open matching doors, and only one key can be held at a time. The objective is to figure out the correct sequence in which the keys must be collected and the rooms traversed. Visiting the rooms or collecting keys in the wrong order can make the goal unreachable.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: rooms_keys_doors_puzzle
For details on the addition of language instructions, see: Hermann, Karl Moritz, & Hill, Felix et al. "Grounded Language Learning in a Simulated 3D World" (2017).
The agent is placed into a small room containing two objects. An instruction is used to describe one of the objects. The agent must successfully follow the instruction and collect the goal object.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD and language
Level Name: language_select_described_object
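For the language levels, the instruction is exposed as an additional text observation alongside the image. A minimal sketch, assuming the instruction observation is named INSTR (the name used in DeepMind's published agent code; check env.observation_spec() if in doubt):

    import deepmind_lab

    env = deepmind_lab.Lab(
        'contributed/dmlab30/language_select_described_object',
        ['RGBD', 'INSTR'],                       # image plus instruction string
        config={'width': '96', 'height': '72'})
    env.reset()
    obs = env.observations()
    print(obs['INSTR'])                          # e.g. the instruction describing the goal object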
The agent is asked to collect a specified coloured object in a specified coloured room. Example instruction: “Pick the red object in the blue room.” There are four variants of the task, each of which has an equal chance of being selected. The variants differ in the number of rooms (between two and six). Variants with more rooms have more distractors, making the task more challenging.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD and language
Level Name: language_select_located_object
The agent is given one of seven possible tasks, each with a different type of language instruction. Example instruction: “Get the red hat from the blue room.” The agent is rewarded for collecting the correct object, and penalised for collecting the wrong object. When any object is collected, the level restarts and a new task is selected.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD and language
Level Name: language_execute_random_task
The agent is given a yes-or-no question based on object colours and counts. The agent responds by collecting one of two objects:
- White sphere = yes
- Black sphere = no
Example questions:
- “Are all cars blue?”
- “Is any car blue?”
- “Is anything blue?”
- “Are most cars blue?”
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD and language
Level Name: language_answer_quantitative_question
This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is small and there is one opponent bot of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly decays to its normal maximum of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: lasertag_one_opponent_small
This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is small and there are three opponent bots of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly decays to its normal maximum of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: lasertag_three_opponents_small
This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is large and there is one opponent bot of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly decays to its normal maximum of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: lasertag_one_opponent_large
This task requires the agent to play laser tag in a procedurally generated map containing random gadgets and power-ups. The map is large and there are three opponent bots of difficulty level 4. The agent begins the episode with the default Rapid Gadget and a limit of 100 tags. The agent’s Shield begins at 125 and slowly decays to its normal maximum of 100. The gadgets, power-ups and map layout are randomised per episode, so the agent must adapt to each new environment.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: lasertag_three_opponents_large
This is a long-term memory variation of a mushroom foraging task. The agent must collect mushrooms within a naturalistic terrain environment to maximise score. The mushrooms do not regrow. The map is a fixed, large environment. The time of day is randomised (day, dawn, night). Each episode, the spawn location is picked randomly from a set of potential spawn locations.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: natlab_fixed_large_map
This is a short-term memory variation of a mushroom foraging task. The agent must collect mushrooms within a naturalistic terrain environment to maximise score. The mushrooms regrow after around one minute in the same location throughout the episode. The map is a randomised, small environment. The topographical variation, and the number, position, orientation and size of shrubs, cacti and rocks are all randomised. The time of day is randomised (day, dawn, night). The spawn location is randomised for each episode.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: natlab_varying_map_regrowth
This is a randomised variation of a mushroom foraging task. The agent must collect mushrooms within a naturalistic terrain environment to maximise score. The mushrooms do not regrow. The map is randomly generated and of intermediate size. The topographical variation, and the number, position, orientation and size of shrubs, cacti and rocks are all randomised. The locations of mushrooms are randomised. The time of day is randomised (day, dawn, night). The spawn location is randomised for each episode.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: natlab_varying_map_randomized
This task requires agents to reach a goal located at a distance from their starting position. The start and goal are connected by a sequence of platforms placed at different heights. Jumping is disabled, so higher platforms are unreachable: once the agent drops down, it cannot backtrack to a higher platform. The agent must therefore plan its route to ensure it does not become stuck and fail the task.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: skymaze_irreversible_path_hard
A variation of the Irreversible Path Hard task. This version selects a map layout of random difficulty for the agent to solve. The jump action is disabled (NOOP) for this task.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: skymaze_irreversible_path_varied
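DMLab actions are integer vectors with one component per named action, in the order given by action_spec(), so a robust way to build actions is to index components by name. For these skymaze levels the JUMP component can still be set, but the level treats it as a no-op. A sketch under that assumption, using the standard DMLab action names:

    import numpy as np
    import deepmind_lab

    env = deepmind_lab.Lab(
        'contributed/dmlab30/skymaze_irreversible_path_varied', ['RGBD'],
        config={'width': '96', 'height': '72'})
    env.reset()

    # Map each action component's name to its index in the action vector.
    spec = env.action_spec()                     # list of dicts: name, min, max
    index = {a['name']: i for i, a in enumerate(spec)}

    action = np.zeros(len(spec), dtype=np.intc)
    action[index['MOVE_BACK_FORWARD']] = 1       # walk forward
    action[index['JUMP']] = 1                    # accepted by the API, but a NOOP in this level
    reward = env.step(action, num_steps=4)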
For details, see: Leibo, Joel Z. et al. "Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents" (2018).
In this task, the agent is shown a sequence of images and must remember the association between each image and a specific movement pattern (a location to point at). The agent is rewarded if it can recall the action associated with a given object. The images are drawn from a set of ~2500, and the specific associations are randomly generated and different in each episode.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: psychlab_arbitrary_visuomotor_mapping
This task tests familiarity memory. Consecutive images are shown, and the agent must indicate whether or not it has seen the image before during that episode. Looking at the left square indicates no; looking at the right square indicates yes. The images (drawn from a set of ~2500) are shown in a different random order in every episode.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: psychlab_continuous_recognition
Two consecutive patterns are shown to the agent. The agent must indicate whether or not the two patterns are identical. The delay time between the study pattern and the test pattern is variable.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: psychlab_sequential_comparison
A collection of shapes are shown to the agent. The agent must identify whether or not a specific shape is present in the collection. Each trial consists of the agent searching for a pink ‘T’ shape. Two black squares at the bottom of the screen are used for ‘yes’ and ‘no’ responses.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: psychlab_visual_search
This task requires agents to collect apples placed in rooms within a maze. The agent must collect as many apples as possible before the episode ends to maximise its score. Upon collecting all of the apples, the level resets, repeating until the episode ends. Apple locations, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_object_locations_small
This task is the same as Object Locations Small, but with a larger map and longer episode duration. Apple locations, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_object_locations_large
This task is similar to Goal Locations Small: agents are required to find the goal as fast as possible, but now with randomly opened and closed doors. After the goal is found, the level restarts. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset. Door states (open/closed) are randomly selected per reset, but a path to the goal always exists.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_obstructed_goals_small
This task is the same as Obstructed Goals Small, but with a larger map and longer episode duration. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset. Door states (open/closed) are randomly selected per reset, but a path to the goal always exists.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_obstructed_goals_large
This task requires agents to find the goal object as fast as possible. After the goal object is found, the level restarts. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_goal_locations_small
This task is the same as Goal Locations Small, but with a larger map and longer episode duration. Goal location, level layout and theme are randomised per episode. Agent spawn location is randomised per reset.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_goal_locations_large
This task requires agents to collect human-recognisable objects placed around a room. Some objects belong to a positively rewarding category, and some to a negatively rewarding one. After all positive-category objects are collected, the level restarts. Level theme, object categories and object reward per category are randomised per episode. Agent spawn location, object locations and number of objects per category are randomised per reset.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_object_rewards_few
This task is a more difficult variant of Object Rewards Few, with an increased number of goal objects and longer episode duration. Level theme, object categories and object reward per category are randomised per episode. Agent spawn location, object locations and number of objects per category are randomised per reset.
Test Regime: Training and testing levels drawn from the same distribution.
Observation Spec: RGBD
Level Name: explore_object_rewards_many