Creating Your Environment
In this section, we look at the polycraft_gym_main.yaml config used in the following three files:
-
manual_novelty_test1.py – keyboard agent with rendering,
-
manual_sanity_checker.py – loads trained model and sees what action it selects,
-
train.py – used for training.
Specifically, we explain how the environment can be modified using the config only, i.e. without having to write any code. Later sections cover what can be implemented from scratch:
-
entities such as objects and actions (see Examples of Objects & Actions),
-
spaces (see Defining Spaces).
For details on how anything can become a novelty to the agent, see Implementing Novelties.
Layout
map_size
Width and height in cells of the gridworld navigated by the agent.
map_size: `[16, 16]`
rooms
Coordinates of the upper-left and lower-right corner of each room. Where rooms overlap on a row or column, a wall with a door is created.
rooms:
'1':
start: [0, 0]
end: [10, 10]
'2':
start: [10, 0]
end: [15, 15]
Objects
object_types
Source modules, break cost, and collect cost of the object types in the game.
object_types:
tree_tap:
module: gym_novel_gridworlds2.contrib.polycraft.objects.TreeTap
collect_cost: 50000
objects
Quantity and location of the objects initially placed in the environment. The chunked
key set to True
places all objects of the same type next to each other.
objects:
oak_log:
quantity: 5
room: 2
chunked: 'False'
Entities
entities
There are several subkeys to this key:
-
agent
– source of behaviour for the agent:-
KeyboardAgent
, -
RandomAgent
, -
a more complex setting for an RL agent such as in config/polycraft_gym_rl.yaml,
-
(see Combining Planning & RL Agents for more detail on integrating intelligent agents),
-
-
entity
– source code of the agent, -
id
– unique identifier of an entity, used in actions such asapproach_entity_<id>
, -
action_set
– attributes to the entity one of the actions sets (multiple entities can share the same action set), -
action_sets
– action sets available, -
room
– the room the entity is placed in at the start of the game, -
inventory
– what the entity has in their inventory at the start of the game (the inventory is variable throughout the game), -
max_step_cost
– the maximum cost that can be incurred on an intelligent (non-keyboard, non-random) agent at any step.
entities:
main_1:
agent: gym_novel_gridworlds2.agents.KeyboardAgent
name: entity.polycraft.Player.name
type: agent
entity: gym_novel_gridworlds2.contrib.polycraft.objects.PolycraftEntity
action_set: main
inventory:
iron_pickaxe: 1
tree_tap: 1
id: 0
room: 2
max_step_cost: 100000
trades
The input and output of a trade and the id
of the trader with whom this trade can be executed.
trades:
block_of_titanium_1:
input:
block_of_platinum: 1
output:
block_of_titanium: 1
trader:
- 103
auto_pickup_agents
List of ids of those entities that are to automatically collect all objects around them at each time step.
auto_pickup_agents:
- 0
Actions
actions
Source modules and step cost of actions in the environment. In the case of actions involving interactions with other agents, the entity_id
must be provided. Compound actions include
-
break_<object>
, -
approach_<object/entity>
, -
interact_<entity>
, -
select_<object>
, -
craft_<object>
, -
trade_<object>
.
Notice nop_placeholder
, a placeholder for a novelty action.
actions:
break_block:
module: gym_novel_gridworlds2.contrib.polycraft.actions.Break
step_cost: 3600
action_sets
Unique sets of actions that can be attributed to any entity. Any set of actions can be shared by entities.
action_sets:
main:
- collect
- break_block
- approach_oak_log
- select_oak_log
- deselect_item
- craft_stick
- nop_placeholder1
- give_up
Goal
recipies
Input, output, and step cost of all the recipies the agent can craft. In the base implementation includes the recipe for the pogo_stick
, the goal craft of the game.
recipies:
pogo_stick:
input:
- stick
- block_of_titanium
- stick
- diamond
- '0'
- '0'
- '0'
- rubber
- '0'
output:
pogo_stick: 1
step_cost: 8400
Training
All of the below keys take integer values.
sleep_time
Time delay after each environment step when training.
sleep_time: 0
time_limit
Limit on how many steps the agent can take in attempting the goal during training.
time_limit: 89000
seed
For the reproducibility of the experiment run.
seed: 23
num_episodes
Number of episodes to run when training.
num_episodes: 10