Getting Started
Creating a vectorized environment
To create a vectorized environment that runs multiple sub-environments, you can wrap your sub-environments inside gym.vector.SyncVectorEnv (for sequential execution) or gym.vector.AsyncVectorEnv (for parallel execution, with multiprocessing). These vectorized environments take as input a list of callables specifying how the sub-environments are created.
>>> import gym
>>> import numpy as np
>>> envs = gym.vector.AsyncVectorEnv([
... lambda: gym.make("CartPole-v1"),
... lambda: gym.make("CartPole-v1"),
... lambda: gym.make("CartPole-v1")
... ])
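If you want the sub-environments to run sequentially in the same process instead (for example, when each step is cheap and the overhead of multiprocessing is not worth it), the construction is the same with gym.vector.SyncVectorEnv; a minimal sketch:
>>> envs = gym.vector.SyncVectorEnv([
... lambda: gym.make("CartPole-v1"),
... lambda: gym.make("CartPole-v1"),
... lambda: gym.make("CartPole-v1")
... ])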
Alternatively, to create a vectorized environment of multiple copies of the same registered sub-environment, you can use the function gym.vector.make().
>>> envs = gym.vector.make("CartPole-v1", num_envs=3) # Equivalent
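gym.vector.make() builds an AsyncVectorEnv by default. Assuming the asynchronous keyword argument of gym.vector.make() (which defaults to True in recent Gym versions; check your version), the sequential variant is a one-line change:
>>> envs = gym.vector.make("CartPole-v1", num_envs=3, asynchronous=False)  # SyncVectorEnv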
Note
To enable automatic batching of actions and observations, all the sub-environments must share the same action_space and observation_space. However, the sub-environments are not required to be exact copies of one another. For example, you can run 2 instances of Pendulum-v0 with different values of gravity in a vectorized environment with
>>> env = gym.vector.AsyncVectorEnv([
... lambda: gym.make("Pendulum-v0", g=9.81),
... lambda: gym.make("Pendulum-v0", g=1.62)
... ])
See also Observation & Action spaces for more information about automatic batching.
Warning
When using AsyncVectorEnv with either the spawn or forkserver start method, you must wrap your code containing the vectorized environment with if __name__ == "__main__":. See this documentation for more information.
if __name__ == "__main__":
    envs = gym.vector.make("CartPole-v1", num_envs=3, context="spawn")
Working with vectorized environments
While standard Gym environments take a single action and return a single observation (along with a reward and a boolean indicating termination), vectorized environments take a batch of actions as input and return a batch of observations, together with an array of rewards and an array of booleans indicating whether the episode has ended in each sub-environment.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
array([[ 0.00198895, -0.00569421, -0.03170966, 0.00126465],
[-0.02658334, 0.00755256, 0.04376719, -0.00266695],
[-0.02898625, 0.04779156, 0.02686412, -0.01298284]],
dtype=float32)
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, dones, infos = envs.step(actions)
>>> observations
array([[ 0.00187507, 0.18986781, -0.03168437, -0.301252 ],
[-0.02643229, -0.18816885, 0.04371385, 0.3034975 ],
[-0.02803041, 0.24251814, 0.02660446, -0.29707024]],
dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> dones
array([False, False, False])
>>> infos
({}, {}, {})
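Putting this together, a typical interaction loop simply keeps feeding a batch of actions to step(). The following is a minimal sketch that samples random batched actions from envs.action_space; in practice you would substitute the output of a policy for the sampled actions.
import gym

envs = gym.vector.make("CartPole-v1", num_envs=3)
observations = envs.reset()
for _ in range(100):
    # One action per sub-environment; a policy would go here instead of random sampling.
    actions = envs.action_space.sample()
    observations, rewards, dones, infos = envs.step(actions)
envs.close()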
Vectorized environments are compatible with any sub-environment, regardless of the action and observation spaces (e.g. container spaces like Dict, or any arbitrarily nested spaces). In particular, vectorized environments can automatically batch the observations returned by reset() and step() for any standard Gym space (e.g. Box, Discrete, Dict, or any nested structure thereof). Similarly, vectorized environments can take batches of actions from any standard Gym space.
>>> class DictEnv(gym.Env):
... observation_space = gym.spaces.Dict({
... "position": gym.spaces.Box(-1., 1., (3,), np.float32),
... "velocity": gym.spaces.Box(-1., 1., (2,), np.float32)
... })
... action_space = gym.spaces.Dict({
... "fire": gym.spaces.Discrete(2),
... "jump": gym.spaces.Discrete(2),
... "acceleration": gym.spaces.Box(-1., 1., (2,), np.float32)
... })
...
... def reset(self):
... return self.observation_space.sample()
...
... def step(self, action):
... observation = self.observation_space.sample()
... return (observation, 0., False, {})
>>> envs = gym.vector.AsyncVectorEnv([lambda: DictEnv()] * 3)
>>> envs.observation_space
Dict(position:Box(-1.0, 1.0, (3, 3), float32), velocity:Box(-1.0, 1.0, (3, 2), float32))
>>> envs.action_space
Dict(fire:MultiDiscrete([2 2 2]), jump:MultiDiscrete([2 2 2]), acceleration:Box(-1.0, 1.0, (3, 2), float32))
>>> envs.reset()
>>> actions = {
... "fire": np.array([1, 1, 0]),
... "jump": np.array([0, 1, 0]),
... "acceleration": np.random.uniform(-1., 1., size=(3, 2))
... }
>>> observations, rewards, dones, infos = envs.step(actions)
>>> observations
{"position": array([[-0.5337036 , 0.7439302 , 0.41748118],
[ 0.9373266 , -0.5780453 , 0.8987405 ],
[-0.917269 , -0.5888639 , 0.812942 ]], dtype=float32),
"velocity": array([[ 0.23626241, -0.0616814 ],
[-0.4057572 , -0.4875375 ],
[ 0.26341468, 0.72282314]], dtype=float32)}
Note
The sub-environments inside a vectorized environment automatically call reset at the end of an episode. In the following example, the episode of the 3rd sub-environment ends after 2 steps (the agent fell in a hole), and the sub-environment gets reset (observation 0).
>>> envs = gym.vector.make("FrozenLake-v1", num_envs=3, is_slippery=False)
>>> envs.reset()
array([0, 0, 0])
>>> observations, rewards, dones, infos = envs.step(np.array([1, 2, 2]))
>>> observations, rewards, dones, infos = envs.step(np.array([1, 2, 1]))
>>> dones
array([False, False, True])
>>> observations
array([8, 2, 0])
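Because sub-environments reset automatically, a batched rollout never needs to stop when one episode finishes. As a minimal sketch (reusing the FrozenLake-v1 vectorized environment defined above), you can keep stepping and use the dones array to count how many episodes each sub-environment has completed:
import numpy as np

episode_counts = np.zeros(3, dtype=int)
observations = envs.reset()
for _ in range(50):
    observations, rewards, dones, infos = envs.step(envs.action_space.sample())
    # dones has one boolean per sub-environment; a True entry means that
    # sub-environment just finished an episode and has been reset automatically.
    episode_counts += dones.astype(int)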
Observation & Action spaces
Like any Gym environment, vectorized environments contain the two properties observation_space and action_space to specify the observation and action spaces of the environment. Since vectorized environments operate on multiple sub-environments, where the observations and actions of the sub-environments are batched together, the observation and action spaces are batched accordingly, so that the input actions are valid elements of action_space, and the observations are valid elements of observation_space.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.observation_space
Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)
>>> envs.action_space
MultiDiscrete([2 2 2])
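Since the batched action_space describes exactly what step() expects, a quick sanity check (just a sketch; not something you need in practice) is to sample from it and feed the result straight back to the vectorized environment:
>>> envs.reset()
>>> actions = envs.action_space.sample()  # one action per sub-environment
>>> envs.action_space.contains(actions)
True
>>> observations, rewards, dones, infos = envs.step(actions)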
Note
In order to appropriately batch the observations and actions in vectorized environments, the observation and action spaces of all the sub-environments are required to be identical.
>>> envs = gym.vector.AsyncVectorEnv([
... lambda: gym.make("CartPole-v1"),
... lambda: gym.make("MountainCar-v0")
... ])
RuntimeError: Some environments have an observation space different from `Box([-4.8 ...], [4.8 ...], (4,), float32)`. In order to batch observations, the observation spaces from all environments must be equal.
However, sometimes it may be handy to have access to the observation and action spaces of a sub-environment, and not the batched spaces. You can access those with the properties single_observation_space and single_action_space of the vectorized environment.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_observation_space
Box([-4.8 ...], [4.8 ...], (4,), float32)
>>> envs.single_action_space
Discrete(2)
This is convenient, for example, if you instantiate a policy. In the following example, we use single_observation_space and single_action_space to define the weights of a linear policy. Note that, thanks to the vectorized environment, you can apply the policy directly to the whole batch of observations with a single call to policy.
>>> from gym.spaces.utils import flatdim
>>> from scipy.special import softmax
>>> def policy(weights, observations):
... logits = np.dot(observations, weights)
... return softmax(logits, axis=1)
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> weights = np.random.randn(
... flatdim(envs.single_observation_space),
... envs.single_action_space.n
... )
>>> observations = envs.reset()
>>> actions = policy(weights, observations).argmax(axis=1)
>>> observations, rewards, dones, infos = envs.step(actions)
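To carry the example one step further, here is a minimal sketch (reusing the policy, weights, and envs defined above) of a short rollout that accumulates the rewards collected in each sub-environment; note that, because of the automatic resets, these totals may span several episodes per sub-environment.
total_rewards = np.zeros(3)
observations = envs.reset()
for _ in range(200):
    # Greedy action for every sub-environment with a single call to the policy.
    actions = policy(weights, observations).argmax(axis=1)
    observations, rewards, dones, infos = envs.step(actions)
    total_rewards += rewards
envs.close()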