Getting Started
Creating a vectorized environment
To create a vectorized environment that runs multiple sub-environments, you can wrap your sub-environments inside gym.vector.SyncVectorEnv (for sequential execution), or gym.vector.AsyncVectorEnv (for parallel execution, with multiprocessing). These vectorized environments take as input a list of callables, each specifying how one sub-environment is created.
>>> import gym
>>> import numpy as np
>>> envs = gym.vector.AsyncVectorEnv([
...     lambda: gym.make("CartPole-v1"),
...     lambda: gym.make("CartPole-v1"),
...     lambda: gym.make("CartPole-v1")
... ])
Alternatively, to create a vectorized environment with multiple copies of the same registered sub-environment, you can use the function gym.vector.make():
>>> envs = gym.vector.make("CartPole-v1", num_envs=3) # Equivalent
To enable automatic batching of actions and observations, all the sub-environments must share the same action_space and observation_space. However, the sub-environments do not have to be exact copies of one another. For example, you can run 2 instances of Pendulum-v0 with different values of gravity in a vectorized environment with:
>>> env = gym.vector.AsyncVectorEnv([
...     lambda: gym.make("Pendulum-v0", g=9.81),
...     lambda: gym.make("Pendulum-v0", g=1.62)
... ])
See also Observation & Action spaces for more information about automatic batching.
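When sub-environments differ only by a parameter, as in the Pendulum example above, the list of callables is often built in a loop. A plain lambda written inside a loop looks up the loop variable when it is called, not when it is defined, so every callable would see the last value. The following is a minimal sketch of a factory helper that avoids this. The names make_env_fn and fake_make are hypothetical; fake_make stands in for gym.make so that the snippet runs without Gym installed.

```python
def fake_make(env_id, **kwargs):
    # Stand-in for gym.make (assumption): only records what would be created.
    return {"id": env_id, **kwargs}


def make_env_fn(env_id, **kwargs):
    # Return a zero-argument callable; kwargs are bound once per factory call.
    def _thunk():
        return fake_make(env_id, **kwargs)
    return _thunk


gravities = [9.81, 1.62]

# Pitfall: each lambda reads `g` when called, after the loop has finished.
late_bound = [lambda: fake_make("Pendulum-v0", g=g) for g in gravities]

# Each call to make_env_fn binds its own value of g.
env_fns = [make_env_fn("Pendulum-v0", g=g) for g in gravities]

late_values = [fn()["g"] for fn in late_bound]
bound_values = [fn()["g"] for fn in env_fns]
print(late_values)   # both entries are the last gravity value
print(bound_values)  # one entry per gravity value
```

With a real installation, replacing fake_make by gym.make should give a list that can be passed to SyncVectorEnv or AsyncVectorEnv in the same way as the hand-written lambdas.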
When using AsyncVectorEnv with either the spawn or forkserver start methods, you must wrap your code containing the vectorized environment with if __name__ == "__main__":. See this documentation for more information.
if __name__ == "__main__":
    envs = gym.vector.make("CartPole-v1", num_envs=3, context="spawn")
Working with vectorized environments
While standard Gym environments take a single action and return a single observation (with a reward and a boolean indicating termination), vectorized environments take a batch of actions as input and return a batch of observations, together with an array of rewards and an array of booleans indicating whether the episode ended in each sub-environment.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
array([[ 0.00198895, -0.00569421, -0.03170966,  0.00126465],
       [-0.02658334,  0.00755256,  0.04376719, -0.00266695],
       [-0.02898625,  0.04779156,  0.02686412, -0.01298284]],
      dtype=float32)
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, dones, infos = envs.step(actions)
>>> observations
array([[ 0.00187507,  0.18986781, -0.03168437, -0.301252  ],
       [-0.02643229, -0.18816885,  0.04371385,  0.3034975 ],
       [-0.02803041,  0.24251814,  0.02660446, -0.29707024]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> dones
array([False, False, False])
>>> infos
({}, {}, {})
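To make the batching concrete, here is a simplified sketch of what sequential vectorization amounts to: step each sub-environment with its own action, then stack the results along a leading batch axis. ToyEnv and step_all are hypothetical names used for illustration only; this is not Gym's implementation, and the toy environment returns random observations.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for a Gym environment with 4-dimensional observations."""

    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return self.rng.uniform(-0.05, 0.05, size=4).astype(np.float32)

    def step(self, action):
        observation = self.rng.uniform(-0.05, 0.05, size=4).astype(np.float32)
        return observation, 1.0, False, {}


def step_all(envs, actions):
    # Step every sub-environment with its own action, one after the other.
    results = [env.step(action) for env, action in zip(envs, actions)]
    observations, rewards, dones, infos = zip(*results)
    # Stack the per-environment results along a new leading batch axis.
    return np.stack(observations), np.array(rewards), np.array(dones), infos


envs = [ToyEnv(seed) for seed in range(3)]
observations, rewards, dones, infos = step_all(envs, np.array([1, 0, 1]))
print(observations.shape, rewards.shape, dones.shape)
```

The batched arrays have one row (or entry) per sub-environment, in the same order as the list of sub-environments.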
Vectorized environments are compatible with any sub-environment, regardless of the action and observation spaces (e.g. container spaces like Dict, or any arbitrarily nested spaces). In particular, vectorized environments can automatically batch the observations returned by reset() and step() for any standard Gym space (e.g. Box, Discrete, Dict, or any nested structure thereof). Similarly, vectorized environments can take batches of actions from any standard Gym space.
>>> class DictEnv(gym.Env):
...     observation_space = gym.spaces.Dict({
...         "position": gym.spaces.Box(-1., 1., (3,), np.float32),
...         "velocity": gym.spaces.Box(-1., 1., (2,), np.float32)
...     })
...     action_space = gym.spaces.Dict({
...         "fire": gym.spaces.Discrete(2),
...         "jump": gym.spaces.Discrete(2),
...         "acceleration": gym.spaces.Box(-1., 1., (2,), np.float32)
...     })
...
...     def reset(self):
...         return self.observation_space.sample()
...
...     def step(self, action):
...         observation = self.observation_space.sample()
...         return (observation, 0., False, {})
>>> envs = gym.vector.AsyncVectorEnv([lambda: DictEnv()] * 3)
>>> envs.observation_space
Dict(position:Box(-1.0, 1.0, (3, 3), float32), velocity:Box(-1.0, 1.0, (3, 2), float32))
>>> envs.action_space
Dict(fire:MultiDiscrete([2 2 2]), jump:MultiDiscrete([2 2 2]), acceleration:Box(-1.0, 1.0, (3, 2), float32))
>>> envs.reset()
>>> actions = {
...     "fire": np.array([1, 1, 0]),
...     "jump": np.array([0, 1, 0]),
...     "acceleration": np.random.uniform(-1., 1., size=(3, 2))
... }
>>> observations, rewards, dones, infos = envs.step(actions)
>>> observations
{"position": array([[-0.5337036 ,  0.7439302 ,  0.41748118],
                    [ 0.9373266 , -0.5780453 ,  0.8987405 ],
                    [-0.917269  , -0.5888639 ,  0.812942  ]], dtype=float32),
 "velocity": array([[ 0.23626241, -0.0616814 ],
                    [-0.4057572 , -0.4875375 ],
                    [ 0.26341468,  0.72282314]], dtype=float32)}
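The batched Dict observation above keeps the keys of the sub-environment observations and stacks the values per key. A rough sketch of that idea for flat dictionaries follows; batch_dict_observations is a hypothetical helper, not a Gym function, and it assumes that every observation has the same keys and no nesting.

```python
import numpy as np


def batch_dict_observations(observations):
    # Assumes every observation is a flat dict with the same keys.
    keys = observations[0].keys()
    # Stack the values of each key along a new leading batch axis.
    return {key: np.stack([obs[key] for obs in observations]) for key in keys}


single_observations = [
    {
        "position": np.zeros(3, dtype=np.float32),
        "velocity": np.ones(2, dtype=np.float32),
    }
    for _ in range(3)
]
batched = batch_dict_observations(single_observations)
print({key: value.shape for key, value in batched.items()})
```

The resulting shapes match the batched observation_space shown earlier: (3, 3) for position and (3, 2) for velocity.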
The sub-environments inside a vectorized environment automatically call reset at the end of an episode. In the following example, the episode of the 3rd sub-environment ends after 2 steps (the agent fell in a hole), and the sub-environment gets reset (observation 0).
>>> envs = gym.vector.make("FrozenLake-v1", num_envs=3, is_slippery=False)
>>> envs.reset()
array([0, 0, 0])
>>> observations, rewards, dones, infos = envs.step(np.array([1, 2, 2]))
>>> observations, rewards, dones, infos = envs.step(np.array([1, 2, 1]))
>>> dones
array([False, False, True])
>>> observations
array([8, 2, 0])
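The automatic reset described above can be pictured with a small wrapper around step. This is only a sketch of the behavior, not Gym's implementation: CountdownEnv and step_with_autoreset are hypothetical names, and this version simply discards the terminal observation.

```python
class CountdownEnv:
    """Hypothetical environment whose episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        # The observation is simply the current step count.
        return self.t, 0.0, done, {}


def step_with_autoreset(env, action):
    observation, reward, done, info = env.step(action)
    if done:
        # Replace the terminal observation with the first one of the next episode.
        observation = env.reset()
    return observation, reward, done, info


envs = [CountdownEnv(5), CountdownEnv(5), CountdownEnv(2)]
for env in envs:
    env.reset()

for _ in range(2):
    results = [step_with_autoreset(env, 0) for env in envs]

observations = [result[0] for result in results]
dones = [result[2] for result in results]
print(observations, dones)
```

As in the FrozenLake example, the flag in dones still reports that the episode ended, while the returned observation already belongs to the new episode.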
Observation & Action spaces
Like any Gym environment, vectorized environments have two properties, observation_space and action_space, that specify the observation and action spaces of the environment. Since vectorized environments operate on multiple sub-environments whose observations and actions are batched together, the observation and action spaces are batched as well, so that the input actions are valid elements of action_space and the observations are valid elements of observation_space.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.observation_space
Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)
>>> envs.action_space
MultiDiscrete([2 2 2])
In order to appropriately batch the observations and actions in vectorized environments, the observation and action spaces of all the sub-environments are required to be identical.
>>> envs = gym.vector.AsyncVectorEnv([
... lambda: gym.make("CartPole-v1"),
... lambda: gym.make("MountainCar-v0")
... ])
RuntimeError: Some environments have an observation space different from `Box([-4.8 ...], [4.8 ...], (4,), float32)`. In order to batch observations, the observation spaces from all environments must be equal.
However, sometimes it may be handy to have access to the observation and action spaces of a single sub-environment rather than the batched spaces. You can access those with the properties single_observation_space and single_action_space of the vectorized environment.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_observation_space
Box([-4.8 ...], [4.8 ...], (4,), float32)
>>> envs.single_action_space
Discrete(2)
This is convenient, for example, when you instantiate a policy. In the following example, we use single_observation_space and single_action_space to define the weights of a linear policy. Note that thanks to the vectorized environment, you can apply the policy directly to the whole batch of observations with a single call to policy.
>>> from gym.spaces.utils import flatdim
>>> from scipy.special import softmax
>>> def policy(weights, observations):
...     logits = np.dot(observations, weights)
...     return softmax(logits, axis=1)
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> weights = np.random.randn(
...     flatdim(envs.single_observation_space),
...     envs.single_action_space.n
... )
>>> observations = envs.reset()
>>> actions = policy(weights, observations).argmax(axis=1)
>>> observations, rewards, dones, infos = envs.step(actions)
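The shapes involved in the linear policy can be checked without Gym or SciPy. The sketch below assumes the CartPole-v1 sizes shown earlier (4 observation values, 2 actions, 3 sub-environments), uses random arrays in place of real observations, and writes the softmax by hand with NumPy.

```python
import numpy as np


def softmax(logits, axis):
    # Numerically stable softmax, written with NumPy only.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def policy(weights, observations):
    # (num_envs, obs_dim) @ (obs_dim, num_actions) -> (num_envs, num_actions)
    logits = np.dot(observations, weights)
    return softmax(logits, axis=1)


rng = np.random.default_rng(0)
num_envs, obs_dim, num_actions = 3, 4, 2  # assumed CartPole-v1 sizes
weights = rng.standard_normal((obs_dim, num_actions))
observations = rng.uniform(-0.05, 0.05, size=(num_envs, obs_dim))

probabilities = policy(weights, observations)
actions = probabilities.argmax(axis=1)
print(probabilities.shape, actions.shape)
```

Each row of probabilities is a distribution over the actions of one sub-environment, so taking argmax along axis 1 yields one action per sub-environment, which is the batch shape that step expects.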