API Reference

gym.vector.make(id, num_envs=1, asynchronous=True, wrappers=None, **kwargs)

Create a vectorized environment from multiple copies of an environment, given its ID.

Parameters
  • id (str) – The environment ID. This must be a valid ID from the registry.

  • num_envs (int) – Number of copies of the environment.

  • asynchronous (bool) – If True, wraps the environments in an AsyncVectorEnv (which uses multiprocessing to run the environments in parallel). If False, wraps the environments in a SyncVectorEnv.

  • wrappers (callable, or iterable of callables, optional) – If not None, then apply the wrappers to each internal environment during creation.

  • **kwargs – Additional keyword arguments passed to gym.make() when creating each sub-environment.

Returns

The vectorized environment.

Return type

gym.vector.VectorEnv

Example

>>> import gym
>>> env = gym.vector.make('CartPole-v1', num_envs=3)
>>> env.reset()
array([[-0.04456399,  0.04653909,  0.01326909, -0.02099827],
       [ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.03468829,  0.01500225,  0.01230312,  0.01825218]],
      dtype=float32)
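
The wrappers argument accepts either a single callable or an iterable of callables. The sketch below shows one plausible way such an argument could be normalized and applied to each freshly created copy. ToyEnv, TagWrapper, apply_wrappers and make_env_fns are hypothetical stand-ins, not part of gym; in particular, ToyEnv replaces the real registry lookup.

```python
from typing import Any, Callable, Iterable, List, Optional, Union

Wrapper = Callable[[Any], Any]


class ToyEnv:
    """Hypothetical stand-in for an environment created from a registry ID."""

    def __init__(self, env_id: str):
        self.env_id = env_id


class TagWrapper:
    """Hypothetical wrapper that records a name on top of the wrapped env."""

    def __init__(self, env: Any, name: str):
        self.env = env
        self.name = name


def apply_wrappers(env: Any, wrappers: Optional[Union[Wrapper, Iterable[Wrapper]]]) -> Any:
    # None: leave the environment untouched.
    if wrappers is None:
        return env
    # A single callable is applied once.
    if callable(wrappers):
        return wrappers(env)
    # An iterable is applied in order, so the first wrapper ends up innermost.
    for wrapper in wrappers:
        env = wrapper(env)
    return env


def make_env_fns(env_id: str, num_envs: int, wrappers=None) -> List[Callable[[], Any]]:
    # One zero-argument factory per copy; every call builds a fresh, wrapped env.
    def _make_env() -> Any:
        return apply_wrappers(ToyEnv(env_id), wrappers)

    return [_make_env for _ in range(num_envs)]


env_fns = make_env_fns(
    "CartPole-v1",
    num_envs=3,
    wrappers=[lambda e: TagWrapper(e, "inner"), lambda e: TagWrapper(e, "outer")],
)
envs = [fn() for fn in env_fns]
print(len(envs), envs[0].name, envs[0].env.name)  # 3 outer inner
```

Building factories rather than environments is what lets the asynchronous variant create each copy inside its own worker process.
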

VectorEnv

class gym.vector.VectorEnv(num_envs, observation_space, action_space)

Base class for vectorized environments.

Each observation returned from a vectorized environment is a batch containing one observation per sub-environment. Likewise, step() expects a batch containing one action per sub-environment.

Note

All sub-environments must share identical observation and action spaces. In other words, a vector of multiple different environments is not supported.

Parameters
  • num_envs (int) – Number of environments in the vectorized environment.

  • observation_space (gym.spaces.Space) – Observation space of a single environment.

  • action_space (gym.spaces.Space) – Action space of a single environment.

property action_space
Type

gym.spaces.Space

The (batched) action space. The input actions of step() must be valid elements of action_space.

>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.action_space
MultiDiscrete([2 2 2])

property observation_space
Type

gym.spaces.Space

The (batched) observation space. The observations returned by reset() and step() are valid elements of observation_space.

>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.observation_space
Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)

property single_action_space
Type

gym.spaces.Space

The action space of a sub-environment.

>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_action_space
Discrete(2)

property single_observation_space
Type

gym.spaces.Space

The observation space of a sub-environment.

>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_observation_space
Box([-4.8 ...], [4.8 ...], (4,), float32)

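The examples above show how batching three CartPole sub-environments turns a Box of shape (4,) into one of shape (3, 4), and Discrete(2) into MultiDiscrete([2 2 2]). A minimal pure-Python sketch of that bookkeeping follows; the helper names are hypothetical, and gym's own batching logic covers many more space types.

```python
from typing import List, Tuple


def batch_box_shape(single_shape: Tuple[int, ...], num_envs: int) -> Tuple[int, ...]:
    # A Box is batched by adding a leading dimension of size num_envs.
    return (num_envs,) + tuple(single_shape)


def batch_discrete(n: int, num_envs: int) -> List[int]:
    # Discrete(n) becomes a MultiDiscrete with one entry per sub-environment.
    return [n] * num_envs


def is_valid_action_batch(actions: List[int], n: int, num_envs: int) -> bool:
    # A valid batch has exactly one action per sub-environment, each in [0, n).
    return len(actions) == num_envs and all(0 <= a < n for a in actions)


print(batch_box_shape((4,), 3))                 # (3, 4)
print(batch_discrete(2, 3))                     # [2, 2, 2]
print(is_valid_action_batch([1, 0, 1], 2, 3))   # True
```
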
reset()

Reset all sub-environments and return a batch of initial observations.

Returns

A batch of observations from the vectorized environment.

Return type

element of observation_space

>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
array([[-0.04456399,  0.04653909,  0.01326909, -0.02099827],
       [ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.03468829,  0.01500225,  0.01230312,  0.01825218]],
      dtype=float32)

step(actions)

Take an action for each sub-environment.

Parameters

actions (element of action_space) – Batch of actions.

Returns

  • observations (element of observation_space) – A batch of observations from the vectorized environment.

  • rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.

  • dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode has ended.

  • infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.

>>> import gym
>>> import numpy as np
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> _ = envs.reset()
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, dones, infos = envs.step(actions)

>>> observations
array([[ 0.00122802,  0.16228443,  0.02521779, -0.23700266],
       [ 0.00788269, -0.17490888,  0.03393489,  0.31735462],
       [ 0.04918966,  0.19421194,  0.02938497, -0.29495203]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> dones
array([False, False, False])
>>> infos
({}, {}, {})

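The batch semantics of reset() and step() can be illustrated with a toy vectorized environment that steps its sub-environments one after another. ToyVectorEnv and CountdownEnv are hypothetical. Resetting a finished sub-environment immediately is an assumption of this sketch, chosen to mirror the "resets on done" behavior mentioned for the AsyncVectorEnv worker parameter.

```python
from typing import Any, Callable, Dict, List, Tuple


class CountdownEnv:
    """Hypothetical sub-environment whose episode ends after `length` steps."""

    def __init__(self, length: int):
        self.length = length
        self.remaining = length

    def reset(self) -> int:
        self.remaining = self.length
        return self.remaining

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.remaining -= 1
        done = self.remaining <= 0
        return self.remaining, float(action), done, {}


class ToyVectorEnv:
    """Runs sub-environments serially and batches their outputs."""

    def __init__(self, env_fns: List[Callable[[], CountdownEnv]]):
        self.envs = [fn() for fn in env_fns]
        self.num_envs = len(self.envs)

    def reset(self) -> List[int]:
        # One initial observation per sub-environment.
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]):
        assert len(actions) == self.num_envs, "expected one action per sub-environment"
        observations, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            if done:
                # Assumption: a finished sub-environment is reset right away,
                # so its returned observation starts the next episode.
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return observations, rewards, dones, infos


envs = ToyVectorEnv([lambda: CountdownEnv(1), lambda: CountdownEnv(2), lambda: CountdownEnv(3)])
print(envs.reset())           # [1, 2, 3]
print(envs.step([1, 0, 1]))   # ([1, 1, 2], [1.0, 0.0, 1.0], [True, False, False], [{}, {}, {}])
```
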
seed(seeds=None)

Set the random seed in all sub-environments.

Parameters

seeds (list of int, or int, optional) – Random seed for each sub-environment. If seeds is a list of length num_envs, its n-th item is used as the seed of the n-th sub-environment. If seeds is an int, then each sub-environment uses the random seed seeds + n, where n is the index of the sub-environment (between 0 and num_envs - 1).

>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.seed([1, 3, 5])
>>> envs.reset()
array([[ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.02281231, -0.02475473,  0.02306162,  0.02072129],
       [-0.03742824, -0.02316945,  0.0148571 ,  0.0296055 ]],
      dtype=float32)

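The rule for turning the seeds argument into one seed per sub-environment can be sketched as follows. expand_seeds is a hypothetical helper. Mapping None to one None per sub-environment, and raising ValueError on a length mismatch, are assumptions of this sketch.

```python
from typing import List, Optional, Sequence, Union


def expand_seeds(
    seeds: Union[None, int, Sequence[Optional[int]]], num_envs: int
) -> List[Optional[int]]:
    if seeds is None:
        # Assumption: no seed means every sub-environment is seeded with None.
        return [None] * num_envs
    if isinstance(seeds, int):
        # An int gives sub-environment n the seed `seeds + n`.
        return [seeds + n for n in range(num_envs)]
    seeds = list(seeds)
    if len(seeds) != num_envs:
        # Assumption: this sketch rejects lists of the wrong length.
        raise ValueError(f"expected {num_envs} seeds, got {len(seeds)}")
    return seeds


print(expand_seeds(10, 3))         # [10, 11, 12]
print(expand_seeds([1, 3, 5], 3))  # [1, 3, 5]
```
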
close(**kwargs)

Close all sub-environments and release resources.

It first closes any existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function does not close the environments itself; that must be handled in close_extras(). This applies to both synchronous and asynchronous vectorized environments.

Note

This method is called automatically when the environment is garbage-collected or when the program exits.
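
The split between close() and close_extras() is a template-method pattern: the base class handles the shared steps, and each subclass releases its own resources. A minimal sketch with hypothetical classes follows. Returning early when the environment is already closed is an assumption of this sketch, and so is the __del__ hook used to model the automatic call on garbage collection.

```python
class ToyViewer:
    """Hypothetical image viewer."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ToyVectorEnvBase:
    def __init__(self) -> None:
        self.viewer = ToyViewer()
        self.closed = False
        self.extras_calls = 0

    def close_extras(self, **kwargs) -> None:
        # Subclasses release their own resources (environments, processes, pipes) here.
        self.extras_calls += 1

    def close(self, **kwargs) -> None:
        # Assumption: closing an already closed environment is a no-op.
        if self.closed:
            return
        if self.viewer is not None:
            self.viewer.close()
        self.close_extras(**kwargs)
        self.closed = True

    def __del__(self) -> None:
        # Models the automatic close on garbage collection.
        if not getattr(self, "closed", True):
            self.close()


env = ToyVectorEnvBase()
env.close()
env.close()
print(env.extras_calls, env.viewer.closed)  # 1 True
```
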

AsyncVectorEnv

class gym.vector.AsyncVectorEnv(env_fns, observation_space=None, action_space=None, shared_memory=True, copy=True, context=None, daemon=True, worker=None)

Vectorized environment that runs multiple environments in parallel. Each sub-environment runs in its own multiprocessing process, and pipes are used for communication.

Parameters
  • env_fns (iterable of callable) – Functions that create the environments.

  • observation_space (gym.spaces.Space, optional) – Observation space of a single environment. If None, then the observation space of the first environment is taken.

  • action_space (gym.spaces.Space, optional) – Action space of a single environment. If None, then the action space of the first environment is taken.

  • shared_memory (bool) – If True, then the observations from the worker processes are communicated back through shared variables. This can improve efficiency if the observations are large (e.g. images).

  • copy (bool) – If True, then the reset() and step() methods return a copy of the observations.

  • context (str, optional) – Context for multiprocessing, as passed to multiprocessing.get_context() (e.g. "fork", "spawn" or "forkserver"). If None, then the default context is used.

  • daemon (bool) – If True, then subprocesses have the daemon flag turned on; that is, they will quit if the main process quits. However, daemon=True prevents subprocesses from spawning children, so for some environments you may want to set it to False.

  • worker (callable, optional) – If set, then use that worker in each subprocess instead of the default one. This can be useful to override some inner vector env logic, for instance how sub-environments are reset when an episode is done.
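
With shared_memory=True, observations come back through memory shared between processes rather than through the pipes. The sketch below illustrates the general layout idea with multiprocessing.sharedctypes.RawArray. A single flat float32 buffer holds num_envs observations, and sub-environment i writes only into its own slice. For brevity, the "workers" here are plain function calls in one process. The names NUM_ENVS, OBS_SIZE, write_observation and read_batch are hypothetical, and gym's actual buffer handling may differ.

```python
from multiprocessing.sharedctypes import RawArray
from typing import List, Sequence

NUM_ENVS = 3
OBS_SIZE = 4

# One flat float32 ("f") buffer large enough for the whole batch.
shared_obs = RawArray("f", NUM_ENVS * OBS_SIZE)


def write_observation(buffer, index: int, obs: Sequence[float]) -> None:
    # Sub-environment `index` owns [index * OBS_SIZE, (index + 1) * OBS_SIZE).
    start = index * OBS_SIZE
    buffer[start:start + OBS_SIZE] = list(obs)


def read_batch(buffer) -> List[List[float]]:
    # The main process reads the whole batch; no per-step pickling is needed.
    return [list(buffer[i * OBS_SIZE:(i + 1) * OBS_SIZE]) for i in range(NUM_ENVS)]


for i in range(NUM_ENVS):
    write_observation(shared_obs, i, [float(i), i + 0.5, 2.0 * i, 0.25])
print(read_batch(shared_obs))
```
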

Warning

worker is an advanced option. It provides a high degree of flexibility and a high chance of shooting yourself in the foot; if you write your own worker, it is recommended to start from the code of the _worker (or _worker_shared_memory) function and modify it.
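
A worker is essentially a command loop. It receives requests over a pipe, calls its sub-environment, and sends the results back. The sketch below shows that general shape. The ("command", data) messages, the (result, success) replies, CounterEnv and toy_worker are all assumptions for illustration and may not match gym's internal protocol. To stay self-contained, it runs the worker in a thread over a multiprocessing.Pipe instead of in a separate process.

```python
import threading
from multiprocessing import Pipe


class CounterEnv:
    """Hypothetical sub-environment: the episode ends once the counter reaches 3."""

    def __init__(self):
        self.count = 0

    def reset(self):
        self.count = 0
        return self.count

    def step(self, action):
        self.count += action
        done = self.count >= 3
        return self.count, 1.0, done, {}


def toy_worker(remote, env_fn):
    env = env_fn()
    try:
        while True:
            command, data = remote.recv()
            if command == "reset":
                remote.send((env.reset(), True))
            elif command == "step":
                obs, reward, done, info = env.step(data)
                if done:
                    # A custom worker could change how resets on done are handled here.
                    obs = env.reset()
                remote.send(((obs, reward, done, info), True))
            elif command == "close":
                remote.send((None, True))
                break
            else:
                remote.send((f"unknown command {command!r}", False))
    finally:
        remote.close()


parent_end, child_end = Pipe()
thread = threading.Thread(target=toy_worker, args=(child_end, CounterEnv), daemon=True)
thread.start()

parent_end.send(("reset", None))
reset_reply = parent_end.recv()   # (0, True)
parent_end.send(("step", 3))
step_reply = parent_end.recv()    # ((0, 1.0, True, {}), True): done, then reset
parent_end.send(("close", None))
close_reply = parent_end.recv()
thread.join()
parent_end.close()
```
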

Raises
  • RuntimeError – If the observation space of some sub-environment does not match observation_space (or, by default, the observation space of the first sub-environment).

  • ValueError – If observation_space is a custom space (i.e. not a default space in Gym, such as Box, Discrete, or Dict) and shared_memory is True.
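
The RuntimeError above comes from comparing each sub-environment's space against a reference space. A minimal sketch of that check follows. It uses hypothetical lightweight space objects compared by equality, whereas the real check operates on gym.spaces instances.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyBox:
    """Hypothetical stand-in for a Box space, compared by shape and dtype."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass
class ToySubEnv:
    observation_space: ToyBox


def check_observation_spaces(
    envs: List[ToySubEnv], observation_space: Optional[ToyBox] = None
) -> ToyBox:
    # By default, the first sub-environment defines the reference space.
    reference = observation_space if observation_space is not None else envs[0].observation_space
    for i, env in enumerate(envs):
        if env.observation_space != reference:
            raise RuntimeError(
                f"Sub-environment {i} has observation space {env.observation_space}, "
                f"expected {reference}."
            )
    return reference


same = [ToySubEnv(ToyBox((3,), "float32")) for _ in range(2)]
print(check_observation_spaces(same))
```
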

Example

>>> import gym
>>> env = gym.vector.AsyncVectorEnv([
...     lambda: gym.make("Pendulum-v0", g=9.81),
...     lambda: gym.make("Pendulum-v0", g=1.62)
... ])
>>> env.reset()
array([[-0.8286432 ,  0.5597771 ,  0.90249056],
       [-0.85009176,  0.5266346 ,  0.60007906]], dtype=float32)

reset()

Reset all sub-environments and return a batch of initial observations.

Returns

A batch of observations from the vectorized environment.

Return type

element of observation_space

Note

This is equivalent to calling reset_async() followed by reset_wait() (with no timeout).

reset_async()

Send a reset call to each sub-environment without waiting for the results.

Raises
  • ClosedEnvironmentError – If the environment was closed (if close() was previously called).

  • AlreadyPendingCallError – If the environment is already waiting for a pending call to another method (e.g. step_async()). This can be caused by two consecutive calls to reset_async(), with no call to reset_wait() in between.
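
The *_async()/*_wait() pairs form a small state machine in which at most one call can be pending at a time. The sketch below tracks that state with an enum. AsyncState, ToyAsyncCalls and the locally defined exception classes are illustrative stand-ins; gym defines its own exception types. No actual sub-environments are involved.

```python
from enum import Enum
from typing import Any, List


class AlreadyPendingCallError(Exception):
    """Local stand-in for the gym error of the same name."""


class NoAsyncCallError(Exception):
    """Local stand-in for the gym error of the same name."""


class ClosedEnvironmentError(Exception):
    """Local stand-in for the gym error of the same name."""


class AsyncState(Enum):
    DEFAULT = "default"
    WAITING_RESET = "reset"
    WAITING_STEP = "step"


class ToyAsyncCalls:
    """Tracks which asynchronous call, if any, is pending."""

    def __init__(self) -> None:
        self.state = AsyncState.DEFAULT
        self.closed = False

    def _start(self, state: AsyncState) -> None:
        if self.closed:
            raise ClosedEnvironmentError("close() was already called")
        if self.state is not AsyncState.DEFAULT:
            raise AlreadyPendingCallError(f"still waiting for {self.state.value}_wait()")
        self.state = state  # a real implementation would message the workers here

    def _finish(self, state: AsyncState) -> None:
        if self.closed:
            raise ClosedEnvironmentError("close() was already called")
        if self.state is not state:
            raise NoAsyncCallError(f"{state.value}_wait() called without {state.value}_async()")
        self.state = AsyncState.DEFAULT  # a real implementation would collect results here

    def reset_async(self) -> None:
        self._start(AsyncState.WAITING_RESET)

    def reset_wait(self) -> None:
        self._finish(AsyncState.WAITING_RESET)

    def step_async(self, actions: List[Any]) -> None:
        # A real implementation would send actions[i] to sub-environment i.
        self._start(AsyncState.WAITING_STEP)

    def step_wait(self) -> None:
        self._finish(AsyncState.WAITING_STEP)

    def reset(self) -> None:
        # Same as reset_async() followed by reset_wait().
        self.reset_async()
        self.reset_wait()
```
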

reset_wait(timeout=None)

Wait for the calls to reset in each sub-environment to finish.

Parameters

timeout (int or float, optional) – Number of seconds before the call to reset_wait() times out. If None, the call to reset_wait() never times out.

Returns

A batch of observations from the vectorized environment.

Return type

element of observation_space

Raises
  • ClosedEnvironmentError – If the environment was closed (if close() was previously called).

  • NoAsyncCallError – If reset_wait() was called without any prior call to reset_async().

  • TimeoutError – If reset_wait() timed out.
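
A timeout for reset_wait() or step_wait() has to cover all sub-environments together, not each one separately. One way to enforce this is to compute a single deadline and poll each pipe with whatever time remains. The sketch below does this with multiprocessing.connection.Connection.poll(). poll_all is a hypothetical helper, not part of gym's public API. A wait method would raise TimeoutError when it returns False.

```python
import time
from multiprocessing import Pipe
from typing import Optional


def poll_all(connections, timeout: Optional[float]) -> bool:
    """Return True if every connection has data ready before one shared deadline."""
    if timeout is None:
        return True  # no timeout: the caller simply blocks on recv()
    deadline = time.perf_counter() + timeout
    for conn in connections:
        remaining = max(deadline - time.perf_counter(), 0.0)
        if not conn.poll(remaining):
            return False
    return True


pairs = [Pipe() for _ in range(2)]
parents = [parent for parent, _ in pairs]
children = [child for _, child in pairs]

children[0].send("obs 0")
print(poll_all(parents, timeout=0.05))  # False: the second pipe has no data yet
children[1].send("obs 1")
print(poll_all(parents, timeout=0.05))  # True
```
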

step(actions)

Take an action for each sub-environment.

Parameters

actions (element of action_space) – Batch of actions.

Returns

  • observations (element of observation_space) – A batch of observations from the vectorized environment.

  • rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.

  • dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode has ended.

  • infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.

Note

This is equivalent to calling step_async() followed by step_wait() (with no timeout).

step_async(actions)

Send a step call, with the corresponding action, to each sub-environment without waiting for the results.

Parameters

actions (element of action_space) – Batch of actions.

Raises
  • ClosedEnvironmentError – If the environment was closed (if close() was previously called).

  • AlreadyPendingCallError – If the environment is already waiting for a pending call to another method (e.g. reset_async()). This can be caused by two consecutive calls to step_async(), with no call to step_wait() in between.

step_wait(timeout=None)

Wait for the calls to step in each sub-environment to finish.

Parameters

timeout (int or float, optional) – Number of seconds before the call to step_wait() times out. If None, the call to step_wait() never times out.

Returns

  • observations (element of observation_space) – A batch of observations from the vectorized environment.

  • rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.

  • dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode has ended.

  • infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.

Raises
  • ClosedEnvironmentError – If the environment was closed (if close() was previously called).

  • NoAsyncCallError – If step_wait() was called without any prior call to step_async().

  • TimeoutError – If step_wait() timed out.

seed(seeds=None)

Set the random seed in all sub-environments.

Parameters

seeds (list of int, or int, optional) – Random seed for each sub-environment. If seeds is a list of length num_envs, its n-th item is used as the seed of the n-th sub-environment. If seeds is an int, then each sub-environment uses the random seed seeds + n, where n is the index of the sub-environment (between 0 and num_envs - 1).

close_extras(timeout=None, terminate=False)

Close the environments and clean up the extra resources (processes and pipes).

Parameters
  • timeout (int or float, optional) – Number of seconds before the call to close() times out. If None, the call to close() never times out. If the call to close() times out, then all processes are terminated.

  • terminate (bool) – If True, then the close() operation is forced and all processes are terminated.

Raises

TimeoutError – If close() timed out.

SyncVectorEnv

class gym.vector.SyncVectorEnv(env_fns, observation_space=None, action_space=None, copy=True)

Vectorized environment that runs multiple environments serially, one after another, in the current process.

Parameters
  • env_fns (iterable of callable) – Functions that create the environments.

  • observation_space (gym.spaces.Space, optional) – Observation space of a single environment. If None, then the observation space of the first environment is taken.

  • action_space (gym.spaces.Space, optional) – Action space of a single environment. If None, then the action space of the first environment is taken.

  • copy (bool) – If True, then the reset() and step() methods return a copy of the observations.
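
The copy flag matters when a vectorized environment reuses one internal buffer for its observations. In the sketch below, a hypothetical BufferedToyEnv writes every batch into the same list. It shows how a previously returned batch can change under the caller when copy=False.

```python
import copy as copy_module
from typing import List


class BufferedToyEnv:
    """Hypothetical vectorized env that stores each batch in one reusable buffer."""

    def __init__(self, num_envs: int, copy: bool = True):
        self.copy = copy
        self.buffer: List[float] = [0.0] * num_envs
        self.t = 0

    def step(self, actions: List[float]) -> List[float]:
        self.t += 1
        for i, action in enumerate(actions):
            self.buffer[i] = self.t + action  # overwrite the shared buffer in place
        # copy=True returns a snapshot; copy=False returns the live buffer.
        return copy_module.deepcopy(self.buffer) if self.copy else self.buffer


for flag in (True, False):
    env = BufferedToyEnv(2, copy=flag)
    first = env.step([0.0, 0.5])
    env.step([0.0, 0.5])
    print(flag, first)
# True [1.0, 1.5]
# False [2.0, 2.5]
```

copy=False avoids one allocation per step. However, the caller must copy any batch it wants to keep before calling step() again.
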

Raises

RuntimeError – If the observation space of some sub-environment does not match observation_space (or, by default, the observation space of the first sub-environment).

Example

>>> import gym
>>> env = gym.vector.SyncVectorEnv([
...     lambda: gym.make("Pendulum-v0", g=9.81),
...     lambda: gym.make("Pendulum-v0", g=1.62)
... ])
>>> env.reset()
array([[-0.8286432 ,  0.5597771 ,  0.90249056],
       [-0.85009176,  0.5266346 ,  0.60007906]], dtype=float32)

reset()

Reset all sub-environments and return a batch of initial observations.

Returns

A batch of observations from the vectorized environment.

Return type

element of observation_space

step(actions)

Take an action for each sub-environment.

Parameters

actions (element of action_space) – Batch of actions.

Returns

  • observations (element of observation_space) – A batch of observations from the vectorized environment.

  • rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.

  • dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode has ended.

  • infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.

seed(seeds=None)

Set the random seed in all sub-environments.

Parameters

seeds (list of int, or int, optional) – Random seed for each sub-environment. If seeds is a list of length num_envs, its n-th item is used as the seed of the n-th sub-environment. If seeds is an int, then each sub-environment uses the random seed seeds + n, where n is the index of the sub-environment (between 0 and num_envs - 1).

close_extras(**kwargs)

Close the environments.