API Reference
- gym.vector.make(id, num_envs=1, asynchronous=True, wrappers=None, **kwargs)
Create a vectorized environment from multiple copies of an environment, given its ID.
- Parameters
id (str) – The environment ID. This must be a valid ID from the registry.
num_envs (int) – Number of copies of the environment.
asynchronous (bool) – If True, wraps the environments in an AsyncVectorEnv (which uses multiprocessing to run the environments in parallel). If False, wraps the environments in a SyncVectorEnv.
wrappers (callable, or iterable of callables, optional) – If not None, then apply the wrappers to each internal environment during creation.
- Returns
The vectorized environment.
- Return type
VectorEnv (an AsyncVectorEnv or a SyncVectorEnv, depending on asynchronous)
Example
>>> env = gym.vector.make('CartPole-v1', num_envs=3)
>>> env.reset()
array([[-0.04456399,  0.04653909,  0.01326909, -0.02099827],
       [ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.03468829,  0.01500225,  0.01230312,  0.01825218]], dtype=float32)
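The dispatch described by make() (build num_envs identical environment factories, apply wrappers to each environment as it is created, then choose the asynchronous or synchronous container) can be sketched without Gym installed. This is a minimal sketch under stated assumptions: make_vector, the make_env factory standing in for gym.make(id), and the two StandIn classes are hypothetical names, and raising TypeError for bad wrappers is this sketch's choice, not necessarily Gym's.

```python
from collections.abc import Iterable


# Hypothetical stand-ins for gym.vector.SyncVectorEnv / gym.vector.AsyncVectorEnv.
class StandInSyncVectorEnv:
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]


class StandInAsyncVectorEnv(StandInSyncVectorEnv):
    pass  # the real class would start one worker process per env_fn


def make_vector(make_env, num_envs=1, asynchronous=True, wrappers=None):
    """Hypothetical analogue of gym.vector.make; make_env stands in for gym.make(id)."""
    # Normalise `wrappers` once: None -> no wrappers, a single callable -> a 1-tuple.
    if wrappers is None:
        wrapper_list = ()
    elif callable(wrappers):
        wrapper_list = (wrappers,)
    elif isinstance(wrappers, Iterable):
        wrapper_list = tuple(wrappers)  # materialise so generators are not exhausted
        if not all(callable(w) for w in wrapper_list):
            raise TypeError("every element of wrappers must be callable")
    else:
        raise TypeError("wrappers must be a callable or an iterable of callables")

    def _make_wrapped():
        env = make_env()
        for wrapper in wrapper_list:  # applied in order, so the first wrapper is innermost
            env = wrapper(env)
        return env

    env_fns = [_make_wrapped for _ in range(num_envs)]
    cls = StandInAsyncVectorEnv if asynchronous else StandInSyncVectorEnv
    return cls(env_fns)
```

Passing factories rather than environment instances lets each container decide where the environments are built, which matters for the asynchronous case, where construction happens inside the worker.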
VectorEnv
- class gym.vector.VectorEnv(num_envs, observation_space, action_space)
Base class for vectorized environments.
Each observation returned from a vectorized environment is a batch of observations, one for each sub-environment. Likewise, step() is expected to receive a batch of actions, one for each sub-environment.
Note
All sub-environments should share identical observation and action spaces. In other words, a vector of multiple different environments is not supported.
- Parameters
num_envs (int) – Number of environments in the vectorized environment.
observation_space (gym.spaces.Space) – Observation space of a single environment.
action_space (gym.spaces.Space) – Action space of a single environment.
- property action_space
- Type
gym.spaces.Space
The (batched) action space. The input actions of step() must be valid elements of action_space.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.action_space
MultiDiscrete([2 2 2])
- property observation_space
- Type
gym.spaces.Space
The (batched) observation space. The observations returned by reset() and step() are valid elements of observation_space.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.observation_space
Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)
- property single_action_space
- Type
gym.spaces.Space
The action space of a sub-environment.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_action_space
Discrete(2)
- property single_observation_space
- Type
gym.spaces.Space
The observation space of a sub-environment.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_observation_space
Box([-4.8 ...], [4.8 ...], (4,), float32)
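The examples above show how a single space relates to its batched counterpart: Discrete(2) becomes MultiDiscrete([2 2 2]), and a Box of shape (4,) becomes a Box of shape (3, 4). Those two rules can be written down as a minimal sketch; the helper names are hypothetical, and only Box shapes and Discrete sizes are covered (Gym's own batching also handles bounds and composite spaces such as Dict).

```python
from typing import List, Tuple


def batch_box_shape(single_shape: Tuple[int, ...], num_envs: int) -> Tuple[int, ...]:
    # A Box of shape S batches into a Box of shape (num_envs, *S).
    return (num_envs,) + tuple(single_shape)


def batch_discrete_nvec(n: int, num_envs: int) -> List[int]:
    # A Discrete(n) space batches into MultiDiscrete([n] * num_envs).
    return [n] * num_envs
```

For CartPole-v1 with num_envs=3, batch_box_shape((4,), 3) gives (3, 4) and batch_discrete_nvec(2, 3) gives [2, 2, 2], matching observation_space and action_space above.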
- reset()
Reset all sub-environments and return a batch of initial observations.
- Returns
A batch of observations from the vectorized environment.
- Return type
element of observation_space
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
array([[-0.04456399,  0.04653909,  0.01326909, -0.02099827],
       [ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.03468829,  0.01500225,  0.01230312,  0.01825218]], dtype=float32)
- step(actions)
Take an action for each sub-environment.
- Parameters
actions (element of action_space) – Batch of actions.
- Returns
observations (element of observation_space) – A batch of observations from the vectorized environment.
rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.
dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode of each sub-environment has ended.
infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, dones, infos = envs.step(actions)
>>> observations
array([[ 0.00122802,  0.16228443,  0.02521779, -0.23700266],
       [ 0.00788269, -0.17490888,  0.03393489,  0.31735462],
       [ 0.04918966,  0.19421194,  0.02938497, -0.29495203]], dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> dones
array([False, False, False])
>>> infos
({}, {}, {})
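Because rewards and dones come back as one entry per sub-environment, per-episode bookkeeping has to be done index by index. The sketch below accumulates the undiscounted return of every finished episode from a sequence of step() results. It assumes, as Gym's built-in vector environments do, that a sub-environment starts a new episode right after reporting done; accumulate_returns is a hypothetical helper and uses plain lists in place of NumPy arrays.

```python
from typing import List, Sequence


def accumulate_returns(
    reward_batches: Sequence[Sequence[float]],
    done_batches: Sequence[Sequence[bool]],
) -> List[float]:
    """Return the undiscounted return of every episode that finished.

    reward_batches[t][i] and done_batches[t][i] are what step() reported
    for sub-environment i at time step t.
    """
    if not reward_batches:
        return []
    num_envs = len(reward_batches[0])
    running = [0.0] * num_envs  # return of the episode currently in progress, per sub-env
    finished: List[float] = []
    for rewards, dones in zip(reward_batches, done_batches):
        for i in range(num_envs):
            running[i] += rewards[i]
            if dones[i]:
                # Assumption: sub-env i auto-resets, so its next reward starts a new episode.
                finished.append(running[i])
                running[i] = 0.0
    return finished
```

Finished returns are listed in the order the episodes ended (by time step, then by sub-environment index).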
- seed(seeds=None)
Set the random seed in all sub-environments.
- Parameters
seeds (list of int, or int, optional) – Random seed for each sub-environment. If seeds is a list of length num_envs, then the items of the list are chosen as random seeds. If seeds is an int, then each sub-environment uses the random seed seeds + n, where n is the index of the sub-environment (between 0 and num_envs - 1).
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.seed([1, 3, 5])
>>> envs.reset()
array([[ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.02281231, -0.02475473,  0.02306162,  0.02072129],
       [-0.03742824, -0.02316945,  0.0148571 ,  0.0296055 ]], dtype=float32)
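The seeding rule can be expressed as a small function that turns the seeds argument into exactly one seed per sub-environment. This is a sketch: expand_seeds is a hypothetical name, and treating seeds=None as "leave every sub-environment unseeded" is an assumption, since the parameter description above does not spell out that case.

```python
from typing import List, Optional, Sequence, Union


def expand_seeds(
    seeds: Union[None, int, Sequence[Optional[int]]], num_envs: int
) -> List[Optional[int]]:
    """Turn the `seeds` argument into one seed per sub-environment."""
    if seeds is None:
        # Assumption: no seed given means every sub-environment stays unseeded.
        return [None] * num_envs
    if isinstance(seeds, int):
        # An int seed s gives sub-environment n the seed s + n.
        return [seeds + n for n in range(num_envs)]
    seeds = list(seeds)
    if len(seeds) != num_envs:
        raise ValueError(f"expected {num_envs} seeds, got {len(seeds)}")
    return seeds
```

With an int, consecutive offsets keep the sub-environments from sharing a random stream while the whole batch stays reproducible from a single number.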
- close(**kwargs)
Close all sub-environments and release resources.
It also closes all existing image viewers, then calls close_extras() and sets closed to True.
Warning
This function itself does not close the environments; that should be handled in close_extras(). This is generic for both synchronous and asynchronous vectorized environments.
Note
This is called automatically when the object is garbage-collected or when the program exits.
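The split between close() and close_extras() is a template-method pattern: the base class owns the shared bookkeeping, and each subclass releases its own resources. The sketch below is hypothetical (ToyVectorEnvBase and ToySerialEnvs are illustrative names, image viewers are omitted), and making repeated close() calls a no-op is an assumption of the sketch.

```python
class ToyVectorEnvBase:
    """Hypothetical stand-in showing how close() delegates to close_extras()."""

    def __init__(self):
        self.closed = False

    def close_extras(self, **kwargs):
        # Subclasses release their own resources (environments, processes, pipes) here.
        pass

    def close(self, **kwargs):
        if self.closed:
            return  # assumption: closing twice does nothing
        # (Closing image viewers is omitted in this sketch.)
        self.close_extras(**kwargs)
        self.closed = True

    def __del__(self):
        # Mirrors the note above: close() also runs when the object is garbage-collected.
        if not getattr(self, "closed", True):
            self.close()


class ToySerialEnvs(ToyVectorEnvBase):
    def __init__(self, envs):
        super().__init__()
        self.envs = envs
        self.extras_calls = 0  # counts close_extras() calls, for illustration

    def close_extras(self, **kwargs):
        self.extras_calls += 1
        for env in self.envs:
            env.close()
```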
AsyncVectorEnv
- class gym.vector.AsyncVectorEnv(env_fns, observation_space=None, action_space=None, shared_memory=True, copy=True, context=None, daemon=True, worker=None)
Vectorized environment that runs multiple environments in parallel. It uses multiprocessing processes, and pipes for communication between them and the main process.
- Parameters
env_fns (iterable of callable) – Functions that create the environments.
observation_space (gym.spaces.Space, optional) – Observation space of a single environment. If None, then the observation space of the first environment is taken.
action_space (gym.spaces.Space, optional) – Action space of a single environment. If None, then the action space of the first environment is taken.
shared_memory (bool) – If True, then the observations from the worker processes are communicated back through shared variables. This can improve efficiency if the observations are large (e.g. images).
copy (bool) – If True, then the reset() and step() methods return a copy of the observations.
context (str, optional) – Context for multiprocessing. If None, then the default context is used.
daemon (bool) – If True, then subprocesses have the daemon flag turned on; that is, they will quit if the head process quits. However, daemon=True prevents subprocesses from spawning children, so for some environments you may want to set it to False.
worker (callable, optional) – If set, then use that worker in a subprocess instead of the default one. This can be useful to override some inner vector env logic, for instance, how resets on done are handled.
Warning
worker is an advanced-mode option. It provides a high degree of flexibility and a high chance to shoot yourself in the foot; thus, if you are writing your own worker, it is recommended to start from the code for the _worker (or _worker_shared_memory) function and add changes.
- Raises
RuntimeError – If the observation space of some sub-environment does not match observation_space (or, by default, the observation space of the first sub-environment).
ValueError – If observation_space is a custom space (i.e. not a default space in Gym, such as Box, Discrete, or Dict) and shared_memory is True.
Example
>>> env = gym.vector.AsyncVectorEnv([
...     lambda: gym.make("Pendulum-v0", g=9.81),
...     lambda: gym.make("Pendulum-v0", g=1.62)
... ])
>>> env.reset()
array([[-0.8286432 ,  0.5597771 ,  0.90249056],
       [-0.85009176,  0.5266346 ,  0.60007906]], dtype=float32)
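The request/reply pattern behind AsyncVectorEnv (the main process sends a command down a pipe, a worker runs it on its environment and sends the result back) can be illustrated with a toy worker loop. This is a hypothetical sketch: toy_worker, CounterEnv, run_demo and the command names are illustrative and are not Gym's actual _worker protocol, and a thread stands in for the subprocess so that the example runs without any process start-up concerns.

```python
import threading
from multiprocessing import Pipe


def toy_worker(remote, env_fn):
    """Hypothetical worker loop: receive (command, data), reply with a result."""
    env = env_fn()
    try:
        while True:
            command, data = remote.recv()
            if command == "reset":
                remote.send(env.reset())
            elif command == "step":
                remote.send(env.step(data))
            elif command == "close":
                remote.send(None)
                break
            else:
                remote.send(RuntimeError(f"unknown command {command!r}"))
    finally:
        remote.close()


class CounterEnv:
    """Toy environment: the observation is the running sum of the actions."""

    def reset(self):
        self.total = 0
        return self.total

    def step(self, action):
        self.total += action
        return self.total, 1.0, self.total >= 3, {}


def run_demo():
    parent_conn, child_conn = Pipe()
    # A thread stands in for the subprocess that AsyncVectorEnv would start.
    thread = threading.Thread(target=toy_worker, args=(child_conn, CounterEnv))
    thread.start()
    parent_conn.send(("reset", None))
    first_obs = parent_conn.recv()
    parent_conn.send(("step", 2))
    step_result = parent_conn.recv()
    parent_conn.send(("close", None))
    parent_conn.recv()  # wait for the worker to acknowledge before joining
    thread.join()
    parent_conn.close()
    return first_obs, step_result
```

With one such pipe per sub-environment, the main process can send all commands first and collect the replies afterwards, which is what lets the sub-environments step concurrently.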
- reset()
Reset all sub-environments and return a batch of initial observations.
- Returns
A batch of observations from the vectorized environment.
- Return type
element of observation_space
Note
This is equivalent to a call to reset_async(), followed by a subsequent call to reset_wait() (with no timeout).
- reset_async()
Send the calls to reset to each sub-environment.
- Raises
ClosedEnvironmentError – If the environment was closed (if close() was previously called).
AlreadyPendingCallError – If the environment is already waiting for a pending call to another method (e.g. step_async()). This can be caused by two consecutive calls to reset_async(), with no call to reset_wait() in between.
- reset_wait(timeout=None)
Wait for the calls to reset in each sub-environment to finish.
- Parameters
timeout (int or float, optional) – Number of seconds before the call to reset_wait() times out. If None, the call to reset_wait() never times out.
- Returns
A batch of observations from the vectorized environment.
- Return type
element of observation_space
- Raises
ClosedEnvironmentError – If the environment was closed (if close() was previously called).
NoAsyncCallError – If reset_wait() was called without any prior call to reset_async().
TimeoutError – If reset_wait() timed out.
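One way to get the timeout semantics described for reset_wait() (and step_wait()) is to poll each worker's pipe against a single shared deadline, so that timeout bounds the whole call rather than each pipe separately. This is a minimal sketch using the standard-library Connection.poll, where poll(None) blocks indefinitely; wait_all is a hypothetical name and Gym's internal implementation may differ in detail.

```python
import time
from multiprocessing import Pipe


def wait_all(conns, timeout=None):
    """Receive one message from every connection within `timeout` seconds in total.

    timeout=None waits forever; otherwise TimeoutError is raised once the shared
    deadline passes with some connection still silent.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    results = []
    for conn in conns:
        remaining = None if deadline is None else max(deadline - time.monotonic(), 0.0)
        if not conn.poll(remaining):
            raise TimeoutError(f"no reply within {timeout} seconds")
        results.append(conn.recv())
    return results


if __name__ == "__main__":
    parent, child = Pipe()
    child.send("ready")
    print(wait_all([parent], timeout=1.0))  # prints ['ready']
```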
- step(actions)
Take an action for each sub-environment.
- Parameters
actions (element of action_space) – Batch of actions.
- Returns
observations (element of observation_space) – A batch of observations from the vectorized environment.
rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.
dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode of each sub-environment has ended.
infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.
Note
This is equivalent to a call to step_async(), followed by a subsequent call to step_wait() (with no timeout).
- step_async(actions)
Send the calls to step to each sub-environment.
- Parameters
actions (element of action_space) – Batch of actions.
- Raises
ClosedEnvironmentError – If the environment was closed (if close() was previously called).
AlreadyPendingCallError – If the environment is already waiting for a pending call to another method (e.g. reset_async()). This can be caused by two consecutive calls to step_async(), with no call to step_wait() in between.
- step_wait(timeout=None)
Wait for the calls to step in each sub-environment to finish.
- Parameters
timeout (int or float, optional) – Number of seconds before the call to step_wait() times out. If None, the call to step_wait() never times out.
- Returns
observations (element of observation_space) – A batch of observations from the vectorized environment.
rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.
dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode of each sub-environment has ended.
infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.
- Raises
ClosedEnvironmentError – If the environment was closed (if close() was previously called).
NoAsyncCallError – If step_wait() was called without any prior call to step_async().
TimeoutError – If step_wait() timed out.
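The error conditions listed for the *_async and *_wait methods amount to a small state machine: at most one call may be pending, and each *_wait must match the *_async that started it. The sketch below models only that bookkeeping, with no workers. The exception classes are local stand-ins named after the errors above, ToyAsyncProtocol and AsyncState are hypothetical names, and the values returned by reset_wait() and step_wait() are placeholders rather than real observations.

```python
from enum import Enum


# Local stand-ins named after the errors documented above.
class ClosedEnvironmentError(Exception):
    pass


class AlreadyPendingCallError(Exception):
    pass


class NoAsyncCallError(Exception):
    pass


class AsyncState(Enum):
    DEFAULT = "default"
    WAITING_RESET = "reset"
    WAITING_STEP = "step"


class ToyAsyncProtocol:
    """Hypothetical sketch of the *_async / *_wait bookkeeping; no workers involved."""

    def __init__(self):
        self.closed = False
        self._state = AsyncState.DEFAULT
        self._pending_actions = None

    def _begin(self, new_state):
        if self.closed:
            raise ClosedEnvironmentError("the environment was closed")
        if self._state is not AsyncState.DEFAULT:
            raise AlreadyPendingCallError(
                f"still waiting for a pending '{self._state.value}' call"
            )
        self._state = new_state

    def _finish(self, expected_state, name):
        if self.closed:
            raise ClosedEnvironmentError("the environment was closed")
        if self._state is not expected_state:
            raise NoAsyncCallError(f"{name}_wait() called without a prior {name}_async()")
        self._state = AsyncState.DEFAULT

    def reset_async(self):
        self._begin(AsyncState.WAITING_RESET)  # real code would message every worker here

    def reset_wait(self):
        self._finish(AsyncState.WAITING_RESET, "reset")
        return "initial observations"  # placeholder for the batched observations

    def step_async(self, actions):
        self._begin(AsyncState.WAITING_STEP)
        self._pending_actions = list(actions)

    def step_wait(self):
        self._finish(AsyncState.WAITING_STEP, "step")
        actions, self._pending_actions = self._pending_actions, None
        return actions  # placeholder: echoes the actions instead of real step results

    def reset(self):
        # reset() is reset_async() followed by reset_wait().
        self.reset_async()
        return self.reset_wait()
```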
- seed(seeds=None)
Set the random seed in all sub-environments.
- Parameters
seeds (list of int, or int, optional) – Random seed for each sub-environment. If seeds is a list of length num_envs, then the items of the list are chosen as random seeds. If seeds is an int, then each sub-environment uses the random seed seeds + n, where n is the index of the sub-environment (between 0 and num_envs - 1).
- close_extras(timeout=None, terminate=False)
Close the environments and clean up the extra resources (processes and pipes).
- Parameters
timeout (int or float, optional) – Number of seconds before the call to close() times out. If None, the call to close() never times out. If the call to close() times out, then all processes are terminated.
terminate (bool) – If True, then the close() operation is forced and all processes are terminated.
- Raises
TimeoutError – If close() timed out.
SyncVectorEnv
- class gym.vector.SyncVectorEnv(env_fns, observation_space=None, action_space=None, copy=True)
Vectorized environment that serially runs multiple environments.
- Parameters
env_fns (iterable of callable) – Functions that create the environments.
observation_space (gym.spaces.Space, optional) – Observation space of a single environment. If None, then the observation space of the first environment is taken.
action_space (gym.spaces.Space, optional) – Action space of a single environment. If None, then the action space of the first environment is taken.
copy (bool) – If True, then the reset() and step() methods return a copy of the observations.
- Raises
RuntimeError – If the observation space of some sub-environment does not match observation_space (or, by default, the observation space of the first sub-environment).
Example
>>> env = gym.vector.SyncVectorEnv([
...     lambda: gym.make("Pendulum-v0", g=9.81),
...     lambda: gym.make("Pendulum-v0", g=1.62)
... ])
>>> env.reset()
array([[-0.8286432 ,  0.5597771 ,  0.90249056],
       [-0.85009176,  0.5266346 ,  0.60007906]], dtype=float32)
- reset()
Reset all sub-environments and return a batch of initial observations.
- Returns
A batch of observations from the vectorized environment.
- Return type
element of observation_space
- step(actions)
Take an action for each sub-environment.
- Parameters
actions (element of action_space) – Batch of actions.
- Returns
observations (element of observation_space) – A batch of observations from the vectorized environment.
rewards (np.ndarray, dtype np.float_) – A vector of rewards from the vectorized environment.
dones (np.ndarray, dtype np.bool_) – A vector whose entries indicate whether the episode of each sub-environment has ended.
infos (list of dict) – A list of auxiliary diagnostic information dicts from sub-environments.
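What "serially runs multiple environments" means for step() can be sketched with plain lists: loop over the sub-environments one at a time and collect each result into the four batched outputs. This is a hypothetical sketch: CountdownEnv and ToySyncVectorEnv are illustrative names, lists stand in for the NumPy arrays and spaces the real class uses, and resetting a sub-environment as soon as it reports done is an assumption modelled on Gym's built-in vector environments.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class CountdownEnv:
    """Toy environment: each episode lasts `length` steps with reward 1.0 per step."""

    def __init__(self, length: int):
        self.length = length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


class ToySyncVectorEnv:
    """Hypothetical list-based sketch of serial vectorization."""

    def __init__(self, env_fns: Sequence[Callable[[], Any]]):
        self.envs = [fn() for fn in env_fns]
        self.num_envs = len(self.envs)

    def reset(self) -> List[Any]:
        return [env.reset() for env in self.envs]

    def step(self, actions: Sequence[Any]):
        observations, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):  # one sub-environment at a time
            obs, reward, done, info = env.step(action)
            if done:
                # Assumption: auto-reset, so the returned observation is the
                # first observation of the next episode.
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return observations, rewards, dones, infos
```

Because every sub-environment runs in the calling process, there is no inter-process overhead, which often makes the serial approach competitive when individual steps are cheap.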
- seed(seeds=None)
Set the random seed in all sub-environments.
- Parameters
seeds (list of int, or int, optional) – Random seed for each sub-environment. If seeds is a list of length num_envs, then the items of the list are chosen as random seeds. If seeds is an int, then each sub-environment uses the random seed seeds + n, where n is the index of the sub-environment (between 0 and num_envs - 1).
- close_extras(**kwargs)
Close the environments.