.. automodule:: gym.vector
    :noindex:

Getting Started
===============

Creating a vectorized environment
---------------------------------

To create a vectorized environment that runs multiple sub-environments, you can wrap your sub-environments inside :class:`gym.vector.SyncVectorEnv` (for sequential execution), or :class:`gym.vector.AsyncVectorEnv` (for parallel execution, with `multiprocessing`_). These vectorized environments take as input a list of callables specifying how the sub-environments are created.

.. code-block::

    >>> import gym
    >>> envs = gym.vector.AsyncVectorEnv([
    ...     lambda: gym.make("CartPole-v1"),
    ...     lambda: gym.make("CartPole-v1"),
    ...     lambda: gym.make("CartPole-v1")
    ... ])

Alternatively, to create a vectorized environment of multiple copies of the same registered sub-environment, you can use the function :func:`gym.vector.make`.

.. code-block::

    >>> envs = gym.vector.make("CartPole-v1", num_envs=3)  # Equivalent

.. note::

    To enable automatic batching of actions and observations, all the sub-environments must share the same :obj:`action_space` and :obj:`observation_space`. However, the sub-environments are not required to be exact copies of one another. For example, you can run 2 instances of ``Pendulum-v0`` with different values of the gravity in a vectorized environment with

    .. code-block::

        >>> envs = gym.vector.AsyncVectorEnv([
        ...     lambda: gym.make("Pendulum-v0", g=9.81),
        ...     lambda: gym.make("Pendulum-v0", g=1.62)
        ... ])

    See also :ref:`Observation & Action spaces` for more information about automatic batching.

.. warning::

    When using :class:`AsyncVectorEnv` with either the ``spawn`` or ``forkserver`` start methods, you must wrap your code containing the vectorized environment with ``if __name__ == "__main__":``. See the `multiprocessing`_ documentation for more information.

    .. code-block::

        if __name__ == "__main__":
            # The start method is passed to AsyncVectorEnv through its ``context`` argument.
            env_fns = [lambda: gym.make("CartPole-v1")] * 3
            envs = gym.vector.AsyncVectorEnv(env_fns, context="spawn")

Working with vectorized environments
------------------------------------

While standard Gym environments take a single action and return a single observation (with a reward and a boolean indicating termination), vectorized environments take a *batch of actions* as input, and return a *batch of observations*, together with an array of rewards and an array of booleans indicating whether the episode ended in each sub-environment.

.. code-block::

    >>> import numpy as np
    >>> envs = gym.vector.make("CartPole-v1", num_envs=3)
    >>> envs.reset()
    array([[ 0.00198895, -0.00569421, -0.03170966,  0.00126465],
           [-0.02658334,  0.00755256,  0.04376719, -0.00266695],
           [-0.02898625,  0.04779156,  0.02686412, -0.01298284]],
          dtype=float32)
    >>> actions = np.array([1, 0, 1])
    >>> observations, rewards, dones, infos = envs.step(actions)
    >>> observations
    array([[ 0.00187507,  0.18986781, -0.03168437, -0.301252  ],
           [-0.02643229, -0.18816885,  0.04371385,  0.3034975 ],
           [-0.02803041,  0.24251814,  0.02660446, -0.29707024]],
          dtype=float32)
    >>> rewards
    array([1., 1., 1.])
    >>> dones
    array([False, False, False])
    >>> infos
    ({}, {}, {})

Vectorized environments are compatible with any sub-environment, regardless of the action and observation spaces (e.g. container spaces like :class:`~gym.spaces.Dict`, or any arbitrarily nested spaces). In particular, vectorized environments can automatically batch the observations returned by :meth:`~VectorEnv.reset` and :meth:`~VectorEnv.step` for any standard Gym space (e.g. :class:`~gym.spaces.Box`, :class:`~gym.spaces.Discrete`, :class:`~gym.spaces.Dict`, or any nested structure thereof). Similarly, vectorized environments can take batches of actions from any standard Gym space.

.. code-block::

    >>> class DictEnv(gym.Env):
    ...     observation_space = gym.spaces.Dict({
    ...         "position": gym.spaces.Box(-1., 1., (3,), np.float32),
    ...         "velocity": gym.spaces.Box(-1., 1., (2,), np.float32)
    ...     })
    ...     action_space = gym.spaces.Dict({
    ...         "fire": gym.spaces.Discrete(2),
    ...         "jump": gym.spaces.Discrete(2),
    ...         "acceleration": gym.spaces.Box(-1., 1., (2,), np.float32)
    ...     })
    ...
    ...     def reset(self):
    ...         return self.observation_space.sample()
    ...
    ...     def step(self, action):
    ...         observation = self.observation_space.sample()
    ...         return (observation, 0., False, {})

    >>> envs = gym.vector.AsyncVectorEnv([lambda: DictEnv()] * 3)
    >>> envs.observation_space
    Dict(position:Box(-1.0, 1.0, (3, 3), float32), velocity:Box(-1.0, 1.0, (3, 2), float32))
    >>> envs.action_space
    Dict(fire:MultiDiscrete([2 2 2]), jump:MultiDiscrete([2 2 2]), acceleration:Box(-1.0, 1.0, (3, 2), float32))

    >>> observations = envs.reset()
    >>> actions = {
    ...     "fire": np.array([1, 1, 0]),
    ...     "jump": np.array([0, 1, 0]),
    ...     "acceleration": np.random.uniform(-1., 1., size=(3, 2))
    ... }
    >>> observations, rewards, dones, infos = envs.step(actions)
    >>> observations
    {"position": array([[-0.5337036 ,  0.7439302 ,  0.41748118],
                        [ 0.9373266 , -0.5780453 ,  0.8987405 ],
                        [-0.917269  , -0.5888639 ,  0.812942  ]], dtype=float32),
    "velocity": array([[ 0.23626241, -0.0616814 ],
                       [-0.4057572 , -0.4875375 ],
                       [ 0.26341468,  0.72282314]], dtype=float32)}

.. note::

    The sub-environments inside a vectorized environment automatically call :obj:`reset` at the end of an episode. In the following example, the episode of the 3rd sub-environment ends after 2 steps (the agent fell in a hole), and the sub-environment gets reset (observation ``0``).

    .. code-block::

        >>> envs = gym.vector.make("FrozenLake-v1", num_envs=3, is_slippery=False)
        >>> envs.reset()
        array([0, 0, 0])
        >>> observations, rewards, dones, infos = envs.step(np.array([1, 2, 2]))
        >>> observations, rewards, dones, infos = envs.step(np.array([1, 2, 1]))

        >>> dones
        array([False, False,  True])
        >>> observations
        array([8, 2, 0])

Observation & Action spaces
---------------------------

Like any Gym environment, vectorized environments have two properties, :attr:`~VectorEnv.observation_space` and :attr:`~VectorEnv.action_space`, that specify the observation and action spaces of the environment. Since vectorized environments operate on multiple sub-environments, where the observations and actions of sub-environments are batched together, the observation and action spaces are batched as well, so that the input actions are valid elements of :attr:`~VectorEnv.action_space`, and the observations are valid elements of :attr:`~VectorEnv.observation_space`.

.. code-block::

    >>> envs = gym.vector.make("CartPole-v1", num_envs=3)
    >>> envs.observation_space
    Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)
    >>> envs.action_space
    MultiDiscrete([2 2 2])

.. note::

    In order to appropriately batch the observations and actions in vectorized environments, the observation and action spaces of all the sub-environments are required to be identical.

    .. code-block::

        >>> envs = gym.vector.AsyncVectorEnv([
        ...     lambda: gym.make("CartPole-v1"),
        ...     lambda: gym.make("MountainCar-v0")
        ... ])
        RuntimeError: Some environments have an observation space different from `Box([-4.8 ...], [4.8 ...], (4,), float32)`. In order to batch observations, the observation spaces from all environments must be equal.

However, sometimes it may be handy to have access to the observation and action spaces of a sub-environment, and not the batched spaces.
You can access those with the properties :attr:`~VectorEnv.single_observation_space` and :attr:`~VectorEnv.single_action_space` of the vectorized environment.

.. code-block::

    >>> envs = gym.vector.make("CartPole-v1", num_envs=3)
    >>> envs.single_observation_space
    Box([-4.8 ...], [4.8 ...], (4,), float32)
    >>> envs.single_action_space
    Discrete(2)

This is convenient, for example, when you instantiate a policy. In the following example, we use :attr:`~VectorEnv.single_observation_space` and :attr:`~VectorEnv.single_action_space` to define the weights of a linear policy. Note that thanks to the vectorized environment, you can apply the policy directly to the whole batch of observations with a single call to :obj:`policy`.

.. code-block::

    >>> import numpy as np
    >>> from gym.spaces.utils import flatdim
    >>> from scipy.special import softmax

    >>> def policy(weights, observations):
    ...     logits = np.dot(observations, weights)
    ...     return softmax(logits, axis=1)

    >>> envs = gym.vector.make("CartPole-v1", num_envs=3)
    >>> weights = np.random.randn(
    ...     flatdim(envs.single_observation_space),
    ...     envs.single_action_space.n
    ... )
    >>> observations = envs.reset()
    >>> actions = policy(weights, observations).argmax(axis=1)
    >>> observations, rewards, dones, infos = envs.step(actions)

.. _multiprocessing: https://docs.python.org/3/library/multiprocessing.html
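
To make the relationship between the single spaces and the batched spaces more concrete, here is a minimal, pure-Python sketch of the batching rule suggested by the outputs shown in this section. It is a toy model and not Gym's implementation: spaces are represented as plain tuples, and both the function name ``batch_space_sketch`` and the tuple layout are assumptions made purely for illustration.

.. code-block:: python

    def batch_space_sketch(space, num_envs):
        """Return a toy description of `space` batched over `num_envs` copies.

        Spaces are modelled as tuples (illustrative layout, not Gym objects):
          ("Box", shape)     where shape is a tuple of ints
          ("Discrete", n)    where n is the number of discrete values
          ("Dict", mapping)  where mapping goes from key to a nested toy space
        """
        kind, spec = space
        if kind == "Box":
            # A leading batch dimension is prepended: (4,) becomes (num_envs, 4).
            return ("Box", (num_envs,) + tuple(spec))
        if kind == "Discrete":
            # One discrete choice per sub-environment.
            return ("MultiDiscrete", [spec] * num_envs)
        if kind == "Dict":
            # Container spaces are batched key by key, recursively.
            return ("Dict", {key: batch_space_sketch(sub, num_envs)
                             for key, sub in spec.items()})
        raise ValueError(f"unsupported toy space kind: {kind!r}")


    # Toy stand-ins for the single CartPole-v1 spaces shown above.
    single_observation_space = ("Box", (4,))
    single_action_space = ("Discrete", 2)

    batched_observation_space = batch_space_sketch(single_observation_space, 3)
    batched_action_space = batch_space_sketch(single_action_space, 3)
    print(batched_observation_space)  # ('Box', (3, 4))
    print(batched_action_space)       # ('MultiDiscrete', [2, 2, 2])

Applied key by key, the same rule matches the batched ``Dict`` spaces printed for ``DictEnv`` earlier: a ``Box`` of shape ``(3,)`` becomes shape ``(3, 3)``, and each ``Discrete(2)`` becomes ``MultiDiscrete([2 2 2])``.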