rlify.agents package

This package contains all agents and their helper classes.

rlify.agents.drl_agent module

class rlify.agents.drl_agent.RL_Agent(obs_space, action_space, batch_size=256, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Bases: ABC

RL_Agent is an abstract class that defines the basic structure of an RL agent. It is used as a base class for all RL agents.

TRAIN = 0
EVAL = 1
__init__(obs_space, action_space, batch_size=256, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Parameters:
  • obs_space (gym.spaces) – observation space of the environment

  • action_space (gym.spaces) – action space of the environment

  • batch_size (int, optional) – batch size for training. Defaults to 256.

  • explorer (Explorer, optional) – exploration method. Defaults to RandomExplorer().

  • num_parallel_envs (int) – number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int) – Training epochs per update. Defaults to 10.

  • lr (float, optional) – learning rate. Defaults to 3e-4.

  • device (torch.device, optional) – device to run on. Defaults to None.

  • experience_class (object, optional) – experience replay class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – maximum size of the experience replay buffer. Defaults to 10e6.

  • discount_factor (float, optional) – discount factor. Defaults to 0.99.

  • reward_normalization (bool, optional) – whether to normalize the rewards by their maximum absolute value. Defaults to True.

  • tensorboard_dir (str, optional) – tensorboard directory. Defaults to ‘./tensorboard’.

  • dataloader_workers (int, optional) – number of workers for the dataloader. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – whether to update the model once per epoch (True) or once per batch (False). Defaults to None; when None, recurrent models use True and feed-forward models use False.

get_train_batch_size()

Returns batch_size for feed-forward NNs, and ceil(batch_size / num_parallel_envs) for RNNs.
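
A short worked example of the batch-size rule above (plain Python; the values 256 and 4 are just the class defaults):

import math

batch_size, num_parallel_envs = 256, 4
print(batch_size)                                  # feed-forward models: 256
print(math.ceil(batch_size / num_parallel_envs))   # recurrent models: 64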

contains_reccurent_nn()
validate_models(models)
init_tb_writer(tensorboard_dir=None)

Initializes tensorboard writer

Parameters:

tensorboard_dir (str) – tensorboard directory

abstract static get_models_input_output_shape(obs_space, action_space)

Calculates the input and output shapes of the models.

Parameters:
  • obs_space (gym.spaces) – observation space

  • action_space (gym.spaces) – action space

Returns:

dictionary containing the input and output shapes of the models

Return type:

dictionary
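
A minimal usage sketch of this static method, mirroring the DQN_Agent example later on this page (the returned dictionary is keyed per model, e.g. "Q_model" for the DQN-family agents):

import gymnasium as gym
from rlify.agents.dqn_agent import DQN_Agent

env = gym.make("CartPole-v1")
models_shapes = DQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]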

abstract setup_models()

Initializes the NN models

Return type:

list[Module]

abstract update_policy(trajectories_dataset)

Updates the models according to the agent's logic

get_train_metrics()

Returns the training metrics

read_obs_space_properties()

Returns the observation space properties

read_action_space_properties()

Returns the action space properties

define_action_space(action_space)

Defines the action space

__del__()

Destructor

static read_nn_properties(ckpt_fname)
_generate_nn_save_key(model)

Generates a key for saving the model. The key includes the approximated args, the class type (for reproducibility), and the state dict of the model.

Parameters:

model (Module) – the model to save

Returns:

dictionary containing the model’s state

Return type:

dictionary

abstract save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

abstract load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.
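
A minimal save/load sketch, assuming agent is a constructed concrete agent (e.g. the DQN_Agent from the example later on this page); the file name is hypothetical:

ckpt_path = "my_agent.ckpt"            # hypothetical file name
state = agent.save_agent(ckpt_path)    # returns a dict with the agent's state
agent.load_agent(ckpt_path)            # restores the agent from the file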

abstract set_train_mode()

sets the agent to train mode - all models are set to train mode

abstract set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

gracefully_close_envs()

A decorator that closes the environment processes in case of an exception

Parameters:

func – the function to wrap

Returns:

the wrapped function

train_episodial(*args, **kwargs)
train_n_steps(*args, **kwargs)
_train_n_iters(env, n_iters, episodes=False, max_episode_len=None, disable_tqdm=False)

Trains the agent for a given number of steps

Parameters:
  • env (gym.Env) – the environment to train on

  • n_iters (int) – number of steps/episodes to train

  • episodes (bool, optional) – whether to train for episodes or steps. Defaults to False.

  • max_episode_len (int, optional) – maximum episode length - truncates after that. Defaults to None.

  • disable_tqdm (bool, optional) – disable tqdm. Defaults to False.

Returns:

train rewards
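
A minimal training sketch via the public wrappers (mirroring the agent examples on this page; agent and env are assumed to be constructed already):

train_stats = agent.train_n_steps(env=env, n_steps=40000)   # step-based training
# train_episodial(*args, **kwargs) is the episode-based counterpart; its arguments are forwarded to _train_n_iters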

abstract get_trajectories_data()

Returns the trajectories data

criterion_using_loss_flag(func, arg1, arg2, loss_flag)

Applies the function only where the loss flag is True

Parameters:
  • func – the function to apply

  • arg1 – the first argument

  • arg2 – the second argument

  • loss_flag – the loss flag

Returns:

the result of the function
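
A standalone PyTorch sketch of the loss-flag masking idea (an illustration of the concept only, not the library's exact implementation; masked_criterion is a hypothetical helper):

import torch

def masked_criterion(func, arg1, arg2, loss_flag):
    # apply the criterion only on the entries where the loss flag is True
    mask = loss_flag.bool()
    return func(arg1[mask], arg2[mask])

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 0.0, 3.0])
flag = torch.tensor([1, 0, 1])
loss = masked_criterion(torch.nn.functional.mse_loss, pred, target, flag)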

apply_regularization(reg_coeff, vector, loss_flag)

Applies regularization to the vector, masked by the loss flag

set_num_parallel_env(num_parallel_envs)

Sets the number of parallel environments

Parameters:

num_parallel_envs (int) – number of parallel environments

abstract act(observations, num_obs=1)
Parameters:
  • observations (array) – The observations to act on

  • num_obs (int) – The number of observations to act on

Return type:

array

Returns:

The selected actions (np.ndarray)

load_highest_score_agent()

Loads the highest score agent from training

get_highest_score_agent_ckpt_path()

Returns the path of the highest score agent from training

abstract best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way
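
A minimal inference sketch contrasting act and best_act (assuming agent and env from one of the examples on this page):

obs, info = env.reset()
exploring_action = agent.act(obs, num_obs=1)     # may sample / explore
greedy_action = agent.best_act(obs, num_obs=1)   # deterministic, highest-probability action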

norm_obs(observations)

Normalizes the observations according to pre-given normalization parameters [future API - currently not available]

pre_process_obs_for_act(observations, num_obs)

Pre-processes the observations for act

Parameters:
  • observations (ObsWrapper | dict) – The observations to act on

  • num_obs (int) – The number of observations to act on

Returns:

The pre-processed observations as an ObsWrapper object with the correct dimensions

return_correct_actions_dim(actions, num_obs)

Returns the actions with the correct dimension

Parameters:
  • actions (array) – The selected actions

  • num_obs (int) – The number of observations to act on

close_env_procs()

Closes the environment processes

set_intrisic_reward_func(func)

sets the agent's intrinsic reward function to a custom function that takes state, action, reward and returns the reward used by the algorithm:

# Create some agent
agent = PPO_Agent(obs_space=env.observation_space, action_space=env.action_space, tensorboard_dir=None)
def dummy_reward_func(state, action, reward):
    if state[0] > 0:
        return reward + 1
    return reward
agent.set_intrisic_reward_func(dummy_reward_func)
# now train normally
Parameters:

func (function) – a function that takes state, action, reward and returns reward for the algorithm

intrisic_reward_func(state, action, reward)

Calculates the agent's intrinsic reward

collect_episode_obs(env, max_episode_len=None, num_to_collect_in_parallel=None)

Collects observations from the environment

Parameters:
  • env (gym.env) – gym environment

  • max_episode_len (int, optional) – maximum episode length. Defaults to None.

  • num_to_collect_in_parallel (int, optional) – number of parallel environments. Defaults to None.

  • env_funcs (dict, optional) – dictionary of env functions mapping to call on the environment. Defaults to {“step”: “step”, “reset”: “reset”}.

Returns:

total reward collected

Return type:

float
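
A minimal collection sketch (assuming agent and env are already constructed; max_episode_len=500 is an arbitrary choice):

total_reward = agent.collect_episode_obs(env, max_episode_len=500)
print("collected reward:", total_reward)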

abstract reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

get_last_collected_experiences(number_of_episodes)

returns the last collected experiences

Parameters:

number_of_episodes (int) – number of episodes to return

clear_exp()

clears the experience replay buffer

__abstractmethods__ = frozenset({'act', 'best_act', 'get_models_input_output_shape', 'get_trajectories_data', 'load_agent', 'reset_rnn_hidden', 'save_agent', 'set_eval_mode', 'set_train_mode', 'setup_models', 'update_policy'})
__dict__ = mappingproxy({'__module__': 'rlify.agents.drl_agent', '__doc__': '\n    RL_Agent is an abstract class that defines the basic structure of an RL agent.\n    It is used as a base class for all RL agents.\n    ', 'TRAIN': 0, 'EVAL': 1, '__init__': <function RL_Agent.__init__>, 'get_train_batch_size': <function RL_Agent.get_train_batch_size>, 'contains_reccurent_nn': <function RL_Agent.contains_reccurent_nn>, 'validate_models': <function RL_Agent.validate_models>, 'init_tb_writer': <function RL_Agent.init_tb_writer>, 'get_models_input_output_shape': <staticmethod(<function RL_Agent.get_models_input_output_shape>)>, 'setup_models': <function RL_Agent.setup_models>, 'update_policy': <function RL_Agent.update_policy>, 'get_train_metrics': <function RL_Agent.get_train_metrics>, 'read_obs_space_properties': <function RL_Agent.read_obs_space_properties>, 'read_action_space_properties': <function RL_Agent.read_action_space_properties>, 'define_action_space': <function RL_Agent.define_action_space>, '__del__': <function RL_Agent.__del__>, 'read_nn_properties': <staticmethod(<function RL_Agent.read_nn_properties>)>, '_generate_nn_save_key': <function RL_Agent._generate_nn_save_key>, 'save_agent': <function RL_Agent.save_agent>, 'load_agent': <function RL_Agent.load_agent>, 'set_train_mode': <function RL_Agent.set_train_mode>, 'set_eval_mode': <function RL_Agent.set_eval_mode>, 'gracefully_close_envs': <function RL_Agent.gracefully_close_envs>, 'train_episodial': <function RL_Agent.gracefully_close_envs.<locals>.wrapper>, 'train_n_steps': <function RL_Agent.gracefully_close_envs.<locals>.wrapper>, '_train_n_iters': <function RL_Agent._train_n_iters>, 'get_trajectories_data': <function RL_Agent.get_trajectories_data>, 'criterion_using_loss_flag': <function RL_Agent.criterion_using_loss_flag>, 'apply_regularization': <function RL_Agent.apply_regularization>, 'set_num_parallel_env': <function RL_Agent.set_num_parallel_env>, 'act': <function RL_Agent.act>, 'load_highest_score_agent': <function RL_Agent.load_highest_score_agent>, 'get_highest_score_agent_ckpt_path': <function RL_Agent.get_highest_score_agent_ckpt_path>, 'best_act': <function RL_Agent.best_act>, 'norm_obs': <function RL_Agent.norm_obs>, 'pre_process_obs_for_act': <function RL_Agent.pre_process_obs_for_act>, 'return_correct_actions_dim': <function RL_Agent.return_correct_actions_dim>, 'close_env_procs': <function RL_Agent.close_env_procs>, 'set_intrisic_reward_func': <function RL_Agent.set_intrisic_reward_func>, 'intrisic_reward_func': <function RL_Agent.intrisic_reward_func>, 'collect_episode_obs': <function RL_Agent.collect_episode_obs>, 'reset_rnn_hidden': <function RL_Agent.reset_rnn_hidden>, 'get_last_collected_experiences': <function RL_Agent.get_last_collected_experiences>, 'clear_exp': <function RL_Agent.clear_exp>, 'run_env': <function RL_Agent.gracefully_close_envs.<locals>.wrapper>, '__dict__': <attribute '__dict__' of 'RL_Agent' objects>, '__weakref__': <attribute '__weakref__' of 'RL_Agent' objects>, '__abstractmethods__': frozenset({'save_agent', 'load_agent', 'act', 'get_trajectories_data', 'reset_rnn_hidden', 'update_policy', 'get_models_input_output_shape', 'set_eval_mode', 'setup_models', 'best_act', 'set_train_mode'}), '_abc_impl': <_abc._abc_data object>, '__annotations__': {}})
__module__ = 'rlify.agents.drl_agent'
__weakref__

list of weak references to the object

_abc_impl = <_abc._abc_data object>
run_env(*args, **kwargs)

rlify.agents.vdqn_agent module

class rlify.agents.vdqn_agent.DQNDataset(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)

Bases: Dataset

Dataset for DQN

__init__(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
Parameters:
  • states – np.array: the states

  • actions – np.array: the actions

  • rewards – np.array: the rewards

  • returns – np.array: the returns

  • dones – np.array: the dones

  • truncated – np.array: the truncated

  • next_states – np.array: the next states

  • prepare_for_rnn – bool: whether to prepare for RNN or not

__len__()
__getitems__(idx)
__getitem__(idx)
collate_fn(batch)
__module__ = 'rlify.agents.vdqn_agent'
__parameters__ = ()
class rlify.agents.vdqn_agent.DQNData(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)

Bases: IData

DQN Data

__init__(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
Parameters:
  • states – np.array: the states

  • actions – np.array: the actions

  • rewards – np.array: the rewards

  • returns – np.array: the returns

  • dones – np.array: the dones

  • truncated – np.array: the truncated

  • next_states – np.array: the next states

  • prepare_for_rnn – bool: whether to prepare for RNN or not

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.vdqn_agent'
_abc_impl = <_abc._abc_data object>
class rlify.agents.vdqn_agent.VDQN_Agent(obs_space, action_space, Q_model, dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Bases: RL_Agent

Vanilla DQN Agent

__init__(obs_space, action_space, Q_model, dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Example:

env_name = "CartPole-v1"
env = gym.make(env_name, render_mode=None)
models_shapes = VDQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]
Q_model = fc.FC(input_shape=Q_input_shape, out_shape=Q_out_shape)
agent = VDQN_Agent(obs_space=env.observation_space, action_space=env.action_space, batch_size=64, max_mem_size=10**5, num_parallel_envs=16,
                    lr=3e-4, Q_model=Q_model, discount_factor=0.99, tensorboard_dir=None, num_epochs_per_update=2)
train_stats = agent.train_n_steps(env=env,n_steps=40000)
Parameters:
  • obs_space (gym.spaces) – The observation space.

  • action_space (gym.spaces) – The action space.

  • Q_model (BaseModel) – The Q model.

  • dqn_reg (float, optional) – The DQN regularization. Defaults to 0.0.

  • batch_size (int, optional) – The batch size. Defaults to 64.

  • soft_exploit (bool, optional) – Whether to use soft exploitation. Defaults to True.

  • explorer (Explorer, optional) – The explorer. Defaults to RandomExplorer().

  • num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.

  • lr (float, optional) – The learning rate. Defaults to 3e-4.

  • device (str, optional) – The device. Defaults to None.

  • experience_class (object, optional) – The experience class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – The maximum memory size. Defaults to int(10e6).

  • discount_factor (float, optional) – The discount factor. Defaults to 0.99.

  • reward_normalization (bool, optional) – Whether to normalize the rewards. Defaults to True.

  • tensorboard_dir (str, optional) – The tensorboard directory. Defaults to “./tensorboard”.

  • dataloader_workers (int, optional) – The number of dataloader workers. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.

check_action_space()
setup_models()

Initializes the Q Model and optimizer.

static get_models_input_output_shape(obs_space, action_space)

Calculates the input and output shapes of the models.

Parameters:
  • obs_space (gym.spaces) – observation space

  • action_space (gym.spaces) – action space

Returns:

dictionary containing the input and output shapes of the models

Return type:

dictionary

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

contains_reccurent_nn()
act_base(observations, num_obs=1)

Returns the Q values for the given observations.

Parameters:
  • observations (np.array) – The observations.

  • num_obs (int, optional) – The number of observations. Defaults to 1.

Return type:

Tensor

Returns:

The Q values (torch.tensor)

act(observations, num_obs=1)
Parameters:
  • observations (array) – The observations to act on

  • num_obs (int) – The number of observations to act on

Return type:

ndarray

Returns:

The selected actions (np.ndarray)

reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

get_trajectories_data()

Returns the trajectories data

_get_dqn_experiences()

loads experiences from the replay buffer and returns them as tensors.

Returns:

(states, actions, rewards, dones, truncated, next_states, returns)

Return type:

tuple

update_policy(trajectory_data)

Updates the models according to the agent's logic

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.vdqn_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.dqn_agent module

class rlify.agents.dqn_agent.DQN_Agent(obs_space, action_space, Q_model, target_update='hard[update_freq=10]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Bases: VDQN_Agent

DQN Agent

__init__(obs_space, action_space, Q_model, target_update='hard[update_freq=10]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Example:

env_name = "CartPole-v1"
env = gym.make(env_name, render_mode=None)
models_shapes = DQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]
Q_model = fc.FC(input_shape=Q_input_shape, out_shape=Q_out_shape)
agent = DQN_Agent(
    obs_space=env.observation_space,
    action_space=env.action_space,
    Q_model=Q_model,
    batch_size=64,
    max_mem_size=int(10e6),
    num_parallel_envs=4,
    num_epochs_per_update=10,
    lr=3e-4,
    discount_factor=0.99,
    target_update="hard[update_freq=10]",
)
train_stats = agent.train_n_steps(env=env, n_steps=40000)
Parameters:
  • obs_space (gym.spaces) – The observation space of the environment.

  • action_space (gym.spaces) – The action space of the environment.

  • Q_model (BaseModel) – The Q-network model.

  • dqn_reg (float, optional) – The L2 regularization coefficient for the Q-network. Defaults to 0.0.

  • target_update (str, optional) – The target update rule. Defaults to “hard[update_freq=10]”.

  • batch_size (int, optional) – The batch size for training. Defaults to 64.

  • soft_exploit (bool, optional) – Whether to use soft exploitation during action selection. Defaults to True.

  • explorer (Explorer, optional) – The exploration strategy. Defaults to RandomExplorer().

  • num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.

  • lr (float, optional) – The learning rate. Defaults to 3e-4.

  • device (str, optional) – The device to use for training. Defaults to None.

  • experience_class (object, optional) – The experience replay class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – The maximum size of the experience replay memory. Defaults to int(10e6).

  • discount_factor (float, optional) – The discount factor for future rewards. Defaults to 0.99.

  • reward_normalization (bool, optional) – Whether to normalize rewards. Defaults to True.

  • tensorboard_dir (str, optional) – The directory to save TensorBoard logs. Defaults to “./tensorboard”.

  • dataloader_workers (int, optional) – The number of workers for the data loader. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.

setup_models()

Initializes the Q and target Q networks.

static get_models_input_output_shape(obs_space, action_space)

Calculates the input and output shapes of the models.

Parameters:
  • obs_space (gym.spaces) – observation space

  • action_space (gym.spaces) – action space

Returns:

dictionary containing the input and output shapes of the models

Return type:

dictionary

init_target_update_rule(target_update)

Initializes the target update rule.

Parameters:

target_update (str) – ‘soft[tau=0.01]’ or ‘hard[update_freq=10]’ target update
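
Both accepted formats of the target-update string, shown as a usage sketch (assuming agent is a constructed DQN_Agent; the comments give the usual interpretation of these rules):

agent.init_target_update_rule("hard[update_freq=10]")   # hard update: copy the target network every 10 updates
agent.init_target_update_rule("soft[tau=0.01]")         # soft update: Polyak averaging with tau=0.01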

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

hard_target_update(manual_update=False)

Hard update model parameters.

Parameters:

manual_update (bool, optional) – Whether to force an update. Defaults to False; in case of a forced update, target_update_counter is not updated.

soft_target_update()

Soft update model parameters.

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

reset_rnn_hidden()

if agent uses rnn, when the hidden states are reset. this callback is called in many places so please impliment it in you agent

update_policy(trajectory_data)

Updates the policy. Using the DQN algorithm.

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.dqn_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.ddpg_agent module

class rlify.agents.ddpg_agent.DDPG_Agent(obs_space, action_space, Q_model, Q_mle_model, target_update='soft[tau=0.005]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Bases: DQN_Agent

DDPG Agent

__init__(obs_space, action_space, Q_model, Q_mle_model, target_update='soft[tau=0.005]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Example:

env_name = "CartPole-v1"
env = gym.make(env_name, render_mode=None)
models_shapes = DDPG_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]
Q_mle_input_shape = models_shapes["Q_mle_model"]["input_shape"]
Q_mle_out_shape = models_shapes["Q_mle_model"]["out_shape"]
Q_model = fc.FC(
    input_shape=Q_input_shape,
    out_shape=Q_out_shape,
)
Q_mle_model = fc.FC(
    input_shape=Q_mle_input_shape,
    out_shape=Q_mle_out_shape,
)
agent = DDPG_Agent(obs_space=env.observation_space, action_space=env.action_space, Q_model=Q_model, Q_mle_model=Q_mle_model)
train_stats = agent.train_n_steps(env=env, n_steps=40000)
Parameters:
  • obs_space (gym.spaces) – The observation space of the environment

  • action_space (gym.spaces) – The action space of the environment

  • Q_model (BaseModel) – The Q model

  • Q_mle_model (BaseModel) – The MLE model

  • target_update (str, optional) – The target update strategy. Defaults to "soft[tau=0.005]".

  • dqn_reg (float, optional) – The regularization factor for the Q model. Defaults to 0.0.

  • batch_size (int, optional) – The batch size. Defaults to 64.

  • soft_exploit (bool, optional) – Whether to use soft exploitation. Defaults to True.

  • explorer (Explorer, optional) – The explorer. Defaults to RandomExplorer().

  • num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.

  • lr (float, optional) – The learning rate. Defaults to 3e-4.

  • device (str, optional) – The device to use. Defaults to None.

  • experience_class (object, optional) – The experience class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – The maximum memory size. Defaults to int(10e6).

  • discount_factor (float, optional) – The discount factor. Defaults to 0.99.

  • reward_normalization (bool, optional) – Whether to normalize the rewards. Defaults to True.

  • tensorboard_dir (str, optional) – The tensorboard directory. Defaults to “./tensorboard”.

  • dataloader_workers (int, optional) – The number of dataloader workers. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.

setup_models()

Initializes the Q, target Q and MLE networks.

static get_models_input_output_shape(obs_space, action_space)

Returns the input and output shapes of the models.

Return type:

dict

check_action_space()
set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

hard_target_update(manual_update=False)

Hard update model parameters.

Parameters:
manual_update (bool, optional) – Whether to force an update. Defaults to False; in case of a forced update, target_update_counter is not updated.

soft_target_update()

Soft update model parameters.

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

actor_action(observations, num_obs=1, use_target=False)

Returns the actor action for a batch of observations.

Parameters:
  • observations (np.ndarray, torch.tensor) – The observations to act on

  • num_obs (int, optional) – The number of observations to act on. Defaults to 1.

Returns:

The actions

Return type:

torch.tensor

get_actor_action_value(states, actions, use_target=False)

Returns the actor action value for a batch of observations.

Parameters:
  • states (torch.tensor) – The observations to act on

  • dones (torch.tensor) – The dones of the observations

  • actions (torch.tensor) – The actions to act on

Returns:

The actions values

Return type:

torch.tensor

act(observations, num_obs=1)
Parameters:
  • observations (array) – The observations to act on

  • num_obs (int) – The number of observations to act on

Returns:

The selected actions (np.ndarray)

update_policy(trajectory_data)

Updates the policy, using the DDPG algorithm.

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.ddpg_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.ppo_agent module

class rlify.agents.ppo_agent.PPODataset(states, actions, dones, returns, advantages, logits, prepare_for_rnn)

Bases: Dataset

Dataset for PPO.

__init__(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
Parameters:
  • states (np.ndarray) – The states.

  • actions (np.ndarray) – The actions.

  • dones (np.ndarray) – The dones.

  • returns (np.ndarray) – The returns.

  • advantages (np.ndarray) – The advantages.

  • logits (np.ndarray) – The logits.

  • prepare_for_rnn (bool) – Whether to prepare for RNN.

__len__()
__getitems__(idx)
__getitem__(idx)
collate_fn(batch)
__annotations__ = {}
__module__ = 'rlify.agents.ppo_agent'
__parameters__ = ()
class rlify.agents.ppo_agent.PPOData(states, actions, dones, returns, advantages, logits, prepare_for_rnn)

Bases: IData

A class for PPO data.

__init__(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
Parameters:
  • states (np.ndarray) – The states.

  • actions (np.ndarray) – The actions.

  • dones (np.ndarray) – The dones.

  • returns (np.ndarray) – The returns.

  • advantages (np.ndarray) – The advantages.

  • logits (np.ndarray) – The logits.

  • prepare_for_rnn (bool) – Whether to prepare for RNN.

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.ppo_agent'
_abc_impl = <_abc._abc_data object>
class rlify.agents.ppo_agent.PPO_Agent(obs_space, action_space, policy_nn, critic_nn, batch_size=1024, entropy_coeff=0.1, kl_div_thresh=0.03, clip_param=0.1, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ForgettingExperienceReplay'>, max_mem_size=1000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Bases: RL_Agent

Proximal Policy Optimization (PPO) reinforcement learning agent. Inherits from RL_Agent.

__init__(obs_space, action_space, policy_nn, critic_nn, batch_size=1024, entropy_coeff=0.1, kl_div_thresh=0.03, clip_param=0.1, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ForgettingExperienceReplay'>, max_mem_size=1000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Example:

env_name = 'Pendulum-v1'
env = gym.make(env_name, render_mode=None)
models_shapes = PPO_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
policy_input_shape = models_shapes["policy_nn"]["input_shape"]
policy_out_shape = models_shapes["policy_nn"]["out_shape"]
critic_input_shape = models_shapes["critic_nn"]["input_shape"]
critic_out_shape = models_shapes["critic_nn"]["out_shape"]
policy_nn = fc.FC(input_shape=policy_input_shape, embed_dim=128, depth=3, activation=torch.nn.ReLU(), out_shape=policy_out_shape)
critic_nn = fc.FC(input_shape=critic_input_shape, embed_dim=128, depth=3, activation=torch.nn.ReLU(), out_shape=critic_out_shape)
agent = PPO_Agent(obs_space=env.observation_space, action_space=env.action_space, device=device, batch_size=1024, max_mem_size=10**5,
                num_parallel_envs=4, lr=3e-4, entropy_coeff=0.05, policy_nn=policy_nn, critic_nn=critic_nn, discount_factor=0.99, tensorboard_dir = None)
train_stats = agent.train_n_steps(env=env,n_steps=250000)
Parameters:
  • obs_space (gym.spaces) – The observation space of the environment.

  • action_space (gym.spaces) – The action space of the environment.

  • policy_nn (nn.Module) – The policy neural network.

  • critic_nn (nn.Module) – The critic neural network.

  • batch_size (int) – The batch size for training.

  • entropy_coeff (float) – The coefficient for the entropy regularization term.

  • kl_div_thresh (float) – The threshold for the KL divergence between old and new policy.

  • clip_param (float) – The clipping parameter for the PPO loss.

  • explorer (Explorer) – The exploration strategy.

  • num_parallel_envs (int) – The number of parallel environments.

  • num_epochs_per_update (int) – The number of epochs per update.

  • lr (float) – The learning rate.

  • device (str) – The device to use for training.

  • experience_class (object) – The experience replay class.

  • max_mem_size (int) – The maximum memory size for experience replay.

  • discount_factor (float) – The discount factor for future rewards.

  • reward_normalization (bool) – Whether to normalize rewards.

  • tensorboard_dir (str) – The directory to save tensorboard logs.

  • dataloader_workers (int) – The number of workers for the dataloader.

  • accumulate_gradients_per_epoch (bool) – Whether to accumulate gradients per epoch.

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

static get_models_input_output_shape(obs_space, action_space)

Returns the input and output shapes of the models.

Return type:

dict

setup_models()

Initializes the NN models

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

set_num_parallel_env(num_parallel_envs)

Sets the number of parallel environments

Parameters:

num_parallel_envs (int) – number of parallel environments

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

best_act_discrete(observations, num_obs=1)
best_act_cont(observations, num_obs=1)
act(observations, num_obs=1)
Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The selected actions (np.ndarray)

get_trajectories_data()

Returns the trajectories data

calc_logits_values(states, actions, dones)
_get_ppo_experiences(num_episodes=None)

Get the experiences for PPO.

Parameters:

num_episodes (int) – Number of episodes to get.

Returns:

(states, actions, rewards, dones, truncated, next_states)

Return type:

tuple

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.ppo_agent'
_abc_impl = <_abc._abc_data object>
update_policy(trajectory_data)

Updates the policy network.

Parameters:

trajectory_data – the collected experience data.

rlify.agents.heuristic_agent module

class rlify.agents.heuristic_agent.Heuristic_Agent(heuristic_func, **kwargs)

Bases: RL_Agent

A Heuristic Agent that uses a heuristic function to act.

__init__(heuristic_func, **kwargs)
Parameters:
  • heuristic_func – A function that takes the inner_state and an observation (ObsWrapper) and returns a tuple (inner_state, action): the inner state (which can be None) and the action to be taken. Please note that the actions shape is (b, n_actions, action_dim). See ObsWrapper for more info on the observation input object.

  • kwargs – Arguments for the RL_Agent base class

Example:

env_name = "CartPole-v1"
env_c = gym.make(env_name, render_mode=None)
def heuristic_func(inner_state, obs: ObsWrapper):
    # a function that does not keep an inner state
    b_shape = len(obs)
    actions = np.zeros((b_shape, 1)) # single discrete action
    # just a dummy heuristic for a gym env with np.array observations (for more details about the obs object check ObsWrapper)
    # the heuristic checks whether the first number of each observation is positive; if so, it returns action=1, else 0
    actions[torch.where(obs['data'][:,0] > 0)[0].cpu()] = 1
    return None, actions

agent_c = Heuristic_Agent(obs_space=env_c.observation_space, action_space=env_c.action_space, heuristic_func=heuristic_func)
reward = agent_c.run_env(env_c, best_act=True)
print("Run Reward:", reward)
setup_models()

Does nothing in this agent.

get_models_input_output_shape(action_space)

Does nothing in this agent.

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

train(env, n_episodes)
act(observations, num_obs=1)
Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The selected actions (np.ndarray)

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

reset_rnn_hidden()

reset nn hidden_state - does nothing in this agent

update_policy(trajectory)

does nothing in this agent.

get_trajectories_data(num_episodes)

Mainly for Paired Algorithm support

clear_exp()

clears the experience replay buffer

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.heuristic_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.explorers module

class rlify.agents.explorers.Explorer

Bases: ABC

Abstract Exploration Class

__init__()
abstract explore()

Returns True if it is an exploration action time step

abstract update()

updates the exploration epsilon

abstract act(action_space, obs, num_obs)

Responsible for storing an inner state if needed (in the self.inner_state attribute). Returns the action to be taken.

__abstractmethods__ = frozenset({'act', 'explore', 'update'})
__annotations__ = {}
__dict__ = mappingproxy({'__module__': 'rlify.agents.explorers', '__doc__': 'Abstrcat Exploration Class', '__init__': <function Explorer.__init__>, 'explore': <function Explorer.explore>, 'update': <function Explorer.update>, 'act': <function Explorer.act>, '__dict__': <attribute '__dict__' of 'Explorer' objects>, '__weakref__': <attribute '__weakref__' of 'Explorer' objects>, '__abstractmethods__': frozenset({'act', 'update', 'explore'}), '_abc_impl': <_abc._abc_data object>, '__annotations__': {}})
__module__ = 'rlify.agents.explorers'
__weakref__

list of weak references to the object

_abc_impl = <_abc._abc_data object>
class rlify.agents.explorers.RandomExplorer(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)

Bases: Explorer

Class that implements a linear exploration method

__init__(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
Parameters:
  • exploration_epsilon (int) – The initial exploration epsilon

  • eps_end (float) – The final exploration epsilon

  • eps_dec (float) – The decay rate of the exploration epsilon

explore()

Returns True if it is an exploration action time step (randomness based on the exploration epsilon)

update()

updates the exploration epsilon in linear mode: exploration_epsilon * (1-self.eps_dec)
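
A short sketch of the explorer and its decay rule (a plain Python mirror of the formula above; treating eps_end as a lower bound is an assumption of this sketch):

from rlify.agents.explorers import RandomExplorer

explorer = RandomExplorer(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
# the decay rule from above, sketched standalone:
eps = 1.0
for _ in range(10):
    eps = max(0.05, eps * (1 - 0.01))
print(round(eps, 3))  # ~0.904 after 10 updates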

act(action_space, obs, num_obs)
Parameters:
  • action_space – The action space of the env

  • obs – The observation of the env

  • num_obs – The number of observations to act on

Returns a random action from the action space

_act_discrete(action_space, obs, num_obs)
_act_cont(action_space, obs, num_obs)
__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.explorers'
_abc_impl = <_abc._abc_data object>
class rlify.agents.explorers.HeuristicExplorer(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)

Bases: RandomExplorer

A class for custom exploration methods, defined by the user at init via heuristic_function(inner_state, obs) -> action

__init__(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
Parameters:

heuristic_function – A function that takes the inner_state and an observation (ObsWrapper) and returns a tuple (inner_state, action): the inner state (which can be None) and the action to be taken. Please note that the actions shape is (b, n_actions, action_dim).
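
A minimal sketch of such a heuristic function, modeled on the Heuristic_Agent example earlier on this page (the always-zero action is this sketch's arbitrary choice); the explorer can then be passed to an agent via the explorer argument:

import numpy as np
from rlify.agents.explorers import HeuristicExplorer

def heuristic_function(inner_state, obs):
    # a dummy heuristic that keeps no inner state and always returns action 0
    actions = np.zeros((len(obs), 1))  # single discrete action per observation
    return None, actions

explorer = HeuristicExplorer(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)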

explore()

Returns True if it is an exploration action time step (randomness based on the exploration epsilon)

update()

updates the exploration epsilon in linear mode: exploration_epsilon * (1-self.eps_dec)

act(action_space, obs, num_obs)

Call the heuristic function to get the action, also updates the inner state

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.explorers'
_abc_impl = <_abc._abc_data object>

rlify.agents.action_spaces_utils module

class rlify.agents.action_spaces_utils.MCAW(lows, highs, locs_scales)

Bases: object

Multivariate Continuous Action Space wrapper

__init__(lows, highs, locs_scales)
Parameters:
  • lows (list) – the lower bounds of the actions

  • highs (list) – the upper bounds of the actions

  • locs_scales (torch.tensor) – the mean and scale of all the actions

sample(sample_shape=())
Parameters:

sample_shape (torch.Size) – the shape of the sample

Returns:

a tensor of shape (b, n_actions, sample_shape)

log_prob(actions)

Calculates the log prob of each action.

Parameters:

actions (torch.tensor) – a tensor of shape (b, n_actions)

Returns:

a tensor of shape (b, n_actions)

property loc
property scale
entropy()

Calculates the mean entropy of all the actions.

Returns:

a tensor of shape (b, 1)

__dict__ = mappingproxy({'__module__': 'rlify.agents.action_spaces_utils', '__doc__': '\n    Multivariate Continuous Action Space wrapper\n    ', '__init__': <function MCAW.__init__>, 'sample': <function MCAW.sample>, 'log_prob': <function MCAW.log_prob>, 'loc': <property object>, 'scale': <property object>, 'entropy': <function MCAW.entropy>, '__dict__': <attribute '__dict__' of 'MCAW' objects>, '__weakref__': <attribute '__weakref__' of 'MCAW' objects>, '__annotations__': {}})
__module__ = 'rlify.agents.action_spaces_utils'
__weakref__

list of weak references to the object

class rlify.agents.action_spaces_utils.CAW(low, high, loc, scale)

Bases: Normal

Continuous Action Wrapper

__init__(low, high, loc, scale)
Parameters:
  • low (float) – the lower bound of the action

  • high (float) – the higher bound of the action

  • loc (torch.tensor) – the mean of the action

  • scale (torch.tensor) – the scale of the action

sample(sample_shape=())
Parameters:

sample_shape (torch.Size) – the shape of the sample

Returns:

a tensor of shape (b, sample_shape)

__module__ = 'rlify.agents.action_spaces_utils'
class rlify.agents.action_spaces_utils.MDA(start, possible_actions, n_actions, x)

Bases: object

Multivariate Discrete Action Space

__init__(start, possible_actions, n_actions, x)
Parameters:
  • start (np.array) – an offset for start of each action

  • possible_actions (int) – number of possible actions

  • n_actions (np.array) – number of actions for each action

  • x (torch.tensor) – the logits for each action

sample(sample_shape=())
Returns:

a tensor of shape (b, n_actions, sample_shape)

log_prob(actions)

Calculates the log prob of each action.

Parameters:

actions (torch.tensor) – a tensor of shape (b, n_actions)

Returns:

a tensor of shape (b, n_actions)

property probs

Returns: a tensor of shape (b, n_actions)

entropy()
__dict__ = mappingproxy({'__module__': 'rlify.agents.action_spaces_utils', '__doc__': '\n    Multivariate Discrete Action Space\n    ', '__init__': <function MDA.__init__>, 'sample': <function MDA.sample>, 'log_prob': <function MDA.log_prob>, 'probs': <property object>, 'entropy': <function MDA.entropy>, '__dict__': <attribute '__dict__' of 'MDA' objects>, '__weakref__': <attribute '__weakref__' of 'MDA' objects>, '__annotations__': {}})
__module__ = 'rlify.agents.action_spaces_utils'
__weakref__

list of weak references to the object

rlify.agents.agent_utils module

rlify.agents.agent_utils.pad_from_done_indices(data, dones)

Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence

rlify.agents.agent_utils.pad_states_from_done_indices(data, dones)

Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence

rlify.agents.agent_utils.pad_tensors_from_done_indices(data, dones)

Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence

rlify.agents.agent_utils.calc_gaes(rewards, values, terminated, discount_factor=0.99, decay=0.9)

Works with a rewards vector that consists of many episodes. Returns the Generalized Advantage Estimates from the given rewards and values. Paper: https://arxiv.org/pdf/1506.02438.pdf

rlify.agents.agent_utils.calc_returns(rewards, terminated, discount_factor=0.99)

Works with a rewards vector that consists of many episodes. Returns the discounted returns computed from the given rewards.
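
A minimal usage sketch of the two helpers above (the numbers are arbitrary, and numpy array inputs are an assumption of this sketch):

import numpy as np
from rlify.agents.agent_utils import calc_gaes, calc_returns

rewards = np.array([1.0, 0.0, 1.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.2])
terminated = np.array([0, 1, 0, 1])  # two episodes of length 2

gaes = calc_gaes(rewards, values, terminated, discount_factor=0.99, decay=0.9)
returns = calc_returns(rewards, terminated, discount_factor=0.99)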

class rlify.agents.agent_utils.ObsShapeWraper(obs_shape)

Bases: dict

dict_types = [<class 'dict'>, <class 'gymnasium.spaces.dict.Dict'>]
__init__(obs_shape)
__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', 'dict_types': [<class 'dict'>, <class 'gymnasium.spaces.dict.Dict'>], '__init__': <function ObsShapeWraper.__init__>, '__dict__': <attribute '__dict__' of 'ObsShapeWraper' objects>, '__weakref__': <attribute '__weakref__' of 'ObsShapeWraper' objects>, '__doc__': None, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

class rlify.agents.agent_utils.ObsWrapper(data=None, keep_dims=True, tensors=False)

Bases: object

A class for wrapping observations. The object is roughly a dict of np.arrays or torch.tensors. A default key, 'data', holds the main data when the input is a plain np.array or torch.tensor.

Example:

obs = ObsWrapper({'data':np.array([1,2,3]), 'data2':np.array([4,5,6])})
print(obs['data'])
print(obs['data2'])
print(obs['data'][0])
obs = ObsWrapper(np.array([1,2,3]))
print(obs['data'])
__init__(data=None, keep_dims=True, tensors=False)
Parameters:
  • data ((dict, array, tensor)) – The data to wrap

  • keep_dims (bool) – Whether to keep the dimensions of the data, if False will add a dimension of batch to the data

  • tensors (bool) – Whether to keep the data in torch.tensor

update_shape()

Updates the shape of the object

init_from_dict(data, keep_dims, tensors)

Initializes from a dict

Parameters:
  • data – The data to initialize from

  • keep_dims – Whether to keep the dimensions of the data, if False will add a dimension of batch to the data

  • tensors – Whether to keep the data in torch.tensor

init_from_list_obsWrapper_obs(obs_list)

Initializes from a list of ObsWrapper objects

Parameters:

obs_list – The list of ObsWrapper objects

init_from_list_generic_data(obs_list)

Initializes from a list of generic data

Parameters:

obs_list – The list of generic data

_init_from_none_(keep_dims, tensors)

Initializes an object without data

__setitem__(key, value)

Sets an item in the object

Parameters:
  • key – The key to set

  • value – The value to set

__delitem__(key)

Deletes an item in the object

Parameters:

key – The key to delete

__iter__()
Returns:

an iterator over the object

__getitem__(key)
Parameters:

key – The key to get

Returns:

The relevant item in respect to the key

slice_tensors(key)
Parameters:

key – The key to get

Returns:

The sliced tensors

keys()
Returns:

the keys of the object

items()
Returns:

the items of the object

values()
Returns:

the values of the object

__len__()
Returns:

The length of the object

__str__()

Returns the string representation of the object

Return type:

str

__repr__()

Return repr(self).

Return type:

str

__mul__(other)

Multiplies the object by another object, key by key, using the pointwise * operator

Parameters:

other – The other object to multiply by

__add__(other)

Adds the object to another object, key by key, using the pointwise + operator

Parameters:

other – The other object to add

__neg__()

Negates the object

__sub__(other)

Subtracts another object from the object, key by key, using the pointwise - operator

Parameters:

other – The other object to subtract

__truediv__(other)

Divides the object by another object, key by key, using the pointwise / operator

Parameters:

other – The other object to divide by

unsqueeze(dim=0)
Parameters:

dim – The dimension to unsqueeze along

Returns:

The unsqueezed object

squeeze(dim=0)
Parameters:

dim – The dimension to squeeze along

Returns:

The squeezed object

flatten(start_dim=None, env_dim=None)
Parameters:

start_dim, env_dim – The dimensions to flatten between

Returns:

The flattened object

get_as_tensors(device)
Parameters:

device – The device to put the tensors on

Returns:

The object as tensors

to(device, non_blocking=False)
Parameters:

device – The device to put the tensors on

Returns:

The object as tensors

stack()

stack a list of objects

cat(other, axis=0)

Concatenates the object with another object, key by key

Parameters:

other – The other object to concatenate with

np_roll(indx, inplace=False)

Rolls the data by indx and fills the empty space with zeros (only on axis 0)

Parameters:
  • indx – The index to roll by

  • inplace – Whether to do the roll inplace

Returns:

The rolled object

__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', '__doc__': "\n    A class for wrapping observations, the object is roughly a dict of np.arrays or torch.tensors\n    A default key is 'data' for the main data if it in either a np.array or torch.tensor\n\n    Example::\n\n            obs = ObsWrapper({'data':np.array([1,2,3]), 'data2':np.array([4,5,6])})\n            print(obs['data'])\n            print(obs['data2'])\n            print(obs['data'][0])\n            obs = ObsWrapper(np.array([1,2,3]))\n            print(obs['data'])\n    ", '__init__': <function ObsWrapper.__init__>, 'update_shape': <function ObsWrapper.update_shape>, 'init_from_dict': <function ObsWrapper.init_from_dict>, 'init_from_list_obsWrapper_obs': <function ObsWrapper.init_from_list_obsWrapper_obs>, 'init_from_list_generic_data': <function ObsWrapper.init_from_list_generic_data>, '_init_from_none_': <function ObsWrapper._init_from_none_>, '__setitem__': <function ObsWrapper.__setitem__>, '__delitem__': <function ObsWrapper.__delitem__>, '__iter__': <function ObsWrapper.__iter__>, '__getitem__': <function ObsWrapper.__getitem__>, 'slice_tensors': <function ObsWrapper.slice_tensors>, 'keys': <function ObsWrapper.keys>, 'items': <function ObsWrapper.items>, 'values': <function ObsWrapper.values>, '__len__': <function ObsWrapper.__len__>, '__str__': <function ObsWrapper.__str__>, '__repr__': <function ObsWrapper.__repr__>, '__mul__': <function ObsWrapper.__mul__>, '__add__': <function ObsWrapper.__add__>, '__neg__': <function ObsWrapper.__neg__>, '__sub__': <function ObsWrapper.__sub__>, '__truediv__': <function ObsWrapper.__truediv__>, 'unsqueeze': <function ObsWrapper.unsqueeze>, 'squeeze': <function ObsWrapper.squeeze>, 'flatten': <function ObsWrapper.flatten>, 'get_as_tensors': <function ObsWrapper.get_as_tensors>, 'to': <function ObsWrapper.to>, 'stack': <function ObsWrapper.stack>, 'cat': <function ObsWrapper.cat>, 'np_roll': <function ObsWrapper.np_roll>, '__dict__': <attribute '__dict__' of 'ObsWrapper' objects>, '__weakref__': <attribute '__weakref__' of 'ObsWrapper' objects>, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

class rlify.agents.agent_utils.IData(dataset, prepare_for_rnn)

Bases: ABC

An abstract class for agents data

__init__(dataset, prepare_for_rnn)
Parameters:
  • dataset (Dataset) – The dataset to use

  • prepare_for_rnn – Whether to prepare the data for RNN

get_dataloader(batch_size, shuffle, num_workers)
Parameters:
  • batch_size – The batch size

  • shuffle – Whether to shuffle the data

  • num_workers – The number of workers

Returns:

A DataLoader object
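
A minimal usage sketch, assuming data is a concrete IData instance such as the DQNData or PPOData classes documented above:

dataloader = data.get_dataloader(batch_size=64, shuffle=True, num_workers=0)
for batch in dataloader:
    ...  # one training step on the batch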

__abstractmethods__ = frozenset({})
__annotations__ = {}
__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', '__doc__': '\n    An abstract class for agents data\n    ', '__init__': <function IData.__init__>, 'get_dataloader': <function IData.get_dataloader>, '__dict__': <attribute '__dict__' of 'IData' objects>, '__weakref__': <attribute '__weakref__' of 'IData' objects>, '__abstractmethods__': frozenset(), '_abc_impl': <_abc._abc_data object>, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

_abc_impl = <_abc._abc_data object>
class rlify.agents.agent_utils.LambdaDataset(obs_collection, tensor_collection, dones, prepare_for_rnn)

Bases: Dataset

A dataset class for general purposes

__init__(obs_collection, tensor_collection, dones, prepare_for_rnn)
Parameters:
  • obs_collection (tuple[ObsWrapper]) – The observation collection

  • tensor_collection (tuple[tensor]) – The tensor collection

  • dones (tensor) – The dones tensor

  • prepare_for_rnn (bool) – Whether to prepare the data for RNN

__len__()
_prepare_data(obs_collection, tensor_collection, dones)

Prepares the data for the dataset in the form of tensors

Parameters:
  • obs_collection – The observation collection

  • tensor_collection – The tensor collection

  • dones – The dones tensor

Returns:

The prepared data

_pad_experiecne(obs_collection, tensor_collection, dones)

Pads the experience for RNN

Parameters:
  • obs_collection – The observation collection

  • tensor_collection – The tensor collection

  • dones – The dones tensor

Returns:

The padded experience and the loss flag; the loss flag is a tensor of ones where the data is not padded

__annotations__ = {}
__getitems__(idx)
__module__ = 'rlify.agents.agent_utils'
__parameters__ = ()
__getitem__(idx)
collate_fn(batch)
class rlify.agents.agent_utils.LambdaData(obs_collection, tensor_collection, dones, prepare_for_rnn)

Bases: IData

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.agent_utils'
_abc_impl = <_abc._abc_data object>
__init__(obs_collection, tensor_collection, dones, prepare_for_rnn)
Parameters:
  • obs_collection (tuple[ObsWrapper]) – The observation collection

  • tensor_collection (tuple[tensor]) – The tensor collection

  • dones (tensor) – The dones tensor

  • prepare_for_rnn (bool) – Whether to prepare the data for RNN

class rlify.agents.agent_utils.TrainMetrics

Bases: object

__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', '__init__': <function TrainMetrics.__init__>, 'add': <function TrainMetrics.add>, 'on_epoch_end': <function TrainMetrics.on_epoch_end>, 'get_metrcis_df': <function TrainMetrics.get_metrcis_df>, '__iter__': <function TrainMetrics.__iter__>, '__next__': <function TrainMetrics.__next__>, '__getitem__': <function TrainMetrics.__getitem__>, '__dict__': <attribute '__dict__' of 'TrainMetrics' objects>, '__weakref__': <attribute '__weakref__' of 'TrainMetrics' objects>, '__doc__': None, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

__init__()
Parameters:

metrics – The metrics to store.

add(metric_name, value)

Adds a metric to the metrics.

Parameters:
  • metric_name – The name of the metric.

  • value – The value of the metric.

on_epoch_end()

Called at the end of an epoch to finalize the metrics collected during that epoch.

get_metrcis_df()
Returns:

The metrics as a dataframe.

__iter__()
Returns:

An iterator over the metrics.

__next__()

Returns: The next metric in the iteration.

__getitem__(key)

Returns: The metric stored under the given key.