rlify.agents package

This package contains all agents and their helper classes.

rlify.agents.drl_agent module

class rlify.agents.drl_agent.RL_Agent(obs_space, action_space, batch_size=256, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Bases: ABC

RL_Agent is an abstract class that defines the basic structure of an RL agent. It is used as a base class for all RL agents.

TRAIN = 0
EVAL = 1
__init__(obs_space, action_space, batch_size=256, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)
Parameters:
  • obs_space (gym.spaces) – observation space of the environment

  • action_space (gym.spaces) – action space of the environment

  • batch_size (int, optional) – batch size for training. Defaults to 256.

  • explorer (Explorer, optional) – exploration method. Defaults to RandomExplorer().

  • num_parallel_envs (int) – number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int) – Training epochs per update. Defaults to 10.

  • lr (float, optional) – learning rate. Defaults to 3e-4.

  • device (torch.device, optional) – device to run on. Defaults to None.

  • experience_class (object, optional) – experience replay class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – maximum size of the experience replay buffer. Defaults to 10e6.

  • discount_factor (float, optional) – discount factor. Defaults to 0.99.

  • reward_normalization (bool, optional) – whether to normalize the rewards by their maximum absolute value. Defaults to True.

  • tensorboard_dir (str, optional) – tensorboard directory. Defaults to ‘./tensorboard’.

  • dataloader_workers (int, optional) – number of workers for the dataloader. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – whether to update the model once per epoch (True) or once per batch (False). Defaults to None; when None, recurrent models use True and feed-forward models use False.

get_train_batch_size()

Returns batch_size for feed-forward NNs, and ceil(batch_size / num_parallel_envs) for RNNs.
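
A short worked example of the batch-size rule above (plain Python; the values 256 and 4 are just the class defaults):

import math

batch_size, num_parallel_envs = 256, 4
print(batch_size)                                  # feed-forward models: 256
print(math.ceil(batch_size / num_parallel_envs))   # recurrent models: 64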

contains_reccurent_nn()
validate_models(models)
init_tb_writer(tensorboard_dir=None)

Initializes tensorboard writer

Parameters:

tensorboard_dir (str) – tensorboard directory

abstract static get_models_input_output_shape(obs_space, action_space)

Calculates the input and output shapes of the models.

Parameters:
  • obs_space (gym.spaces) – observation space

  • action_space (gym.spaces) – action space

Returns:

dictionary containing the input and output shapes of the models

Return type:

dictionary
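
A minimal usage sketch of this static method, mirroring the DQN_Agent example later on this page (the returned dictionary is keyed per model, e.g. "Q_model" for the DQN-family agents):

import gymnasium as gym
from rlify.agents.dqn_agent import DQN_Agent

env = gym.make("CartPole-v1")
models_shapes = DQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]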

abstract setup_models()

Initializes the NN models

Return type:

list[Module]

abstract update_policy(trajectories_dataset)

Updates the models according to the agent's logic

get_train_metrics()

Returns the training metrics

read_obs_space_properties()

Returns the observation space properties

read_action_space_properties()

Returns the action space properties

define_action_space(action_space)

Defines the action space

__del__()

Destructor

static read_nn_properties(ckpt_fname)
_generate_nn_save_key(model)

Generates a key for saving the model. The key includes the approximated args, the class type (for reproducibility), and the state dict of the model.

Parameters:

model (Module) – the model to save

Returns:

dictionary containing the model’s state

Return type:

dictionary

abstract save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

abstract load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.
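
A minimal save/load sketch, assuming agent is a constructed concrete agent (e.g. the DQN_Agent from the example later on this page); the file name is hypothetical:

ckpt_path = "my_agent.ckpt"            # hypothetical file name
state = agent.save_agent(ckpt_path)    # returns a dict with the agent's state
agent.load_agent(ckpt_path)            # restores the agent from the file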

abstract set_train_mode()

sets the agent to train mode - all models are set to train mode

abstract set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

gracefully_close_envs()

A decorator that closes the environment processes in case of an exception

Parameters:

func – the function to wrap

Returns:

the wrapped function

train_episodial(*args, **kwargs)
train_n_steps(*args, **kwargs)
_train_n_iters(env, n_iters, episodes=False, max_episode_len=None, disable_tqdm=False)

Trains the agent for a given number of steps

Parameters:
  • env (gym.Env) – the environment to train on

  • n_iters (int) – number of steps/episodes to train

  • episodes (bool, optional) – whether to train for episodes or steps. Defaults to False.

  • max_episode_len (int, optional) – maximum episode length - truncates after that. Defaults to None.

  • disable_tqdm (bool, optional) – disable tqdm. Defaults to False.

Returns:

train rewards
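
A minimal training sketch via the public wrappers (mirroring the agent examples on this page; agent and env are assumed to be constructed already):

train_stats = agent.train_n_steps(env=env, n_steps=40000)   # step-based training
# train_episodial(*args, **kwargs) is the episode-based counterpart; its arguments are forwarded to _train_n_iters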

abstract get_trajectories_data()

Returns the trajectories data

criterion_using_loss_flag(func, arg1, arg2, loss_flag)

Applies the function only where the loss flag is True

Parameters:
  • func – the function to apply

  • arg1 – the first argument

  • arg2 – the second argument

  • loss_flag – the loss flag

Returns:

the result of the function
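
A standalone PyTorch sketch of the loss-flag masking idea (an illustration of the concept only, not the library's exact implementation; masked_criterion is a hypothetical helper):

import torch

def masked_criterion(func, arg1, arg2, loss_flag):
    # apply the criterion only on the entries where the loss flag is True
    mask = loss_flag.bool()
    return func(arg1[mask], arg2[mask])

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 0.0, 3.0])
flag = torch.tensor([1, 0, 1])
loss = masked_criterion(torch.nn.functional.mse_loss, pred, target, flag)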

apply_regularization(reg_coeff, vector, loss_flag)

Applies regularization to the vector, masked by the loss flag

set_num_parallel_env(num_parallel_envs)

Sets the number of parallel environments

Parameters:

num_parallel_envs (int) – number of parallel environments

abstract act(observations, num_obs=1)
Parameters:
  • observations (array) – The observations to act on

  • num_obs (int) – The number of observations to act on

Return type:

array

Returns:

The selected actions (np.ndarray)

load_highest_score_agent()

Loads the highest score agent from training

get_highest_score_agent_ckpt_path()

Returns the path of the highest score agent from training

abstract best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way
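
A minimal inference sketch contrasting act and best_act (assuming agent and env from one of the examples on this page):

obs, info = env.reset()
exploring_action = agent.act(obs, num_obs=1)     # may sample / explore
greedy_action = agent.best_act(obs, num_obs=1)   # deterministic, highest-probability action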

norm_obs(observations)

Normalizes the observations according to pre-given normalization parameters [future API - currently not available]

pre_process_obs_for_act(observations, num_obs)

Pre-processes the observations for act

Parameters:
  • observations (ObsWrapper | dict) – The observations to act on

  • num_obs (int) – The number of observations to act on

Returns:

The pre-processed observations as an ObsWrapper object with the correct dimensions

return_correct_actions_dim(actions, num_obs)

Returns the actions with the correct dimension

Parameters:
  • actions (array) – The selected actions

  • num_obs (int) – The number of observations to act on

close_env_procs()

Closes the environment processes

set_intrisic_reward_func(func)

sets the agent's intrinsic reward function to a custom function that takes state, action, reward and returns the reward used by the algorithm:

# Create some agent
agent = PPO_Agent(obs_space=env.observation_space, action_space=env.action_space, tensorboard_dir=None)
def dummy_reward_func(state, action, reward):
    if state[0] > 0:
        return reward + 1
    return reward
agent.set_intrisic_reward_func(dummy_reward_func)
# now train normally
Parameters:

func (function) – a function that takes state, action, reward and returns reward for the algorithm

intrisic_reward_func(state, action, reward)

Calculates the agent's intrinsic reward

collect_episode_obs(env, max_episode_len=None, num_to_collect_in_parallel=None)

Collects observations from the environment

Parameters:
  • env (gym.env) – gym environment

  • max_episode_len (int, optional) – maximum episode length. Defaults to None.

  • num_to_collect_in_parallel (int, optional) – number of parallel environments. Defaults to None.

  • env_funcs (dict, optional) – dictionary of env functions mapping to call on the environment. Defaults to {“step”: “step”, “reset”: “reset”}.

Returns:

total reward collected

Return type:

float
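
A minimal collection sketch (assuming agent and env are already constructed; max_episode_len=500 is an arbitrary choice):

total_reward = agent.collect_episode_obs(env, max_episode_len=500)
print("collected reward:", total_reward)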

abstract reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

get_last_collected_experiences(number_of_episodes)

returns the last collected experiences

Parameters:

number_of_episodes (int) – number of episodes to return

clear_exp()

clears the experience replay buffer

__abstractmethods__ = frozenset({'act', 'best_act', 'get_models_input_output_shape', 'get_trajectories_data', 'load_agent', 'reset_rnn_hidden', 'save_agent', 'set_eval_mode', 'set_train_mode', 'setup_models', 'update_policy'})
__dict__ = mappingproxy({'__module__': 'rlify.agents.drl_agent', '__doc__': '\n    RL_Agent is an abstract class that defines the basic structure of an RL agent.\n    It is used as a base class for all RL agents.\n    ', 'TRAIN': 0, 'EVAL': 1, '__init__': <function RL_Agent.__init__>, 'get_train_batch_size': <function RL_Agent.get_train_batch_size>, 'contains_reccurent_nn': <function RL_Agent.contains_reccurent_nn>, 'validate_models': <function RL_Agent.validate_models>, 'init_tb_writer': <function RL_Agent.init_tb_writer>, 'get_models_input_output_shape': <staticmethod(<function RL_Agent.get_models_input_output_shape>)>, 'setup_models': <function RL_Agent.setup_models>, 'update_policy': <function RL_Agent.update_policy>, 'get_train_metrics': <function RL_Agent.get_train_metrics>, 'read_obs_space_properties': <function RL_Agent.read_obs_space_properties>, 'read_action_space_properties': <function RL_Agent.read_action_space_properties>, 'define_action_space': <function RL_Agent.define_action_space>, '__del__': <function RL_Agent.__del__>, 'read_nn_properties': <staticmethod(<function RL_Agent.read_nn_properties>)>, '_generate_nn_save_key': <function RL_Agent._generate_nn_save_key>, 'save_agent': <function RL_Agent.save_agent>, 'load_agent': <function RL_Agent.load_agent>, 'set_train_mode': <function RL_Agent.set_train_mode>, 'set_eval_mode': <function RL_Agent.set_eval_mode>, 'gracefully_close_envs': <function RL_Agent.gracefully_close_envs>, 'train_episodial': <function RL_Agent.gracefully_close_envs.<locals>.wrapper>, 'train_n_steps': <function RL_Agent.gracefully_close_envs.<locals>.wrapper>, '_train_n_iters': <function RL_Agent._train_n_iters>, 'get_trajectories_data': <function RL_Agent.get_trajectories_data>, 'criterion_using_loss_flag': <function RL_Agent.criterion_using_loss_flag>, 'apply_regularization': <function RL_Agent.apply_regularization>, 'set_num_parallel_env': <function RL_Agent.set_num_parallel_env>, 'act': <function RL_Agent.act>, 'load_highest_score_agent': <function RL_Agent.load_highest_score_agent>, 'get_highest_score_agent_ckpt_path': <function RL_Agent.get_highest_score_agent_ckpt_path>, 'best_act': <function RL_Agent.best_act>, 'norm_obs': <function RL_Agent.norm_obs>, 'pre_process_obs_for_act': <function RL_Agent.pre_process_obs_for_act>, 'return_correct_actions_dim': <function RL_Agent.return_correct_actions_dim>, 'close_env_procs': <function RL_Agent.close_env_procs>, 'set_intrisic_reward_func': <function RL_Agent.set_intrisic_reward_func>, 'intrisic_reward_func': <function RL_Agent.intrisic_reward_func>, 'collect_episode_obs': <function RL_Agent.collect_episode_obs>, 'reset_rnn_hidden': <function RL_Agent.reset_rnn_hidden>, 'get_last_collected_experiences': <function RL_Agent.get_last_collected_experiences>, 'clear_exp': <function RL_Agent.clear_exp>, 'run_env': <function RL_Agent.gracefully_close_envs.<locals>.wrapper>, '__dict__': <attribute '__dict__' of 'RL_Agent' objects>, '__weakref__': <attribute '__weakref__' of 'RL_Agent' objects>, '__abstractmethods__': frozenset({'save_agent', 'load_agent', 'act', 'get_trajectories_data', 'reset_rnn_hidden', 'update_policy', 'get_models_input_output_shape', 'set_eval_mode', 'setup_models', 'best_act', 'set_train_mode'}), '_abc_impl': <_abc._abc_data object>, '__annotations__': {}})
__module__ = 'rlify.agents.drl_agent'
__weakref__

list of weak references to the object

_abc_impl = <_abc._abc_data object>
run_env(*args, **kwargs)

rlify.agents.vdqn_agent module

class rlify.agents.vdqn_agent.DQNDataset(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)

Bases: Dataset

Dataset for DQN

__init__(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
Parameters:
  • states – np.array: the states

  • actions – np.array: the actions

  • rewards – np.array: the rewards

  • returns – np.array: the returns

  • dones – np.array: the dones

  • truncated – np.array: the truncated

  • next_states – np.array: the next states

  • prepare_for_rnn – bool: whether to prepare for RNN or not

__len__()
__getitems__(idx)
__getitem__(idx)
collate_fn(batch)
__module__ = 'rlify.agents.vdqn_agent'
__parameters__ = ()
class rlify.agents.vdqn_agent.DQNData(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)

Bases: IData

DQN Data

__init__(states, actions, rewards, returns, dones, truncated, next_states, prepare_for_rnn)
Parameters:
  • states – np.array: the states

  • actions – np.array: the actions

  • rewards – np.array: the rewards

  • returns – np.array: the returns

  • dones – np.array: the dones

  • truncated – np.array: the truncated

  • next_states – np.array: the next states

  • prepare_for_rnn – bool: whether to prepare for RNN or not

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.vdqn_agent'
_abc_impl = <_abc._abc_data object>
class rlify.agents.vdqn_agent.VDQN_Agent(obs_space, action_space, Q_model, dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Bases: RL_Agent

Vanilla DQN Agent

__init__(obs_space, action_space, Q_model, dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Example:

env_name = "CartPole-v1"
env = gym.make(env_name, render_mode=None)
models_shapes = VDQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]
Q_model = fc.FC(input_shape=Q_input_shape, out_shape=Q_out_shape)
agent = VDQN_Agent(obs_space=env.observation_space, action_space=env.action_space, batch_size=64, max_mem_size=10**5, num_parallel_envs=16,
                    lr=3e-4, Q_model=Q_model, discount_factor=0.99, tensorboard_dir=None, num_epochs_per_update=2)
train_stats = agent.train_n_steps(env=env,n_steps=40000)
Parameters:
  • obs_space (gym.spaces) – The observation space.

  • action_space (gym.spaces) – The action space.

  • Q_model (BaseModel) – The Q model.

  • dqn_reg (float, optional) – The DQN regularization. Defaults to 0.0.

  • batch_size (int, optional) – The batch size. Defaults to 64.

  • soft_exploit (bool, optional) – Whether to use soft exploitation. Defaults to True.

  • explorer (Explorer, optional) – The explorer. Defaults to RandomExplorer().

  • num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.

  • lr (float, optional) – The learning rate. Defaults to 3e-4.

  • device (str, optional) – The device. Defaults to None.

  • experience_class (object, optional) – The experience class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – The maximum memory size. Defaults to int(10e6).

  • discount_factor (float, optional) – The discount factor. Defaults to 0.99.

  • reward_normalization (bool, optional) – Whether to normalize the rewards. Defaults to True.

  • tensorboard_dir (str, optional) – The tensorboard directory. Defaults to “./tensorboard”.

  • dataloader_workers (int, optional) – The number of dataloader workers. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.

check_action_space()
setup_models()

Initializes the Q Model and optimizer.

static get_models_input_output_shape(obs_space, action_space)

Calculates the input and output shapes of the models.

Parameters:
  • obs_space (gym.spaces) – observation space

  • action_space (gym.spaces) – action space

Returns:

dictionary containing the input and output shapes of the models

Return type:

dictionary

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

contains_reccurent_nn()
act_base(observations, num_obs=1)

Returns the Q values for the given observations.

Parameters:
  • observations (np.array) – The observations.

  • num_obs (int, optional) – The number of observations. Defaults to 1.

Return type:

Tensor

Returns:

The Q values (torch.tensor)

act(observations, num_obs=1)
Parameters:
  • observations (array) – The observations to act on

  • num_obs (int) – The number of observations to act on

Return type:

ndarray

Returns:

The selected actions (np.ndarray)

reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

get_trajectories_data()

Returns the trajectories data

_get_dqn_experiences()

loads experiences from the replay buffer and returns them as tensors.

Returns:

(states, actions, rewards, dones, truncated, next_states, returns)

Return type:

tuple

update_policy(trajectory_data)

Updates the models according to the agent's logic

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.vdqn_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.dqn_agent module

class rlify.agents.dqn_agent.DQN_Agent(obs_space, action_space, Q_model, target_update='hard[update_freq=10]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Bases: VDQN_Agent

DQN Agent

__init__(obs_space, action_space, Q_model, target_update='hard[update_freq=10]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Example:

env_name = "CartPole-v1"
env = gym.make(env_name, render_mode=None)
models_shapes = DQN_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]
Q_model = fc.FC(input_shape=Q_input_shape, out_shape=Q_out_shape)
agent = DQN_Agent(
    obs_space=env.observation_space,
    action_space=env.action_space,
    Q_model=Q_model,
    batch_size=64,
    max_mem_size=int(10e6),
    num_parallel_envs=4,
    num_epochs_per_update=10,
    lr=3e-4,
    discount_factor=0.99,
    target_update="hard[update_freq=10]",
)
train_stats = agent.train_n_steps(env=env, n_steps=40000)
Parameters:
  • obs_space (gym.spaces) – The observation space of the environment.

  • action_space (gym.spaces) – The action space of the environment.

  • Q_model (BaseModel) – The Q-network model.

  • dqn_reg (float, optional) – The L2 regularization coefficient for the Q-network. Defaults to 0.0.

  • target_update (str, optional) – The target update rule. Defaults to “hard[update_freq=10]”.

  • batch_size (int, optional) – The batch size for training. Defaults to 64.

  • soft_exploit (bool, optional) – Whether to use soft exploitation during action selection. Defaults to True.

  • explorer (Explorer, optional) – The exploration strategy. Defaults to RandomExplorer().

  • num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.

  • lr (float, optional) – The learning rate. Defaults to 3e-4.

  • device (str, optional) – The device to use for training. Defaults to None.

  • experience_class (object, optional) – The experience replay class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – The maximum size of the experience replay memory. Defaults to int(10e6).

  • discount_factor (float, optional) – The discount factor for future rewards. Defaults to 0.99.

  • reward_normalization (bool, optional) – Whether to normalize rewards. Defaults to True.

  • tensorboard_dir (str, optional) – The directory to save TensorBoard logs. Defaults to “./tensorboard”.

  • dataloader_workers (int, optional) – The number of workers for the data loader. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.

setup_models()

Initializes the Q and target Q networks.

static get_models_input_output_shape(obs_space, action_space)

Calculates the input and output shapes of the models.

Parameters:
  • obs_space (gym.spaces) – observation space

  • action_space (gym.spaces) – action space

Returns:

dictionary containing the input and output shapes of the models

Return type:

dictionary

init_target_update_rule(target_update)

Initializes the target update rule.

Parameters:

target_update (str) – ‘soft[tau=0.01]’ or ‘hard[update_freq=10]’ target update
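
Both accepted formats of the target-update string, shown as a usage sketch (assuming agent is a constructed DQN_Agent; the comments give the usual interpretation of these rules):

agent.init_target_update_rule("hard[update_freq=10]")   # hard update: copy the target network every 10 updates
agent.init_target_update_rule("soft[tau=0.01]")         # soft update: Polyak averaging with tau=0.01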

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

hard_target_update(manual_update=False)

Hard update model parameters.

Parameters:

manual_update (bool, optional) – Whether to force an update. Defaults to False; in case of a forced update, target_update_counter is not updated.

soft_target_update()

Soft update model parameters.

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

reset_rnn_hidden()

if agent uses rnn, when the hidden states are reset. this callback is called in many places so please impliment it in you agent

update_policy(trajectory_data)

Updates the policy. Using the DQN algorithm.

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.dqn_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.ddpg_agent module

class rlify.agents.ddpg_agent.DDPG_Agent(obs_space, action_space, Q_model, Q_mle_model, target_update='soft[tau=0.005]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Bases: DQN_Agent

DDPG Agent

__init__(obs_space, action_space, Q_model, Q_mle_model, target_update='soft[tau=0.005]', dqn_reg=0.0, batch_size=64, soft_exploit=True, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ExperienceReplay'>, max_mem_size=10000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0)

Example:

env_name = "CartPole-v1"
env = gym.make(env_name, render_mode=None)
models_shapes = DDPG_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
Q_input_shape = models_shapes["Q_model"]["input_shape"]
Q_out_shape = models_shapes["Q_model"]["out_shape"]
Q_mle_input_shape = models_shapes["Q_mle_model"]["input_shape"]
Q_mle_out_shape = models_shapes["Q_mle_model"]["out_shape"]
Q_model = fc.FC(
    input_shape=Q_input_shape,
    out_shape=Q_out_shape,
)
Q_mle_model = fc.FC(
    input_shape=Q_mle_input_shape,
    out_shape=Q_mle_out_shape,
)
agent = DDPG_Agent(obs_space=env.observation_space, action_space=env.action_space, Q_model=Q_model, Q_mle_model=Q_mle_model)
train_stats = agent.train_n_steps(env=env, n_steps=40000)
Parameters:
  • obs_space (gym.spaces) – The observation space of the environment

  • action_space (gym.spaces) – The action space of the environment

  • Q_model (BaseModel) – The Q model

  • Q_mle_model (BaseModel) – The MLE model

  • target_update (str, optional) – The target update strategy. Defaults to "soft[tau=0.005]".

  • dqn_reg (float, optional) – The regularization factor for the Q model. Defaults to 0.0.

  • batch_size (int, optional) – The batch size. Defaults to 64.

  • soft_exploit (bool, optional) – Whether to use soft exploitation. Defaults to True.

  • explorer (Explorer, optional) – The explorer. Defaults to RandomExplorer().

  • num_parallel_envs (int, optional) – The number of parallel environments. Defaults to 4.

  • num_epochs_per_update (int, optional) – The number of epochs per update. Defaults to 10.

  • lr (float, optional) – The learning rate. Defaults to 3e-4.

  • device (str, optional) – The device to use. Defaults to None.

  • experience_class (object, optional) – The experience class. Defaults to ExperienceReplay.

  • max_mem_size (int, optional) – The maximum memory size. Defaults to int(10e6).

  • discount_factor (float, optional) – The discount factor. Defaults to 0.99.

  • reward_normalization (bool, optional) – Whether to normalize the rewards. Defaults to True.

  • tensorboard_dir (str, optional) – The tensorboard directory. Defaults to “./tensorboard”.

  • dataloader_workers (int, optional) – The number of dataloader workers. Defaults to 0.

  • accumulate_gradients_per_epoch (bool, optional) – Whether to accumulate gradients per epoch. Defaults to None.

setup_models()

Initializes the Q, target Q and MLE networks.

static get_models_input_output_shape(obs_space, action_space)

Returns the input and output shapes of the models.

Return type:

dict

check_action_space()
set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

hard_target_update(manual_update=False)

Hard update model parameters.

Parameters:
manual_update (bool, optional) – Whether to force an update. Defaults to False; in case of a forced update, target_update_counter is not updated.

soft_target_update()

Soft update model parameters.

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

actor_action(observations, num_obs=1, use_target=False)

Returns the actor action for a batch of observations.

Parameters:
  • observations (np.ndarray, torch.tensor) – The observations to act on

  • num_obs (int, optional) – The number of observations to act on. Defaults to 1.

Returns:

The actions

Return type:

torch.tensor

get_actor_action_value(states, actions, use_target=False)

Returns the actor action value for a batch of observations.

Parameters:
  • states (torch.tensor) – The observations to act on

  • dones (torch.tensor) – The dones of the observations

  • actions (torch.tensor) – The actions to act on

Returns:

The actions values

Return type:

torch.tensor

act(observations, num_obs=1)
Parameters:
  • observations (array) – The observations to act on

  • num_obs (int) – The number of observations to act on

Returns:

The selected actions (np.ndarray)

update_policy(trajectory_data)

Updates the policy, using the DDPG algorithm.

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.ddpg_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.ppo_agent module

class rlify.agents.ppo_agent.PPODataset(states, actions, dones, returns, advantages, logits, prepare_for_rnn)

Bases: Dataset

Dataset for PPO.

__init__(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
Parameters:
  • states (np.ndarray) – The states.

  • actions (np.ndarray) – The actions.

  • dones (np.ndarray) – The dones.

  • returns (np.ndarray) – The returns.

  • advantages (np.ndarray) – The advantages.

  • logits (np.ndarray) – The logits.

  • prepare_for_rnn (bool) – Whether to prepare for RNN.

__len__()
__getitems__(idx)
__getitem__(idx)
collate_fn(batch)
__annotations__ = {}
__module__ = 'rlify.agents.ppo_agent'
__parameters__ = ()
class rlify.agents.ppo_agent.PPOData(states, actions, dones, returns, advantages, logits, prepare_for_rnn)

Bases: IData

A class for PPO data.

__init__(states, actions, dones, returns, advantages, logits, prepare_for_rnn)
Parameters:
  • states (np.ndarray) – The states.

  • actions (np.ndarray) – The actions.

  • dones (np.ndarray) – The dones.

  • returns (np.ndarray) – The returns.

  • advantages (np.ndarray) – The advantages.

  • logits (np.ndarray) – The logits.

  • prepare_for_rnn (bool) – Whether to prepare for RNN.

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.ppo_agent'
_abc_impl = <_abc._abc_data object>
class rlify.agents.ppo_agent.PPO_Agent(obs_space, action_space, policy_nn, critic_nn, batch_size=1024, entropy_coeff=0.1, kl_div_thresh=0.03, clip_param=0.1, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ForgettingExperienceReplay'>, max_mem_size=1000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Bases: RL_Agent

Proximal Policy Optimization (PPO) reinforcement learning agent. Inherits from RL_Agent.

__init__(obs_space, action_space, policy_nn, critic_nn, batch_size=1024, entropy_coeff=0.1, kl_div_thresh=0.03, clip_param=0.1, explorer=<rlify.agents.explorers.RandomExplorer object>, num_parallel_envs=4, num_epochs_per_update=10, lr=0.0003, device=None, experience_class=<class 'rlify.agents.experience_replay.ForgettingExperienceReplay'>, max_mem_size=1000000, discount_factor=0.99, reward_normalization=True, tensorboard_dir='./tensorboard', dataloader_workers=0, accumulate_gradients_per_epoch=None)

Example:

env_name = 'Pendulum-v1'
env = gym.make(env_name, render_mode=None)
models_shapes = PPO_Agent.get_models_input_output_shape(env.observation_space, env.action_space)
policy_input_shape = models_shapes["policy_nn"]["input_shape"]
policy_out_shape = models_shapes["policy_nn"]["out_shape"]
critic_input_shape = models_shapes["critic_nn"]["input_shape"]
critic_out_shape = models_shapes["critic_nn"]["out_shape"]
policy_nn = fc.FC(input_shape=policy_input_shape, embed_dim=128, depth=3, activation=torch.nn.ReLU(), out_shape=policy_out_shape)
critic_nn = fc.FC(input_shape=critic_input_shape, embed_dim=128, depth=3, activation=torch.nn.ReLU(), out_shape=critic_out_shape)
agent = PPO_Agent(obs_space=env.observation_space, action_space=env.action_space, device=device, batch_size=1024, max_mem_size=10**5,
                num_parallel_envs=4, lr=3e-4, entropy_coeff=0.05, policy_nn=policy_nn, critic_nn=critic_nn, discount_factor=0.99, tensorboard_dir = None)
train_stats = agent.train_n_steps(env=env,n_steps=250000)
Parameters:
  • obs_space (gym.spaces) – The observation space of the environment.

  • action_space (gym.spaces) – The action space of the environment.

  • policy_nn (nn.Module) – The policy neural network.

  • critic_nn (nn.Module) – The critic neural network.

  • batch_size (int) – The batch size for training.

  • entropy_coeff (float) – The coefficient for the entropy regularization term.

  • kl_div_thresh (float) – The threshold for the KL divergence between old and new policy.

  • clip_param (float) – The clipping parameter for the PPO loss.

  • explorer (Explorer) – The exploration strategy.

  • num_parallel_envs (int) – The number of parallel environments.

  • num_epochs_per_update (int) – The number of epochs per update.

  • lr (float) – The learning rate.

  • device (str) – The device to use for training.

  • experience_class (object) – The experience replay class.

  • max_mem_size (int) – The maximum memory size for experience replay.

  • discount_factor (float) – The discount factor for future rewards.

  • reward_normalization (bool) – Whether to normalize rewards.

  • tensorboard_dir (str) – The directory to save tensorboard logs.

  • dataloader_workers (int) – The number of workers for the dataloader.

  • accumulate_gradients_per_epoch (bool) – Whether to accumulate gradients per epoch.

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

static get_models_input_output_shape(obs_space, action_space)

Returns the input and output shapes of the models.

Return type:

dict

setup_models()

Initializes the NN models

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

reset_rnn_hidden()

If the agent uses an RNN, this is where the hidden states are reset. This callback is called in many places, so please implement it in your agent.

set_num_parallel_env(num_parallel_envs)

Sets the number of parallel environments

Parameters:

num_parallel_envs (int) – number of parallel environments

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

best_act_discrete(observations, num_obs=1)
best_act_cont(observations, num_obs=1)
act(observations, num_obs=1)
Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The selected actions (np.ndarray)

get_trajectories_data()

Returns the trajectories data

calc_logits_values(states, actions, dones)
_get_ppo_experiences(num_episodes=None)

Get the experiences for PPO.

Parameters:

num_episodes (int) – Number of episodes to get.

Returns:

(states, actions, rewards, dones, truncated, next_states)

Return type:

tuple

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.ppo_agent'
_abc_impl = <_abc._abc_data object>
update_policy(trajectory_data)

Updates the policy network.

Parameters:

trajectory_data – the collected experience data.

rlify.agents.heuristic_agent module

class rlify.agents.heuristic_agent.Heuristic_Agent(heuristic_func, **kwargs)

Bases: RL_Agent

A Heuristic Agent that uses a heuristic function to act.

__init__(heuristic_func, **kwargs)
Parameters:
  • heuristic_func – A function that takes the inner_state and an observation (ObsWrapper) and returns a tuple (inner_state, action): the inner state (which can be None) and the action to be taken. Please note that the actions shape is (b, n_actions, action_dim). See ObsWrapper for more info on the observation input object.

  • kwargs – Arguments for the RL_Agent base class

Example:

env_name = "CartPole-v1"
env_c = gym.make(env_name, render_mode=None)
def heuristic_func(inner_state, obs: ObsWrapper):
    # a function that does not keep an inner state
    b_shape = len(obs)
    actions = np.zeros((b_shape, 1)) # single discrete action
    # just a dummy heuristic for a gym env with np.array observations (for more details about the obs object check ObsWrapper)
    # the heuristic checks whether the first number of each observation is positive; if so, it returns action=1, else 0
    actions[torch.where(obs['data'][:,0] > 0)[0].cpu()] = 1
    return None, actions

agent_c = Heuristic_Agent(obs_space=env_c.observation_space, action_space=env_c.action_space, heuristic_func=heuristic_func)
reward = agent_c.run_env(env_c, best_act=True)
print("Run Reward:", reward)
setup_models()

Does nothing in this agent.

get_models_input_output_shape(action_space)

Does nothing in this agent.

set_train_mode()

sets the agent to train mode - all models are set to train mode

set_eval_mode()

sets the agent to eval mode - all models are set to eval mode

save_agent(f_name)

Saves the agent to a file.

Parameters:

f_name (str) – file name

Return type:

dict

Returns: a dictionary containing the agent’s state.

load_agent(f_name)

Loads the agent from a file. Returns: a dictionary containing the agent’s state.

train(env, n_episodes)
act(observations, num_obs=1)
Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The selected actions (np.ndarray)

best_act(observations, num_obs=1)

The highest-probability actions, selected in a deterministic way

Parameters:
  • observations – The observations to act on

  • num_obs – The number of observations to act on

Returns:

The highest-probability action to be taken, in a deterministic way

reset_rnn_hidden()

reset nn hidden_state - does nothing in this agent

update_policy(trajectory)

does nothing in this agent.

get_trajectories_data(num_episodes)

Mainly for Paired Algorithm support

clear_exp()

clears the experience replay buffer

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.heuristic_agent'
_abc_impl = <_abc._abc_data object>

rlify.agents.explorers module

class rlify.agents.explorers.Explorer

Bases: ABC

Abstract Exploration Class

__init__()
abstract explore()

Returns True if it is an exploration action time step

abstract update()

updates the exploration epsilon

abstract act(action_space, obs, num_obs)

Responsible for storing an inner state if needed (in the self.inner_state attribute). Returns the action to be taken.

__abstractmethods__ = frozenset({'act', 'explore', 'update'})
__annotations__ = {}
__dict__ = mappingproxy({'__module__': 'rlify.agents.explorers', '__doc__': 'Abstrcat Exploration Class', '__init__': <function Explorer.__init__>, 'explore': <function Explorer.explore>, 'update': <function Explorer.update>, 'act': <function Explorer.act>, '__dict__': <attribute '__dict__' of 'Explorer' objects>, '__weakref__': <attribute '__weakref__' of 'Explorer' objects>, '__abstractmethods__': frozenset({'act', 'update', 'explore'}), '_abc_impl': <_abc._abc_data object>, '__annotations__': {}})
__module__ = 'rlify.agents.explorers'
__weakref__

list of weak references to the object

_abc_impl = <_abc._abc_data object>
class rlify.agents.explorers.RandomExplorer(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)

Bases: Explorer

Class that implements a linear exploration method

__init__(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
Parameters:
  • exploration_epsilon (int) – The initial exploration epsilon

  • eps_end (float) – The final exploration epsilon

  • eps_dec (float) – The decay rate of the exploration epsilon

explore()

Returns True if it is an exploration action time step (randomness based on the exploration epsilon)

update()

updates the exploration epsilon in linear mode: exploration_epsilon * (1-self.eps_dec)
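
A short sketch of the explorer and its decay rule (a plain Python mirror of the formula above; treating eps_end as a lower bound is an assumption of this sketch):

from rlify.agents.explorers import RandomExplorer

explorer = RandomExplorer(exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
# the decay rule from above, sketched standalone:
eps = 1.0
for _ in range(10):
    eps = max(0.05, eps * (1 - 0.01))
print(round(eps, 3))  # ~0.904 after 10 updates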

act(action_space, obs, num_obs)
Parameters:
  • action_space – The action space of the env

  • obs – The observation of the env

  • num_obs – The number of observations to act on

Returns a random action from the action space

_act_discrete(action_space, obs, num_obs)
_act_cont(action_space, obs, num_obs)
__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.explorers'
_abc_impl = <_abc._abc_data object>
class rlify.agents.explorers.HeuristicExplorer(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)

Bases: RandomExplorer

A class for custom exploration methods, defined by the user at init via heuristic_function(inner_state, obs) -> action

__init__(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)
Parameters:

heuristic_function – A function that takes the inner_state and an observation (ObsWrapper) and returns a tuple (inner_state, action): the inner state (which can be None) and the action to be taken. Please note that the actions shape is (b, n_actions, action_dim).
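
A minimal sketch of such a heuristic function, modeled on the Heuristic_Agent example earlier on this page (the always-zero action is this sketch's arbitrary choice); the explorer can then be passed to an agent via the explorer argument:

import numpy as np
from rlify.agents.explorers import HeuristicExplorer

def heuristic_function(inner_state, obs):
    # a dummy heuristic that keeps no inner state and always returns action 0
    actions = np.zeros((len(obs), 1))  # single discrete action per observation
    return None, actions

explorer = HeuristicExplorer(heuristic_function, exploration_epsilon=1, eps_end=0.05, eps_dec=0.01)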

explore()

Returns True if it is an exploration action time step (randomness based on the exploration epsilon)

update()

updates the exploration epsilon in linear mode: exploration_epsilon * (1-self.eps_dec)

act(action_space, obs, num_obs)

Call the heuristic function to get the action, also updates the inner state

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.explorers'
_abc_impl = <_abc._abc_data object>

rlify.agents.action_spaces_utils module

class rlify.agents.action_spaces_utils.MCAW(lows, highs, locs_scales)

Bases: object

Multivariate Continuous Action Space wrapper

__init__(lows, highs, locs_scales)
Parameters:
  • lows (list) – the lower bounds of the actions

  • highs (list) – the upper bounds of the actions

  • locs_scales (torch.tensor) – the mean and scale of all the actions

sample(sample_shape=())
Parameters:

sample_shape (torch.Size) – the shape of the sample

Returns:

a tensor of shape (b, n_actions, sample_shape)

log_prob(actions)

Calculates the log prob of each action.

Parameters:

actions (torch.tensor) – a tensor of shape (b, n_actions)

Returns:

a tensor of shape (b, n_actions)

property loc
property scale
entropy()

Calculates the mean entropy of all the actions.

Returns:

a tensor of shape (b, 1)

__dict__ = mappingproxy({'__module__': 'rlify.agents.action_spaces_utils', '__doc__': '\n    Multivariate Continuous Action Space wrapper\n    ', '__init__': <function MCAW.__init__>, 'sample': <function MCAW.sample>, 'log_prob': <function MCAW.log_prob>, 'loc': <property object>, 'scale': <property object>, 'entropy': <function MCAW.entropy>, '__dict__': <attribute '__dict__' of 'MCAW' objects>, '__weakref__': <attribute '__weakref__' of 'MCAW' objects>, '__annotations__': {}})
__module__ = 'rlify.agents.action_spaces_utils'
__weakref__

list of weak references to the object

class rlify.agents.action_spaces_utils.CAW(low, high, loc, scale)

Bases: Normal

Continuous Action Wrapper

__init__(low, high, loc, scale)
Parameters:
  • low (float) – the lower bound of the action

  • high (float) – the higher bound of the action

  • loc (torch.tensor) – the mean of the action

  • scale (torch.tensor) – the scale of the action

sample(sample_shape=())
Parameters:

sample_shape (torch.Size) – the shape of the sample

Returns:

a tensor of shape (b, sample_shape)

__module__ = 'rlify.agents.action_spaces_utils'
class rlify.agents.action_spaces_utils.MDA(start, possible_actions, n_actions, x)

Bases: object

Multivariate Discrete Action Space

__init__(start, possible_actions, n_actions, x)
Parameters:
  • start (np.array) – an offset for start of each action

  • possible_actions (int) – number of possible actions

  • n_actions (np.array) – number of actions for each action

  • x (torch.tensor) – the logits for each action

sample(sample_shape=())
Returns:

a tensor of shape (b, n_actions, sample_shape)

log_prob(actions)

Calculates the log prob of each action.

Parameters:

actions (torch.tensor) – a tensor of shape (b, n_actions)

Returns:

a tensor of shape (b, n_actions)

property probs

Returns: a tensor of shape (b, n_actions)

entropy()
__dict__ = mappingproxy({'__module__': 'rlify.agents.action_spaces_utils', '__doc__': '\n    Multivariate Discrete Action Space\n    ', '__init__': <function MDA.__init__>, 'sample': <function MDA.sample>, 'log_prob': <function MDA.log_prob>, 'probs': <property object>, 'entropy': <function MDA.entropy>, '__dict__': <attribute '__dict__' of 'MDA' objects>, '__weakref__': <attribute '__weakref__' of 'MDA' objects>, '__annotations__': {}})
__module__ = 'rlify.agents.action_spaces_utils'
__weakref__

list of weak references to the object

rlify.agents.agent_utils module

rlify.agents.agent_utils.pad_from_done_indices(data, dones)

Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence

rlify.agents.agent_utils.pad_states_from_done_indices(data, dones)

Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence

rlify.agents.agent_utils.pad_tensors_from_done_indices(data, dones)

Packs the data from the done indices to torch.nn.utils.rnn.PackedSequence

rlify.agents.agent_utils.calc_gaes(rewards, values, terminated, discount_factor=0.99, decay=0.9)

Works with a rewards vector that consists of many episodes. Returns the Generalized Advantage Estimates from the given rewards and values. Paper: https://arxiv.org/pdf/1506.02438.pdf

rlify.agents.agent_utils.calc_returns(rewards, terminated, discount_factor=0.99)

Works with a rewards vector that consists of many episodes. Returns the discounted returns computed from the given rewards.
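
A minimal usage sketch of the two helpers above (the numbers are arbitrary, and numpy array inputs are an assumption of this sketch):

import numpy as np
from rlify.agents.agent_utils import calc_gaes, calc_returns

rewards = np.array([1.0, 0.0, 1.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.2])
terminated = np.array([0, 1, 0, 1])  # two episodes of length 2

gaes = calc_gaes(rewards, values, terminated, discount_factor=0.99, decay=0.9)
returns = calc_returns(rewards, terminated, discount_factor=0.99)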

class rlify.agents.agent_utils.ObsShapeWraper(obs_shape)

Bases: dict

dict_types = [<class 'dict'>, <class 'gymnasium.spaces.dict.Dict'>]
__init__(obs_shape)
__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', 'dict_types': [<class 'dict'>, <class 'gymnasium.spaces.dict.Dict'>], '__init__': <function ObsShapeWraper.__init__>, '__dict__': <attribute '__dict__' of 'ObsShapeWraper' objects>, '__weakref__': <attribute '__weakref__' of 'ObsShapeWraper' objects>, '__doc__': None, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

class rlify.agents.agent_utils.ObsWrapper(data=None, keep_dims=True, tensors=False)

Bases: object

A class for wrapping observations. The object is roughly a dict of np.arrays or torch.tensors. A default key, 'data', holds the main data when the input is a plain np.array or torch.tensor.

Example:

obs = ObsWrapper({'data':np.array([1,2,3]), 'data2':np.array([4,5,6])})
print(obs['data'])
print(obs['data2'])
print(obs['data'][0])
obs = ObsWrapper(np.array([1,2,3]))
print(obs['data'])
__init__(data=None, keep_dims=True, tensors=False)
Parameters:
  • data ((dict, array, tensor)) – The data to wrap

  • keep_dims (bool) – Whether to keep the dimensions of the data, if False will add a dimension of batch to the data

  • tensors (bool) – Whether to keep the data in torch.tensor

update_shape()

Updates the shape of the object

init_from_dict(data, keep_dims, tensors)

Initializes from a dict

Parameters:
  • data – The data to initialize from

  • keep_dims – Whether to keep the dimensions of the data, if False will add a dimension of batch to the data

  • tensors – Whether to keep the data in torch.tensor

init_from_list_obsWrapper_obs(obs_list)

Initializes from a list of ObsWrapper objects

Parameters:

obs_list – The list of ObsWrapper objects

init_from_list_generic_data(obs_list)

Initializes from a list of generic data

Parameters:

obs_list – The list of generic data

_init_from_none_(keep_dims, tensors)

Initializes an object without data

__setitem__(key, value)

Sets an item in the object

Parameters:
  • key – The key to set

  • value – The value to set

__delitem__(key)

Deletes an item in the object

Parameters:

key – The key to delete

__iter__()
Returns:

an iterator over the object

__getitem__(key)
Parameters:

key – The key to get

Returns:

The relevant item in respect to the key

slice_tensors(key)
Parameters:

key – The key to get

Returns:

The sliced tensors

keys()
Returns:

the keys of the object

items()
Returns:

the items of the object

values()
Returns:

the values of the object

__len__()
Returns:

The length of the object

__str__()

Returns the string representation of the object

Return type:

str

__repr__()

Return repr(self).

Return type:

str

__mul__(other)

Multiplies the object by another object, key by key, using the pointwise * operator

Parameters:

other – The other object to multiply by

__add__(other)

Adds the object to another object, key by key, using the pointwise + operator

Parameters:

other – The other object to add

__neg__()

Negates the object

__sub__(other)

Subtracts another object from the object, key by key, using the pointwise - operator

Parameters:

other – The other object to subtract

__truediv__(other)

Divides the object by another object, key by key, using the pointwise / operator

Parameters:

other – The other object to divide by

unsqueeze(dim=0)
Parameters:

dim – The dimension to unsqueeze along

Returns:

The unsqueezed object

squeeze(dim=0)
Parameters:

dim – The dimension to squeeze along

Returns:

The squeezed object

flatten(start_dim=None, env_dim=None)
Parameters:

start_dim, env_dim – The dimensions to flatten between

Returns:

The flattened object

get_as_tensors(device)
Parameters:

device – The device to put the tensors on

Returns:

The object as tensors

to(device, non_blocking=False)
Parameters:

device – The device to put the tensors on

Returns:

The object as tensors

stack()

stack a list of objects

cat(other, axis=0)

Concatenates the object with another object, key by key

Parameters:

other – The other object to concatenate with

np_roll(indx, inplace=False)

Rolls the data by indx and fills the empty space with zeros (only on axis 0)

Parameters:
  • indx – The index to roll by

  • inplace – Whether to do the roll inplace

Returns:

The rolled object

__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', '__doc__': "\n    A class for wrapping observations, the object is roughly a dict of np.arrays or torch.tensors\n    A default key is 'data' for the main data if it in either a np.array or torch.tensor\n\n    Example::\n\n            obs = ObsWrapper({'data':np.array([1,2,3]), 'data2':np.array([4,5,6])})\n            print(obs['data'])\n            print(obs['data2'])\n            print(obs['data'][0])\n            obs = ObsWrapper(np.array([1,2,3]))\n            print(obs['data'])\n    ", '__init__': <function ObsWrapper.__init__>, 'update_shape': <function ObsWrapper.update_shape>, 'init_from_dict': <function ObsWrapper.init_from_dict>, 'init_from_list_obsWrapper_obs': <function ObsWrapper.init_from_list_obsWrapper_obs>, 'init_from_list_generic_data': <function ObsWrapper.init_from_list_generic_data>, '_init_from_none_': <function ObsWrapper._init_from_none_>, '__setitem__': <function ObsWrapper.__setitem__>, '__delitem__': <function ObsWrapper.__delitem__>, '__iter__': <function ObsWrapper.__iter__>, '__getitem__': <function ObsWrapper.__getitem__>, 'slice_tensors': <function ObsWrapper.slice_tensors>, 'keys': <function ObsWrapper.keys>, 'items': <function ObsWrapper.items>, 'values': <function ObsWrapper.values>, '__len__': <function ObsWrapper.__len__>, '__str__': <function ObsWrapper.__str__>, '__repr__': <function ObsWrapper.__repr__>, '__mul__': <function ObsWrapper.__mul__>, '__add__': <function ObsWrapper.__add__>, '__neg__': <function ObsWrapper.__neg__>, '__sub__': <function ObsWrapper.__sub__>, '__truediv__': <function ObsWrapper.__truediv__>, 'unsqueeze': <function ObsWrapper.unsqueeze>, 'squeeze': <function ObsWrapper.squeeze>, 'flatten': <function ObsWrapper.flatten>, 'get_as_tensors': <function ObsWrapper.get_as_tensors>, 'to': <function ObsWrapper.to>, 'stack': <function ObsWrapper.stack>, 'cat': <function ObsWrapper.cat>, 'np_roll': <function ObsWrapper.np_roll>, '__dict__': <attribute '__dict__' of 'ObsWrapper' objects>, '__weakref__': <attribute '__weakref__' of 'ObsWrapper' objects>, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

class rlify.agents.agent_utils.IData(dataset, prepare_for_rnn)

Bases: ABC

An abstract class for agents data

__init__(dataset, prepare_for_rnn)
Parameters:
  • dataset (Dataset) – The dataset to use

  • prepare_for_rnn – Whether to prepare the data for RNN

get_dataloader(batch_size, shuffle, num_workers)
Parameters:
  • batch_size – The batch size

  • shuffle – Whether to shuffle the data

  • num_workers – The number of workers

Returns:

A DataLoader object
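
A minimal usage sketch, assuming data is a concrete IData instance such as the DQNData or PPOData classes documented above:

dataloader = data.get_dataloader(batch_size=64, shuffle=True, num_workers=0)
for batch in dataloader:
    ...  # one training step on the batch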

__abstractmethods__ = frozenset({})
__annotations__ = {}
__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', '__doc__': '\n    An abstract class for agents data\n    ', '__init__': <function IData.__init__>, 'get_dataloader': <function IData.get_dataloader>, '__dict__': <attribute '__dict__' of 'IData' objects>, '__weakref__': <attribute '__weakref__' of 'IData' objects>, '__abstractmethods__': frozenset(), '_abc_impl': <_abc._abc_data object>, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

_abc_impl = <_abc._abc_data object>
class rlify.agents.agent_utils.LambdaDataset(obs_collection, tensor_collection, dones, prepare_for_rnn)

Bases: Dataset

A dataset class for general purposes

__init__(obs_collection, tensor_collection, dones, prepare_for_rnn)
Parameters:
  • obs_collection (tuple[ObsWrapper]) – The observation collection

  • tensor_collection (tuple[tensor]) – The tensor collection

  • dones (tensor) – The dones tensor

  • prepare_for_rnn (bool) – Whether to prepare the data for RNN

__len__()
_prepare_data(obs_collection, tensor_collection, dones)

Prepares the data for the dataset in the form of tensors

Parameters:
  • obs_collection – The observation collection

  • tensor_collection – The tensor collection

  • dones – The dones tensor

Returns:

The prepared data

_pad_experiecne(obs_collection, tensor_collection, dones)

Pads the experience for RNN

Parameters:
  • obs_collection – The observation collection

  • tensor_collection – The tensor collection

  • dones – The dones tensor

Returns:

The padded experience and the loss flag; the loss flag is a tensor of ones where the data is not padded

__annotations__ = {}
__getitems__(idx)
__module__ = 'rlify.agents.agent_utils'
__parameters__ = ()
__getitem__(idx)
collate_fn(batch)
class rlify.agents.agent_utils.LambdaData(obs_collection, tensor_collection, dones, prepare_for_rnn)

Bases: IData

__abstractmethods__ = frozenset({})
__annotations__ = {}
__module__ = 'rlify.agents.agent_utils'
_abc_impl = <_abc._abc_data object>
__init__(obs_collection, tensor_collection, dones, prepare_for_rnn)
Parameters:
  • obs_collection (tuple[ObsWrapper]) – The observation collection

  • tensor_collection (tuple[tensor]) – The tensor collection

  • dones (tensor) – The dones tensor

  • prepare_for_rnn (bool) – Whether to prepare the data for RNN

class rlify.agents.agent_utils.TrainMetrics

Bases: object

__dict__ = mappingproxy({'__module__': 'rlify.agents.agent_utils', '__init__': <function TrainMetrics.__init__>, 'add': <function TrainMetrics.add>, 'on_epoch_end': <function TrainMetrics.on_epoch_end>, 'get_metrcis_df': <function TrainMetrics.get_metrcis_df>, '__iter__': <function TrainMetrics.__iter__>, '__next__': <function TrainMetrics.__next__>, '__getitem__': <function TrainMetrics.__getitem__>, '__dict__': <attribute '__dict__' of 'TrainMetrics' objects>, '__weakref__': <attribute '__weakref__' of 'TrainMetrics' objects>, '__doc__': None, '__annotations__': {}})
__module__ = 'rlify.agents.agent_utils'
__weakref__

list of weak references to the object

__init__()
Parameters:

metrics – The metrics to store.

add(metric_name, value)

Adds a metric to the metrics.

Parameters:
  • metric_name – The name of the metric.

  • value – The value of the metric.

on_epoch_end()

Called at the end of an epoch to finalize the metrics collected during that epoch.

get_metrcis_df()
Returns:

The metrics as a dataframe.

__iter__()
Returns:

An iterator over the metrics.

__next__()

Returns: The next metric in the iteration.

__getitem__(key)

Returns: The metric stored under the given key.