API Documentation

HNP

class hnp.hnp.HNP(slow_continuous_idx)

Bases: object

Class for HNP computation

__init__(slow_continuous_idx) → None

Constructor for HNP object

Parameters:

slow_continuous_idx – Indices of the slowly-changing continuous variables

Returns:

None

get_next_value(vtb, full_obs_index, cont_obs_index_floats)

Computes the new state value of tiles using HNP

HNP is applied only to slowly-changing continuous variables. The next-state tile portions are first computed from the float indices, and the next-state value is then computed from those tile portions.

Parameters:
  • vtb – State value table

  • full_obs_index – Value table index of observation

  • cont_obs_index_floats – Continuous variable indices

Returns:

Next state value for continuous variables
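Below is a minimal sketch, not the library's implementation, of how a next-state value can be computed from tile portions for a single slowly-changing continuous variable: the fractional part of the float index weights a linear interpolation between the two neighbouring tiles.

    import numpy as np

    def interpolated_value(vtb_slice: np.ndarray, float_index: float) -> float:
        """Weight the two neighbouring tile values by their tile portions."""
        low = int(np.floor(float_index))
        high = min(low + 1, len(vtb_slice) - 1)
        portion_high = float_index - low      # fraction belonging to the next tile
        portion_low = 1.0 - portion_high      # fraction belonging to the current tile
        return portion_low * vtb_slice[low] + portion_high * vtb_slice[high]

    values_along_axis = np.array([0.0, 1.0, 4.0, 9.0])  # hypothetical value-table slice
    print(interpolated_value(values_along_axis, 1.25))  # 0.75 * 1.0 + 0.25 * 4.0 = 1.75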

Agents

class hnp.agents.Agent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)

Bases: ABC

Parent Reinforcement Learning Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None

Constructor for RL agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

Returns:

None

save_results() → None

Saves training results

abstract train() → None

RL agent training
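As a hedged sketch, a custom agent subclasses Agent and implements train(); the self.env and self.config attributes and the num_steps config key used below are assumptions, not documented internals:

    from hnp.agents import Agent

    class DoNothingAgent(Agent):
        """Hypothetical agent that always sends action 0 (illustrative only)."""

        def train(self) -> None:
            # `self.env` and `self.config` are assumed to be stored by the Agent
            # constructor; reset/step signatures depend on the gym/Sinergym version.
            obs = self.env.reset()
            for _ in range(self.config.get("num_steps", 100)):  # placeholder config key
                obs, reward, done, info = self.env.step(0)
                if done:
                    break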

class hnp.agents.RandomActionAgent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)

Bases: Agent

Random Action Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None

Constructor for RL agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

Returns:

None

train() → None

Random Action agent training

class hnp.agents.FixedActionAgent(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False)

Bases: Agent

Fixed Action Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, results_dir: str = 'training_results', use_beobench: bool = False) → None

Constructor for RL agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

Returns:

None

train() → None

Fixed Action agent training
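A hedged usage sketch for the two baseline agents; the environment name and the config keys below are placeholders rather than values documented here:

    from hnp.agents import RandomActionAgent
    from hnp.environment import create_env

    env = create_env({"name": "Eplus-5Zone-hot-continuous-v1"})  # hypothetical env name
    config = {"num_episodes": 1}                                 # placeholder config keys

    agent = RandomActionAgent(env, config, results_dir="training_results")
    agent.train()
    agent.save_results()

    # FixedActionAgent is constructed and trained the same way; the fixed action
    # itself would come from the agent configuration (exact key not documented here).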

class hnp.agents.QLearningAgent(env: sinergym.envs.EplusEnv, config: dict, obs_mask: numpy.ndarray, results_dir: str = 'training_results', use_beobench: bool = False, use_hnp: bool = True)

Bases: Agent

Q-Learning Agent Class

__init__(env: sinergym.envs.EplusEnv, config: dict, obs_mask: numpy.ndarray, results_dir: str = 'training_results', use_beobench: bool = False, use_hnp: bool = True) → None

Constructor for Q-Learning agent

Parameters:
  • env – Gym environment

  • config – Agent configuration

  • obs_mask – Mask to categorize variables into slowly-changing continuous, fast-changing continuous, and discrete variables

  • results_dir – Directory to save results

  • use_beobench – Enable Beobench

  • use_hnp – Enable HNP

Returns:

None
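A hedged construction sketch for a Q-Learning agent; the integer codes used in obs_mask (0 = slowly-changing continuous, 1 = fast-changing continuous, 2 = discrete), the environment name, and the config keys are assumptions about the expected conventions:

    import numpy as np

    from hnp.agents import QLearningAgent
    from hnp.environment import create_env

    env = create_env({"name": "Eplus-5Zone-hot-continuous-v1"})  # hypothetical env name

    # Assumed mask convention: one entry per observation variable, where
    # 0 = slowly-changing continuous, 1 = fast-changing continuous, 2 = discrete.
    obs_mask = np.array([0, 0, 1, 2])

    config = {"num_tiles": 20, "learning_rate": 0.1}             # placeholder config keys
    agent = QLearningAgent(env, config, obs_mask, use_hnp=True)
    agent.train()
    agent.save_results()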

choose_action(obs_index: numpy.ndarray, mode: str = 'explore') → int

Get an action following the epsilon-greedy policy

Parameters:
  • obs_index – Observation index

  • mode – Training or evaluation

Returns:

Action
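For illustration, a generic epsilon-greedy rule of the kind choose_action follows; this is not the library's exact code, and q_values stands for the row of the value table at the current observation index:

    import numpy as np

    def epsilon_greedy(q_values: np.ndarray, epsilon: float, mode: str = "explore") -> int:
        """Pick a random action with probability epsilon while exploring, else the greedy one."""
        if mode == "explore" and np.random.random() < epsilon:
            return int(np.random.randint(len(q_values)))  # exploratory action
        return int(np.argmax(q_values))                   # greedy action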

get_act_shape() → int

Get the action space shape

Returns:

Action space shape

get_next_value(obs: numpy.ndarray) → tuple[float, numpy.ndarray]

Computes the new state value

If HNP is not used, the new state value is retrieved directly from the value table. For the HNP computation, refer to the get_next_value method of the HNP class.

Parameters:

obs – Observation

Returns:

Next state value and value table index of observation

get_obs_shape() → tuple

Get the observation space shape

The state space for continuous variables is tile-coded based on the number of tiles set in the configuration file.

Returns:

Tuple of discretized observation space for continuous variables and the observation space for discrete variables
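A hedged illustration of the resulting shape, assuming three continuous variables tile-coded with a configured tile count and one discrete variable with four values (all numbers are placeholders):

    # Hypothetical layout: three continuous variables tile-coded into 20 tiles each,
    # plus one discrete variable with 4 possible values.
    num_tiles = 20                                  # placeholder for the configured tile count
    continuous_shape = (num_tiles, num_tiles, num_tiles)
    discrete_shape = (4,)

    obs_shape = continuous_shape + discrete_shape   # state axes of the value table
    print(obs_shape)                                # (20, 20, 20, 4)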

get_vtb_idx_from_obs(obs: numpy.ndarray) → tuple[numpy.ndarray, numpy.ndarray]

Get the value table index from observation

For continuous variables, the indices are first obtained as floats, then rounded to integers and stacked with the discrete variable indices.

Parameters:

obs – Observation

Returns:

Value table index and continuous variable indices as floats
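A hedged illustration of the index construction described above, assuming continuous observations normalized to [0, 1] and a placeholder tile count:

    import numpy as np

    num_tiles = 20                                        # placeholder for the configured tile count

    obs_continuous = np.array([0.37, 0.80])               # normalized continuous observations
    cont_index_floats = obs_continuous * (num_tiles - 1)  # float positions on the tile grid
    disc_indices = np.array([3])                          # indices of discrete variables

    vtb_index = np.concatenate([np.round(cont_index_floats).astype(int), disc_indices])
    print(cont_index_floats)  # [ 7.03 15.2 ]
    print(vtb_index)          # [ 7 15  3]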

save_results() → None

Saves training results and the learned Q-table

train() → None

Q-Learning agent training

Training follows the conventional Q-Learning update rule. If the episode length is shorter than the total number of weather data points, the environment does not reset at the end of an episode; it resets only once all weather data points have been used.
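For reference, a minimal sketch of the conventional Q-Learning update applied during training; the hyperparameter names alpha and gamma are generic, not the library's config keys:

    def q_update(q_sa: float, reward: float, next_value: float,
                 alpha: float = 0.1, gamma: float = 0.99) -> float:
        """Q(s, a) <- Q(s, a) + alpha * (reward + gamma * V(s') - Q(s, a)).

        With HNP enabled, next_value is the interpolated next-state value from
        get_next_value rather than a plain table lookup.
        """
        return q_sa + alpha * (reward + gamma * next_value - q_sa)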

Environment

class hnp.environment.ObservationWrapper(*args: Any, **kwargs: Any)

Bases: ObservationWrapper

Sinergym environment wrapper to modify observations

__init__(env, obs_to_keep)

Constructor for observation wrapper

Parameters:
  • env – Sinergym environment

  • obs_to_keep – Indices of the state variables to keep

Returns:

None

observation(observation)

Removes the unused state variables from the observation

Parameters:

observation – Full observation

Returns:

Filtered observation

hnp.environment.create_env(env_config: dict | None = None) → gym.Env

Creates a Sinergym environment

Parameters:

env_config – Configuration kwargs for Sinergym. Currently the dictionary supports a single key, “name”, which sets the name of the environment.

Returns:

A configured gym environment.
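A hedged end-to-end sketch of the environment helpers; the environment name and the obs_to_keep indices are placeholders:

    from hnp.environment import ObservationWrapper, create_env

    # Hypothetical Sinergym environment id; any registered Sinergym environment works.
    env = create_env({"name": "Eplus-5Zone-hot-continuous-v1"})

    # Keep only the state variables at these (placeholder) indices.
    env = ObservationWrapper(env, obs_to_keep=[0, 1, 8])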