API of PoPEx Package

The API Documentation of the package

popex

Algorithm

Utilities

Generic PoPEx utilities

utils.py contains utilities for performing a PoPEx sampling and computing predictions:

Category probabilities and kld maps
Hard conditioning data
  • generate_hd(): Computes the new hard conditioning data
  • merge_hd(): Merges prior and new hard conditioning data
  • compute_ncmod(): Computes the number of conditioning points per model type
  • compute_w_lik(): Computes the likelihood weights (used for the hard conditioning maps)
Generic functions
popex.utils.check_category(list_values, category)

check_category verifies, for each value, whether it falls in category.

Parameters:
  • list_values (ndarray) – list of scalars
  • category (list) – list of 2-tuples
Returns:

The i-th element is True if list_values[i] belongs to the category and False otherwise. The category is defined as the union of all intervals defined by the 2-tuples (cf. popex_objects.CatMType).

Return type:

ndarray, shape = np.shape(list_values)

popex.utils.check_interval(list_values, interval)

check_interval verifies, for each value, whether it falls in interval.

Parameters:
  • list_values (ndarray) – list of scalars
  • interval (2-tuple) – Interval; the first value is the lower end, the second is the upper end
Returns:

The i-th element is True if list_values[i] falls in the interval and False otherwise.

Return type:

ndarray, shape = np.shape(list_values)
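
The membership tests can be sketched in pure Python as follows. This is a minimal illustration, not the package's implementation: the helper names in_interval and in_category are hypothetical, and intervals are assumed to be half-open, [low, high), matching the convention stated for CatMType.categories.

```python
def in_interval(values, interval):
    """Return a list of booleans: True where low <= v < high (half-open assumed)."""
    low, high = interval
    return [low <= v < high for v in values]


def in_category(values, category):
    """Return True where a value lies in the union of the category's intervals."""
    flags = [False] * len(values)
    for interval in category:
        # OR the membership of each interval into the running result
        flags = [f or g for f, g in zip(flags, in_interval(values, interval))]
    return flags


# A category made of two disjoint intervals: [0, 1) U [2, 3)
print(in_category([0.5, 1.5, 2.0, 3.0], [(0.0, 1.0), (2.0, 3.0)]))
# [True, False, True, False]
```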

popex.utils.compute_cat_prob(popex, weights, start=-1, stop=-1)

compute_cat_prob computes the weighted category probabilities.

The models are obtained from popex.model and weighted by weights.

Parameters:
  • popex (PoPEx) – PoPEx main structure (cf popex_objects.PoPEx)
  • weights (ndarray, shape=(nmod,)) – Relative weights of the models
  • start (int) –

    Defines the first model to take into account:

    • -1: For starting at 0
    • N: For starting at max(N, 0)
  • stop (int) –

    Defines the last model to take into account

    • -1: For stopping at popex.nmod
    • N: For stopping at min(N, popex.nmod)
Returns:

A tuple of instances that describes the category probabilities for all categorical model types.

If a model type i is not a subclass of CatMType, the corresponding map is set to None. If a model is given by (CatModel_1, …, CatModel_m) and the model values in CatModel_i are subdivided into ncat_i categories, then the return value is a (CatProb_1, …, CatProb_m) tuple where return[i].param_val is an ndarray with shape=(nparam_i, ncat_i).

Return type:

m-tuple
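
For a single categorical model type, the weighted category probabilities amount to a weighted average of one-hot category indicators. The sketch below assumes each model has already been reduced to its category indices (as CatParam.param_cat provides), ignores the start/stop range, and uses the hypothetical name weighted_cat_prob.

```python
def weighted_cat_prob(param_cats, weights, ncat):
    """Weighted category probabilities for one categorical model type.

    param_cats[k][j] is the category index of parameter j in model k;
    returns p with p[j][c] = sum_k w_k * 1(param_cats[k][j] == c) / sum_k w_k.
    """
    nparam = len(param_cats[0])
    total = sum(weights)
    p = [[0.0] * ncat for _ in range(nparam)]
    for cats, w in zip(param_cats, weights):
        for j, c in enumerate(cats):
            p[j][c] += w  # accumulate the weight of this model in category c
    return [[v / total for v in row] for row in p]


# Three models, two parameters, two categories
p = weighted_cat_prob([[0, 1], [0, 0], [1, 1]], weights=[0.5, 0.25, 0.25], ncat=2)
print(p)  # [[0.75, 0.25], [0.25, 0.75]]
```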

popex.utils.compute_entropy(p_cat)

compute_entropy computes the entropy of p_cat.

The entropy of a discrete probability distribution p = (p_1, …,p_s) is

H(p) = -sum_{i=1}^s p_i log( p_i ).

Therefore, if the probability map p_cat is an m-tuple such that p_cat[i].param_val is an ndarray of shape=(nparam_i, nfac_i), the entropy is also an m-tuple, where H[i].param_val is an ndarray of shape=(nparam_i,).

Notes

Note that t*log(t) -> 0 as t -> 0. Therefore, the term p_i(x) log( p_i(x) ) is set to 0 wherever p_i(x) = 0.

Parameters:
  • p_cat (m-tuple) – Tuple of CatProb instances with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns:

Tuple of entropy maps

return[i] : None or ContParam

Return value i is None if p_cat[i] is None, otherwise it is an instance of ContParam

Return type:

m-tuple

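
The convention 0 * log(0) = 0 is easy to honour by skipping zero probabilities. A minimal sketch for one model type, assuming the natural logarithm and a plain nested list p[j][c] in place of CatProb.param_val; entropy_map is a hypothetical name:

```python
import math


def entropy_map(p):
    """Entropy per parameter for a probability map p[j][c] (natural log assumed).

    Terms with p = 0 are skipped, i.e. 0 * log(0) is taken as 0.
    """
    return [-sum(pc * math.log(pc) for pc in row if pc > 0.0) for row in p]


h = entropy_map([[1.0, 0.0, 0.0],      # certain category: entropy 0
                 [0.5, 0.5, 0.0],      # two equally likely categories: log(2)
                 [1/3, 1/3, 1/3]])     # uniform over three categories: log(3)
```
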
popex.utils.compute_kld(p_cat, q_cat)

compute_kld computes the Kullback-Leibler divergence (KLD) between two category probability maps p_cat and q_cat.

The KLD between two discrete probability distributions p = (p_1, …,p_s) and q = (q_1, …,q_s) is

KLD(p||q) = sum_{i=1}^s p_i log( p_i / q_i).

Therefore, if the probability maps p_cat and q_cat are m-tuples such that p_cat[i].param_val and q_cat[i].param_val are ndarrays of shape=(nparam_i, nfac_i), the Kullback-Leibler divergence is also an m-tuple, where kld[i].param_val is an ndarray of shape=(nparam_i,).

Notes

Note that t*log(t/a) -> 0 as t -> 0. Therefore, we require that q_i(x) = 0 implies p_i(x) = 0, in which case the corresponding term is set to 0. However, due to the (inaccurate) numerical representation of the probability maps, it is possible that q_i(x) = 0 and p_i(x) > 0 (e.g. when q has been approximated from a relatively small set of models). In this case we enforce q_i(x) = p_i(x), which leads to kld(x) = 0.

Parameters:
  • p_cat (m-tuple) – Tuple of CatProb instances with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
  • q_cat (m-tuple) – Tuple of CatProb instances with q_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns:

Tuple of kld maps

return[i] : None or ContParam

Return value i is None if p_cat[i] is None, otherwise it is an instance of ContParam

Return type:

m-tuple
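
A per-location sketch of the KLD map for one model type, in pure Python with the natural logarithm (an assumption; the base is not stated). It reads "enforce q_i(x) = p_i(x)" as replacing q by p at the whole location x, which gives kld(x) = 0 and is consistent with note 2 of generate_hd(); kld_map is a hypothetical name.

```python
import math


def kld_map(p, q):
    """KLD(p||q) per parameter for maps p[j][c] and q[j][c] (natural log assumed)."""
    out = []
    for p_row, q_row in zip(p, q):
        if any(pc > 0.0 and qc == 0.0 for pc, qc in zip(p_row, q_row)):
            # q = 0 < p somewhere: treat q as equal to p here, so kld(x) = 0
            out.append(0.0)
            continue
        # terms with p = 0 contribute 0, since t * log(t / a) -> 0 as t -> 0
        out.append(sum(pc * math.log(pc / qc)
                       for pc, qc in zip(p_row, q_row) if pc > 0.0))
    return out


p = [[0.5, 0.5], [1.0, 0.0], [0.5, 0.5]]
q = [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
k = kld_map(p, q)  # identical rows, log(2), and a location with q = 0 < p
```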

popex.utils.compute_ncmod(popex, meth_w_hd=None)

compute_ncmod computes, for each model type, the number of conditioning points.

It is assumed that the number of hard data is restricted model type-wise. Therefore, the number of conditioning points is also computed model type-wise by sampling from a uniform random variable ~U(0, popex.ncmax[imtype]).

Notes

Note that if the total sum of the likelihood values in popex.p_lik is zero, ncmod is set to zero for each model type.

Parameters:
  • popex (PoPEx) – PoPEx main structure
  • meth_w_hd (dict) – Method to compute hard conditioning weights (cf. compute_w_lik())
Returns:

Number of conditioning points per model type

Return type:

m-tuple
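
A sketch of the per-model-type draw, assuming the number of conditioning points is an integer drawn uniformly from {0, …, ncmax[imtype]} (whether the upper bound is included is not stated above). sample_ncmod and its total_lik argument are hypothetical; total_lik stands for the sum of the likelihood values.

```python
import random


def sample_ncmod(ncmax, total_lik, rng):
    """Number of conditioning points per model type (integer, uniform on 0..ncmax[i]).

    Returns zeros when the likelihood values sum to zero, as stated above.
    """
    if total_lik == 0.0:
        return tuple(0 for _ in ncmax)
    return tuple(rng.randint(0, n_max) for n_max in ncmax)


rng = random.Random(0)
nc = sample_ncmod((10, 5), total_lik=1.0, rng=rng)       # one draw per model type
nc_zero = sample_ncmod((10, 5), total_lik=0.0, rng=rng)  # (0, 0)
```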

popex.utils.compute_subset_ind(p_frac, weights)

compute_subset_ind computes the smallest index set that covers a given fraction of the total weight.

This means that the subset indices ind are such that

np.sum(weights[ind]) >= p_frac * np.sum(weights),

or, in other words, weights[ind] covers at least a fraction p_frac of the total sum of weights.

Parameters:
  • p_frac (float) – Coverage fraction in (0, 1]
  • weights (ndarray, shape=(nw,)) – Non-negative weights
Returns:

Subset of indices

Return type:

list
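
One way to obtain such a smallest index set is to take the weights in decreasing order until the target fraction is reached. The sketch below assumes this greedy rule (the text above does not say how ties are broken); smallest_covering_subset is a hypothetical name.

```python
def smallest_covering_subset(p_frac, weights):
    """Indices of the largest weights whose sum reaches p_frac of the total.

    Taking weights in decreasing order yields a set of minimal size.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    total = 0.0
    for i in order:              # same summation order as the loop below,
        total += weights[i]      # so p_frac = 1 stops before zero weights
    target = p_frac * total
    ind, acc = [], 0.0
    for i in order:
        if acc >= target:
            break
        ind.append(i)
        acc += weights[i]
    return ind


ind = smallest_covering_subset(0.7, [0.125, 0.5, 0.25, 0.125])
print(ind)  # [1, 2]
```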

popex.utils.compute_w_lik(popex, meth=None)

compute_w_lik returns the set of normalized likelihood values.

In practice, when the likelihood values must be represented by a floating point number, it might be advantageous to compute approximations of L(m).

There are several approximation possibilities that are implemented in this version (specified in meth):

  1. No approximation (meth={‘name’: ‘exact’} or meth=None):

    L(m) = exp( ‘log_p_lik’ )

  2. Sqrt-unskewed (meth={‘name’: ‘exp_sqrt_log’})

    L(m) ~ exp( -sqrt(-‘log_p_lik’) )

  3. K-unskewed (meth={‘name’: ‘exp_sqrt_log’, ‘pow’: k})

    L(m) ~ exp( - (-‘log_p_lik’)^k )

  4. Inverse log (meth={‘name’: ‘inv_log’})

    L(m) ~ 1 / ( 1-‘log_p_lik’ )

  5. Inverse sqrt-log (meth={‘name’: ‘inv_sqrt_log’})

    L(m) ~ 1 / ( 1+sqrt(-‘log_p_lik’) ).

  6. Soft likelihood (meth={‘name’: ‘soft’, ‘fsigma’: fsigma})

    L(m) ~ exp(‘log_p_lik’/fsigma^2).

As mentioned above, these techniques aim to unskew the likelihood values.

Notes

This function is used in two different locations (possibly with two different approximation techniques): for the learning scheme in the PoPEx sampling and for computing predictions. While any approximation technique can be used in the first case, in the latter case an approximation might bias the computed weights.

Parameters:
  • popex (PoPEx) – PoPEx main structure
  • meth (dict) –

    Defines the approximation method to be used. Fields are:

    • 'name' : Name of the method (str)
    • 'pow' : Power for method 3, K-unskewed (float)
    • 'fsigma' : f-sigma parameter for method 6, ‘soft’ (float)
Returns:

Array of normalized weights

Return type:

ndarray, shape=(nmod,)
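
A minimal sketch of these approximations, assuming log-likelihood values <= 0 (so that -log_p_lik >= 0) and treating method 2 as 'exp_sqrt_log' with a default 'pow' of 0.5, which makes it a special case of method 3. normalized_lik_weights is a hypothetical name, not the package function. For the exact and soft variants the values are shifted by their maximum before exponentiating; the shift cancels in the normalization but avoids underflow.

```python
import math


def normalized_lik_weights(log_p_lik, meth=None):
    """Normalized weights L(m) / sum L from log-likelihood values.

    Assumptions: log_p_lik <= 0, and 'exp_sqrt_log' without 'pow' means k = 0.5.
    """
    meth = meth or {'name': 'exact'}
    name = meth['name']
    if name in ('exact', 'soft'):
        scale = 1.0 if name == 'exact' else meth['fsigma'] ** 2
        s = [lik / scale for lik in log_p_lik]
        s_max = max(s)
        w = [math.exp(v - s_max) for v in s]  # shift cancels after normalization
    elif name == 'exp_sqrt_log':
        k = meth.get('pow', 0.5)
        w = [math.exp(-((-lik) ** k)) for lik in log_p_lik]
    elif name == 'inv_log':
        w = [1.0 / (1.0 - lik) for lik in log_p_lik]
    elif name == 'inv_sqrt_log':
        w = [1.0 / (1.0 + math.sqrt(-lik)) for lik in log_p_lik]
    else:
        raise ValueError(f"unknown method {name!r}")
    total = sum(w)
    return [v / total for v in w]


log_lik = [0.0, -1.0, -4.0]
w_exact = normalized_lik_weights(log_lik)
w_sqrt = normalized_lik_weights(log_lik, {'name': 'exp_sqrt_log'})
w_inv = normalized_lik_weights(log_lik, {'name': 'inv_log'})
```

The unskewing effect is visible in the ratios: under the exact likelihood the best model is e^4 times heavier than the worst, under 'inv_log' only five times.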

popex.utils.compute_w_pred(popex, nw_min=0, ibnd=-1, meth=None)

compute_w_pred returns the set of normalized predictive weights.

To ensure a minimum number of effective weights, they are computed such that

ne(w_pred) = max(nw_min, ne(w))

where w contains the weights associated to the models and ne(w) denotes the number of effective weights. This quantity can be modified by replacing w with w^alpha, where alpha > 0. A 1-d optimisation problem is used to compute the optimal alpha value.

Parameters:
  • popex (PoPEx) – PoPEx main structure
  • nw_min (int) – Minimum number of effective weights (= l_0)
  • ibnd (int) – Length of the weight array
  • meth (dict) –

    Defines the approximation method to be used (cf. compute_w_lik()). Fields are:

    • 'name' : Name of the method (str)
    • 'pow' : Power for method 3, K-unskewed (float)
    • 'fsigma' : f-sigma parameter for method 6, ‘soft’ (float)
Returns:

Array of predictive weights

Return type:

ndarray, shape=(nmod,)

popex.utils.generate_hd(popex, meth_w_hd, ncmod, kld, p_cat, q_cat)

generate_hd generates the hard conditioning data set that is used to sample a new model.

This set of hard conditioning data does NOT include prior hard conditioning. For each model type (imtype), every hard conditioning point is obtained by the following two steps:

  1. Sample a location [j] according to the values in the Kullback-Leibler divergence map (i.e. the values in kld[imtype].param_val)
  2. Sample a model [k] according to the weights from compute_w_lik() and directly extract the hard conditioning value from popex.model[k][imtype].param_val[j].

In addition to the hard conditioning, this function also extracts probability values from q_cat and p_cat at the conditioning location. These values represent the prior/weighted category probability of the category that corresponds to popex.model[k][imtype].param_val[j]. They can be useful to compute the sampling weight ratio.

Notes

There are two important things to note:

  1. The two objects hd_prior and hd_generation are the corresponding prior and weighted probability values of the hard conditioning CATEGORY that corresponds to the values in hd_param_val. Therefore, if they are used in the computation of the sampling weight ratio, one uses CATEGORY probabilities and NOT value probabilities.
  2. Numerical imperfections (for example in the computation of ‘q_cat’) can cause locations where p_cat > 0 but q_cat = 0. In the computation of the Kullback-Leibler divergence, the corresponding kld values are set to 0 (by enforcing q_i(x) = p_i(x)), and therefore such locations can never be sampled and conditioned.
Parameters:
  • popex (PoPEx) – PoPEx main structure
  • meth_w_hd (dict) – Method to compute hard conditioning weights (cf. compute_w_lik())
  • ncmod (m-tuple) – Number of conditioning points per model type
  • kld (m-tuple) – Tuple of ContParam instances defining the Kullback-Leibler divergence
  • p_cat (m-tuple) – Tuple of CatProb instances defining the weighted category probabilities with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
  • q_cat (m-tuple) – Tuple of CatProb instances defining the prior category probabilities with q_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns:

  • hd_param_ind (m-tuple) – Tuple of hard conditioning indices where hard conditioning values are imposed. If there is no hard conditioning for a model type i, then hd_param_ind[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
  • hd_param_val (m-tuple) – Tuple of hard conditioning values that are imposed at the hard conditioning indices. If there is no hard conditioning for a model type i, then hd_param_val[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
  • hd_prior (m-tuple) – Tuple of probability values according to the prior probability maps in q_cat. Each value corresponds to the prior probability of the category that contains the extracted hard conditioning value. If there is no hard conditioning for a model type i, then hd_prior[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
  • hd_generation (m-tuple) – Tuple of probability values according to the sampling probability maps in p_cat. Each value corresponds to the sampling probability of the category that contains the extracted hard conditioning value. If there is no hard conditioning for a model type i, then hd_generation[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
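
The two sampling steps for a single model type can be sketched with the standard library. This is an illustrative version under stated assumptions: points are drawn with replacement, the hd_prior/hd_generation outputs are omitted, and sample_hd_one_type is a hypothetical name.

```python
import random


def sample_hd_one_type(kld, weights, model_vals, ncmod, rng):
    """Two-step sampling of ncmod hard conditioning points for one model type.

    kld[j]           -- KLD value at parameter j (used as location weight)
    weights[k]       -- likelihood weight of model k (cf. compute_w_lik())
    model_vals[k][j] -- value of parameter j in model k
    Sampling is done with replacement (an assumption of this sketch).
    """
    if ncmod == 0 or sum(kld) == 0.0:
        return None, None          # no hard conditioning for this model type
    ind, val = [], []
    for _ in range(ncmod):
        j = rng.choices(range(len(kld)), weights=kld)[0]          # 1. location ~ kld
        k = rng.choices(range(len(weights)), weights=weights)[0]  # 2. model ~ weights
        ind.append(j)
        val.append(model_vals[k][j])  # copy the value of model k at location j
    return ind, val


rng = random.Random(1)
kld = [0.0, 1.0, 0.0]                 # only location 1 is informative
weights = [0.0, 1.0]                  # only model 1 carries weight
model_vals = [[10, 20, 30], [11, 21, 31]]
hd_ind, hd_val = sample_hd_one_type(kld, weights, model_vals, ncmod=3, rng=rng)
print(hd_ind, hd_val)  # [1, 1, 1] [21, 21, 21]
```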

popex.utils.list_mCatParam_to_mCatProb(list_mCatParam)

list_mCatParam_to_mCatProb converts a list of m-tuples, each m-tuple containing CatParam instances, to an m-tuple of CatProb instances.

Parameters:
  • list_mCatParam (list) – list of m-tuples of CatParam objects
Returns:

m-tuple where each element is a CatProb instance

Return type:

m-tuple

popex.utils.merge_hd(hd_param_ind_1, hd_param_ind_2, hd_param_val_1, hd_param_val_2)

merge_hd merges two sets of hard conditioning data.

It is assumed that hd_param_ind_i[imtype] is None if and only if hd_param_val_i[imtype] is None.

Parameters:
  • hd_param_ind_1 (m-tuple) – First set of hard conditioning indices
  • hd_param_ind_2 (m-tuple) – Second set of hard conditioning indices
  • hd_param_val_1 (m-tuple) – First set of hard conditioning values
  • hd_param_val_2 (m-tuple) – Second set of hard conditioning values
Returns:

  • hd_ind (m-tuple) – Merged set of hard conditioning indices
  • hd_par (m-tuple) – Merged set of hard conditioning values
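
A minimal sketch of the merge, assuming that merging means concatenating the index and value arrays per model type (duplicates are not removed) and that a None entry simply contributes nothing. merge_hd_sketch is hypothetical and works on lists rather than ndarrays.

```python
def merge_hd_sketch(ind_1, ind_2, val_1, val_2):
    """Merge two m-tuples of hard conditioning data by concatenation per model type."""
    def cat(a, b):
        # None means "no hard conditioning" for this model type
        if a is None:
            return b
        if b is None:
            return a
        return list(a) + list(b)

    ind = tuple(cat(a, b) for a, b in zip(ind_1, ind_2))
    val = tuple(cat(a, b) for a, b in zip(val_1, val_2))
    return ind, val


# Two model types; the second one has no conditioning in either set
ind, val = merge_hd_sketch(([0, 2], None), ([5], None),
                           ([1.2, 2.5], None), ([0.7], None))
```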

popex.utils.update_cat_prob(p_cat, m_new, w_new, sum_w_old)

update_cat_prob updates (in place) the category probabilities.

If p_cat_old represents the old category probability maps, then we have

p_cat_new = [sum_w_old*p_cat_old + sum_i w_new_i*1(m_new_i)] / [sum_w_old + sum(w_new)]

where 1(m_new_i) is the categorical indicator of the model i.

Parameters:
  • p_cat (m-tuple) – Tuple of categorical probability maps (cf. compute_cat_prob())
  • m_new (list) – List of m-tuples defining a set of nmod models
  • w_new (ndarray, shape=(nmod,)) – Weights associated to the new models
  • sum_w_old (float) – Old weight normalization constant
Returns:

Return type:

None
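
The update formula for one model type can be sketched as below. Unlike update_cat_prob(), this hypothetical helper returns a new map instead of updating in place, and it represents each new model by its category indices, so that 1(m_new_i) is a one-hot vector.

```python
def update_cat_prob_sketch(p_old, new_cats, w_new, sum_w_old):
    """Return updated probabilities p[j][c] following the update formula above.

    new_cats[i][j] is the category index of parameter j in new model i.
    """
    denom = sum_w_old + sum(w_new)
    p_new = [[sum_w_old * pc for pc in row] for row in p_old]  # sum_w_old * p_cat_old
    for cats, w in zip(new_cats, w_new):
        for j, c in enumerate(cats):
            p_new[j][c] += w                                   # + w_new_i * 1(m_new_i)
    return [[v / denom for v in row] for row in p_new]


p_new = update_cat_prob_sketch([[0.5, 0.5]], new_cats=[[0]], w_new=[2.0], sum_w_old=2.0)
print(p_new)  # [[0.75, 0.25]]
```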

popex.utils.write_hd_info(popex, imod, hd_param_ind, hd_param_val)

write_hd_info writes the hard conditioning data that have been derived for generating a specific model to a text file.

The text file is saved at popex.path_res with the following structure:
<popex.path_res>$
└– hd
   └– hd_modXXXXXX.txt
Parameters:
  • popex (PoPEx) – PoPEx main structure
  • imod (int) – Model index
  • hd_param_ind (m-tuple) – Hard conditioning indices
  • hd_param_val (m-tuple) – Hard conditioning values
Returns:

Return type:

None

popex.utils.write_run_info(pb, popex, imod, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)

write_run_info writes some algorithm specific information to a text file.

The text file is saved at popex.path_res with the following structure:
<popex.path_res>$
└– run_info.txt
Parameters:
  • pb (Problem) – Defines the problem functions and parameters
  • popex (PoPEx) – PoPEx main structure
  • imod (int) – Model index
  • log_p_lik (float) – Log-likelihood value of the model
  • cmp_log_p_lik (bool) – Indicates if likelihood has been computed (True) or predicted (False)
  • log_p_pri (float) – Prior log-probability of the model
  • log_p_gen (float) – Sampling log-probability of the model
  • ncmod (m-tuple) – Model type specific number of conditioning points used
Returns:

Return type:

None

n_e utilities

isampl.neff is a module that implements the most common importance sampling diagnostics used in the PoPEx procedure, namely:

  • ne(): Effective number of weights for estimating the expectation
  • ne_var(): Effective number of weights for estimating the variance
  • ne_gamma(): Effective number of weights for estimating the skewness
  • alpha(): Optimization for finding the weight correction exponent
  • correct_w(): Computes the set of corrected weights
  • find_fsigma(): Optimization for finding the f-sigma parameter of the soft likelihood
popex.isampl.neff.alpha(weights, theta, a_init=1)

alpha computes the best alpha for lowering the skewness of the weights.

Let k_1 be the number of zero weights and k_2 the number of weights that attain the maximum value in ‘weights’. For a given theta in the interval (k_2, n-k_1), this function finds the unique alpha such that

neff.ne(weights ** alpha) = theta.

The alpha is found by using the ‘L-BFGS-B’ optimization algorithm of scipy.optimize with initial value equal to a_init. We set the lower and upper bounds to ALPHA_MIN and ALPHA_MAX, respectively. The maximum number of iterations is 10.

Parameters:
  • weights (ndarray, shape=(n,)) – Set of weights
  • theta (float) – A positive value for the effective number of weights
  • a_init (float) – Initial guess for the optimization problem
Returns:

alpha value

Return type:

float
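
The root-finding problem can also be sketched without SciPy. The sketch below swaps L-BFGS-B for plain bisection, relying on the fact that ne(w^alpha) does not increase with alpha; lo and hi are hypothetical stand-ins for ALPHA_MIN and ALPHA_MAX, whose actual values are not given here. Weights are first divided by their maximum, which leaves ne unchanged but keeps w^alpha in [0, 1].

```python
def ne(weights):
    """Kish's effective number of weights."""
    s1 = sum(weights)
    return s1 * s1 / sum(w * w for w in weights)


def find_alpha(weights, theta, lo=1e-6, hi=1e3, tol=1e-10, max_iter=200):
    """Return alpha such that ne(weights ** alpha) is approximately theta.

    Bisection stand-in for L-BFGS-B; theta must lie between ne at hi and ne at lo.
    """
    w_max = max(weights)
    scaled = [w / w_max for w in weights]  # ne is scale-invariant; avoids overflow

    def gap(a):
        return ne([w ** a for w in scaled]) - theta

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:  # too many effective weights: sharpen with a larger alpha
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


weights = [1.0, 0.5, 0.25, 0.0]   # n = 4, k_1 = 1, k_2 = 1, so theta in (1, 3)
a = find_alpha(weights, theta=2.0)
```

Here ne(weights) is about 2.33, so reaching theta = 2 requires an alpha above 1, i.e. the weights are made more skewed; a theta above 2.33 would give an alpha below 1.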

popex.isampl.neff.correct_w(weights, ne_w_corr)

correct_w is a function that lowers the skewness of the weights.

Compared to alpha(), it additionally checks whether the correction is possible and handles the exceptional cases accordingly.

Parameters:
  • weights (ndarray, shape=(n,)) – Set of weights
  • ne_w_corr (float) – A positive value for the effective number of weights
Returns:

w_corr – Corrected set of weights that is also normalized

Return type:

ndarray, shape=(n,)

popex.isampl.neff.find_fsigma(popex, theta, fsigma_max, ibnd)

find_fsigma computes the best fsigma for the soft likelihood.

Let k_1 be the number of zero weights and k_2 the number of weights that attain the maximum value in ‘weights’. For a given theta in the interval (k_2, n-k_1), this function finds the unique fsigma such that

neff.ne(weights(soft_lik(log_p_lik, fsigma))) = theta.

fsigma is found by using the ‘L-BFGS-B’ optimization algorithm of scipy.optimize with initial value equal to f_sigma_init. The lower bound is set to FSIGMA_MIN and the upper bound is given by the argument fsigma_max.

Parameters:
  • popex (PoPEx) – PoPEx main structure
  • theta (float) – A positive value for the effective number of weights
  • fsigma_max (float) – Value greater than 1, upper bound for fsigma
  • ibnd (int) – Include only the first ibnd iterations
Returns:

fsigma value

Return type:

float

popex.isampl.neff.ne(weights)

ne computes the effective number of weights.

Kish’s effective number of weights is computed as

n_e(w_1, …, w_n) = ( sum w_i )^2 / ( sum w_i^2 ).

Parameters:
  • weights (ndarray, shape=(n,)) – Set of weights.
Returns:

Effective number of weights.

Return type:

float

popex.isampl.neff.ne_gamma(weights)

ne_gamma computes the effective number of weights for the skewness.

The effective number of weights for estimating the skewness is computed as:

n_e(w_1, …, w_n) = ( sum w_i^2 )^3 / ( ( sum w_i^3 )^2 ).

Parameters:
  • weights (ndarray, shape=(n,)) – Set of weights.
Returns:

Effective number of weights (for the skewness).

Return type:

float

popex.isampl.neff.ne_var(weights)

ne_var computes the effective number of weights for estimating the variance.

The effective number of weights for an empirical estimation of the variance is computed as:

n_e(w_1, …, w_n) = ( sum w_i^2 )^2 / ( sum w_i^4 ).

Parameters:
  • weights (ndarray, shape=(n,)) – Set of weights.
Returns:

Effective number of weights (for the variance).

Return type:

float
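
The three diagnostics translate directly into code. A minimal pure-Python sketch; the names mirror the functions above, but these are standalone re-implementations of the stated formulas, not imports from popex.isampl.neff:

```python
def ne(w):
    """Effective number of weights for the expectation (Kish)."""
    return sum(w) ** 2 / sum(x ** 2 for x in w)


def ne_var(w):
    """Effective number of weights for the variance."""
    return sum(x ** 2 for x in w) ** 2 / sum(x ** 4 for x in w)


def ne_gamma(w):
    """Effective number of weights for the skewness."""
    return sum(x ** 2 for x in w) ** 3 / sum(x ** 3 for x in w) ** 2


equal = [1.0] * 4
print(ne(equal), ne_var(equal), ne_gamma(equal))  # 4.0 4.0 4.0
```

For equal weights all three equal n, and for a single non-zero weight all three equal 1, the two extremes of the range.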

Classes

popex_objects.py contains the PoPEx-specific class definitions.

Main structure
  • PoPEx: Main class for any PoPEx simulation. It contains the model chain and any corresponding probability measures.
Sampling definitions
  • Problem: Defines the sampling parameters and functions
  • Learning: Learning scheme for learning the likelihood values
  • Prediction: Defines the prediction parameters and functions
Classes associated to a model type
  • MType: (abstract) Parent class for each map associated to a model type
  • ContParam: (inherits from MType) Class for each map that is associated to the model types but not to categories (e.g. kld[imtype], entropy[imtype])
  • CatMType: (abstract, inherits from MType) Parent class for each map that is associated to categories
  • CatProb: (inherits from CatMType) This class is used for the representation of probability distributions over categories (e.g. p_cat[imtype], q_cat[imtype])
  • CatParam: (inherits from CatMType) This class is used for the representation of categorized parameter values (e.g. model[j][imtype])
class popex.popex_objects.CatMType(dtype_val='float64', param_val=None, categories=None)

This class is the abstract parent of any quantity associated to a categorical model type.

Parameters:
  • dtype_val (str) – Type of the ndarray values (e.g. ‘int8’, ‘float32’, ‘float64’, etc.)
  • param_val (ndarray) – Values associated to the parameters
  • categories (list) –

    List of size ncat. Each element of the list is again a list of 2-tuples that defines the value range of the category.

    If categories[i] = [(v_1, v_2), (v_3, v_4)], where v_j are real values, then the category i is defined by the union

    [v_1, v_2) U [v_3, v_4)
ncat

Number of categories.

Returns:

Number of categories in categories

Return type:

int

class popex.popex_objects.CatParam(dtype_val='float64', param_val=None, categories=None)

This class is used to define a categorized 1-dimensional parameter map that is associated to a model type (e.g. model[j][imtype]). The categories of the model parameters in param_val are indicated in param_cat. These categories are automatically updated if param_val or categories change.

Notes

The shape of param_val and param_cat is shape=(nparam,).

param_cat

Category indicator array.

Returns:

Category indicators of the values in param_val.

Return type:

ndarray, shape=(nparam,)

class popex.popex_objects.CatProb(dtype_val='float64', param_val=None, categories=None)

This class is used to define a map of continuous values for each category, where each value is associated to a model parameter (e.g. p_cat, q_cat, etc.).

Notes

The shape of param_val is shape=(nparam, ncat).

class popex.popex_objects.ContParam(dtype_val='float64', param_val=None)

This class is used to define a map of continuous values where each value is associated to a model parameter (e.g. entropy, kld, etc.).

Notes

The shape of param_val is (nparam,).

class popex.popex_objects.Learning

Learning defines an abstract parent class for a learning scheme.

Let’s assume that we want to define a learning scheme that predicts the log-likelihood of a model. In this case we define an explicit subclass of Learning and provide implementations of the methods compute_p_eval_for(), learn_value_of() and train().

It is assumed that there is a choice between ‘evaluating the exact answer’ (which is very expensive) and ‘predicting the answer by a machine learning scheme’ (which should be very fast). The learning scheme undergoes the following main steps:

  • Update the learning scheme regularly by using the function train() (cf. upd_ls_freq in algorithm.run_popex_mp). Note that here you can choose to only use likelihood values that have actually been computed (cf. PoPEx.cmp_log_p_lik).
  • For a given instance, compute a probability p in [0, 1] that decides whether the log-likelihood is evaluated exactly or predicted (cf. compute_p_eval_for()); if it is predicted, the value is obtained from learn_value_of().

Notes

In the PoPEx framework this can be used to learn the log-likelihood values for each model (=value of interest). In this regard, predicting a value rather than computing it can considerably improve the overall computational time.

compute_p_eval_for(model)

Computes and returns a probability value in [0, 1] that determines whether a model should be evaluated exactly.

The two extreme values signify:

  • p=0: Value can be learned from the learning scheme
  • p=1: Value should be evaluated exactly
Parameters:
  • model (m-tuple) – Tuple of MType instances that define the new model
Returns:

Probability value in [0, 1]

Return type:

float

learn_value_of(model)

Uses the existing learning scheme to learn the value of interest for an instance.

Notes

This function should raise an error if there is no existing learning scheme.

Parameters:
  • model (m-tuple) – Tuple of MType instances that define the new model
Returns:

Predicted log-likelihood value

Return type:

float

train(popex)

This method creates a learning scheme.

The learning scheme can be stored as an attribute of the instance.

Parameters:
  • popex (PoPEx) – PoPEx main structure (cf popex_objects.PoPEx)
Returns:

Return type:

None

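
A toy concrete subclass illustrates the three methods. Everything here is hypothetical: the nearest-neighbour rule, the max_dist scale, and the assumption that each model type exposes its values through param_val (as documented for MType). The import falls back to a stand-in base class when PoPEx is not installed, and the usage part replaces a real PoPEx instance with a SimpleNamespace carrying model, log_p_lik and cmp_log_p_lik.

```python
import math
from types import SimpleNamespace

try:  # use the real base class when PoPEx is installed
    from popex.popex_objects import Learning
except ImportError:  # minimal stand-in so the sketch also runs on its own
    class Learning:
        pass


class NearestNeighbourLearning(Learning):
    """Hypothetical scheme: predict log_p_lik from the closest exactly computed model."""

    def __init__(self, max_dist=1.0):
        self.max_dist = max_dist  # hypothetical distance scale for compute_p_eval_for
        self.data = []            # (flattened model, log_p_lik) pairs

    @staticmethod
    def _flatten(model):
        # concatenate param_val of all model types into one vector
        return [v for mtype in model for v in mtype.param_val]

    def train(self, popex):
        # keep only log-likelihoods that were computed, not predicted
        self.data = [(self._flatten(m), lik)
                     for m, lik, cmp in zip(popex.model, popex.log_p_lik,
                                            popex.cmp_log_p_lik) if cmp]

    def _nearest(self, model):
        x = self._flatten(model)
        return min((math.dist(x, y), lik) for y, lik in self.data)

    def compute_p_eval_for(self, model):
        if not self.data:
            return 1.0  # nothing learned yet: always evaluate exactly
        dist, _ = self._nearest(model)
        return min(1.0, dist / self.max_dist)

    def learn_value_of(self, model):
        if not self.data:
            raise RuntimeError("no learning scheme available; call train() first")
        return self._nearest(model)[1]


# Usage with a stand-in PoPEx object (one model type, two parameters)
def mt(vals):
    return SimpleNamespace(param_val=vals)


fake_popex = SimpleNamespace(
    model=[(mt([0.0, 0.0]),), (mt([1.0, 1.0]),), (mt([5.0, 5.0]),)],
    log_p_lik=[-1.0, -2.0, -3.0],
    cmp_log_p_lik=[True, True, False])
scheme = NearestNeighbourLearning(max_dist=1.0)
p_before = scheme.compute_p_eval_for((mt([0.0, 0.0]),))
scheme.train(fake_popex)
```

The third model is excluded from training because its log-likelihood was predicted, which corresponds to the option of using only computed values mentioned above.
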
class popex.popex_objects.MType(dtype_val='float64', param_val=None)

This class is the parent of any quantity associated to a model type.

Parameters:
  • dtype_val (str) – Type of the ndarray values (e.g. ‘int8’, ‘float32’, ‘float64’, etc.)
  • param_val (ndarray) – Values associated to the parameters
nparam

Number of parameters.

Returns:

Number of values in param_val

Return type:

int

class popex.popex_objects.PoPEx(model=None, log_p_lik=array([], dtype=float64), cmp_log_p_lik=array([], dtype=bool), log_p_pri=array([], dtype=float64), log_p_gen=array([], dtype=float64), ncmax=(0, ), nc=None, nmtype=1, path_res='~/')

Main class for any PoPEx simulation.

This class is the main object for the PoPEx algorithm. It contains all the models, likelihood, log-prior and log-generation information of a PoPEx run.

Parameters:
  • model (list) –

    List of models

    model[j] : m-tuple
    Tuple of MType instances
  • log_p_lik (ndarray, shape=(nmod,)) – Natural logarithm of likelihood measure
  • cmp_log_p_lik (ndarray, shape=(nmod,)) – Boolean indicator whether the log-likelihood value has been computed (True) or predicted (False)
  • log_p_pri (ndarray, shape=(nmod,)) – Natural logarithm of prior measure value
  • log_p_gen (ndarray, shape=(nmod,)) – Natural logarithm of sampling measure value
  • ncmax (m-tuple) – Maximum number of conditioning points for each model type
  • nc (list) –

    List of m-tuples

    nc[j] : m-tuple
    Contains the number of conditioning points used in the generation of model[j]
  • nmtype (int) – Number of model types
  • path_res (str) – Path of the results
add_model(imod, model, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)

Appends a new model at the end of the model list and updates the measure arrays.

Parameters:
  • imod (int) – Model index
  • model (m-tuple) – Tuple of MType instances defining a new model
  • log_p_lik (float) – Log-likelihood value of model
  • cmp_log_p_lik (bool) – Indicates if the log-likelihood has been computed (True) or predicted (False)
  • log_p_pri (float) – Log-prior value of model
  • log_p_gen (float) – Log-generation value of model
  • ncmod (m-tuple) – Defines the number of conditioning points that have been used in the generation of the model
Returns:

Return type:

None

insert_model(loc, imod, model, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)

Inserts a new model at position loc in the model list and updates the measure arrays.

Parameters:
  • loc (int) – Location of the insertion
  • imod (int) – Model index
  • model (m-tuple) – Tuple of MType instances defining a model
  • log_p_lik (float) – Log-likelihood value of model
  • cmp_log_p_lik (bool) – Indicates if the log-likelihood has been computed (True) or predicted (False)
  • log_p_pri (float) – Log-prior value of model
  • log_p_gen (float) – Log-generation value of model
  • ncmod (m-tuple) – Defines the number of conditioning points that have been used in the generation of the model
Returns:

Return type:

None

nmod

Number of models

Returns:

Number of models in model

Return type:

int

class popex.popex_objects.Prediction(compute_pred=None, meth_w_pred=None, nw_min=None, wfrac_pred=1.0)

Defines a prediction that should be computed based on an existing PoPEx instance.

The user must provide a definition of compute_pred, which implements the prediction operator. Note that no return value is expected from this function. Any important result can be saved under

<path_res>$
└– solution
   └– pred_<name>_modXXXXXX
Parameters:
  • compute_pred (function) –

    Computes the prediction for a given model.

    compute_pred(popex, imod)

    Parameters:
    • popex : PoPEx
      PoPEx main structure
    • imod : int
      Model index
    Returns:
    None
  • meth_w_pred (dict) – Defines the method used for computing the prediction weights (cf. popex.utils.compute_w_lik())
  • nw_min (float) – Minimum number of effective weights (= l_0)
  • wfrac_pred (float) – Number in (0, 1] defining the fraction of the total weight to be used for the prediction. If wfrac_pred = 1, the predictions of all models with non-zero weight are computed. If wfrac_pred < 1, only the minimum number of models whose weights cover a fraction wfrac_pred of the total sum of weights is used.
class popex.popex_objects.Problem(generate_m, compute_log_p_lik, get_hd_pri, compute_log_p_pri=None, compute_log_p_gen=None, learning_scheme=None, meth_w_hd=None, nmtype=1, seed=0)

Defines the sampling problem that should be addressed by the PoPEx method.

The user must provide function definitions for generate_m and compute_log_p_lik. For the definition of the model space, ‘prior hard conditioning’ can also be provided through the function get_hd_pri. Optionally, a likelihood learning scheme can be defined in learning_scheme. Furthermore, one must also define how to compute the ratio in the importance sampling weights. For this, the functions compute_log_p_pri and compute_log_p_gen can be defined manually. If they are left undefined, a default version that only considers the hard conditioning data points is used.

Parameters:
  • generate_m (function) –

    Generates a new model m from a set of hard conditioning data.

    generate_m(hd_param_ind, hd_param_val, imod)

    Parameters:
    • hd_param_ind : m-tuple
      For each instance in the model tuple, this variable defines the hard conditioning INDICES (where to apply HD). hd_param_ind[i] is an ndarray of shape=(nhd_i,)
    • hd_param_val : m-tuple
      For each instance in the model tuple, this variable defines the hard conditioning VALUES (what to impose). hd_param_val[i] is an ndarray of shape=(nhd_i,)
    • imod : int
      Model index
    Returns:
    m-tuple
    New model such as (CatParam_1, …, CatParam_m)
  • compute_log_p_lik (function) –

    Computes the natural logarithm of the likelihood of a model. It usually runs an expensive forward operation and compares the response to a given set of observations.

    compute_log_p_lik(model, imod)

    Parameters:
    • model : m-tuple
      Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
    • imod : int
      Model index
    Returns:
    float
    Log-likelihood value of the model
  • get_hd_pri (function) –

    Provides the ‘prior hard conditioning’ that is used in the definition of the model space (i.e. parameter values that are known without uncertainty).

    get_hd_pri()

    Returns:
    • hd_pri_ind : m-tuple
      For each instance in the model tuple, this variable defines the hard conditioning INDICES.
    • hd_pri_val : m-tuple
      For each instance in the model tuple, this variable defines the hard conditioning VALUES.
  • compute_log_p_pri (function, optional) –

    This function computes the log-prior probability of a model that has been generated from a given set of hard conditioning data. Note that its definition is OPTIONAL. If it is left undefined, a default implementation will be used (see remark below).

    compute_log_p_pri(model, hd_p_pri, hd_param_ind)

    Parameters:
    • model : m-tuple
      Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
    • hd_p_pri : m-tuple
      Tuple of the hard conditioning probability values for a given model. Each probability value describes the prior probability of observing the category of the model value at the corresponding conditioning location. hd_p_pri[i] is an ndarray of shape=(nhd_i,)
    • hd_param_ind : m-tuple
      Hard conditioning indices
    Returns:
    float
    Log-prior probability value
  • compute_log_p_gen (function, optional) –

    This function computes the log-probability of generating a model in the PoPEx sampling from a given set of hard conditioning data. Note that its definition is OPTIONAL. If it is left undefined, a default implementation will be used (see remark below).

    compute_log_p_gen(model, hd_p_gen, hd_param_ind)

    Parameters:
    • model : m-tuple
      Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
    • hd_p_gen : m-tuple
      Tuple of the hard conditioning probability values for a given model. Each probability value describes the sampling (generation) probability of observing the category of the model value at the corresponding conditioning location. hd_p_gen[i] is an ndarray of shape=(nhd_i,)
    • hd_param_ind : m-tuple
      Hard conditioning indices
    Returns:
    float
    Log-generation probability value
  • learning_scheme (Learning, optional) – Learning scheme for log_p_lik (concrete subclass of Learning)
  • meth_w_hd (dict, optional) – Defines the method for computing the learning weights that are used in the computation of the hard conditioning points (cf. compute_w_lik())
  • nmtype (int) – Number of model types
  • seed (int) – Initial seed

Notes

  1. Let us provide a simple example of hard conditioning data in hd_param_ind and hd_param_val. It is important to note that PoPEx does NOT use any parameter locations. They may be defined by the user; if so, they have to follow a certain structure. Let the parameter locations be such that:

                              #  x    y    z
    param_loc[0] = np.array([[0.5, 1.5, 0.5],   # Parameter 0
                             [0.5, 2.5, 0.5],   # Parameter 1
                             [0.5, 3.5, 0.5]])  # Parameter 2
    

    and the parameter indices (in hd_param_ind) are for example:

    hd_param_ind[0] = [0, 2]    # Condition parameters 0 and 2
    

    so we will use param_loc[0][hd_param_ind[0], :] for obtaining the array:

    np.array([[0.5, 1.5, 0.5],
              [0.5, 3.5, 0.5]])
    

    This array indicates the physical locations where hard conditioning should be applied for the model type 0. Let the parameter values (in hd_param_val) be given by:

    hd_param_val[0] = np.array([1.2, 2.5])
    

    Together with the conditioning locations above, this imposes hard conditioning data as follows:

     x     y     z     val
    0.5   1.5   0.5    1.2
    0.5   3.5   0.5    2.5
    
  2. Note that it is possible to NOT define compute_log_p_pri and compute_log_p_gen. In this case, a predefined function will be used. This predefined implementation assumes that the quantities p_pri and p_gen are only used TOGETHER in the form of a RATIO

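    ratio(m) = rho(m) / phi(m),

    where rho(m) is the prior probability and phi(m) the sampling (generation) probability of the model m.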

    In other words, the default functions assume that we are only interested in the DIFFERENCE of the log values, i.e.

    log_p_pri - log_p_gen,

    and never in the exact values on their own. It is left to the user to implement a more suitable computation, whenever the above assumption is not sufficient. For more information also consult the theoretical description of the PoPEx method.

  3. It is also possible to NOT define the learning_scheme. In this case, the log-likelihood value will ALWAYS be computed.
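
The pairing of locations and values in note 1 can be reproduced with plain Python lists; the note itself uses NumPy arrays, where param_loc[0][hd_param_ind[0], :] performs the row selection. The variable names follow the note, while hd_table is a hypothetical name for the resulting rows.

```python
# Locations of the three parameters of model type 0 (x, y, z), as in note 1
param_loc = [[[0.5, 1.5, 0.5],
              [0.5, 2.5, 0.5],
              [0.5, 3.5, 0.5]]]
hd_param_ind = ([0, 2],)          # condition parameters 0 and 2 of model type 0
hd_param_val = ([1.2, 2.5],)      # values imposed at those parameters

imtype = 0
# One row (x, y, z, val) per conditioned parameter
hd_table = [(*param_loc[imtype][j], val)
            for j, val in zip(hd_param_ind[imtype], hd_param_val[imtype])]
for x, y, z, val in hd_table:
    print(f"{x:5.1f} {y:5.1f} {z:5.1f} {val:6.1f}")
```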