API of PoPEx Package¶

The API Documentation of the package

popex

Algorithm¶

Utilities¶

Generic PoPEx utilities¶

utils.py contains utilities for performing a PoPEx sampling and computing predictions:

Category probabilities and kld maps

compute_cat_prob(): Computes probability maps according to categories
update_cat_prob(): Updates probability maps according to categories
compute_entropy(): Computes entropy of a probability map
compute_kld(): Computes kld of two probability maps

Hard conditioning data

generate_hd(): Computes the new hard conditioning data
merge_hd(): Merges prior and new hard conditioning data
compute_ncmod(): Computes the number of conditioning points per model type
compute_w_lik(): Computes the likelihood weights (used for the hard conditioning maps)

Generic functions

compute_w_pred(): Computes the weights for the predictions
compute_subset_ind(): Computes the smallest number of indices that cover a given percentage of a total weight
write_hd_info(): Writes/saves hd information about each model
write_run_info(): Appends information about models to run info file

popex.utils.check_category(list_values, category)¶

check_category verifies for each value if it falls in category

Parameters:

list_values (ndarray) – List of scalars
category (list) – List of 2-tuples

Returns:	The i-th element is True if list_values[i] belongs to the category and False otherwise. The category is defined as the union of all intervals given by the 2-tuples (cf. popex_objects.CatMType)
Return type:	ndarray, shape = np.shape(list_values)
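
The membership test described above can be sketched in a few lines of NumPy. This is only an illustration, not the package's implementation: the function name check_category_sketch is hypothetical, and the half-open intervals [lower, upper) are assumed from the category definition given for CatMType.

```python
import numpy as np

def check_category_sketch(list_values, category):
    """Return a boolean array that is True where a value lies in the category.

    The category is the union of the half-open intervals [lo, hi)
    (assumption taken from the CatMType description).
    """
    values = np.asarray(list_values)
    inside = np.zeros(values.shape, dtype=bool)
    for lo, hi in category:
        # A value belongs to the category if it falls in any interval
        inside |= (values >= lo) & (values < hi)
    return inside

mask = check_category_sketch(np.array([0.1, 0.5, 1.5, 2.5]),
                             [(0.0, 0.5), (2.0, 3.0)])
```

With these inputs, 0.5 is excluded because the upper end of each interval is treated as open.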

popex.utils.check_interval(list_values, interval)¶

check_interval verifies for each value if it falls in interval

Parameters:

list_values (ndarray) – List of scalars
interval (2-tuple) – Interval; the first value is the lower end, the second is the upper end

Returns:	The i-th element is True if list_values[i] falls in the interval and False otherwise
Return type:	ndarray, shape = np.shape(list_values)

popex.utils.compute_cat_prob(popex, weights, start=-1, stop=-1)¶

compute_cat_prob computes the weighted category probabilities.

The models are obtained from popex.model and weighted by weights.

Parameters:

popex (PoPEx) – PoPEx main structure (cf popex_objects.PoPEx)
weights (ndarray, shape=(nmod,)) – Relative weights of the models
start (int) –
Defines the first model to take into account:
- -1: For starting at 0
- N: For starting at max(N, 0)
stop (int) –
Defines the last model to take into account
- -1: For stopping at popex.nmod
- N: For stopping at min(N, popex.nmod)

Returns:

A tuple of instances that describes the category probabilities for all categorical model types.

If a model type i is not a subclass of CatMType, the corresponding map is set to None. If a model is given by (CatModel_1, …, CatModel_m) and the model values in CatModel_i are subdivided into ncat_i categories, then the return value is a (CatProb_1, …, CatProb_m) tuple where return[i].param_val is an ndarray with shape=(nparam_i, ncat_i).

Return type:

m-tuple
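
For a single categorical model type, the weighted category probability at parameter j and category c can be read as the normalized sum of the weights of all models whose value at j falls in c. The sketch below follows that reading. The function name and the integer category-indicator input (in the spirit of CatParam.param_cat) are illustrative assumptions, not the package's code.

```python
import numpy as np

def cat_prob_sketch(param_cats, weights, ncat):
    """Weighted category probabilities for one model type.

    param_cats: int array, shape (nmod, nparam), category index of each value
    weights:    array, shape (nmod,), non-negative model weights
    Returns an array of shape (nparam, ncat) whose rows sum to 1.
    """
    param_cats = np.asarray(param_cats)
    weights = np.asarray(weights, dtype=float)
    nparam = param_cats.shape[1]
    p_cat = np.zeros((nparam, ncat))
    for cats, w in zip(param_cats, weights):
        # Add the model weight to the category observed at each parameter
        p_cat[np.arange(nparam), cats] += w
    return p_cat / weights.sum()

# Three models with two parameters each; the third model has double weight
p_cat = cat_prob_sketch([[0, 1], [1, 1], [0, 0]], [1.0, 1.0, 2.0], ncat=2)
```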

popex.utils.compute_entropy(p_cat)¶

compute_entropy computes the entropy of p_cat.

The entropy of a discrete probability distribution p = (p_1, …, p_s) is

H(p) = -sum_{i=1}^s p_i log( p_i ).

Therefore, if the probability map p_cat is an m-tuple such that p_cat[i].param_val is an ndarray of shape=(nparam_i, nfac_i), the entropy is also an m-tuple where H[i].param_val is an ndarray of shape=(nparam_i,).

Notes

Note that t*log(t) -> 0 as t -> 0. Therefore, the term p_i(x) log(p_i(x)) is taken to be 0 wherever p_i(x) = 0.

Parameters:	p_cat (m-tuple) – Tuple of CatProb instances with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns:	Tuple of entropy maps. return[i] is None if p_cat[i] is None, otherwise it is an instance of ContParam
Return type:	m-tuple
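
As an illustration of the formula and of the convention for zero probabilities, the entropy of one probability map can be sketched as follows. The function name is hypothetical and the input is a plain array of shape (nparam, nfac) rather than a CatProb instance.

```python
import numpy as np

def entropy_sketch(p):
    """Row-wise entropy H = -sum_i p_i log(p_i), with 0*log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    terms = np.zeros_like(p)
    pos = p > 0
    # Only strictly positive probabilities contribute to the sum
    terms[pos] = p[pos] * np.log(p[pos])
    return -terms.sum(axis=1)

h = entropy_sketch([[0.5, 0.5],    # maximal uncertainty for two categories
                    [1.0, 0.0]])   # no uncertainty
```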

popex.utils.compute_kld(p_cat, q_cat)¶

compute_kld computes the Kullback-Leibler divergence (KLD) between two category probability maps p_cat and q_cat.

The KLD between two discrete probability distributions p = (p_1, …,p_s) and q = (q_1, …,q_s) is

KLD(p||q) = sum_{i=1}^s p_i log( p_i / q_i).

Therefore, if the probability maps p_cat and q_cat are m-tuples such that p_cat[i].param_val and q_cat[i].param_val are ndarrays of shape=(nparam_i, nfac_i), the Kullback-Leibler divergence is also an m-tuple where kld[i].param_val is an ndarray of shape=(nparam_i,).

Notes

Note that t*log(t/a) -> 0 as t -> 0. Therefore, we require that q_i(x) = 0 implies p_i(x) = 0, in which case we can set kld(x) = 0. However, due to the (inaccurate) numerical representation of the probability maps, it is possible that q_i(x) = 0 and p_i(x) > 0 (e.g. when q has been approximated from a relatively small set of models). In this case we enforce q_i(x) = p_i(x), which leads to kld(x) = 0.

Parameters:

p_cat (m-tuple) – Tuple of CatProb instances with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
q_cat (m-tuple) – Tuple of CatProb instances with q_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)

Returns:

Tuple of kld maps

return[i] : None or ContParam: Return value i is None if p_cat[i] is None, otherwise it is an instance of ContParam

Return type:

m-tuple
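
The divergence and the treatment of zero entries described in the notes can be sketched for one pair of probability maps as follows. The function name is hypothetical and plain arrays of shape (nparam, nfac) stand in for CatProb instances.

```python
import numpy as np

def kld_sketch(p, q):
    """Row-wise KLD(p||q) = sum_i p_i log(p_i / q_i)."""
    p = np.asarray(p, dtype=float)
    q = np.array(q, dtype=float)          # copy, q is modified below
    # Where q_i = 0 but p_i > 0, enforce q_i = p_i so the term vanishes
    fix = (q == 0) & (p > 0)
    q[fix] = p[fix]
    terms = np.zeros_like(p)
    pos = p > 0
    # Terms with p_i = 0 contribute nothing to the sum
    terms[pos] = p[pos] * np.log(p[pos] / q[pos])
    return terms.sum(axis=1)

kld = kld_sketch([[0.5, 0.5], [0.25, 0.75], [0.0, 1.0]],
                 [[0.5, 0.5], [0.50, 0.50], [0.5, 0.5]])
```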

popex.utils.compute_ncmod(popex, meth_w_hd=None)¶

compute_ncmod computes, for each model type, the number of conditioning points.

It is assumed that the number of hard data is restricted model type-wise. Therefore, the number of conditioning points is also computed model type-wise by sampling from a uniform random variable ~U(0, popex.ncmax[imtype]).

Notes

Note that if the total sum of the likelihood values in popex.p_lik is zero, ncmod is set to zero for each model type.

Parameters:

popex (PoPEx) – PoPEx main structure
meth_w_hd (dict) – Method to compute hard conditioning weights (cf. compute_w_lik())

Returns:	Number of conditioning points per model type
Return type:	m-tuple

popex.utils.compute_subset_ind(p_frac, weights)¶

compute_subset_ind computes the smallest index set that covers a given percentage.

This means that the subset indices ind are such that

np.sum(weights[ind]) >= p_frac * np.sum(weights),

or in other words, weights[ind] covers at least a fraction p_frac of the total sum of weights.

Parameters:

p_frac (float) – Coverage fraction in (0, 1]
weights (ndarray, shape=(nw,)) – Non-negative weights

Returns:	Subset of indices
Return type:	list
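
One way to obtain such a smallest index set is to take the largest weights first until the required fraction is covered. The sketch below does this with a sort and a cumulative sum. It is an illustration under that assumption, with a hypothetical function name, and is not taken from the package source.

```python
import numpy as np

def subset_ind_sketch(p_frac, weights):
    """Smallest index set whose weights cover a fraction p_frac of the total."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)[::-1]          # indices by decreasing weight
    cum = np.cumsum(weights[order])
    # First position at which the cumulative weight reaches the target
    n_keep = int(np.searchsorted(cum, p_frac * cum[-1], side="left")) + 1
    return sorted(order[:n_keep].tolist())

weights = np.array([1.0, 6.0, 3.0, 2.0])
ind = subset_ind_sketch(0.75, weights)
ind_all = subset_ind_sketch(1.0, weights)
```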

popex.utils.compute_w_lik(popex, meth=None)¶

compute_w_lik returns the set of normalized likelihood values.

In practice, when the likelihood values must be represented by a floating point number, it might be advantageous to compute approximations of L(m).

There are several approximation possibilities that are implemented in this version (specified in meth):

(a) No approximation (meth={'name': 'exact'} or meth=None):

L(m) = exp( log_p_lik )

(b) Sqrt-unskewed (meth={'name': 'exp_sqrt_log'}):

L(m) ~ exp( -sqrt(-log_p_lik) )

(c) K-unskewed (meth={'name': 'exp_sqrt_log', 'pow': k}):

L(m) ~ exp( -(-log_p_lik)^k )

(d) Inverse log (meth={'name': 'inv_log'}):

L(m) ~ 1 / ( 1 - log_p_lik )

(e) Inverse sqrt-log (meth={'name': 'inv_sqrt_log'}):

L(m) ~ 1 / ( 1 + sqrt(-log_p_lik) )

(f) Soft likelihood (meth={'name': 'soft', 'fsigma': fsigma}):

L(m) ~ exp( log_p_lik / fsigma^2 )

All of these techniques aim to unskew the likelihood values.

Notes

This function is used in two different locations (with possibly two different approximation techniques): for the learning scheme in the PoPEx sampling and for computing predictions. While in the first case any approximation technique can be used, in the latter case the choice might bias the computed weights.

Parameters:

popex (PoPEx) – PoPEx main structure
meth (dict) –
Defines the approximation method to be used. Fields are
- 'name': Name of the method (str)
- 'pow': Power for the K-unskewed method (c) (float)

Returns:	Array of normalized weights
Return type:	ndarray, shape=(nmod,)
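
The effect of the different approximations can be explored with a small stand-alone sketch. It takes an array of log-likelihood values (assumed to be non-positive) instead of a PoPEx object, uses the method names and fields listed above, and normalizes the weights to sum to one. The function name is hypothetical, and the default power of 0.5 for 'exp_sqrt_log' is an assumption made to cover both the sqrt and the power variant.

```python
import numpy as np

def w_lik_sketch(log_p_lik, meth=None):
    """Normalized (possibly unskewed) likelihood weights from log-likelihoods."""
    log_p_lik = np.asarray(log_p_lik, dtype=float)
    name = "exact" if meth is None else meth["name"]
    if name == "exact":
        w = np.exp(log_p_lik)
    elif name == "exp_sqrt_log":
        # 'pow' defaults to 0.5 here, which gives the sqrt variant
        w = np.exp(-(-log_p_lik) ** meth.get("pow", 0.5))
    elif name == "inv_log":
        w = 1.0 / (1.0 - log_p_lik)
    elif name == "inv_sqrt_log":
        w = 1.0 / (1.0 + np.sqrt(-log_p_lik))
    elif name == "soft":
        w = np.exp(log_p_lik / meth["fsigma"] ** 2)
    else:
        raise ValueError("unknown method: %s" % name)
    return w / w.sum()

log_p = [0.0, -1.0, -100.0]
w_exact = w_lik_sketch(log_p)
w_sqrt = w_lik_sketch(log_p, {"name": "exp_sqrt_log"})
w_inv = w_lik_sketch(log_p, {"name": "inv_log"})
```

In this example the model with log-likelihood -100 receives a practically vanishing weight under the exact method, while the inverse-log method leaves it with a small but clearly non-zero share.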

popex.utils.compute_w_pred(popex, nw_min=0, ibnd=-1, meth=None)¶

compute_w_pred returns the set of normalized predictive weights.

For assuring a minimum number of effective weights, they are computed such that

ne(w_pred) = min(nw_min, ne(w))

where w contains the weights associated to the models and ne(w) denotes the number of effective weights. This quantity can be modified by replacing w with w^alpha, where alpha > 0. A 1-d optimisation problem is used to compute the optimal alpha value.

Parameters:

popex (PoPEx) – PoPEx main structure
nw_min (int) – Minimum number of effective weights (= l_0)
ibnd (int) – Length of the weight array
meth (dict) –
Defines the approximation method to be used (cf. compute_w_lik()). Fields are
- 'name': Name of the method (str)
- 'pow': Power for the K-unskewed method (c) (float)
- 'fsigma': f-sigma parameter for the soft likelihood method (f) (float)

Returns:	Array of predictive weights
Return type:	ndarray, shape=(nmod,)

popex.utils.generate_hd(popex, meth_w_hd, ncmod, kld, p_cat, q_cat)¶

generate_hd generates the hard conditioning data set that is used to sample a new model.

This set of hard conditioning data does NOT include prior hard conditioning. For each model type (imtype), every hard conditioning point is obtained by the following two steps:

Sample a location [j] according to the values in the Kullback-Leibler divergence map (i.e. the values in kld[imtype].param_val)

Sample a model [k] according to the weights from compute_w_lik() and directly extract the hard conditioning value from popex.model[k][imtype].param_val[j].

In addition to the hard conditioning, this function also extracts probability values from q_cat and p_cat at the conditioning location. These values represent the prior/weighted category probability of the category that corresponds to popex.model[k][imtype].param_val[j]. They can be useful to compute the sampling weight ratio.

Notes

There are two important things to note:

The two objects hd_prior and hd_generation are the corresponding prior and weighted probability values of the hard conditioning CATEGORY that corresponds to the values in hd_param_val. Therefore, if they are used in the computation of the sampling weight ratio, one uses CATEGORY probabilities and NOT value probabilities.

Numerical imperfections (for example in the computation of q_cat) can cause locations where p_cat > 0 but q_cat = 0. In the computation of the Kullback-Leibler divergence, the corresponding kld values are set to 0 (by enforcing q_i(x) = p_i(x)), and it is therefore impossible to sample and condition such locations.

Parameters:

popex (PoPEx) – PoPEx main structure
meth_w_hd (dict) – Method to compute hard conditioning weights (cf. compute_w_lik())
ncmod (m-tuple) – Number of conditioning points per model type
kld (m-tuple) – Tuple of ContParam instances defining the Kullback-Leibler divergence
p_cat (m-tuple) – Tuple of CatProb instances defining the weighted category probabilities with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
q_cat (m-tuple) – Tuple of CatProb instances defining the prior category probabilities with q_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)

Returns:

hd_param_ind (m-tuple) – Tuple of indices where hard conditioning values are imposed. If there is no hard conditioning for a model type i, then hd_param_ind[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
hd_param_val (m-tuple) – Tuple of hard conditioning values that are imposed at the hard conditioning indices. If there is no hard conditioning for a model type i, then hd_param_val[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
hd_prior (m-tuple) – Tuple of probability values according to the prior probability maps in q_cat. Each value corresponds to the prior probability of the category that contains the extracted hard conditioning value. If there is no hard conditioning for a model type i, then hd_prior[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
hd_generation (m-tuple) – Tuple of probability values according to the sampling probability maps in p_cat. Each value corresponds to the sampling probability of the category that contains the extracted hard conditioning value. If there is no hard conditioning for a model type i, then hd_generation[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
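
The two sampling steps described for generate_hd can be sketched for a single model type as follows. This is a simplified illustration and not the package's implementation. The function name is hypothetical, the models are given as a plain array of shape (nmod, nparam), and two details are assumptions: locations are drawn without replacement, and one model is drawn independently for each location.

```python
import numpy as np

def sample_hd_sketch(kld, w_lik, model_vals, nc, rng):
    """Draw nc hard conditioning points for one model type."""
    kld = np.asarray(kld, dtype=float)
    w_lik = np.asarray(w_lik, dtype=float)
    model_vals = np.asarray(model_vals)
    # Step 1: locations j, drawn with probability proportional to kld[j]
    loc = rng.choice(kld.size, size=nc, replace=False, p=kld / kld.sum())
    # Step 2: one model k per location, drawn according to the weights
    mod = rng.choice(w_lik.size, size=nc, p=w_lik / w_lik.sum())
    # The conditioning value is read from model k at location j
    return loc, mod, model_vals[mod, loc]

rng = np.random.default_rng(0)
kld = [0.0, 0.5, 0.0, 1.5]                 # locations 0 and 2 cannot be drawn
w_lik = [0.2, 0.8]
model_vals = [[10, 11, 12, 13], [20, 21, 22, 23]]
loc, mod, val = sample_hd_sketch(kld, w_lik, model_vals, nc=2, rng=rng)
```

Locations with a divergence of zero are never selected, which matches the remark in the notes that such locations cannot be conditioned.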

popex.utils.list_mCatParam_to_mCatProb(list_mCatParam)¶

list_mCatParam_to_mCatProb converts a list of m-tuples of CatParam instances into an m-tuple of CatProb instances.

Parameters:	list_mCatParam (list) – list of m-tuples of CatParam objects
Returns:	each element is a CatProb instance
Return type:	m-tuple

popex.utils.merge_hd(hd_param_ind_1, hd_param_ind_2, hd_param_val_1, hd_param_val_2)¶

merge_hd merges two sets of hard conditioning data.

It is assumed that hd_param_ind_i[imtype] is None if and only if hd_param_val_i[imtype] is None.

Parameters:

hd_param_ind_1 (m-tuple) – First set of hard conditioning indices
hd_param_ind_2 (m-tuple) – Second set of hard conditioning indices
hd_param_val_1 (m-tuple) – First set of hard conditioning values
hd_param_val_2 (m-tuple) – Second set of hard conditioning values

Returns:

hd_ind (m-tuple) – Merged set of hard conditioning indices
hd_par (m-tuple) – Merged set of hard conditioning values

popex.utils.update_cat_prob(p_cat, m_new, w_new, sum_w_old)¶

update_cat_prob updates (in place) the category probabilities.

If p_cat_old represents the old category probability maps, then we have

p_cat_new = [sum_w_old*p_cat_old + sum_i w_new_i*1(m_new_i)] / [sum_w_old + sum(w_new)]

where 1(m_new_i) is the categorical indicator of the model i.

Parameters:

p_cat (m-tuple) – Tuple of categorical probability maps (cf. compute_cat_prob())
m_new (list) – List of m-tuples defining a set of nmod models
w_new (ndarray, shape=(nmod,)) – Weights associated to the new models
sum_w_old (float) – Old weight normalization constant

Returns:
Return type:	None
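
The update formula can be illustrated for one model type with a short in-place computation. In this sketch, which uses a hypothetical function name, the new models are given by their integer category indicators and p_cat is a plain float array of shape (nparam, ncat). Both are simplifying assumptions.

```python
import numpy as np

def update_cat_prob_sketch(p_cat, new_cats, w_new, sum_w_old):
    """Update p_cat in place with the category indicators of new models."""
    nparam = p_cat.shape[0]
    p_cat *= sum_w_old                     # back to un-normalized weight sums
    for cats, w in zip(new_cats, w_new):
        # Indicator 1(m_new_i): add w at the observed category of each parameter
        p_cat[np.arange(nparam), cats] += w
    p_cat /= sum_w_old + np.sum(w_new)     # re-normalize with the new total

p_cat = np.array([[0.75, 0.25],
                  [0.50, 0.50]])
update_cat_prob_sketch(p_cat, new_cats=[[1, 0]], w_new=[1.0], sum_w_old=4.0)
```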

popex.utils.write_hd_info(popex, imod, hd_param_ind, hd_param_val)¶

write_hd_info writes the hard conditioning that has been deduced for creating a specific model to a text file.

The text file is saved at popex.path_res with the following structure:

<popex.path_res>$
└-- hd
    └-- hd_modXXXXXX.txt

Parameters:

popex (PoPEx) – PoPEx main structure
imod (int) – Model index
hd_param_ind (m-tuple) – Hard conditioning indices
hd_param_val (m-tuple) – Hard conditioning values

Returns:
Return type:	None

popex.utils.write_run_info(pb, popex, imod, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)¶

write_run_info writes some algorithm specific information to a text file.

The text file is saved at popex.path_res with the following structure:

<popex.path_res>$
└-- run_info.txt

Parameters:

pb (Problem) – Defines the problem functions and parameters
popex (PoPEx) – PoPEx main structure
imod (int) – Model index
log_p_lik (float) – Log-likelihood value of the model
cmp_log_p_lik (bool) – Indicates if the likelihood has been computed (True) or predicted (False)
log_p_pri (float) – Prior log-probability of the model
log_p_gen (float) – Sampling log-probability of the model
ncmod (m-tuple) – Model type specific number of conditioning points used

Returns:
Return type:	None

n_e utilities¶

isampl.neff is a module that implements the most common importance sampling diagnostics used in the PoPEx procedure, namely:

ne() Effective number of weights for estimating the expectation

ne_var() Effective number of weights for estimating the variance

ne_gamma() Effective number of weights for estimating the skewness

alpha() Optimization for finding the weight correction exponent

correct_w() Computes the set of corrected weights

popex.isampl.neff.alpha(weights, theta, a_init=1)¶

alpha computes the best alpha for lowering the skewness of the weights.

Let k_1 be the number of zero weights and k_2 the number of weights that attain the maximum value in 'weights'. For a given theta in the interval (k_2, n-k_1), this function finds the unique alpha such that

neff.ne(weights ** alpha) = theta.

The alpha is found by using the 'L-BFGS-B' optimization algorithm of scipy.optimize with initial value equal to a_init. The lower and upper bounds are set to ALPHA_MIN and ALPHA_MAX, respectively. The maximum number of iterations is 10.

Parameters:

weights (ndarray, shape=(n,)) – Set of weights
theta (float) – A positive value for the effective number of weights
a_init (float) – Initial guess for the optimization problem

Returns:	alpha value
Return type:	float
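
Because ne(weights ** alpha) decreases monotonically from n-k_1 towards k_2 as alpha grows, the exponent can also be located with a plain bisection. The sketch below uses bisection instead of the 'L-BFGS-B' optimizer mentioned above, so it illustrates the idea and not the package's routine. The function names and the search bounds are assumptions.

```python
import numpy as np

def ne(weights):
    """Kish's effective number of weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def alpha_bisect(weights, theta, a_lo=1e-6, a_hi=50.0, n_iter=100):
    """Find alpha with ne(weights ** alpha) = theta by bisection."""
    w = np.asarray(weights, dtype=float)
    w = w / w.max()                 # ne is scale invariant; keeps powers in [0, 1]
    for _ in range(n_iter):
        a_mid = 0.5 * (a_lo + a_hi)
        if ne(w ** a_mid) > theta:
            a_lo = a_mid            # too many effective weights: increase alpha
        else:
            a_hi = a_mid            # too few effective weights: decrease alpha
    return 0.5 * (a_lo + a_hi)

w = np.array([0.7, 0.2, 0.05, 0.05])
a = alpha_bisect(w, theta=3.0)
```

Here ne(w) is below the target of 3, so the resulting exponent is smaller than 1 and flattens the weights.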

popex.isampl.neff.correct_w(weights, ne_w_corr)¶

correct_w is a function that lowers the skewness of the weights.

In addition to the method alpha() it computes whether the correction is possible and treats the exceptions accordingly.

Parameters:

weights (ndarray, shape=(n,)) – Set of weights
ne_w_corr (float) – A positive value for the effective number of weights

Returns:	w_corr – Corrected set of weights that is also normalized
Return type:	ndarray, shape=(n,)

popex.isampl.neff.find_fsigma(popex, theta, fsigma_max, ibnd)¶

find_fsigma computes the best fsigma for the soft likelihood.

Let k_1 be the number of zero weights and k_2 the number of weights that attain the maximum value in 'weights'. For a given theta in the interval (k_2, n-k_1), this function finds the unique fsigma such that

neff.ne(weights(soft_lik(log_p_lik, fsigma))) = theta.

The fsigma value is found by using the 'L-BFGS-B' optimization algorithm of scipy.optimize with initial value equal to f_sigma_init. The lower bound is set to FSIGMA_MIN; the upper bound is given by the argument fsigma_max.

Parameters:

popex (PoPEx) – PoPEx main structure
theta (float) – A positive value for the effective number of weights
fsigma_max (float) – Value greater than 1, upper bound for fsigma
ibnd (int) – Include only the first ibnd iterations

Returns:	fsigma value
Return type:	float

popex.isampl.neff.ne(weights)¶

ne computes the effective number of weights.

Kish’s effective number of weights is computed as

n_e(w_1, …, w_n) = ( sum w_i )^2 / ( sum w_i^2 ).

Parameters:	weights (ndarray, shape=(n,)) – Set of weights.
Returns:	Effective number of weights.
Return type:	float

popex.isampl.neff.ne_gamma(weights)¶

ne_gamma computes the effective number of weights for the skewness.

The effective number of weights for estimating the skewness is computed as:

n_e(w_1, …, w_n) = ( sum w_i^2 )^3 / ( ( sum w_i^3 )^2 ).

Parameters:	weights (ndarray, shape=(n,)) – Set of weights.
Returns:	Effective number of weights (for the skewness).
Return type:	float

popex.isampl.neff.ne_var(weights)¶

ne_var computes the effective number of weights for estimating the variance.

The effective number of weights for an empirical estimation of the variance is computed as:

n_e(w_1, …, w_n) = ( sum w_i^2 )^2 / ( sum w_i^4 ).

Parameters:	weights (ndarray, shape=(n,)) – Set of weights.
Returns:	Effective number of weights (for the variance).
Return type:	float
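
The three diagnostics follow directly from the formulas above. The sketch below re-implements them with NumPy for illustration. It is not the code of the isampl.neff module, although the function names mirror it.

```python
import numpy as np

def ne(weights):
    """Effective number of weights for the expectation (Kish)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def ne_var(weights):
    """Effective number of weights for the variance."""
    w = np.asarray(weights, dtype=float)
    return (w ** 2).sum() ** 2 / (w ** 4).sum()

def ne_gamma(weights):
    """Effective number of weights for the skewness."""
    w = np.asarray(weights, dtype=float)
    return (w ** 2).sum() ** 3 / (w ** 3).sum() ** 2

uniform = np.full(5, 0.2)                         # all models equally weighted
degenerate = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # a single model dominates
```

For uniform weights all three quantities equal the number of weights, and for a single non-zero weight they all equal one.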

Classes¶

popex_objects.py contains the PoPEx-specific class definitions.

Main structure

PoPEx: Main class for any PoPEx simulation. It contains the model chain and any corresponding probability measures.

Sampling definitions

Problem: Defines the sampling parameters and functions
Learning: Learning scheme for learning the likelihood values
Prediction: Defines the prediction parameters and functions

Classes associated to a model type

MType: (abstract) Parent class for each map associated to a model type
ContParam: (inherits from MType) Class for each map that is associated to the model types but not to categories (e.g. kld[imtype], entropy[imtype])
CatMType: (abstract, inherits from MType) Parent class for each map that is associated to categories
CatProb: (inherits from CatMType) This class is used for the representation of probability distributions over categories (e.g. p_cat[imtype], q_cat[imtype])
CatParam: (inherits from CatMType) This class is used for the representation of categorized parameter values (e.g. model[j][imtype])

class popex.popex_objects.CatMType(dtype_val='float64', param_val=None, categories=None)¶

This class is the abstract parent of any quantity associated to a categorical model type.

Parameters:

dtype_val (str) – Type of the ndarray values (eg. ‘int8’, ‘float32’, ‘float64’, etc)
param_val (ndarray) – Values associated to the parameters
categories (list) –
List of size ncat. Each instance of the list is again a list of 2-tuples that define the value range for the category.

If categories[i] = [(v_1, v_2), (v_3, v_4)], where v_j are real values, then the category i is defined by the union

[v_1, v_2) U [v_3, v_4)

ncat¶

Number of categories.

Returns:	Number of categories in categories
Return type:	int

class popex.popex_objects.CatParam(dtype_val='float64', param_val=None, categories=None)¶

This class is used to define a categorized 1-dimensional parameter map that is associated to a model type (e.g. model, etc). The category of each model parameter in param_val is indicated in param_cat. These categories are automatically updated if param_val or categories change.

Notes

The shape of param_val and param_cat is shape=(nparam,).

param_cat¶

Category indicator array.

Returns:	Category indicators of the values in param_val.
Return type:	ndarray, shape=(nparam,)

class popex.popex_objects.CatProb(dtype_val='float64', param_val=None, categories=None)¶

This class is used to define a map of continuous values for each category, where each value is associated to a model parameter (e.g. p_cat, q_cat, etc.).

Notes

The shape of param_val is shape=(nparam, ncat).

class popex.popex_objects.ContParam(dtype_val='float64', param_val=None)¶

This class is used to define a map of continuous values where each value is associated to a model parameter (e.g. entropy, kld, etc.).

Notes

The shape of param_val is (nparam,).

class popex.popex_objects.Learning¶

Learning defines an abstract parent class for a learning scheme.

Let’s assume that we want to define a learning scheme that predicts the log-likelihood of a model. In this case we define an explicit subclass of Learning and provide implementations of the methods

train()

compute_p_eval_for()

learn_value_of()

It is assumed that there is a choice between ‘evaluating the exact answer’ (which is very expensive) or ‘predicting the answer by a machine learning scheme’ (which should be very fast). The learning scheme undergoes the following main steps

Update the learning scheme regularly by using the function train() (cf. upd_ls_freq in algorithm.run_popex_mp). Note that here you can choose to only use likelihood values that have effectively been computed (cf. PoPEx.cmp_log_p_lik).

For a given instance, compute the probability p in [0, 1] with which the log-likelihood is evaluated exactly (cf. compute_p_eval_for()); otherwise the value is predicted (cf. learn_value_of()).

Notes

In the PoPEx framework this can be used to learn the log-likelihood values for each model (=value of interest). In this regard, predicting a value rather than computing it can considerably improve the overall computational time.

compute_p_eval_for(model)¶

Computes and returns a probability value in [0, 1] that determines whether a model should be evaluated exactly.

The two extreme confidence values signify:

- p=0: Value can be learned from the learning scheme
- p=1: Value should be evaluated exactly

Parameters:	model (m-tuple) – Tuple of MType instances that define the new model
Returns:	Probability value in [0, 1]
Return type:	float

learn_value_of(model)¶

Uses the existing learning scheme to learn the value of interest for an instance.

Notes

This function should raise an error if there is no existing learning scheme.

Parameters:	model (m-tuple) – Tuple of MType instances that define the new model
Returns:	Predicted log-likelihood value
Return type:	float

train(popex)¶

This method creates a learning scheme.

The learning scheme can be saved as class parameter.

Parameters:	popex (PoPEx) – PoPEx main structure (cf popex_objects.PoPEx)
Returns:
Return type:	None
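
A concrete learning scheme only needs to implement the three methods listed above. The following sketch shows the shape of such a subclass with a deliberately trivial predictor that returns the mean of the computed log-likelihoods. To keep the example runnable without the package, LearningStandIn replaces popex.popex_objects.Learning and a SimpleNamespace replaces the PoPEx object. Both, as well as the class name MeanLearning and its p_eval parameter, are assumptions made for illustration.

```python
from types import SimpleNamespace

class LearningStandIn:
    """Stand-in for popex.popex_objects.Learning (assumption for this sketch)."""

class MeanLearning(LearningStandIn):
    """Toy scheme: predicts the mean of all computed log-likelihood values."""

    def __init__(self, p_eval=0.5):
        self.p_eval = p_eval            # probability of an exact evaluation
        self.mean_log_p_lik = None      # learning scheme, saved on the instance

    def train(self, popex):
        # Use only values that were computed exactly, not predicted ones
        computed = [v for v, c in zip(popex.log_p_lik, popex.cmp_log_p_lik) if c]
        if computed:
            self.mean_log_p_lik = sum(computed) / len(computed)

    def compute_p_eval_for(self, model):
        # Without a trained scheme, always ask for the exact evaluation
        return 1.0 if self.mean_log_p_lik is None else self.p_eval

    def learn_value_of(self, model):
        if self.mean_log_p_lik is None:
            raise RuntimeError("no learning scheme has been trained yet")
        return self.mean_log_p_lik

scheme = MeanLearning(p_eval=0.25)
p_before = scheme.compute_p_eval_for(model=None)
scheme.train(SimpleNamespace(log_p_lik=[-1.0, -3.0, -10.0],
                             cmp_log_p_lik=[True, True, False]))
p_after = scheme.compute_p_eval_for(model=None)
predicted = scheme.learn_value_of(model=None)
```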

class popex.popex_objects.MType(dtype_val='float64', param_val=None)¶

This class is the parent of any quantity associated to a model type.

Parameters:

dtype_val (str) – Type of the ndarray values (eg. ‘int8’, ‘float32’, ‘float64’, etc)
param_val (ndarray) – Values associated to the parameters

nparam¶

Number of parameters.

Returns:	Number of values in param_val
Return type:	int

class popex.popex_objects.PoPEx(model=None, log_p_lik=array([], dtype=float64), cmp_log_p_lik=array([], dtype=bool), log_p_pri=array([], dtype=float64), log_p_gen=array([], dtype=float64), ncmax=(0, ), nc=None, nmtype=1, path_res='~/')¶

Main class for any PoPEx simulation.

This class is the main object for the PoPEx algorithm. It contains all the models, likelihood, log-prior and log-generation information of a PoPEx run.

Parameters:

model (list) –
List of models

model[j] : m-tuple

Tuple of MType instances
log_p_lik (ndarray, shape=(nmod,)) – Natural logarithm of likelihood measure
cmp_log_p_lik (ndarray, shape=(nmod,)) – Boolean indicator whether the log-likelihood value has been computed (True) or predicted (False)
log_p_pri (ndarray, shape=(nmod,)) – Natural logarithm of prior measure value
log_p_gen (ndarray, shape=(nmod,)) – Natural logarithm of sampling measure value
ncmax (m-tuple) – Maximum number of conditioning points for each model type
nc (list) –
List of m-tuples

nc[j] : m-tuple

Contains the number of conditioning points used in the generation of model[j]
nmtype (int) – Number of model types
path_res (str) – Path of the results

add_model(imod, model, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)¶

Appends a new model at the end of the model list and updates the measure arrays.

Parameters:

imod (int) – Model index
model (m-tuple) – Tuple of MType instances defining a new model
log_p_lik (float) – Log-likelihood value of model
cmp_log_p_lik (bool) – Indicates if the log-likelihood has been computed (True) or predicted (False)
log_p_pri (float) – Log-prior value of model
log_p_gen (float) – Log-generation value of model
ncmod (m-tuple) – Defines the number of conditioning points that have been used in the generation of the model

Returns:
Return type:	None

insert_model(loc, imod, model, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)¶

Inserts a new model at loc of the model list and updates the measure arrays.

Parameters:

loc (int) – Location of the insertion
imod (int) – Model index
model (m-tuple) – Tuple of MType instances defining a model
log_p_lik (float) – Log-likelihood value of model
cmp_log_p_lik (bool) – Indicates if the log-likelihood has been computed (True) or predicted (False)
log_p_pri (float) – Log-prior value of model
log_p_gen (float) – Log-generation value of model
ncmod (m-tuple) – Defines the number of conditioning points that have been used in the generation of the model

Returns:
Return type:	None

nmod¶

Number of models

Returns:	Number of models in model
Return type:	int

class popex.popex_objects.Prediction(compute_pred=None, meth_w_pred=None, nw_min=None, wfrac_pred=1.0)¶

Defines a prediction that should be computed based on an existing PoPEx instance.

The user must provide a function definition for compute_pred that implements the prediction operator. Note that no return value is expected from that function. Any important result can be saved under

<path_res>$
└-- solution
    └-- pred_<name>_modXXXXXX

Parameters:

compute_pred (function) –
Computes the prediction for a given model.

compute_pred(popex, imod)
Parameters:
- popex : PoPEx
  
  PoPEx main structure (cf popex.popex_objects.PoPEx)
- imod : int
  
  Model index
Returns:

None
meth_w_pred (dict) – Defines the method used for computing the prediction weights (cf. popex.utils.compute_w_lik())
nw_min (float) – Minimum number of effective weights (= l_0)
wfrac_pred (float) – Number in (0,1] defining the fraction of the total weight to be used for the prediction. If wfrac_pred=1, the predictions for all models with non-zero weight are computed. If wfrac_pred<1, the minimum number of weights that covers a fraction wfrac_pred of the total sum of weights is used.

class popex.popex_objects.Problem(generate_m, compute_log_p_lik, get_hd_pri, compute_log_p_pri=None, compute_log_p_gen=None, learning_scheme=None, meth_w_hd=None, nmtype=1, seed=0)¶

Defines the sampling problem that should be addressed by the PoPEx method.

The user must provide function definitions for generate_m and compute_log_p_lik. For the definition of the model space, we can also provide ‘prior hard conditioning’ through the function get_hd_pri. Optionally, a likelihood learning scheme can be defined in learning_scheme. Furthermore, one also must define how to compute the ratio in the importance sampling weights. For this, the functions compute_log_p_pri and compute_log_p_gen can also be defined manually. If they are left empty, the default version that only considers the hard conditioning data points is used.

Parameters:

generate_m (function) –
Generates a new model m from a set of hard conditioning data.

generate_m(hd_param_ind, hd_param_val, imod)
Parameters:
- hd_param_ind : m-tuple
  
  For each instance in the model tuple, this variable defines the hard conditioning INDICES (where to apply HD). hd_param_ind[i] is an ndarray of shape=(nhd_i,)
- hd_param_val : m-tuple
  
  For each instance in the model tuple, this variable defines the hard conditioning VALUES (what to impose). hd_param_val[i] is an ndarray of shape=(nhd_i,)
- imod : int
  
  Model index
Returns:

m-tuple

New model such as (CatParam_1, …, CatParam_m)
compute_log_p_lik (function) –
Computes the natural logarithm of the likelihood of a model. It usually runs an expensive forward operation and compares the response to a given set of observations.

compute_log_p_lik(model, imod)
Parameters:
- model : m-tuple
  
  Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
- imod : int
  
  Model index
Returns:

float

Log-likelihood value of the model
get_hd_pri (function) –
Provides the ‘prior hard conditioning’ that is used in the definition of the model space (i.e. parameter values that are known without uncertainty).

get_hd_pri()
Returns
- hd_pri_ind : m-tuple
  
  For each instance in the model tuple, this variable defines the hard conditioning INDICES.
- hd_pri_val : m-tuple
  
  For each instance in the model tuple, this variable defines the hard conditioning VALUES.
compute_log_p_pri (function, optional) –
This function computes the log-prior probability of a model that has been generated from a given set of hard conditioning data. Note that its definition is OPTIONAL. If it is left undefined, a default implementation will be used (see remark below).

compute_log_p_pri(model, hd_p_pri, hd_param_ind)
Parameters:
- model : m-tuple
  
  Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
- hd_p_pri : m-tuple
  
  Tuple of the hard conditioning probability values for a given model. Each probability value describes the prior probability of observing the category of the model value at the corresponding conditioning location. hd_p_pri[i] is an ndarray of shape=(nhd_i,)
- hd_param_ind : m-tuple
  
  Hard conditioning indices
Returns:

float

Log-prior probability value
compute_log_p_gen (function, optional) –
This function computes the log-probability of generating a model in the PoPEx sampling from a given set of hard conditioning. Note that its definition is OPTIONAL. If it is left undefined, a default implementation will be used (see remark below).

compute_log_p_gen(model, hd_p_gen, hd_param_ind)
Parameters:
- model : m-tuple
  
  Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
- hd_p_gen : m-tuple
  
  Tuple of the hard conditioning probability values for a given model. Each probability value describes the generation (sampling) probability of observing the category of the model value at the corresponding conditioning location. hd_p_gen[i] is an ndarray of shape=(nhd_i,)
- hd_param_ind : m-tuple
  
  Hard conditioning indices
Returns:

float

Log-generation probability value
learning_scheme (Learning, optional) – Learning scheme for log_p_lik (concrete subclass of Learning)
meth_w_hd (dict, optional) – Defines the method for computing the learning weights that are used in the computation of the hard conditioning points (cf. compute_w_lik())
nmtype (int) – Number of model types
seed (int) – Initial seed

Notes

Let us provide a simple example for hard conditioning data in hd_param_ind and hd_param_val. It is important to note that PoPEx does NOT use any parameter locations. They might be defined by the user. If so, they have to follow a certain structure. Let the parameter locations be such that:
```
import numpy as np

param_loc = [None]          # One entry per model type
#                           x    y    z
param_loc[0] = np.array([[0.5, 1.5, 0.5],   # Parameter 0
                         [0.5, 2.5, 0.5],   # Parameter 1
                         [0.5, 3.5, 0.5]])  # Parameter 2
```
and the parameter indices (in hd_param_ind) are for example:
```
hd_param_ind[0] = [0, 2]    # Condition parameter 0 and 2
```
so we will use param_loc[0][hd_param_ind[0], :] for obtaining the array:
```
np.array([[0.5, 1.5, 0.5],
          [0.5, 3.5, 0.5]])
```
This array indicates the physical locations where hard conditioning should be applied for the model type 0. Let the parameter values (in hd_param_val) be given by:
```
hd_param_val[0] = np.array([1.2, 2.5])
```
Together with the conditioning locations above, this imposes hard conditioning data as follows:
```
 x     y     z     val
0.5   1.5   0.5    1.2
0.5   3.5   0.5    2.5
```
Note that it is possible to NOT define compute_log_p_pri and compute_log_p_gen. In this case, a predefined function will be used. This predefined implementation assumes that the quantities p_pri and p_gen are only used TOGETHER in the form of a RATIO

ratio(m) = rho(m) / phi(m).

In other words, the default functions assume that we are only interested in the DIFFERENCE of the log values, i.e.

log_p_pri - log_p_gen,

and never in the exact values on their own. It is left to the user to implement a more suitable computation, whenever the above assumption is not sufficient. For more information also consult the theoretical description of the PoPEx method.
It is also possible to NOT define the learning_scheme. In this case, the log-likelihood value will ALWAYS be computed.
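
To make the remark about the ratio concrete, the following sketch computes the difference log_p_pri - log_p_gen from the category probabilities at the hard conditioning points only. It assumes that each log-probability is the sum of the logarithms of these values, which is one plausible reading of the default behaviour described above and not a copy of the package's implementation. The function name is hypothetical.

```python
import numpy as np

def log_ratio_sketch(hd_p_pri, hd_p_gen):
    """log_p_pri - log_p_gen, using hard conditioning probabilities only."""
    # Model types without hard conditioning (None) contribute nothing
    log_p_pri = sum(np.sum(np.log(p)) for p in hd_p_pri if p is not None)
    log_p_gen = sum(np.sum(np.log(p)) for p in hd_p_gen if p is not None)
    return log_p_pri - log_p_gen

# Two model types: the first has two conditioning points, the second has none
hd_p_pri = (np.array([0.5, 0.25]), None)
hd_p_gen = (np.array([0.5, 0.50]), None)
log_ratio = log_ratio_sketch(hd_p_pri, hd_p_gen)
```

In this example the second conditioning point was twice as likely under the sampling distribution as under the prior, so the ratio rho(m) / phi(m) is 0.5.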