API of PoPEx Package¶
The API Documentation of the package
popex
Algorithm¶
Utilities¶
Generic PoPEx utilities¶
utils.py contains utilities for performing a PoPEx sampling and computing predictions:
- Category probabilities and kld maps
compute_cat_prob(): Computes probability maps according to categories
update_cat_prob(): Updates probability maps according to categories
compute_entropy(): Computes entropy of a probability map
compute_kld(): Computes kld of two probability maps
- Hard conditioning data
generate_hd(): Computes the new hard conditioning data
merge_hd(): Merges prior and new hard conditioning data
compute_ncmod(): Computes the number of conditioning points per model type
compute_w_lik(): Computes the likelihood weights (used for the hard conditioning maps)
- Generic functions
compute_w_pred(): Computes the weights for the predictions
compute_subset_ind(): Computes the smallest number of indices that cover a given percentage of a total weight
write_hd_info(): Writes/saves hd information about each model
write_run_info(): Appends information about models to run info file
-
popex.utils.check_category(list_values, category)¶ check_category verifies for each value if it falls in category
Parameters: - list_values (ndarray) – list of scalars
- category (list) – list of 2-tuples
Returns: The i-th element is True if list_values[i] belongs to the category and False otherwise. category is defined as the union of all intervals defined by the 2-tuples (cf. popex_objects.CatMType)
Return type: ndarray, shape = np.shape(list_values)
-
popex.utils.check_interval(list_values, interval)¶ check_interval verifies for each value if it falls in interval
Parameters: - list_values (ndarray) – list of scalars
- interval (2-tuple) – interval, first value is the lower end, second is the upper end
Returns: The i-th element is True if list_values[i] falls in the interval and False otherwise
Return type: ndarray, shape = np.shape(list_values)
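The two membership checks above can be sketched in a few lines of NumPy. This is an illustration, not the package's implementation: the helper names in_interval and in_category are hypothetical, and the half-open convention [v_1, v_2) is taken from the CatMType description of categories.

```python
import numpy as np

def in_interval(values, interval):
    """True where interval[0] <= value < interval[1] (half-open, assumed)."""
    values = np.asarray(values)
    low, high = interval
    return (values >= low) & (values < high)

def in_category(values, category):
    """True where a value lies in the union of the category's intervals."""
    values = np.asarray(values)
    mask = np.zeros(values.shape, dtype=bool)
    for interval in category:
        mask |= in_interval(values, interval)
    return mask

values = np.array([0.2, 1.0, 2.5, 3.0])
category = [(0.0, 1.0), (2.0, 3.0)]   # union [0, 1) U [2, 3)
mask = in_category(values, category)
```

The result has the same shape as the input, matching the documented return type.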
-
popex.utils.compute_cat_prob(popex, weights, start=-1, stop=-1)¶ compute_cat_prob computes the weighted category probabilities.
The models are obtained from popex.model and weighted by weights.
Parameters: - popex (PoPEx) – PoPEx main structure (cf popex_objects.PoPEx)
- weights (ndarray, shape=(nmod,)) – Relative weights of the models
- start (int) –
Defines the first model to take into account:
- -1: For starting at 0
- N: For starting at max(N, 0)
- stop (int) –
Defines the last model to take into account
- -1: For stopping at popex.nmod
- N: For stopping at min(N, popex.nmod)
Returns: A tuple of instances that describes the category probabilities for all categorical model types.
If a model type i is not a subclass of CatMType, the corresponding map is set to None. If a model is given by (CatModel_1, …, CatModel_m) and the model values in CatModel_i are subdivided into ncat_i categories, then the return value is a (CatProb_1, …, CatProb_m) tuple where return[i].param_val is an ndarray with shape=(nparam_i, ncat_i).
Return type: m-tuple
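For a single categorical model type, the weighted category probabilities amount to a weighted frequency count per parameter. The following is a minimal sketch under assumptions: the function name weighted_cat_prob is hypothetical, and the models are represented directly as an integer array of category indices rather than as CatParam instances.

```python
import numpy as np

def weighted_cat_prob(param_cat, weights, ncat):
    """param_cat: int array (nmod, nparam) of category indices per model.

    Returns an array of shape (nparam, ncat) whose rows sum to one.
    """
    param_cat = np.asarray(param_cat)
    weights = np.asarray(weights, dtype=float)
    prob = np.zeros((param_cat.shape[1], ncat))
    for cat in range(ncat):
        # Weighted sum over models of the indicator 1(param_cat == cat)
        prob[:, cat] = weights @ (param_cat == cat)
    return prob / weights.sum()

param_cat = np.array([[0, 1, 1],
                      [0, 0, 1],
                      [1, 0, 1]])
weights = np.array([1.0, 1.0, 2.0])
prob = weighted_cat_prob(param_cat, weights, ncat=2)
```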
-
popex.utils.compute_entropy(p_cat)¶ compute_entropy computes the entropy of p_cat.
The entropy of a discrete probability distribution p = (p_1, …, p_s) is
H(p) = -sum_{i=1}^s p_i log( p_i ).
Therefore, if the probability map p_cat is an m-tuple such that p_cat[i].param_val is an ndarray of shape=(nparam_i, nfac_i), the entropy is also an m-tuple, with H[i].param_val being an ndarray of shape=(nparam_i,).
Notes
Note that t*log(t) -> 0 as t -> 0. Therefore, H(x) = 0 wherever p_i(x) = 0.
Parameters: p_cat (m-tuple) – Tuple of CatProb instances with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns: Tuple of entropy maps
- return[i] : None or ContParam
Return value i is None if p_cat[i] is None, otherwise it is an instance of ContParam
Return type: m-tuple
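The convention t*log(t) -> 0 can be made explicit by only evaluating the logarithm where the probability is positive. A minimal sketch for one probability array of shape (nparam, nfac); the name entropy_map is hypothetical:

```python
import numpy as np

def entropy_map(prob):
    """Row-wise entropy of a (nparam, nfac) probability array."""
    prob = np.asarray(prob, dtype=float)
    terms = np.zeros_like(prob)
    positive = prob > 0
    # Only evaluate p * log(p) where p > 0; zero entries contribute nothing
    terms[positive] = prob[positive] * np.log(prob[positive])
    return -terms.sum(axis=1)

prob = np.array([[0.5, 0.5],
                 [1.0, 0.0],
                 [0.25, 0.75]])
entropy = entropy_map(prob)
```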
-
popex.utils.compute_kld(p_cat, q_cat)¶ compute_kld computes the Kullback-Leibler divergence (KLD) between two category probability maps p_cat and q_cat.
The KLD between two discrete probability distributions p = (p_1, …,p_s) and q = (q_1, …,q_s) is
KLD(p||q) = sum_{i=1}^s p_i log( p_i / q_i ).
Therefore, if the probability maps p_cat and q_cat are m-tuples such that p_cat[i].param_val and q_cat[i].param_val are ndarrays of shape=(nparam_i, nfac_i), the Kullback-Leibler divergence is also an m-tuple where kld[i].param_val is an ndarray of shape=(nparam_i,).
Notes
Note that t*log(t/a) -> 0 as t -> 0. Therefore, we require that q_i(x) = 0 implies p_i(x) = 0, in which case we can put kld(x) = 0. However, due to the (inaccurate) numerical representation of the probability maps, it is possible that q_i(x) = 0 and p_i(x) > 0 (e.g. when q has been approximated from a relatively small set of models). In this case we enforce q_i(x) = p_i(x), which leads to kld(x) = 0.
Parameters: - p_cat (m-tuple) – Tuple of CatProb instances with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
- q_cat (m-tuple) – Tuple of CatProb instances with q_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns: Tuple of kld maps
- return[i] : None or ContParam
Return value i is None if p_cat[i] is None, otherwise it is an instance of ContParam
Return type: m-tuple
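The treatment of zero probabilities described in the notes can be sketched as follows. This is a sketch for one pair of (nparam, nfac) arrays, with the hypothetical name kld_map. It follows the stated outcome that kld(x) = 0 at every location where some q_i(x) = 0 while p_i(x) > 0; the package may handle this case differently in detail.

```python
import numpy as np

def kld_map(p, q):
    """Row-wise KLD(p||q) for two (nparam, nfac) probability arrays."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    terms = np.zeros_like(p)
    valid = (p > 0) & (q > 0)
    # Terms with p_i = 0 vanish because t * log(t / a) -> 0 as t -> 0
    terms[valid] = p[valid] * np.log(p[valid] / q[valid])
    kld = terms.sum(axis=1)
    # Inconsistent locations (q_i = 0 but p_i > 0): act as if q = p there
    inconsistent = np.any((q == 0) & (p > 0), axis=1)
    kld[inconsistent] = 0.0
    return kld

p = np.array([[0.5, 0.5], [0.2, 0.8], [0.5, 0.5]])
q = np.array([[0.5, 0.5], [0.5, 0.5], [0.0, 1.0]])
kld = kld_map(p, q)
```

Because such locations end up with a zero divergence, they can never be drawn when conditioning locations are sampled proportionally to the kld map.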
-
popex.utils.compute_ncmod(popex, meth_w_hd=None)¶ compute_ncmod computes, for each model type, the number of conditioning points.
It is assumed that the number of hard data is restricted model type-wise. Therefore, the number of conditioning points is also computed model type-wise by sampling from a uniform random variable ~U(0, popex.ncmax[imtype]).
Notes
Note that if the total sum of the likelihood values in popex.p_lik is zero, ncmod is set to zero for each model type.
Parameters: - popex (PoPEx) – PoPEx main structure
- meth_w_hd (dict) – Method to compute hard conditioning weights (cf.
compute_w_lik())
Returns: Number of conditioning points per model type
Return type: m-tuple
-
popex.utils.compute_subset_ind(p_frac, weights)¶ compute_subset_ind computes the smallest index set that covers a given percentage.
This means that the subset indices ind are such that
np.sum(weights[ind]) >= p_frac * np.sum(weights),
or in other words weights[ind] covers at least a fraction p_frac of the total sum of weights.
Parameters: - p_frac (float) – Coverage fraction in (0, 1]
- weights (ndarray, shape=(nw,)) – Non-negative weights
Returns: Subset of indices
Return type: list
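One way to obtain such a smallest covering set is to take the weights in decreasing order until the running sum reaches the target. The sketch below assumes this greedy strategy; the name smallest_covering_subset is hypothetical and the package's own ordering of the returned indices may differ.

```python
import numpy as np

def smallest_covering_subset(p_frac, weights):
    """Indices of the fewest weights whose sum is >= p_frac * total."""
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(weights)[::-1]        # largest weight first
    cumulative = np.cumsum(weights[order])
    target = p_frac * weights.sum()
    # First position at which the running sum reaches the target
    count = int(np.searchsorted(cumulative, target, side="left")) + 1
    count = min(count, weights.size)         # guard against round-off at p_frac = 1
    return order[:count].tolist()

weights = np.array([1.0, 4.0, 2.0, 3.0])
ind = smallest_covering_subset(0.6, weights)
```

Here the two largest weights (4 and 3) already cover 70% of the total, so two indices suffice for a 60% coverage.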
-
popex.utils.compute_w_lik(popex, meth=None)¶ compute_w_lik returns the set of normalized likelihood values.
In practice, when the likelihood values must be represented by a floating point number, it might be advantageous to compute approximations of L(m).
There are several approximation possibilities that are implemented in this version (specified in meth):
(a) No approximation (meth={‘name’: ‘exact’} or meth=None):
L(m) = exp( ‘log_p_lik’ )
(b) Sqrt-unskewed (meth={‘name’: ‘exp_sqrt_log’}):
L(m) ~ exp( -sqrt(-‘log_p_lik’) )
(c) K-unskewed (meth={‘name’: ‘exp_sqrt_log’, ‘pow’: k}):
L(m) ~ exp( -(-‘log_p_lik’)^k )
(d) Inverse log (meth={‘name’: ‘inv_log’}):
L(m) ~ 1 / ( 1-‘log_p_lik’ )
(e) Inverse sqrt-log (meth={‘name’: ‘inv_sqrt_log’}):
L(m) ~ 1 / ( 1+sqrt(-‘log_p_lik’) )
(f) Soft likelihood (meth={‘name’: ‘soft’, ‘fsigma’: fsigma}):
L(m) ~ exp( ‘log_p_lik’/fsigma^2 )
These techniques aim to unskew the likelihood values.
Notes
This function is used in two different places, possibly with two different approximation techniques: in the learning scheme of the PoPEx sampling and in the computation of predictions. In the first case any approximation technique can be used; in the second case an approximation might bias the prediction weights.
Parameters: - popex (PoPEx) – PoPEx main structure
- meth (dict) –
Defines the approximation method to be used. Fields are
'name': Name of the method (str)
'pow': Power for method (c) (float)
Returns: Array of normalized weights
Return type: ndarray, shape=(nmod,)
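To see how the approximations flatten a skewed set of likelihood values, the formulas above can be evaluated directly on an array of log-likelihoods. This is a sketch only: the function approx_weights and its kind labels are hypothetical shorthand rather than the package's meth dictionary, and log-likelihoods are assumed to be non-positive.

```python
import numpy as np

def approx_weights(log_p_lik, kind="exact", power=0.5, fsigma=1.0):
    """Normalized weights from log-likelihoods (assumed to be <= 0)."""
    log_p_lik = np.asarray(log_p_lik, dtype=float)
    neg = -log_p_lik
    if kind == "exact":
        lik = np.exp(log_p_lik)
    elif kind == "power":             # power=0.5 is the sqrt-unskewed case
        lik = np.exp(-(neg ** power))
    elif kind == "inv_log":
        lik = 1.0 / (1.0 + neg)
    elif kind == "inv_sqrt_log":
        lik = 1.0 / (1.0 + np.sqrt(neg))
    elif kind == "soft":
        lik = np.exp(log_p_lik / fsigma ** 2)
    else:
        raise ValueError(f"unknown kind: {kind}")
    total = lik.sum()
    # Leave the values untouched if they all vanish
    return lik / total if total > 0 else lik

log_p_lik = np.array([-1.0, -4.0, -9.0])
w_exact = approx_weights(log_p_lik, "exact")
w_sqrt = approx_weights(log_p_lik, "power", power=0.5)
```

With these inputs the exact weights are dominated by the first model, while the sqrt variant spreads the mass more evenly across the three models.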
-
popex.utils.compute_w_pred(popex, nw_min=0, ibnd=-1, meth=None)¶ compute_w_pred returns the set of normalized predictive weights.
For assuring a minimum number of effective weights, they are computed such that
ne(w_pred) = min(nw_min, ne(w))
where w contains the weights associated to the models and ne(w) denotes the number of effective weights. This quantity can be modified by replacing w with w^alpha, where alpha > 0. A 1-d optimisation problem is used to compute the optimal alpha value.
Parameters: - popex (PoPEx) – PoPEx main structure
- nw_min (int) – Minimum number of effective weights (= l_0)
- ibnd (int) – Length of the weight array
- meth (dict) –
Defines the approximation method to be used (cf. compute_w_lik()). Fields are
'name': Name of the method (str)
'pow': Power for method (c) (float)
'fsigma': f-sigma parameter for method (f) ‘soft’ (float)
Returns: Array of predictive weights
Return type: ndarray, shape=(nmod,)
-
popex.utils.generate_hd(popex, meth_w_hd, ncmod, kld, p_cat, q_cat)¶ generate_hd generates the hard conditioning data set that is used to sample a new model.
This set of hard conditioning data does NOT include prior hard conditioning. For each model type (imtype), every hard conditioning value is obtained in the following two steps:
- Sample a location [j] according to the values in the Kullback-Leibler divergence map (i.e. the values in kld[imtype].param_val)
- Sample a model [k] according to the weights from compute_w_lik() and directly extract the hard conditioning value from popex.model[k][imtype].param_val[j].
In addition to the hard conditioning, this function also extracts probability values from q_cat and p_cat at the conditioning location. These values represent the prior/weighted category probability of the category that corresponds to popex.model[k][imtype].param_val[j]. They can be useful to compute the sampling weight ratio.
Notes
There are two important things to note:
- The two objects hd_prior and hd_generation are the corresponding prior and weighted probability values of the hard conditioning CATEGORY that corresponds to the values in hd_param_val. Therefore, if they are used in the computation of the sampling weight ratio, one uses CATEGORY probabilities and NOT value probabilities.
- Numerical imperfections (for example in the computation of ‘q_cat’) can cause locations where p_cat > 0 but q_cat = 0. In the computation of the Kullback-Leibler divergence we did put corresponding kld values to 0 (by enforcing q_i(x) = p_i(x)) and therefore it is impossible to sample and condition such locations.
Parameters: - popex (PoPEx) – PoPEx main structure
- meth_w_hd (dict) – Method to compute hard conditioning weights (cf. compute_w_lik())
- ncmod (m-tuple) – Number of conditioning points per model type
- kld (m-tuple) – Tuple of ContParam instances defining the Kullback-Leibler divergence
- p_cat (m-tuple) – Tuple of CatProb instances defining the weighted category probabilities with p_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
- q_cat (m-tuple) – Tuple of CatProb instances defining the prior category probabilities with q_cat[i].param_val being an ndarray of shape=(nparam_i, nfac_i)
Returns: - hd_param_ind (m-tuple) – Tuple of hard conditioning indices where hard conditioning values are imposed. If there is no hard conditioning for a model type i, then hd_param_ind[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
- hd_param_val (m-tuple) – Tuple of hard conditioning values that are imposed at the hard conditioning indices. If there is no hard conditioning for a model type i, then hd_param_val[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
- hd_prior (m-tuple) – Tuple of probability values according to the prior probability maps in q_cat. Each value corresponds to the prior probability of the category that contains the extracted hard conditioning value. If there is no hard conditioning for a model type i, then hd_prior[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
- hd_generation (m-tuple) – Tuple of probability values according to the sampling probability maps in p_cat. Each value corresponds to the sampling probability of the category that contains the extracted hard conditioning value. If there is no hard conditioning for a model type i, then hd_generation[i] is None, otherwise it is an ndarray of shape=(ncmod[i],).
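The two sampling steps described for generate_hd can be illustrated for a single model type. This sketch rests on several assumptions that the text does not settle: models are stored as one plain (nmod, nparam) array, locations are drawn without replacement, and one model is drawn independently per location. The name sample_hard_data is hypothetical.

```python
import numpy as np

def sample_hard_data(models, w_lik, kld, n_cond, rng):
    """Draw n_cond (location, value) pairs for one model type."""
    models = np.asarray(models)
    w_lik = np.asarray(w_lik, dtype=float)
    kld = np.asarray(kld, dtype=float)
    # Step 1: locations proportional to the kld map (zero-kld locations
    # can never be drawn); without replacement is an assumption here
    locations = rng.choice(kld.size, size=n_cond, replace=False,
                           p=kld / kld.sum())
    # Step 2: one model per location, proportional to the likelihood weights
    chosen = rng.choice(w_lik.size, size=n_cond, p=w_lik / w_lik.sum())
    values = models[chosen, locations]
    return locations, values

rng = np.random.default_rng(0)
models = np.arange(15).reshape(3, 5)          # entry = 5 * model + location
w_lik = np.array([0.7, 0.2, 0.1])
kld = np.array([0.0, 0.5, 0.0, 0.3, 0.2])
locations, values = sample_hard_data(models, w_lik, kld, n_cond=2, rng=rng)
```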
-
popex.utils.list_mCatParam_to_mCatProb(list_mCatParam)¶ list_mCatParam_to_mCatProb converts a list of m-tuples, each m-tuple containing CatParam, to a m-tuple of CatProb
Parameters: list_mCatParam (list) – list of m-tuples of CatParam objects
Returns: each element is a CatProb instance
Return type: m-tuple
-
popex.utils.merge_hd(hd_param_ind_1, hd_param_ind_2, hd_param_val_1, hd_param_val_2)¶ merge_hd merges two sets of hard conditioning data.
It is assumed that hd_param_ind_i[imtype] is None if and only if hd_param_val_i[imtype] is None.
Parameters: - hd_param_ind_1 (m-tuple) – First set of hard conditioning indices
- hd_param_ind_2 (m-tuple) – Second set of hard conditioning indices
- hd_param_val_1 (m-tuple) – First set of hard conditioning values
- hd_param_val_2 (m-tuple) – Second set of hard conditioning values
Returns: - hd_ind (m-tuple) – Merged set of hard conditioning indices
- hd_par (m-tuple) – Merged set of hard conditioning values
-
popex.utils.update_cat_prob(p_cat, m_new, w_new, sum_w_old)¶ update_cat_prob updates (in place) the category probabilities.
If p_cat_old represents the old category probability maps, then we have
p_cat_new = [sum_w_old*p_cat_old + sum_i w_new_i*1(m_new_i)] / [sum_w_old + sum(w_new)]
where 1(m_new_i) is the categorical indicator of the model i.
Parameters: - p_cat (m-tuple) – Tuple of categorical probability maps (cf. compute_cat_prob())
- m_new (list) – List of m-tuples defining a set of nmod models
- w_new (ndarray, shape=(nmod,)) – Weights associated to the new models
- sum_w_old (float) – Old weight normalization constant
Returns: Return type: None
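The update formula above can be sketched for one probability array. The assumptions are that new models are given as integer category indices per parameter and that the array is of float type; the name update_prob_inplace is hypothetical.

```python
import numpy as np

def update_prob_inplace(prob, new_cat, w_new, sum_w_old):
    """Update a float (nparam, ncat) probability array in place.

    new_cat: int array (nnew, nparam) of category indices of the new models.
    """
    new_cat = np.asarray(new_cat)
    w_new = np.asarray(w_new, dtype=float)
    prob *= sum_w_old                      # back to un-normalized weights
    rows = np.arange(prob.shape[0])
    for cat_k, w_k in zip(new_cat, w_new):
        # Indicator 1(m_new_k): add w_k to the observed category per parameter
        prob[rows, cat_k] += w_k
    prob /= sum_w_old + w_new.sum()

prob = np.array([[0.5, 0.5],
                 [1.0, 0.0]])
update_prob_inplace(prob, new_cat=[[0, 1]], w_new=[2.0], sum_w_old=2.0)
```

Keeping the old normalization constant sum_w_old is what allows the update without revisiting the earlier models.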
-
popex.utils.write_hd_info(popex, imod, hd_param_ind, hd_param_val)¶ write_hd_info writes the hard conditioning that has been deduced for creating a specific model to a text file.
The text file is saved at popex.path_res with the following structure:
<popex.path_res>$
└– hd
   └– hd_modXXXXXX.txt
Parameters: - popex (PoPEx) – PoPEx main structure
- imod (int) – Model index
- hd_param_ind (m-tuple) – Hard conditioning indices
- hd_param_val (m-tuple) – Hard conditioning values
Returns: Return type: None
-
popex.utils.write_run_info(pb, popex, imod, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)¶ write_run_info writes some algorithm specific information to a text file.
The text file is saved at popex.path_res with the following structure:
<popex.path_res>$
└– run_info.txt
Parameters: - pb (Problem) – Defines the problem functions and parameters
- popex (PoPEx) – PoPEx main structure
- imod (int) – Model index
- log_p_lik (float) – Log-likelihood value of the model
- cmp_log_p_lik (bool) – Indicates if likelihood has been computed (True) or predicted (False)
- log_p_pri (float) – Prior log-probability of the model
- log_p_gen (float) – Sampling log-probability of the model
- ncmod (m-tuple) – Model type specific number of conditioning points used
Returns: Return type: None
n_e utilities¶
isampl.neff is a module that implements the most common importance sampling diagnostics used in the PoPEx procedure, namely:
ne(): Effective number of weights for estimating the expectation
ne_var(): Effective number of weights for estimating the variance
ne_gamma(): Effective number of weights for estimating the skewness
alpha(): Optimization for finding the weight correction exponent
correct_w(): Computes the set of corrected weights
-
popex.isampl.neff.alpha(weights, theta, a_init=1)¶ alpha computes the best alpha for lowering the skewness of the weights.
Let k_1 be the number of zero weights and k_2 the number of weights that attain the maximum value in ‘weights’. For a given theta in the interval (k_2, n-k_1), this function finds the unique alpha such that
neff.ne(weights ** alpha) = theta.
The alpha is found by using the ‘L-BFGS-B’ optimization algorithm of scipy.optimize with initial value equal to a_init. We set lower and upper bounds to ALPHA_MIN and ALPHA_MAX, respectively. The maximum number of iterations is 10.
Parameters: - weights (ndarray, shape=(n,)) – Set of weights
- theta (float) – A positive value for the effective number of weights
- a_init (float) – Initial guess for the optimization problem
Returns: alpha value
Return type: float
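Because ne(weights ** alpha) decreases monotonically as alpha grows, the exponent can also be located by plain bisection. The sketch below deliberately swaps in bisection for the L-BFGS-B optimizer described above, so that it runs with NumPy alone; find_alpha and its bracket [lo, hi] are hypothetical and unrelated to ALPHA_MIN and ALPHA_MAX.

```python
import numpy as np

def ne(weights):
    """Kish's effective number of weights."""
    weights = np.asarray(weights, dtype=float)
    return weights.sum() ** 2 / (weights ** 2).sum()

def find_alpha(weights, theta, lo=1e-6, hi=1e3, tol=1e-10, max_iter=200):
    """Bisection for alpha such that ne(weights ** alpha) = theta."""
    weights = np.asarray(weights, dtype=float)
    scaled = weights / weights.max()   # ne is scale invariant; avoids overflow
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if ne(scaled ** mid) > theta:
            lo = mid                   # too many effective weights: sharpen
        else:
            hi = mid                   # too few effective weights: flatten
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

weights = np.array([0.5, 0.25, 0.125, 0.125])
alpha = find_alpha(weights, theta=3.5)
```

Since ne(weights) is roughly 2.9 here and the target is 3.5, the weights must be flattened, which corresponds to an exponent below one.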
-
popex.isampl.neff.correct_w(weights, ne_w_corr)¶ correct_w is a function that lowers the skewness of the weights.
In addition to the method alpha(), it determines whether the correction is possible and treats the exceptions accordingly.
Parameters: - weights (ndarray, shape=(n,)) – Set of weights
- ne_w_corr (float) – A positive value for the effective number of weights
Returns: w_corr – Corrected set of weights that is also normalized
Return type: ndarray, shape=(n,)
-
popex.isampl.neff.find_fsigma(popex, theta, fsigma_max, ibnd)¶ find_fsigma computes the best fsigma for the soft likelihood.
Let k_1 be the number of zero weights and k_2 the number of weights that attain the maximum value in ‘weights’. For a given theta in the interval (k_2, n-k_1), this function finds the unique fsigma such that
neff.ne(weights(soft_lik(log_p_lik, fsigma))) = theta.
The fsigma is found by using the ‘L-BFGS-B’ optimization algorithm of scipy.optimize with initial value equal to f_sigma_init. The lower bound is set to FSIGMA_MIN; the upper bound is given by the argument fsigma_max.
Parameters: - popex (popex object) – Popex object
- theta (float) – A positive value for the effective number of weights
- fsigma_max (float) – Value greater than 1, upper bound for fsigma
- ibnd (int) – Include only first ibnd iterations
Returns: fsigma value
Return type: float
-
popex.isampl.neff.ne(weights)¶ ne computes the effective number of weights.
Kish’s effective number of weights is computed as
n_e(w_1, …, w_n) = ( sum w_i )^2 / ( sum w_i^2 ).
Parameters: weights (ndarray, shape=(n,)) – Set of weights.
Returns: Effective number of weights.
Return type: float
-
popex.isampl.neff.ne_gamma(weights)¶ ne_gamma computes the effective number of weights for the skewness.
The effective number of weights for estimating the skewness is computed as:
n_e(w_1, …, w_n) = ( sum w_i^2 )^3 / ( ( sum w_i^3 )^2 ).
Parameters: weights (ndarray, shape=(n,)) – Set of weights.
Returns: Effective number of weights (for the skewness).
Return type: float
-
popex.isampl.neff.ne_var(weights)¶ ne_var computes the effective number of weights for estimating the variance.
The effective number of weights for an empirical estimation of the variance is computed as:
n_e(w_1, …, w_n) = ( sum w_i^2 )^2 / ( sum w_i^4 ).
Parameters: weights (ndarray, shape=(n,)) – Set of weights.
Returns: Effective number of weights (for the variance).
Return type: float
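The three diagnostics are one-liners in NumPy. The sketch below transcribes the formulas as stated (the package functions may add input checks) and compares uniform weights with a strongly skewed set.

```python
import numpy as np

def ne(w):
    """Effective number of weights for the expectation (Kish)."""
    return np.sum(w) ** 2 / np.sum(w ** 2)

def ne_var(w):
    """Effective number of weights for the variance."""
    return np.sum(w ** 2) ** 2 / np.sum(w ** 4)

def ne_gamma(w):
    """Effective number of weights for the skewness."""
    return np.sum(w ** 2) ** 3 / np.sum(w ** 3) ** 2

uniform = np.full(10, 0.1)
skewed = np.array([0.97, 0.01, 0.01, 0.01])
diagnostics = {name: (f(uniform), f(skewed))
               for name, f in [("ne", ne), ("ne_var", ne_var), ("ne_gamma", ne_gamma)]}
```

For n equal weights all three quantities equal n, whereas a single dominant weight pushes them towards one.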
Classes¶
popex_objects.py contains the PoPEx-specific class definitions.
- Main structure
PoPEx: Main class for any PoPEx simulation. It contains the model chain and any corresponding probability measures.
- Sampling definitions
Problem: Defines the sampling parameters and functions
Learning: Learning scheme for learning the likelihood values
Prediction: Defines the prediction parameters and functions
- Classes associated to a model type
MType: (abstract) Parent class for each map associated to a model type
ContParam: (inherits from MType) Class for each map that is associated to the model types but not to categories (e.g. kld[imtype], entropy[imtype])
CatMType: (abstract, inherits from MType) Parent class for each map that is associated to categories
CatProb: (inherits from CatMType) This class is used for the representation of probability distributions over categories (e.g. p_cat[imtype], q_cat[imtype])
CatParam: (inherits from CatMType) This class is used for the representation of categorized parameter values (e.g. model[j][imtype])
-
class
popex.popex_objects.CatMType(dtype_val='float64', param_val=None, categories=None)¶ This class is the abstract parent of any quantity associated to a categorical model type.
Parameters: - dtype_val (str) – Type of the ndarray values (e.g. ‘int8’, ‘float32’, ‘float64’, etc.)
- param_val (ndarray) – Values associated to the parameters
- categories (list) –
List of size ncat. Each instance of the list is again a list of 2-tuples that define the value range for the category.
If categories[i] = [(v_1, v_2), (v_3, v_4)], where v_j are real values, then the category i is defined by the union
[v_1, v_2) U [v_3, v_4)
-
ncat¶ Number of categories.
Returns: Number of categories in categories
Return type: int
-
class
popex.popex_objects.CatParam(dtype_val='float64', param_val=None, categories=None)¶ This class is used to define a categorized 1-dimensional parameter map that is associated to a model type (e.g. model). The category of each model parameter in param_val is indicated in param_cat. These categories are automatically updated if param_val or categories change.
Notes
The shape of param_val and param_cat is shape=(nparam,).
-
param_cat¶ Category indicator array.
Returns: Category indicators of the values in param_val.
Return type: ndarray, shape=(nparam,)
-
-
class
popex.popex_objects.CatProb(dtype_val='float64', param_val=None, categories=None)¶ This class is used to define a map of continuous values for each category, where each value is associated to a model parameter (e.g. p_cat, q_cat, etc.).
Notes
The shape of param_val is shape=(nparam, ncat).
-
class
popex.popex_objects.ContParam(dtype_val='float64', param_val=None)¶ This class is used to define a map of continuous values where each value is associated to a model parameter (e.g. entropy, kld, etc.).
Notes
The shape of param_val is (nparam,).
-
class
popex.popex_objects.Learning¶ Learning defines an abstract parent class for a learning scheme.
Let’s assume that we want to define a learning scheme that predicts the log-likelihood of a model. In this case we define an explicit sub-class of Learning and provide implementations of its methods.
It is assumed that there is a choice between ‘evaluating the exact answer’ (which is very expensive) and ‘predicting the answer by a machine learning scheme’ (which should be very fast). The learning scheme undergoes the following main steps:
- Update the learning scheme regularly by using the function train() (cf. upd_ls_freq in algorithm.run_popex_mp). Note that here you can choose to only use likelihood values that have effectively been computed (cf. PoPEx.cmp_log_p_lik).
- For a given instance, compute a probability p in [0,1] with which the log-likelihood is evaluated exactly rather than predicted (cf. compute_p_eval_for()). If it is not evaluated, the value is predicted (cf. learn_value_of()).
Notes
In the PoPEx framework this can be used to learn the log-likelihood values for each model (=value of interest). In this regard, predicting a value rather than computing it can considerably improve the overall computational time.
-
compute_p_eval_for(model)¶ Computes and returns a probability value in [0, 1] that determines whether a model should be evaluated exactly.
The two extreme confidence values signify:
p=0: Value can be learned from the learning scheme
p=1: Value should be evaluated exactly.
Parameters: model (m-tuple) – Tuple of MType instances that define the new model
Returns: Probability value in [0, 1]
Return type: float
-
learn_value_of(model)¶ Uses the existing learning scheme to learn the value of interest for an instance.
Notes
This function should raise an error if there is no existing learning scheme.
Parameters: model (m-tuple) – Tuple of MType instances that define the new model
Returns: Predicted log-likelihood value
Return type: float
-
class
popex.popex_objects.MType(dtype_val='float64', param_val=None)¶ This class is the parent of any quantity associated to a model type.
Parameters: - dtype_val (str) – Type of the ndarray values (e.g. ‘int8’, ‘float32’, ‘float64’, etc.)
- param_val (ndarray) – Values associated to the parameters
-
nparam¶ Number of parameters.
Returns: Number of values in param_val
Return type: int
-
class
popex.popex_objects.PoPEx(model=None, log_p_lik=array([], dtype=float64), cmp_log_p_lik=array([], dtype=bool), log_p_pri=array([], dtype=float64), log_p_gen=array([], dtype=float64), ncmax=(0, ), nc=None, nmtype=1, path_res='~/')¶ Main class for any PoPEx simulation.
This class is the main object for the PoPEx algorithm. It contains all the models, likelihood, log-prior and log-generation information of a PoPEx run.
Parameters: - model (list) –
List of models
- model[j] : m-tuple
- Tuple of MType instances
- log_p_lik (ndarray, shape=(nmod,)) – Natural logarithm of likelihood measure
- cmp_log_p_lik (ndarray, shape=(nmod,)) – Boolean indicator whether the log-likelihood value has been computed (True) or predicted (False)
- log_p_pri (ndarray, shape=(nmod,)) – Natural logarithm of prior measure value
- log_p_gen (ndarray, shape=(nmod,)) – Natural logarithm of sampling measure value
- ncmax (m-tuple) – Maximum number of conditioning points for each model type
- nc (list) –
List of m-tuples
- nc[j] : m-tuple
- Contains the number of conditioning points used in the generation of model[j]
- nmtype (int) – Number of model types
- path_res (str) – Path of the results
-
add_model(imod, model, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)¶ Appends a new model at the end of the model list and updates the measure arrays.
Parameters: - imod (int) – Model index
- model (m-tuple) – Tuple of MType instances defining a new model
- log_p_lik (float) – Log-likelihood value of model
- cmp_log_p_lik (bool) – Indicates if the log-likelihood has been computed (True) or predicted (False)
- log_p_pri (float) – Log-prior value of model
- log_p_gen (float) – Log-generation value of model
- ncmod (m-tuple) – Defines the number of conditioning points that have been used in the generation of the model
Returns: Return type: None
-
insert_model(loc, imod, model, log_p_lik, cmp_log_p_lik, log_p_pri, log_p_gen, ncmod)¶ Inserts a new model at loc of the model list and updates the measure arrays.
Parameters: - loc (int) – Location of the insertion
- imod (int) – Model index
- model (m-tuple) – Tuple of MType instances defining a model
- log_p_lik (float) – Log-likelihood value of model
- cmp_log_p_lik (bool) – Indicates if the log-likelihood has been computed (True) or predicted (False)
- log_p_pri (float) – Log-prior value of model
- log_p_gen (float) – Log-generation value of model
- ncmod (m-tuple) – Defines the number of conditioning points that have been used in the generation of the model
Returns: Return type: None
-
nmod¶ Number of models
Returns: Number of models in model
Return type: int
-
class
popex.popex_objects.Prediction(compute_pred=None, meth_w_pred=None, nw_min=None, wfrac_pred=1.0)¶ Defines a prediction that should be computed based on an existing PoPEx instance.
The user must provide a function definition for compute_pred that actually implements the prediction operator. Note that there is no return value expected from that function. Any important result can be saved under
<path_res>$
└– solution
   └– pred_<name>_modXXXXXX
Parameters: - compute_pred (function) –
Computes the prediction for a given model.
compute_pred(popex, imod)
- Parameters:
- popex : PoPEx
- PoPEx main structure (cf
popex.popex_objects.PoPEx)
- imod : int
- Model index
- Returns:
- None
- meth_w_pred (dict) – Defines the method used for computing the prediction weights (cf. popex.utils.compute_w_lik())
- nw_min (float) – Minimum number of effective weights (= l_0)
- wfrac_pred (float) – Number in (0,1] defining the fraction of the total weight to be used for the prediction. If wfrac_pred=1, the predictions are computed for every model with non-zero weight. If wfrac_pred<1, we take the minimum number of weights needed to cover a fraction wfrac_pred of the total sum of weights.
-
class
popex.popex_objects.Problem(generate_m, compute_log_p_lik, get_hd_pri, compute_log_p_pri=None, compute_log_p_gen=None, learning_scheme=None, meth_w_hd=None, nmtype=1, seed=0)¶ Defines the sampling problem that should be addressed by the PoPEx method.
The user must provide function definitions for generate_m and compute_log_p_lik. For the definition of the model space, we can also provide ‘prior hard conditioning’ through the function get_hd_pri. Optionally, a likelihood learning scheme can be defined in learning_scheme. Furthermore, one also must define how to compute the ratio in the importance sampling weights. For this, the functions compute_log_p_pri and compute_log_p_gen can also be defined manually. If they are left empty, the default version that only considers the hard conditioning data points is used.
Parameters: - generate_m (function) –
Generates a new model m from a set of hard conditioning data.
generate_m(hd_param_ind, hd_param_val, imod)
- Parameters:
- hd_param_ind : m-tuple
- For each instance in the model tuple, this variable defines the
hard conditioning INDICES (where to apply HD).
hd_param_ind[i] is an ndarray of shape=(nhd_i,)
- hd_param_val : m-tuple
- For each instance in the model tuple, this variable defines the
hard conditioning VALUES (what to impose).
hd_param_val[i] is an ndarray of shape=(nhd_i,)
- imod : int
- Model index
- Returns:
- m-tuple
- New model such as (CatParam_1, …, CatParam_m)
- compute_log_p_lik (function) –
Computes the natural logarithm of the likelihood of a model. It usually runs an expensive forward operation and compares the response to a given set of observations.
compute_log_p_lik(model, imod)
- Parameters:
- model : m-tuple
- Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
- imod : int
- Model index
- Returns:
- float
- Log-likelihood value of the model
- get_hd_pri (function) –
Provides the ‘prior hard conditioning’ that is used in the definition of the model space (i.e. parameter values that are known without uncertainty).
get_hd_pri()
- Returns
- hd_pri_ind : m-tuple
- For each instance in the model tuple, this variable defines the hard conditioning INDICES.
- hd_pri_val : m-tuple
- For each instance in the model tuple, this variable defines the hard conditioning VALUES.
- compute_log_p_pri (function, optional) –
This function computes the log-prior probability of a model that has been generated from a given set of hard conditioning data. Note that its definition is OPTIONAL. If it is left undefined, a default implementation will be used (see remark below).
compute_log_p_pri(model, hd_p_pri, hd_param_ind)
- Parameters:
- model : m-tuple
- Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
- hd_p_pri : m-tuple
- Tuple of the hard conditioning probability values for a given
model. Each probability value describes the prior probability
of observing the category of the model value
at the corresponding conditioning location.
hd_p_pri[i] is an ndarray of shape=(nhd_i,)
- hd_param_ind : m-tuple
- Hard conditioning indices
- Returns:
- float
- Log-prior probability value
- compute_log_p_gen (function, optional) –
This function computes the log-probability of generating a model in the PoPEx sampling from a given set of hard conditioning. Note that its definition is OPTIONAL. If it is left undefined, a default implementation will be used (see remark below).
compute_log_p_gen(model, hd_p_gen, hd_param_ind)
- Parameters:
- model : m-tuple
- Model such as (CatParam_1, …, CatParam_m) (cf. output of generate_m())
- hd_p_gen : m-tuple
- Tuple of the hard conditioning probability values for a given
model. Each probability value describes the sampling probability
of observing the category of the model value
at the corresponding conditioning location.
hd_p_gen[i] is an ndarray of shape=(nhd_i,)
- hd_param_ind : m-tuple
- Hard conditioning indices
- Returns:
- float
- Log-generation probability value
- learning_scheme (Learning, optional) – Learning scheme for log_p_lik (concrete subclass of Learning)
- meth_w_hd (dict, optional) – Defines the method for computing the learning weights that are used in the computation of the hard conditioning points (cf. compute_w_lik())
- nmtype (int) – Number of model types
- seed (int) – Initial seed
Notes
Let us provide a simple example for hard conditioning data in hd_param_ind and hd_param_val. It is important to note that PoPEx does NOT use any parameter locations. They might be defined by the user. If so, they have to follow a certain structure. Let the parameter locations be such that:
#                         x    y    z
param_loc[0] = np.array([[0.5, 1.5, 0.5],   # Parameter 0
                         [0.5, 2.5, 0.5],   # Parameter 1
                         [0.5, 3.5, 0.5]])  # Parameter 2
and the parameter indices (in hd_param_ind) are for example:
hd_param_ind[0] = [0, 2] # Condition parameter 0 and 2
so we will use param_loc[0][hd_param_ind[0], :] for obtaining the array:
np.array([[0.5, 1.5, 0.5], [0.5, 3.5, 0.5]]).
This array indicates the physical locations where hard conditioning should be applied for the model type 0. Let the parameter values (in hd_param_val) be given by:
hd_param_val[0] = np.array([1.2, 2.5]).
Together with the conditioning locations above, this imposes hard conditioning data as follows:
x    y    z    val
0.5  1.5  0.5  1.2
0.5  3.5  0.5  2.5
Note that it is possible to NOT define compute_log_p_pri and compute_log_p_gen. In this case, a predefined function will be used. This predefined implementation assumes that the quantities p_pri and p_gen are only used TOGETHER in the form of a RATIO
ratio(m) = rho(m) / phi(m).
In other words, the default functions assume that we are only interested in the DIFFERENCE of the log values, i.e.
log_p_pri - log_p_gen,
and never in the exact values on their own. It is left to the user to implement a more suitable computation, whenever the above assumption is not sufficient. For more information also consult the theoretical description of the PoPEx method.
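When only the difference log_p_pri - log_p_gen matters, one plausible reading of a computation that "only considers the hard conditioning data points" is a sum of log-probability differences over those points. The sketch below encodes that reading as an assumption; it is not taken from the package source, and the name log_weight_ratio is hypothetical.

```python
import numpy as np

def log_weight_ratio(hd_p_pri, hd_p_gen):
    """log(rho(m) / phi(m)) from hard conditioning category probabilities.

    hd_p_pri, hd_p_gen: m-tuples of 1-d arrays (or None per model type).
    Assumes the ratio factorizes over the conditioning points.
    """
    total = 0.0
    for p_pri, p_gen in zip(hd_p_pri, hd_p_gen):
        if p_pri is None or p_gen is None:
            continue  # no hard conditioning for this model type
        total += np.sum(np.log(p_pri)) - np.sum(np.log(p_gen))
    return total

hd_p_pri = (np.array([0.5, 0.5]), None)
hd_p_gen = (np.array([0.25, 0.5]), None)
log_ratio = log_weight_ratio(hd_p_pri, hd_p_gen)
```

In this example the first conditioning category was sampled with half its prior probability, so the ratio rho(m) / phi(m) equals 2.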
It is also possible to NOT define the learning_scheme. In this case, the log-likelihood value will ALWAYS be computed.