Module Index

class bayesreg.BLR(**kwargs)[source]

Bases: object

Bayesian linear regression

Estimation and prediction of Bayesian linear regression models

Basic usage:

B = BLR()
hyp = B.estimate(hyp0, X, y)
ys,s2 = B.predict(hyp, X, y, Xs)

where the variables are

Parameters:
  • hyp – vector of hyperparameters.

  • X – N x D data array

  • y – 1D Array of targets (length N)

  • Xs – Nte x D array of test cases

  • hyp0 – starting estimates for hyperparameter optimisation

Returns:

  • ys - predictive mean

  • s2 - predictive variance

The hyperparameters are:

hyp = ( log(beta), log(alpha) )  # hyp is a list or numpy array

The implementation and notation mostly follows Bishop (2006). The hyperparameter beta is the noise precision and alpha is the precision over lengthscale parameters. This can be either a scalar variable (a common lengthscale for all input variables), or a vector of length D (a different lengthscale for each input variable, derived using an automatic relevance determination formulation). These are estimated using conjugate gradient optimisation of the marginal likelihood.

Reference: Bishop (2006) Pattern Recognition and Machine Learning, Springer

Written by A. Marquand
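
A minimal worked sketch of the basic usage above. The synthetic data, starting values and import path are illustrative assumptions (a scalar alpha is assumed, giving two hyperparameters):

import numpy as np
from bayesreg import BLR   # import path may differ depending on installation

N, D = 200, 5
X = np.random.randn(N, D)                                # N x D data array
y = X @ np.random.randn(D) + 0.1 * np.random.randn(N)    # targets (length N)
Xs = np.random.randn(50, D)                              # Nte x D test cases

hyp0 = np.zeros(2)                            # ( log(beta), log(alpha) ), scalar alpha
B = BLR()
hyp = B.estimate(hyp0, X, y, optimizer='cg')  # optimiser passed via **kwargs
ys, s2 = B.predict(hyp, X, y, Xs)             # predictive mean and variance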

dloglik(hyp, X, y, Xv=None)[source]

Function to compute derivatives

estimate(hyp0, X, y, **kwargs)[source]

Function to estimate the model

Parameters:
  • hyp – hyperparameter vector

  • X – covariates

  • y – responses

  • optimizer – optimisation algorithm (‘cg’, ’powell’, ’nelder-mead’, ’l-bfgs-b’)

loglik(hyp, X, y, Xv=None)[source]

Function to compute the log (marginal) likelihood

penalized_loglik(hyp, X, y, Xv=None, l=0.1, norm='L1')[source]

Function to compute the penalized log (marginal) likelihood

Parameters:
  • hyp – hyperparameter vector

  • X – covariates

  • y – responses

  • Xv – covariates for heteroskedastic noise

  • l – regularisation penalty

  • norm – type of regulariser (L1 or L2)

post(hyp, X, y, Xv=None)[source]

Generic function to compute posterior distribution.

This function will save the posterior mean and precision matrix as self.m and self.A, and will also update internal parameters (e.g. N, D, the prior covariance (Sigma_a) and precision (Lambda_a)).

Parameters:
  • hyp – hyperparameter vector

  • X – covariates

  • y – responses

  • Xv – covariates for heteroskedastic noise

predict(hyp, X, y, Xs, var_groups_test=None, var_covariates_test=None, **kwargs)[source]

Function to make predictions from the model

Parameters:
  • hyp – hyperparameter vector

  • X – covariates for training data

  • y – responses for training data

  • Xs – covariates for test data

  • var_covariates_test – test covariates for heteroskedastic noise

This always returns Gaussian predictions, i.e.

Returns:

  • ys - predictive mean

  • s2 - predictive variance

predict_and_adjust(hyp, X, y, Xs=None, ys=None, var_groups_test=None, var_groups_adapt=None, **kwargs)[source]

Function to transfer the model to a new site. This is done by first making predictions on the adaptation data given by X, then adjusting by the residuals with respect to y.

Parameters:
  • hyp – hyperparameter vector

  • X – covariates for adaptation (i.e. calibration) data

  • y – responses for adaptation data

  • Xs – covariate data (for which predictions should be adjusted)

  • ys – true response variables (to be adjusted)

  • var_groups_test – variance groups (e.g. sites) for test data

  • var_groups_adapt – variance groups for adaptation data

There are two possible ways of using this function, depending on whether ys or Xs is specified.

If ys is specified, the adjustment is applied directly to the data, which are assumed to be in the input space (i.e. not warped). In this case the adjusted true data points are returned in the same space.

Alternatively, if Xs is specified, then predictions are made and adjusted. In this case the predictive variance is returned in the warped (i.e. Gaussian) space.

This function needs to know which sites are associated with which data points; this is provided by var_groups_xxx, which is a list or array of scalar ids. A sketch of both usages is given below.
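
A hypothetical sketch of both usages; the variable names, site id arrays and return signatures are assumptions based on the description above:

# 1) Xs specified: make predictions for new test covariates and adjust them
ys_adj, s2_adj = B.predict_and_adjust(hyp, X_adapt, y_adapt, Xs=Xs,
                                      var_groups_test=site_ids_test,
                                      var_groups_adapt=site_ids_adapt)

# 2) ys specified: adjust observed responses directly (no predictions made)
y_adj = B.predict_and_adjust(hyp, X_adapt, y_adapt, ys=y_new,
                             var_groups_adapt=site_ids_adapt)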

class gp.CovBase(x=None)[source]

Bases: object

Base class for covariance functions.

All covariance functions must define the following methods:

CovFunction.get_n_params()
CovFunction.cov()
CovFunction.xcov()
CovFunction.dcov()
abstract cov(theta, x, z=None)[source]

Return the full covariance (or cross-covariance if z is given)

abstract dcov(theta, x, i)[source]

Return the derivative of the covariance function with respect to the i-th hyperparameter

get_n_params()[source]

Report the number of parameters required

class gp.CovLin(x=None)[source]

Bases: CovBase

Linear covariance function (no hyperparameters)

cov(theta, x, z=None)[source]

Return the full covariance (or cross-covariance if z is given)

dcov(theta, x, i)[source]

Return the derivative of the covariance function with respect to the i-th hyperparameter

get_n_params()

Report the number of parameters required

class gp.CovSqExp(x=None)[source]

Bases: CovBase

Ordinary squared exponential covariance function. The hyperparameters are:

theta = ( log(ell), log(sf) )

where ell is a lengthscale parameter and sf^2 is the signal variance

cov(theta, x, z=None)[source]

Return the full covariance (or cross-covariance if z is given)

dcov(theta, x, i)[source]

Return the derivative of the covariance function with respect to the i-th hyperparameter

get_n_params()

Report the number of parameters required

class gp.CovSqExpARD(x=None)[source]

Bases: CovBase

Squared exponential covariance function with ARD. The hyperparameters are:

theta = ( log(ell_1), ..., log(ell_D), log(sf) )

where ell_i are lengthscale parameters and sf^2 is the signal variance

cov(theta, x, z=None)[source]

Return the full covariance (or cross-covariance if z is given)

dcov(theta, x, i)[source]

Return the derivative of the covariance function with respect to the i-th hyperparameter

get_n_params()

Report the number of parameters required

class gp.CovSum(x=None, covfuncnames=None)[source]

Bases: CovBase

Sum of covariance functions. These are passed in as a tuple and initialised automatically. For example:

C = CovSum(x, (CovLin, CovSqExpARD))
K = C.cov(theta, x)

The hyperparameters are:

theta = ( log(ell_1), ..., log(ell_D), log(sf2) )

where ell_i are lengthscale parameters and sf2 is the signal variance

cov(theta, x, z=None)[source]

Return the full covariance (or cross-covariance if z is given)

dcov(theta, x, i)[source]

Return the derivative of the covariance function with respect to the i-th hyperparameter

get_n_params()

Report the number of parameters required

class gp.GPR(hyp=None, covfunc=None, X=None, y=None, n_iter=100, tol=0.001, verbose=False, warp=None)[source]

Bases: object

Gaussian process regression

Estimation and prediction of Gaussian process regression models

Basic usage:

G = GPR()
hyp = G.estimate(hyp0, cov, X, y)
ys, ys2 = G.predict(hyp, cov, X, y, Xs)

where the variables are

Parameters:
  • hyp – vector of hyperparameters

  • cov – covariance function

  • X – N x D data array

  • y – 1D Array of targets (length N)

  • Xs – Nte x D array of test cases

  • hyp0 – starting estimates for hyperparameter optimisation

Returns:

  • ys - predictive mean

  • ys2 - predictive variance

The hyperparameters are:

hyp = ( log(sn), (cov function params) )  # hyp is a list or array

The implementation and notation follows Rasmussen and Williams (2006). As in the gpml toolbox, these parameters are estimated using conjugate gradient optimisation of the marginal likelihood. Note that there is no explicit mean function, thus the gpr routines are limited to modelling zero-mean processes.

Reference: C. Rasmussen and C. Williams (2006) Gaussian Processes for Machine Learning

Written by A. Marquand
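
A minimal sketch following the method signatures listed below (the class usage block above also passes the covariance function to predict; check the installed version). The data, starting values and import path are assumptions:

import numpy as np
from gp import GPR, CovSqExp   # import path may differ depending on installation

N, D = 100, 2
X = np.random.randn(N, D)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(N)
Xs = np.random.randn(20, D)

covfunc = CovSqExp(X)
hyp0 = np.zeros(1 + covfunc.get_n_params())    # ( log(sn), log(ell), log(sf) )
G = GPR(hyp=hyp0, covfunc=covfunc, X=X, y=y)
hyp = G.estimate(hyp0, covfunc, X, y, optimizer='cg')
ys, ys2 = G.predict(hyp, X, y, Xs)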

dloglik(hyp, covfunc, X, y)[source]

Function to compute derivatives

estimate(hyp0, covfunc, X, y, optimizer='cg')[source]

Function to estimate the model

loglik(hyp, covfunc, X, y)[source]

Function to compute the log (marginal) likelihood

post(hyp, covfunc, X, y)[source]

Generic function to compute posterior distribution.

predict(hyp, X, y, Xs)[source]

Function to make predictions from the model

normative.estimate(covfile, respfile, **kwargs)[source]

Estimate a normative model

This will estimate a model in one of three settings according to the particular parameters specified (see below):

  • under k-fold cross-validation. Requires respfile, covfile and cvfolds>=2.

  • estimating on a training dataset then applying to a second test dataset. Requires respfile, covfile, testcov and testresp.

  • estimating on a training dataset and outputting the mean and standard error of the forward maps. Requires respfile, covfile and testcov.

The models are estimated on the basis of data stored on disk in ascii or neuroimaging data formats (nifti or cifti). Ascii data should be in tab or space delimited format with the number of subjects in rows and the number of variables in columns. Neuroimaging data will be reshaped into the appropriate format

Basic usage:

estimate(covfile, respfile, [extra_arguments])

where the variables are defined below. Note that either the cvfolds parameter or (testcov, testresp) should be specified, but not both.

Parameters:
  • respfile – response variables for the normative model

  • covfile – covariates used to predict the response variable

  • maskfile – mask used to apply to the data (nifti only)

  • cvfolds – Number of cross-validation folds

  • testcov – Test covariates

  • testresp – Test responses

  • alg – Algorithm for normative model

  • configparam – Parameters controlling the estimation algorithm

  • saveoutput – Save the output to disk? Otherwise returned as arrays

  • outputsuffix – Text string to add to the output filenames

  • inscaler – Scaling approach for input covariates, could be ‘None’ (Default), ‘standardize’, ‘minmax’, or ‘robminmax’.

  • outscaler – Scaling approach for output responses, could be ‘None’ (Default), ‘standardize’, ‘minmax’, or ‘robminmax’.

All outputs are written to disk in the same format as the input. These are:

Outputs:
  • yhat - predictive mean

  • ys2 - predictive variance

  • nm - normative model

  • Z - deviance scores

  • Rho - Pearson correlation between true and predicted responses

  • pRho - parametric p-value for this correlation

  • rmse - root mean squared error between true/predicted responses

  • smse - standardised mean squared error

The outputsuffix may be useful to estimate multiple normative models in the same directory (e.g. for custom cross-validation schemes)
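
A hypothetical invocation in the train/test setting described above; all file names are placeholders and the import path is an assumption:

from pcntoolkit.normative import estimate   # import path may differ by version

estimate('cov_train.txt', 'resp_train.txt',
         testcov='cov_test.txt', testresp='resp_test.txt',
         alg='gpr', outputsuffix='_run1', inscaler='standardize')

# Alternatively, under 5-fold cross-validation:
estimate('cov_train.txt', 'resp_train.txt', cvfolds=5, alg='gpr')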

normative.evaluate(Y, Yhat, S2=None, mY=None, sY=None, nlZ=None, nm=None, Xz_tr=None, alg=None, metrics=['Rho', 'RMSE', 'SMSE', 'EXPV', 'MSLL'])[source]

Compute error metrics. This function will compute error metrics based on a set of predictions Yhat and a set of true response variables Y, namely:

  • Rho: Pearson correlation

  • RMSE: root mean squared error

  • SMSE: standardized mean squared error

  • EXPV: explained variance

If the predictive variance is also specified, the log loss will be computed (which takes the predictive variance into account). If the mean and standard deviation are also specified, these are used to standardize it, yielding the mean standardized log loss.

Parameters:
  • Y – N x P array of true response variables

  • Yhat – N x P array of predicted response variables

  • S2 – predictive variance

  • mY – mean of the training set

  • sY – standard deviation of the training set

Returns metrics:

evaluation metrics
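
A minimal sketch of computing metrics for a set of predictions; the array shapes are illustrative and the dict-style return value is an assumption:

import numpy as np
from pcntoolkit.normative import evaluate   # import path may differ by version

Y = np.random.randn(100, 3)                 # N x P true responses
Yhat = Y + 0.1 * np.random.randn(100, 3)    # N x P predictions
S2 = np.full(Y.shape, 0.01)                 # predictive variance

metrics = evaluate(Y, Yhat, S2=S2, mY=Y.mean(axis=0), sY=Y.std(axis=0),
                   metrics=['Rho', 'RMSE', 'SMSE', 'EXPV', 'MSLL'])
print(metrics)                              # assumed to be keyed by metric name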

normative.extend(covfile, respfile, maskfile=None, **kwargs)[source]

This function extends an existing HBR model with data from new sites/scanners.

Basic usage:

extend(covfile, respfile [extra_arguments])

where the variables are defined below.

Parameters:
  • covfile – covariates for new data

  • respfile – response variables for new data

  • maskfile – mask used to apply to the data (nifti only)

  • model_path – Directory containing the normative model and metadata

  • trbefile – path to the batch effects file for new data

  • batch_size – batch size (for use with normative_parallel)

  • job_id – batch id

  • output_path – the path for saving the extended model

  • informative_prior – use initial model prior or learn from scratch (default is False).

  • generation_factor – see below

The generation factor refers to the number of samples generated for each combination of covariates and batch effects. Default is 10.

All outputs are written to disk in the same format as the input.

normative.fit(covfile, respfile, **kwargs)[source]

Fits a normative model to the data.

Parameters:
  • covfile (str) – The path to the covariates file.

  • respfile (str) – The path to the response variables file.

  • maskfile (str, optional) – The path to the mask file. Defaults to None.

  • alg (str, optional) – The algorithm to use. Defaults to ‘gpr’.

  • savemodel (bool, optional) – Whether to save the model. Defaults to True.

  • outputsuffix (str, optional) – The suffix to append to the output files. Defaults to ‘fit’.

  • inscaler (str, optional) – The scaler to use for the input data. Defaults to ‘None’.

  • outscaler (str, optional) – The scaler to use for the output data. Defaults to ‘None’.

Returns: None

normative.get_args(*args)[source]

Parse command line arguments for normative modeling

Parameters:

args – command line arguments

Returns respfile:

response variables for the normative model

Returns maskfile:

mask used to apply to the data (nifti only)

Returns covfile:

covariates used to predict the response variable

Returns cvfolds:

Number of cross-validation folds

Returns testcov:

Test covariates

Returns testresp:

Test responses

Returns func:

Function to call

Returns alg:

Algorithm for normative model

Returns configparam:

Parameters controlling the estimation algorithm

Returns kw_args:

Additional keyword arguments

normative.load_response_vars(datafile, maskfile=None, vol=True)[source]

Load response variables from file. This will load the data and mask it if necessary. If the data is in ascii format it will be converted into a numpy array. If the data is in neuroimaging format it will be reshaped into a 2D array (subjects x variables) and a mask will be created if necessary.

Parameters:
  • datafile – File containing the response variables

  • maskfile – Mask file (nifti only)

  • vol – If True, load the data as a 4D volume (nifti only)

Returns Y:

Response variables

Returns volmask:

Mask file (nifti only)

normative.main(*args)[source]

Parse arguments and estimate model

normative.merge(covfile=None, respfile=None, **kwargs)[source]

This function merges two existing HBR models (e.g. trained on data from different sites/scanners) into a single model.

Basic usage:

merge(model_path1, model_path2 [extra_arguments])

where the variables are defined below.

Parameters:
  • covfile – Not required. Always set to None.

  • respfile – Not required. Always set to None.

  • model_path1 – Directory containing the model and metadata (1st model)

  • model_path2 – Directory containing the model and metadata (2nd model)

  • batch_size – batch size (for use with normative_parallel)

  • job_id – batch id

  • output_path – the path for saving the merged model

  • generation_factor – see below

The generation factor refers to the number of samples generated for each combination of covariates and batch effects. Default is 10.

All outputs are written to disk in the same format as the input.

normative.predict(covfile, respfile, maskfile=None, **kwargs)[source]

Make predictions on the basis of a pre-estimated normative model. If only the covariates are specified then only the predicted mean and variance will be returned. If the test responses are also specified then quantities that depend on those will also be returned (Z scores and error metrics).

Basic usage:

predict(covfile, [extra_arguments])

where the variables are defined below.

Parameters:
  • covfile – test covariates used to predict the response variable

  • respfile – test response variables for the normative model

  • maskfile – mask used to apply to the data (nifti only)

  • model_path – Directory containing the normative model and metadata. When using parallel prediction, do not pass the model path. It will be automatically decided.

  • outputsuffix – Text string to add to the output filenames

  • batch_size – batch size (for use with normative_parallel)

  • job_id – batch id

  • fold – which cross-validation fold to use (default = 0)

  • fold – list of model IDs to predict (if not specified all are computed)

  • return_y – return the (transformed) response variable (default = False)

All outputs are written to disk in the same format as the input. These are:

Outputs:
  • Yhat - predictive mean

  • S2 - predictive variance

  • Z - Z scores

  • Y - response variable (if return_y is True)
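
A hypothetical call using a previously estimated model; the file names and model directory are placeholders:

from pcntoolkit.normative import predict   # import path may differ by version

predict('cov_test.txt', 'resp_test.txt',
        model_path='Models', outputsuffix='_predict')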

normative.save_results(respfile, Yhat, S2, maskvol, Z=None, Y=None, outputsuffix=None, results=None, save_path='')[source]

Writes the results of the normative model to disk.

Parameters:
  • respfile (str) – The response variables file.

  • Yhat (np.array) – The predicted response variables.

  • S2 (np.array) – The predictive variance.

  • maskvol (np.array) – The mask volume.

  • Z (np.array, optional) – The latent variable. Defaults to None.

  • Y (np.array, optional) – The observed response variables. Defaults to None.

  • outputsuffix (str, optional) – The suffix to append to the output files. Defaults to None.

  • results (dict, optional) – The results of the normative model. Defaults to None.

  • save_path (str, optional) – The directory to save the results to. Defaults to ‘’.

Returns: None

normative.transfer(covfile, respfile, testcov=None, testresp=None, maskfile=None, **kwargs)[source]

Transfer learning on the basis of a pre-estimated normative model by using the posterior distribution over the parameters as an informed prior for new data. Currently only supported for HBR.

Basic usage:

transfer(covfile, respfile [extra_arguments])

where the variables are defined below.

Parameters:
  • covfile – transfer covariates used to predict the response variable

  • respfile – transfer response variables for the normative model

  • maskfile – mask used to apply to the data (nifti only)

  • testcov – Test covariates

  • testresp – Test responses

  • model_path – Directory containing the normative model and metadata

  • trbefile – Training batch effects file

  • batch_size – batch size (for use with normative_parallel)

  • job_id – batch id

All outputs are written to disk in the same format as the input. These are:

Outputs:
  • Yhat - predictive mean

  • S2 - predictive variance

  • Z - Z scores
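
A hypothetical transfer call for an HBR model, using only the parameters listed above; all paths are placeholders:

from pcntoolkit.normative import transfer   # import path may differ by version

transfer('cov_transfer.txt', 'resp_transfer.txt',
         model_path='Models', trbefile='trbefile_transfer.pkl')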

normative.tune(covfile, respfile, maskfile=None, **kwargs)[source]

This function tunes an existing HBR model with real data.

Basic usage:

tune(covfile, respfile [extra_arguments])

where the variables are defined below.

Parameters:
  • covfile – covariates for new data

  • respfile – response variables for new data

  • maskfile – mask used to apply to the data (nifti only)

  • model_path – Directory containing the normative model and metadata

  • trbefile – path to the batch effects file for new data

  • batch_size – batch size (for use with normative_parallel)

  • job_id – batch id

  • output_path – the path for saving the extended model

  • informative_prior – use initial model prior or learn from scratch (default is False).

  • generation_factor – see below

The generation factor refers to the number of samples generated for each combination of covariates and batch effects. Default is 10.

All outputs are written to disk in the same format as the input.

normative_parallel.bashwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path, func='estimate', **kwargs)[source]

This function wraps normative modelling into a bash script to run it on a torque cluster system.

Basic usage:

bashwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path)
Parameters:
  • processing_dir – Full path to the processing dir

  • python_path – Full path to the python distribution

  • normative_path – Full path to the normative.py

  • job_name – Name for the bash script that is the output of this function

  • covfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the responsefile

  • respfile_path – Full path to a .txt that contains all features (subjects x features)

  • cv_folds – Number of cross validations

  • testcovfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the testresponse file

  • testrespfile_path – Full path to a .txt file that contains all test features

  • alg – which algorithm to use

  • configparam – configuration parameters for this algorithm

Outputs:

A bash.sh file containing the commands for normative modelling saved to the processing directory (written to disk).

written by (primarily) T Wolfers, (adapted) S Rutherford.

normative_parallel.check_job_status(jobs)[source]

A utility function to count the tasks with different statuses.

Parameters:

jobs – List of job ids.

Returns:

the number of tasks that are queued, running, completed, etc.

normative_parallel.check_jobs(jobs, delay=60)[source]

A utility function for checking the status of submitted jobs.

Parameters:
  • jobs – list of job ids.

  • delay – the delay (in sec) between two consecutive checks, defaults to 60.

normative_parallel.collect_nm(processing_dir, job_name, func='estimate', collect=False, binary=False, batch_size=None, outputsuffix='_estimate')[source]

Function to check and collect all batches.

Basic usage:

collect_nm(processing_dir, job_name)
Parameters:
  • processing_dir – Full path to the processing directory

  • collect – If True data is checked for failed batches and collected; if False data is just checked

  • binary – Results in pkl format

Outputs:

Text or pkl files containing the combined output across all batches (written to disk).

Returns 0:

if batches fail

Returns 1:

if batches complete successfully

written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.

normative_parallel.delete_nm(processing_dir, binary=False)[source]

This function deletes all processing for normative modelling and just keeps the combined output.

Basic usage:

delete_nm(processing_dir)
Parameters:
  • processing_dir – Full path to the processing directory.

  • binary – Results in pkl format.

written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.

normative_parallel.execute_nm(processing_dir, python_path, job_name, covfile_path, respfile_path, batch_size, memory, duration, normative_path=None, func='estimate', interactive=False, **kwargs)[source]

Execute parallel normative models. This function is a mother function that executes all parallel normative modelling routines. Different specifications are possible using the sub-functions.

Basic usage:

execute_nm(processing_dir, python_path, job_name, covfile_path, respfile_path, batch_size, memory, duration)
Parameters:
  • processing_dir – Full path to the processing dir

  • python_path – Full path to the python distribution

  • normative_path – Full path to normative.py. If None (default) it will automatically retrieve the path from the installed package.

  • job_name – Name for the bash script that is the output of this function

  • covfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the responsefile

  • respfile_path – Full path to a .txt that contains all features (subjects x features)

  • batch_size – Number of features in each batch

  • memory – Memory requirements written as string for example 4gb or 500mb

  • duration – The approximate duration of the job, a string with HH:MM:SS for example 01:01:01

  • cv_folds – Number of cross validations

  • testcovfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the test response file

  • testrespfile_path – Full path to a .txt file that contains all test features

  • log_path – Path for saving log files

  • binary – If True, use binary format for the response file; otherwise text

  • interactive – If False (default), the user should manually rerun the failed jobs or collect the results. If ‘auto’, the job status is checked until all jobs are completed, then the failed jobs are rerun and the results are automatically collected. ‘query’ is similar to ‘auto’ except that it asks for user verification, which makes it immune to endless loops in the case of bugs in the code.

written by (primarily) T Wolfers, (adapted) SM Kia. The documentation was adapted by S Rutherford.
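
A hypothetical sketch of launching a parallel run, using only the parameters listed above; every path and resource value is a placeholder:

from pcntoolkit.normative_parallel import execute_nm   # import path may differ

execute_nm(processing_dir='/project/nm_run/',
           python_path='/usr/bin/python',
           job_name='nm_batch',
           covfile_path='/project/data/cov.txt',
           respfile_path='/project/data/resp.txt',
           batch_size=10,
           memory='4gb',
           duration='01:00:00',
           func='estimate',
           interactive='auto')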

normative_parallel.qsub_nm(job_path, log_path, memory, duration)[source]

This function submits a job.sh script to the torque cluster using the qsub command.

Basic usage:

qsub_nm(job_path, log_path, memory, duration)
Parameters:
  • job_path – Full path to the job.sh file.

  • memory – Memory requirements written as string for example 4gb or 500mb.

  • duration – The approximate duration of the job, a string with HH:MM:SS for example 01:01:01.

Outputs:

Submission of the job to the (torque) cluster.

written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.

normative_parallel.rerun_nm(processing_dir, log_path, memory, duration, binary=False, interactive=False)[source]

This function reruns all failed batches in processing_dir after collect_nm has identified the failed batches.

Basic usage:

rerun_nm(processing_dir, log_path, memory, duration)
Parameters:
  • processing_dir – Full path to the processing directory

  • memory – Memory requirements written as string for example 4gb or 500mb.

  • duration – The approximate duration of the job, a string with HH:MM:SS for example 01:01:01.

written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.

normative_parallel.retrieve_jobs()[source]

A utility function to retrieve task status from the outputs of qstat.

Returns:

a dictionary of jobs.

normative_parallel.sbatch_nm(job_path, log_path)[source]

This function submits a job.sh script to the cluster using the sbatch command.

Basic usage:

sbatch_nm(job_path, log_path)
Parameters:
  • job_path – Full path to the job.sh file

  • log_path – The logs are currently stored in the working dir

Outputs:

Submission of the job to the cluster.

written by (primarily) T Wolfers, (adapted) S Rutherford.

normative_parallel.sbatchrerun_nm(processing_dir, memory, duration, new_memory=False, new_duration=False, binary=False, **kwargs)[source]

This function reruns all failed batches in processing_dir after collect_nm has identified the failed batches.

Basic usage:

sbatchrerun_nm(processing_dir, memory, duration)
Parameters:
  • processing_dir – Full path to the processing directory.

  • memory – Memory requirements written as string, for example 4gb or 500mb.

  • duration – The approximate duration of the job, a string with HH:MM:SS for example 01:01:01.

  • new_memory – If you want to change the memory you have to indicate it here.

  • new_duration – If you want to change the duration you have to indicate it here.

Outputs:

Re-runs failed batches.

written by (primarily) T Wolfers, (adapted) S Rutherford.

normative_parallel.sbatchwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path, memory, duration, func='estimate', **kwargs)[source]

This function wraps normative modelling into a bash script to run it on a slurm cluster system.

Basic usage:

sbatchwrap_nm(processing_dir, python_path, normative_path, job_name, covfile_path, respfile_path, memory, duration)
Parameters:
  • processing_dir – Full path to the processing dir

  • python_path – Full path to the python distribution

  • normative_path – Full path to the normative.py

  • job_name – Name for the bash script that is the output of this function

  • covfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the responsefile

  • respfile_path – Full path to a .txt that contains all features (subjects x features)

  • cv_folds – Number of cross validations

  • testcovfile_path – Full path to a .txt file that contains all covariates (subjects x covariates) for the testresponse file

  • testrespfile_path – Full path to a .txt file that contains all test features

  • alg – which algorithm to use

  • configparam – configuration parameters for this algorithm

Outputs:

A bash.sh file containing the commands for normative modelling saved to the processing directory (written to disk).

written by (primarily) T Wolfers, (adapted) S Rutherford

normative_parallel.split_nm(processing_dir, respfile_path, batch_size, binary, **kwargs)[source]

This function prepares the input files for normative_parallel.

Basic usage:

split_nm(processing_dir, respfile_path, batch_size, binary, testrespfile_path)
Parameters:
  • processing_dir – Full path to the processing dir

  • respfile_path – Full path to the responsefile.txt (subjects x features)

  • batch_size – Number of features in each batch

  • testrespfile_path – Full path to the test responsefile.txt (subjects x features)

  • binary – If True, the response file is in binary (pkl) format

Outputs:

The creation of a folder structure for batch-wise processing.

written by (primarily) T Wolfers, (adapted) SM Kia, (adapted) S Rutherford.

trendsurf.create_basis(X, basis, mask)[source]

Create a basis set

This will create a basis set for the trend surface model. This is currently fit using a polynomial model of a specified degree. The models are estimated on the basis of data stored on disk in ascii or neuroimaging data formats (currently nifti only). Ascii data should be in tab or space delimited format with the number of voxels in rows and the number of subjects in columns. Neuroimaging data will be reshaped into the appropriate format

Parameters:
  • X – covariates

  • basis – model order for the interpolating polynomial

  • mask – mask used to apply to the data

Returns:

  • Phi - basis set

trendsurf.estimate(filename, maskfile, basis, ard=False, outputall=False, saveoutput=True, **kwargs)[source]

Estimate a trend surface model

This will estimate a trend surface model, independently for each subject. This is currently fit using a polynomial model of a specified degree. The models are estimated on the basis of data stored on disk in ascii or neuroimaging data formats (currently nifti only). Ascii data should be in tab or space delimited format with the number of voxels in rows and the number of subjects in columns. Neuroimaging data will be reshaped into the appropriate format

Basic usage:

estimate(filename, maskfile, basis)

where the variables are defined below.

Parameters:
  • filename – 4-d nifti file containing the images to be estimated

  • maskfile – nifti mask used to apply to the data

  • basis – model order for the interpolating polynomial

All outputs are written to disk in the same format as the input. These are:

Outputs:
  • yhat - predictive mean

  • ys2 - predictive variance

  • trendcoeff - coefficients from the trend surface model

  • negloglik - Negative log marginal likelihood

  • hyp - hyperparameters

  • explainedvar - explained variance

  • rmse - root mean squared error
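
A minimal hypothetical call matching the basic usage above; the nifti file names are placeholders:

from pcntoolkit.trendsurf import estimate   # import path may differ by version

estimate('subjects_4d.nii.gz', 'brain_mask.nii.gz', 3, ard=False)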

trendsurf.get_args(*args)[source]

Parse command line arguments

This will parse the command line arguments for the trend surface model. The arguments are:

Parameters:
  • filename – 4-d nifti file containing the images to be estimated

  • maskfile – nifti mask used to apply to the data

  • basis – model order for the interpolating polynomial

  • covfile – file containing covariates

  • ard – use ARD

  • outputall – output all measures

Returns:

  • filename - 4-d nifti file containing the images to be estimated

  • maskfile - nifti mask used to apply to the data

  • basis - model order for the interpolating polynomial

  • covfile - file containing covariates

  • ard - use ARD

  • outputall - output all measures

trendsurf.load_data(datafile, maskfile=None)[source]

Load data from disk

This will load data from disk, either in nifti or ascii format. If the data are in ascii format, they should be in tab or space delimited format with the number of voxels in rows and the number of subjects in columns. Neuroimaging data will be reshaped into the appropriate format

Parameters:
  • datafile – 4-d nifti file containing the images to be estimated

  • maskfile – nifti mask used to apply to the data

Returns:

  • dat - data in vectorised form

  • world - voxel coordinates

  • mask - mask used to apply to the data

trendsurf.main(*args)[source]

trendsurf.write_nii(data, filename, examplenii, mask)[source]

Write data to nifti file

This will write data to a nifti file, using the header information from an example nifti file.

Parameters:
  • data – data to be written

  • filename – name of file to be written

  • examplenii – example nifti file

  • mask – mask used to apply to the data


class rfa.GPRRFA(hyp=None, X=None, y=None, n_feat=None, n_iter=100, tol=0.001, verbose=False)[source]

Bases: object

Random Feature Approximation for Gaussian Process Regression

Estimation and prediction of Bayesian linear regression models

Basic usage:

R = GPRRFA()
hyp = R.estimate(hyp0, X, y)
ys,s2 = R.predict(hyp, X, y, Xs)

where the variables are

Parameters:
  • hyp – vector of hyperparameters.

  • X – N x D data array

  • y – 1D Array of targets (length N)

  • Xs – Nte x D array of test cases

  • hyp0 – starting estimates for hyperparameter optimisation

Returns:

  • ys - predictive mean

  • s2 - predictive variance

The hyperparameters are:

hyp = [ log(sn), log(ell), log(sf) ]  # hyp is a numpy array

where sn^2 is the noise variance, ell are lengthscale parameters and sf^2 is the signal variance. This provides an approximation to the covariance function:

k(x,z) = x'*z + sn2*exp(0.5*(x-z)'*Lambda*(x-z))

where Lambda = diag(ell_1^2, ..., ell_D^2)

Written by A. Marquand
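
A minimal sketch of the basic usage above; the data, number of random features and import path are assumptions:

import numpy as np
from rfa import GPRRFA   # import path may differ depending on installation

N, D = 200, 3
X = np.random.randn(N, D)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(N)
Xs = np.random.randn(50, D)

R = GPRRFA(X=X, y=y, n_feat=100)      # n_feat: number of random features
hyp0 = np.zeros(R.get_n_params(X))    # [ log(sn), log(ell_1) ... log(ell_D), log(sf) ]
hyp = R.estimate(hyp0, X, y)
ys, s2 = R.predict(hyp, X, y, Xs)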

dloglik(hyp, X, y)[source]

Function to compute derivatives

estimate(hyp0, X, y, optimizer='lbfgs')[source]

Function to estimate the model

get_n_params(X)[source]

loglik(hyp, X, y)[source]

Function to compute the log (marginal) likelihood

post(hyp, X, y)[source]

Generic function to compute posterior distribution.

This function will save the posterior mean and precision matrix as self.m and self.A, and will also update internal parameters (e.g. N, D, the prior covariance (Sigma) and precision (iSigma)).

predict(hyp, X, y, Xs)[source]

Function to make predictions from the model

fileio.alphanum_key(s)[source]

Turn a string into a list of numbers

Basic usage:

alphanum_key(s)
Parameters:

s – string to convert

fileio.create_mask(data_array, mask, verbose=False)[source]

Create a mask from a data array or a nifti file

Basic usage:

create_mask(data_array, mask, verbose)
Parameters:
  • data_array – numpy array containing the data to write out

  • mask – nifti image containing a mask for the image

  • verbose – verbose output

fileio.file_extension(filename)[source]

Determine the file extension of a file (e.g. .nii.gz)

Basic usage:

file_extension(filename)
Parameters:

filename – name of the file to check

fileio.file_stem(filename)[source]

Determine the file stem of a file (e.g. /path/to/file.nii.gz -> file)

Basic usage:

file_stem(filename)
Parameters:

filename – name of the file to check

fileio.file_type(filename)[source]

Determine the file type of a file

Basic usage:

file_type(filename)
Parameters:

filename – name of the file to check

fileio.load(filename, mask=None, text=False, vol=True)[source]

Load a numpy array from a file

Basic usage:

load(filename, mask, text, vol)
Parameters:
  • filename – name of the file to load

  • mask – nifti image containing a mask for the image

  • text – whether to load the data from a text file

  • vol – whether to load the image as a volume

fileio.load_ascii(filename)[source]

Load an ascii file into a numpy array

Basic usage:

load_ascii(filename)
Parameters:

filename – name of the file to load

fileio.load_cifti(filename, vol=False, mask=None, rmtmp=True)[source]

Load a cifti file into a numpy array

Basic usage:

load_cifti(filename, vol, mask, rmtmp)
Parameters:
  • filename – name of the file to load

  • vol – whether to load the image as a volume

  • mask – nifti image containing a mask for the image

  • rmtmp – whether to remove temporary files

fileio.load_nifti(datafile, mask=None, vol=False, verbose=False)[source]

Load a nifti file into a numpy array

Basic usage:

load_nifti(datafile, mask, vol, verbose)
Parameters:
  • datafile – name of the file to load

  • mask – nifti image containing a mask for the image

  • vol – whether to load the image as a volume

  • verbose – verbose output

fileio.load_pd(filename)[source]

Load a csv file into a pandas dataframe

Basic usage:

load_pd(filename)
Parameters:

filename – name of the file to load

fileio.predictive_interval(s2_forward, cov_forward, multiplicator)[source]

Calculates a predictive interval for the forward model

fileio.save(data, filename, example=None, mask=None, text=False, dtype=None)[source]

Save a numpy array to a file

Basic usage:

save(data, filename, example, mask, text, dtype)
Parameters:
  • data – numpy array containing the data to write out

  • filename – where to store it

  • example – example file to copy the geometry from

  • mask – nifti image containing a mask for the image

  • text – whether to write out a text file

  • dtype – data type for the output image (if different from the image)
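
An illustrative round trip with load and save for nifti data; the file names and mask are placeholders, and the assumption that load returns the masked data array is not guaranteed by the documentation:

from pcntoolkit import fileio   # import path may differ by version

dat = fileio.load('responses.nii.gz', mask='brain_mask.nii.gz', vol=True)

# ... process dat (subjects x voxels) ...

fileio.save(dat, 'responses_out.nii.gz',
            example='responses.nii.gz', mask='brain_mask.nii.gz')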

fileio.save_ascii(data, filename)[source]

Save a numpy array to an ascii file

Basic usage:

save_ascii(data, filename)
Parameters:
  • data – numpy array containing the data to write out

  • filename – where to store it

fileio.save_cifti(data, filename, example, mask=None, vol=True, volatlas=None)[source]

Save a cifti file from a numpy array

Basic usage:

save_cifti(data, filename, example, mask, vol, volatlas)
Parameters:
  • data – numpy array containing the data to write out

  • filename – where to store it

  • example – example file to copy the geometry from

  • mask – nifti image containing a mask for the image

  • vol – whether to save the image as a volume

  • volatlas – atlas to use for the volume

fileio.save_nifti(data, filename, examplenii, mask, dtype=None)[source]

Write output to nifti

Basic usage:

save_nifti(data, filename, examplenii, mask, dtype)
Parameters:
  • data – numpy array containing the data to write out

  • filename – where to store it

  • examplenii – nifti to copy the geometry and data type from

  • mask – nifti image containing a mask for the image

  • dtype – data type for the output image (if different from the image)

fileio.save_pd(data, filename)[source]

Save a pandas dataframe to a csv file

Basic usage:

save_pd(data, filename)
Parameters:
  • data – pandas dataframe containing the data to write out

  • filename – where to store it

fileio.sort_nicely(l)[source]

Sort a list of strings in a natural way

Basic usage:

sort_nicely(l)
Parameters:

l – list of strings to sort
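
A brief sketch of natural sorting; the in-place behaviour (via the alphanum_key helper above) is an assumption:

from pcntoolkit import fileio   # import path may differ by version

batches = ['batch_10', 'batch_2', 'batch_1']
fileio.sort_nicely(batches)
print(batches)   # expected natural order: ['batch_1', 'batch_2', 'batch_10']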

fileio.tryint(s)[source]

Try to convert a string to an integer

Basic usage:

tryint(s)
Parameters:

s – string to convert

fileio.vol2vec(dat, mask, verbose=False)[source]

Vectorise a 3d image

Basic usage:

vol2vec(dat, mask, verbose)
Parameters:
  • dat – numpy array containing the data to vectorise

  • mask – nifti image containing a mask for the image

  • verbose – verbose output