
10. Estimating ibd Based Test Statistics by MCMC

See Concept Index for: ibd-based tests.



10.1 Introduction to lm_ibdtests and civil

See References, for details of the cited papers.

The program lm_ibdtests uses identity-by-descent (ibd) based and likelihood-ratio based statistics to construct linkage detection tests. The current version allows only discrete trait data (affected or unaffected or unknown phenotypic status).

The ibd scoring approach involves construction of an ibd measure (T) that is a function of the inheritance vectors and affectation status of the individuals in pedigrees. The program uses realizations of the inheritance vectors conditional only on the marker data (Y) to compute a Monte Carlo estimate of the test statistic E(T|Y). Four different ibd measures are implemented in the program. Two of these measures, T=Slambda and T=Saffunaff (developed by Saonli Basu), allow incorporation both of affected and of unaffected individuals in the analysis. The test statistic is used to test the null hypothesis of no linkage between the trait and a set of markers. For this approach, two different testing options have been implemented; one is a normality-based test and the other is a permutation test. The permutation test keeps the observed marker data unchanged and permutes the affectation status. In the normality-based test, test statistics (T=Spairs, for example) are computed for each realization and averaged over realizations. The program then reports the p-values from each test at the marker loci. For more details of these methods, see [Bas08].
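The permutation option described above can be illustrated with a short Python sketch. It is not the lm_ibdtests implementation: the function names, the toy pairwise statistic, the unrestricted shuffling of affection labels, and the (k + 1)/(n + 1) p-value convention are all assumptions made for illustration.

```python
import random

def pairs_statistic(ibd_sharing, affected):
    """Toy ibd measure: total ibd sharing over all pairs of affected individuals.

    ibd_sharing[i][j] is a hypothetical estimate of ibd sharing between
    individuals i and j (for example, an average over MCMC realizations of the
    inheritance vectors given the marker data Y).
    """
    idx = [i for i, is_affected in enumerate(affected) if is_affected]
    return sum(ibd_sharing[i][j] for k, i in enumerate(idx) for j in idx[k + 1:])

def permutation_p_value(ibd_sharing, affected, n_perm=999, seed=12345):
    """Keep the ibd estimates fixed and permute the affection status."""
    rng = random.Random(seed)
    observed = pairs_statistic(ibd_sharing, affected)
    labels = list(affected)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # assumption: labels are shuffled without restriction
        if pairs_statistic(ibd_sharing, labels) >= observed:
            exceed += 1
    # Assumed convention: the observed value counts as one of the n_perm + 1 draws.
    return (exceed + 1) / (n_perm + 1)
```

Under this convention the smallest attainable p-value with n permutations is 1/(n + 1), which is one reason a small permutation count limits the resolution of the test.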

A new (lambda,p) model has been implemented in lm_ibdtests. The (lambda,p) model models the trait-dependent segregation of inheritance vectors at a locus given the trait data on individuals and constructs a chi-square test for linkage detection. The (lambda,p) model incorporates both affected and unaffected individuals in the analysis. The delta model is also implemented in the program. The current version of lm_ibdtests only allows the ibd measure T=Spairs in the delta model set-up. The program returns the p-values of the likelihood-ratio statistics under each of these two models. For a detailed description of the (lambda,p) and delta models, see [Bas10]. For a real data analysis using lm_ibdtests, see [Sie05].

The program civil is due to Yanming Di; see [DT09]. It is still a beta-test version. The program performs marginal and conditional inheritance vector tests for linkage detection and localization. The name civil is an acronym for Conditional Inheritance Vector test In Linkage analysis.

In an inheritance vector test, the test statistic is a score that measures the connection between the observed trait values and the inheritance vector at the test position. An excess of such connection provides evidence for genetic linkage. civil implements two such scores: a variance component type score (the vc-score) and a score developed by Yanming Di (the w-score).

civil computes marginal and conditional test p-values using a Monte Carlo method: to approximate the null distributions of the test statistics, the program holds the trait values fixed and resamples the inheritance vectors. The inheritance vectors along a chromosome should follow a Markov chain distribution in genomic regions that contain no causal genetic variants. In a marginal test, the null inheritance vectors are sampled from the marginal distribution of the Markov chain, which is uniform over the set of all possible inheritance vectors (see Introduction to lm_auto gl_auto and lm_pval). In a conditional inheritance vector test, the inheritance vectors are sampled from the conditional distribution of the inheritance vector at the test position given the observed inheritance vectors at the two conditioning positions, as determined by the Markov chain distribution.
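The two null sampling schemes can be sketched in Python as follows. This is only an illustration under stated assumptions, not civil's code: meioses are treated as independent two-state Markov chains, recombination follows the Haldane map function with no interference, a single sex-averaged map is used, and all function names are hypothetical.

```python
import math
import random

def haldane_theta(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def conditional_prob_one(s_left, s_right, theta_left, theta_right):
    """P(s_test = 1 | s_left, s_right) for a single meiosis indicator."""
    def trans(a, b, theta):
        # Probability of moving from state a to state b across one interval.
        return 1.0 - theta if a == b else theta
    w1 = trans(s_left, 1, theta_left) * trans(1, s_right, theta_right)
    w0 = trans(s_left, 0, theta_left) * trans(0, s_right, theta_right)
    if w0 + w1 == 0.0:
        raise ValueError("flanking indicators are incompatible with zero recombination")
    return w1 / (w0 + w1)

def sample_marginal(n_meioses, rng):
    """Marginal null: every indicator is 0 or 1 with probability 1/2."""
    return [rng.randint(0, 1) for _ in range(n_meioses)]

def sample_conditional(v_left, v_right, theta_left, theta_right, rng):
    """Conditional null: sample the test-position vector given both flanking vectors."""
    return [
        1 if rng.random() < conditional_prob_one(a, b, theta_left, theta_right) else 0
        for a, b in zip(v_left, v_right)
    ]
```

As the flanking positions move away from the test position, both recombination fractions approach 0.5 and the conditional distribution approaches the uniform marginal distribution.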

A significant conditional test result provides linkage localization information: it suggests that a linkage signal exists in the region bounded by the two conditioning positions, and the conditional p-value gives the false positive probability. A significant marginal test result does not allow such an interpretation. For conditional tests, there is a trade-off between power and precision. When the two conditioning positions are farther apart, the conditional test is more powerful, but a significant result provides less precise localization information.

See Concept Index for: lm_ibdtests introduction, civil introduction, vc-score and w-score.



10.2 Sample lm_ibdtests parameter file

The example parameter file for lm_ibdtests, ‘ped73_ibdt_IBD.par’, may be found in the ‘TraitTests’ subdirectory of ‘MORGAN_Examples’. Several lines in the example parameter file have been explained in previous sections of the tutorial; only the sections requiring additional explanation are shown below.

 
sample by scan
set L-sampler probability 0.5
set burn-in iterations 1000
check progress MC iterations 1000

compute ibd statistics
set ibd measures Spairs Srobdom
set ibd tests norm permu
set ibd permutations 999

compute scores every 100 iterations

The statement ‘sample by scan’ indicates that all loci or all meioses are updated successively in an order determined by random permutation. The alternative ‘sample by step’ updates only one locus (L-sampler) or one meiosis (M-sampler) in each iteration. The ‘set L-sampler probability’ statement specifies that an L-sampler step/scan will be used at each MCMC iteration with probability 0.5: otherwise the single-meiosis M-sampler will be used. The ‘set burn-in iterations’ statement specifies 1000 iterations to be performed initially, with one trait locus (if any) unlinked to the marker map. The ‘check progress’ statement instructs the program to print the current iteration number to ‘stdout’ every 1000 iterations.

The ‘compute ibd statistics’ statement must be included in the parameter file when running lm_ibdtests. The next line instructs the program to use Spairs and Srobdom to perform the ibd tests. The ‘set ibd tests’ command calls for both normal and permutation tests to be run. The next line is needed because permutation tests were requested in the previous line; it specifies how many permutations are to be used in the calculations. In this case, the default (999) is specified; it is recommended that at least 50 permutations be used. The last line in the parameter file specifies how often scores are computed; the default is every MCMC iteration.

See Concept Index for: sample parameter file for lm_ibdtests.



10.3 Sample lm_ibdtests output

Under the subdirectory ‘TraitTests’, run the example with the following command:

 
./lm_ibdtests ped73_ibdt_IBD.par

The part of the output that tabulates test statistics and p-values is shown below. The upper table provides the permutation-test p-values for each of the two test statistics Spairs and Srobdom at each of the 10 marker-locus positions, these positions being given for both the male and female genetic maps. It is apparent that there is no significant association of the trait with any of these marker positions; the p-values at markers 5 and 6 are somewhat smaller, but do not achieve (e.g.) a 0.05 significance level. The lower table gives the same result, but this time using a Normal distribution approximation to obtain the p-value. In this case the standardized (N(0,1)) value of the test statistic is given, as well as the corresponding p-value. Again there are no significant results in this small example. There is a broad qualitative correspondence between the p-values of the two tables, but the results are not close. This may be due to the small number of permutations used or, more likely, to the inadequacies of the Normal approximation.

 
 ************************************
 p Value for Permutation Test for IBD
 ************************************

            pos(Haldane cM)   Spairs  Srobdom
     locus     male  female  p-value  p-value

  marker-1    0.000   0.000   0.9020   0.9300
  marker-2   10.000  10.000   0.8780   0.8450
  marker-3   20.000  20.000   0.8130   0.7800
  marker-4   30.000  30.000   0.5080   0.5190
  marker-5   40.000  40.000   0.2550   0.2480
  marker-6   50.000  50.000   0.2950   0.2510
  marker-7   60.000  60.000   0.3850   0.5090
  marker-8   70.000  70.000   0.5100   0.6660
  marker-9   80.000  80.000   0.6610   0.7750
 marker-10   90.000  90.000   0.5640   0.7470

 *******************************
 p Value for Normal Test for IBD
 *******************************

            pos(Haldane cM)
    locus     male  female   Spairs p-value  Srobdom p-value

  marker-1    0.000   0.000  -0.7843  0.7951  -0.2867  0.6167
  marker-2   10.000  10.000  -0.9574  0.8166  -0.3841  0.6567
  marker-3   20.000  20.000  -1.1825  0.8816  -0.2260  0.5692
  marker-4   30.000  30.000  -0.6437  0.7381  -0.1272  0.5552
  marker-5   40.000  40.000   0.2478  0.4103   0.0986  0.4743
  marker-6   50.000  50.000  -0.2270  0.5752  -0.3275  0.6252
  marker-7   60.000  60.000  -0.1503  0.5612  -0.3514  0.6437
  marker-8   70.000  70.000  -0.3096  0.6372  -0.3587  0.6557
  marker-9   80.000  80.000  -0.4877  0.6902  -0.2706  0.6037
 marker-10   90.000  90.000  -0.2924  0.6222  -0.1136  0.5662

Your values may be different due to different random seeds in your seed file.

For more details about the lm_ibdtests methods, see [Bas08].

See Concept Index for: lm_ibdtests sample output.



10.4 Sample civil parameter file

civil bases its tests on the inheritance vectors at the test or conditioning positions. Since these are not observable, a randomized-test strategy is used to deal with this issue. To perform marginal and conditional tests using civil, the user must first run the MORGAN program gl_auto to draw an MCMC sample of the inheritance vectors jointly at all involved genomic positions: including all possible test positions and conditioning positions. For either the marginal or the conditional test, at each test position, civil will compute N test statistic values and N p-values, one for each MCMC realization of inheritance vectors, where N is the size of the MCMC sample. The collection of the N p-values provides an empirical distribution of a randomized (or latent) p-value.
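One way to look at the empirical distribution of the N randomized p-values at a test position is sketched below in Python. The particular summaries (mean, median, fraction at or below a threshold) and the function name are illustrative assumptions, not something civil reports or recommends; see [DT09] for how latent p-values should be interpreted.

```python
import statistics

def summarize_latent_p_values(p_values, alpha=0.05):
    """Summarize the N p-values obtained at one test position.

    p_values holds one p-value per MCMC realization of the inheritance vectors.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    return {
        "n": len(p_values),
        "mean": statistics.mean(p_values),
        "median": statistics.median(p_values),
        # Fraction of realizations whose p-value is at or below alpha.
        "fraction_at_or_below_alpha": sum(p <= alpha for p in p_values) / len(p_values),
    }
```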

Typically, five files are required to run civil: ‘*.par’, ‘*.xtra’, ‘*.ped’, ‘*.markers’ and ‘*.oscor’. An optional seed file can also be used.

The parameter file ‘*.par’ for civil should be based on the one used by gl_auto to generate the MCMC realizations of the segregation indicators. It should include MORGAN statements about pedigrees, quantitative traits, markers and sampler seeds. Additional information on the gl_auto output file and on the marginal and conditional test setup is specified in an extra parameter file ‘*.xtra’ and provided to civil through the ‘input extra file’ statement.

For example, in the civil parameter file ‘Autozyg/Gold/civil.vc.par’, the pedigree and marker information is specified as

 
        input pedigree file 'civil.ped'

        input marker data file 'civil.markers'
        select all markers

The pedigree and marker information should be the same as in the gl_auto par file, except that civil requires a quantitative trait to be specified, so a column of quantitative trait values needs to be added to the input pedigree file if it is not already there.

In the same par file, a quantitative trait is specified as

 
        select trait 2
        set trait data quantitative

        input pedigree record trait 2 real 3

        set trait 2  tloc 12

        set trait 2 for tloc 12 genotype means 0.2000000,  4.9000000,  9.6000000
        set trait 2 additive variance 2.0
        set trait 2 residual variance 15.0

        set tloc 12 allele freqs 0.3 0.7
        map test tloc 12 all interval proportions 0.3 0.7
        map test tloc 12 external recomb fracts   0.1 0.3 0.45

The two ‘map test tloc’ statements are required by MORGAN, but the numbers in those lines will not be used by civil. The values of ‘additive variance’ and ‘residual variance’ specified here will be used by civil only when ‘use_sample_sd’ is set to ‘no’ in the extra parameter file (see below). The ‘genotype means’ will be used only if ‘use_sample_mean’ is set to ‘no’ in the extra parameter file.

Additional information about the marginal and conditional test setup is provided to civil through an ‘extra file’.

 
        input extra file 'civil.vc.xtra'

The outline of the extra file is as follows (for an example, see ‘Autozyg/Gold/civil.vc.xtra’):

 
        ## inheritance vector file name (.oscor file)
        civil.oscor
        ## output file directory
        .
        ## output file keyword
        civil
        ## info on the oscor file ...
        n_mcmc  10
        order 0
        ## trait model parameters ...
        pD 0.3
        use_sample_mean yes
        mu 0
        use_sample_sd yes
        ## marginal test parameters
        test_statistic vc
        n_mc 9999
        n_pos 101
        test_pos 0 4 8 12 ...
        ## conditional test parameters
        test_statistic vc
        n_mc 999
        n_pos 81
        test_pos 40 44 48 ...
        test_pos_l 0 4 8 ...
        test_pos_r 80 84 88 ...

The first 6 lines provide the name of the gl_auto output file (line 2), the name of the output directory (line 4), and a keyword for naming the output files (line 6). civil will create four output files, suffixed by ‘*.miv.p.out’, ‘*.miv.t.out’, ‘*.civ.p.out’, and ‘*.civ.t.out’, in the output directory. The four files store marginal and conditional test statistic values and p-values.

The section following ‘## info on the oscor file ...’ specifies the number of MCMC scans in the gl_auto output file and whether the output is arranged by component or not, with 1 meaning yes and 0 no. If the lines in the gl_auto output are arranged by component, they will be rearranged so that they are ordered by MCMC scan, and a new file will be created to store the rearranged output.

The section following ‘## trait model parameters ...’ specifies the rare allele frequency of the putative causal variant and specifies how to estimate mean trait value and residual standard error for the trait values: if ‘use_sample_mean yes’, then civil will use the raw sample mean to estimate the mean trait value, otherwise the mean value specified in the next line will be used. If ‘use_sample_sd yes’, then civil will use the sample sd to estimate residual standard error, otherwise residual standard error will be estimated by sqrt(residual variance + additive variance) using values provided in the main civil parameter file.
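The choice between the two standard-error estimates can be written out as a small Python sketch. The function name is hypothetical, and the use of the n - 1 denominator for the sample standard deviation is an assumption; the source does not say which denominator civil uses.

```python
import math
import statistics

def trait_sd(trait_values, use_sample_sd, residual_variance, additive_variance):
    """Residual standard error as described for the 'use_sample_sd' option."""
    if use_sample_sd:
        # Assumption: sample standard deviation with the n - 1 denominator.
        return statistics.stdev(trait_values)
    # Fallback: variances taken from the main civil parameter file.
    return math.sqrt(residual_variance + additive_variance)
```

With the example values from the parameter file above (residual variance 15.0, additive variance 2.0), the fallback value is sqrt(17), roughly 4.12.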

The section following ‘## marginal test parameters’ specifies the test statistic, the number of Monte Carlo runs for simulating the null distribution (not to be confused with the count of MCMC realizations in the gl_auto output scores file), the number of tests requested and the indices of the test positions for the marginal tests. Currently, two test statistic options, ‘vc’ and ‘w’, are available. In this example par file, we ask civil to perform 101 marginal tests at positions indexed by 0, 4, 8, ..., 400.

The section following ‘## conditional test parameters’ specifies the test statistic, the number of Monte Carlo runs for simulating the null distribution, the number of tests requested, the indices of the test positions, and the indices of the left and right conditioning positions (one line for each set of positions) for the conditional tests. In this example par file, we ask civil to perform 81 conditional tests. The first conditional test will be at the position indexed by 40 and will be conditioned on positions 0 and 80.
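Because the three position lines must have matching lengths, it can help to generate them rather than type them. The Python sketch below is a hypothetical helper, not part of MORGAN. It assumes the positions are evenly spaced (a step of 4 is inferred from the truncated example lines) and that each test position lies strictly between its two conditioning positions.

```python
def index_sequence(start, step, count):
    """Evenly spaced position indices: start, start + step, ..."""
    return [start + step * i for i in range(count)]

def conditional_lines(test_start, left_start, right_start, step, n_pos):
    """Build the n_pos, test_pos, test_pos_l and test_pos_r lines of the extra file."""
    test = index_sequence(test_start, step, n_pos)
    left = index_sequence(left_start, step, n_pos)
    right = index_sequence(right_start, step, n_pos)
    for t, l, r in zip(test, left, right):
        # Assumption: the test position must be bounded by the conditioning positions.
        if not l < t < r:
            raise ValueError(f"test position {t} is not between {l} and {r}")

    def fmt(keyword, values):
        return keyword + " " + " ".join(str(v) for v in values)

    return [
        f"n_pos {n_pos}",
        fmt("test_pos", test),
        fmt("test_pos_l", left),
        fmt("test_pos_r", right),
    ]
```

For example, `conditional_lines(40, 0, 80, 4, 81)` yields lines whose first test is at index 40, conditioned on indices 0 and 80.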

Note that the test positions have to be a subset of marker positions. The idea is to run gl_auto using a set of dense markers that should include all potential test and conditioning positions, although not necessarily all markers in the marker data file. When performing marginal and conditional tests, less dense marker positions can be used.

Currently, this extra file has a rigid format requirement. Comment lines (starting with ##) can be modified, but no line should be deleted or added, nor should existing lines be broken into multiple lines. The example xtra file ‘Autozyg/Gold/civil.vc.xtra’ can be used as a template for creating a new xtra file.

See Concept Index for: sample parameter file for civil, latent p-values, randomized p-values.



10.5 Sample civil output

Since civil is still a beta-test program, it does not have an example in the ‘MORGAN_Examples’ directory. Instead, reference is made to the gold standard examples in the main MORGAN source directory, in the subdirectory ‘Autozyg/Gold’.

Before running the program civil, the user needs to run gl_auto to obtain an MCMC sample of whole-chromosome realizations of meiosis indicators. See Running gl_auto example and sample output, for details. Under the directory ‘Autozyg/Gold’, the output file ‘civil.oscor’ from a previous gl_auto run is provided for demonstration and testing purposes.

Before running civil, an output subdirectory must exist. If vc is specified as the test statistic, create a subdirectory named ‘vc’ for storing temporary files in the user specified output file directory; if w is specified as the test statistic, create a subdirectory named ‘w’.
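The required subdirectory can be created by hand or, as in the minimal Python sketch below, from a script. The helper name is hypothetical; it only relies on the rule stated above that the subdirectory is named after the test statistic.

```python
from pathlib import Path

def ensure_civil_subdir(output_dir, test_statistic):
    """Create <output_dir>/vc or <output_dir>/w if it does not already exist."""
    if test_statistic not in ("vc", "w"):
        raise ValueError("test_statistic must be 'vc' or 'w'")
    subdir = Path(output_dir) / test_statistic
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir
```

For the example extra file, whose output directory is ‘.’, this amounts to calling `ensure_civil_subdir(".", "vc")` from ‘Autozyg/Gold’.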

To run the example in ‘Autozyg/Gold’ make sure the following files are present there: civil.vc.par, civil.vc.xtra, civil.ped, civil.markers, civil.oscor.

In the ‘Autozyg/Gold’ directory, run civil by typing

 
        ../civil civil.vc.par > civil.vc.out

Information on the progress of the program will be printed to stdout, together with summary information about the pedigrees, markers, trait values, and marginal and conditional test setup. For a large number of pedigrees, civil can take several hours to finish. Once the program is finished, four output files, *.miv.?.t.out, *.miv.?.p.out, *.civ.?.t.out and *.civ.?.p.out, will be written to the specified output file directory: ‘*’ is the output file keyword specified in the xtra file and ‘?’ is the name of the specified test statistic (‘w’ or ‘vc’). Respectively, they store the marginal test statistic values, marginal test p-values, conditional test statistic values and conditional test p-values.

The upper left portion of a marginal test p-values file ‘Autozyg/Gold/civil.miv.m.p.out’ is shown below:

 
test_pos  test_map        pval0           pval1           pval2         ...
0       0.000000        0.214400        0.098700        0.357800        ...
4       1.000000        0.305700        0.108900        0.142800        ...
8       2.000000        0.327400        0.133200        0.132700        ...
...

In this output file, the first row is the header. Each of the remaining rows corresponds to one marginal test. The first two columns are the index and the map position of the test position. The columns 3 to N + 2 are the test p-values, one for each MCMC realization of the meiosis indicators. The layout of the marginal test statistic file is similar.
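The layout just described can be read with a few lines of Python. This is a sketch under the assumption that fields are separated by whitespace, as in the excerpt; the function name is hypothetical.

```python
def read_marginal_p_values(lines):
    """Parse a marginal test p-values file given as an iterable of text lines.

    Returns the header fields and one (index, map_position, p_values) tuple
    per test position.
    """
    it = iter(lines)
    header = next(it).split()
    rows = []
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        index = int(fields[0])
        map_position = float(fields[1])
        p_values = [float(x) for x in fields[2:]]  # columns 3 to N + 2
        rows.append((index, map_position, p_values))
    return header, rows
```

With a file on disk, the same function can be applied to an open file object, for example `read_marginal_p_values(open(path))`, where `path` names one of the ‘*.p.out’ files.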

The conditional test p-values file ‘Autozyg/Gold/civil.civ.m.p.out’ has more columns. For each test, the first 6 columns now correspond to indices to conditional test position, left conditioning position and right conditioning position; then map positions of the conditional test position, left conditioning position and right conditioning position. Starting from column 7 are the N p-values, one for each MCMC realization.

Many temporary files will also be created under the subdirectories ‘vc’ or ‘w’ of the output directory. These files store intermediate results for computing the test scores. These results will be reused to save time when more tests need to be performed: for example, the user may want to perform more marginal and conditional tests at different test or conditioning positions.

However, if the pedigree structures or trait values in the pedigree file, or the trait parameters in the extra file, have changed since the last run, these temporary files should not be reused and should be deleted before running civil. If pedigree structures have changed, gl_auto also needs to be rerun. Use the overwrite option for the gl_auto output scores file to overwrite the previous file, and/or rename the previous file if you wish to retain it.

See Concept Index for: civil sample output.



10.6 lm_ibdtests and civil statements

The programs lm_ibdtests and civil use the pedigree, genetic map and marker statements of previous sections.

The following statements are specific to lm_ibdtests:

compute (ibd | likelihood-ratio) statistics

Required: one of the two options must be specified.

output (sampler | permutation) seeds only

The program lm_ibdtests uses random seeds for its permutation testing in addition to the usual MCMC sampler seeds. If an output seed file is named, both ending permutation and sampler seeds will be saved unless only one or the other is requested.

set ibd measures [Spairs] [Srobdom] [Saffect] [Slambda]

Optional. lm_ibdtests uses 1 to 4 measures to perform ibd tests for linkage; these are specified in the order [Spairs] [Srobdom] [Saffect] [Slambda]. Spairs, Srobdom, and Slambda may be specified for both normal and permutation tests; Saffect may not currently be specified with the normal tests option.

set ibd tests [normal] [permutation]

Optional. Normal and/or permutation tests may be specified.

set ibd permutations I

Optional. This must be specified when the permutation test is requested through ‘set ibd tests’. The default is 999. It is recommended that at least 50 permutations be used.

set likelihood-ratio lambda-p model gridpoints I1 I2

When the lambda_p measure is used for the chi-square likelihood-ratio test, the number of gridpoints may be specified. The number I1 is the number of gridpoints in the interval for the lambda-parameters of the model, and I2 is the number of gridpoints in the interval for p. The defaults are 6 and 9, respectively.

set likelihood-ratio measures [delta][lambda_p]

When computing the chi-square likelihood-ratio test, the choice of measures is delta and/or lambda_p, in the order [delta] [lambda_p]. The default is ‘delta’.

set likelihood-ratio tests

When computing likelihood-ratio statistics, chi-squared tests are performed. Thus, this statement is presently redundant, as there is no choice in tests.

set permutation seeds H1 H2

The program lm_ibdtests uses random seeds for its permutation testing in addition to the usual MCMC sampler seeds. The seeds may be specified in the ‘input seed file’ or in the parameter file: otherwise default seeds will be used.

The program civil has no program-specific parameter statements. Instead information is provided to civil using the input extra file statement:

input extra file filename

Required

For information about the contents of the extra file see Sample civil parameter file.

See Concept Index for: lm_ibdtests statements, civil statements, ibd measures, likelihood-ratio measures.



This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.