[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11. Estimating Location lod Scores by MCMC

See Concept Index for: location lod scores estimates.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.1 Introduction to lm_linkage, lm_bayes, lm_twoqtl, gl_lods and base_trait_lods

The programs lm_linkage, lm_bayes, gl_lods and lm_twoqtl are referred to as ‘Lodscore’ programs. The program lm_linkage replaces the two programs lm_markers and lm_multiple of pre-2011 versions of MORGAN. As of 2013, the program lm_twoqtl remains a beta test version, but the new program gl_lods is now fully implemented and documented.

The Lodscore programs use MCMC to perform multipoint linkage analysis and trait mapping on large pedigrees where many individuals may be unobserved and exact computation is infeasible. The data are the genotypes of observed individuals in the pedigree at marker loci and discrete or continuous trait data. As with exact methods of computing lod scores, the genetic model is assumed known. The only unknown parameter is the location of the trait locus. Therefore, the user is required to specify the marker locations, trait and marker allele frequencies and penetrance function. Presently, users are limited in their choice of penetrance function, but this is under revision and will change in future releases of MORGAN.

lm_linkage is an implementation of the Lange-Sobel estimator, using either the single- or multiple-meiosis LM-sampler: See Single and multiple meiosis LM-samplers. The Lange-Sobel estimate works reasonably well in reasonable time, provided a good MCMC sampler is used, and provided the trait data do not have strong impact on the conditional distribution of meiosis indicators. The lm_linkage program samples only the meiosis indicators at marker loci, and only conditional on the marker data. Even when the trait inheritance information is strong, the method can produce quite accurate lod scores in the absence of linkage, but it can be inaccurate in estimating the strength of linkage signals. As well as producing the lod score, our current implementation provides a batch-means pointwise estimate of the Monte Carlo standard error of the lod-score estimate. lm_linkage can work with genotypic, discrete or quantitative traits.

lm_linkage combines the earlier programs lm_markers and lm_multiple. The original lm_multiple program and multiple-meiosis sampler are the work of Liping Tong [TT08]. As well as allowing use of either the single- or multiple-meiosis LM-sampler, the lm_linkage program optionally perform exact lodscore computations on small pedigree components, and includes better exact computation and pedigree peeling options for use in the lod score estimator (see Exact HMM computations).

lm_bayes is an alternative method implemented for genotypic or discrete traits. The MCMC performance is better than for the old lm_markers program, but it has other computational overheads. lm_bayes samples trait locations from a posterior distribution, and then divides it by the prior to produce the likelihood and hence the lod score. Estimation is in two phases. A preliminary run with discrete uniform prior gives order-of-magnitude relative likelihoods. Then, using the inverse of these likelihoods as prior weights of a ‘pseudo-prior’ distribution. Using this ‘pseudo-prior’ a second run is made to estimate the likelihood. The purpose of the ‘pseudo-prior’ is to produce an approximately uniform posterior, so that likelihoods will be well estimated at all test positions. It is important that the initial run is long enough for all test positions to be sampled, and for the unlinked trait position to have a reasonable number of realizations. For locations at which lod scores are very negative, or for the unlinked position when there is some linked location with strong positive lod score, this can be problematic.

Our current implementation of lm_bayes provides two lod score estimates. The first is a crude estimate which counts realizations of locations sampled to estimate the posterior: as can be seen from the output this can be quite erratic. The Rao-Blackwellized estimator is much preferred, and produces good estimates in reasonable time. The lm_bayes program is the work of Andrew George [GT03,GWT05].

The beta-test program lm_twoqtl does parametric linkage analysis for a quantitative trait model having one or two linked QTL and a polygenic component. Each QTL has two alleles with three different genotypic means. The Normally distributed polygenic component does not include dominance, and the environmental contribution is has a Normal distribution with mean zero and uncorrelated among individuals. The program output consists of MCMC-based lod score estimates of the joint locations of the one or two contributing QTL. As of 2011, the program uses the same MCMC options as lm_linkage for sampling descent at marker loci conditional on marker data. Conditionally on these realizations the program then uses exact computation (on very small pedigrees) or an additional level of Monte Carlo to estimate the relevant lod score contributions. The original versions of the lm_twoqtl were the work on YunJu Sung [STW07,SW07].

The program gl_lods computes lod score contributions for a discrete or continuous trait given a set of ibd_graphs across the chromosome, produced by gl_auto: See Introduction to lm_auto gl_auto and lm_pval. If the gl_auto run uses the ‘set MCMC markers only’ option, then the overall lod score computed by gl_lods is identical to that produces by lm_linkage when the same MCMC options are used in the in gl_auto and in lm_linkage. gl_lods many of the same trait definition and mapping request parameter statements as lm_linkage (Location lod scores statements). However its input consists of ibd_graphs and an individuals file; there are no marker data or pedigree data. For further information on the motivation for splitting of the lm_linkage lod score computation into the generation of marker-based ibd graphs (using gl_auto) followed by trait-likelihood computation on the ibd graphs: See Parameter files for the gl_lods program. See also [Tho11].

Because the gl_lods program computes log-likelihood contributions for given IBD, and does not use a pedigree, there is no way to compute the usual normalizing factor of the pedigree-based probability of observed trait data. This must be supplied to the program, and may be computed via the base_trait_lods program, provided a pedigree and the same trait model and data are used. An alternative approach using permutation of trait data conditional on the locus-specific IBD was developed by Chris Glazner [GT15] and has been implemented as an option in gl_lods. For quantitative data using variance component models, we are investigating the used of the Kullback Leibler information (expected lod score) conditional on the locus-specific IBD as a locus-specific normalizing factor.

See References, for details of the cited papers.

See Concept Index for: Markov chain Monte Carlo, lm_linkage introduction, lm_bayes introduction, lm_twoqtl introduction, meiosis indicators, multiple meiosis sampler.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.2 Sample parameter files for lm_linkage and lm_bayes

There are three example parameter files in the ‘Lodscores’ subdirectory: ‘ped73_ge.par’, ‘ped73_ph.par’ and ‘ped73_qu.par’. These files are examples of how to analyze genotypic, discrete (phenotypic), and quantitative (continuous) traits, respectively. Each of these files is written for use with lm_linkage since this is our preferred program and can analyze genotypic, discrete, and quantitative traits. The program lm_bayes will run with the same parameter files ‘ped73_ge.par’ and ‘ped73_ph.par’, but will adopt defaults for several statements specific to this program and will generate warning for others not implemented for lm_bayes. If lm_bayes is run using ‘ped73_qu.par’, all statements regarding quantitative traits will be ignored, and the program will use default genotypic data.

The marker and MCMC information is very similar for all three parameter files. For ‘ped73_qu.par’ it is as follows:

 
set printlevel 5

input pedigree file '../ped73.ped'
input marker data file '../ped73.marker.missing'
input seed file         '../sampler.seed'
output overwrite seed file        '../sampler.seed'

set trait 1 data quantitative
input pedigree record trait 1 real 1

select all markers 
select trait 1
set trait 1 tloc 1
map test tloc 1 all interval proportions 0.3 0.7
map test tloc 1 external recomb fracts   0.05 0.15 0.3 0.4 0.45 

sample by scan                    
set L-sampler probability 0.2

set burn-in iterations 150
set MC iterations 3000           
check progress MC iterations 1000

set global MCMC
use single meiosis sampler        

The pedigree file specified by the ‘input pedigree file’ statement can contain multiple traits. As discussed in previous sections, the marker map, allele frequencies and genotypes can be contained in the parameter file or in a separate file specified by the ‘input marker data file’ statement as in the example above.

As in other programs, the trait data are included in the pedigree file. The ‘select trait’ statement tells the program which trait in this file is to be analyzed, and the ‘input pedigree record trait’ indicates where the data are to be found, while the ‘set trait ...tloc...’ statement connects the trait with a specific tloc for this analysis.

The two ‘map test tloc’ statements give trait locus test positions at which the lod scores should be calculated. When the trait locus is located between two markers, the position is specified in terms of the proportional genetic distance between the two markers (this option makes handling gender-specific maps easy). In this example, the test trait positions are specified to be at 30 and 70 percent of the interval. The second ‘map test tloc’ statement allows test trait locus positions located before the first marker or after the last marker to be specified; the positions are specified explicitly in terms of recombination fractions (or genetic distances) with the nearest marker locus. Note that an external recombination fraction of 0.5 is not necessary since the likelihood of an unlinked trait locus is always used as a reference when computing the lod scores.

The final seven statements give MCMC specifications. The ‘sample by scan’ statement instructs the program to update all the meiosis indicators, S, at each iteration, in an order determined by random permutation. The alternative ‘sample by step’ updates only one locus (L-sampler) or only one meiosis (M-sampler) in each iteration. The ‘sample by scan’ statement is the default and strongly recommended. The L-sampler probability is set at 20 percent, which is often a good choice. For a detailed discussion of effects of varying L- to M-sampler ratio, see section 10.6 in [Tho00].

In the ‘set burn-in iterations’ statement, 150 burn-in iterations, are requested. The next statement requests 3000 MCMC iterations; for each realized set of marker-location inheritance vectors the trait-likelihood contribution will be computed at each test position of the trait locus. This is for demonstration purposes only. For real data analyses, use longer runs, on the order of 10^5 MCMC iterations. The last statement in this group tells the program to report progress every 1000 iterations.

Although the lm_linkage program can use the multiple-meiosis sampler, and this is recommended, the final two statements here specify ‘set global MCMC’ and ‘use single meiosis sampler’. Thus, for this example, the single-meiosis sampler will be used (as in the old lm_markers program) and MCMC will be performed globally over all pedigree components, rather than component-by-component. This provides an example of how these options may be used for compatibility with older examples.

As an example of MORGAN parameter statement flexibility we give also a part of a version of the above parameter file in which lines are broken, words mis-spelled, and lower/upper case used arbitrarily::

 
#  This version of ped73_qu.par is designed to show the flexibility
#     of MORGAN parameter statements

set prinlev 5

inPU Pedi file '../ped73.ped'
input mark 
data FILE '../ped73.marker.missing'
input seed file         '../sampler.seed'
outp over 
seed file        '../sampler.seed'

set trait data 1 quant
Input pedigree Recod trait 1 real 1

sele traix 1
set trait 1 
tloc 1
map test tloc 1 all intevl props 0.3 0.7
map test tloc 1 exte reco frac 0.05 0.15 
0.3 0.4 0.45 
seLect 
all markers 

samp BY scan       
set L-sam prob 0.2
set 3000 mc ITER 
set burn-in iter 150
check progss 1000 MC iters 
set glob mcmc
USE singe meio samp        

Note that statements may be split over lines, that only the first four characters of each word is required, that upper and lower case are not distinguished, and that in some statements (for example, the ‘check progress’ and ‘set MC iterations’ statements) even the location of the integer count within the statement is flexible.

For more details of the MCMC specifications see MCMC parameter statements.

Specifying Trait Data Type

Trait data type is set by using the ‘set trait data’ statement. Recall that the ‘input pedigree record trait’ statement must be used to specify which column in the file is to be used as the trait value (see Pedigree file description statements). The three trait data types discussed in this example are implemented by including the following statements in the parameter file discussed above. Note the trait and numbers are arbitrary, but the connection must be made consistently through the file.

ped73_ge.par’ specifies a genotypic trait with the following statements:

 
set trait 3 data genotypic
input pedigree record trait 3 integer 3

select trait 3
set trait 3 tloc 1
set tloc 1 allele freqs 0.4  0.6

ped73_ph.par’ specifies a phenotypic trait with the following statements:

 
set trait 2 data discrete
input pedigree record trait 2 integer 4

select trait 2
set traits 2 tlocs 1
set traits 2 for tlocs 1 incomplete penetrance 0.05 0.6 0.95
set tlocs 1  allele freqs 0.4  0.6

Recall that for discrete data, one must specify the penetrances (see Autozyg computational parameters).

ped73_qu.par’ specifies a quantitative trait with the following statements:

 
set trait 1 data quantitative
input pedigree record trait 1 real 1

select trait 1
set trait 1 tloc 1
set trait 1 for tlocs 1 genotype  mean 90.0 100.0 110.0
set trait 1 residual variance 25.0
set tloc 1 allele freqs 0.4 0.6

When using a quantitative trait, genotypic means and residual variance must be specified. Additive variance can be specified with the statement ‘set trait ... additive variance’. The default value is zero.

The ‘set tloc ... allele freqs’ statement specifies allele frequencies at the trait locus. If the allele frequencies sum to less than 1, a warning message will be issued:

 
Sum of allele frequencies is not in range .9999, 1.0001  (W)

If the allele frequencies sum to above 1.0001, the program quits and generates an error message.

Below is a summary of the trait data types accepted for each program:

Genotypic   ped73_ge.par      Phenotypic   ped73_ph.par  Quantitative   ped73_qu.par
lm_linkage    Yes    Yes    Yes
lm_bayes    Yes    Yes    No

Additional penetrance options are available for liability classes and age-of-onset data.

An example is given in the file ‘xact_ph_liab.par’ in the ‘Lodscore’ Gold standards. The statement gives the liability class specific penetrance matrix. The penetrances are for the 11, 12, 22 genotypes. Morgan will set the penetrances for genotype 21 equal to that for 12.

 
set trait data 1 discrete with liability
input pedigree record trait 1 integer pairs 8 11

set 3 liability classes with penetrances
         0.025 0.325 0.325
         0.150 0.625 0.625
         0.350 0.950 0.950

An example is given in the file ‘xact_ph_age.par’ in the ‘Lodscore’ Gold standards:

 
input pedigree record trait 1 integer pairs 8 10  # For using ages of onset.
set trait 1 data discrete with covariate

set trait 1 for tloc 1 genotypic means  90.0 70.0 45.5
set standard deviations for genotypes   11.0 15.5 20.5

set width of ages of onset window  2.0

See Concept Index for: sample parameter file for lm_linkage, sample parameter file for lm_bayes, gender–specific maps, meiosis indicators, L-sampler, M-sampler, multiple meiosis sampler, genotypic trait specification, phenotypic trait specification, discrete trait specification, quantitative trait specification, continuous trait specification, liability class penetrances, age-of-onset penetrances.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.3 Running lm_linkage examples and sample output

lm_linkage can be run with all three parameter files in the ‘Lodscores/’ subdirectory. As usual, the syntax for running the program is:

 
./lm_linkage <parameter file>

This section describes the output obtained by using the parameter file ‘ped73_qu.par’. To run the example, type:

 
./lm_linkage ped73_qu.par

The interesting part of the output is the LodScore estimates. For each test position, we have the estimated lod score and the estimated Monte Carlo standard error.

 
LodScore estimates by Rao-Blackwellized computation:
 
 Trait pos #     position (Haldane cM)
   or marker        male     female       LodScore    StdErr

           1    -115.129   -115.129         0.0303    0.0005
           2     -80.472    -80.472         0.0558    0.0012
           3     -45.815    -45.815         0.0779    0.0031
           4     -17.834    -17.834        -0.0306    0.0080
           5      -5.268     -5.268        -0.2811    0.0142
    marker-1       0.000      0.000        -0.4986    0.0195
           6       3.000      3.000        -0.4469    0.0141
           7       7.000      7.000        -0.4342    0.0230
    marker-2      10.000     10.000        -0.4605    0.0363
           8      13.000     13.000        -0.4254    0.0247
           9      17.000     17.000        -0.4454    0.0209
    marker-3      20.000     20.000        -0.5301    0.0197
          10      23.000     23.000        -0.3174    0.0211
          11      27.000     27.000        -0.1176    0.0233
    marker-4      30.000     30.000        -0.0052    0.0259
          12      33.000     33.000         0.5058    0.0208
          13      37.000     37.000         0.8794    0.0159
    marker-5      40.000     40.000         1.0772    0.0138
          14      43.000     43.000         0.9832    0.0156
          15      47.000     47.000         0.8432    0.0213
    marker-6      50.000     50.000         0.7210    0.0252
          16      53.000     53.000         0.6558    0.0256
          17      57.000     57.000         0.5140    0.0271
    marker-7      60.000     60.000         0.3522    0.0288
          18      63.000     63.000         0.0113    0.0225
          19      67.000     67.000        -0.5473    0.0123
    marker-8      70.000     70.000        -0.9543    0.0095
          20      73.000     73.000        -0.4578    0.0212
          21      77.000     77.000        -0.1866    0.0178
    marker-9      80.000     80.000        -0.1135    0.0116
          22      83.000     83.000         0.0888    0.0091
          23      87.000     87.000         0.3132    0.0064
   marker-10      90.000     90.000         0.4544    0.0071
          24      95.268     95.268         0.6010    0.0046
          25     107.834    107.834         0.6423    0.0028
          26     135.815    135.815         0.4017    0.0011
          27     170.472    170.472         0.1762    0.0003
          28     205.129    205.129         0.0758    0.0001

For more information regarding the MCMC parameters and diagnostic output, See MCMC computational options.

See Concept Index for: running lm_linkage examples, lm_linkage sample output.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.4 Running lm_bayes examples and sample output

Under the subdirectory ‘Lodscores/’, run the lm_bayes example on the discrete (phenotypic) trait data by typing:

 
./lm_bayes ped73_ph.par

The results from lm_bayes are the lod scores toward the end of the output. Two estimates of the lod scores are provided: (1) count realizations of locations sampled to estimate the posterior probability (‘crude’) and (2) Rao-Blackwellized estimator (‘R-B’). Both are provided for comparison, but the latter should be more accurate.

 
 LodScore estimates:

 Trait pos #  position (Haldane cM)  pseudo  freq        LodScore    
   or marker      male    female     prior  visited   crude     R-B

           0  unlinked  unlinked   0.025023    94        NA        NA
           1  -115.129  -115.129   0.025276    66   -0.1580   -0.0046
           2   -80.472   -80.472   0.025727    77   -0.0987   -0.0125
           3   -45.815   -45.815   0.027843    96   -0.0372   -0.0473
           4   -17.834   -17.834   0.037973    71   -0.3030   -0.1825
           5    -5.268    -5.268   0.057289    96   -0.3506   -0.3583
    marker-1     0.000     0.000         NA    NA        NA        NA
           6     3.000     3.000   0.078826    89   -0.5221   -0.4919
           7     7.000     7.000   0.086379    88   -0.5667   -0.5255
    marker-2    10.000    10.000         NA    NA        NA        NA
           8    13.000    13.000   0.092502    87   -0.6014   -0.5456
           9    17.000    17.000   0.090858    94   -0.5600   -0.5386
    marker-3    20.000    20.000         NA    NA        NA        NA
          10    23.000    23.000   0.063483   109   -0.3400   -0.3738
          11    27.000    27.000   0.044111   103   -0.2065   -0.2086
    marker-4    30.000    30.000         NA    NA        NA        NA
          12    33.000    33.000   0.026053   114    0.0663    0.0203
          13    37.000    37.000   0.018403   103    0.1731    0.1698
    marker-5    40.000    40.000         NA    NA        NA        NA
          14    43.000    43.000   0.011818   100    0.3527    0.3585
          15    47.000    47.000   0.009347    90    0.4088    0.4600
    marker-6    50.000    50.000         NA    NA        NA        NA
          16    53.000    53.000   0.010351   121    0.4930    0.4236
          17    57.000    57.000   0.014614   121    0.3432    0.2804
    marker-7    60.000    60.000         NA    NA        NA        NA
          18    63.000    63.000   0.023348    96    0.0392    0.0769
          19    67.000    67.000   0.030506   123    0.0307   -0.0412
    marker-8    70.000    70.000         NA    NA        NA        NA
          20    73.000    73.000   0.033357   136    0.0356   -0.0903
          21    77.000    77.000   0.030400   124    0.0358   -0.0514
    marker-9    80.000    80.000         NA    NA        NA        NA
          22    83.000    83.000   0.024811    96    0.0128    0.0282
          23    87.000    87.000   0.019535   144    0.2928    0.1160
   marker-10    90.000    90.000         NA    NA        NA        NA
          24    95.268    95.268   0.013755   110    0.3281    0.2561
          25   107.834   107.834   0.013714   125    0.3849    0.2600
          26   135.815   135.815   0.018372   132    0.2816    0.1339
          27   170.472   170.472   0.022361   108    0.1091    0.0489
          28   205.129   205.129   0.023966    87   -0.0149    0.0188

Note that lm_bayes does not provide lod scores at the marker locations.

See Concept Index for: running lm_bayes examples, lm_bayes sample output, Rao-Blackwellized estimates.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.5 Running lm_twoqtl examples and sample output

The program lm_twoqtl remains beta test, so that instead of examples in ‘MORGAN_Examples’ we describe the gold standard example in the main MORGAN source directory in subdirectory ‘Lodscore/Gold’. However, much work has been done on lm_twoqtl, so that its marker-based MCMC is now as for lm_linkage. Four gold-standard examples of the lm_twoqtl parameter files and output may be found in the ‘Lodscore/Gold’ directory of MORGAN. Examples 1 and 3 are for a single trait locus, and 2 and 4 use two QTL. Examples 1 and 2 use exact computation for the trait lod score contributions on these very small examples. Examples 3 and 4 use Monte Carlo. We will use example 4 in this tutorial description, since this is the most general and novel. To create other examples, copy one of these files and replace the parameters in the file with those that you want to specify.

The various trait-model options for lm_twoqtl are summarized in the following table:

        Additive Genetic Variance:   Zero      Positive
        Number of QTL:   1one locus  one locus plus polygene
                                  2two loci  two loci plus polygene

Trait models can be any of the above four entries. However, for a one-locus trait model with no polygenic component, the program lm_linkage will provide more accurate results more quickly.

The lod score is estimated on a one-dimensional grid of points for one QTL, and a two-dimensional grid of points for two QTL. In the future the new parameter statement

 
map [chromosome  I]  test tlocs  L1 L2  jointly at markers J11 J12 ...

will allow two-locus lod score programs that provide lod scores at arbitrary pairs of marker positions.

The content of file ‘twoqtl4.par’ (reordered slightly for clarity) is:

 
use single meiosis sampler           # Select the MCMC sampler to be used.

set printlevel 5                     # Include everything in the output file.

set sampler seeds  0x53f78285 0xdfbca001
set trait   seeds  0x53f78285 0xdfbca001

input marker data file `./twoqtl.markers'
input pedigree file `./twoqtl.ped'
output extra file `./twoqtl_batch4'

select all markers
select trait 1

set trait 1 multiple tlocs 1 2
set tloc 1  allele freqs 0.1 0.9
set tloc 2  allele freqs 0.3 0.7

# requests for grid of tloc positions for lod scores
map test tloc 1 all interval proportions 0.5
map test tloc 1 external recomb fracts   0.3
map test tloc 2 all interval proportions 0.5
map test tloc 2 no default external positions

# standard MCMC requests
use sequential imputation for setup
use 100 sequential imputation realizations for setup
sample by scan
set L-sampler probability 0.2
set burn-in iterations 10
set MC iterations 60
compute scores every 10 iterations

# lodscore scoring requests
set 3 batches MC variance estimation
check progress 20 MC iterations
# For clarity, the wording `MC' has been removed from the following three  parameter statements:
#  MC in parameter statements now refers only to MCMC.
use Monte Carlo summation for trait
simulate 5 IBD realizations for trait
use multiplier 1 realization for null

# quantitative trait model specification
set trait 1 data quantitative
set trait 1 for tloc 1 genotype  mean  -2.0  0.0  2.0
set trait 1 for tloc 2 genotype  mean  -3.0  0.0  3.0
set trait 1 residual  variance  1.0
set trait 1 additive  variance  1.0

Note the number of MCMC scans (60) is very small, as also are the number of Monte Carlo realizations to be used in evaluating the trait likelihood contributions (5). Additionally, only every 10th MCMC scan is used for computing lod score contributions. This is reasonable, in that for lm_twoqtl lod score computation is computationally intensive, so that the standard procedure of scoring every scan is not efficient. However, with only 60 total scans, this means lod scores are based on only 6 realizations of inheritance conditional on the marker data. The example is for illustrative purposes only; in real examples much more Monte Carlo would be required both in the marker-based MCMC and for estimating trait contributions to each score.

Most statements are as for earlier lod-score programs and can be found in the Statement Index. The statements included in this example that require additional comment are

use single meiosis sampler

The old single-meiosis sampler is specified for consistency with earlier results; in practice the multiple meiosis sampler is preferred.

set trait 1 multiple tlocs 1 2

This statement specifies that trait 1 is contributed to by both tloc 1 and tloc 2.

set trait 1 for tloc 1 genotype mean -2.0 0.0 2.0
set trait 1 for tloc 2 genotype mean -3.0 0.0 3.0

The genotypic means for tlocs 1 and are set separately, which will imply their additive contribution to trait 1. If the tlocs are not to contribute additively, the user should instead use the statement

set trait 1 for tlocs 1 2 joint genotype means ... followed by 9 genotype means for the two tloc genotype combinations.

output extra file ‘./twoqtl_batch4

If an ‘extra file’ is specified, it is used by lm_twoqtl for output of batched means used in variance estimation. Most users will not require this file, although it can be used in MCMC diagnostics.

set 3 batches MC variance estimation

In this minimal example, the 6 scored realizations are divided into 3 batches, each of size 2. Again, real examples would use much larger number of realizations, and likely the default number of batches (20).

use Monte Carlo summation for trait
simulate 5 IBD realizations for trait

In this example, Monte Carlo summation is to be used for evaluating each trait-locus likelihood contributions conditional on marker-based realizations of inheritance, and for each such realizations there will be 5 realizations of trait allele descent.

use multiplier 1 realization for null

In this example the same number of realizations will be used to evaluate the marginal probability of trait data as are used for each lodscore grid point. In real examples, it may be advisable to increase this ratio, to obtain an accurate base level for the lodscore estimate.

The default procedure of estimation of lod scores on each of the two components separately is used. These are then summed, giving the final concluding output for this example:

 
# Lod score estimates and MC sd for entire pedigree:

# Index     TLoc1       TLoc2     LodScore    StdErr

    1     -45.815     -45.815       1.4058    0.7605
    2     -45.815       0.000       0.3872    0.6212
    3     -45.815      10.000       0.4379    0.6232
    4     -45.815      20.000       0.6616    0.6856
    5     -45.815      65.815       0.4661    0.5916
    6       0.000     -45.815       0.3503    0.6529
    7       0.000       0.000       0.1343    0.6034
    8       0.000      10.000      -0.0129    0.5600
    9       0.000      20.000       1.0364    0.5772
   10       0.000      65.815       1.1077    0.7724
   11      10.000     -45.815       0.0983    0.6902
   12      10.000       0.000       0.0461    0.5605
   13      10.000      10.000       0.8585    0.6088
   14      10.000      20.000       1.2174    0.5866
   15      10.000      65.815       0.3512    0.7539
   16      20.000     -45.815       1.5180    0.6479
   17      20.000       0.000       0.7387    0.6567
   18      20.000      10.000       0.9330    0.7020
   19      20.000      20.000       1.2473    0.6162
   20      20.000      65.815       1.3465    0.7664
   21      65.815     -45.815      -0.3066    0.7764
   22      65.815       0.000      -0.1259    0.6895
   23      65.815      10.000       0.5714    0.7124
   24      65.815      20.000       1.3537    0.6550
   25      65.815      65.815       0.5343    0.7140

These results consist of base-10 lodscore estimates with MCMC standard deviations, estimated at the requested grid of test positions.

See Concept Index for: running lm_twoqtl examples, lm_twoqtl sample output, map test tlocs jointly at markers.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.6 Parameter files for the gl_lods program

See References for details of the cited papers.

The program gl_lods computes lod score contributions for a discrete or continuous trait given a set of ibd_graphs across the chromosome, produced by gl_auto: See Introduction to lm_auto gl_auto and lm_pval. If the gl_auto run uses the ‘set MCMC markers only’ option, then the overall lod score computed by gl_lods is identical to that produced by lm_linkage when the same MCMC options are used in the in gl_auto and in lm_linkage. gl_lods many of the same trait definition and mapping request parameter statements as lm_linkage (Location lod scores statements). However its input consists of ibd_graphs and an individuals file; there are no marker data or pedigree data.

The goal of using gl_auto and gl_lods is to separate the lod score computation from the marker-based MCMC that produces realizations of the inheritance vectors at loci across the chromosome. The input to gl_lods consists of these realizations, in the format of gl_auto output compressed ibd_graphs, a specification of a trait model, and a list of individuals with their trait data. gl_lods computes lod score contributions for a specified trait model and data on specified individuals, directly using the ibd graphs on these individuals. See also [Tho11]. From MORGAN 3.2. the gl_lods program is no longer beta-test; parameter statements, capabilities, and examples have been significantly updated.

The separation of lod-score computation and marker-based MCMC has several advantages:

Examples for gl_lods are still under development, but we describe here two examples based on the gl_lods gold standards in the ‘Lodscore/Gold’ subdirectory of MORGAN. In the case of the first ped47_... example the files are only in that directory and can be run there as ‘make gold.6’. The other example simCmps_fgl_... is both included as a gold standard, and has also been added to the ‘MORGAN_Examples’ files.

See Concept Index for: ibd graph, base log likelihood, individuals file.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.7 Running gl_lods examples and sample output

The results in the output file ‘simCmps_gl.out’ are very similar to those for the previous example. The section summarizing the ibd graphs is

 
 Opened input extra file "./simCmps_fgl.oscor"

 Number of individuals in IBD graphs file is 159

 Number of individuals in nghd structure is 156
 Have alloced gen_pen nFGL=96, NLoci=7

 Opened output scores file "./simCmps_gl.scores"

Then for each component the log-likelihoods are computed, but this time base log-likelihoods were provided in the parameter file. The log-likelihoods are therefore converted to estimated lod scores at the 6 requested marker positions. For component 1:

 
====== Computing LodScore for component 1 ======

 Number of observed individuals in each IBD graph for component 1 is 19
 Grouped 1260 locations into 544 equivalence classes

 Lod scores for 100 ibd graphs at markers:
.......

 ====== Computing LodScore for component 1 ======

 Number of observed individuals in each IBD graph for component 1 is 18
 Grouped 1260 locations into 536 equivalence classes

 Log likelihoods for 100 ibd graphs at markers
   1  3  5  6  9  10
 will be printed to the output scores file


 Component 1 lod scores found by gl_lods:

 Marker number      LodScore    StdErr

      marker-1       -6.9215
      marker-3       -7.2357
      marker-5        3.1253
      marker-6        3.4153
      marker-9       -4.8130
     marker-10       -4.7849

This is then repeated for components 2 and 3. There is no Monte Carlo within this non-permutation version of gl_lods. However, even if the three pedigree components were identical, there is Monte Carlo variation in the generation of the ibd graphs by the gl_auto program.

See Concept Index for: base log likelihood.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.8 Running the base_trait_lods program

The base_trait_lods program provides the probability of trait data on a pedigree. (Previously, this information had to be extracted from a dummy run of the lm_linkage program.) The output base log-likelihoods may be input to gl_lods, enabling it to compute a classical normalized lod score; See Parameter files for the gl_lods program.

Typically the computation will be an exact single-locus computation by pedigree peeling. However, in the event the pedigree is too large or complex for exact computation a Monte Carlo alternative is provided. An example of each is provided in the ‘Lodscore/Gold’ directory, and they may be run as make gold.7.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9 Location lod scores statements

New statements for these programs include maps for test positions, and parameters for some additional MCMC algorithms.

See Concept Index for: location lod scores statements, gl_lods statements, lm_linkage statements, lm_bayes statements.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.1 Location lod scores computing requests

See Concept Index for: location lod scores computing requests.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.2 Location lod scores file identification statements

All Lodscore programs use the general MORGAN file identification statements (see File identification statements) and the Autozyg rescue file statements (see Autozyg file identification statements).

One additional statement is optional for lm_bayes:

output Rao-Blackwellized estimates file

If this file is specified, the set of Rao-Blackwellized lod score estimates at each trait position is written at the frequency specified in the ‘compute scores’ statement.

The same standard out file specifications are available for lm_twoqtl. In particular:

output extra file

This statement is used by lm_twoqtl to output batch mean estimates used in computing the estimated Monte Carlo standard error.

See Concept Index for: location lod scores file identification statements, Rao-Blackwellized estimates.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.3 Location lod scores pedigree file description

Most Lodscore programs use the general MORGAN pedigree file description statements (see Pedigree file description statements). However the gl_lods program uses an individuals file rather than a pedigree file (see Individuals file).

The following statement used by gl_lods is the individuals-file analogue of the corresponding pedigree file statement:

input individuals record names 2 [integers I] [reals J]

The two "names" consist of the name and component number of the individual.

In addition, there is a statement optional for lm_linkage and gl_lods:

input pedigree record traits K1 K2 … reals X1 X2

This statement is analogous to ‘input pedigree record traits K1 K2 … integers I1 I2’ (see Pedigree file description statements) when the trait is quantitative, rather than discrete. It allows the file locations of multiple traits to be defined, although only one can be selected for any analysis.

See Concept Index for: location lod scores pedigree file description, quantitative trait.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.4 Location lod scores output file description

All Lodscore programs use the Autozyg output file description statements; See Autozyg output file description.

See Concept Index for: location lod scores output file description.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.5 Location lod scores mapping model parameters

The following statements describe the hypothesized trait locus (tloc) positions which are to be ‘tested’. That is, these are the positions at which lod scores will be computed.

map [chromosome I] [gender (M | F)] test tloc L1 all interval proportions X1 X2

Interval proportions specify the proportional genetic distance between markers for the trial positions for the test trait locus. The same ratios are used between each marker pair, regardless of the inter-genetic distance (in cM).

map [chromosome I] [gender (M | F)] test tloc L1 intervals J1 … proportions X1 X2

This statement specifies interval proportions, but between specific pairs of markers. Interval 1 is between markers 1 and 2, interval 2 is between markers 2 and 3, etc.

map [chromosome I] [gender (M | F)] test tloc L1 (beginning | ending | external) ([Kosambi] distances | recombination fractions) X1 X2

This statement specifies trial trait positions on the chromosome before the first marker and/or after the last marker.

map test tlocs L1 ... no default [interval proportions| external positions]

This pair of statements is used to eliminate computation of lod scores at default interval and/or external positions on the active chromosome.

map [chromosome I] test tloc L at markers J1 ...

This statement (new with MORGAN 3.0) is increasingly used with denser SNP marker data. If used, lod scores will be computed only at the positions of the specific markers. Note the marker indexing is by the count in the marker data file, not by selected marker.

map [chromosome I] test tlocs L1 L2 jointly at markers J11 J12 ...

This statement (not yet implemented) will allow two-locus lod score programs such as lm_twoqtl to compute lod scores only at any specified combination of marker positions rather than, as currently, on a grid.

See Concept Index for: location lod scores mapping model parameters, trait test positions, map function.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.6 Location lod scores population model parameters

See Concept Index for: location lod scores population model parameters.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.7 Location lod scores computational parameters

The following additional statements are specific to lod score computations:

set maximum I ibdgraphs per component

This statement is used by gl_lods which needs to know the upper limit of the number of ibd graphs per pedigree component provided as input.

set base log likelihood X1 X2 ...

This statement is used by the program gl_lods. The log of the marginal probability of the traits data on each pedigree component must be given to enable gl_lods to normalize its lod scores. These may be computed using the base_trait_lods program: See Running the base_trait_lods program.

set number of permutations N

This statement may be used by the program gl_lods. At each location at which a log-likelihood is computed given the realized IBD, trait-data permutation provides a normalizing null term conditional on that same IBD structure. This permutation approach is still being studied; see [GT15].

simulate N ibd realizations

This statement may be used by the program base_trait_lods, which provides a probability of trait data on a defined pedigree, under a specified trait model. If the pedigree is too complex for single-locus exact pedigree peeling, a Monte Carlo estimate may be obtained by realizing IBD on the pedigree. This statement provides the number of realizations to be used.

set pseudo-priors X1 X2 ...

This statement is optional for lm_bayes. The number of pseudo-priors is the number of test trait locus positions plus one. The first pseudo-prior is for the unlinked position; this should be assigned a positive value. All other pseudo-priors must be positive or zero. The set of pseudo-priors need not be normalized.

This statement is also optional for base_trait_lods. If the base_trait_lods base log-likelihood is to be estimated by Monte Carlo then this statement gives the number of realizations on which a crude estimate is made. This is then used in the main iterations to lessen the chance of over/under-flow

set I batches MC variance estimation

This statement is optional for lm_linkage, lm_twoqtl and gl_lods. These programs batch scored realizations in order to provide a Monte Carlo estimate of the standard deviation in estimating the lod score. This statement determines the batch size, and hence the number of batches. By default it is determined such that there are 20 batches.

The following additional statements are specific to tloc specification and likelihood computation for the program lm_twoqtl.

set traits K1 ... multiple tlocs L1 ...

This statement is used by the lm_twoqtl program to specify the tlocs L1... that contribute to each trait K1... A statement may be provided for each separate trait. However, the lm_twoqtl program expects selection of one trait with either one or two contributing tlocs.

set trait K1 for tlocs L1 L2 joint genotype means X11 X12 X13 X21 X22 X23 X31 X32 X33

This statement specifies the 9 genotypic means (3x3 matrix) for tlocs L1 and L2 in contributing to trait K1. The first index on X refers to the L1 genotype and the second to L2.

use [exact|MC] summation for trait

This statement specifies whether exact or Monte Carlo (MC) will be used by lm_twoqtl for computation of the trait contribution to the lod score. Exact summation can be used only on pedigrees with six or fewer founders.

simulate I IBD realizations for trait

If Monte Carlo summation is to be used, this statement specifies the number of realizations of tloc inheritance realizations to be used. If Monte Carlo summation is not to be used, this statement is ignored.

use multiplier I realizations for null

This statement specifies the number of time as many realizations are to be used in estimating the base-line unlinked lod-score. To obtain accurate lod-score estimates it is important this value is accurate, and it may therefore be advisable to use more realizations.

See Concept Index for: location lod scores computational parameters, pseudo-prior iterations.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

11.9.8 Location lod scores MCMC parameters and options

Please see that section for details regarding:

 
use (locus-by-locus sampling | sequential imputation) for setup
use I sequential imputation realizations for setup

set MC iterations I
set burn-in iterations I
sample by (scan | step)
set L-sampler probability X

check progress I MC iterations

As with the Autozyg programs, the number of desired MC iterations must be specified, as there is no default value.

set MC iterations I

This statement sets the total number of ‘main’ L- and M-sampler iterations. For lm_linkage, the total MCMC run length is the sum of the number of burn-in iterations and main iterations. For lm_bayes, the total MCMC run length is the sum of the number of burn-in, pseudo-prior (see below) and main iterations.

Additional statements for lm_bayes include the following:

set pseudo-prior iterations I

Following burn-in, lm_bayes performs iterations to calculate the pseudo-priors. These pseudo-priors are used to encourage the MCMC sampler to visit test positions of low posterior probability. The default number of iterations to compute pseudo-priors is 50% of the number of main iterations specified in the ‘set MC iterations’ statement.

set sequential imputation proposals every I iterations

This option applies to lm_bayes’s pseudo-prior and main MCMC iterations. It allows the MCMC chain to “restart” every Ith iteration. Sequential imputation is used to propose potential restart configurations which are accepted/rejected with Metropolis-Hastings probability.

set test position window I

This lm_bayes statement specifies the window size for the proposed tloc position update in the Metropolis-Hastings algorithm. I is the number of hypothesized trait positions on either side of the current position, with equal weight given to the 2*I + 1 trait positions. The default is window size is 6.

See Concept Index for: location lod scores MCMC parameters and options, MC iterations, burn-in, sequential imputation proposals.


[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.