[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
See Concept Index for: conditional ibd probabilities, identity by descent, ibd.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_auto
, gl_auto
and lm_pval
See References, for details of the cited papers.
The MORGAN programs lm_auto
, gl_auto
and lm_pval
are referred to
as “Autozyg” programs, as they estimate autozygosity, or identity by descent
(ibd). The Autozyg programs use MCMC to infer patterns of ibd among
members of a pedigree conditional on marker data, and possibly also
on trait data. These inferred
patterns may then be used in multipoint linkage analyses
on large pedigrees where many individuals may be unobserved and exact
computation is infeasible. The data are the genotypes at marker loci of
observed individuals in pedigrees. For lm_auto
and lm_pval
there may also be affectation status (affected / unaffected
/ unknown) for the trait of interest.
lm_auto
uses either the old (single-meiosis) or new
(multiple meiosis) LM-sampler
to realize ibd configurations from their
conditional distribution given the marker and/or trait data.
Given the data, it estimates
conditional probabilities of genome sharing patterns (gene ibd) among
specified haplotypes, often chosen from affected individuals.
The marker data are
used jointly in the sampling. The resulting ibd is either
scored marginally at
each marker locus, or over windows of a small number of loci.
gl_auto
also uses either the old or new LM-sampler
to realize ibd configurations from their
conditional distribution given the marker data.
By default, a null (no-data) unlinked ‘trait’ locus is also
included: optionally this may be omitted
Autozyg MCMC parameters and options. Rather than
estimating ibd probabilities, gl_auto
outputs realizations of ibd
configurations directly to an output file.
The output may either be of founder genome labels
(FGL or ‘gl’) or of the meiosis indicators that determine the FGL, or
both. In either case,
the output is in a compact format where only changes of FGL or meiosis
indicators are recorded, together with the positions of these changes.
Positions are in terms of marker indices in the original marker data file,
even where markers are subselected for MCMC. These output ibd graphs may
be used for subsequent analyses of trait data on the pedigrees, without
further reference to the pedigree structure or marker data.
For further details see [Tho11].
lm_pval
uses the LM-sampler to provide the conditional distribution of
an ibd measure, T,
given marker data. In principle it can be used to provide
Monte Carlo estimates of any NPL (Non-Parametric Linkage) statistics for
detecting linkage. Trait information provided to the program consists of the
list of affected members of the pedigree, provided
as the phenotypic status in the pedigree file.
The version of the program lm_pval
in MORGAN 3.0 (originally
released in MORGAN V.2.8)
uses the latent p-value distribution
of [TG07]. In lm_pval
, marker data are assumed available on some pedigree
members, at some of the marker loci. At each test genome location,
the distribution of the ibd measure, T,
conditional on marker data is compared to the unconditional distribution under
the null a priori distribution uniform over all inheritance vectors.
Then quantiles of a latent (fuzzy)
p-value distribution are produced.
A latent p-value distribution corrected for multiple
testing over genome locations
is also produced, by scoring the maximum of the ibd measure,
T, over test locations.
Additional programs using latent p-values are under development, including
programs for the distribution of latent lod scores obtained in MCMC sampling
(lm_fuzlod
), p-values and randomized tests based on latent lod score
statistics (lm_fzplod
), and randomized confidence sets for the location
of a trait locus (lm_fzconf
).
The methods are described by Thompson in [Tho08a].
The program civil
(see Introduction to lm_ibdtests and civil)
also uses latent p-values [DT09].
See Concept Index for:
Autozyg programs,
lm_auto
introduction,
gl_auto
introduction,
lm_pval
introduction,
Markov chain Monte Carlo,
autozygosity,
meiosis indicators,
identity by descent,
descent graph and ibd graph,
ibd,
latent p-values,
LM-sampler,
multiple meiosis sampler.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_auto
parameter filelm_auto
uses the parameter file ‘jv_rep_auto.par’ in the ‘IBD’
subdirectory:
input pedigree file 'jv_rep.ped' input seed file '../sampler.seed' output overwrite seed file '../sampler.seed' set printlevel 5 select all markers select trait 1 set trait 1 tloc 1 map gender F markers dist 25.5 25.5 25.5 25.5 map gender M markers dist 11.2 45.8 11.2 45.8 map gender F tloc 1 marker 2 dist 12.8 map gender M tloc 1 marker 2 dist 5.8 set markers 1 2 3 4 allele freqs .2 .2 .4 .1 .06 .04 set markers 5 allele freqs .3 .2 .3 .1 set tloc 1 allele freqs .95 .05 set marker data 5 333 1 3 1 3 1 3 1 3 1 3 331 3 4 3 4 3 4 3 4 3 4 334 2 3 2 3 2 3 2 3 2 3 431 3 4 3 4 3 4 3 4 3 4 531 3 3 3 3 3 3 3 3 3 3 343 1 3 1 3 1 3 1 3 1 3 341 3 5 3 5 3 5 0 0 3 3 344 4 6 4 6 4 6 2 4 2 4 441 3 4 3 4 0 0 3 4 3 4 541 3 3 3 3 3 3 3 3 3 3 set window patterns 0 4 set locus window 3 set component 1 proband gametes 531 1 531 0 331 0 333 1 set component 2 proband gametes 541 1 541 0 set L-sampler probability 0.2 use multiple meiosis sampler set MC iterations 2000 |
The
‘select’ statement is analogous to genedrop
’s ‘simulate’
statement (see genedrop computing requests). The statements first
specify that all markers will be used. Then the
trait (‘1’) for which data will be analyzed is identified and
connected to a tloc (‘1’).
The trait values are specified in the pedigree file: the specified pedigree file, ‘jv_rep.ped’, is a 30-member, two-component pedigree. Since there is no ‘input pedigree record trait’ statement in the example parameter file, the default behavior is implemented and so the trait value is listed after the three names and integer gender in the pedigree file. In this example, this is the 5th and final column (2nd integer). Because the trait type is not specified in the parameter file via a ‘set trait data’ statement, by default the trait data are ‘genotypic’, so that they are are coded as ‘1’, ‘3’, ‘4’ or ‘0’, corresponding to trait locus genotypes of ‘1 1’, ‘1 2’ (or ‘2 1’), ‘2 2’ or ‘missing’, respectively. In the example, the final individuals of each pedigree component, named 531 and 541, have trait value ‘4’. All other individuals in the file have trait value ‘0’.
The ‘map’ statements specify the marker map and trait position in terms of genetic distances (centiMorgan). In this example there are five markers with gender-specific maps. The trait locus position is measured from the marker to its left. In this example, the trait locus for males is between markers 2 and 3 at a distance of 12.8 cM to the left of marker 2 (See See genedrop mapping model parameters. The ‘set markers’ statements specify the number and frequency of alleles for each marker. In the example, the first four markers each have six alleles (labeled 1–6) with frequencies 0.2, 0.2, 0.4, 0.1, 0.06 and 0.04. The fifth marker has four alleles with frequencies 0.3, 0.2, 0.3 and 0.1. The trait locus has two alleles; alleles ‘1’ and ‘2’ have frequencies 0.95 and 0.05, respectively.
The ‘set marker data’ statement specifies the number of markers to be five.
Following the ‘set marker data’ statement are genotype data for typed
individuals. Alternatively, lm_auto
can read genotype data from a
separate file specified with an ‘input marker data file’ statement.
Note that in the parameter file the marker-5 allele frequencies
do not sum to 1. By default, allele frequencies will not be
normalized. The implication is that there are other alleles not present
in the marker data, whose frequencies therefore need not be listed. If the
program encounters in the marker data
an allele at this locus other than ‘1’ to
‘4’, an error will be generated.
The ‘set window patterns’ and ‘set locus window’ statements instruct
lm_auto
to compute the probabilities that the gametes named in the
‘set proband gametes’ statement have a particular ibd pattern (also
called state) jointly across several loci.
The ‘set locus window’ statement specifies the number of loci to be
examined simultaneously, in this case 3. This statement was discussed briefly in
the ibddrop
example: See Running ibddrop example and sample output.
The probabilities in the ‘IBD’ column of the output specifies
whether one of the specified set of patterns holds (‘1’)
or not (‘0’) at each of the three loci
across the window.
Using the ‘set window patterns’ statement in
lm_auto
, the user can specify ibd patterns of interest over two or
more loci.
Recall that in the ibddrop
windows option,
one can can only estimate the probability of two gametes
being ibd or not.
The ‘set window patterns’ statement indicates that we are interested in
patterns ‘0’ and/or ‘4’, which correspond to ibd patterns ‘1
1 1 1’ and ‘1 1 2 2’, respectively. That is, in component 1, we are
interested in the probability that all four of the gametes named in the
‘set proband gametes’ statement are ibd across 3-locus windows or that
the first and second gametes (maternal and paternal haplotypes of individual
531) are ibd and the third and fourth gametes (maternal haplotype of
individual 531 and paternal haplotype of individual 333) are ibd, but these
two pairs are not ibd with each other.
Recall the output of the ibddrop
program generated when using the
parameter file ‘ibd.par’. In the section of the program output headed
‘Probabilities of IBD patterns’, each of the ibd patterns listed in the
leftmost column is associated with a label in the right-most column.
Probabilities of IBD patterns Proband gamete set 1: 541 0 541 1 341 0 343 1 pattern marker-1 marker-2 trait-1 marker-3 marker-4 marker-5 label 1 1 1 1 .0287 .0298 .0310 .0273 .0287 .0298 0 1 1 1 2 .0290 .0275 .0292 .0282 .0302 .0305 1 1 1 2 1 .0132 .0135 .0138 .0140 .0139 .0132 3 |
The ‘set window patterns’ statement in the parameter file for
lm_auto
expects one or more of these labels, which instruct it to
calculate the probabilities of the associated pattern(s). This means that you
must determine the labels of the patterns of interest
(for example, by running ibddrop
),
before using lm_auto
to estimate multi-locus
probabilities.
The ‘set proband gametes’ statement is the key statement for
lm_auto
. It specifies which haplotypes are to be scored with ibd
probabilities. The syntax is as follows, where N1, N2, ... are individual ID’s
and K1, K2, ... indicate the haplotype as paternal (1) or maternal (0):
set [component M proband gametes N1 K1 N2 K2 ... |
In the example, ‘531 1’ refers to the paternal (1) haplotype of individual ‘531’. The first statement requests scoring both haplotypes of 531, the maternal (0) haplotype of 331, and the paternal (1) haplotype of 333. Note that currently the number of proband gametes to be scored jointly is limited to 10. See ibddrop statements, for more discussion of the ‘set proband gametes’ statement.
As with all of MORGAN’s MCMC-based programs, the user can specify the
desired number of MC iterations using the ‘set MC iterations’ statement,
the desired number of burn-in iterations using ‘set burn-in iterations’,
and the probability that the L-sampler is selected instead of the M-sampler
using ‘set L-sampler probability’. In this example, 2000 sampling
iterations are to be performed, using the L-sampler 20 percent of the time.
These iterations are preceded by burn-in iterations. Because the number of
burn-in iterations is not specified, lm_auto
will use the default value
of 10 percent of the number of main iterations. In practice, one would run the
MCMC sampler much longer than 2000 iterations (on the order of 10^5).
The ‘multiple meiosis’ sampler is requested and
the ‘set global MCMC’ statement is not used, so MCMC will be performed
separately for each pedigree component.
For further details of these statements:
See MCMC parameter statements.
See Concept Index for:
lm_auto
sample parameter file,
trait data,
genotypic trait,
gender–specific maps,
marker data file,
window patterns,
proband gametes,
L-sampler,
M-sampler,
burn-in.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_auto
example and sample outputThe syntax for running a MORGAN program is:
./<program> <parfile> [> <output file name>] |
The lm_auto
example can be run under the subdirectory ‘IBD’
./lm_auto jv_rep_auto.par > auto.out |
Below are sections of the output file ‘auto.out’, generated by running
lm_auto
using the parameter file ‘jv_rep_auto.par’. Note, as for
the program ibddrop
, the exact values of the probability estimates will
depend on the value of the random seed. The tables of estimated
ibd probabilities are given for each component, towards the end of the
output. The estimated
probabilities of gene ibd patterns are given for
each marker and for the trait locus (in the map order).
In the following extracted output, the MCMC diagnostic information
has been omitted.
====== IBD scores for component 1 are estimated using MCMC ====== Proband gamete set 1: 531 1 531 0 331 0 333 1 pattern marker-1 marker-2 trt-geno marker-3 marker-4 marker-5 label 1 1 1 1 .2345 .3375 .3655 .2945 .2450 .1870 0 1 1 1 2 .1165 .1815 .2250 .1785 .1425 .0940 1 1 1 2 1 .1540 .2025 .2595 .1605 .1300 .1070 3 1 1 2 2 .0135 .0215 .0270 .0195 .0095 .0080 4 1 1 2 3 .0260 .0270 .0435 .0250 .0210 .0195 5 1 2 1 1 .0190 .0110 .0010 .0115 .0200 .0275 6 1 2 1 2 .0715 .0430 .0130 .0600 .0665 .0720 7 1 2 1 3 .0560 .0250 .0090 .0275 .0420 .0520 8 1 2 2 1 .0345 .0205 .0030 .0265 .0380 .0310 9 1 2 2 2 .0925 .0435 .0120 .0685 .1060 .1410 10 1 2 2 3 .0425 .0285 .0105 .0325 .0410 .0515 11 1 2 3 1 .0245 .0125 .0040 .0115 .0210 .0275 12 1 2 3 2 .0620 .0300 .0130 .0515 .0670 .0885 13 1 2 3 3 .0115 .0020 .0040 .0125 .0125 .0185 14 1 2 3 4 .0415 .0140 .0100 .0200 .0380 .0750 15 Probabilities of IBD for pattern set for windows of 3 loci Proband gamete set 1 Pattern set: 0 4 IBD wndw 1 wndw 2 wndw 3 wndw 4 0 0 0 .4955 .4635 .4630 .5410 0 0 1 .0810 .0815 .0520 .0740 0 1 0 .0345 .0415 .0375 .0430 0 1 1 .1410 .0545 .0550 .0280 1 0 0 .0495 .0515 .1520 .1125 1 0 1 .0150 .0110 .0190 .0180 1 1 0 .0280 .1295 .0930 .1085 1 1 1 .1555 .1670 .1285 .0750 ====== IBD scores for component 2 are estimated using MCMC ====== Probabilities of IBD patterns Proband gamete set 1: 541 1 541 0 pattern marker-1 marker-2 trt-geno marker-3 marker-4 marker-5 label 1 1 .7000 .8760 .9580 .8165 .6570 .4670 0 1 2 .3000 .1240 .0420 .1835 .3430 .5330 1 |
Interpretation of these results is similar to that of ibddrop
See Running ibddrop example and sample output. Briefly, the probabilities
are summarized by ibd pattern. A pattern is a series of integers, one
representing each gamete listed in the ‘set proband gametes’ statement.
The order of gametes in the output file patterns is the same as the order in
which the gametes were listed in ‘set proband gametes’. Numbers that are
the same indicate gametes that are ibd. For instance, in the first row of
the table above, the pattern is ‘1 1 1 1’, which means that the values in
the first row represent probabilities that all four gametes are ibd at each
marker locus and at the trait locus. Likewise, ‘1 2 1 1’ means gametes 1,
3, and 4 are ibd while gamete 2 is not ibd with the others; ‘1 2 3
4’ means all four gametes are not ibd.
The second table in the above output is a result of the window size and ibd
pattern statements in the parameter file. Its interpretation is similar to the
output of ibddrop
when statement ‘set locus window’ was used,
See Running ibddrop example and sample output. Recall that in ibddrop
,
the values in the ‘IBD’ column of the output indicate whether the two
gametes specified in the ‘set proband gametes’ statement are ibd
(indicated by a ‘1’) or not (indicated by a ‘0’). With lm_auto
,
the user can specify additional ibd patterns of interest over two or more
gametes. In this example, the parameter file ‘jv_rep_auto.par’ includes
the statement ‘set window patterns 0 4’, which indicates that we are
interested in ibd patterns ‘0’ and ‘4’, corresponding to ‘1 1
1 1’ and ‘1 1 2 2’, respectively, as discussed in the previous section.
That is, we would like to know the probability that either all four gametes are
ibd or that the first two are ibd and the second two are ibd, but
the pairs are not ibd with one another for each window of three loci.
Consequently, interpretation of the ‘IBD’ column of the lm_auto
output is as follows. The row headed by ‘0 0 0’ gives probabilities that
the gametes do not follow either of the two ibd patterns of interest at all
three loci for each window. The row headed by ‘0 0 1’ gives probabilities
that the gametes do not follow either of the two ibd patterns of interest at
the first two loci in the window, but at the third loci either all four gametes
are ibd or the first two are ibd and the last two are ibd, but
the pairs are not ibd with one another.
In this section of the lm_auto
output, the order of the marker and trait
loci is the same as in the table of results for each locus; that is, the
map order. In this example, the trait locus was between markers
2 and 3. Therefore, the windows are as below:
window
loci
wndw 1
marker 1, marker 2, trait
wndw 2
marker 2, trait, marker 3
wndw 3
trait, marker 3, marker 4
wndw 4
marker 3, marker 4, marker 5
For more information regarding the MCMC parameters and diagnostic output, See MCMC computational options.
See Concept Index for:
running lm_auto
example,
lm_auto
sample output,
ibd pattern,
proband gametes.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
gl_auto
parameter fileThe parameter file of the gl_auto
example is in the ‘IBD’
subdirectory of ‘MORGAN_Examples’.
Like the earlier markerdrop
example (see
Sample markerdrop parameter file -- conditional on trait),
the gl_auto
uses the 3-component pedigree with a total of
73 individuals, ‘ped73.ped’, and the corresponding
10-marker data, ‘ped73.marker.missing’. Both these files are in the
main ‘MORGAN_Examples’ directory; that is ‘..’ relative to
the parameter file.
The gl_auto
parameter file is given here in full, as it contains
several options not so far encountered in this tutorial, as well as some
statements specific to this program.
input pedigree file '../ped73.ped' input seed file '../sampler.seed' output overwrite seed file '../sampler.seed' input marker data file '../ped73.marker.missing' select all markers set printlevel 5 # Include everything in the output file. select trait 1 input pedigree record trait 1 integer 7 # dummy (0) trait values set trait 1 tloc 11 # Connect trait and tloc map tloc 11 unlinked # Trait locus is unlinked. set tloc 11 allele freqs 0.5 0.5 # Monte Carlo setup and requests. # Use the default option of sampling by component. # Specify which meiosis sampler is to be used. use multiple meiosis sampler set limit for exact computation 12 set MCMC markers only # Do MCMC for markers only use sequential imputation for setup use 5 sequ impu realiz for setup sample by scan # Default: for clarity only set L-sampler probability 0.5 ## For real analyses, recommended number of iterations is of order 10^5 set MC iterations 2000 # For golds and checks only. set burn-in iterations 15 # For golds and checks only. # Specify what type of output is desired (or both). # Specify the desired scoring interval. output founder genome labels output overwrite scores file './ped73_glauto.fgl` output meiosis indicators output overwrite extra file './ped73_glauto.meio' output scores every 30 scored MC iterations |
The first block of statements specify data and seed files in a way that should
by now be familiar.
Note all markers are selected; all will be used in the MCMC and output
ibd graphs.
The second block defines a trait and tloc combination.
However the trait expected by gl_auto
is a dummy trait consisting
entirely of 0 values for ‘unobserved’.
The Monte Carlo requests are those used by most of the MCMC-based programs:
See MCMC parameter statements. Unlike earlier examples, the multiple
meiosis sampler is used, and sampling is (by default) by pedigree component
as‘global MCMC’ is not requests.
A limit of 12 meioses is set for exact computation;
in fact even the smallest 11-member pedigree component has 14 meioses,
so MCMC will be done on each of the three components. Other MCMC requests
are standard and similar to those used in the lm_auto
example.
Note again than many more MCMC scans (and associated burn-in) would be done
in a real example.
The final five parameter statements are specific to the gl_auto
program. This program outputs ibd graphs or meiosis indicators (or both)
to an output file. The ibd graph output is sent to the output scores
file, so if this output is requested the name must be given. Meiosis
indicators are sent to the output extra file, so if meiosis indicators
are requested this file name must be given. It is strongly recommended that
the overwrite
option be used for both these outputs to avoid appending
to previous runs of the program. Finally the
the output scoring frequency is given. Here every MCMC iteration
is ‘scored’ (the default), but ibd graphs and meiosis
indicators are computed and output only every 30 iterations.
Note that in pedigrees with multiple components,
the same number of ibd graphs are generated on each component.
On small components
where exact IID realizations are generated, every realization
is output; the number of such realizations is the same as the
number output on any component for which MCMC is used.
See Concept Index for:
gl_auto
sample parameter file,
multiple meiosis sampler,
founder genome labels,
exact computation,
ibd graph.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
gl_auto
example and sample outputThe syntax for running a MORGAN program is:
./<program> <parfile> [> <output file name>] |
The gl_auto
example can be run under the subdirectory ‘IBD’
./gl_auto ped73_glauto.par > ped73_glauto.out |
The output file ‘ped73_glauto.out’,
generated by running
lm_auto
using the parameter file ‘ped73_glauto.par’ is
actually of little interest, but should be checked to see that the
program has interpreted the data as expected. The output consists of
summaries of the input pedigrees, and several MCMC diagnostics. As for
the ibddrop
lm_auto
programs,
the exact values of the probability estimates will
depend on the value of the random seed; note that this parameter will output
final seeds overwriting the previous input seed file.
The important output is contained in the ‘output scores file’, which the parameter file specified to be ‘ped73_glauto.fgl’. Note that if you run the example with different seeds your results will differ from the example output given below. Note also that the parameter file specifies that any previous output scores file of the same name will be overwritten; if you want to save an earlier one, rename it! The output scores file contains 9636 lines, which are 66 (2000/30) realizations of ibd graphs. There are three components, with 47, 11 and 15 individuals, and each individual is output on two lines – a maternal chromosome and a paternal chromosomes. The first 94 (47+47) lines is the first output realization on the first component. The other 65 realizations follow for a total of 94 times 66 (6204) lines. There are then 66 realizations of the second component ((11+11) times 66 = 1452 lines), and finally the realizations on the third component ((15+15) times 66 = 1980 lines). The total line-count of the file is 9636.
For the seeds on this run, the first few lines of the file are shown as
101 0 1 0 101 0 2 0 102 0 3 0 102 0 4 0 201 0 4 3 5 3 8 4 9 3 201 0 1 1 2 2 |
The first column is the name of the individual (2 lines for each individual). The zero second column may be ignored. The third column is the initial (first marker) FGL, and for 101 and 102 there are no FGL-changes as they are founders. Individual 201 is the offspring of 101 and 102. His maternal chromosome consists of segments of FGL 3 and 4: it is initially 4 and there are 3 switches. At marker 5 the switch is to FGL 3, at marker 8 back to 4, and at 9 back to 3 again. Individual 201’s paternal chromosome has only 1 switch, switching from 1 to 2 at marker 2.
Lower down the pedigree there may be more FGL and more switches. For example for the maternal chromosome of 407 we have
407 0 4 4 5 3 8 4 9 3 10 6 |
The initial FGL is 4, and there are 4 switches: to 3, 4, 3, and 6 at markers 5, 8, 9, and 10. respectively.
Next consider the meiosis output in the ‘output extra file’ which was specified to be ‘ped73_glauto.meio’. This file contains a line for every meiosis sampled. Typically this is 2 lines for every non-founder individual, but a single child of a founder provides only one meiosis. In the three components of size 47, 11, and 15, there are 13, 4 and 5 founders respectively, and hence 34, 7, and 10 non-founders, but only 65, 14, and 20 meioses sampled. Hence the 66 output realizations give 4290, 924, 1320 lines for a total of 6534 lines.
The first few lines of the output file are
201 2 0 0 1 0 201 2 1 0 0 1 5 202 3 0 0 0 0 202 3 1 0 1 3 4 7 9 301 5 0 0 1 0 301 5 1 0 0 0 302 6 0 0 1 1 8 302 6 1 0 0 1 8 |
The first two columns are the name and the internal MORGAN 0-origin index of the individual, and the third column specifies whether this is the maternal (0) or paternal (1) meiosis of the individual. The fourth column refers to the unlinked null locus, and is here null (0) as this locus was not sampled by the MCMC. The fifth column gives the meiosis indicator at the first marker, and the sixth the number of changes across the chromosome. The final integers, in number corresponding to the number of switches, give the change points. Unlike the ibd graph file, no additional details of the changes are needed. since changes are from 0 to 1, or 1 to 0,
Some important points are that, first, there are programs to process and use this ibd graph and meiosis output; the user should not be concerned with the details. However, it is important that the user knows how many graphs they have generated, and whether they are done by component or globally. Until processing programs with multi-component capability have been released, it is recommended to separate the graphs by component. In general. if sampling is by component, then output is for each component, for each realization, for each individual/meiosis. If sampling is global, then output is for each realization, for each component, for each individual/meiosis. Sampling by component is normally recommended: "global MCMC" is maintained mainly for backwards compatibility.
Finally, it should be seen that the output format is compact. Here we have only 10 markers, so could have output by marker. However, the files would be no larger on the same pedigrees if we have many thousands of markers. The number of switches depends on the length of chromosome and pedigree depth, not on the marker count. (See [Tho11]).
See Concept Index for:
running gl_auto
example,
gl_auto
sample output.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_pval
parameter fileFiles for lm_pval
may be found in the ‘TraitTests’ subdirectory of
‘MORGAN_Examples’. The parameter file, ‘ped73_pval.par’ is similar to
the parameter file used for lm_auto
. An abbreviated version of
‘ped73_pval.par’ is given below:
input pedigree file '../ped73.ped' input pedigree record trait 1 integer 3 select trait 1 input seed file '../sampler.seed' input marker data file '../ped73.marker.missing' select all markers set L-sampler probability 0.2 set MC iterations 2000 |
For lm_pval
, markers are selected, but no trait locus is selected.
Therefore, no ‘map tloc marker’ statements are included, and no ‘set
traits tlocs’ statement is included. The file ‘ped73.marker.missing’
contains the marker map and genotypes, and is accessed by the statement
‘input marker data file’.
Pedigree members affected with the disease must
be specified when using lm_pval
.
The set of affected individuals is determined
implicitly by using trait data. The statement ‘select trait 1’ instructs
the program to determine the affected individuals by using the trait data for
trait 1 in the pedigree file.
The statement ‘input pedigree record
traits’ is needed to define the correspondence between trait numbers and
integers in the pedigree record, so that the program knows where to find the
desired trait data.
The trait data in this example are
(by default) genotypic: the lm_pvals
program treats both
homozygotes and heterozygotes for the disease allele ‘2’
as affected.
See Concept Index for:
lm_pval
sample parameter file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_pval
example and sample outputUnder the subdirectory ‘TraitTests’, run the lm_pval
example by
typing:
./lm_pval ped73_pval.par > pval.out |
A portion of the output giving latent (fuzzy) p-values is below. See ‘pval.out’ for the entire output file.
Combined distribution of fuzzy p-values, by locus: pval maxim marker-1 marker-2 marker-3 marker-4 marker-5 marker-6 marker-7 marker-8 marker-9 marker-10 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.01 0.002 0.015 0.008 0.000 0.001 0.002 0.000 0.002 0.000 0.000 0.000 0.02 0.006 0.025 0.019 0.000 0.002 0.002 0.000 0.007 0.000 0.000 0.000 0.03 0.008 0.035 0.030 0.003 0.006 0.003 0.000 0.017 0.000 0.000 0.000 0.04 0.012 0.045 0.041 0.007 0.009 0.004 0.000 0.028 0.000 0.000 0.000 0.05 0.015 0.055 0.051 0.011 0.013 0.006 0.002 0.039 0.000 0.000 0.000 |
The output table shows the cumulative distribution of the latent (fuzzy) p-values generated at each marker position, as well as the cumulative distribution of the maximum latent p-value over the markers. These distributions are over the latent inheritance patterns sampled, given the marker data. That is, for each value of ‘pval’ in the left column, the table gives the proportion of sampled inheritance vectors at each marker that yield a p-value less than ‘pval’. In the last row of the example output, when pval = 0.05, 0.6% of the realizations have a p-value less than 0.05 at marker-5; at marker-7 this value is 3.9%. Overall, 1.5% of the realizations have a maximum p-value over the markers that is less than pval = 0.05 (shown in the second column labeled ’maxim’).
Recall again, that exact values in the output will depend on the random seed.
In the case of a relatively short run of lm_pval
there may be substantial
differences in the estimated latent p-value distributions.
For more information regarding the MCMC parameters and diagnostic output: See MCMC computational options.
See Concept Index for:
running lm_pval
example,
lm_pval
sample output,
fuzzy p-value.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Many of the lm_auto
and other statements following are also used
for the location lod scores programs. See Location lod scores statements.
See Concept Index for:
Autozyg statements,
lm_auto
statements,
gl_auto
statements,
lm_pval
statements.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
select [chromosome I] all markers
This statement selects all markers on the chromosome for the computation; if not all markers are to be used, use the next statement.
select [chromosome I] markers J1 J2 …
This alternate form of the ‘select markers’ statement specifies a subset of the markers.
select [chromosome I] all markers except J1 J2 …
This new form of statement allows the user to specify markers on the chromosome to be omitted in computation.
select trait K
This statement names the selected trait, and is used in both lm_auto
and
lm_pval
. For the former, this names the trait to be used in subsequent
mappings to trait loci. For the latter, it is used to determine disease
affection status of individuals.
set traits K1 … tlocs L1 …
This statement establishes the correspondence between traits and trait loci in
lm_auto
. Note that this statement is not applicable for lm_pval
.
map tlocs L1 … unlinked
Use this statement for lm_auto
if the trait locus is unlinked. L1
is the trait locus number.
check marker consistency [only]
Before running a MORGAN program which uses marker data, the setup routines check that the marker data for the selected markers are consistent with Mendel’s first law and stated pedigree information. In the absence of this parameter statement, the program will terminate with the first error found. If this statement is included, the program continues checking the rest of the markers for further inconsistencies and provides details regarding each inconsistency detected. The program terminates only after checking all the selected markers. In the absence of this statement, if no marker data inconsistencies are found, the program continues with any requested analyses. If the ‘check marker consistency only’ statement is used, the program will terminate after checking all the markers; this may be useful in any initial phase of checking the data.
These alternative terminations for ‘check marker consistency’ are given in tabular form:
No error is present | Some error(s) present | |
No statement | No termination | Terminates at first error |
check markers consist | No termination | After checking all markers |
check mark consist only | After checking all markers | After checking all markers |
See Concept Index for: Autozyg computing requests consistency (Mendelian) of marker data.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
All of the general MORGAN file identification statements can be used with the ‘Autozyg’ programs. For a list of these statements, see File identification statements. Some additional file identification can be used by ‘Autozyg’ programs:
output score file ‘filename’
This file can be used by lm_auto
to save interim cumulative scores
of ibd probabilities. The gl_auto
program expects
expects an ‘output score file’ to be specified, since it uses it
for its primary output that is then input to other programs.
input rescue file ‘filename’
A rescue file may be used to continue an lm_auto
run instead of
restarting at the beginning. This file contains intermediate data, which is
periodically saved when an output rescue file has been specified in a preceding
run.
(Note this option is not currently used/tested/)
output rescue file ‘filename’
This statement, which is optional for lm_auto
, specifies the periodic
dumping of intermediate results to files that may be used to restart the program
midstream. Data are written alternately to files with
‘1’ and ‘2’ appended to the file name.
(Note this option is not currently used/tested/)
See Concept Index for: Autozyg file identification statements, score file, rescue file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Both Autozyg programs use the general MORGAN pedigree file description statements; see Pedigree file description statements.
See Concept Index for: Autozyg pedigree file description.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Three output file description statements are expected by gl_auto
and one additional one is optional for lm_auto
.
output (founder genome labels | meiosis indicators)
This statement is expected by ‘gl_auto’. It tells the program whether
its output is to be in the form of founder genome labels or meiosis
indicators (or both). The founder genome labels
are output to
the ‘output scores file’ and the meiosis indicators
are output
to the ‘output extra file’ so these filenames must be specified if the
output is requested.
output rescue data I iterations
This statement can be used to specify the frequency of dumping program data if an output rescue file is specified. (Note this option is not currently used/tested/)
output scores every I scored MC iterations
Note that this output option is in terms of scored iterations; not every MCMC iteration may be scored; see the compute scores statement. This statement is expected by ‘gl_auto’, since this program uses the d output score file for its primary output of founder genome labels or meiosis indicators (see the ‘output’ statement above. If an output score file is specified, but this statement is not present, ‘lm_auto’ writes the score file at the conclusion of the last iteration only (see ‘set MC iterations’ statement).
See Concept Index for: Autozyg output file description, founder genome labels.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_auto
,
see genedrop mapping model parameters.
The trait number specifies its position in the pedigree record; you may need to use the ‘input pedigree record traits’ statement (see Pedigree file description statements) to establish the correspondence between trait numbers and integers in the pedigree record.
See Concept Index for: Autozyg mapping model parameters.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
set [chromosome I] marker names N1 N2...
This statement, which is optional for both lm_auto
, gl_auto
and lm_pval
,
specifies the names of the markers in the order of their position in the marker
data file, for example, ‘set marker names D1S306 D1S249 D1S245 ....’.
See Concept Index for: Autozyg population model parameters.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
lm_auto
.
set [chromosome I] markers K data N1 M11 M12 …[N2 M21 M22 …] …
Individuals with at least one observed marker are named, together with their marker genotypes. The number of allele pairs for an individual is the same as the number of markers mapped on the chromosome. Marker loci not observed for an individual are given alleles ‘0 0’. (Those individuals with no observed markers may but need not be included in this statement.)
In the example, there are 5 markers mapped for the chromosome:
set markers 5 data 343 1 3 1 3 1 3 1 3 1 3 331 3 4 3 4 3 4 3 4 3 4 334 2 3 2 3 2 3 2 3 2 3 431 3 4 3 4 3 4 3 4 3 4 531 3 4 3 3 0 0 3 3 3 3 |
In this example five individuals have some observed marker data, but individual ‘531’ is unobserved at marker 3.
set [component M] [scoreset N] proband gametes N1 K1 N2 K2 …
This statement is required for lm_auto
. One or more scoring sets may be
given for each pedigree component, where a scoring set consists of two or more
haplotypes. If there is more than one set for the component, each set is
assigned a number 1 or greater. The maximum number of haplotypes in each set is
limited to 10, due to computer memory considerations.
Pairs of names and meiosis indicators are given, with 0 indicating maternal inheritance and 1 indicating paternal inheritance. In the example, there are two sets for the component:
set component 1 scoreset 2 proband gametes 531 1 531 0 331 0 331 1 set component 1 scoreset 4 proband gametes 561 1 362 0 364 1 |
At least one proband gamete set must be specified when running lm_auto
.
set [chromosome I] locus window K
This statement is optional for lm_auto
and gives the window size (number
of loci) for which the multi-locus ibd probabilities are scored. If no size
is given, each locus is scored separately.
set [component M] [scoreset N] window patterns L1…
This statement is a companion to ‘set locus window’ and is required for
lm_auto
when the window option is chosen. It identifies the ibd
patterns to be jointly scored for the proband gamete set N assigned by the
‘set proband gametes’ statement. A prior run, with the same proband
gametes, but without the window option is needed to select the ibd patterns.
That is, the user is required to list ibd patterns of interest by label; the
labeling of the patterns is not obvious without first running lm_auto
.
In the example, we were interested in ibd patterns ‘1 1 1 1’ and
‘1 1 2 2’, which are assigned labels ‘0’ and ‘4’, respectively,
in the output table headed ‘Probabilities of IBD patterns’. One needs to
run lm_auto
to obtain these labels.
set trait K data (genotypic | discrete | quantitative)
Trait data are specified as genotypic, discrete (phenotypic), or quantitative (continuous). They may also be specified as ‘discrete with covariate’ and as ‘discrete with liability’ With a genotypic trait, the trait locus genotype can be inferred from the trait value. There are four possible trait values: ‘0’ = missing, ‘1’ = homozygous for allele 1, ‘3’ = heterozygous, and ‘4’ = homozygous for allele 2. There are three possible trait values with a discrete (or phenotypic trait): ‘0’ = missing, ‘1’ = unaffected, and ‘2’ = affected. If a discrete trait is chosen, the next statement, ‘set incomplete penetrances’, must be included. With a quantitative trait, a missing value is denoted as a real number with integer portion ‘999’. For example, ‘999’, ‘999.3’ and ‘999.543’ all mean ‘missing’. The default trait type is genotypic.
set traits K1 … for tlocs L1 … incomplete penetrances X1 X2 X3
This statement is required for discrete trait data. Penetrances (probability of expressing the trait) are provided for the (1 1), the (1 2), and the (2 2) genotypes, respectively.
input pedigree record trait K integer pairs J1 J2
For liability classes or age-or-onset the trait data consists of the qualitative trait status (J1) and an additional covariate integer (J2) which may specify a liability class or age of onset.
set trait data K discrete with liability
This statement allows trait data to be input as a pair of integers, the trait status and the liability class.
set K liability classes with penetrances X11 X12 X13 … XK1 XK2 XK3
This statement provides the penetrances for each of the three tloc
penetrances for each of the K liability classes. There is a hard
maximum of 24 liability classes.
set trait K1 data discrete with covariate
The statement allows a covariate such as age-of-onset to be input as a
part of the trait data. The program then expected that the
input pedigree record trait
statement will specify
integer pairs
.
set standard deviations for genotypes X1 X2 X3
This statement specifies the standard deviation of age-of-onset for each
of the three tloc
genotypes. The mean age-of-onset is specified via
the set traits for tlocs genotype means
parameter statement (see genedrop population model parameters).
set width of ages of onset window X1
This number is used in the age-of-onset penetrance function, allowing actual onset to be up to X1 units (usually years) earlier than the recorded age of onset.
select trait K
lm_pval
needs to know which members of the pedigree are affected with the
disease. Discrete or genotypic data for the selected trait is used to determine
the disease affection status of the individuals. Here, lm_pval
is to
determine the affected individuals from the trait data in the pedigree file. A
trait genotypic code of 3 (genotype (1 2) or (2 1)) or 4 (genotype (2 2))
indicates an affected individual. The trait number, K, determines the
position of this genotypic code in the pedigree records
(see Pedigree file description statements).
See Concept Index for: Autozyg computational parameters, marker data, missing marker data, scoreset, proband gametes, locus window, scoreset, window patterns, trait data, genotypic trait, discrete trait, quantitative trait, incomplete penetrance, affected individuals.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Please see that section for details regarding:
use (locus-by-locus sampling | sequential imputation) for setup use I sequential imputation realizations for setup set MC iterations I set burn-in iterations I sample by (scan | step) set L-sampler probability X check progress I MC iterations |
One additional statement is specific to the gl_auto
program:
set MCMC markers only
This statement will cause the gl_auto
program to do MCMC only for the
markers, and not for the assumed unlinked no-data ‘trait’ locus.
This allows
for identical MCMC with the lm_linkage
lod score program if the same
MCMC options are used, and hence for comparison of results among these
programs.
See Concept Index for: Autozyg MCMC parameters and options.
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.