Current and future MORGAN developments: August 2007.
Updated March and August 2008, March 2009
MORGAN V2.8.1 was released in April 2006, and MORGAN V2.8.2 in
April 2007. MORGAN V2.8.3 was released March 2008 and MORGAN V2.9 in August 2008.
A preliminary version of MORGAN-3, containing only the PedComp, Genedrop, and
Autozyg programs, was released in March 2008. The full version of the
package (including two new programs) was released in March 2009, although it
remains a beta-test version with much testing and documentation remaining
to be done.
The current MORGAN Tutorial and Examples released in
August 2007 are based on MORGAN 2.8.1. We therefore detail here the
developments of MORGAN 2.8.2 and MORGAN 2.8.3, as well as upcoming changes
and the development of MORGAN 3.0. The information is taken from the
Future directions section of the 2007 MORGAN Tutorial.
The information remains pertinent to MORGAN 2.9.
1. General Library functions:
Only functions affecting general MORGAN capabilities or
enabling new programs are listed here. Many other general
improvements in MORGAN library functions and MORGAN set-up
routines have been made. Some details may be found in MORGAN
web release notes, or in the `README_relnotes' of each
MORGAN release.
- 1. Hidden Markov computations:
In MORGAN V2.8.2, forward HMM computation for multiple
meioses has been replaced by a factored version (FHMM),
enabling much faster exact computation on small pedigree
components and multiple-meiosis sampling for larger
numbers of meioses.
Exact computation of lodscores on small pedigree components
has been implemented for `lm_markers': computation uses
the FHMM version of the Baum algorithm.
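As a rough illustration of the kind of computation involved, the
following Python sketch runs a standard (unfactored) scaled forward
pass of the Baum algorithm and converts two log-likelihoods into a
lod score. It is not MORGAN code and does not implement the factored
(FHMM) version. The function names, the single-meiosis two-state
example, and the numbers are all hypothetical.

```python
import math

def meiosis_transition(theta):
    """2x2 transition matrix for one meiosis indicator between adjacent loci,
    given recombination fraction theta (hypothetical helper)."""
    return [[1.0 - theta, theta], [theta, 1.0 - theta]]

def forward_log_likelihood(init, trans, emit):
    """Log-likelihood of the data under a discrete HMM (scaled forward pass).

    init[s]        : probability of hidden state s at the first locus
    trans[t][r][s] : probability of moving from state r at locus t
                     to state s at locus t+1
    emit[t][s]     : probability of the data at locus t given state s
    """
    alpha = [p * e for p, e in zip(init, emit[0])]
    log_lik = 0.0
    for t in range(len(emit)):
        if t > 0:
            step = trans[t - 1]
            alpha = [
                emit[t][s] * sum(alpha[r] * step[r][s] for r in range(len(alpha)))
                for s in range(len(alpha))
            ]
        # Rescale at every locus to avoid underflow; a zero scale would mean
        # the data are impossible under the model.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def lod_score(log_lik_alt, log_lik_null):
    """Base-10 log likelihood ratio from two natural-log likelihoods."""
    return (log_lik_alt - log_lik_null) / math.log(10.0)

# Hypothetical example: one meiosis observed in state 0 at two loci.
init = [0.5, 0.5]
emit = [[1.0, 0.0], [1.0, 0.0]]
ll_linked = forward_log_likelihood(init, [meiosis_transition(0.1)], emit)
ll_unlinked = forward_log_likelihood(init, [meiosis_transition(0.5)], emit)
lod = lod_score(ll_linked, ll_unlinked)
```

With many meioses the hidden state is a vector of indicators, so the
state space grows as 2 to the power of the number of meioses; this is
why an exact pass of this kind is limited to small pedigree components.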
- 2. Multiple meiosis sampler:
MM-sampler updates multiple meioses jointly and is therefore a
generalization of the meiosis sampler (M-sampler). There are
four types of update in MM-sampler: random meiosis update,
individual update, sib update and 3-generation update. This
is based on work by Liping Tong.
For more detail on the L- and M-samplers, see the tutorial section
``Using MCMC to Estimate Parameters of Interest in Pedigree Data.''
- 3. MCMC and pedigree peeling by component
Up to MORGAN V2.8.2, MCMC was performed globally over pedigree
components (except those small enough for exact computation).
The L-sampler peeling and lod score estimation could be done
either by component (using "set peeling by component") or
globally (the default).
With MORGAN V2.8.3, and specifically to accommodate the new
`lm_haplotype' program, the preferred option is to do both
MCMC and pedigree peeling (lod score estimation) by
component, and to use exact computation on all sufficiently
small component pedigrees. The alternative, retained so that
older data sets can be rerun, is to use "set global MCMC", in
which case no exact computation will be done, and MCMC will
be done globally over all component pedigrees. In this case,
the "set peeling by component" option is retained.
- 4. Pedigree peeling for multiallelic loci with general
penetrance:
As yet, loci are either multiallelic marker loci assumed
observed without error, or trait loci which may have general
penetrance functions but are diallelic. In order to allow
models for "non-genotypic" markers, general joint peeling
programs have been implemented, based on Thompson (1976: UU
Tech Rept, #6). These peeling routines are used by the
`lm_map' program which allows for errors in marker data.
These routines are still being tested and are not yet released in
MORGAN V2.8.3.
- 5. Penetrance functions and trait models:
In MORGAN V2.8.2, liability penetrances (previously
available only for `lm_bayes') have been implemented for
`lm_markers'. Penetrances for each liability class are
now read from an input file using the "input extra data
file S" parameter statement.
Additionally, an age-based penetrance function for a
qualitative trait has been implemented. That is,
penetrances are directly dependent on age, rather than going
through a liability class specification.
In two-locus models for a quantitative trait, penetrances may
be specified as additive, with a genotypic mean for each
trait genotype for each locus. Alternatively, a matrix array
of 2-locus genotypic means may be specified, allowing for
epistasis (see Sung and Wijsman, 2007, Human Heredity 63:
144-152).
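The distinction between the additive and the matrix specification can
be sketched as follows. This is an illustration only, not the MORGAN
input format: the function names, the genotype ordering (0, 1 or 2
copies of one allele at each diallelic locus), and all numerical
values are assumptions made for the example.

```python
def additive_means(effects_locus1, effects_locus2, baseline=0.0):
    """Table of two-locus genotypic means when the loci act additively.

    effects_locus1[i] is the contribution of genotype i at locus 1,
    effects_locus2[j] the contribution of genotype j at locus 2.
    """
    return [[baseline + a + b for b in effects_locus2] for a in effects_locus1]

def has_epistasis(table, tol=1e-9):
    """True if the table of means has any locus-by-locus interaction."""
    for i in range(len(table)):
        for j in range(len(table[0])):
            # Under additivity this interaction contrast is zero for all i, j.
            interaction = table[i][j] - table[i][0] - table[0][j] + table[0][0]
            if abs(interaction) > tol:
                return True
    return False

# Additive model: one mean per genotype per locus (hypothetical values).
additive = additive_means([0.0, 1.0, 2.0], [0.0, 0.5, 1.0], baseline=10.0)

# General model: the full 3x3 matrix is given directly, so any cell may
# depart from additivity (here the double homozygote is shifted).
epistatic = [row[:] for row in additive]
epistatic[2][2] += 3.0
```

The additive form needs only one set of means per locus, while the
matrix form needs a mean for every two-locus genotype but can represent
arbitrary interaction between the loci.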
- 6. Traits and trait loci:
With more complex trait models, including those of `lm_twoqtl'
(see Sung et al., 2007, Genetic Epidemiology 31: 103-114), a
more general specification of traits is required. In MORGAN
V3.0, completely new structures have been introduced,
separating traits (phenotypes) from trait loci ("tlocs").
Traits may be affected by genotypes at several tlocs; the
genotypes at a tloc may affect several traits. This more
general structure is not yet released - see `lm_twoqtl' below.
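The many-to-many relationship between traits and tlocs can be pictured
with a small Python sketch. The class and field names below are
hypothetical and are not the MORGAN V3.0 structures; the sketch only
shows that each trait keeps a list of the tlocs affecting it and each
tloc keeps a list of the traits it affects.

```python
from dataclasses import dataclass, field

@dataclass
class Trait:
    """A phenotype, with the names of the trait loci that affect it."""
    name: str
    tloc_names: list = field(default_factory=list)

@dataclass
class TraitLocus:
    """A trait locus ("tloc"), with the names of the traits it affects."""
    name: str
    trait_names: list = field(default_factory=list)

def link(trait, tloc):
    """Record that genotypes at `tloc` affect `trait` (both directions)."""
    if tloc.name not in trait.tloc_names:
        trait.tloc_names.append(tloc.name)
    if trait.name not in tloc.trait_names:
        tloc.trait_names.append(trait.name)

# Hypothetical example: one trait affected by two tlocs, and one tloc
# affecting two traits.
trait_a, trait_b = Trait("traitA"), Trait("traitB")
tloc_1, tloc_2 = TraitLocus("tloc1"), TraitLocus("tloc2")
link(trait_a, tloc_1)
link(trait_a, tloc_2)
link(trait_b, tloc_1)
```

Storing names rather than object references keeps the two record types
independent and avoids circular references between them.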
2. Autozyg programs:
- 1. Map estimation: `lm_map' with and without error models:
Some updates and corrections to the `lm_map' program are made
in MORGAN V2.8.2. Additionally, a version of `lm_map' that allows
for error in the observation of marker genotypes is under development.
This version may be released with MORGAN V2.8.3.
- 2. Latent p-values: (including programs using Lodscore
statistics)
The version of the program `lm_pval' released in MORGAN V2.8
and subsequent releases, and described in this tutorial, uses the
latent p-value distribution of Thompson and Geyer (2007,
Biometrika). Additional programs using these ideas are under
development, including programs for the distribution of
latent lod scores obtained in MCMC sampling (`lm_fuzlod'),
p-values and randomized tests based on latent lod score
statistics (`lm_fzplod'), and randomized confidence sets for
the location of a trait locus (`lm_fzconf'). These are
working names only. The methods are described in Thompson
(2007: submitted).
- 3. Gold (Gold2) and Gold1 subdirectories:
In MORGAN V2.8.3, `Gold' replaces the previous `Gold2'
subdirectory, for tests of `lm_auto', `lm_pval',
`lm_map' and `lm_ibdtests'.
Gold1 `lm_auto' tests remain temporarily, since they
provide the only tests of MCMC samplers on looped
pedigrees. Gold1 `lm_auto' gold standards were omitted
from the released MORGAN V2.8.2, due to delays in
checking looped pedigree peeling routines: they will be
reinstated in MORGAN V2.8.3.
3. LR_Lods programs:
To make room for new Lodscore programs being released in
MORGAN V2.8.2, 2.8.3 and 3.0, the two older programs
`lm_schnell' and `lm_lods' have been moved to the new
directory LR_Lods. These two programs differ in several ways
from newer programs, but the principal one is that they use the
methods of combining likelihood ratios (LR) along the chromosome
in order to estimate lod scores (see Thompson & Guo, 1991, IMA J
Math Appl in Med & Biol).
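One way to read "combining likelihood ratios along the chromosome" is
that the likelihood at a given position, relative to a reference
position, is built up as a product of ratios between adjacent
positions. The Python sketch below shows only that bookkeeping step. It
assumes the adjacent ratio estimates are already available (in practice
they would come from MCMC), and the function name and example values
are hypothetical rather than taken from `lm_lods' or `lm_schnell'.

```python
import math

def lods_from_adjacent_ratios(adjacent_ratios):
    """Cumulative lod scores from likelihood ratios between adjacent positions.

    adjacent_ratios[j] is an estimate of L(position j+1) / L(position j).
    Returns lod scores at positions 0..n relative to position 0, using
    L(k) / L(0) = product over j < k of L(j+1) / L(j).
    """
    lods = [0.0]
    for ratio in adjacent_ratios:
        # Multiplying ratios corresponds to adding their base-10 logs.
        lods.append(lods[-1] + math.log10(ratio))
    return lods

# Hypothetical ratio estimates between four successive positions.
example_lods = lods_from_adjacent_ratios([10.0, 10.0, 0.1])
```

Because each lod score is a running sum, Monte Carlo error in any one
adjacent ratio carries forward to every position beyond it.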
The Gold1 subdirectory remains temporarily, since it provides
the only tests of MCMC samplers on looped pedigrees. Gold1
`lm_lods' gold standards were omitted from the released MORGAN
V2.8.2, due to delays in checking looped pedigree peeling
routines: they will be reinstated in MORGAN V2.8.3.
4. Lodscore programs:
- 1. `lm_multiple' and `lm_markers'
A new lod score calculation program `lm_multiple' was
released in MORGAN V2.8.2 (Spring, 2007). The `lm_markers'
program is still made as a separate executable, but is
compiled as a special case of `lm_multiple' code. In V2.8.2,
both programs perform exact lodscore computations on small
pedigree components. In V2.8.3 (unreleased) this is optional:
see `lm_haplotype' below.
The difference between `lm_multiple' and `lm_markers' is that
the new `lm_multiple' replaces the old single-meiosis
M-sampler updates with the new multiple-meiosis (MM) sampler
that is the work of Liping Tong (Tong & Thompson, 2007,
Human Heredity: in press): see above.
`lm_multiple' runs with the same parameter file and other
input files required by `lm_markers'. The output is also
essentially the same as that from `lm_markers'.
More information on lod score calculation programs in MORGAN
V2.8.1 (and earlier) can be found in the MORGAN Tutorial.
- 2. `lm_twoqtl'
The new program `lm_twoqtl' allows two (linked or unlinked)
quantitative trait loci to contribute additively or
epistatically to a single trait (see Sung et al., 2007, Genetic
Epidemiology 31: 103-114). Full implementation of `lm_twoqtl' requires
the new MORGAN 3.0 structure. However, due to delays in the finalization
of MORGAN 3.0, a beta-test version of `lm_twoqtl' was released under
MORGAN 2.8.3.
- 3. `lm_haplotype'
The `lm_haplotype' program is a generalization of
`lm_multiple' in which haplotypes of key individuals dividing
the pedigree are sampled in addition to meiosis indicators.
To facilitate efficient implementation of this algorithm, new
peeling-by-component routines need to be implemented and
checked. This program is the work of Liping Tong (Tong and
Thompson, 2007, Human Heredity: in press). The program is in the
process of release; it is not yet released in MORGAN V2.8.3.
- 4. Gold subdirectory (previously Gold1 and Gold2)
In MORGAN V2.8.2, Gold standards for exact computation and
for `lm_multiple' are added in the Lodscore/Gold2
subdirectory.
In MORGAN V2.8.3, the `Gold' directory replaces the
previous `Gold2' directory. Gold directories for
`lm_lods' and `lm_schnell' are moved to the new `LR_Lods'
program directory. Thus Gold1 no longer exists in Lodscore.
5. MORGAN 3.0 (release-2; March 2009) and two new programs
In March 2009, a full version of the MORGAN package using the
MORGAN-3 "tloc" structures for trait loci is released. All
MORGAN-2 programs, including lm_twoqtl, are included in this new
release. The program `lm_twoqtl' allows two (linked or unlinked)
quantitative trait loci to contribute additively or
epistatically to a single trait (see Sung et al., 2007, Genetic
Epidemiology 31: 103-114).
Additionally there are two new programs:
1) The Autozyg program "gl_auto" is a more general version of the original
lm_auto MORGAN program (which is still included in the package). The
gl_auto program provides full lm_auto capabilities with the
componentwise multiple-meiosis sampling of the lm_multiple program.
Additionally it permits output of joint realizations of meiosis
indicators or founder genome labels (hence "gl") sampled conditionally
on marker data. These realizations may be used in subsequent analyses
of trait data using a variety of trait models (e.g.)
2) The PedComp program "translink" also takes lm_auto style input but
performs no genetic analyses. This program produces a pedigree file
pedfile.dat and output file datafile.dat in LINKAGE format, in accordance
with the pedigree, trait, marker, maps and models specified in the
MORGAN parameter file, in order that other software can be more
easily run using the same or equivalent input information.