13. Polygenic Modeling of Quantitative Traits by EM Algorithm

See Concept Index for: polygenic model, PolyEM, EM algorithm, quantitative trait.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.1 Introduction to PolyEM programs

See References, for details of the cited papers.

PolyEM is a set of programs to evaluate the likelihood and compute MLEs for polygenic models of quantitative traits by EM algorithms. The original versions of these programs were based on the work described in [TS90] and [TS92].

There are four main programs whose features are summarized below:

univar: This program fits a univariate trait model. It is primarily for test purposes. The likelihood is computed by three methods including classical pedigree polygenic peeling [ES71], which does not extend to looped pedigrees, by direct inversion of the covariance matrix, which does not extend to large pedigrees, and by a general matrix elimination peeling method which is the method used by the other ‘PolyEM’ programs.
unibig: An extension of univar to big pedigrees that implements more efficient methods to compute the polygenic likelihood on large looped pedigrees.
bivar: An extension of unibig for bivariate traits.
multivar: An extension of bivar for multivariate traits.

All programs can work with looped pedigrees. The exception is that looped pedigrees cannot be used for the polygenic peeling algorithm in univar. The other programs do not use polygenic peeling to evaluate the likelihood.

Only examples and statement references for multivar are given since it has the most complete features. The statements for other programs are similar with some exceptions. For example, any statements with between-trait covariances do not apply to univar or unibig, since these deal only with a single trait.

See Concept Index for: PolyEM introduction, univar, unibig, bivar, multivar.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.2 Sample `multivar` parameter file

The example pedigree file ‘polyem.ped’ for the PolyEM programs is a 90-member pedigree consisting of two 45-member components. The format is similar to ‘ped73.ped’, which was used in most of the previous examples.

The first three entries in each line consist of the individual’s name, father’s name and mother’s name. Integers starting with the fourth column (usually gender) can be fixed effects (gender, age class, etc.) or discrete phenotypes.

For quantitative traits, real numbers follow the names and integers. These real numbers represent trait measurements. Missing values are coded with integer part ‘999’, such as 999.5 in the following example.

Here is part of the pedigree file ‘polyem.ped’. This file can be found in the ‘PolyEM’ subdirectory of ‘MORGAN_Examples’.

 input pedigree size 90
 input pedigree record names 3 integers 3 reals 2
****************************************
     1     0     0     1     1     0    0.0246   -1.0125
     2     0     0     2     1     0   -0.5978    1.5963
     3     0     0     1     1     0   -0.8124    0.5662
     4     0     0     2     1     0    0.4334    1.7721
     5     1     2     1     1     0    0.1802   -1.4672
     6     1     2     1     1     0   -1.7557    0.8091
     7     3     4     2     1     0    999.5     999.5
     8     3     4     2     1     0    1.9128    0.9780
     9     0     0     2     1     0    0.9530    2.3473
     ...

Below is the example multivar parameter file, ‘polyem.par’.

input pedigree file ‘polyem.ped’

select traits 1 2
set trait 2 effects 1 2

start residual covariance  -0.09 
start additive covariance  -0.0017
start residual variance     1.10   0.65
start additive variance     0.037  0.0288

fit residual covariance 
    1

fit additive covariance
    1

fit environmental model

output spacing 20 EM iterations
limit EM iterations 200

multivar can fit a polygenic model with one to five traits, which can be modeled as dependent and/or independent. One ‘select traits’ statement must be given. The number of integer values entered as arguments to the statement must be the number of quantitative traits expected by the program being run. For example, for programs univar and unibig, the ‘select traits’ statement must have a single integer argument. When running bivar, two integer arguments must be given. For multivar, one to five integer arguments must be given in the ‘select traits’ statement, for the one to five quantitative traits selected. Unlike other MORGAN programs, the trait numbers correspond to the column number of the reals in the pedigree file. In the example, the statement ‘select traits 1 2’ indicates that the first two column of real numbers contain the trait data to be analyzed.

The statement ‘set trait 2 effects 1 2’ indicates that the second column of real numbers is to be modeled with two fixed effects (also called covariates). The integers give the location of the fixed effects (covariates) starting with column 4 in the pedigree file. In this example, the fixed effects are to be found in columns 4 and 5. Important: a fixed effect location of ‘1’ indicates that the effect value will be found in column 4 (after the 3 name columns). The most commonly modeled fixed effect is gender, which, if present, resides in column 4 of a MORGAN pedigree file.

The statement ‘start trait mean’ allows the user to specify the starting trait mean for a selected trait. Since no ‘start trait mean’ statement is included after either of the ‘select trait’ statements, both of the initial means are computed by the PolyEm program. Similarly, one may specify the initial values for each effect with the statement ‘start trait I effect M X1 X2 ...’, where ‘I’ is the trait number, ‘M’ is the effect number and ‘Xi’ is the starting value of the ith level (i = 1, 2, 3, ...). These starting values represent deviations from the global mean. The starting values are normalized so that their weighted sum is zero (weighted by the number of individuals in that level). When using the ‘start trait I effect M X1 X2 ...’ it is important to know that if more levels are present in the column of numbers corresponding to the trait ‘I’ in the pedigree file than are specified in the ‘start trait’ statement, PolyEm programs will compute the starting value(s) for these additional levels. Since the program will not issue a warning or error message, it is important to always check the output to confirm that the number of levels present in the file was as intended. Since the ‘start trait I effect M X1 X2 ...’ statement is not included in this example, the PolyEm program will compute the initial values of the effect.

Initial values for additive and residual variances and covariances are specified in the next four statements. These statements are required. With the variance statements, the number of arguments must be the same as the number of traits selected and must be in order of increasing trait number. With the covariance statements, the number of arguments must be the same as the number of pairs of traits selected. See multivar segregation model parameters, for discussion of the ordering of these arguments.

multivar can also fit a purely environmental model with no genetic component. The ‘fit environmental model’ statement tells multivar to fit a purely environmental model, with no genetic variance. This null hypothesis model is produced in addition to the genetic/environmental model.

The final two statements specify the number of EM iterations and how often the EM estimates are to be printed out.

Note that one has the opportunity to provide predetermined eigenvalues of the G-matrix of observed individuals. The ‘set eigenvalues’ statement is used to specify the eigenvalues, with the number of values equal to the number of observed individuals. If desired, the eigenvalues can be provided through an input file accessed with a ‘input eigenvalue file’ statement in the parameter file, or through the command line (see multivar computational parameters).

See Concept Index for: multivar sample parameter file, missing quantitative trait data.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.3 Running multivar example and sample output

The command to run multivar (unibig and bivar have the same set of options) is:

./multivar parfile [ped pedfile] [eigen eigenfile]

where parfile is the name of the parameter file and is required. pedfile overrides the ‘input pedigree file’ statement, and eigenfile overrides the ‘input eigenvalue file’ statement in the parameter file.

Under the subdirectory ‘PolyEM/’, run the example by typing:

./multivar polyem.par

Toward the end of the multivar output, are the parameter estimates and the log-likelihood from the last iteration of the EM algorithm. If you chose to fit a null (purely environmental) model, using the ‘fit environmental model’ statement, those parameter estimates and log-likelihood are also given. A likelihood ratio test can then be performed, with test statistic equal to the absolute value of 2 times the difference between the log-likelihoods of the two models. A conservative test is provided by comparing the test statistic to a chi-squared distribution, with the degrees of freedom being the difference in the numbers of estimated parameters between these two models. Note that in these ‘Polyem’ programs, model log-likelihoods are in base e rather than the usual lod score base-10 convention.

 iteration #201:

    additive variance estimates (traits 1, 2)
        0.816    0.037
    covariances
        0.138

    residual variance estimates (traits 1, 2)
        0.223    0.610
    covariances
       -0.239
    
    trait 1
       overall mean      -0.063
    trait 2
       overall mean       1.780
       fixed effect  1   -0.717    0.546
       fixed effect  2   -1.008   -0.552    1.167

 current log-likelihood = -183.098

 estimates of environmental model

    residual variance estimates (traits 1, 2)
        1.136    0.642
    covariances
       -0.102
    
    trait 1
       overall mean       0.062
    trait 2
       overall mean       1.801
       fixed effect  1   -0.773    0.589
       fixed effect  2   -1.010   -0.553    1.170

 environmental model log-likelihood = -197.799

In this example, we see that the fitted genetic model has a very significantly larger log-likelihood: 2(-183.098 + 197.799) = 29.4, with 3 extra genetic variance parameters fitted. The estimates of the fixed effects are little altered by fitting the genetic model but for trait 1 the additive genetic variance is large relative to the residual variance, indicating a strong genetic component. Note that, for each trait, the sum of the additive and residual variances from the genetic model is close to the residual variance in the environmental model.

See Concept Index for: running multivar example, multivar sample output.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.4 `multivar` statements

See Concept Index for: PolyEM statements, multivar statements.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.4.1 `multivar` computing requests

select traits I1...: One ‘select trait’ statement is required, and must list the traits to be modeled. Up to five traits are allowed for multivar. The trait number, I, corresponds to the column of real numbers in the pedigree file, with the first column of real numbers being trait 1, the second column trait 2 and so on.
set trait I effects M1...: M1... are the fixed effects to be modeled for a specified trait I. They are the integer columns in which they appear in the pedigree file. That is the columns after the three names, so that fixed effect ‘1’ is in the 4th column (usually gender), effect ‘2’ is in column 5, and so on.

See Concept Index for: multivar computing requests.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.4.2 `multivar` segregation model parameters

start trait I mean X: There is one statement per trait, specifying the starting value for the mean trait values. The PolyEM programs will compute the initial values if not given.
start trait I effect M X1 X2 …: Starting values for the fixed effect levels for the traits are computed, unless specified in this statement.
start additive variances X1 X2 …: The starting values for the variances of the traits are required. The number of values must be the same as the number of traits selected, in the order of increasing trait number.
start residual variances X1 X2 …: Starting values for residual variances are also required.
start additive (covariances | correlations) X12 …: Starting values for the covariances (or correlations) between the traits are required. They are given the order: X12, X13, …, X1n, X23, …, X2n, …, where Xij is the covariance for the ith and jth selected traits from the pedigree file.
start residual (covariances | correlations) X12 …: See the ‘start additive...’ statements.

See Concept Index for: multivar segregation model parameters.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.4.3 `multivar` computational parameters

fit additive (covariances | correlations) X12 …

This statement specifies which covariances to be estimated and which to be fixed at 0. The order of values is the same as the ‘start additive covariances’ statement with ’1’ indicating a covariance to be fit and ’0’ a covariance to be fixed.

Note that if trait 1 is correlated with trait 3, and so is trait 2 with trait 3, the correlation between 1 and 2 cannot be zero. So we have to be a bit careful in specifying the correlation structure.

fit residual (covariances | correlations) X12 …

Similar statement for residual covariances.

set eigenvalues X1 X2 …

Optional. This statement is used to provide predetermined eigenvalues of the G-matrix of observed individuals, with the number of values the same as the number of observed individuals.

input eigenvalue file eigenfile

Optional. If present, it overrides the ‘set eigenvalues’ statements.

limit breeding iterations I

This statement specifies the maximum number of breeding–values iterations. The default number currently is 20.

set breeding convergence X

This statement specifies the convergence criterion for breeding–values iterations. The default number is currently is 1.0e-8.

limit EM iterations I

This statement specifies the number of EM iterations. The default number presently is 200. There is no option to specify convergence criterion. If convergence has not been achieved, the final estimates can be used as starting values to rerun the program.

See Concept Index for: multivar computational parameters, observed individuals.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.4.4 `multivar` computational options

compute eigenvalues

If this statement is present, the values given in either the eigenfile or the statement ‘set eigenvalues’ are ignored and the eigenvalues are computed by the program. This is the default action if no eigenvalues are given.

use (full | partitioned) EM

Use this statement to choose between two iterative procedures in maximum likelihood estimates. With the ‘full EM’ option, the fixed effects, additive and residual variance and covariance are simultaneously updated. This is the default action.

With the ‘partitioned EM’ option, the maximization step is partitioned into two parts. The first part is to maximize the likelihood over additive and residual variances/covariances; the second part over residual variances/covariances and fixed effects. The expectation step is run after each part. Partitioned EM takes more computer time.

fit environmental model

This statement asks a purely environmental model with no genetic variances to be fit, in additional to the genetic/environmental model.

See Concept Index for: multivar computational options.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

13.4.5 `multivar` output options

output statistics (covariances | correlations): By default, covariances are printed out.
output final adjusted phenotypes: If this option is specified, trait values adjusted for all fixed effects are computed and output.
output spacing I EM iterations: This statement requests a print out of the EM estimates every Ith iteration. The default number is defined in the program header file.
check gmatrix: This statement requests a print out of the G matrix for observed individuals and quit without doing the likelihood computation.
check ginverse: This statement requests a print out of the G inverse matrix and the program quits unless ‘check eigenvalues’ has also been specified.
check eigenvalues: This statement requests the program to print out of the eigenvalues, whether computed or input, and then to quit. These eigenvalues can then be used as input in subsequent runs.
check eigenvalue computation: This statements causes some comments to be printed by the function that computes the eigenvalues.
check trace: This statements requests the trace of the G–inverse matrix to be printed.

See Concept Index for: multivar output options.

[ << ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.

13.4.1 `multivar` computing requests
13.4.2 `multivar` segregation model parameters
13.4.3 `multivar` computational parameters
13.4.4 `multivar` computational options
13.4.5 `multivar` output options

13.1 Introduction to PolyEM programs
13.2 Sample `multivar` parameter file
13.3 Running multivar example and sample output
13.4 `multivar` statements

13. Polygenic Modeling of Quantitative Traits by EM Algorithm

13.1 Introduction to PolyEM programs

13.2 Sample multivar parameter file

13.3 Running multivar example and sample output

13.4 multivar statements

13.4.1 multivar computing requests

13.4.2 multivar segregation model parameters

13.4.3 multivar computational parameters

13.4.4 multivar computational options

13.4.5 multivar output options

13.2 Sample `multivar` parameter file

13.4 `multivar` statements

13.4.1 `multivar` computing requests

13.4.2 `multivar` segregation model parameters

13.4.3 `multivar` computational parameters

13.4.4 `multivar` computational options

13.4.5 `multivar` output options