5. Simulating Marker and Trait Data in Pedigrees

See Concept Index for: simulating marker and trait data.

5.1 Introduction to `genedrop`.

genedrop simulates pedigree data for analysis by other programs. Given a genetic map, it simulates genotypes at marker loci (linked or unlinked) and the discrete genotypes and polygenic values contributing to quantitative traits. The trait loci may or may not be linked to marker maps. Thus, one or more of three kinds of loci are simulated on a chromosome: markers, traits linked to markers, and traits not linked to markers.

genedrop assigns marker and trait genotypes and polygenic trait values to the founders by using a random number generator. Meiosis indicators are then simulated for non-founders in chronological order, thus determining the founder genome labels inherited. Markers and traits, if present, are then simulated for each individual: First, marker genes are simulated in the order mapped on the chromosome, then linked traits are simulated in map order, and finally, unlinked traits are simulated.

Because founders of a pedigree are assumed to be unrelated, a unique identifier or founder genome label is assigned to each of the two haploid genomes of each founder. The user may choose to identify the ancestral source of each gene at each locus in non-founders by including the founder labels in the output pedigree.
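The labelling and descent process described above can be illustrated with a minimal single-locus sketch. It is not genedrop's implementation: linkage between loci is ignored, Python's `random` module stands in for the MORGAN random number generator, and the function name `drop_fgl` and the small pedigree are hypothetical.

```python
import random

def drop_fgl(pedigree, rng):
    """Single-locus gene drop of founder genome labels (FGLs).

    pedigree: list of (name, father, mother) in chronological order,
    with father and mother set to None for founders.
    Returns {name: (paternal_label, maternal_label)}.
    """
    labels = {}
    next_label = 1
    for name, father, mother in pedigree:
        if father is None and mother is None:
            # Founder: two new unique labels, one per haploid genome.
            labels[name] = (next_label, next_label + 1)
            next_label += 2
        else:
            # Non-founder: one meiosis indicator (0 or 1) per parent selects
            # which of that parent's two labels is transmitted.
            paternal = labels[father][rng.randint(0, 1)]
            maternal = labels[mother][rng.randint(0, 1)]
            labels[name] = (paternal, maternal)
    return labels

# Hypothetical five-member pedigree, listed parents before offspring.
pedigree = [
    ("gf", None, None),
    ("gm", None, None),
    ("sp", None, None),
    ("kid", "gf", "gm"),
    ("grandkid", "kid", "sp"),
]
fgl = drop_fgl(pedigree, random.Random(1))
```

Because parents are always processed before their offspring, each non-founder's labels can be read directly from the labels already assigned to its parents, which is why chronological order matters.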

The user may provide random number seeds for both the marker simulation and the trait simulation. This permits multiple simulations, for a pedigree, of identical marker genotypes, but with different quantitative trait values.
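The effect of separate seeds can be sketched with two independent generators. This is only an illustration of the idea: Python's `random.Random` stands in for the MORGAN generator, and the function `simulate`, the trait seeds, and the distributions used are hypothetical.

```python
import random

def simulate(marker_seed, trait_seed, n=5):
    """Draw toy marker alleles and trait values from two independent streams."""
    marker_rng = random.Random(marker_seed)  # drives marker data only
    trait_rng = random.Random(trait_seed)    # drives trait data only
    markers = [marker_rng.randint(1, 4) for _ in range(n)]
    traits = [round(trait_rng.gauss(100.0, 5.0), 3) for _ in range(n)]
    return markers, traits

# Same marker seed, different trait seeds.
run_a = simulate(marker_seed=0xde5e8d39, trait_seed=1)
run_b = simulate(marker_seed=0xde5e8d39, trait_seed=2)
```

Here `run_a` and `run_b` share their marker draws but differ in their trait draws, which mirrors the use case of identical marker genotypes with different quantitative trait values.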

The population and segregation model parameters (trait genotype means, additive and residual variances) may be specified by the user and take default values if not specified. Allele frequencies have no default values and must be specified by the user. Several different trait models can be specified as in the following table:

Model               Equal Genotypic Means   Zero Additive Variance
non-genetic model   YES                     YES
polygenic model     YES                     NO
major gene model    NO                      YES
mixed model         NO                      NO

The trait locus must have two alleles and the trait residual variance must be greater than zero. A very small residual variance can be specified if one desires to simulate a qualitative trait.
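The table can be read as a two-way classification of the parameter values. The following sketch expresses that reading; the function name `classify_trait_model` is hypothetical and is not part of MORGAN.

```python
def classify_trait_model(geno_means, additive_variance):
    """Name the trait model implied by the genotype means and additive variance."""
    equal_means = len(set(geno_means)) == 1
    zero_additive = additive_variance == 0.0
    if equal_means and zero_additive:
        return "non-genetic model"
    if equal_means:
        return "polygenic model"
    if zero_additive:
        return "major gene model"
    return "mixed model"

# Three distinct genotype means with zero additive variance.
model = classify_trait_model((90.0, 100.0, 110.0), additive_variance=0.0)
```

With these values the trait locus affects the mean but there is no polygenic component, so the result is the major gene model.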

Genetic data on all individuals may be included in the simulated pedigree, or some individuals may be specified as ‘missing’. If any individuals are to be missing genetic data, an ‘observed’ indicator column must be included in the pedigree file. See Pedigree file, for details.

See Concept Index for: genedrop introduction, quantitative trait, polygenic model, major gene model, mixed model, non-genetic model, founder genome labels, seeds for data simulation, additive variance, unobserved individuals.

5.2 Sample `genedrop` parameter file

Files for genedrop may be found in the ‘Simulation’ subdirectory of ‘MORGAN_Examples’. The example here refers to ‘ped73_gdrop.par’.

The seed file is used to store the random seeds used in the simulations. Occasionally one will want to use the same seed with multiple runs, but most often one will want to use new seeds so as to obtain different output with each run. The seed file contains one or more statements like ‘set marker seeds 0xde5e8d39’. For more about the way genedrop handles seeds, see genedrop computational parameters.

The seed file can be specified in the command line or in the parameter file. The following statements are needed to specify the seed file in the parameter file:

input seed file '../marker.seed'
output marker seeds only
output overwrite seed file '../marker.seed'

The first line specifies ‘marker.seed’ in the main examples directory as the input seed file for the marker simulation. The second statement, ‘output marker seeds only’, overrides the default behavior of saving both the marker and the trait seeds and causes the program to save only the marker seeds before exiting. The ‘overwrite’ option in line 3 enables the program to replace the current seed file content with the newly generated random numbers, which can be used for simulation in the future. When an overwrite is not requested, MORGAN appends the new output seeds to the existing file at the end of the run. Thus, at the next run, more than one ‘set marker seeds’ statement exists in the seed file. The program uses only the last ‘set marker seeds’ statement in the file.
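The rule that only the last ‘set marker seeds’ statement is used can be sketched as follows. The sketch assumes that seeds are written as hexadecimal values after the keywords, as in the statement quoted earlier; the second seed value and the function name `last_marker_seeds` are hypothetical.

```python
def last_marker_seeds(seed_file_text):
    """Return the values on the last 'set marker seeds' line, or None if absent."""
    seeds = None
    for line in seed_file_text.splitlines():
        words = line.split()
        if words[:3] == ["set", "marker", "seeds"]:
            # A later statement replaces any earlier one.
            seeds = [int(w, 16) for w in words[3:]]
    return seeds

# A seed file after one run without the 'overwrite' option (second value invented).
appended = """set marker seeds 0xde5e8d39
set marker seeds 0x00000001
"""
current = last_marker_seeds(appended)
```

After an appending run the earlier statement is still present in the file but has no effect, since the later statement takes precedence.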

In the example, we have chosen to access the seed file from the command line, which will overrule the parameter file statement and generate a warning. See the next section for command line implementation.

Note: The statement ‘output pedigree chronological’ is included in the example ‘ped73_gdrop.par’ file so that the output pedigree will be in the chronological order required for use with other MORGAN programs.

The next statements in the parameter file are the simulation requests:

simulate chrom 1 markers
simulate traits 1
set traits 1 tlocs 1

The first statement asks genedrop to simulate marker loci on chromosome 1. The other two statements request that one quantitative trait, controlled by one tloc, also be simulated. The number of markers, and the relative locations of the tloc and marker loci, will be determined from the ‘map’ statements below. In MORGAN-3, traits are distinguished from trait loci, and thus the statement ‘set traits 1 tlocs 1’ assigns trait 1 to trait locus 1. In general one or more traits may be assigned to any given trait locus. If no trait locus is to be simulated, the lines ‘simulate traits 1’ and ‘set traits 1 tlocs 1’ can be removed.

map chrom 1 marker dist  10 10 10 10 10 10 10 10 10
map chrom 1 tlocs 1 marker 5 dist 5

The above statements specify a marker map on chromosome 1, with 10 equally spaced markers, each at a distance of 10 (Haldane) centiMorgans from the preceding one. Note that the number of markers is inferred from the first statement: nine intermarker distances imply ten markers. The trait locus is between markers 5 and 6 on chromosome 1, at a distance of 5 cM from marker 5.
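The map implied by these two statements can be worked out with a short sketch. It places the first marker at position 0 (an arbitrary origin chosen for illustration) and uses the standard Haldane map function to convert a distance to a recombination fraction; the function names are hypothetical.

```python
import math
from itertools import accumulate

def marker_positions(intermarker_cm):
    """Cumulative marker positions in cM, with the first marker at 0."""
    return [0.0] + list(accumulate(float(d) for d in intermarker_cm))

def haldane_recomb(dist_cm):
    """Haldane map function: recombination fraction for a distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * dist_cm / 100.0))

positions = marker_positions([10] * 9)   # nine distances give ten markers
tloc_position = positions[5 - 1] + 5.0   # 5 cM beyond marker 5
r_adjacent = haldane_recomb(10.0)        # between neighbouring markers
```

Under the Haldane map function, adjacent markers 10 cM apart have a recombination fraction of about 0.091, slightly below 0.10 because double crossovers are allowed for.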

A marker map or tloc position can also be specified by recombination fractions. For example,

map chrom 1 marker recomb fracs 0.1 0.5 0.2

gives a map of four ordered markers, M1, M2, M3 and M4, with recombination fraction 0.1 between M1 and M2, 0.5 between M2 and M3, and 0.2 between M3 and M4.
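To relate such a map to one given in distances, the inverse of the Haldane map function can be used. Whether genedrop performs this conversion internally is not stated here, so the following is only a sketch of the standard formula; the function name `haldane_distance_cm` is hypothetical.

```python
import math

def haldane_distance_cm(recomb_frac):
    """Inverse Haldane map function: distance in cM for a recombination fraction."""
    if not 0.0 <= recomb_frac <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if recomb_frac == 0.5:
        return math.inf  # free recombination: the loci behave as unlinked
    return -50.0 * math.log(1.0 - 2.0 * recomb_frac)

# The three intervals M1-M2, M2-M3 and M3-M4 of the example map.
distances = [haldane_distance_cm(r) for r in (0.1, 0.5, 0.2)]
```

A recombination fraction of 0.5 corresponds to an infinite Haldane distance, so in this example M1 and M2 segregate independently of M3 and M4.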

Marker allele frequencies are set by the following lines:

set chrom 1 markers 1  allele freqs 0.13 0.66 0.16 0.05
set chrom 1 markers 2  allele freqs 0.06 0.23 0.41 0.25 0.05
set chrom 1 markers 3  allele freqs 0.11 0.02 0.01 0.06 0.24 0.56
set chrom 1 markers 4  allele freqs 0.07 0.04 0.89
set chrom 1 markers 5  allele freqs 0.12 0.11 0.03 0.03 0.50 0.21
set chrom 1 markers 6  allele freqs 0.50 0.44 0.06
set chrom 1 markers 7  allele freqs 0.01 0.33 0.62 0.04
set chrom 1 markers 8  allele freqs 0.20 0.05 0.42 0.27 0.06
set chrom 1 markers 9  allele freqs 0.18 0.18 0.25 0.16 0.08 0.15
set chrom 1 markers 10 allele freqs 0.17 0.35 0.04 0.29 0.15

In the case where several markers have the same number of alleles and allele frequencies, one can group those markers together into one line:

set chrom 1 markers 11 12 13 15 allele freqs 0.2 0.8

However, we consider it good practice to specify the frequencies separately for each marker.
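Before running a simulation it can be useful to check that each set of allele frequencies sums to one. The following sketch parses statements of the form shown above and performs that check; it is a hypothetical helper, not part of MORGAN, and how genedrop itself treats frequencies that do not sum to one is not described here.

```python
import math

def parse_allele_freqs(statement):
    """Parse 'set chrom C markers M... allele freqs f...' into (markers, freqs)."""
    words = statement.split()
    start = words.index("markers") + 1
    stop = words.index("allele")
    markers = [int(w) for w in words[start:stop]]
    freqs = [float(w) for w in words[stop + 2:]]  # skip the words 'allele freqs'
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-6):
        raise ValueError(f"frequencies for markers {markers} sum to {sum(freqs)}")
    return markers, freqs

single = parse_allele_freqs(
    "set chrom 1 markers 1  allele freqs 0.13 0.66 0.16 0.05")
grouped = parse_allele_freqs(
    "set chrom 1 markers 11 12 13 15 allele freqs 0.2 0.8")
```

The same function handles both the one-marker-per-line form and the grouped form, since every marker index between ‘markers’ and ‘allele’ is collected.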

The following four lines describe the trait model. The trait locus can have only two alleles; here the frequencies are 0.5 and 0.5, for alleles 1 and 2, respectively. The mean values of the trait for each trait locus genotype are on the next line. Values correspond to the (1 1), (1 2) and (2 2) genotypes, respectively. The residual variance gives the within-genotype variance of phenotypic values about the mean. The additive variance (0 in this example, and by default if not specified) is the variance of an additive polygenic contribution to trait values.

set trait 1 allele freqs 0.5 0.5

set trait 1 for tlocs 1 geno means 90 100 110
set trait 1 residual variance 25.0
set trait 1 additive variance 0.0
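The parameters above determine the population mean and variance of the trait. The following sketch computes them under two assumptions that are not stated in this section: trait locus genotypes are in Hardy-Weinberg proportions, and the major gene, polygenic and residual components are independent. The function name `trait_moments` is hypothetical.

```python
def trait_moments(p1, geno_means, residual_var, additive_var=0.0):
    """Population mean and variance of the trait for a two-allele trait locus.

    p1: frequency of allele 1; geno_means: means for (1 1), (1 2), (2 2).
    Assumes Hardy-Weinberg proportions and independent components.
    """
    p2 = 1.0 - p1
    geno_freqs = (p1 * p1, 2.0 * p1 * p2, p2 * p2)
    mean = sum(f * m for f, m in zip(geno_freqs, geno_means))
    major_gene_var = sum(f * (m - mean) ** 2
                         for f, m in zip(geno_freqs, geno_means))
    return mean, major_gene_var + additive_var + residual_var

mean, variance = trait_moments(0.5, (90.0, 100.0, 110.0), residual_var=25.0)
```

For the example values the genotype frequencies are 0.25, 0.5 and 0.25, giving a mean of 100 and a variance of 50 (between genotypes) plus 25 (residual), that is, 75.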

The following three lines may be included in the parameter file (we have commented them out in the example so as to keep the output file small and easy to read).

output pedigree record founder genome labels
output pedigree record trait latent variables
output pedigree record unobserved variables

These lines request that the founder genome labels and latent variable values for the trait be included in the output file, and that the data be output for all (observed and unobserved) individuals. Founder genome labels indicate, for each non-founder, which founder alleles were passed to the individual. For the trait variables, the latent founder genome labels, the trait locus genotype, and the additive and residual contributions to the trait value are given. Latent trait variables will precede the trait value in the output file.

See Concept Index for: genedrop sample parameter file, seed file, additive variance, residual variance, founder genome labels.

5.3 Running `genedrop` examples and sample output

Two examples are available under the subdirectory ‘Simulation/’. The only difference is in whether command line options are to replace some parameter statements (see the ‘README’ file in the ‘Simulation’ directory).

The general form of the command, followed by the command to run the first example, is:

./<program> <parfile> [ped <pedfile>] [seed <seedfile>] [oped <opedfile>]
./genedrop ped73_gdrop.par ped ../ped73.ped seed ../marker.seed oped gdrop.oped

For the parameter file ‘ped73_gdrop_2.par’ the three files (input and output pedigree, and seed) are given in the file, and so are not needed as command line options. This file may be run simply as

./genedrop ped73_gdrop_2.par

The output is the same as for ‘ped73_gdrop.par’.

When running the genedrop example, we use an (unchanging) input marker seed file ‘../marker.seed’ but output to the file ‘marker.seed’ in the current directory. However, in practice a single file such as ‘marker.seed’ can be specified as both the input and output seed file. If the ‘overwrite’ option is not included in the ‘output seed file’ statement, successive runs will generate warnings (W), but this is not a concern. Recall from the previous section that, by default, MORGAN appends the new output seeds to the existing seed file at the end of each run. In the next run, the last (most recent) seed will be used. To avoid this warning (and an ever-growing seed file), either use the ‘overwrite’ option when outputting the seeds (see the previous section, Sample genedrop parameter file), or manually edit the seed file to remove earlier lines.

Since the function of genedrop is to simulate marker and trait data, it, unlike other MORGAN programs, always creates and outputs a pedigree file. The output file ‘gdrop.oped’ is structured similarly to the input file ‘ped73.ped’, with one individual per record (line). However, the output file contains additional columns and does not include the parameter statements found at the top of the input file. The first four items are the individual’s name, the names of the parents, and gender. If no additional output options are set, the next items are the genotypes of the markers (two items per marker) in the order they are found on the chromosome, followed by the trait values in the order of the trait labels.
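The default record layout just described can be read with a few lines of code. The sketch below assumes whitespace-separated items and no additional output options; the record shown is invented for illustration and is not taken from ‘gdrop.oped’, and the function name `parse_record` is hypothetical.

```python
def parse_record(line, n_markers, n_traits):
    """Split one default-layout output record into named parts."""
    fields = line.split()
    name, father, mother, gender = fields[:4]
    first_trait = 4 + 2 * n_markers          # two items per marker genotype
    alleles = [int(a) for a in fields[4:first_trait]]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    traits = [float(t) for t in fields[first_trait:first_trait + n_traits]]
    return {"name": name, "parents": (father, mother), "gender": gender,
            "genotypes": genotypes, "traits": traits}

# Invented record: two markers and one trait.
record = parse_record("301 101 102 1  2 4  1 1  103.250",
                      n_markers=2, n_traits=1)
```

If founder genome labels, latent variables or the observed indicator are requested, extra items appear in each record and the offsets used here no longer apply.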

Notice the three statements at the end of the parameter file. In order to save space and make the output more readable, these statements have been commented out so that they are not executed by the program.

If the statement ‘output pedigree record trait latent variables’ were included in the parameter file, the output file would contain four additional columns preceding the trait value. The first two of these columns would be the trait locus genotype, followed by the additive component of the trait value and the residual component of the trait value. In this example, everyone has a ‘0.000’ in the additive component column because we set the additive variance to zero in the parameter file.
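The latent variables can be used to check a trait value by hand. The sketch below assumes that the trait value is the sum of the genotype mean, the additive component and the residual component; this decomposition is implied rather than stated here, and the function name `trait_value` and the numbers used are hypothetical apart from the genotype means of the example.

```python
def trait_value(tloc_genotype, additive, residual,
                geno_means=(90.0, 100.0, 110.0)):
    """Rebuild a trait value from its latent variables (assumed additive form).

    geno_means holds the means for genotypes (1 1), (1 2) and (2 2),
    so the number of copies of allele 2 indexes the tuple.
    """
    n_allele_2 = sum(1 for allele in tloc_genotype if allele == 2)
    return geno_means[n_allele_2] + additive + residual

# Heterozygote with zero additive component and an invented residual of 3.25.
value = trait_value((1, 2), additive=0.0, residual=3.25)
```

With the additive variance set to zero, as in the example parameter file, the additive term is always 0 and only the genotype mean and residual contribute.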

If the statement ‘output pedigree record founder gene labels’ is included, the founder genome labels (FGL) for markers precede the marker genotypes and the trait FGL precede the trait values (or the trait latent variables, if these are requested).

Finally, if the ‘output pedigree record unobserved variables’ statement is included in the parameter file, an observed indicator follows gender in the output pedigree file, and marker and trait data are output for all individuals, not only those indicated as ‘observed’.

See Concept Index for: running genedrop examples, genedrop sample output, seeds for data simulation.

5.4.5 `genedrop` output pedigree options

output pedigree record founder gene labels

When this option is selected, each record contains a pair of founder genome labels for each locus. Each founder is assigned a pair of labels, which are in the same order as the names of the parents. Then, for each locus of each descendant, founder genome labels are determined by the simulated meiosis indicators.

This statement is useful in cases where the founder origins or descent of trait locus alleles are required, for example in assessing the results of subsequent analyses of the simulated data.

output pedigree record trait latent variables

This statement requests that the quantitative trait latent variables be included in the output. The genotype at each trait locus, as well as the additive and residual component of each quantitative trait, will appear in the output record.

output pedigree record unobserved variables

If this option is set, genotypes, gene labels and trait values are output for both observed and unobserved individuals. An additional data field, following the gender indicator, specifies whether the individual is observed (‘1’) or unobserved (‘0’).

When this option is not selected, unobserved individuals take on default values: the genotype at each locus is recorded as ‘0 0’, the founder genome label (if requested) at each locus is recorded as ‘0 0’, and each quantitative trait value is recorded as ‘999’.
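The default treatment of unobserved individuals can be sketched as a masking step applied to the simulated data before output. This illustrates only the rule stated above and is not genedrop's code; the function name `mask_unobserved` is hypothetical.

```python
MISSING_GENOTYPE = (0, 0)   # written as '0 0' in the output record
MISSING_TRAIT = 999.0       # written as '999' in the output record

def mask_unobserved(genotypes, trait_values, observed):
    """Return the data as output when unobserved variables are not requested."""
    if observed:
        return list(genotypes), list(trait_values)
    return ([MISSING_GENOTYPE] * len(genotypes),
            [MISSING_TRAIT] * len(trait_values))

shown = mask_unobserved([(2, 4), (1, 1)], [103.25], observed=True)
hidden = mask_unobserved([(2, 4), (1, 1)], [103.25], observed=False)
```

A program reading the output should therefore treat a ‘0 0’ genotype or a trait value of ‘999’ as missing rather than as data.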

input pedigree record observed (absent | present)

The observed indicator is used to designate which members are observed, with ‘0’ indicating unobserved and ‘1’ indicating observed. When the observed indicator is present in the pedigree file, it follows gender (or parents, if gender is not present). If this statement is not given, all pedigree members are assumed to be observed. See also the next statement ‘assume all observed’.

If individuals are flagged in the pedigree file as unobserved, the default behavior is to indicate in the output pedigree file that the data for these individuals is missing.

assume all observed

When this statement is used, all members of the pedigree are treated as “observed” in the simulation. If an observed indicator column is present in the input file, it is ignored by the simulation.

See Concept Index for: genedrop output pedigree options, founder genome labels, meiosis indicators, inheritance indicators, unobserved individuals.

5.4.6 `genedrop` output seed file options

output (marker | trait) seeds only: If an output seed file is given, both ending marker and trait seeds are saved unless one or the other is requested in this statement.

See Concept Index for: genedrop output seed file, seeds for data simulation.

This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.

5.1 Introduction to `genedrop`.
5.2 Sample `genedrop` parameter file
5.3 Running `genedrop` examples and sample output
5.4 `genedrop` statements

5.4.1 `genedrop` computing requests
5.4.2 `genedrop` mapping model parameters
5.4.3 `genedrop` population model parameters
5.4.4 `genedrop` computational parameters
5.4.5 `genedrop` output pedigree options
5.4.6 `genedrop` output seed file options
