[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

6. Simulating Marker Data Conditional on Trait Data in Pedigrees


6.1 Introduction to markerdrop

markerdrop simulates marker data at markers linked to a hypothetical trait locus. The user must specify whether marker data simulation is to be conditional on a trait model and trait data (the trait option) or on a (possibly partial) specification, in the pedigree file, of the inheritance at the trait location (the inheritance option). The choice of a trait model or an inheritance pattern dictates which additional parameter statements must (or may) be included in the parameter file. For the trait option, the pedigree file contains trait data; for the inheritance option, the pedigree file contains meiosis (inheritance) indicators. See Specifying inheritance.

The program markerdrop first generates descent at the trait locus, and then, conditionally on this, descent at marker locations across the chromosome. At the requested locations, founder genome labels (FGL) of observed individuals are determined, and then marker genotypes are imposed at each locus given the realized FGL and marker allele frequencies. In MORGAN V3.4, a new option to simulate ‘dense markers’ is provided. In this case, descent across the chromosome is generated not marker-to-marker but by simulating recombination breakpoints and an ibd graph; see Running ibd_create examples and sample output; simpop_fgl. Only the descent determination uses this algorithm; thereafter, at each requested marker, founder genome labels are determined and alleles and genotypes assigned as before.
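The last step described above (imposing genotypes once the FGL are known) can be illustrated with a minimal Python sketch. It is not MORGAN code: the function name `assign_genotypes`, the dictionary layout and the FGL values are hypothetical, chosen only to show that every copy of the same founder genome label receives the same allele, drawn once from the marker allele frequencies.

```python
import random

def assign_genotypes(fgl_pairs, allele_freqs, rng):
    """Draw one allele per distinct FGL, then read off each individual's genotype.

    fgl_pairs: dict of individual name -> (paternal FGL, maternal FGL) at one marker.
    allele_freqs: allele frequencies; alleles are labeled 1, 2, ... in that order.
    """
    alleles = list(range(1, len(allele_freqs) + 1))
    fgl_allele = {}
    for pair in fgl_pairs.values():
        for fgl in pair:
            if fgl not in fgl_allele:
                # Each founder genome gets a single allele from the population frequencies.
                fgl_allele[fgl] = rng.choices(alleles, weights=allele_freqs)[0]
    return {name: (fgl_allele[p], fgl_allele[m]) for name, (p, m) in fgl_pairs.items()}

# Hypothetical FGL at one marker for two founders and their two children.
fgl = {"101": (2, 1), "102": (4, 3), "201": (2, 3), "202": (2, 4)}
genotypes = assign_genotypes(fgl, [0.13, 0.66, 0.16, 0.05], random.Random(1))
```

Because alleles are attached to labels rather than to individuals, relatives who share an FGL at a marker necessarily share the corresponding allele.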

See Concept Index for: markerdrop introduction, trait model, incomplete penetrances, meiosis indicators, dense markers, founder genome labels.


6.2 Sample markerdrop parameter file – conditional on trait

Files for markerdrop may be found in the ‘Simulation’ subdirectory of ‘MORGAN_Examples’. The sample parameter file ‘ped73_mdrop_trait.par’ requests simulation of marker data conditional on a trait model. The trait is assumed to be discrete when simulation is conditional on a trait model. Examples using the ’dense marker’ option may be found in the ‘Genedrop’ Gold standards.

Note that the ‘ped73_mdrop_trait.par’ parameter file contains a ‘set printlevel’ statement. MORGAN programs will produce varying levels of output given the print level. We recommend setting the print level to 5 for initial testing purposes. However, if the ’dense markers’ option is selected, it is best to suppress printing of the marker map, by, for example, ‘set printlevel 3’.

Many of the statements for simulation of the markers conditional on trait data are similar to those used in genedrop: See Sample genedrop parameter file. However, rather than simulating trait loci or trait data, these are provided to the markerdrop program.

The relevant section of the file is:

 
simulate markers 
select trait 2
set traits 2 tlocs 1
map marker positions 10 20 30 40 50 60 70 80 90 100 
map tlocs 1  marker 5  dist 5.0

set trait 2 data discrete                          
              
set traits 2 for tlocs 1 incomplete penetrances 0.05 0.8 0.95
set tlocs 1 allele freqs 0.5 0.5

set markers 1	allele freqs 0.13 0.66 0.16 0.05
set markers 2   allele freqs 0.06 0.23 0.41 0.25 0.05
.
.
set markers 10 data

101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
201 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
302 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
304 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
.
.
.

The first five lines are required for markerdrop and must be included in the parameter file. The ‘simulate markers’ and ‘select trait’ statements indicate that the markers will be simulated conditional on a trait model. The ‘set traits 2 tlocs 1’ statement associates trait 2 with trait locus 1. The ‘map marker positions’ statement specifies the positions of the markers to be simulated, from which the number of markers is also inferred. The ‘map tlocs’ statement identifies the trait locus to be used in the simulation and gives its position relative to the markers on which we are simulating data. In this example, the trait locus follows marker 5 at a distance of 5 centiMorgans.

Note that the parameter file for running a simulation conditional on a trait model requires two more lines than the parameter file for simulation conditional on an inheritance pattern (see next section). These two additional lines are required for discrete traits (the default for simulation conditional on a trait). The statement ‘set traits 2 for tlocs 1 incomplete penetrances ...’ specifies the probability of exhibiting the trait for individuals with trait locus genotypes ‘1 1’, ‘1 2’ (or ‘2 1’) and ‘2 2’, respectively. The statement ‘set tlocs ... allele freqs’ specifies trait locus allele frequencies.
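To make the role of the penetrances and trait allele frequencies concrete, here is a small Python sketch for a single unrelated individual. It assumes Hardy-Weinberg genotype proportions, which the text above does not state; markerdrop itself conditions on trait data over the whole pedigree, so this only illustrates the quantities involved. The function names are hypothetical.

```python
def genotype_probs(q1):
    """Hardy-Weinberg probabilities of genotypes '1 1', '1 2' (either order), '2 2'."""
    q2 = 1.0 - q1
    return [q1 * q1, 2.0 * q1 * q2, q2 * q2]

def prevalence(q1, penetrances):
    """Probability that a random individual exhibits the trait."""
    return sum(g * f for g, f in zip(genotype_probs(q1), penetrances))

def posterior_given_affected(q1, penetrances):
    """Genotype probabilities for an affected individual (Bayes' rule)."""
    k = prevalence(q1, penetrances)
    return [g * f / k for g, f in zip(genotype_probs(q1), penetrances)]

# Values from the sample parameter file: allele freqs 0.5 0.5, penetrances 0.05 0.8 0.95.
k = prevalence(0.5, [0.05, 0.8, 0.95])
post = posterior_given_affected(0.5, [0.05, 0.8, 0.95])
```

With the sample values the prevalence is 0.25 × 0.05 + 0.5 × 0.8 + 0.25 × 0.95 = 0.65, so an affected individual is much more likely to carry at least one copy of allele 2 than a random individual.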

The ‘set markers...allele freqs’ statements must be included; they specify allele frequencies at each marker.

The markerdrop program uses the ‘set markers 10 data’ statement to specify the individuals and loci for which marker data are required. This is the same statement used by other programs in the analysis of marker data; see, for example, Autozyg computational parameters. Marker data are specified for each marker locus as a pair of integer alleles, and ‘0’ indicates a missing value. For markerdrop, any non-zero value indicates that the data are to be observed. Typically, one may enter regular marker data, in order to generate other marker data with the same missingness pattern. Alternatively, as here, a ‘1’ may be used to indicate that the corresponding marker data are observed. Note that individual ‘302’ is the first observed member in this data set, and is observed for all 10 markers.
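The way an availability template carries its missingness pattern over to simulated data can be sketched in a few lines of Python. This is an illustration only; `apply_availability` is a hypothetical helper, not part of MORGAN.

```python
def apply_availability(simulated, template):
    """Keep a simulated allele where the template entry is non-zero; otherwise report 0 (missing)."""
    return [s if t != 0 else 0 for s, t in zip(simulated, template)]

# Two markers (four alleles): the first marker is flagged as observed, the second is not.
masked = apply_availability([4, 2, 4, 3], [1, 1, 0, 0])
# Real marker data used as the template give the same result, since any non-zero entry counts.
masked_from_real = apply_availability([4, 2, 4, 3], [3, 5, 0, 0])
```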

The parameter file ‘ped73_mdrop_trait.par’ uses the pedigree file ‘ped73.ped’, which is found in the ‘MORGAN_Examples’ directory. The file format section and first few lines of the pedigree data section of this file are below.

 
input pedigree size 73
input pedigree record names 3 integers 7 reals 1

***************************************************
101 0 0 1 0 0 0 -1 -1 0 999.5
102 0 0 2 0 0 0 -1 -1 0 999.5
201 101 102 1 0 0 0 0 1 0 999.5
202 101 102 2 0 0 0 1 1 0 999.5
2010 0 0 2 0 0 0 -1 -1 0 999.5
301 201 2010 1 0 0 0 1 1 0 999.5
302 201 2010 2 1 3 2 1 1 0 105.945

The first three columns are ‘names’, which are character strings. They are unique identifiers of each individual and his/her parents. By default, the parent order is father followed by mother. The next four columns are sex (1=male, 2=female), observed status (0=unobserved, 1=observed) and possible trait or other data. Recall that in the parameter file we had

 
select trait 2

Now we see also in the parameter file

 
input pedigree record trait 2 integer 4

This statement specifies that ‘trait 2’ is the fourth integer in the pedigree file, after the three names (that is, the 7th item). Traits may be given any integer label: here ‘2’ is an arbitrary choice. This column of the pedigree file contains the affection status for a discrete trait (0=missing, 1=unaffected, 2=affected).

If desired, this statement can be included in the pedigree file instead. Other columns in the pedigree file are explained in the next section.

Note that markerdrop can simulate data for markers linked to only one trait locus, as specified in the ‘map’ statement in the parameter file.

See Concept Index for: markerdrop parameter file – conditional on trait, printlevel control.


6.3 Sample markerdrop parameter file – conditional on inheritance pattern

The sample parameter file, ‘ped73_mdrop_inhe.par’, requests simulation of marker data conditional on an inheritance pattern. The relevant section of the file is:

 
simulate markers 
select inheritance 1
set inheritance 1 tlocs 1
map marker positions 10 20 30 40 50 60 70 80 90 100 
map tlocs 1 marker 4 recomb frac 0.01

set markers 1	allele freqs 0.13 0.66 0.16 0.05
set markers 2   allele freqs 0.06 0.23 0.41 0.25 0.05
.
.
.
set markers 10 data

101 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
201 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
202 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
302 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
304 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
.
.
.

The first five lines are required and must be included in the parameter file. The ‘simulate markers’ and ‘select inheritance 1’ statements indicate that we are simulating marker data conditional on inheritance, and identify the inheritance pattern to simulate. The ‘set inheritance 1 tlocs 1’ statement maps this inheritance pattern to the first trait locus. In this example, the trait locus follows marker 4 with a recombination fraction of 0.01, as is indicated by the statement ‘map tlocs 1 marker 4 ...’. The ‘map marker positions’ statement specifies the positions of the markers to be simulated, and also implicitly indicates the number of markers. The ‘set markers ... allele freqs’ statements must also be included; they specify the allele frequencies at each marker (only those for the first two markers are shown above).

Following the ‘set markers 10 data’ statement, the marker data availability is specified for each of the two alleles at each marker. A ‘0’ indicates that the allele is unobserved, while a ‘1’ indicates that it is observed. This specifies which alleles are to be output as data in the simulated marker data.

The parameter file ‘ped73_mdrop_inhe.par’ uses pedigree file ‘ped73.ped’. The file format section and first few lines of the pedigree data section of this file are below.

 
input pedigree record names 3 integers 7 reals 1
***************************************************
101 0 0 1 0 0 0 -1 -1 0 999.5
102 0 0 2 0 0 0 -1 -1 0 999.5
201 101 102 1 0 0 0 0 1 0 999.5
202 101 102 2 0 0 0 1 1 0 999.5
2010 0 0 2 0 0 0 -1 -1 0 999.5
301 201 2010 1 0 0 0 1 1 0 999.5
302 201 2010 2 1 3 2 1 1 0 105.945

The first three columns are indices of individuals and their parents. The next two are sex and observation status. Integer columns 5 and 6 are inheritance indicators with the first being the paternal ones and the second the maternal ones. A founder’s meiosis indicators are ‘-1 -1’.

The connection to these inheritance data is through the statement

 
input pedigree record trait 3 integer pair 5 6

in the parameter file. Recall that on counting the pedigree file columns the integers follow the three names, so that integers 5 and 6 are columns 8 and 9 overall.
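The column counting can be checked with a short Python sketch that splits a record according to ‘input pedigree record names 3 integers 7 reals 1’. The helper names are hypothetical and the parsing is deliberately naive (whitespace-separated fields only); it is not how MORGAN reads the file.

```python
def parse_record(line, n_names=3, n_integers=7, n_reals=1):
    """Split one pedigree record into its names, integers and reals."""
    fields = line.split()
    if len(fields) != n_names + n_integers + n_reals:
        raise ValueError("unexpected number of fields in pedigree record")
    names = fields[:n_names]
    integers = [int(x) for x in fields[n_names:n_names + n_integers]]
    reals = [float(x) for x in fields[n_names + n_integers:]]
    return names, integers, reals

def overall_column(integer_index, n_names=3):
    """1-based column of the k-th integer, counting the names first."""
    return n_names + integer_index

names, integers, reals = parse_record("201 101 102 1 0 0 0 0 1 0 999.5")
# Integers 5 and 6 (1-based) are the paternal and maternal meiosis indicators.
indicators = (integers[4], integers[5])
```

For individual 201 this yields the indicator pair ‘0 1’, read from overall columns 8 and 9.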

Note that markerdrop can only simulate data for markers linked to exactly one trait locus, as specified in the ‘map’ statement in the parameter file.

For more information on markerdrop options see markerdrop statements.

See Concept Index for: markerdrop parameter file – conditional on inheritance pattern.


6.4 Running markerdrop examples and sample output

The markerdrop examples can be run while in the ‘Simulation/’ subdirectory. The syntax for running a MORGAN program is:

 
<./program> <parameter file> [> <output file>]
or
<program> <parameter file> [> <output file>]

if your PATH includes your current directory.

Note that if the output file command is not included, the results will print to the console. To run a simulation of marker data conditional on a trait model, type the following into the console:

 
./markerdrop ped73_mdrop_trait.par > mdrop_trait.out

Likewise to simulate marker data conditional on an inheritance pattern, type the following:

 
./markerdrop ped73_mdrop_inhe.par > mdrop_inhe.out

After running markerdrop with the parameter file ‘ped73_mdrop_inhe.par’ and the pedigree file ‘ped73.ped’ (as in the above example), the output file ‘mdrop_inhe.out’ is generated. Some sections of this output file are given below. Note that similar output would be generated using ‘ped73_mdrop_trait.par’.

 




Inter-locus distances in cM, using Haldane map function:

                            T1
 ----------------------------+------------------------------------------
   10.0   10.0   10.0    1.0    9.0   10.0   10.0   10.0   10.0   10.0
 +------+------+------+-------------+------+------+------+------+------+
M1     M2     M3     M4            M5     M6     M7     M8     M9     M10

 ......

 Assigned FGL in all listed individuals:
 trait locus, followed by 10 marker loci
 101  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1  2 1
 102  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3  4 3
 201  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3  2 3
 202  2 4  2 3  2 3  2 3  2 4  2 4  2 4  2 4  2 4  2 4  2 4
 2010  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5  6 5
 301  2 6  2 5  2 6  2 6  2 6  2 6  2 5  2 5  2 5  2 5  2 5
 302  2 6  3 6  2 6  2 6  2 6  2 5  2 5  2 5  2 6  2 6  2 6
 ......

 Assigned marker genotypes in accordance with data availability:
 101  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 102  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 201  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 202  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 2010  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 301  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0  0 0
 302  4 2  4 3  5 6  3 3  6 1  1 2  3 3  1 4  1 6  1 4
 ......

In the output file above, the marker map is shown, as specified in the parameter file. Below the map, founder genome labels (FGL) are listed. In this section of the pedigree, individuals 101, 102 and 2010 are founders and so each of them has been assigned two unique FGL. One of each founder’s FGL has been randomly selected to be passed to each of their offspring. Using the FGL, marker genotypes have been assigned to individuals on whom data were specified as available in the parameter file, individual 302 for example.

The FGL and genotypes output by markerdrop are ordered (or phased). That is, at each locus the paternal allele precedes the maternal allele. When these marker data are read by another MORGAN program, they are read as unphased.

Also note that the printlevel has been set to 5 in this example; without doing so, the default behavior would be to omit printing the marker map as well as the FGL data.

See Concept Index for: running markerdrop examples, markerdrop output, founder genome labels, phased and unphased genotypes.


6.5 markerdrop statements

See Concept Index for: markerdrop statements.


6.5.1 markerdrop computing requests

markerdrop always requires the following statement:

simulate [chromosome I] [dense] markers

This statement requests that markers are to be simulated. Whether the simulation is conditional on a trait model or on an inheritance pattern is inferred from the following statements. If the option to use dense markers is selected, descent is simulated by creating the recombination breakpoints in a meiosis, rather than simulating inheritance from marker to marker using recombination fractions.
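The difference between the two approaches can be sketched in Python. The sketch assumes crossovers with no interference, so that breakpoints in one meiosis form a Poisson process with rate one per Morgan (the model underlying the Haldane map function); whether the dense option in MORGAN uses exactly this model is an assumption here, and the function names are hypothetical.

```python
import random

def simulate_breakpoints(length_cm, rng):
    """Crossover positions (cM) along one meiosis: exponential gaps with mean 100 cM."""
    points = []
    pos = rng.expovariate(1.0 / 100.0)
    while pos < length_cm:
        points.append(pos)
        pos += rng.expovariate(1.0 / 100.0)
    return points

def indicators_at(positions_cm, breakpoints, start):
    """Meiosis indicator (0 or 1) at each position; it switches at every breakpoint passed."""
    out = []
    for x in positions_cm:
        n_crossed = sum(1 for b in breakpoints if b < x)
        out.append((start + n_crossed) % 2)
    return out

bp = simulate_breakpoints(100.0, random.Random(7))
example = indicators_at([10, 20, 30], [15.0, 25.0], 0)
```

Once the breakpoints are known, the inheritance at any number of marker positions is read off directly, which is why this approach suits dense maps better than drawing a recombination event for each adjacent pair of markers.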

markerdrop always requires one of the following two statements to establish whether the trait option or inheritance option is to be used.

select trait K

This statement requests the simulation of markers conditional on a trait model using trait K. If marker data are simulated conditional on a trait model, the user must specify trait allele frequencies, genotypic penetrances and a map position for the trait locus within the parameter file. Affection status of each individual must be specified in the pedigree file following gender, if present.

select inheritance H

This statement requests the simulation of markers conditional on an inheritance pattern at the trait locus. If marker data are to be simulated conditional on an inheritance pattern, the user must specify a map position for the trait locus within the parameter file. In addition, a pair of meiosis indicators for each individual must be included in the pedigree file following gender, if present. The first of the pair describes paternal inheritance at the trait locus and the second describes maternal inheritance. Inheritance indicators are coded as ‘0’, ‘1’ or ‘-1’, corresponding to segregation of the trait allele from the individual’s grandmother, grandfather, or unknown, respectively. For example, ‘0 0’ indicates that the individual inherited the alleles carried by both grandmothers at the trait locus, while ‘0 1’ indicates inheritance of the paternal grandmother’s and maternal grandfather’s alleles.
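The coding of the indicators can be made concrete with a small Python sketch that follows founder genome labels through one meiosis from each parent. The FGL values and function names are hypothetical; each parent's pair is written as (paternal, maternal), and an unknown indicator is returned as `None` rather than simulated.

```python
def inherited_fgl(parent_fgl, indicator):
    """FGL transmitted by one parent, given that parent's (paternal, maternal) FGL pair.

    indicator 0: the parent's maternal copy (from the child's grandmother)
    indicator 1: the parent's paternal copy (from the child's grandfather)
    indicator -1: unknown, left unresolved here
    """
    paternal, maternal = parent_fgl
    if indicator == 0:
        return maternal
    if indicator == 1:
        return paternal
    if indicator == -1:
        return None
    raise ValueError("indicator must be 0, 1 or -1")

def child_fgl(father_fgl, mother_fgl, indicators):
    """Child's (paternal, maternal) FGL from the pair of meiosis indicators."""
    return (inherited_fgl(father_fgl, indicators[0]),
            inherited_fgl(mother_fgl, indicators[1]))

# Hypothetical parents: father carries FGL (1, 2), mother carries FGL (3, 4).
both_grandmothers = child_fgl((1, 2), (3, 4), (0, 0))
mixed = child_fgl((1, 2), (3, 4), (0, 1))
```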

set traits K1… tlocs L1

This statement establishes the correspondence of traits to trait loci; it is used when the trait option is selected.

set inheritance H1… tlocs L1

This statement establishes the correspondence between loci and sets of partial inheritance indicators; it is used for the inheritance option. There may be more than one set of inheritance indicators assigned to a specific trait locus.

See Concept Index for: markerdrop computing requests marker simulation, marker simulation using trait, marker simulation using meiosis indicators.


6.5.2 markerdrop mapping model parameters

map [gender (F | M)] marker ( [Kosambi] distances | recombination fractions | [Kosambi] positions) X1 X2

This statement is required for markerdrop if more than one marker is to be simulated. It specifies the marker map (optionally a sex-specific map), in units of genetic distance (cM), marker position (cM), or recombination fraction. If distances are selected, markerdrop will expect one fewer value than the number of markers, as these are intermarker distances. If positions are selected, the same number of values as markers will be expected, as these are the positions of the markers relative to some zero point to the left of marker 1. If Kosambi is not specified, the Haldane map function is used to convert between genetic distance and recombination fraction.
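For reference, the standard Haldane and Kosambi map functions can be written as the following Python sketch, with distances in cM. These are the textbook formulas, not code taken from MORGAN, and the function names are hypothetical.

```python
import math

def haldane_theta(d_cm):
    """Recombination fraction for a distance in cM under the Haldane map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def haldane_cm(theta):
    """Distance in cM for a recombination fraction under the Haldane map function."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_theta(d_cm):
    """Recombination fraction for a distance in cM under the Kosambi map function."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

def kosambi_cm(theta):
    """Distance in cM for a recombination fraction under the Kosambi map function."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

def positions_to_distances(positions_cm):
    """Intermarker distances from marker positions: one fewer value than markers."""
    return [b - a for a, b in zip(positions_cm, positions_cm[1:])]

distances = positions_to_distances([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
trait_offset_cm = haldane_cm(0.01)
```

Under the Haldane function, the recombination fraction 0.01 used in the inheritance example corresponds to about 1.0 cM, which is the distance from M4 to the trait locus shown in the sample output map.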

map [gender (F | M)] tlocs K marker J ( [Kosambi] distance | recombination fraction ) X

This statement is required for markerdrop; it tells the program which trait locus to use in the simulation of marker data and gives a location for the trait locus, either as a map distance or recombination fraction, following the marker listed in the statement. As with genedrop, to simulate a trait locus position that precedes all markers, list the marker number as ‘0’.

See Concept Index for: markerdrop mapping model parameters


6.5.3 markerdrop population model parameters

set tlocs K1 allele frequencies X1 X2

This statement specifies trait locus allele frequencies. Trait loci must have two alleles; both allele frequencies must be listed and must sum to a value between 0.9999 and 1.0001. Otherwise markerdrop automatically normalizes the allele frequencies and issues a warning. Only one trait may be included in this statement.

set [chromosome I] marker names N1 N2...

This statement specifies marker names in the order of their position along the chromosome. Default names are marker-1, marker-2, etc.

set [chromosome I] markers K1 … allele frequencies X1 X2

Marker allele frequencies are specified using this statement. A marker can have up to 100 alleles and all allele frequencies must be listed. For each marker, the allele frequencies should sum to between 0.9999 and 1.0001. Otherwise they are automatically normalized and a warning message will be issued. Multiple markers can be included in a single statement if they have the same number of alleles with the same frequencies.
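The tolerance check described above amounts to the following Python sketch. The function name and return convention are hypothetical, and treating frequencies within the tolerance as left unchanged is an assumption about markerdrop's behavior rather than something stated in the text.

```python
def check_allele_freqs(freqs, tol=1e-4):
    """Return (frequencies, warned): rescale to sum to 1 when the sum is outside 1 +/- tol."""
    total = sum(freqs)
    if abs(total - 1.0) <= tol:
        return list(freqs), False
    return [f / total for f in freqs], True

ok_freqs, ok_warned = check_allele_freqs([0.13, 0.66, 0.16, 0.05])
fixed_freqs, fixed_warned = check_allele_freqs([0.5, 0.6])
```

In the second call the frequencies sum to 1.1, so they are rescaled to 0.5/1.1 and 0.6/1.1 and the warning flag is set.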

See Concept Index for: markerdrop population model parameters, trait allele frequencies, marker names, marker allele frequencies.


6.5.4 markerdrop computational parameters

set traits K1 … for tlocs L1 … incomplete penetrances X1 X2 X3

This statement is required for markerdrop when using a trait model or when using meiosis indicators with a discrete trait. A penetrance, the probability of expressing the trait given a particular trait locus genotype, must be specified for each of the 3 possible genotypes at the trait locus. For example ‘incomplete penetrances 0.15 0.85 0.99’ specifies that the probability of expressing the trait is 0.15, 0.85 and 0.99 for (1,1), (1,2) and (2,2) trait locus genotypes, respectively.

set trait K data discrete

This statement is optional. A discrete trait is the default when simulating conditional on trait data.

As with genedrop, marker seeds and trait seeds can be specified or the default values can be used; see genedrop computational parameters.

See Concept Index for: markerdrop computational parameters, penetrance, incomplete penetrance, discrete trait, trait data, marker seeds, trait seeds.


6.5.5 markerdrop input file options

The statements below are optional for markerdrop; they are used to indicate a change from the default order of trait values in the pedigree file. The first statement may be included if marker data are to be simulated conditional on a trait model and the second may be included if data are to be simulated conditional on an inheritance pattern.

input pedigree record traits K1 K2 K3 … integers I1 I2 I3

Unless this statement is present, the first integer following gender, if present, is assumed to be data for trait 1, the next integer for trait 2, and so on. Use this statement to specify an alternate correspondence between integer values in the record and trait numbers.

input pedigree record inheritance K1 K2 … integer pairs I11 I12 I21 I22

Unless this statement is present, the first two integers following gender, if present, in the pedigree file are assumed to be the meiosis indicators at the locus for trait 1. The next two integers are assumed to be the inheritance indicators at the locus for trait 2, and so on.

See Concept Index for: markerdrop input file options, pedigree record format.


This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.