
7. Estimating a priori ibd Probabilities by Monte Carlo

See Concept Index for: a priori ibd probabilities, identity by descent, ibd.



7.1 Introduction to ibddrop

ibddrop estimates probabilities of gene identity by descent (ibd), such as kinship, inbreeding, or multi-gene identities, by Monte Carlo in the absence of data. Given the pedigree and a genetic map, ibddrop simulates meiosis indicators and scores them to estimate the ibd probabilities among a set of gametes. As originally written, the parameter format of ibddrop was set up to parallel that of the program lm_auto (see Estimating Conditional IBD Probabilities by MCMC). The lm_auto program also estimates ibd probabilities, but does so conditionally on marker and (if requested) trait data. This format has been retained in the current ibddrop, but it is important to recognize that in ibddrop ‘markers’ and ‘tloc’ (trait locus) refer only to locations on the chromosome, and not to any allelic or phenotypic entities.

The simplest example of estimation of ibd probabilities among a set of gametes is the computation of an individual’s inbreeding coefficient. In this example, the set of gametes in question consists of the maternal and paternal gametes that make up the individual. A set of two gametes can be either ibd or not ibd. To keep track of ibd status among the gametes, we can label the paternal allele ‘1’. If the two alleles are ibd, the maternal allele is also labeled ‘1’, and the resulting ibd pattern is ‘1 1’. If the two alleles are not ibd, the maternal allele is labeled ‘2’, and the resulting pattern is ‘1 2’. The individual’s inbreeding coefficient is the probability that the two alleles follow the ‘1 1’ pattern.
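To make the Monte Carlo idea concrete, here is a minimal single-locus gene-dropping sketch in Python. It is not ibddrop's code: the toy pedigree (a full-sib mating) and all function names are hypothetical. Each founder gene receives a distinct label, each meiosis transmits one of the parent's two genes with probability 1/2, and the inbreeding coefficient is estimated as the fraction of realizations in which the child's two genes carry the same label, i.e. follow the ‘1 1’ pattern.

```python
import random


def estimate_inbreeding(n_realizations: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the inbreeding coefficient of the child C of
    a full-sib mating, by gene dropping at a single locus.

    Toy pedigree (hypothetical): founders F (male) and M (female) have a
    son S and a daughter D; S and D are the parents of C.
    """
    rng = random.Random(seed)
    n_ibd = 0
    for _ in range(n_realizations):
        # Give each founder gene a distinct label: (maternal, paternal).
        f_genes = (1, 2)
        m_genes = (3, 4)
        # Each meiosis transmits one of the parent's two genes with
        # probability 1/2; a genotype is stored as (maternal, paternal).
        s = (rng.choice(m_genes), rng.choice(f_genes))
        d = (rng.choice(m_genes), rng.choice(f_genes))
        c = (rng.choice(d), rng.choice(s))
        # C's two genes are ibd (pattern '1 1') if they carry the same label.
        if c[0] == c[1]:
            n_ibd += 1
    return n_ibd / n_realizations


print(estimate_inbreeding(20000))  # exact value is 1/4; the estimate should be close
```

ibddrop works with meiosis indicators over a whole map of loci rather than a single locus, but the scoring principle, counting realizations that show a given pattern, is the same.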

If there are three gametes in the set, there are five potential ibd patterns: ‘1 1 1’ (all three gametes are ibd), ‘1 1 2’ (the first two are ibd and the third is not), ‘1 2 1’ (the first and third are ibd), ‘1 2 2’ (the last two are ibd), and ‘1 2 3’ (none are ibd). ibddrop can estimate probabilities of ibd patterns among up to 10 gametes in a set, and outputs a probability for each ibd pattern at each marker.
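The possible patterns can be listed mechanically: the first gamete is labeled 1, and each later gamete takes either the label of an earlier gamete it is ibd with or the next unused label. The following Python sketch (function name hypothetical) enumerates them in lexicographic order, which is also the order of the patterns in the sample output for four gametes. The number of patterns grows quickly: 5 for three gametes, 15 for four, and 115975 for ten.

```python
def ibd_patterns(n: int) -> list[tuple[int, ...]]:
    """List all ibd patterns among n gametes in lexicographic order.

    The first gamete is labeled 1; each later gamete gets either the label
    of an earlier gamete it is ibd with, or the next unused label.
    """
    patterns = []

    def extend(prefix: list[int], max_label: int) -> None:
        if len(prefix) == n:
            patterns.append(tuple(prefix))
            return
        for label in range(1, max_label + 2):
            prefix.append(label)
            extend(prefix, max(max_label, label))
            prefix.pop()

    if n > 0:
        extend([1], 1)
    return patterns


for p in ibd_patterns(3):
    print(" ".join(map(str, p)))
```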

Gene identity can be scored either for each locus separately, in which patterns of identity among up to ten gametes can be scored, or it can be scored jointly over a moving window of several loci. If the moving window option is selected, ibddrop estimates the probabilities of each ibd/non-ibd pattern at loci across the window, for the specified pair of gametes.

For MORGAN V3.4, the ibddrop program has been extensively revised.

See Concept Index for: ibddrop introduction, ibd pattern, meiosis indicators, simulation of descent in a pedigree.



7.2 Sample ibddrop parameter file

See Concept Index for: ibddrop sample parameter file.



7.2.1 Classic ibddrop parameter file

The example parameter file ‘jv_ped_ibd.par’ in the ‘IBD’ subdirectory of ‘MORGAN_Examples’ has been updated so that it will run under MORGAN V3.4, as have the example parameter files ‘ibddrop1.par’ and ‘ibddrop2.par’. Details may be found in ‘README_ibd’ in that same subdirectory. However, for convenience, we describe here examples in the ‘Gold’ subdirectory of the ‘Genedrop’ program directory. One such example is the parameter file parIBD_LL:

 
set printlevel 5

input pedigree file "ped45"

simulate markers
simulate tloc 11

map          markers   recomb fract .18 .1 .1 .1
map tloc 11  marker 2  recomb fract .06

set component 1  scoreset 1  proband gametes 331 0 333 1
set component 1  scoreset 2  proband gametes 531 0 531 1 331 0 333 1
set component 2  scoreset 1  proband gametes 3v1 0 3v3 1
set component 2  scoreset 2  proband gametes 5v1 0 5v1 1 3v1 0 3v3 1
set component 3  scoreset 2  proband gametes 5w1 0 5w1 1 3w1 0 3w3 1
set component 3  scoreset 1  proband gametes 3w1 0 3w3 1

set sampler seeds 0x8a226a51 0xd2978c71

simulate 20000 ibd realizations

The parameter file specifies the pedigree file name ‘ped45’ and then requests simulation at markers and at one trait locus. The number of markers is determined by the map statement: since 4 recombination fractions are provided, there will be 5 marker locations in addition to the trait location. Note that, since there are no data, this is simply a way to specify 6 locations, one of which (the tloc) may play a special role and/or may be unlinked. ‘ped45’ contains 45 individuals, who form 3 replicates of the JV pedigree. The file also includes gender and a ‘trait’, but trait data are ignored by ibddrop. (Other programs that use the trait data, such as lm_auto, may also use this same pedigree file. See Estimating Conditional IBD Probabilities by MCMC.)

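The two ‘map’ statements specify the genetic map. Using the Haldane map function (as reported in the chromosome map printed by ibddrop), the recombination fractions in the first statement correspond to distances between consecutive markers of 22.3, 11.2, 11.2 and 11.2 centiMorgans. From the second statement, the trait locus lies between markers 2 and 3, at 6.4 centiMorgans (recombination fraction 0.06) from marker 2.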
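These distances follow from the Haldane map function, d = -½ ln(1 - 2r) Morgans, which is the map function named in the chromosome map of the sample output. A minimal Python sketch of the conversion (the function name is illustrative only) reproduces the numbers for parIBD_LL:

```python
import math


def haldane_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to a Haldane
    map distance in centiMorgans: d = -50 ln(1 - 2r)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Recombination fractions from the 'map markers' statement in parIBD_LL.
marker_rfs = [0.18, 0.1, 0.1, 0.1]
print([round(haldane_cm(r), 1) for r in marker_rfs])  # [22.3, 11.2, 11.2, 11.2]
# Trait locus offset from marker 2 ('map tloc 11 marker 2 recomb fract .06').
print(round(haldane_cm(0.06), 1))  # 6.4
```

The remaining gap from the trait locus to marker 3 is 11.2 - 6.4, about 4.8 cM, matching the printed map.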

The ‘set proband gametes’ statements tell ibddrop which gametes to score: that is, the gametes among which the ibd probabilities will be estimated. In the first statement, we selected, from component 1 (the first family in the data set), the maternal (0) gamete of ‘331’ and the paternal (1) gamete of ‘333’. The second statement selects a set of four gametes from the same component, and the remaining statements select analogous sets of two and four gametes from components 2 and 3. Note that characters are allowed in the names of individuals.

In this example the seeds are set directly by the ‘set sampler seeds’ statement (see ibddrop statements). Alternatively, seeds can be read from a file: an ‘input seed file’ statement makes the program use the seeds from a file such as ‘sampler.seed’, and an ‘output overwrite seed file’ statement allows the program to replace the contents of the seed file with the newly generated seeds. If the overwrite option were omitted, the new seeds would be appended to the end of the file when the program finished running.

The number of Monte Carlo realizations is set to be 20,000 by the ‘simulate ibd realizations’ statement.

Note that to compute a multilocus ibd probability, the statement ‘set locus window’ can be used to specify the number of loci to score jointly. ibddrop has limited functionality for computing multilocus probabilities: it can only examine two gametes to determine whether or not they are ibd. For an example, see the parameter file parIBD_WL, where the statement ‘set locus window 3’ is included and each proband gamete set consists of two gametes only. For additional options, including scoring of a specific multi-gamete ibd pattern over two or more loci, see Sample lm_auto parameter file: lm_auto has this more general option of scoring multi-gamete ibd patterns.

See Concept Index for: map function, proband gametes, seeds for sampler, seed file.



7.2.2 Using dense marker simulation

An example of the ‘dense’ simulation is given in the ‘Gold’ file dense_markers.par:

 
set printlevel 5

map chromosome 2 markers  recomb fract .1 .05 .15 .2
map chromosome 3 markers  recomb fract .04 .12 .1 .1
map chromosome 4 markers  recomb fract .18 .1 .1 .1

set component 1  scoreset 1  proband gametes 331 0 333 1
set component 1  scoreset 2  proband gametes 531 0 531 1 331 0 333 1
set component 2  scoreset 1  proband gametes 3v1 0 3v3 1
set component 2  scoreset 2  proband gametes 5v1 0 5v1 1 3v1 0 3v3 1
set component 3  scoreset 1  proband gametes 3w1 0 3w3 1
set component 3  scoreset 2  proband gametes 5w1 0 5w1 1 3w1 0 3w3 1

simulate chromosome 4 dense markers

simulate 100 ibd realizations

# Tell ibddrop where to read the pedigree from.
input pedigree file           "./ped45"

# Provide a file name for the ibddrop ibdgraphs results file.
output overwrite scores file  "./dense_markers.ibdgraphs"

# Set the sampler seeds.
set sampler seeds 0x8a226a51 0xd2978c71

# The following select markers parameter statement tells ibddrop
# where to score and output IBD probabilities at.

select chromosome 4 markers 1 3 5

Here the pedigree ‘ped45’ and proband gametes are as before. Locations on 3 different chromosomes are given, but just one (here ‘Chromosome 4’) must be chosen for simulation of ibd. The ‘simulate dense markers’ option specifies that simulation will be by generating recombination breakpoints in a continuum, rather than using marker-to-marker computations, and again a number of ‘ibd realizations’ is requested. An ‘output scores file’ may optionally be provided; if it is, the simulated ‘ibdgraphs’ will be output in compact format (see The ibd_class utility). The scoring of IBD patterns over loci and the output of IBD probabilities remain the same as before. Optionally, the ‘select markers’ statement can be used to specify a subset of the original ‘marker’ locations at which to score ibd. If this statement is not present, ibd will be scored at all the mapped locations for that chromosome. The length of the simulated chromosome is determined by the distance between the first and the last selected markers.

Although only a few locations are specified in this small example, the ‘dense’ option is intended for cases where there are many marker locations, so that marker-to-marker simulation would be inefficient. If the ‘ibdgraph’ is output, subsets of marker locations may later be selected for scoring without re-simulation. See Sample parameter files for ibd_create; fgl2ibd and fgl2haplo and fgl2dgl. Since scoring is done at each specified location, this part of the computation will be linear in the number of markers selected. With very large numbers of selected markers, ‘printlevel 3’ is suggested in place of ‘printlevel 5’, to avoid printing the marker map. The capability of scoring jointly over a moving window is not currently available for the ‘dense’ option.
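A minimal sketch of simulating one meiosis by generating breakpoints, assuming no crossover interference (the Haldane model, consistent with the Haldane map function used in the printed chromosome map). Under that assumption the breakpoints on a chromosome of length L Morgans form a Poisson process of rate 1 per Morgan. The function names are hypothetical and this is not MORGAN's implementation; it shows only why the cost depends on the number of breakpoints and scored positions rather than on marker-to-marker steps.

```python
import bisect
import math
import random


def poisson(mean: float, rng: random.Random) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_meiosis_dense(length_morgans, positions, rng):
    """One meiosis on a chromosome of the given length (in Morgans), assuming
    no interference: breakpoints form a Poisson process of rate 1 per Morgan.
    Returns, for each position in [0, length_morgans], the parental strand
    (0 or 1) transmitted at that position."""
    n_breaks = poisson(length_morgans, rng)
    breaks = sorted(rng.uniform(0.0, length_morgans) for _ in range(n_breaks))
    start = rng.randrange(2)  # strand transmitted at the left end
    # The strand switches once at every breakpoint to the left of a position.
    return [start ^ (bisect.bisect_left(breaks, x) % 2) for x in positions]


rng = random.Random(7)
n = 20000
# Two positions 0.2 Morgans apart on a 0.5 Morgan chromosome.
recombinant = sum(
    a != b for a, b in (simulate_meiosis_dense(0.5, [0.1, 0.3], rng) for _ in range(n))
)
print(recombinant / n, (1 - math.exp(-0.4)) / 2)  # empirical vs Haldane value
```

The empirical recombination fraction between two scored positions should match the Haldane value (1 - e^(-2d))/2 for their distance d, which is what makes this equivalent, up to Monte Carlo variation, to marker-to-marker simulation.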

See Concept Index for: dense markers



7.2.3 Selection against autozygosity

Examples in which selection is imposed at the first location (the ‘tloc’) are provided in the two Gold files ‘ibd_cleo_sparse.par’ and ‘ibd_cleo_dense.par’. An additional example, ‘cleo_segments.par’, is included in the ‘IBD’ subdirectory of ‘MORGAN_Examples’. These examples are so named because they use the very highly inbred Cleopatra pedigree, ‘cleopatra.ped’.

 
# for running "ibddrop"  with selection against autozygosity and against
#    mom-identity, at target locus 0,  on Cleopatra pedigree
# Aug 6, 2017 -- new version for Gold standard
#    Note this version prints the FGL statistics both to file and to output
#    It also prints autozygosity to the extra output file
#

input pedigree file 'cleopatra.ped'

# Include everything in the output file.
set printlevel 5

simulate markers
simulate tloc 11

map          markers   distances  1 1 2 3 5 8 13 21 34 55
map tloc 11  marker 0  distance 1

set component 1  scoreset 1  proband gametes 
Ptolemy-VIII  0 Ptolemy-VIII 1 Cleopatra-III  0 Cleopatra-III 1
set component 1  scoreset 2  proband gametes 
Berenice-III  0 Berenice-III 1 Cleopatra-VII 0 Cleopatra-VII  1
set component 1  scoreset 4  proband gametes 
Cleopatra-V 0 Cleopatra-V 1 Cleopatra-VII 0 Cleopatra-VII  1

set sampler seeds 0x8a226a51 0xd2978c71

simulate 100000 ibd realizations

output overwrite extra file "ibd_cleo_sparse.scor"

set dummy reals 0.9, 0.5, 0.9    # selection at 0.9 against autozygosity
                                 # selection at 0.5 against mom-identity
                                 #    and 0.9 for both

Much of the parameter file is as before, but note that the tloc at which selection is to be imposed should be at the left end of the set of markers at which ibd is to be scored. As before, ibd probabilities among the proband gametes are printed to the standard output. Since in ibddrop the ‘output scores file’ (if requested) is used to save the realized ibd graphs, the ‘output extra file’ is used to print the autozygosity statistics to file for possible later analyses.

Selection is imposed at the target tloc by means of three selection indices input via the ‘set dummy reals’ statement (see Input extra variables). The three numbers (0.9, 0.5 and 0.9 in the above example) are viability weights relative to a norm of 1. The first is for a potentially autozygous offspring (ibd between the two offspring gametes), the second for a potential offspring that is identical (ibd at both gametes) to its mother, and the third for a potential offspring with both characteristics (so that all four gametes of offspring and mother are ibd). Selection at the ‘tloc’ is imposed by modifying the segregation at this location. Descent at the linked markers is then simulated conditional on descent at the ‘tloc’. Note that selection does not change the pedigree structure, but only the segregation of DNA within the pedigree. The Cleopatra pedigree is chosen for the example because its extreme inbreeding results in major effects of both types of selection.
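One way to picture "modifying the segregation" is sketched below in Python. This is an assumption for illustration, not a description of ibddrop's internal algorithm: for a single offspring, the four possible segregation outcomes at the tloc (which maternal gene and which paternal gene are transmitted) are each weighted by the relative viability of the resulting child, and one outcome is sampled in proportion to those weights. The weight order follows ‘set dummy reals’ (autozygosity, mom-identity, both); the gene labels and function names are hypothetical.

```python
import random


def viability(child, mother, w_auto, w_mom, w_both):
    """Relative viability of a potential child, given founder-gene labels
    (maternal, paternal) at the selected locus for the child and its mother."""
    autozygous = child[0] == child[1]
    mom_identical = sorted(child) == sorted(mother)
    if autozygous and mom_identical:
        return w_both  # all four genes of child and mother are ibd
    if autozygous:
        return w_auto
    if mom_identical:
        return w_mom
    return 1.0


def segregation_probs(mother, father, weights):
    """List the four segregation outcomes (child's maternal gene from the
    mother, paternal gene from the father) with selection-adjusted probabilities."""
    outcomes = [(mg, fg) for mg in mother for fg in father]
    ws = [viability(o, mother, *weights) for o in outcomes]
    total = sum(ws)
    return [(o, w / total) for o, w in zip(outcomes, ws)]


def sample_child(mother, father, weights, rng):
    """Sample the child's gene labels at the selected locus under selection."""
    outcomes, probs = zip(*segregation_probs(mother, father, weights))
    return rng.choices(outcomes, weights=probs, k=1)[0]


# Weights as in 'set dummy reals 0.9, 0.5, 0.9'; gene labels are made up.
for outcome, p in segregation_probs((1, 2), (1, 3), (0.9, 0.5, 0.9)):
    print(outcome, round(p, 4))
```

Without selection each outcome would have probability 1/4; here the autozygous outcome (1, 1) gets weight 0.9 and the mom-identical outcome (2, 1) weight 0.5, so their probabilities fall below 1/4 after normalization.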

The parameter file for dense marker simulation in this example is ibd_cleo_dense.par. It differs only in the statements:

 
simulate dense markers
output overwrite extra file "ibd_cleo_dense.scor"

Apart from Monte Carlo variation, the results for these two versions should be the same. Only the methods of simulating meiosis across the chromosome differ: the ‘sparse’ realizations are done marker to marker using recombination fractions, while the ‘dense’ version generates recombination breakpoints.
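For comparison with breakpoint generation, a minimal sketch of the ‘sparse’ approach, assuming no interference so that the transmitted strand follows a Markov chain along the markers. Between consecutive markers the strand switches with the corresponding recombination fraction. The function name is hypothetical; this is not MORGAN's code.

```python
import random


def simulate_meiosis_sparse(recomb_fracs, rng):
    """One meiosis simulated marker to marker: the strand (0 or 1) at the
    first marker is chosen with probability 1/2, and between markers i and
    i+1 it switches with probability recomb_fracs[i] (no interference)."""
    strand = rng.randrange(2)
    strands = [strand]
    for r in recomb_fracs:
        if rng.random() < r:
            strand ^= 1
        strands.append(strand)
    return strands


rng = random.Random(11)
n = 20000
sims = [simulate_meiosis_sparse([0.1, 0.1], rng) for _ in range(n)]
# Recombination between markers 1 and 3: r13 = r12 + r23 - 2 * r12 * r23 = 0.18.
print(sum(s[0] != s[2] for s in sims) / n)
```

Under no interference the two approaches produce the same joint distribution of strands at the scored locations, which is why only Monte Carlo variation separates the sparse and dense results.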

An alternative is provided in the parameter file cleo_segments.par. This file was not included in the MORGAN V3.4 release, but has since been added to ‘MORGAN_Examples/IBD’. It is very similar to the ibd_cleo_dense.par file, but differs in that the FGL graphs are additionally printed, using the ‘output scores file’. From the FGL graphs, segments of ibd among any set of gametes can be reconstructed and scored, rather than scoring marker by marker.

See Concept Index for: Cleopatra pedigree, selection against autozygosity, selection against maternal identity.



7.3 Running ibddrop example and sample output

The syntax for running this MORGAN program is:

 
<./program> <parameter file> [ > <output file name> ]

where, optionally, ‘>’ redirects the standard output (<stdout>) to an output file instead of to the screen.

For example, the ‘parIBD_LL’ example can be run in the ‘Gold’ subdirectory of ‘Genedrop’ with the following command:

 
../ibddrop parIBD_LL  > ibddrop.out

The genetic map specified by the statements ‘map markers recomb fract’ and ‘map tloc 11 marker 2 recomb fract’ is shown below. Note the position of the trait locus (T11) with respect to the marker loci.

 
 Chromosome map
 ..............

 Inter-locus distances in cM, using Haldane map function:

              T11
 --------------+---------------------
   22.3    6.4    4.8   11.2   11.2
 +------+-------------+------+------+
M1     M2            M3     M4     M5

Since the parameter file contains six ‘set proband gametes’ statements, ibddrop will produce six sets of results in the output file (here ‘ibddrop.out’).

The exact probability estimates will, of course, depend on the random seed used. Some example results for the second component are detailed below.

 
 Summary for component 2:

    Probabilities of IBD patterns

       Proband gamete set 1:  3v1 0  3v3 1

       pattern marker-1 marker-2  tloc-11 marker-3 marker-4 marker-5    label

          1 1    .2526    .2497    .2517    .2509    .2495    .2500        0
          1 2    .7473    .7503    .7483    .7491    .7506    .7500        1

    Probabilities of IBD patterns

       Proband gamete set 2:  5v1 0  5v1 1  3v1 0  3v3 1

       pattern marker-1 marker-2  tloc-11 marker-3 marker-4 marker-5    label

       1 1 1 1    .0314    .0300    .0309    .0303    .0290    .0294        0
       1 1 1 2    .0284    .0272    .0278    .0288    .0290    .0289        1
       1 1 2 1    .0149    .0144    .0135    .0129    .0137    .0143        3
       1 1 2 2    .0100    .0087    .0092    .0092    .0092    .0092        4
       1 1 2 3    .0263    .0301    .0283    .0277    .0271    .0263        5
       1 2 1 1    .0658    .0649    .0637    .0627    .0631    .0625        6
       1 2 1 2    .0056    .0051    .0060    .0060    .0056    .0063        7
       1 2 1 3    .0589    .0593    .0583    .0590    .0588    .0594        8
       1 2 2 1    .0648    .0678    .0697    .0701    .0714    .0698        9
       1 2 2 2    .0494    .0505    .0503    .0507    .0486    .0493       10
       1 2 2 3    .1366    .1416    .1389    .1386    .1393    .1387       11
       1 2 3 1    .1366    .1348    .1349    .1346    .1330    .1349       12
       1 2 3 2    .0296    .0279    .0265    .0251    .0260    .0280       13
       1 2 3 3    .0961    .0955    .0975    .0979    .0995    .0995       14
       1 2 3 4    .2455    .2420    .2444    .2464    .2469    .2434       15

The probabilities are summarized by the ibd pattern. Each integer in the pattern represents one of the gametes that ibddrop was asked to score. Same numbers indicate gametes that are ibd. For instance, ‘1 1 1 1’ means all four gametes are ibd; ‘1 2 1 1’ means gametes 1, 3, and 4 are ibd, while gamete 2 is not ibd with the others; ‘1 2 3 4’ means that no two of the four gametes are ibd.
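In a gene-dropping realization each proband gamete carries the label of the founder gene it descends from, and these labels must be converted to the canonical pattern before counting. A minimal Python sketch of that relabeling and tallying (function names hypothetical, not ibddrop's code): the first distinct label becomes 1, the next new one 2, and so on.

```python
from collections import Counter


def canonical_pattern(labels):
    """Relabel founder-gene labels (one per proband gamete, in order) as an
    ibd pattern: the first distinct label becomes 1, the next new one 2, ..."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping) + 1) for lab in labels)


def pattern_probabilities(realizations):
    """Estimate each pattern's probability as its relative frequency."""
    counts = Counter(canonical_pattern(r) for r in realizations)
    n = len(realizations)
    return {pat: c / n for pat, c in counts.items()}


print(canonical_pattern([7, 3, 7, 7]))  # (1, 2, 1, 1)
```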

The ibd patterns are scored for each locus separately; there is a column for each of the five markers and one for the trait locus. The final column, ‘label’, is a label for the state that can be easily inverted to obtain the ibd pattern; its main use is internal to the program. However, in the lm_auto program one may request scoring of specific ibd patterns by specifying the desired state labels (see Sample lm_auto parameter file).

To compute multilocus ibd probabilities, say for 3 loci, use the parameter file ‘parIBD_WL’ which contains the line ‘set locus window 3’. The interesting part of the output for component 2 is:

 
    Probabilities of IBD patterns for windows of 3 loci

       Proband gamete set 1:  5v1 0  5v1 1

         IBD  wndw 1 wndw 2 wndw 3 wndw 4

       0 0 0   .7826  .7680  .7902  .7919
       0 0 1   .0255  .0459  .0473  .0471
       0 1 0   .0712  .0377  .0264  .0249
       0 1 1   .0109  .0374  .0257  .0274
       1 0 0   .0312  .0694  .0488  .0484
       1 0 1   .0496  .0062  .0050  .0047
       1 1 0   .0045  .0161  .0267  .0268
       1 1 1   .0244  .0192  .0300  .0288

This time, ibddrop was asked to compute ibd probabilities in windows of three loci at a time. The four windows can be seen from the marker map: (M1, M2, T11), (M2, T11, M3), (T11, M3, M4), (M3, M4, M5). If the trait locus were unlinked to the marker loci, it would be placed to the left of the five marker loci on the map, and the first window, ‘wndw 1’, would then include the trait locus and the first two marker loci. The values in the ‘IBD’ column at the left of the table represent ibd patterns over the loci of a window, with 1 indicating ibd and 0 not ibd. The pattern ‘0 0 0’ means that the selected gametes are not ibd at any of the three loci in the window. The pattern ‘0 0 1’ means that the selected gametes are not ibd at the first two loci in the window, but are ibd at the third. The values in the columns give the probability of the ibd pattern at the left for each of the four windows. For example, the probability that the maternal and paternal gametes of individual 5v1 are ibd at marker loci 3 and 5, but not at marker locus 4, is 0.0047.
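The moving-window tally can be sketched as follows, assuming each realization has already been reduced to a 0/1 ibd indicator per locus for the two proband gametes. With 6 loci and a window of 3 there are 6 - 3 + 1 = 4 windows, as in the output above. The function name and the toy realizations are hypothetical, not ibddrop output.

```python
from collections import Counter


def window_pattern_probs(realizations, window):
    """Estimate probabilities of 0/1 ibd patterns over moving windows.

    realizations: one list per Monte Carlo realization, giving at each locus
    (in map order) 1 if the two proband gametes are ibd there and 0 if not.
    Returns one dict per window, mapping each pattern over the window's
    loci to its relative frequency.
    """
    n_loci = len(realizations[0])
    n_windows = n_loci - window + 1
    counts = [Counter() for _ in range(n_windows)]
    for indicators in realizations:
        for w in range(n_windows):
            counts[w][tuple(indicators[w : w + window])] += 1
    n = len(realizations)
    return [{pat: c / n for pat, c in cnt.items()} for cnt in counts]


# Two made-up realizations over 6 loci (M1, M2, T11, M3, M4, M5).
toy = [[0, 0, 1, 1, 0, 1], [1, 1, 1, 0, 0, 0]]
probs = window_pattern_probs(toy, 3)
print(len(probs))  # 4 windows, as in the parIBD_WL output
```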

The format of output for the dense marker options is effectively identical to the above: only the simulation of meioses and recombination breakpoints differ. Example outputs for the beta-test version of ibddrop with selection may be found in the ‘Gold’ subdirectory of ‘Genedrop’. The relevant files all contain the string ‘cleo’, since the examples use the Cleopatra pedigree. The output for this beta-test program will not be further discussed here.

The alternative of outputting FGL graphs, and scoring ibd directly by segments, as mentioned in Selection against autozygosity, is provided in the parameter file ‘cleo_segments.par’, which is included for reference in the ‘IBD’ subdirectory of ‘MORGAN_Examples’. Note that, for compatibility with MORGAN V3.4, it is also necessary to produce a minimal amount of the regular scoring output. However, the intention would be to use the simulated FGL graphs in any downstream analyses.

See Concept Index for: running ibddrop example, ibddrop sample output, ibd pattern.



7.4 ibddrop statements

Note that ibddrop does not simulate or use marker or trait data. The statements are used only to specify the map of the loci at which descent is to be simulated and ibd scored. The locations of loci are specified in this way so that direct comparisons can be made between the output of ibddrop and of lm_auto (see Running lm_auto example and sample output), where simulation is conditional on marker and trait data.

The additional ibddrop statements are:

simulate [chromosome I] [dense] markers

This statement requests that markers be simulated. For ibddrop, the name ‘markers’ refers only to chromosome locations, not to actual alleles or individual genotypes. The number of markers (i.e. locations) is inferred from the marker map. If the option to use dense markers is selected, descent is simulated by creating the recombination breakpoints in a meiosis, rather than simulating inheritance from marker to marker using recombination fractions.

simulate tloc L

This statement, which typically follows the simulate markers statement, establishes the trait locus to be simulated. Note that this trait locus must be mapped onto the chromosome selected for marker simulation.

map tlocs L1 … unlinked

This statement specifies a trait to be simulated that is not linked to markers. Only one trait can be simulated and this trait will be placed to the left of all markers.

set [component M] proband gametes N1 K1 N2 K2...

In this statement, the user specifies which gametes ibddrop is to score. Each statement must contain gametes from a single component, as the components are assumed to be independent, i.e. the probability of ibd between gametes from different components is zero. Pairs consisting of an individual’s name and a meiosis indicator are listed, with ‘0’ indicating the individual’s maternal gamete and ‘1’ indicating their paternal gamete.

In the current version of MORGAN, the number of proband gametes in a set is limited to 10.
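The statement's structure and its stated constraints can be illustrated with a small parser. This is a hypothetical Python sketch, not MORGAN's parameter reader: it accepts optional keyword/value pairs such as ‘component 2’ and ‘scoreset 2’ before ‘proband gametes’, requires each individual to be followed by 0 (maternal) or 1 (paternal), and enforces the limit of 10 gametes. It cannot check that all individuals belong to the same component, since that needs the pedigree.

```python
def parse_proband_gametes(statement: str, max_gametes: int = 10):
    """Parse 'set [component M] [scoreset S] proband gametes N1 K1 N2 K2 ...'.

    Returns (options, gametes): options maps keywords such as 'component'
    and 'scoreset' to integers; gametes is a list of (individual, gamete)
    pairs, with K = 0 read as 'maternal' and K = 1 as 'paternal'.
    """
    tokens = statement.split()
    if not tokens or tokens[0] != "set" or "proband" not in tokens:
        raise ValueError("not a 'set ... proband gametes' statement")
    p = tokens.index("proband")
    if tokens[p + 1 : p + 2] != ["gametes"]:
        raise ValueError("expected 'proband gametes'")
    head = tokens[1:p]
    if len(head) % 2 != 0:
        raise ValueError("options before 'proband' must be keyword/value pairs")
    options = {head[j]: int(head[j + 1]) for j in range(0, len(head), 2)}
    rest = tokens[p + 2 :]
    if not rest or len(rest) % 2 != 0:
        raise ValueError("each individual must be followed by a 0 or 1")
    gametes = []
    for name, k in zip(rest[0::2], rest[1::2]):
        if k not in ("0", "1"):
            raise ValueError(f"gamete indicator for {name} must be 0 or 1, got {k}")
        gametes.append((name, "maternal" if k == "0" else "paternal"))
    if len(gametes) > max_gametes:
        raise ValueError(f"at most {max_gametes} proband gametes are allowed")
    return options, gametes


print(parse_proband_gametes(
    "set component 2  scoreset 2  proband gametes 5v1 0 5v1 1 3v1 0 3v3 1"))
```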

set [chromosome I] locus window K

This statement gives the window size (number of loci) for which the multilocus ibd probabilities are scored. If no size is given, each locus is scored separately.

set sampler seeds H1 H2

This statement initializes a pair of seeds for the random number generator. The seeds must be positive and no greater than ‘0xFFFFFFFF’, with the first seed (congruential seed) odd, and the second seed (Tausworthe seed) nonzero. If no seeds are specified, default seeds are used.
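The seed constraints above can be checked mechanically. A hypothetical Python sketch (not part of MORGAN) that parses the statement, accepting hexadecimal seeds with a ‘0x’ prefix as in the examples, and rejects pairs that break the stated rules:

```python
def parse_sampler_seeds(statement: str) -> tuple[int, int]:
    """Parse 'set sampler seeds H1 H2' (hexadecimal such as 0x8a226a51,
    or decimal) and check the stated constraints on the seed pair."""
    tokens = statement.split()
    if tokens[:3] != ["set", "sampler", "seeds"] or len(tokens) != 5:
        raise ValueError("expected 'set sampler seeds H1 H2'")
    h1, h2 = (int(t, 0) for t in tokens[3:])  # base 0 accepts a 0x prefix
    for h in (h1, h2):
        if not 0 < h <= 0xFFFFFFFF:
            raise ValueError(f"seed {h:#x} must be positive and at most 0xFFFFFFFF")
    if h1 % 2 == 0:
        raise ValueError("the first (congruential) seed must be odd")
    # The second (Tausworthe) seed must be nonzero; positivity already ensures that.
    return h1, h2


print([hex(h) for h in parse_sampler_seeds("set sampler seeds 0x8a226a51 0xd2978c71")])
```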

simulate K ibd realizations

This statement is required for ibddrop. It sets the number K of Monte Carlo realizations to be simulated.

See Concept Index for: ibddrop statements, proband gametes, meiosis indicators, seeds for sampler.



This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.