4. Computing Kinship and Other Pedigree Computations

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

4.1 Introduction to `kin`

kin computes kinship coefficients for pairs of pedigree members. It also computes single-locus and two-locus inbreeding coefficients for members of the pedigree. Briefly, the kinship coefficient between individuals i and j is the probability that a randomly-drawn allele from i is identical by descent (ibd) to randomly-drawn allele from individual j at the same locus. A single-locus inbreeding coefficient is the probability that an individual carries two copies of a gene that are ibd, at a given autosomal locus. In other words, an individual’s single-locus inbreeding coefficient is equal to the kinship coefficient of his parents, as an individual’s gametes can be thought of as random draws from his parents’ chromosomes. A two-locus inbreeding coefficient is the probability that an individual carries two ibd copies of a gene at each of two linked loci. kin presents two-locus inbreeding coefficients as a function of the recombination fraction between the two loci.

Note: The kin program does check for duplicate requests within a pedigree component of any inbreeding or kinship coefficients; each will be computed once only, even if the individuals are specified in reverse order. However, it does check (and quits with an error) if a request is made for kinship of an individual with him/her self. This bug will be fixed in a future MORGAN release.

See Concept Index for: kin introduction, kinship coefficient, inbreeding coefficient, ibd,

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

4.2 Sample `kin` parameter file

Files for kin may be found in the ‘IBD’ subdirectory of ‘MORGAN_Examples’. Below is a sample kin parameter file, ‘jv_rep_kin.par’.

set printlevel 5
input pedigree file 'jv_rep.ped'
compute component 1 kinship coeff            531 431   431 432
compute component 1 inbreeding coeff         332   531
compute component 2 kinship coeff            341  442
compute component 2 inbreeding coeff         441   541
compute component 1 two-locus inbreed coeff  531
compute component 2 two-locus inbreed coeff  441
set recomb freqs .01 .05 .04 .10 .18 .30 .50 .0

The statements on lines 3 – 8 request computation of kinship coefficients for the pairs ‘531 431’ and ‘431 432’, and then inbreeding coefficients for individuals ‘332’ and ‘531’, from component 1. It then requests kinship coefficients for the pair ‘341 442’ and inbreeding coefficients for individuals ‘441’ and ‘541’ from component 2. Finally, it requests the two-locus inbreeding coefficient for ‘531’ from component 1 and ‘441’ from component 2. The two-locus inbreeding coefficient will be computed for two loci at distances specified in the ‘set recomb freqs’ statement. (Note these need not be ordered, but the program will order them in the output.) If there is more than one component (connected pedigree) in the file, the component number must be specified. MORGAN assigns component numbers to the connected pedigrees within the pedigree file. If your data set contains more than one component, you may first run pedcheck to determine which individuals are assigned which component numbers.

See Concept Index for: kin sample parameter file.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

4.3 Running `kin` example and sample output

Under the subdirectory ‘IBD/’, run the example above using the command below. To send the output to a file instead of the screen, include ‘> filename’ (without quotes) after the parameter file name on the command line:

./kin jv_rep_kin.par    or
./kin jv_rep_kin.par > out-file-name

When you run this example you should get two warnings, ’(W)’, as there are two duplicate requests in the parameter file: kin recognizes these duplicates even though the order of individuals is reversed.

Below is the relevant part of the kin output.

 Component 1:

    Kinship coefficients:

    531  431   .32031
    431  432   .10938


    Inbreeding coefficients:

    332   .00000
    531   .10938


    2-locus inbreeding coefficients:
    (g4link is probability of IBD at both of 2 linked loci)

    proband  recomb   g4link
               freq     prob

        531    .000   .10938
               .010   .10234
               .040   .08386
               .050   .07849
               .100   .05660
               .180   .03455
               .300   .01910
               .500   .01196


 Component 2:

    Kinship coefficients:

    341  442   .15625


    Inbreeding coefficients:

    441   .06250
    541   .10938


    2-locus inbreeding coefficients:
    (g4link is probability of IBD at both of 2 linked loci)

    proband  recomb   g4link
               freq     prob

        441    .000   .06250
               .010   .05885
               .040   .04905
               .050   .04614
               .100   .03388
               .180   .02060
               .300   .01008
               .500   .00391

Note that when the recombination frequency is 0.0, the two-locus inbreeding coefficient is the same as the one-locus inbreeding coefficient, as there is no recombination between the loci, thus they act as a single locus. When the recombination frequency is 0.5, the two loci are independent and the two-locus inbreeding coefficient is the square of the one-locus inbreeding coefficient.

See Concept Index for: running kin example, kin sample output.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

4.4 `kin` statements

At least one of the following ‘compute ...’ statements are required to run program kin. If there is more than one component (connected pedigree) in the file, the component number must be specified. MORGAN assigns component numbers to the connected pedigrees within the pedigree file. If your data set contains more than one component, you may first run pedcheck to determine which individuals are assigned which component numbers. pedcheck will sort by component number in the output pedigree file, although it will not list component numbers in the file. The screen output generated when running pedcheck will give component numbers and the number of individuals in each component. Check the error and warning messages when running kin to verify that component numbers were correctly specified. The program will quit if an individual’s component number is incorrectly specified in the parameter file or if there is more than one component in the data set and no component is specified.

compute [component M] kinship coefficient N1 N2...: This statement names one or more pairs of pedigree members for which the kinship coefficient is to be computed.
compute [component M] inbreeding coefficient N1...: This statement names one or more pedigree members for whom the inbreeding coefficient is to be computed.
compute [component M] two-locus inbreeding coefficient N1...: This statement requests the computation of two-locus inbreeding coefficients, i.e. the probability of ibd at both loci, for the named individual. For the recombination frequencies at which the coefficients are computed, see the following statement.
set recombination frequencies X1 X2...: Two-locus inbreeding coefficients are computed for each of the list of recombination frequencies, in the range of 0.0 to 0.5. If frequencies are not given, the default values are: 0.0, 0.01, 0.04, 0.05, 0.10, 0.18, 0.30, and 0.50.

See Concept Index for: kin statements, component, kinship coefficient, inbreeding coefficient, recombination.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

4.5 The `translink` utility

The translink program converts MORGAN-style input files, including pedigree, trait, and marker data, marker map etc., into a LINKAGE-style input file.

The current format of the output dat file is for linkmap program. Since LINKAGE format requires integer IDs, if individuals are output by index, they are renamed 1 through n. However, recoding may still be required for some LINKAGE programs, if there are multiple component pedigrees.

The MORGAN statements used to specify the marker and trait information are those used by the ‘Autozyg’ programs among others: see genedrop mapping model parameters. Note that because the program runs as an ‘Autozyg’ programs various dummy statements not relevant to translink are required in the parameter file.

Here is an example of a translink parameter file:

# Version of pedigree and markers data for testing translink program.
# It includes 2 components, selection of markers, various traits.

# Include everything in the output file.
set printlevel 5

input pedigree file 'linkage.ped'
input marker data file 'linkage.markers'
input extra file 'translink.xtra'

select markers 3 4 6 7
select trait 1
set trait 1 tloc 6

set tloc 6 allele freqs 0.5 0.5

# ONE of the following three statements is required,
#  depending on the interval in which the trait locus is to be mapped
# map tloc 6 marker 1 recom frac 0.08   # First locus; final recom.
map tloc 6 marker 4 recom frac 0.01     # Within marker interval; "frac" ignored.
# map tloc 6 marker 7 recom frac 0.1    # Last locus; initial recom.

# Note; getting the left-side tloc position if markers are selected is a real
#   pain as specification is w.r.t original marker map.  Here 0.08 recombination
#   right of marker 1 is little bit left of marker 2, which is 10 cM from
#   marker 3: the first selected marker.
# Right hand side much easier -- just use last selected marker.

input pedigree record trait 1 integer 8
set trait 1 data discrete
set trait 1 for tloc 6 incomplete penetrances 0.05 0.6 0.95

# The following are DUMMY statements, required but irrelevant to translink
use single meiosis sampler
set MC iterations 1              
set proband gametes 202 0 202 1 
set sampler seeds  0x53f78285 0xdfbca001

The formats of these statements will be described in relevant sections later in the tutorial.

See Concept Index for: translink utility, LINKAGE package format, dummy statements.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

4.6 The `ibd_class` utility

See References, for details of the cited papers.

An ibd graph is a compact specification of the patterns of identity by descent among a set of individuals and across a chromosome [Tho11]. Chromosomal locations of changes in bd are indexed by integers; typically these are either marker or base-pair indices, The ibd graphs may be realized conditional on marker data (for example by using the program gl_auto), and used in subsequent analyses of trait data (for example by using the program gl_lods).

Normally the requirement of the trait analysis is to compute the probability of trait data given the ibd graph. However, on a given subset of individuals observed for the trait, many different realization of the graph, at many different marker loci, and even different graphs may give rise to identical trait probabilities. Recognizing when ibd graphs are equivalent in terms of their trait probabilities is key to efficient analysis: clearly probabilities should be computed once only for each set of equivalent graphs. The IBDgraph library, written by Hoyt Koepke [KT13], determines equivalence classes of graphs, across loci and across realizations.

The IBDgraph library can be accessed through the gl_lods program, but (from MORGAN 3.2.1) the utility ibd_class provides a direct standalone interface to the library.

There are five basic options in the ibd_class program. The following table described them using an input file of ibd graphs ‘subpedb.dat’ in which there are are 1000 graphs and marker indices range from 1 to 201. The output is directed to ‘ibd_class.out’ (appended in the case of examples after the first).

(1) ../ibd_class subpedb.dat -m 77 > ibd_class.out

The -m option gives the equivalence classes at the specified index: here at marker 77.

The first part of the output for this example is

Testing at specified marker location 77.

1       : 254
1       : 414
1       : 639
6       : 9, 11, 105, 117, 122, 852
495     : 1, 3, 4, 17, ....

showing 3 classes of size 1, one of size 6, and then two large classes of size 495 and 496. Classes are listed in increasing size order.

(2) ../ibd_class subpedb.dat -r 156 160 >> ibd_class.out

The -r options gives the classes that equivalent over a specified index range; here the range of markers 156 to 160. If the range is beyond the index range in the file, then the final graph specified (here marker 201) is assumed.

For this example the 72 classes range in size from 1 (24 classes) to 41.

(3) ../ibd_class subpedb.dat -s 155 99 >> ibd_class.out

The -s options gives the range around the specified index (here marker 99) for which the specified graph (here graph 155) is invariant. For example, the result returned here is ‘IBD graph 155 is invariant on the interval [86,116).’

(4) ../ibd_class subpedb.dat -a 544 >> ibd_class.out

The -a option gives all the index ranges over which the specified graph (here graph 544) is invariant. In this example the following output is produced:

IBD graph  544 is invariant on: [0,1),
                                [1,27),
                                [27,31), [36,57),
                                [31,36),
                                [57,78),
                                [78,86),
                                [86,112), [113,117),
                                [112,113),
                                [117,121),
                                [121,137),
                                [137,138),
                                [138,155),
                                [155,156),
                                [156,160),
                                [160,170),
                                [170,173),
                                [173,177),
                                [177,184),
                                [184,197),
                                [197,inf)

Note that the sets of indices need not be contiguous. In this example, the graph at markers 31 to 35 differs from same graph on each side, and the graph at marker 112 also provides a break to a different graph.

(5) ../ibd_class subpedb.dat >> ibd_class.out

Without any option specified, the program produces the classes of graph that are equivalent across all indices. In this example. there are no graphs equivalent across all 201 markers, so the program produces a list of 1000 classes each of size 1.

(6) ../ibd_class subpedb.dat -e >> ibd_class.out

The -e option is the most complex, giving the equivalence classes both across marker ranges and across graphs. The output is not easy to parse by hand, but it is the option required for most efficient use of the software. The program gl_lods uses this option to ensure that lod score contributions are computed once only for each equivalence class of graphs in the entire set.

See Concept Index for: IBDgraph library, ibd_class utility, ibd graph

[ << ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.

4.1 Introduction to `kin`
4.2 Sample `kin` parameter file
4.3 Running `kin` example and sample output
4.4 `kin` statements
4.5 The `translink` utility
4.6 The `ibd_class` utility

4. Computing Kinship and Other Pedigree Computations

4.1 Introduction to kin

4.2 Sample kin parameter file

4.3 Running kin example and sample output

4.4 kin statements

4.5 The translink utility

4.6 The ibd_class utility

4.1 Introduction to `kin`

4.2 Sample `kin` parameter file

4.3 Running `kin` example and sample output

4.4 `kin` statements

4.5 The `translink` utility

4.6 The `ibd_class` utility