[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
4.1 Introduction to kin | ||
4.2 Sample kin parameter file | ||
4.3 Running kin example and sample output | ||
4.4 kin statements | ||
4.5 The translink utility | ||
4.6 The ibd_class utility |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
kin
kin
computes kinship coefficients for pairs of pedigree members. It also
computes single-locus and two-locus inbreeding coefficients for members of the
pedigree. Briefly, the kinship coefficient between individuals i and j
is the probability that a randomly-drawn allele from i is identical by
descent (ibd) to randomly-drawn allele from individual j at the same
locus. A single-locus inbreeding coefficient is the probability that an
individual carries two copies of a gene that are ibd, at a given autosomal
locus. In other words, an individual’s single-locus inbreeding coefficient is
equal to the kinship coefficient of his parents, as an individual’s gametes can
be thought of as random draws from his parents’ chromosomes. A two-locus
inbreeding coefficient is the probability that an individual carries two ibd
copies of a gene at each of two linked loci. kin
presents two-locus
inbreeding coefficients as a function of the recombination fraction between the
two loci.
Note: The kin
program does check for duplicate requests
within a pedigree component of any inbreeding or kinship coefficients;
each will be computed once only, even if the individuals are specified
in reverse order.
However, it does
check (and quits with an error) if a request is made for kinship of an
individual with him/her self. This bug will be fixed in a future
MORGAN release.
See Concept Index for:
kin
introduction,
kinship coefficient,
inbreeding coefficient,
ibd,
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
kin
parameter fileFiles for kin
may be found in the ‘IBD’ subdirectory of
‘MORGAN_Examples’.
Below is a sample kin
parameter file, ‘jv_rep_kin.par’.
set printlevel 5 input pedigree file 'jv_rep.ped' compute component 1 kinship coeff 531 431 431 432 compute component 1 inbreeding coeff 332 531 compute component 2 kinship coeff 341 442 compute component 2 inbreeding coeff 441 541 compute component 1 two-locus inbreed coeff 531 compute component 2 two-locus inbreed coeff 441 set recomb freqs .01 .05 .04 .10 .18 .30 .50 .0 |
The statements on lines 3 – 8 request computation of kinship coefficients for
the pairs ‘531 431’ and ‘431 432’, and then inbreeding coefficients
for individuals ‘332’ and ‘531’, from component 1. It then requests
kinship coefficients for the pair ‘341 442’ and inbreeding coefficients for
individuals ‘441’ and ‘541’ from component 2. Finally, it requests
the two-locus inbreeding coefficient for ‘531’ from component 1 and
‘441’ from component 2. The two-locus inbreeding coefficient will be
computed for two loci at distances specified in the ‘set recomb freqs’
statement. (Note these need not be ordered, but the program will order them in
the output.) If there is more than one component (connected pedigree) in the
file, the component number must be specified. MORGAN assigns
component numbers to the connected pedigrees within the pedigree file. If your
data set contains more than one component, you may first run pedcheck
to
determine which individuals are assigned which component numbers.
See Concept Index for:
kin
sample parameter file.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
kin
example and sample outputUnder the subdirectory ‘IBD/’, run the example above using the command below. To send the output to a file instead of the screen, include ‘> filename’ (without quotes) after the parameter file name on the command line:
./kin jv_rep_kin.par or ./kin jv_rep_kin.par > out-file-name |
When you run this example you should get two warnings, ’(W)’,
as there are two
duplicate requests in the parameter file: kin
recognizes these
duplicates even though the order of individuals is reversed.
Below is the relevant part of the kin
output.
Component 1: Kinship coefficients: 531 431 .32031 431 432 .10938 Inbreeding coefficients: 332 .00000 531 .10938 2-locus inbreeding coefficients: (g4link is probability of IBD at both of 2 linked loci) proband recomb g4link freq prob 531 .000 .10938 .010 .10234 .040 .08386 .050 .07849 .100 .05660 .180 .03455 .300 .01910 .500 .01196 Component 2: Kinship coefficients: 341 442 .15625 Inbreeding coefficients: 441 .06250 541 .10938 2-locus inbreeding coefficients: (g4link is probability of IBD at both of 2 linked loci) proband recomb g4link freq prob 441 .000 .06250 .010 .05885 .040 .04905 .050 .04614 .100 .03388 .180 .02060 .300 .01008 .500 .00391 |
Note that when the recombination frequency is 0.0, the two-locus inbreeding coefficient is the same as the one-locus inbreeding coefficient, as there is no recombination between the loci, thus they act as a single locus. When the recombination frequency is 0.5, the two loci are independent and the two-locus inbreeding coefficient is the square of the one-locus inbreeding coefficient.
See Concept Index for:
running kin
example,
kin
sample output.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
kin
statementsAt least one of the following ‘compute ...’ statements are required to run
program kin
. If there is more than one component (connected pedigree) in
the file, the component number must be specified. MORGAN assigns
component numbers to the connected pedigrees within the pedigree file. If your
data set contains more than one component, you may first run pedcheck
to
determine which individuals are assigned which component numbers.
pedcheck
will sort by component number in the output pedigree file,
although it will not list component numbers in the file. The screen output
generated when running pedcheck
will give component numbers and the
number of individuals in each component. Check the error and warning messages
when running kin
to verify that component numbers were correctly
specified. The program will quit if an individual’s component number is
incorrectly specified in the parameter file or if there is more than one
component in the data set and no component is specified.
compute [component M] kinship coefficient N1 N2...
This statement names one or more pairs of pedigree members for which the kinship coefficient is to be computed.
compute [component M] inbreeding coefficient N1...
This statement names one or more pedigree members for whom the inbreeding coefficient is to be computed.
compute [component M] two-locus inbreeding coefficient N1...
This statement requests the computation of two-locus inbreeding coefficients, i.e. the probability of ibd at both loci, for the named individual. For the recombination frequencies at which the coefficients are computed, see the following statement.
set recombination frequencies X1 X2...
Two-locus inbreeding coefficients are computed for each of the list of recombination frequencies, in the range of 0.0 to 0.5. If frequencies are not given, the default values are: 0.0, 0.01, 0.04, 0.05, 0.10, 0.18, 0.30, and 0.50.
See Concept Index for:
kin
statements,
component,
kinship coefficient,
inbreeding coefficient,
recombination.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
translink
utilityThe translink program
converts
MORGAN-style input files, including
pedigree, trait, and marker data, marker map etc., into a
LINKAGE-style input file.
The current format of the output dat file is for linkmap program. Since LINKAGE format requires integer IDs, if individuals are output by index, they are renamed 1 through n. However, recoding may still be required for some LINKAGE programs, if there are multiple component pedigrees.
The MORGAN statements used to specify the marker and trait information are those used by the ‘Autozyg’ programs among others: see genedrop mapping model parameters. Note that because the program runs as an ‘Autozyg’ programs various dummy statements not relevant to translink are required in the parameter file.
Here is an example of a translink
parameter file:
# Version of pedigree and markers data for testing translink program. # It includes 2 components, selection of markers, various traits. # Include everything in the output file. set printlevel 5 input pedigree file 'linkage.ped' input marker data file 'linkage.markers' input extra file 'translink.xtra' select markers 3 4 6 7 select trait 1 set trait 1 tloc 6 set tloc 6 allele freqs 0.5 0.5 # ONE of the following three statements is required, # depending on the interval in which the trait locus is to be mapped # map tloc 6 marker 1 recom frac 0.08 # First locus; final recom. map tloc 6 marker 4 recom frac 0.01 # Within marker interval; "frac" ignored. # map tloc 6 marker 7 recom frac 0.1 # Last locus; initial recom. # Note; getting the left-side tloc position if markers are selected is a real # pain as specification is w.r.t original marker map. Here 0.08 recombination # right of marker 1 is little bit left of marker 2, which is 10 cM from # marker 3: the first selected marker. # Right hand side much easier -- just use last selected marker. input pedigree record trait 1 integer 8 set trait 1 data discrete set trait 1 for tloc 6 incomplete penetrances 0.05 0.6 0.95 # The following are DUMMY statements, required but irrelevant to translink use single meiosis sampler set MC iterations 1 set proband gametes 202 0 202 1 set sampler seeds 0x53f78285 0xdfbca001 |
The formats of these statements will be described in relevant sections later in the tutorial.
See Concept Index for:
translink
utility,
LINKAGE package format,
dummy statements.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
ibd_class
utilitySee References, for details of the cited papers.
An ibd graph is a compact specification of the patterns of
identity by descent among a set of individuals and across a chromosome
[Tho11]. Chromosomal locations of changes in bd are indexed
by integers; typically these are either marker or base-pair indices,
The ibd graphs may be realized conditional on marker data
(for example by using the program gl_auto
), and used in subsequent
analyses of trait data (for example by using the program gl_lods
).
Normally the requirement of the trait analysis is to compute the probability of trait data given the ibd graph. However, on a given subset of individuals observed for the trait, many different realization of the graph, at many different marker loci, and even different graphs may give rise to identical trait probabilities. Recognizing when ibd graphs are equivalent in terms of their trait probabilities is key to efficient analysis: clearly probabilities should be computed once only for each set of equivalent graphs. The IBDgraph library, written by Hoyt Koepke [KT13], determines equivalence classes of graphs, across loci and across realizations.
The IBDgraph library can be accessed through the gl_lods
program,
but (from MORGAN 3.2.1) the utility ibd_class
provides a direct
standalone interface to the library.
There are five basic options in the ibd_class program. The following table described them using an input file of ibd graphs ‘subpedb.dat’ in which there are are 1000 graphs and marker indices range from 1 to 201. The output is directed to ‘ibd_class.out’ (appended in the case of examples after the first).
(1) ../ibd_class subpedb.dat -m 77 > ibd_class.out
The -m
option gives the equivalence classes at the specified index:
here at marker 77.
The first part of the output for this example is
Testing at specified marker location 77. 1 : 254 1 : 414 1 : 639 6 : 9, 11, 105, 117, 122, 852 495 : 1, 3, 4, 17, .... |
showing 3 classes of size 1, one of size 6, and then two large classes of size 495 and 496. Classes are listed in increasing size order.
(2) ../ibd_class subpedb.dat -r 156 160 >> ibd_class.out
The -r
options gives the classes that equivalent over a specified
index range; here the range of markers 156 to 160.
If the range is beyond the index range in the file, then the final
graph specified (here marker 201) is assumed.
For this example the 72 classes range in size from 1 (24 classes) to 41.
(3) ../ibd_class subpedb.dat -s 155 99 >> ibd_class.out
The -s
options gives the range around the specified index
(here marker 99) for which the specified graph (here graph 155)
is invariant. For example, the result returned here is
‘IBD graph 155 is invariant on the interval [86,116).’
(4) ../ibd_class subpedb.dat -a 544 >> ibd_class.out
The -a
option gives all the index ranges over which the specified
graph (here graph 544) is invariant. In this example the following output
is produced:
IBD graph 544 is invariant on: [0,1), [1,27), [27,31), [36,57), [31,36), [57,78), [78,86), [86,112), [113,117), [112,113), [117,121), [121,137), [137,138), [138,155), [155,156), [156,160), [160,170), [170,173), [173,177), [177,184), [184,197), [197,inf) |
Note that the sets of indices need not be contiguous. In this example, the graph at markers 31 to 35 differs from same graph on each side, and the graph at marker 112 also provides a break to a different graph.
(5) ../ibd_class subpedb.dat >> ibd_class.out
Without any option specified, the program produces the classes of graph that are equivalent across all indices. In this example. there are no graphs equivalent across all 201 markers, so the program produces a list of 1000 classes each of size 1.
(6) ../ibd_class subpedb.dat -e >> ibd_class.out
The -e
option is the most complex, giving the equivalence classes
both across marker ranges and across graphs. The output is not easy to
parse by hand, but it is the option required for most efficient use
of the software. The program gl_lods
uses this option to ensure
that lod score contributions are computed once only for each
equivalence class of graphs in the entire set.
See Concept Index for:
IBDgraph library,
ibd_class
utility,
ibd graph
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.