Please note that the format of the compu* files
has changed, so setup files from older versions of MORGAN
will not work.

This README file contains:
  A. To install the ibd_haplo program within your MORGAN-3
  B. Running the program on the Gold test examples
  C. Running the program on your test examples
  D. Details of input files for examples
  E. Details of output files for examples
  F. A note on the python script pairwise.py
-----------------------------------------------------


A. To install the ibd_haplo program within your MORGAN-3:
-----------------------------------------------------

0) Download and install your MORGAN-3
   In the main MORGAN-3 directory you will say
         make morgan.gcc.dbg
   (Of course, you may use any of the morgan make options, but
      I always use this one)

1) Untar the ibdhap_prog.tar.gz file within your main MORGAN-3 directory.
   This will create a subdirectory IBD_Haplo, which contains a Makefile
   and four *.c source code files.

2) cd into the IBD_Haplo subdirectory;
   then
         make ibd_haplo.gcc.dbg
   Note 1: it is probably advisable to use the same make option as in step 0
   Note 2: you will likely get a lot of warnings of the form
    ../Makefile.progs:188: warning: ignoring old commands for target `.cc.dml'
    You can ignore these; they come about because we have only one main
       program in this subdirectory

3) To remove the executable either simply
        rm ibd_haplo
   or
        make myclean
   Note:  You MUST do this if for any reason you remake the rest of
     your MORGAN-3.  The general "make morgan" commands will not
     clean and remake the ibd_haplo program, so library links will
     be incorrect; that is, if you redo step 0, you MUST remove the
     ibd_haplo executable and redo step 2.

     --------------------------------------------------------------
B.  Running the Gold test examples:

    Similarly to other MORGAN programs, the IBD_Haplo directory 
    includes a Gold subdirectory, which actually includes this README file.

    1. There are currently 4 examples:
        (i) for 4 sets of 4 haplotypes:                      ibd_haplo_gold
       (ii) for 4 pairs of individuals with genotypic data:  ibd_genos_gold
      (iii) same as (ii) but data input as haplotypes:       ibd_hapgen_gold
       (iv) for 4 pairs of individuals with partially phased
             genotypic data:                                 ibd_pphas_gold
             (there are currently no gold output files for this example,
             but the parameter files are provided)

    2. As with other MORGAN programs, these are more easily run using
        the "make" command, e.g. make ibd_haplo_gold, but to prepare
        for your own examples you may prefer to run them directly:
           ../ibd_haplo  ibd_haplo.par > ibd_haplo.out
           or
           ../ibd_haplo  ibd_genos.par > ibd_genos.out
    
    3. In this case, make sure you have cleaned out old output files
       first: see the Makefile, and/or say "make help" for more info.


     ---------------------------------------------------------------
C. Running the program on your test examples:

To run your own examples you may prefer to have your own directory.
The easiest way is to set up any directory, and put your input and
parameter files (see section D) there and then

1.  make a (soft) link to the ibd_haplo executable: for example
    ln -s /castor/thompson/MORGAN_3_Feb09_CVS/IBD_Haplo/ibd_haplo ibd_haplo
   
   Note 1:  Of course, you will use the full pathname of where your ibd_haplo
       program is.
   Note 2:  If 
         ls -l ibd_haplo 
       shows there is already an ibd_haplo link, you should probably unlink 
       it first:  "unlink ibd_haplo".
   Note 3:  If preferred, the link may go in your bin directory, or anywhere
       else your system looks for commands.

2.  As with other MORGAN programs, the general format is
          progname  parfile  > outfile
    So here, for example, one might say
          ./ibd_haplo ibd_haplo.par > ibd_haplo.out

3.  You will find you have generated two new output files,
        e.g. qibd_h.out,  ibd_haplo.out
    These are described in detail below in section E.

4.  Before rerunning, remove or move (if you want to keep them) these
    two output files!!  In the case of "ibd_haplo.out", the program probably
    (depending on your setup) will not run if that file already exists.
    In the case of the "qibd_h.out" file, the next output will be appended
    to the file, and this (large) file gets ever larger and more confusing!!
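
The cleanup before a rerun can be scripted.  The following is only a
minimal sketch and is not part of MORGAN: the helper name
set_aside_outputs and the ".prevN" renaming scheme are illustrative
assumptions, and the default file names are simply the ones used in the
example above.

```python
import os


def set_aside_outputs(directory, names=("qibd_h.out", "ibd_haplo.out")):
    """Rename existing output files so a rerun neither fails nor appends.

    Hypothetical helper, not part of MORGAN.  Each existing file NAME is
    renamed to NAME.prev1, NAME.prev2, ... (the first unused suffix).
    Returns the list of new paths.
    """
    moved = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            continue  # nothing to set aside for this name
        k = 1
        while os.path.exists(f"{path}.prev{k}"):
            k += 1
        os.rename(path, f"{path}.prev{k}")
        moved.append(f"{path}.prev{k}")
    return moved
```

After calling set_aside_outputs(".") in your run directory, the program
can be run again with the same command line as before.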

----------------------------------------------------------------------

D.  Details of input files for examples

1.  First look at the MORGAN parameter file  in the Gold subdirectory
         ibd_haplo.par
    It just gives the names of two files:
         an input file          ibdhap_input_files
     and an output file         qibd_h.out   -- which we met above.
   Note: clearly you could change these names for your examples.
   (The ibd_genos.par and ibd_hapgen.par files give parallel examples,
      but we do not describe the details.)

2.  Now look at the MORGAN extra input file
          ibdhap_input_files
    It consists only of three (optionally four) more filenames:
          compu_4haps.dat
          haplotypes.dat
          chr07.markers
          (phasing.txt) -- only in case of partially phased genotypes
    which we now describe.
    Note: again, you could change these names for your examples.

3.  chr07.markers
       This file contains the marker information, in a very similar way
       to other MORGAN files, but without the explicit MORGAN parameter
       statements.  The data are for 2132 SNP markers on "chr07".

       First the 2132 chromosomal positions of the SNPs are listed.
       For convenience they are here in 213 lines of 10, with two extra,
       but that is not required.  These are sex-averaged cM positions;
       only differences (cM distances) are used.  If your position
       info is in bp, a rough translation to cM is given by dividing by 10^6.

       Then the allele frequencies of each SNP are given, for markers
       1 to 2132, in order  (the integer count is for convenience, it
       is ignored by the program).  Again these are put one per line
       for convenience.  This is not required, but having them in
       correct order is!!
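
The rough bp-to-cM translation mentioned above can be sketched as
follows.  This is an illustration only: the function names are
hypothetical, the base-pair positions are made up, and the check that
positions are in increasing order is an assumption about what a sensible
marker file looks like rather than a documented requirement.

```python
def bp_to_cm(positions_bp):
    """Rough conversion of 1 cM per 10^6 bp, as suggested in the text."""
    return [bp / 1e6 for bp in positions_bp]


def intermarker_distances(positions_cm):
    """Differences between consecutive positions (the quantities used)."""
    gaps = [b - a for a, b in zip(positions_cm, positions_cm[1:])]
    if any(g < 0 for g in gaps):
        raise ValueError("marker positions are not in increasing order")
    return gaps


# Made-up base-pair positions, for illustration only.
cm = bp_to_cm([1_000_000, 2_500_000, 4_000_000])
print(cm)                         # [1.0, 2.5, 4.0]
print(intermarker_distances(cm))  # [1.5, 1.5]
```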

4. haplotypes.dat
       This file contains the haplotypic data for the haplotypes or genotypes
       to be analyzed.
       This particular data set consists of 16 haplotypes, each with an 
       integer "name".  The name is followed by 2132 alleles making up the
       haplotype. 1 and 2 are the SNP alleles, and 0 denotes missing.

       For genotypic data the format would be the same, but there would be
       2132*2 = 4264 alleles following each "name".  The two alleles making
       up a genotype can be entered in either order  ("1  2" or "2  1").
       The program makes no assumptions about phase when analyzing genotypic
       data.

       Here the program has all the data for one haplotype/genotype on one 
       line, because this is how the script produced it.  Again, lines may
       be cut for convenience if desired.
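
Before running the program it can be useful to check a data file against
the description above.  The sketch below is not part of MORGAN; the
function name is hypothetical, and it assumes each record sits on a
single line (the program itself also accepts cut lines, which this
sketch does not handle).

```python
def parse_data_line(line, n_markers, genotypic=False):
    """Split one data line into (name, alleles) and check it.

    Assumes whitespace-separated fields on one line: a name followed by
    n_markers alleles (haplotypic) or 2 * n_markers alleles (genotypic).
    Allele codes are 1 and 2, with 0 for missing.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    name, alleles = fields[0], [int(a) for a in fields[1:]]
    expected = 2 * n_markers if genotypic else n_markers
    if len(alleles) != expected:
        raise ValueError(
            f"{name}: expected {expected} alleles, got {len(alleles)}")
    bad = sorted(set(alleles) - {0, 1, 2})
    if bad:
        raise ValueError(f"{name}: unexpected allele codes {bad}")
    return name, alleles
```

For the Gold example one would call this with n_markers=2132 on each
line of haplotypes.dat.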

5.  compu_4haps.dat
       This is the complicated file that tells the program what to do!!

The comment in the compu_4haps.dat file reads
# [# of states] [# of allelic phenotypes] [data input as genotypic] [analysis to be done as genotypic]
# [# of sets of haplotypes] [# of haplotypes in a set] [total chromosome length]
# [total # of markers] [ffkin] [ffrate] [delta]
#

(a)  This particular file "compu_4haps.dat" reads:
   15   16  0 0
   4   4  192.30
  2132   0.15 0.1 0.2

(b) A similar example for genotypic data would read:
   9   9  1 1
   4   4  192.30
  2132   0.1 0.1 0.2

(c) An analysis of partially phased genotypic data would read:
   15   16  0 2
   4   4  192.30
  2132   0.15 0.1 0.2

Line one:
# [# of states] [# of allelic phenotypes] [data input as genotypic] [analysis to be done as genotypic]
Example (a):
For 4 haplotypes there are 15 ibd states, and 16 phenotypic data
configurations at each SNP (not counting missing data): i.e. each SNP can
be allele 1 or 2 on each of the 4 ordered haplotypes.
Example (b): For a pair of genotypes there are 9 ibd states, and 9 data
configurations -- each individual can be 1 1, 1 2 or 2 2.
Example (c): For analysis of partially phased data, we model 
15 underlying haplotypic states, although we may not be able to 
distinguish between some states if the data are unphased.  The last
two fields indicate that the data is formatted as haplotypic data (0)
but should be analyzed as partially phased data (2).  These parameters
must have these values for partially phased data.
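
The counts quoted above (15, 16 and 9) can be checked by enumeration.
The sketch below is an illustration, not code from the program.  It
assumes that haplotypes 1 and 2 belong to the first individual and
haplotypes 3 and 4 to the second, so that two states are genotypically
equivalent when they differ only by swapping the haplotypes within an
individual.  The states are listed here in lexicographic order, which is
not the column order the program uses in its output.

```python
def ibd_states(n):
    """All ibd states among n haplotypes, as label strings such as '1123'.

    A state is a partition of the haplotypes into ibd classes.  Labels are
    assigned in order of first appearance, so each partition appears once.
    """
    states = []

    def extend(prefix):
        if len(prefix) == n:
            states.append("".join(str(k) for k in prefix))
            return
        # The next label is either an existing one or the first unused one.
        for label in range(1, max(prefix, default=0) + 2):
            extend(prefix + [label])

    extend([])
    return states


def canonical(labels):
    """Relabel in order of first appearance, e.g. '2113' -> '1223'."""
    seen = {}
    return "".join(str(seen.setdefault(c, len(seen) + 1)) for c in labels)


def genotypic_classes(states):
    """Group 4-haplotype states that differ only by within-pair swaps."""
    classes = {}
    for s in states:
        a, b, c, d = s
        variants = {canonical(v)
                    for v in (a + b + c + d, b + a + c + d,
                              a + b + d + c, b + a + d + c)}
        classes.setdefault(min(variants), set()).update(variants)
    return classes


states = ibd_states(4)
print(len(states), 2 ** 4)                     # 15 16
print(len(genotypic_classes(states)), 3 ** 2)  # 9 9
```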

Note the format of input data is now separated from the interpretation
for analysis.  That is, data may be put in either as haplotypic
(a line for each haplotype) or genotypic (a line for each individual),
and then analyzed either as genotypic data or as haplotypic data.
If haplotype data are to be analyzed as genotypic, the phasing is ignored.
If genotypic data are to be analyzed as haplotypic, the first [second]
   allele of each pair is assumed to constitute the first [second] haplotype.
(Currently, partially phased data can only be input as haplotypic--
   future versions will do analysis on either form  of input).


Line two:
# [# of sets of haplotypes] [# of haplotypes in a set] [total chromosome length]

Example (a),(c):  4 sets of 4 haplotypes to be analyzed
        (b):  same, but this time it will take each successive pair
	        of alleles and interpret as unphased genotypes

Total chromosome length is given in centimorgans.

Line three:
# [total # of markers] [ffkin] [ffrate] [delta]
        IMPORTANT:  ffkin is the prior prob of IBD -- 0.15 is VERY high
                         unless you know you have a lot of IBD
                    ffrate is the rate change parameter for IBD -- 0.1
                        this is the total change rate per cM;
                        approximately it is the inverse of the ibd length
                          between any pair of haplotypes, but where there
                          are >2 haplotypes, the length in a given ibd state
                          will be shorter.
                    delta:  This is a parameter that modifies the
                         transition matrix to allow for ancestral shared
                         junctions.
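
To check a compu file before a run, its three data lines can be read
into named fields.  The sketch below is not part of MORGAN: the function
name and the dictionary keys are hypothetical, and it assumes that
comment lines begin with '#' as in the example file.  The last line of
the example uses the statement above that 1/ffrate is approximately the
ibd length (in cM) between a pair of haplotypes.

```python
def read_compu(text):
    """Parse the three data lines of a compu file into a dict.

    Assumes comment lines start with '#' and that the remaining lines
    carry 4, 3 and 4 fields in the documented order.
    """
    rows = [ln.split() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    if [len(r) for r in rows] != [4, 3, 4]:
        raise ValueError("expected three data lines with 4, 3 and 4 fields")
    line1, line2, line3 = rows
    return {
        "n_states": int(line1[0]),
        "n_phenotypes": int(line1[1]),
        "input_mode": int(line1[2]),     # data input as genotypic?
        "analysis_mode": int(line1[3]),  # 0, 1, or 2 (partially phased)
        "n_sets": int(line2[0]),
        "set_size": int(line2[1]),
        "chrom_length_cm": float(line2[2]),
        "n_markers": int(line3[0]),
        "ffkin": float(line3[1]),
        "ffrate": float(line3[2]),
        "delta": float(line3[3]),
    }


# Example (a) from the text.
example = """\
# [# of states] [# of allelic phenotypes] ...
   15   16  0 0
   4   4  192.30
  2132   0.15 0.1 0.2
"""
par = read_compu(example)
print(par["n_states"], par["n_markers"], par["ffkin"])  # 15 2132 0.15
print(1.0 / par["ffrate"])                              # 10.0
```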


6. phasing.txt
This is an optional file for when the genotypes are only partially phased.
Each line starts with the id numbers for the first "haplotype" in each pair,
then has a 1 or a 0 for each locus on the haplotype. 1 indicates that the
pair of genotypes has been phased into four haplotypes at that locus; 0 
indicates that the genotypes are unphased, and the analysis should treat
the phase as unknown.

----------------------------------------------------------------------

E.  Details of output files for Gold examples

As we have seen in section C, there are two output files, one specified in
the parameter file (e.g. qibd_h.out) and the other as standard output in
your command line (e.g. ibd_haplo.out).

"qibd_h.out": this is the core output, which can then be processed in R (e.g.).
    There is one line for each marker, giving:
       the marker number, 1,2,3,...
       the marker position, in cM, as originally input
       and in the current example 15 additional columns of probabilities
          (17 columns in all) summing (hopefully) to 1.
   IMPORTANT:  these are the probabilities, under the given model and
         conditional on the data, of each of the 15 states of ibd among
         the four haplotypes.  The ordering of the 15 states is
              states 1111,1122,1112,1121,1123, 1211,1222,1233,
                        1212,1221,  1213,  1231, 1223, 1232, 1234
     Note: for genotypic analyses there will be 9 state probs (11 columns);
              we have the same 15 latent states, but genotypically
              equivalent ones are combined. The order is the same:
              states 1111,1122,1112+1121,1123, 1211+1222,1233,
                        1212+1221,  1213+1231+1223+1232, 1234
           for pairwise haplotype analyses there will be 2 state probs
                     (4 columns); of the two state probs, the first is ibd,
                     and the second is non-ibd.
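
As an alternative to R, a line of the core output can be read with a few
lines of Python.  This is a sketch only: the function names are
hypothetical, the tolerance on the sum of the probabilities is an
arbitrary choice, the example line is made up, and the columns are
assumed to be whitespace-separated.  The state labels are copied from
the 15-state ordering given above.

```python
STATES_15 = ["1111", "1122", "1112", "1121", "1123", "1211", "1222", "1233",
             "1212", "1221", "1213", "1231", "1223", "1232", "1234"]


def parse_qibd_line(line, states=STATES_15, tol=1e-3):
    """Return (marker, position_cM, {state: prob}) for one output line.

    Expects the marker number, the position, then one probability per
    state.  Raises ValueError if the column count or the sum is off.
    """
    fields = line.split()
    if len(fields) != 2 + len(states):
        raise ValueError(
            f"expected {2 + len(states)} columns, got {len(fields)}")
    marker, position = int(fields[0]), float(fields[1])
    probs = [float(x) for x in fields[2:]]
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError(f"marker {marker}: probabilities do not sum to 1")
    return marker, position, dict(zip(states, probs))


def most_probable_state(probs):
    """Label of the state with the largest probability."""
    return max(probs, key=probs.get)


# Made-up line: almost all of the probability on the non-ibd state 1234.
line = "3 1.25 " + " ".join(["0.01"] * 14 + ["0.86"])
marker, position, probs = parse_qibd_line(line)
print(marker, position, most_probable_state(probs))  # 3 1.25 1234
```

For a genotypic or pairwise analysis, pass a list of 9 or 2 labels as
the states argument instead.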

"ibd_haplo.out": standard output file.
     As with many MORGAN programs this is simply a summary of what
     has been read in, and what is processed, mainly so the user can check
     that all is as expected:
     first all the various parameters of the run are printed,
     then the genetic data, with the first 10 alleles only (for checking).

     Next the equilibrium state probabilities under the provided
         ffkin value, and the latent ibd process transition matrix under
         the given parameters, are printed for a 1cM distance.

    This is repeated (unnecessarily!!) for each set of haplotypes processed.


The other two gold standards, ibd_genos_gold and ibd_hapgen_gold,
       each have a similar pair of output files.
---------------------------------------------------------------------------

There is a new option in ibd_haplo, to include data that are partially
haplotypic and partially genotypic; see the ibd_pphas_gold example in
section B.  Gold standard output files for this option remain to be added.
----------------------------------------------------------------------

   F. The python script pairwise.py
        For details see the comments in this script file.
        Basically the script sets up the data files for an IBD_Haplo
          analysis of all pairs of individuals from a set of individuals.
----------------------------------------------------------------------