2. Common Features and File Formats

All MORGAN programs use the same command line syntax, share many statements, and use the same pedigree data format. Most of the MORGAN programs need at least two input files in order to run: one parameter file and one pedigree data file. The parameter file contains computing requests, model parameters and input/output file options. It may also contain genotype data or other information specific to a particular MORGAN program. The pedigree file contains, at minimum, information on family relationships among the individuals in the sample. If the general syntax and format descriptions of this section seem complex, readers may find it easier to proceed to the actual examples of the following chapter. In the context of those examples, the general format may become clearer.

Note that white space in any input file is defined to be any of these characters: ‘ ’ (space), ‘,’ (comma), ‘\t’ (horizontal tab), ‘\v’ (vertical tab), ‘\n’ (line feed, or newline), ‘\f’ (form-feed), ‘\r’ (carriage return). Because the comma counts as white space, comma-separated and space-separated fields are read identically.
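
As a rough illustration of this definition, the following Python sketch splits a line into fields. It is not MORGAN's own tokenizer: the name ‘split_fields’ is hypothetical, and the space character is included on the assumption that it also counts as white space (the pedigree file section lists it).

```python
# Characters treated as white space: the space character (an assumption,
# see the lead-in) plus comma, tabs, newline, form-feed and carriage return.
WHITESPACE = " ,\t\v\n\f\r"

# Map every white-space character to a plain space so str.split() can be used.
_TO_SPACE = str.maketrans({ch: " " for ch in WHITESPACE})


def split_fields(line):
    """Split a line into fields on runs of white space (hypothetical helper)."""
    return line.translate(_TO_SPACE).split()


print(split_fields("101,0\t0  1"))
```

One consequence of this rule, at least in the sketch, is that a run of separators such as ‘,,’ collapses into a single separator, so an empty field cannot be written by leaving a gap between commas.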

See Concept Index for: whitespace.


2.1 Command syntax

The parameter file name must be passed to MORGAN on the command line when calling the program. Other file names can be passed to MORGAN on the command line or in the parameter file. The minimum syntax to call a MORGAN program is:

 
./progname parfile 

In the statement above, progname is the name of one of several MORGAN main programs, such as genedrop or lm_bayes, and parfile is the name of the parameter file, which is required. For example, to run genedrop using a parameter file named ‘genedrop.par’, the command is:

 
./genedrop genedrop.par

Note that if the current directory is in your PATH, you may say

 
progname parfile

but the form ./progname is more universal, and is used throughout this tutorial.

Additional file names can be passed to MORGAN on the command line, but these file names must be accompanied by a file type to identify them. The syntax is:

 
./progname parfile [filetype filename]...

Square brackets indicate optional arguments. Possible filetype options include:

  ped

Input pedigree file

  xtra

Input extra file

  mark

Input marker data file (Note that not all programs use marker data)

  oped

Output pedigree file

  seed

Input seeds for random number generator

  oseed

Output random seeds

  oscor

Output score file

  oxtr

Output extra file

If the name for a particular filetype is given both in the command line and in a parameter statement, the name in the command line takes precedence.
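
The pairing of filetype and filename arguments, and the precedence rule above, can be sketched in Python as follows. This only illustrates the described behaviour and is not MORGAN's actual argument handling; the function names and the file names in the usage note are hypothetical.

```python
def parse_command_line(args):
    """Return (parfile, {filetype: filename}) from the arguments after progname.

    The arguments are expected to be the parameter file name followed by
    zero or more (filetype, filename) pairs, for example 'ped' 'my.ped'.
    """
    if not args:
        raise ValueError("a parameter file name is required")
    parfile, rest = args[0], args[1:]
    if len(rest) % 2 != 0:
        raise ValueError("each file name must be accompanied by a filetype")
    files = dict(zip(rest[0::2], rest[1::2]))
    return parfile, files


def resolve_file(filetype, command_line_files, parameter_file_files):
    """Pick a file name; the command line takes precedence over the parameter file."""
    if filetype in command_line_files:
        return command_line_files[filetype]
    return parameter_file_files.get(filetype)
```

For instance, with the hypothetical arguments ‘genedrop.par ped my.ped’ and a parameter file that names ‘other.ped’ as the pedigree file, ‘resolve_file’ would return ‘my.ped’.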

The programs write informational messages to stdout and error messages to stderr, both of which default to the screen. Either or both may be redirected to a named file.

See Concept Index for: MORGAN files, command syntax, command line options, filetype codes.


2.2 Parameter file

A MORGAN parameter file contains a series of statements. Many statements are common to all MORGAN programs, particularly those that define the format of the pedigree file and identify other files to be used for program input or output. Many statements are optional, with some default behavior. If statements irrelevant to the MORGAN program called by the user are included in the parameter file, those statements are ignored and a warning message is issued.

Each statement must begin on a new line, starting with one of the MORGAN statement keywords, and may extend over any number of lines. Case is not significant for the keywords. Only the first four letters of each keyword are significant; the remainder of the word is ignored. The order of the statements does not matter. If the same statement is repeated, the last one overrides previous ones and a warning is given in the output file. A # starts a comment, so the rest of the line is ignored. Either single or double quotation marks (' or ") can be used to delimit strings such as file names. Look at the warnings issued by MORGAN to make sure the parameters are as you intended.
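
The keyword and comment rules can be illustrated with a small Python sketch. It is a simplification rather than MORGAN's parser: both function names are hypothetical, and the sketch does not consider how a # inside a quoted string should be handled.

```python
def keyword_key(word):
    """Reduce a keyword to its significant part: first four letters, any case."""
    return word[:4].lower()


def strip_comment(line):
    """Drop everything from the first '#' to the end of the line.

    Simplification: a '#' inside a quoted string is not treated specially.
    """
    return line.split("#", 1)[0].rstrip()


# Under the four-letter rule these spellings name the same keyword.
print(keyword_key("PEDIGREE") == keyword_key("pedi"))
print(strip_comment("set printlevel 5   # full output"))
```

Under these rules a statement such as ‘inpu pedi file’ would be read the same way as ‘input pedigree file’, which is why the examples in this tutorial spell keywords out in full purely for readability.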

Note that the parameter statement form is extremely flexible. There is a limit on the line input buffer (set to 30,000 characters), but this does not impose any restriction, as a statement may extend over multiple lines. The parameter files described in this tutorial and in the MORGAN Examples and Gold standards are generally aligned, with words stated in full, for clarity, but this is not necessary. As an example, a parameter file for the lm_linkage program is given both in clear form and edited to show this flexibility in Sample parameter files for lm_linkage and lm_bayes.

The most common statements are for identifying input and output files (counterparts of the command line options) and for describing the input pedigree file format.

Below is a simple parameter file, ‘check.par’, from the examples included with the MORGAN software under the subdirectory ‘MORGAN_Examples/Pedcheck’.

 
set printlevel 5
input pedigree file 'check.ped'
input pedigree size 30
input pedigree record gender absent
input pedigree record observed present
assign gender
output pedigree chronological
output overwrite pedigree file 'check.oped'
output overwrite individuals file 'indiv_oped'

A brief description of the most commonly used parameter file statements follows in the next section. For a complete and more detailed description of MORGAN statements, please see the sections of this tutorial relevant to specific MORGAN programs and the documentation that comes with MORGAN in the files ‘README_userdoc’ in the various program subdirectories.

See Concept Index for: line length limit, parameter file, parameter statements, parameter statement flexibility.


2.3 File identification statements

Within the parameter file, file names are delimited with single or double quotation marks (' or "). File names submitted on the command line are not delimited with quotation marks. In a parameter file, either of the two statements below would identify ‘pedchk.ped’ as the pedigree file to be read.

 
input pedigree file "pedchk.ped"
input pedigree file 'pedchk.ped'

The most commonly used file identification statements are:

input pedigree file filename

The input pedigree file is required for most programs and may be specified either in the parameter file or through command line options.

input extra file filename

The input extra file is used by some programs to input additional information, typically information needed by the program but for which parameter statements have not yet been implemented.

input individuals file filename

Several newer MORGAN programs do not use a pedigree file but may still require a list of the individuals to be used in the analysis. Such a list is provided by the individuals file. See Individuals file.

input marker data file filename

Marker data, such as marker allele frequencies, map distances between markers and individuals’ genotypes, can be included in the parameter file itself or in a separate file, called the marker data file. This statement is used when the marker data are not included within the parameter file. The marker file contains the ‘set marker data’ statements. Marker data are used by Autozyg programs. See Autozyg computational parameters.

input seed file filename

This file contains statements to set random seeds for the Monte Carlo based programs. The seed file may contain multiple lines (as in the case when the input seed file is also used for the output seed file). If so, the seeds in the last line override previous ones (with warnings issued). If no seed file is named on the command line or in a parameter statement and there are no statements to set random seeds in the parameter file, the default seeds 12345 and 1073 (hexadecimal 0x3039 and 0x431) are used.

output [overwrite] pedigree file filename

The output pedigree file is required by genedrop. Other programs also check for errors in the pedigree. If there are errors that the program is able to correct or if there are requested changes to the pedigree file format, the new pedigree data is written to this file.

output [overwrite] individuals file filename

An individuals file may be created for those downstream programs that require it by using the output individuals file option of the pedcheck program. See Creating an individuals file.

output [overwrite] seed file filename

The final random seeds are saved if an output seed file is named. This file could be the same as the input seed file. New entries are appended to the old file, unless the overwrite option is specified.

output [overwrite] score file filename

The output score (or scores) file is used by several programs to output numerical results, typically in a format for input to another analysis program.

output [overwrite] extra file filename

The output extra file is used by some programs to output additional results.

Note that with MORGAN 3.0 several overwrite options have been added for output files, including the output pedigree and output scores files. Previously, output scores were appended to existing output if the file already existed, leading to confusion. This remains so unless the overwrite option is used. Users should be cautious in using (and in not using) the overwrite option, and should, if using the option, be careful to copy previous output to another filename should they wish to retain it.

See Concept Index for: file names, pedigree file, individuals file, extra file, overwrite file options, marker data file, seed file.


2.4 Limit overriding

Many global constants are set in the header file ‘limdefs.h’ in the Headers directory. Limits on variables may be altered by editing this file, but caution is recommended! Other limits may be set in other header files, for example ‘parseprog_opts.h’, and may be program-specific.

Additionally, there are three parameter statements which allow overriding of the preset limit values for the pedigree and trait-data file:

allow component size N

This statement overrides the program-defined maximum pedigree component size (presently 400 individuals for most programs).

allow observed individuals N

This statement overrides the program-defined maximum number of observed individuals; this applies only to some programs.

allow pedigree size N

This statement overrides the program-defined maximum pedigree size (presently 20,000 individuals for most programs).

Finally there are three ‘limit’ statements, relating to specific programs which will be detailed in the relevant chapters (Chapters 12 and 13).

See Concept Index for: limit overriding, limit statements, allow statements.


2.5 Input extra variables

There are two statements that are provided as development tools. These allow the user to input any number of integers and any number of reals into the program. These variables are counted (counts in global variables ‘NumbDummyInts’ and ‘NumbDummyReals’) and placed in global vectors ‘DummyInts’ and ‘DummyReals’. The values of these integers/reals can then be accessed in any program.

These dummy variables should not be confused with ‘dummy statements’ in some parameter files. Dummy statements are statements that are not relevant to the program in question, but are required in its parameter file because the program was developed from other programs that use those statements.

set dummy integers I1 I2...

Any number of integer variables may be entered into the program using this statement.

set dummy reals X1 X2...

Any number of real variables may be entered into the program using this statement.

See Concept Index for: dummy statements, dummy variables.


2.6 Output control

By default, MORGAN sends its main output to standard output, stdout, and most warning and error messages to standard error, stderr. By default, both these will go to the terminal screen. Some programs use additional output files, such as the output score file or output extra file, to produce additional output, typically in a format for input for subsequent analyses.

Standard output may be redirected to a file, using the ‘>’ symbol. For example

 
./genedrop genedrop.par > output-filename

The way in which standard output and standard error may both be redirected to the output file depends on the shell in use, but typically the ‘>&’ redirect, or something similar, should work. For example

 
./genedrop genedrop.par >& output-filename
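
Where the shell syntax is in doubt, one shell-independent alternative is to launch the program from a small script. The Python sketch below uses the standard ‘subprocess’ module to send both streams to one file; the helper name ‘run_and_capture’ is hypothetical and the approach is only a suggestion, not part of MORGAN.

```python
import subprocess


def run_and_capture(command, output_path):
    """Run a command, writing both stdout and stderr to output_path.

    Returns the exit status of the command.
    """
    with open(output_path, "w") as out:
        completed = subprocess.run(command, stdout=out, stderr=subprocess.STDOUT)
    return completed.returncode


# Example call (assumes genedrop and genedrop.par are in the current directory):
# run_and_capture(["./genedrop", "genedrop.par"], "output-filename")
```

Passing the command as a list avoids any dependence on shell quoting or redirection rules.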

It is strongly recommended that users study the output warnings (W) produced, to check the program is interpreting parameter statements as expected.

Additionally, the standard output from each program is controlled by the following statement:

set printlevel N

The level of output produced by all MORGAN programs is controlled by a printlevel ranging from 0 to 5. The value 5 leads to full output. For larger runs, particularly with large numbers of genetic markers, the user may prefer to suppress some output. It is recommended that users initially run their test data with printlevel 5, to check their input is being interpreted as expected.

While the set printlevel statement may be used to suppress unwanted output, the set debug statements can be used to obtain additional output. These statements are available for all main programs and libraries, but their result depends on what has been coded by developers in checking the software. The statements are intended primarily for developers, not the general user.

set debug main

The set debug main statement applies in each main program, and prints additional information to stdout as coded in that specific program. Some programs may contain no such additional debug output code.

set debug libname

The set debug libname statement is available for each library, where libname is one of cmf, ibdgraph, markers, nghds, pars, pedchk, peel, quant, rans, sample, stuff or twoqtl, corresponding to the relevant library name. The statement will cause additional information to be printed to stdout as coded in the subroutine files of that specific library.

set debug level

The set debug level statement can be used in any program to, in principle, set the required level of additional debugging statements. However, currently only the lm_twoqtl program and ‘TwoQTL’ library include code making use of the debug level.

In some earlier releases of MORGAN, a run-time display was available for the lm_auto program, using the GLUT library system and the MORGAN library GLDisp (identified as gldisp). The display output is controlled by a set of 6 display statements. The run-time display, while in theory operational if GLUT libraries are installed, is not currently maintained, and is omitted from the current tutorial.

See Concept Index for: output control, debug control, printlevel control, output –redirect, output warnings (W), display options, display –GLUT runtime.


2.7 Pedigree file

The pedigree file may contain two sections, formatting statements and pedigree data, separated by the file separator ‘****’. The first section is optional; if present, it contains statements that describe the contents and format of the pedigree file, as many MORGAN users find it convenient to describe the pedigree data within the pedigree file itself. The alternative is to put these formatting statements in the parameter file.
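
The two-section layout can be sketched in Python as follows. The sketch rests on two assumptions that are not stated in this tutorial: that the separator is any line beginning with ‘****’ (the sample file later in this chapter uses a longer run of asterisks), and that a file with no separator line consists of data only. The function name is hypothetical.

```python
def split_pedigree_file(text):
    """Split pedigree file text into (statement_lines, data_lines).

    Assumption: the separator is a line that starts with '****'.
    Assumption: if no separator is found, the whole text is pedigree data.
    Blank lines are dropped from both sections.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith("****"):
            statements = [l for l in lines[:index] if l.strip()]
            data = [l for l in lines[index + 1:] if l.strip()]
            return statements, data
    return [], [l for l in lines if l.strip()]
```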

The pedigree data begin below the file separator. Data for each individual must be placed on a separate line. Each line begins with three names, followed by integers, then real numbers. The only required fields, the three ‘names’, are identifiers for each individual and his or her parents. Names may include up to 15 alphanumeric characters.

Whitespace (comma, space, tabs, linefeed), single (') and double (") quotes, and the hash mark (#) cannot be included in names. Names longer than 15 characters are truncated to 15 characters. Pedigree founders should be given parents with names ‘0’.
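
These naming rules might be checked before running MORGAN with a sketch such as the one below. The helper names are hypothetical, and raising an error for a forbidden character is a choice made for this sketch; the tutorial does not say how MORGAN itself reacts to such names.

```python
MAX_NAME_LENGTH = 15

# White space, both quote characters and the hash mark may not appear in names.
FORBIDDEN = set(" ,\t\v\n\f\r'\"#")


def normalize_name(name):
    """Reject names with forbidden characters and truncate to 15 characters."""
    bad = FORBIDDEN.intersection(name)
    if bad:
        raise ValueError("forbidden character(s) in name: %r" % sorted(bad))
    return name[:MAX_NAME_LENGTH]


def is_founder(father, mother):
    """A founder is recorded with both parents named '0'."""
    return father == "0" and mother == "0"
```

Because long names are truncated, two distinct names that share their first 15 characters would become identical after truncation in this sketch, so it may be worth checking for such collisions in advance.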

Gender, if present, is the fourth item in each line. Gender is coded as an integer, such that ‘1’, ‘2’ and ‘0’ represent male, female, and unsexed, respectively.

These three or four values may be followed by an “observed” indicator, with values of ‘0’, indicating an unobserved individual, or ‘1’, indicating an observed individual. The optional “observed” indicator is followed by other integers, if present, and real numbers, if present. Integers and real numbers can represent individuals’ trait data, covariates, or other information (for example, year of birth).

The format of the file is flexible and is specified by the user with ‘input pedigree record …’ statements, described in the next section.

Unlike LINKAGE format pedigree files, marker genotype data are not included in a MORGAN pedigree file.

See Concept Index for: pedigree file, parameter statements in pedigree files, pedigree file separator, whitespace, individuals –names, observed individuals, unobserved individuals.


2.8 Individuals file

Several newer MORGAN programs do not use a pedigree file, but still require a list of the individuals to be used in the analysis, potentially together with information such as gender, other covariate information or trait values. One such program is ibd_haplo which uses marker data to infer IBD using a population-based model. See Population-based inference of IBD.

Another program using the individuals file is gl_lods. See Parameter files for the gl_lods program. In this program, ibd graphs inferred from marker data using the gl_auto program are used to provide lod scores, given a trait model and trait data on some of the individuals in the ibd graphs. The goal is first to enable analysis of multiple trait models and even traits without re-running marker-based MCMC. Second, data security is enhanced by separating the pedigree information from the trait data.

The individuals file has a format similar to that of the pedigree file, except that father and mother specifications for each individual are replaced by a component indicator. Since ibd graphs are generally produced by component, the gl_lods program does require this specification of pedigree component.

To assist users, the individuals file may be produced using the pedcheck program, using, for example, the same pedigree specification as is used for gl_auto. This will ensure consistency of component specification between the individuals file and the ibd graph input to gl_lods. See Creating an individuals file.

See Concept Index for: individuals file.


2.9 Pedigree file description statements

Any of the following statements can be placed either in the parameter file or in the top section of the pedigree file, above the file separator, ‘****’. Most parameters have default values, in which case the statement is usually not required.

allow pedigree size N

This statement overrides the program-defined maximum pedigree size (presently 20,000 individuals).

input pedigree size N

Here, N is the number of records to be read. It may be less than the actual number of individuals in the pedigree file.

input pedigree record names 3 [integers I] [reals J]

This specifies the numbers of entries in each line of the pedigree file. There must be three names (up to 15 alphanumeric characters each) identifying an individual and his or her parents. Optional integers include gender and phenotypic or discrete trait data. Real numbers could be covariates or quantitative trait values.

input pedigree record (father mother | mother father)

This statement specifies the order of parental names. ‘father mother’ is the default.

input pedigree record gender (present | absent)

Gender, which follows the required triplet of names, is optional. If this statement is not included, the default is ‘gender present’. Gender is coded as an integer, such that ‘1’, ‘2’ and ‘0’ represent male, female, and unsexed, respectively.

input pedigree record observed (absent | present)

The observed indicator designates which members of the pedigree are observed and which are unobserved, indicated by ‘1’ and ‘0’, respectively. When the observed indicator is present, it follows gender (or parents if gender is not present). If this statement is absent, all pedigree members are assumed to be observed.

input pedigree record traits K1 K2... integers I1 I2...

This statement is needed when integer data for traits are included, and the trait values do not immediately and consecutively follow gender (if present). Use this statement to specify the correspondence between trait numbers and integers in the record.

input pedigree record traits K1 K2... reals X1 X2...

This statement is needed when real (non-integer) data for traits are included. The statement provides a correspondence between the trait and the column of the pedigree input file that contains those trait values. A real value with integer part 999 indicates a missing value.

Below are the first several lines of the sample pedigree file, ‘ped73.ped’ in ‘MORGAN_Examples’.

 
input pedigree size 73
input pedigree record names 3 integers 7 reals 1

***************************************************
101 0 0 1 0 0 0 -1 -1 0 999.5
102 0 0 2 0 0 0 -1 -1 0 999.5
201 101 102 1 0 0 0 0 1 0 999.5
202 101 102 2 0 0 0 1 1 0 999.5
2010 0 0 2 0 0 0 -1 -1 0 999.5
301 201 2010 1 0 0 0 1 1 0 999.5
302 201 2010 2 1 3 2 1 1 0 105.945
304 201 2010 2 0 0 0 1 0 0 999.5

Note that marker genotype data are not contained in the pedigree file. These data, if required for the MORGAN program invoked, are contained in the parameter file or in a marker data file specified in the parameter file using the ‘input marker data file’ statement. The second parameter statement in the file, ‘input pedigree record names 3 integers 7 reals 1’ describes the format of the data on each line (also called a record) in the file. The first three values in each row, the names, give an individual’s identification number followed by those of his or her father, then mother.

Because there is no ‘input pedigree record gender’ statement, gender is assumed to be present and to directly follow the three names. Absence of an ‘input pedigree record observed’ statement means that the genedrop program assumes all individuals are observed. This statement is not relevant to most other MORGAN programs, although it can also be used by pedcheck.

The 6 integers following gender and the real number in the final column represent individual data. Lack of an ‘input pedigree record traits integers’ statement would imply that the first integer following gender corresponds to trait 1, the second to trait 2, etc. However, these parameter statements are typically provided in the parameter file, not in the pedigree file.
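
To make the record layout concrete, the Python sketch below parses one line of the sample file according to ‘names 3 integers 7 reals 1’ and applies the missing-value convention for reals. It is an illustration only: the function names are hypothetical, the default ordering (three names, then integers with gender first, then reals) is assumed, and the strict field count is a choice made for the sketch.

```python
def parse_record(line, n_integers, n_reals):
    """Parse one pedigree data line into (names, integers, reals).

    Assumes the default layout: three names, then n_integers integers
    (gender first when present), then n_reals real numbers.
    """
    fields = line.replace(",", " ").split()
    expected = 3 + n_integers + n_reals
    if len(fields) != expected:
        raise ValueError("expected %d fields, found %d" % (expected, len(fields)))
    names = fields[:3]
    integers = [int(f) for f in fields[3:3 + n_integers]]
    reals = [float(f) for f in fields[3 + n_integers:]]
    return names, integers, reals


def is_missing(value):
    """A real value whose integer part is 999 marks a missing value."""
    return int(value) == 999


names, integers, reals = parse_record("302 201 2010 2 1 3 2 1 1 0 105.945", 7, 1)
print(names)                          # ['302', '201', '2010']
print(integers)                       # [2, 1, 3, 2, 1, 1, 0]
print([is_missing(x) for x in reals]) # [False]
```

Here the first integer, 2, is the gender code (female), and the remaining six integers are the individual's data. Applying ‘is_missing’ to the value 999.5 found in the other sample records returns True.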

These (and other) parameter statement defaults apply only if there is no overriding statement in any of the parameter files used. Programs will generally provide a warning statement (coded “(W)”) when default values are being used due to absence of a relevant parameter statement.

See Concept Index for: pedigree file descriptions, pedigree size, pedigree record format, individuals –gender, pedigree trait data order.


This document was generated by Elizabeth Thompson on September 6, 2019 using texi2html 1.82.