Software for genetic association analyses in case-parent triads, case-control data (or combined case-parent control-parent triads), with SNP haplotypes

Web page last updated: May 26, 2010
Most recent version: Haplin 3.5, uploaded May 26, 2010


HAPLIN is free software written to analyze case-parent triad (trio) data and/or case-control data.
The models estimated by Haplin are described in detail in Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
PDF version here.
Also available from Blackwell

Important update: An easily accessible Graphical User Interface for generating Haplin syntax is now available from haplin.fhi.no, thanks to Nguyen Trung Truc. The syntax generator helps you set up Haplin commands, which can be cut and pasted into your own R window. It includes most (but not all) features currently available in Haplin.

What's new in this version of Haplin?

Some features high on the Wish List for Haplin


Haplin is written by Hakon K. Gjessing. Hilde-Gunn Bruu contributed to early versions of the data reading and preparation parts. Rolv Terje Lie has contributed numerous useful and insightful suggestions, and inspired the work from its beginning. Nguyen Trung Truc programmed the external GUI for generating Haplin syntax. Øivind Skare has done extensive testing and simulations with the more recent versions of Haplin, and added a TDT test. Astanand (Anil) Jugessur has provided very useful feedback from a user's perspective.
Please feel free to contact me at hakon.gjessing@fhi.no, with questions or bug reports.

Note: Although we have done our best to avoid errors, the software is offered without any warranties. We cannot take responsibility for any problems or damages caused by using it.

Please: If you use Haplin in your analyses, it will be much appreciated if you refer to Haplin (this web page), or better, to the Annals of Human Genetics paper above.


Haplin is written for use with the statistical software R. However, it is easy to install and requires no previous knowledge of R. R can be downloaded free of charge from The R Project for Statistical Computing. For Windows users, a shortcut to the R installation file is found here. Haplin is implemented as a standard R package, and should run without problems on all reasonably new R versions, for Windows, Linux or UNIX.

To install Haplin in R:
Start R and type install.packages("Haplin")
Haplin will then be installed automatically over the internet from the CRAN library.
To start using Haplin, use the R command library(Haplin).
Haplin is then loaded and ready for use.

NOTE: Every time you start a new R session you must load Haplin with the R command library(Haplin). (However, you only need to install it from CRAN once.)
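
The install-once, load-every-session pattern described above can be wrapped in a small helper. This is only a sketch: load_haplin is a hypothetical name introduced here for illustration and is not part of Haplin, while requireNamespace, install.packages and library are standard R functions.

```r
# Sketch of a helper that installs Haplin only when it is missing and
# then attaches it. load_haplin() is a hypothetical name, not a Haplin function.
load_haplin <- function() {
  if (!requireNamespace("Haplin", quietly = TRUE)) {
    # Runs only the first time; needs an internet connection to reach CRAN.
    install.packages("Haplin")
  }
  # Attaching the package is needed in every new R session.
  library(Haplin)
}
```

Calling load_haplin() at the start of a session then has the same effect as typing library(Haplin), with the installation step performed only if it has not been done before.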

NOTE: To S-Plus users: Previous versions of Haplin also ran under S-Plus, but because of S-Plus's new licensing system I have decided it is not worth the trouble to maintain an S-Plus version. However, it should be easy for you to download R and run Haplin in much the same way as you would under S-Plus.

Running Haplin

Haplin is run by the single command

haplin("C:/work/data.dat")

(or whatever the path to the data file is). The data file (data.dat) can have any name, but should be a text file in a specific format (see below). This command reads the data, performs the estimation, and prints and plots the result in one run.

By default, Haplin excludes triads with missing data. To include these triads in the calculations, include the use.missing argument:
haplin("C:/work/data.dat", use.missing = T)
(The letter "T" is short for TRUE in R)
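
A small caution on the shorthand: in R, T is an ordinary variable that holds TRUE by default and can be overwritten, whereas TRUE is a reserved word. The sketch below uses only base R (no Haplin call) to show the difference; the variable names are chosen here for illustration.

```r
# T is a regular variable; TRUE is a reserved word that cannot be reassigned.
T <- 0                       # legal, and silently changes what T means
flag_short <- isTRUE(T)      # FALSE, because T now holds 0
flag_full  <- isTRUE(TRUE)   # TRUE, unaffected by the assignment above

# Remove the local copy so that T refers to the built-in value again.
rm(T)
```

For this reason, writing use.missing = TRUE in place of use.missing = T is equivalent and slightly more robust in longer scripts.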

For more examples of how to run Haplin, see the haplin help file (in R, type ?haplin).

I have collected a few pieces of advice that may be useful if you encounter problems.

Data format

The data format is a fairly simple ASCII file, described here.

For user convenience, it is also possible to convert files from the standard ped-format to the Haplin format. See here for details.

Trial run

To test that Haplin runs properly, you can download the trial data files HAPLIN.trialdata.txt and HAPLIN.trialdata2.txt, and run Haplin with the commands

haplin("HAPLIN.trialdata.txt", use.missing = T, maternal = T)
haplin("HAPLIN.trialdata2.txt", use.missing = T, n.vars = 2, ccvar = 2, design = "cc.triad", reference = "ref.cat", response = "mult")

The results should look something like this: HAPLIN.trialrun.txt, HAPLIN.trialrun2.txt.

In addition, a plot is produced, which should look something like this: HAPLIN.trialrun.jpg, HAPLIN.trialrun2.jpg.
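
If the trial run fails to find the files, the usual cause is that they were not saved in R's current working directory. The following sketch checks for the files first and only then runs the two commands given above. The haplin calls are copied from this page and apply to the Haplin version it describes; the variable names trial_files and not_found are introduced here for illustration.

```r
# The two trial data files named above; adjust if you saved them elsewhere.
trial_files <- c("HAPLIN.trialdata.txt", "HAPLIN.trialdata2.txt")

# Files that R cannot see from the current working directory.
not_found <- trial_files[!file.exists(trial_files)]

if (length(not_found) > 0) {
  # Report where R looked, so the files can be moved or setwd() used.
  message("Not found in ", getwd(), ": ", paste(not_found, collapse = ", "))
} else {
  library(Haplin)
  # Same calls as in the trial run above, with TRUE written out in full.
  haplin(trial_files[1], use.missing = TRUE, maternal = TRUE)
  haplin(trial_files[2], use.missing = TRUE, n.vars = 2, ccvar = 2,
         design = "cc.triad", reference = "ref.cat", response = "mult")
}
```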

Model and estimation

The models implemented in Haplin are extensions of the log-linear models described and developed in the papers

Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Wilcox AJ, Weinberg CR, Lie RT (1998). Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads". American Journal of Epidemiology, 148(9): 893-901.
Weinberg CR, Wilcox AJ, Lie RT (1998). A log-linear approach to case-parent-triad data: assessing effects of disease genes that act directly or through maternal effects and that may be subject to parental imprinting. American Journal of Human Genetics, 62: 969-78.

and follow-ups to these. The basic log-linear model for case-parent triad data allows a user to compute relative risks associated with a variant allele, together with corresponding confidence intervals and p-values. It also allows a similar effect estimation for maternal alleles, i.e. to study the effect of genes of the mother that may influence the development of the fetus. Haplin extends these models to situations with multiple densely spaced SNPs (or other markers), where phase is unknown. Haplin then estimates the relative risks associated with haplotypes, not only single markers. In addition, Haplin uses a parametrization that will detect (at least with sufficient sample size) dominant or recessive deviations from a dose-response model. For some details about parametrization, choice of reference category and interpretation of results, see parametrization.pdf. The most recent Haplin version also includes the option to run on case-control data, or to combine case-parent triads with control-parent triads.
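
To make the idea of a deviation from dose-response concrete, here is a small numerical sketch in base R. It assumes the dose-response model is multiplicative (as the response = "mult" option in the trial run suggests), so that two copies of an allele are expected to carry the square of the single-copy relative risk. The numbers are hypothetical and are not Haplin output; see parametrization.pdf for the parametrization Haplin actually uses.

```r
# Hypothetical relative risks for one variant allele (illustrative numbers,
# not Haplin output): one copy and two copies, each versus no copies.
rr_single <- 1.5
rr_double <- 4.0

# Under a multiplicative dose-response model, two copies act like one copy twice.
rr_double_expected <- rr_single^2

# For a risk-increasing allele:
#   ratio > 1 means the double dose is stronger than predicted (recessive-like),
#   ratio < 1 means it is weaker than predicted (dominant-like).
deviation_ratio <- rr_double / rr_double_expected
```

With these made-up values the expected double-dose relative risk is 2.25, so an observed value of 4.0 would point towards a recessive-like pattern, provided the confidence intervals support it.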

Old versions of Haplin

Hakon K. Gjessing
Professor/Senior Scientist
Division of Epidemiology
Norwegian Institute of Public Health
P.O.Box 4404 Nydalen
N-0403 Oslo, NORWAY
Email: hakon.gjessing@fhi.no
