Classification of germline variants within disease-causing genes as pathogenic or benign is crucial for the translation of clinical genetic testing results. Cosegregation analysis of pedigree data (also called causality analysis) is a useful tool for assessing germline variant pathogenicity. This server performs cosegregation analysis by the Full-Likelihood method [1] and outputs a Bayes factor that can be integrated into a multifactorial variant classification scheme. The Bayes factor can also be transformed into a strength category for use with the variant classification guidelines developed by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP). This server provides penetrance for 16 cancer-associated genes. It can also analyze other cancer genes if you provide a relative risk table, or non-cancer genes if you provide penetrance. The website also draws pedigrees from the input pedigree file.
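
As a rough illustration of turning a Bayes factor into a strength category, the sketch below uses the odds-of-pathogenicity cutoffs proposed by Tavtigian et al. (2018) for the Bayesian adaptation of the ACMG/AMP guidelines (2.08, 4.33, 18.7, 350). Whether this server applies exactly these cutoffs is an assumption, and the function name is hypothetical.

```python
# Hypothetical helper, not part of COOL. Cutoffs follow Tavtigian et al. (2018);
# that the server uses the same values is an assumption.
THRESHOLDS = [
    (350.0, "very strong"),
    (18.7, "strong"),
    (4.33, "moderate"),
    (2.08, "supporting"),
]


def strength_category(bayes_factor: float) -> str:
    """Map a cosegregation Bayes factor to an evidence direction and strength."""
    if bayes_factor <= 0:
        raise ValueError("a Bayes factor must be positive")
    # Factors below 1 favor benignity; compare their reciprocal to the cutoffs.
    direction = "pathogenic" if bayes_factor >= 1 else "benign"
    odds = bayes_factor if bayes_factor >= 1 else 1.0 / bayes_factor
    for cutoff, label in THRESHOLDS:
        if odds >= cutoff:
            return f"{direction} {label}"
    return "indeterminate"


print(strength_category(20.0))
```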


The website is designed for classifying a variant in a known disease gene as pathogenic or benign within the context of clinical genetic testing. If you want to conduct cosegregation analysis for gene discovery research, i.e., to find out which gene is associated with the disease of interest, please see my other package, VICTOR.


COOL currently runs on two servers: Amazon Web Services and the University of Utah Center for High-Performance Computing. The two should be identical. If one server is down, please use the other. Once the website is stable, I will shut down the Amazon server.



Quick start


The simplest way to do a cosegregation analysis is to provide a gene symbol and a pedigree file. The gene symbol must be one of the genes for which this server provides penetrance. For a quick tutorial, (1) copy the Pedigree File example below and save it to a file on your computer; (2) click "Choose File" to upload the Pedigree File; (3) type BRCA1 in the "Gene Symbol" box; (4) click "Submit".


The cosegregation analysis results are written to a separate web page, whose URL is shown on the screen. The result page contains parameters, Bayes factors, and pedigree drawings. Results are stored on the server for 30 days. After each analysis, you can click “show the previous page” (Safari) or “go back” (Chrome, Firefox) to do another analysis.


Long text is truncated in the pedigree drawing. However, if you hover your mouse over the symbol of an individual, the full comment is shown on the screen. You can save the result page as a webarchive or print it to a PDF. Saving as a webarchive is recommended because it keeps the mouse-hover feature. The webarchive contains the pedigree drawings as .svg files, which you can open and modify in Adobe Illustrator.


Advanced Options


Allele frequency: Allele frequency of the variant of interest, taken from a database such as gnomAD. If it is not provided, the program uses the default of 0.0001.


Break loops: If the program complains about loops, check your Pedigree File. If the loops are real, enter yes and run the analysis again; the program will break the loops for you. Loops can arise from consanguinity (e.g., someone married a cousin) or from marriage (e.g., two brothers married two sisters).


Population: One or more population names separated by commas. Alternatively, it can be NFE, LAT, AFR, SAS, EAS, or FIN, which stand for non-Finnish European, Latino, African, South Asian, East Asian, or Finnish, respectively. This option is not required if you use the Comprehensive Pedigree Format (see below).


Year range: It can be 1993-1997, 1998-2002, 2003-2007, or 2008-2012. This option is not required if you use the Comprehensive Pedigree Format (see below).


Variant of interest: If your Pedigree File is in Comprehensive Pedigree Format (see below), it may contain genotypes of multiple variants. This option allows you to choose which variant to analyze.
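
The options above can be summarized in a small validation sketch. It only encodes what is stated in this section (the default allele frequency, the population codes, and the allowed year ranges); the function and variable names are hypothetical and do not correspond to the server's own code.

```python
# Hypothetical sketch of the advanced options described above.
DEFAULT_ALLELE_FREQUENCY = 0.0001
POPULATION_CODES = {
    "NFE": "non-Finnish European", "LAT": "Latino", "AFR": "African",
    "SAS": "South Asian", "EAS": "East Asian", "FIN": "Finnish",
}
YEAR_RANGES = {"1993-1997", "1998-2002", "2003-2007", "2008-2012"}


def check_options(allele_frequency=None, population=None, year_range=None):
    """Apply the default allele frequency and validate the other options."""
    if allele_frequency is None:
        allele_frequency = DEFAULT_ALLELE_FREQUENCY
    if not 0.0 < allele_frequency < 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # Population is a comma-separated list of names or codes.
    populations = [p.strip() for p in population.split(",")] if population else []
    if year_range is not None and year_range not in YEAR_RANGES:
        raise ValueError(f"unsupported year range: {year_range}")
    return {
        "allele_frequency": allele_frequency,
        "populations": populations,
        "year_range": year_range,
    }


print(check_options(population="NFE, FIN", year_range="1998-2002"))
```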



Pedigree File


The Pedigree File must be a text file. If you edit data in Excel, please save as “Tab delimited Text (.txt)”. The recommended file format is called Comprehensive Pedigree Format (CPF). This format is designed to contain all the information necessary for cosegregation analysis, while eliminating ambiguity (e.g., discrepancies in affection status assignment or proband definition) and improving usability (i.e., directly analyzable by COOL) and re-usability (i.e., an old file can be re-analyzed without modification when the disease risk model or the gene of interest changes). I hope it helps to streamline the analysis process and facilitate data sharing. If CPF is too difficult, you can use the Freestyle Format below. The server can read both formats.


In addition to performing cosegregation analysis, you can use this website to check for problems in a CPF file. If a computation is not successful, an error message is shown on the screen. Even if the computation is successful, you can scroll down to the bottom of the result page to look for warnings about the file.



Relative Risk File


If your gene of interest is a cancer-associated gene and its penetrance is not provided by this website, you can upload a Relative Risk File. The website will calculate penetrance from the data in your Relative Risk File and Cancer Incidence in Five Continents (CI5).


This file contains relative risks for one or more genes. The file is divided into sections, one section per gene. Between sections, blank lines and comment lines (starting with #) are ignored. Each gene section starts with a "--gene=<GeneSymbol>" line and ends with a "TableEnd" line. Between them is a relative risk table, with one row per age group and one column per combination of genotype, sex, and cancer site. The first three rows are headers for genotype, sex, and disease, respectively. The first column defines the age group. Each gene can be associated with an unlimited number of diseases. Please see “Supported Cancer Sites” below for the allowed disease names. List the most prevalent disease in the left-most column. An age group is defined by two numbers: the first can be 0/5/10/15/20/25/30/35/40/45/50/55/60/65/70/75/80/85; the second can be 4/9/14/19/24/29/34/39/44/49/54/59/64/69/74/79/84/+.
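
The age group rule above can be checked with a short sketch. It assumes a closed group is written as "low-high" (as in the example file, e.g. 0-29) and that an open-ended group is written as "85+"; the latter spelling is an assumption based on the labels used elsewhere on this page, and the function name is hypothetical.

```python
import re

# Allowed boundaries as listed in the text above.
LOWER_BOUNDS = set(range(0, 90, 5))              # 0, 5, ..., 85
UPPER_BOUNDS = {n + 4 for n in range(0, 85, 5)}  # 4, 9, ..., 84


def parse_age_group(label: str):
    """Return (low, high) for 'low-high', or (low, None) for an open-ended 'low+'."""
    match = re.fullmatch(r"(\d+)(?:-(\d+)|\+)", label)
    if match is None:
        raise ValueError(f"unrecognized age group: {label!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) is not None else None
    if low not in LOWER_BOUNDS:
        raise ValueError(f"lower bound {low} is not a multiple of 5 up to 85")
    if high is not None and (high not in UPPER_BOUNDS or high < low):
        raise ValueError(f"upper bound {high} is not allowed for {label!r}")
    return low, high


print(parse_age_group("30-39"))
```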


For dominant genes, add the --dominant line right after --gene=<GeneSymbol>; you then do not need the "hom" columns. For recessive genes, you only need the "hom" columns and can skip the "het" columns. The --crhf option specifies the cumulative risk haplotype carrier frequency in the population. When this option is set, the program calculates the non-carrier and carrier incidence rates so that the combined incidence, based on this carrier frequency, equals the observed population incidence in CI5.
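
To make the --crhf constraint concrete, here is a minimal sketch for a single age group, sex, and cancer site. It assumes the carrier incidence is simply the relative risk times the non-carrier incidence, so that population incidence = (1 - f) * non-carrier incidence + f * carrier incidence, where f is the carrier frequency. The program's actual calculation may differ in detail; the function name and the numbers in the usage line are hypothetical.

```python
def split_incidence(population_incidence, relative_risk, carrier_frequency):
    """Solve I_pop = (1 - f) * I_nc + f * RR * I_nc for the non-carrier rate."""
    denominator = (1.0 - carrier_frequency) + carrier_frequency * relative_risk
    non_carrier = population_incidence / denominator
    carrier = relative_risk * non_carrier
    return non_carrier, carrier


# Hypothetical inputs: incidence 0.001 per year, relative risk 10, f = 0.00075.
non_carrier, carrier = split_incidence(0.001, 10.0, 0.00075)
print(non_carrier, carrier)
```

With a rare carrier frequency, the non-carrier incidence stays close to the population incidence, which is why the adjustment matters mainly for common risk haplotypes or very large relative risks.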


Analysis by this website can use a model in which each family member is followed up until the diagnosis of the first cancer, a risk-reducing treatment, or the last observation, whichever occurs first. This can be specified using the options “--ce-file=” or “--censoring+=”. The former reads a file of earliest censoring events, such as cancers or risk-reducing treatments. The website provides two files, coseg2.ce.cancer.txt and coseg2.ce.HBOC.txt. The first includes all cancer sites. The second includes all cancer sites plus prophylactic mastectomy and prophylactic oophorectomy, which are relevant to BRCA1/2. If you want to define your own censoring events, use the “--censoring+=” option. Please note the “+” sign: it means you can use this option many times, each time adding new events to the list. The argument of this option can be one event or multiple events separated by commas (,).
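
The follow-up rule and the additive “--censoring+=” option can be sketched as follows. The event name "Mastectomy" and both function names are hypothetical; only the logic (merge repeated comma-separated arguments, then stop follow-up at the earliest listed event or the last observation) comes from the description above.

```python
def collect_censoring_events(option_values):
    """Merge the arguments of repeated --censoring+= options into one list."""
    events = []
    for value in option_values:
        for name in value.split(","):
            name = name.strip()
            if name and name not in events:
                events.append(name)
    return events


def end_of_follow_up(last_observation_age, event_ages, censoring_events):
    """Age at which follow-up stops: earliest censoring event or last observation."""
    relevant = [age for name, age in event_ages.items() if name in censoring_events]
    return min(relevant + [last_observation_age])


# Two uses of the option, the first with two comma-separated events.
events = collect_censoring_events(["BrCa,OvCa", "Mastectomy"])
print(events)
print(end_of_follow_up(70, {"Mastectomy": 45}, events))
```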


The Relative Risk File must be a text file. If you edit data in Excel, please save as “Tab delimited Text (.txt)”.


Below is an example Relative Risk File:


# BRCA1/2 Female BrCa & OvCa from Kuchenbaecker KB et al. JAMA. 2017;317(23):2402-2416. [2]

# BRCA1/2 Male BrCa from Antoniou AC et al. Br J Cancer. 2008;98(8):1457-66. [3]

# BRCA1/2 PanCa from Mocci E et al. Cancer Epidemiol Biomarkers Prev. 2013;22(5):803-11. [4]

# BRCA2 ProCa from BOADICEA (V7 Release 114) [5]


--gene=BRCA1

--crhf=0.00075

--ce-file=coseg2.ce.HBOC.txt

Geno     het   het   het    het     het     het     hom   hom   hom    hom     hom     hom

Sex      Male  Male  Male   Female  Female  Female  Male  Male  Male   Female  Female  Female

Disease  BrCa  OvCa  PanCa  BrCa    OvCa    PanCa   BrCa  OvCa  PanCa  BrCa    OvCa    PanCa

0-29     8     1     4.68   73.7    1       4.68    8     1     4.68   73.7    1       4.68

30-39    8     1     4.68   46.2    41.4    4.68    8     1     4.68   46.2    41.4    4.68

40-49    8     1     4.68   17.2    56.7    4.68    8     1     4.68   17.2    56.7    4.68

50-59    8     1     1.40   9.7     53.3    1.40    8     1     1.40   9.7     53.3    1.40

60-69    8     1     1.40   7.0     69.1    1.40    8     1     1.40   7.0     69.1    1.40

70-79    8     1     1.40   4.8     11.8    1.40    8     1     1.40   4.8     11.8    1.40

TableEnd


--gene=BRCA2

--dominant

--crhf=0.0013

--ce-file=coseg2.ce.HBOC.txt

Geno     het   het   het    het    het     het     het     het

Sex      Male  Male  Male   Male   Female  Female  Female  Female

Disease  BrCa  OvCa  PanCa  ProCa  BrCa    OvCa    PanCa   ProCa

0-29     80    1     4.77   7.33   60.8    1       4.77    1

30-39    80    1     4.77   7.33   20.3    7.3     4.77    1

40-44    80    1     4.77   7.33   16.4    15.9    4.77    1

45-49    80    1     4.77   7.33   16.4    15.9    4.77    1

50-54    80    1     2.03   7.33   11.4    24.5    2.03    1

55-59    80    1     2.03   7.33   11.4    24.5    2.03    1

60-64    80    1     2.03   7.33   6.4     21.5    2.03    1

65-69    80    1     2.03   3.39   6.4     21.5    2.03    1

70-74    80    1     2.03   3.39   6.6     4.4     2.03    1

75-79    80    1     2.03   3.39   6.6     4.4     2.03    1

80-84    1     1     1      3.39   1       1       1       1

TableEnd
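
For readers who want to check a Relative Risk File before uploading it, the layout described above can be read with a short sketch like the one below. It is not the server's parser: the function name and the returned structure are hypothetical, and for simplicity it skips blank and comment lines everywhere, not only between sections.

```python
def parse_relative_risk(text):
    """Return {gene: {"options": {...}, "table": {(geno, sex, disease): {age: rr}}}}."""
    genes, current, rows = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        if line.startswith("--gene="):
            current = {"options": {}, "table": {}}
            genes[line.split("=", 1)[1]] = current
            rows = []
        elif current is None:
            raise ValueError(f"line outside a gene section: {line!r}")
        elif line.startswith("--"):
            key, _, value = line[2:].partition("=")
            current["options"][key] = value or True  # flags such as --dominant
        elif line == "TableEnd":
            # The first three rows are the genotype, sex, and disease headers.
            geno, sex, disease = (row[1:] for row in rows[:3])
            for row in rows[3:]:
                for g, s, d, rr in zip(geno, sex, disease, row[1:]):
                    current["table"].setdefault((g, s, d), {})[row[0]] = float(rr)
            current = None
        else:
            rows.append(line.split())
    return genes


example = """--gene=BRCA2
--dominant
Geno     het   het
Sex      Male  Female
Disease  BrCa  BrCa
0-29     80    60.8
TableEnd
"""
print(parse_relative_risk(example)["BRCA2"]["table"])
```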



Penetrance File


A Penetrance File is useful for non-cancer genes, or when you want to use your own incidence data instead of those from CI5. However, you have to calculate a liability class for each individual and make sure that the liability classes in the Pedigree File match the lines in the Penetrance File.


If you have N liability classes, the first line of the Penetrance File is the number N followed by options describing the liability class model, including --gene, --diseases, --age-div, --unkn-aoo, and --unkn-sex. The program checks whether N is consistent with the --age-div and --diseases options. For example, if the --age-div argument has 6 numbers, they divide ages into 6+1=7 age groups. If the gene is associated with 2 diseases, there are (2+1)*7=21 liability classes for one sex, and N=21*2=42 liability classes in total.
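
The arithmetic above generalizes to N = (number of diseases + 1) * (number of age divisions + 1) * 2. A minimal sketch, with a hypothetical function name:

```python
def expected_liability_classes(age_div, diseases, sexes=2):
    """Number of liability classes implied by --age-div and --diseases."""
    age_groups = len(age_div) + 1   # k cut points give k + 1 age groups
    statuses = len(diseases) + 1    # each disease plus the unaffected class
    return statuses * age_groups * sexes


# The worked example from the text: 6 age divisions and 2 diseases.
print(expected_liability_classes([30, 40, 50, 60, 70, 80], ["BrCa", "OvCa"]))
```

The same formula gives 120 for eleven age divisions and four diseases, which matches the first line of the BRCA2 example file on this page.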


After the first line are N lines of penetrance. Each line has 3 numbers giving the penetrance for non-carriers, heterozygous carriers, and homozygous carriers, respectively. After the numbers is a description of the liability class. For an unaffected class, the description should contain the phrase “unaff”. Line order also matters: the lines for unaffected individuals come first, followed by those for affected individuals in the same order as the --diseases argument. Within each unaffected or affected section, lines are sorted by sex (male, then female) and then by age group (young to old).
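
Given this ordering, the line number of a liability class can be computed directly. The sketch below assumes the standard format and assumes that each --age-div value is the first age of the next group (so 30 separates 0-29 from 30-39), which is consistent with the class descriptions in the BRCA2 example on this page. The function name is hypothetical.

```python
from bisect import bisect_right


def liability_class(status, sex, age, age_div, diseases):
    """1-based liability class for status 'unaff' or a disease name."""
    groups = len(age_div) + 1
    # Section 0 is unaffected; diseases follow in --diseases order.
    section = 0 if status == "unaff" else diseases.index(status) + 1
    sex_offset = {"male": 0, "female": 1}[sex]
    age_group = bisect_right(age_div, age)  # index of the age group
    return section * 2 * groups + sex_offset * groups + age_group + 1


AGE_DIV = [30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
DISEASES = ["BrCa", "OvCa", "PanCa", "ProCa"]
print(liability_class("BrCa", "female", 25, AGE_DIV, DISEASES))
```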


Below is an example Penetrance File for BRCA2 in standard format. This file is currently in use by this web tool by default. Please see Reference for the source of penetrance for female breast and ovarian cancer [2], male breast cancer [3], pancreatic cancer [4], and prostate cancer [5].


The Penetrance File must be a text file. If you edit data in Excel, please save as “Tab delimited Text (.txt)”.


120 << --age-div=30,40,45,50,55,60,65,70,75,80,85 --gene=BRCA2 --diseases=BrCa,OvCa,PanCa,ProCa --rr=coseg_rr.txt --bing=Yes

   0.000007279   0.000092860   0.000092860   <<  1 unaff male 0-29 UK 1998-2002

   0.000056313   0.000660008   0.000660008   <<  2 unaff male 30-39 UK 1998-2002

   0.000183361   0.002015465   0.002015465   <<  3 unaff male 40-44 UK 1998-2002

   0.000558795   0.005528094   0.005528094   <<  4 unaff male 45-49 UK 1998-2002

   0.001815873   0.015685099   0.015685099   <<  5 unaff male 50-54 UK 1998-2002

   0.005487972   0.042679786   0.042679786   <<  6 unaff male 55-59 UK 1998-2002

   0.014030174   0.101554466   0.101554466   <<  7 unaff male 60-64 UK 1998-2002

   0.030272736   0.174792630   0.174792630   <<  8 unaff male 65-69 UK 1998-2002

   0.054956917   0.251743197   0.251743197   <<  9 unaff male 70-74 UK 1998-2002

   0.087542077   0.345250834   0.345250834   << 10 unaff male 75-79 UK 1998-2002

   0.126170026   0.436764058   0.436764058   << 11 unaff male 80-84 UK 1998-2002

   0.158273381   0.485094871   0.485094871   << 12 unaff male 85+ UK 1998-2002

   0.000361579   0.013853224   0.013853224   << 13 unaff female 0-29 UK 1998-2002

   0.003126994   0.070377778   0.070377778   << 14 unaff female 30-39 UK 1998-2002

   0.008572049   0.155056368   0.155056368   << 15 unaff female 40-44 UK 1998-2002

   0.016490198   0.256563784   0.256563784   << 16 unaff female 45-49 UK 1998-2002

   0.028756473   0.374883650   0.374883650   << 17 unaff female 50-54 UK 1998-2002

   0.044220151   0.488151860   0.488151860   << 18 unaff female 55-59 UK 1998-2002

   0.061233051   0.575776823   0.575776823   << 19 unaff female 60-64 UK 1998-2002

   0.078286465   0.636193577   0.636193577   << 20 unaff female 65-69 UK 1998-2002

   0.095545853   0.681989014   0.681989014   << 21 unaff female 70-74 UK 1998-2002

   0.114693025   0.718623145   0.718623145   << 22 unaff female 75-79 UK 1998-2002

   0.135302716   0.739892811   0.739892811   << 23 unaff female 80-84 UK 1998-2002

   0.151947021   0.744899538   0.744899538   << 24 unaff female 85+ UK 1998-2002

   0.000001337   0.000106941   0.000106941   << 25 BrCa male 0-29 UK 1998-2002

   0.000006959   0.000556450   0.000556450   << 26 BrCa male 30-39 UK 1998-2002

   0.000010889   0.000869874   0.000869874   << 27 BrCa male 40-44 UK 1998-2002

   0.000023083   0.001840075   0.001840075   << 28 BrCa male 45-49 UK 1998-2002

   0.000048862   0.003872901   0.003872901   << 29 BrCa male 50-54 UK 1998-2002

   0.000074754   0.005840535   0.005840535   << 30 BrCa male 55-59 UK 1998-2002

   0.000110857   0.008349190   0.008349190   << 31 BrCa male 60-64 UK 1998-2002

   0.000154046   0.010734333   0.010734333   << 32 BrCa male 65-69 UK 1998-2002

   0.000209181   0.013686549   0.013686549   << 33 BrCa male 70-74 UK 1998-2002

   0.000281452   0.016855908   0.016855908   << 34 BrCa male 75-79 UK 1998-2002

   0.000317733   0.000214757   0.000214757   << 35 BrCa male 80-84 UK 1998-2002

   0.000189659   0.000116019   0.000116019   << 36 BrCa male 85+ UK 1998-2002

   0.000457111   0.027415880   0.027415880   << 37 BrCa female 0-29 UK 1998-2002

   0.004354596   0.082488696   0.082488696   << 38 BrCa female 30-39 UK 1998-2002

   0.005580613   0.078193196   0.078193196   << 39 BrCa female 40-44 UK 1998-2002

   0.008737492   0.108811200   0.108811200   << 40 BrCa female 45-49 UK 1998-2002

   0.013129075   0.097601856   0.097601856   << 41 BrCa female 50-54 UK 1998-2002

   0.013723716   0.085408869   0.085408869   << 42 BrCa female 55-59 UK 1998-2002

   0.014530921   0.043067514   0.043067514   << 43 BrCa female 60-64 UK 1998-2002

   0.012251269   0.031824492   0.031824492   << 44 BrCa female 65-69 UK 1998-2002

   0.013579442   0.031674024   0.031674024   << 45 BrCa female 70-74 UK 1998-2002

   0.014597368   0.030789054   0.030789054   << 46 BrCa female 75-79 UK 1998-2002

   0.015189253   0.004569037   0.004569037   << 47 BrCa female 80-84 UK 1998-2002

   0.008885981   0.002672967   0.002672967   << 48 BrCa female 85+ UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 49 OvCa male 0-29 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 50 OvCa male 30-39 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 51 OvCa male 40-44 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 52 OvCa male 45-49 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 53 OvCa male 50-54 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 54 OvCa male 55-59 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 55 OvCa male 60-64 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 56 OvCa male 65-69 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 57 OvCa male 70-74 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 58 OvCa male 75-79 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 59 OvCa male 80-84 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 60 OvCa male 85+ UK 1998-2002

   0.000257524   0.000257524   0.000257524   << 61 OvCa female 0-29 UK 1998-2002

   0.000410633   0.002912907   0.002912907   << 62 OvCa female 30-39 UK 1998-2002

   0.000429027   0.006064498   0.006064498   << 63 OvCa female 40-44 UK 1998-2002

   0.000873136   0.011204105   0.011204105   << 64 OvCa female 45-49 UK 1998-2002

   0.001362280   0.022947794   0.022947794   << 65 OvCa female 50-54 UK 1998-2002

   0.001923589   0.027049552   0.027049552   << 66 OvCa female 55-59 UK 1998-2002

   0.002457267   0.024832032   0.024832032   << 67 OvCa female 60-64 UK 1998-2002

   0.002686060   0.023581526   0.023581526   << 68 OvCa female 65-69 UK 1998-2002

   0.002825266   0.004555578   0.004555578   << 69 OvCa female 70-74 UK 1998-2002

   0.002897527   0.004240821   0.004240821   << 70 OvCa female 75-79 UK 1998-2002

   0.002942136   0.000885016   0.000885016   << 71 OvCa female 80-84 UK 1998-2002

   0.001362269   0.000409780   0.000409780   << 72 OvCa female 85+ UK 1998-2002

   0.000007080   0.000033773   0.000033773   << 73 PanCa male 0-29 UK 1998-2002

   0.000065878   0.000314147   0.000314147   << 74 PanCa male 30-39 UK 1998-2002

   0.000107852   0.000513814   0.000513814   << 75 PanCa male 40-44 UK 1998-2002

   0.000250502   0.001191192   0.001191192   << 76 PanCa male 45-49 UK 1998-2002

   0.000464364   0.000935538   0.000935538   << 77 PanCa male 50-54 UK 1998-2002

   0.000873503   0.001736104   0.001736104   << 78 PanCa male 55-59 UK 1998-2002

   0.001334669   0.002560209   0.002560209   << 79 PanCa male 60-64 UK 1998-2002

   0.001981595   0.003521981   0.003521981   << 80 PanCa male 65-69 UK 1998-2002

   0.002660290   0.004448589   0.004448589   << 81 PanCa male 70-74 UK 1998-2002

   0.003407621   0.005230734   0.005230734   << 82 PanCa male 75-79 UK 1998-2002

   0.003944167   0.002665873   0.002665873   << 83 PanCa male 80-84 UK 1998-2002

   0.002279036   0.001394143   0.001394143   << 84 PanCa male 85+ UK 1998-2002

   0.000008647   0.000041245   0.000041245   << 85 PanCa female 0-29 UK 1998-2002

   0.000044443   0.000206249   0.000206249   << 86 PanCa female 30-39 UK 1998-2002

   0.000075664   0.000321852   0.000321852   << 87 PanCa female 40-44 UK 1998-2002

   0.000152430   0.000590494   0.000590494   << 88 PanCa female 45-49 UK 1998-2002

   0.000310199   0.000440009   0.000440009   << 89 PanCa female 50-54 UK 1998-2002

   0.000537457   0.000640837   0.000640837   << 90 PanCa female 55-59 UK 1998-2002

   0.000943501   0.000923928   0.000923928   << 91 PanCa female 60-64 UK 1998-2002

   0.001349378   0.001151147   0.001151147   << 92 PanCa female 65-69 UK 1998-2002

   0.001961502   0.001465275   0.001465275   << 93 PanCa female 70-74 UK 1998-2002

   0.002608439   0.001768415   0.001768415   << 94 PanCa female 75-79 UK 1998-2002

   0.003199886   0.000962549   0.000962549   << 95 PanCa female 80-84 UK 1998-2002

   0.001862729   0.000560322   0.000560322   << 96 PanCa female 85+ UK 1998-2002

   0.000006142   0.000045017   0.000045017   << 97 ProCa male 0-29 UK 1998-2002

   0.000010673   0.000078220   0.000078220   << 98 ProCa male 30-39 UK 1998-2002

   0.000051853   0.000379627   0.000379627   << 99 ProCa male 40-44 UK 1998-2002

   0.000306786   0.002240654   0.002240654   << 100 ProCa male 45-49 UK 1998-2002

   0.001421401   0.010296184   0.010296184   << 101 ProCa male 50-54 UK 1998-2002

   0.004466379   0.031616930   0.031616930   << 102 ProCa male 55-59 UK 1998-2002

   0.010243630   0.068722198   0.068722198   << 103 ProCa male 60-64 UK 1998-2002

   0.018716491   0.054352174   0.054352174   << 104 ProCa male 65-69 UK 1998-2002

   0.025765479   0.069769740   0.069769740   << 105 ProCa male 70-74 UK 1998-2002

   0.033055940   0.081351487   0.081351487   << 106 ProCa male 75-79 UK 1998-2002

   0.036556585   0.079747356   0.079747356   << 107 ProCa male 80-84 UK 1998-2002

   0.021156806   0.012942145   0.012942145   << 108 ProCa male 85+ UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 109 ProCa female 0-29 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 110 ProCa female 30-39 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 111 ProCa female 40-44 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 112 ProCa female 45-49 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 113 ProCa female 50-54 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 114 ProCa female 55-59 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 115 ProCa female 60-64 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 116 ProCa female 65-69 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 117 ProCa female 70-74 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 118 ProCa female 75-79 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 119 ProCa female 80-84 UK 1998-2002

   0.000000000   0.000000000   0.000000000   << 120 ProCa female 85+ UK 1998-2002
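
A file in this layout can be read back with a few lines of code. The sketch below assumes, as in the example, that "<<" separates the numbers from the options or description; it is an illustration rather than the server's own reader, and the function name is hypothetical.

```python
def parse_penetrance(text):
    """Return (options, classes); each class is (non_carrier, het, hom, description)."""
    lines = [line for line in text.splitlines() if line.strip()]
    head, _, options = lines[0].partition("<<")
    declared = int(head)  # the number N on the first line
    classes = []
    for line in lines[1:]:
        numbers, _, description = line.partition("<<")
        non_carrier, het, hom = (float(x) for x in numbers.split())
        classes.append((non_carrier, het, hom, description.strip()))
    if len(classes) != declared:
        raise ValueError(f"expected {declared} classes, found {len(classes)}")
    return options.split(), classes


# A two-class toy file, only to show the call; real files follow the layout above.
toy = "2 << --no-check\n 0.1 0.2 0.2 << 1 unaff\n 0.3 0.4 0.4 << 2 BrCa\n"
print(parse_penetrance(toy))
```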



If your liability classes are not in this standard format, i.e., some classes are omitted or collapsed, you can add the --no-check option to skip the format check. Below is an example Penetrance File for BRCA1 in a non-standard format (note the --no-check option). As you can see, this file does not consider pancreatic cancer. It is an old version and is not used by default.


21 << --no-check --gene=BRCA1 --diseases=BrCa,OvCa --age-div=30,40,50,60,70,80 --unkn-aoo=65 --unkn-sex=1 --chr=13
   0.00025    0.00314    0.00314 <<  1.  0-29 unaffected female and unaffected male
   0.00242    0.07307    0.07307 <<  2. 30-39 unaffected female
   0.01082    0.31266    0.31266 <<  3. 40-49 unaffected female
   0.02621    0.53921    0.53921 <<  4. 50-59 unaffected female
   0.04567    0.68111    0.68111 <<  5. 60-69 unaffected female
   0.06756    0.74146    0.74146 <<  6. 70-79 unaffected female
   0.09151    0.74810    0.74810 <<  7. 80+   unaffected female
   0.00034    0.00612    0.00612 <<  8.  0-29 BrCa female and BrCa male
   0.00336    0.12096    0.12096 <<  9. 30-39 BrCa female
   0.01130    0.35030    0.35030 << 10. 40-49 BrCa female
   0.01480    0.23680    0.23680 << 11. 50-59 BrCa female
   0.01800    0.19800    0.19800 << 12. 60-69 BrCa female
   0.02000    0.02000    0.02000 << 13. 70-79 BrCa female
   0.02390    0.02390    0.02390 << 14. 80+   BrCa female
   0.00016    0.00016    0.00016 << 15.  0-29 OvCa female
   0.00048    0.01824    0.01824 << 16. 30-39 OvCa female
   0.00178    0.10858    0.10858 << 17. 40-49 OvCa female
   0.00347    0.10410    0.10410 << 18. 50-59 OvCa female
   0.00411    0.19728    0.19728 << 19. 60-69 OvCa female
   0.00430    0.00430    0.00430 << 20. 70-79 OvCa female
   0.00384    0.00384    0.00384 << 21. 80+   OvCa female



How to convert other files to CPF


This website tries to read various formats whenever possible. Below are 4 formats that this website supports. If the website can read your file, a cosegregation analysis will be performed. A CPF file will be created for download. You may also want to check out my other program PedPro, which is more flexible in converting pedigree files.


Please keep in mind that if data are not collected or are misinterpreted in the first place, converting files from other formats to CPF cannot solve the problems. Also, it is very difficult to design a program to read all file formats, especially when the formats are constantly changing or highly customizable. Therefore, be prepared that this website may or may not read your format. If you need help, please email me.


1. Freestyle Format


A Pedigree File in freestyle format does not require a specific order of columns, but it should contain a header row so that the program can identify the content of each column. Required columns are IndID Father Mother Sex Aff Age (or Liab) Proband Geno (or Allele1 and Allele2). Tabs or spaces delimit columns. Multiple delimiters are treated as one. A missing value is represented by a period (recommended) or 0 in all fields. Empty fields, spaces within fields, and quoted fields are not allowed. Sex, affection status, and genotypes are case-insensitive. IDs are case-sensitive. Extra columns with unrecognized headers are allowed but ignored.


PedID: Pedigree_ID. Alphanumeric. It cannot be 0. Multiple pedigrees are allowed. (See Tips 1 below).

IndID: Individual_ID. Alphanumeric. It cannot be 0. IndID does not need to be unique across pedigrees.

Father: Individual_ID of father. Put 0 for founders. If one parent is 0, both parents must be 0.

Mother: Individual_ID of mother. Put 0 for founders. If one parent is 0, both parents must be 0.

Sex: Biological sex, not the personal identification of one's own gender. 0=UnknSex, 1=M=Male, 2=F=Female.

Twin: Twin status. 0: not twin; positive integer: siblings with the same number are twins.

Aff: Affection status. Value is UnknAff (unknown), Unaff (unaffected), or a disease name (See Tips 2 below).

Age: Age-of-onset for affected and age-of-the-last-exam for unaffected individuals. 0 for unknown age.

Proband/FPTP: The first person tested positive for the variant in each pedigree. 1 for FPTP; 0 for others.

Geno: Genotype. 0=unknown, neg=negative, het=heterozygous-carrier, hom=homozygous-carrier (See Tips 3).

Allele1 Allele2: Genotype in two columns. 0=unknown, 1=reference-allele, 2=mutant-allele.

Liab: Liability class. Integers starting from 1. Zero is not allowed. This column is ignored if Age column exists.

Pop: Population. This column overrides the advanced option "Population" on the webpage.

Cohort: Year range. This column overrides the advanced option "Year range" on the webpage.

Comment: Information to be shown in pedigree drawings. (See “Tips 2” below).
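
The column rules above can be checked with a small reader like the following sketch. It is not the server's parser: the names are hypothetical, header matching is assumed to be exact, and only two rules are enforced (required columns are present, and either both parents or neither parent is missing).

```python
# Each entry lists alternative column sets; one of them must be present.
REQUIRED = [
    [("IndID",)], [("Father",)], [("Mother",)], [("Sex",)], [("Aff",)],
    [("Age",), ("Liab",)],
    [("Proband",), ("FPTP",)],
    [("Geno",), ("Allele1", "Allele2")],
]
MISSING = {"0", "."}


def read_freestyle(text):
    """Parse a whitespace-delimited Freestyle pedigree into a list of dicts."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, records = rows[0], rows[1:]
    columns = set(header)
    for alternatives in REQUIRED:
        if not any(set(option) <= columns for option in alternatives):
            raise ValueError(f"missing required column: {alternatives}")
    people = []
    for fields in records:
        if len(fields) != len(header):
            raise ValueError(f"wrong number of fields: {fields}")
        person = dict(zip(header, fields))
        # If one parent is missing, both must be missing.
        if (person["Father"] in MISSING) != (person["Mother"] in MISSING):
            raise ValueError(f"only one parent given for {person['IndID']}")
        people.append(person)
    return people


sample = "PedID IndID Father Mother Sex Aff Age Geno FPTP\nped1 3 1 2 F BrCa 40 . 0\n"
print(read_freestyle(sample))
```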


Tips 1, regarding column contents: A good practice in creating an input file is to make it self-contained and self-explanatory. It is therefore better to use meaningful alphanumeric strings as Pedigree_IDs. If there is only one pedigree, the PedID column can be omitted, but this is not recommended for the same reason. Another way to make the file self-explanatory is to use English words (or at least a letter) to represent data rather than numeric codes. For example, neg/het/hom are better than 0/1/2 for genotypes, Male/Female are better than 1/2 for sex, and the actual disease names BrCa,OvCa,PanCa are better than 2,3,4 for affection status.


Tips 2, regarding affection status: If a person is affected with multiple diseases known to be associated with the gene, then Aff should be the disease with the earliest age of onset. If a person is affected with other diseases unrelated to the associated diseases, those diseases should be ignored. You can still input the disease name in Aff and the last exam age in Age. The programs will ignore the disease and set the person as unaffected. The pedigree drawing will show this information in the comment line. If a person is affected with both an associated disease and an unrelated disease, then input the associated disease in Aff and the corresponding age of onset in Age. For cancer-associated genes, subjects are right-censored at the first diagnosis of any cancer.
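
The rule in Tips 2 for choosing Aff and Age can be sketched as below. This is only an illustration of the selection logic (earliest associated disease wins, unrelated diseases are ignored); it does not model the right-censoring at the first cancer mentioned at the end of the tip, and the function name is hypothetical.

```python
def analysis_status(diagnoses, last_exam_age, associated):
    """Pick (Aff, Age) from a list of (disease, age_of_onset) pairs."""
    related = [(age, disease) for disease, age in diagnoses if disease in associated]
    if related:
        # Several associated diseases: use the one with the earliest onset.
        age, disease = min(related)
        return disease, age
    # Only unrelated diseases, or none: analyzed as unaffected.
    return "Unaff", last_exam_age


BRCA1_DISEASES = {"BrCa", "OvCa", "PanCa"}
print(analysis_status([("OvCa", 48), ("BrCa", 41)], 60, BRCA1_DISEASES))
```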


Tips 3, regarding genotype: You don’t need to input genotypes for obligate carriers, as the program infers them automatically. You don’t need to input a negative genotype (non-carrier) for spouses, even if you assume that the variant entered the pedigree only once. The program fills in the genotype based on allele frequency, so if the allele frequency is low, the variant most likely enters the pedigree only once.


Below is an example Pedigree File and a drawing for BRCA1. Please note that affection status for analysis depends on the gene. Because the liability class model for BRCA1 involves only breast, ovarian, and pancreatic cancer, individuals 6 (prostate cancer) and 17 (lung cancer) are “unaffected”. Therefore, the squares for these two persons are not filled in solid black. Also note that the proband, individual 17, is not affected. This is correct because the proband is defined as the first person who tested positive for the variant.


PedID IndID Father Mother Sex Twin Aff    Age Geno FPTP

ped1  1     0      0      M   0    .      79  .    0

ped1  2     0      0      F   0    .      78  .    0

ped1  3     1      2      F   0    BrCa   40  .    0

ped1  4     1      2      F   0    Unaff  100 .    0

ped1  5     1      2      F   0    BrCa   85  .    0

ped1  6     1      2      M   0    ProCa  43  .    0

ped1  7     0      0      M   0    Unaff  80  .    0

ped1  8     7      3      M   0    Unaff  73  .    0

ped1  9     7      3      M   0    Unaff  41  .    0

ped1  10    0      0      F   0    .      89  .    0

ped1  11    7      3      M   0    PanCa  30  Het  0

ped1  12    0      0      F   0    Unaff  80  .    0

ped1  13    9      10     F   0    BrCa   41  Het  0

ped1  14    9      10     M   0    Unaff  60  .    0

ped1  15    9      10     F   0    BrCa   50  Het  0

ped1  16    9      10     F   0    Unaff  60  Het  0

ped1  17    11     12     M   0    Lung   49  Het  1

ped1  18    11     12     F   0    Unaff  38  .    0

ped1  19    11     12     M   0    Unaff  36  Het  0

ped1  20    11     12     F   0    OvCa   48  Het  0



Below is a list of strings for the “Aff” column:

==============================================================================================================================

Name     Description

------------------------------------------------------------------------------------------------------------------------------

Unaff    Unaffected

Lip      Lip

Tongue   Tongue

Mouth    Mouth

Oral     Oral cavity (lip, tongue, mouth)

Saliv    Salivary gland

Parotid  Parotid gland

Tonsil   Tonsil

Oroph    Oropharynx

Nasoph   Nasopharynx

Pyrifm   Pyriform sinus

Hypoph   Hypopharynx

Pharynx  Pharynx (includes Oropharynx, Nasopharynx, Hypopharynx, Pharynx unspecified)

BCP      Buccal cavity & pharynx (includes Lip, Tongue, Mouth, Saliv, Parotid, Tonsil, Pharynx)

Throat   Oropharynx, Tonsil, Base of tongue

Nasal    Nasal cavity and middle ear

A.sinus  Accessory sinuses

Larynx   Larynx

Trachea  Trachea

Oesoph   Oesophagus

Stomach  Stomach (synonym: Gastric)

SmBowel  Small intestine

Colon    Colon

RS.junc  Rectosigmoid junction

Rectum   Rectum

CRC      Colorectal cancer (includes Colon, RS.junc, Rectum)

Anus     Anus

Liver    Liver

Gall     Gallbladder

Biliary  Biliary tract

PanCa    Pancreas

BilPan   Biliary tract and Pancreas

Lung     Lung

Thymus   Thymus

Heart    Heart

Bone     Bone

Bone.l   Bone of limbs

Bone.o   Bone other than limbs

Osteo    Osteosarcoma

Sarcoma  Soft tissue sarcoma or bone sarcoma

CM       Cutaneous melanoma

NM.skin  Non-melanoma of skin

Meso     Mesothelioma

BrCa     Breast

Vagina   Vagina

Cervix   Cervix uteri

Corpus   Corpus uteri (Synonym: Endomet Endometrial Endometrium)

Uterus   Uterus

OvCa     Ovary

Penis    Penis

ProCa    Prostate

Testis   Testis

UpUrin   Upper urinary tract malignancy (kidney and renal pelvis)

Kidney   Kidney

RenalCC  Renal Cell Carcinoma (only for CI5-XI, a subset of UpUrin)

RCC      Renal Cell Carcinoma (only for CI5-XI, a subset of UpUrin)

Ureter   Ureter

Bladder  Bladder

Urinary  Urinary tract (includes Kidney, Renal pelvis, Ureter, Bladder, Other urinary organs)

Eye      Eye

UM       Uveal melanoma

Mening   Meninges

CNS      Central nervous system

Brain    Brain

Thyroid  Thyroid

MTC      Medullary thyroid cancer

Adrenal  Adrenal gland

Hodgkin  Hodgkin lymphoma

NH.lym   Non-Hodgkin lymphoma

IPD      Immunoproliferative diseases

Myeloma  Multiple myeloma

L.leuk   Lymphoid leukemia

M.leuk   Myeloid leukemia

U.leuk   Cell-unspecified leukemia

Leuk     Leukemia

Lymph    Lymphoid neoplasms (includes Hodgkin lymphoma, Non-Hodgkin lymphoma, MALT-lymphoma, Lymphoid leukemia)

STS      Soft-tissue sarcoma (Mesothelioma, Kaposi, Peripheral nerves, Peritoneum & retroperitoneum, Connective & soft tissue)

==============================================================================================================================


2. BOADICEA Format


I do not have enough resources to keep up with the changing BOADICEA format, and I do not have detailed documentation of the format. Please use this format with caution.


BOADICEA v3


This web tool supports pedigree files created by the BOADICEA website. The BOADICEA v3 format is detected automatically from the header line, so please do not change the header when you copy and paste from the BOADICEA website. This format is tab-delimited. The program reads column 3 (Tgt) for the proband; columns 1, 4, 5, 6, and 7 (PedID, IndivID, FathID, MothID, Sex) for pedigree structure; column 10 (Age) for the age at last exam; column 11 (YoB) for year of birth; columns 12-16 (1BrCa, 2BrCa, OvCa, ProCa, PanCa) for affection status; column 18 (Mutn) for genotype; and columns 20-24 for breast cancer pathology (not used in the analysis but written to the output CPF). Below is an example:


   Name Tgt IndivID FathID MothID Sex Twin Status Age Yob  1BrCa 2BrCa OvCa ProCa PanCa Gtest Mutn  Ashkn Er  Pr  Her2 Ck14 Ck56

1  Eva  1   1       2      3      F               29  1974 29                           srch  brca1 A     +ve -ve -ve          

..
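

The column positions described above can be sketched in Python as follows. This is only an illustration of the layout, not the code that runs on the server. The function name parse_boadicea_v3_row and the returned dictionary are hypothetical, the cancer column names are taken from the example header, and cell contents are kept verbatim without interpretation.

```python
# 1-based column positions, as described in the text above.
COLUMNS = {
    "PedID": 1, "Tgt": 3, "IndivID": 4, "FathID": 5, "MothID": 6, "Sex": 7,
    "Age": 10, "YoB": 11, "Mutn": 18,
}
# Affection-status columns 12-16 (names taken from the example header).
CANCER_COLUMNS = {"1BrCa": 12, "2BrCa": 13, "OvCa": 14, "ProCa": 15, "PanCa": 16}


def parse_boadicea_v3_row(line):
    """Split one tab-delimited data row and pick out the fields listed above."""
    # Split on tabs only, so that empty cells keep their position.
    cells = line.rstrip("\n").split("\t")

    def cell(pos):
        return cells[pos - 1].strip() if pos <= len(cells) else ""

    record = {name: cell(pos) for name, pos in COLUMNS.items()}
    # Keep only the cancer columns that are filled in; values are kept as text.
    record["cancers"] = {name: cell(pos)
                         for name, pos in CANCER_COLUMNS.items() if cell(pos)}
    return record


# The example row above, written out with explicit tabs (24 columns).
row = "\t".join(["1", "Eva", "1", "1", "2", "3", "F", "", "", "29", "1974",
                 "29", "", "", "", "", "srch", "brca1", "A",
                 "+ve", "-ve", "-ve", "", ""])
print(parse_boadicea_v3_row(row))
```

Splitting on the tab character alone matters here: splitting on any whitespace would drop the empty cells and shift every later column to the left.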


BOADICEA v4 (beta)


Below is an example:


  Name Tgt IndivID FathID MothID Sex MZtwin Status Age Yob  1BrCa 2BrCa OvCa ProCa PanCa Ashkn GeneticTests           Pathology 

1 Eva  <   1       3      2      F          Alive  23  1979 21                                 BRCA1+[gt] BRCA2–[ms]  ER+ PR+ HER2–

2 mom      2                     F          Alive  64  1950                                    BRCA1–[gt] BRCA2–[ms]  

3 dad      3                     M          Alive  65  1940                                                           



3. PROGENY Format


Note that PROGENY is highly customizable. The format described here is the one used at our institute. If you want me to adapt my programs to read your format, please send me an example file. When you click Export in PROGENY, please choose Text [tab delimited] and check the boxes “Convert newlines to spaces”, “include column headings”, and “export one row per individual”. The PROGENY format is recognized if the file has a column named “Global ID” and a column named “UPN”.


The file must have the following columns: Pedigree name, CreatedYr, Global ID, UPN, Mother ID, Father ID, Proband status, Gender, Genotype, Age, YoB, Cancer1, Age1. There can be multiple cancer name and cancer age columns, which should be named Cancer2, Age2, Cancer3, Age3, and so on. Here, Age# is the age at diagnosis of Cancer#, and the Age column is the age at last exam or the age at death. The program determines the age used for cosegregation analysis as follows: (1) if Cancer# is one of the diseases caused by the gene, the age is Age#; (2) if the person is unaffected, the age is Age, or, if Age is missing, it is calculated from CreatedYr and YoB.


Below is an example file. It is aligned with spaces here for readability, but the actual file must be delimited by tabs, not spaces. Spaces are not treated as delimiters and are converted to _ internally.


Pedigree CreatedYr Global_ID UPN Mother Father Proband Gender Genotype Age YoB  Cancer1 Age1 Cancer2 Age2

TestP1   2018      2067963   2   6      7      0       M      0        .   .    .       .    .       .

TestP1   2018      2067964   3   4      0      0       F      0        49  .    Ovarian 27   Breast  43

TestP1   2018      2067965   4   0      0      0       F      0        .   .    .       .    .       .

TestP1   2018      2067967   6   0      0      0       F      0        .   .    .       .    .       .

TestP1   2018      2067968   7   0      0      0       M      0        .   .    Brain   50   .       .

TestP1   2018      2067970   9   6      7      0       F      0        .   .    .       .    .       .

TestP1   2018      2067971   10  6      7      0       F      0        .   .    .       .    .       .

TestP1   2018      2067972   11  4      0      0       M      0        .   .    .       .    .       .

TestP1   2018      2067973   12  4      0      0       M      0        .   .    .       .    .       .

TestP1   2018      2067975   14  3      0      0       M      Pos      46  .    .       .    .       .

TestP1   2018      2067976   15  3      0      0       M      0        .   .    .       .    .       .

TestP1   2018      2067977   16  3      0      0       F      0        .   .    .       .    .       .

TestP1   2018      2067978   17  0      0      0       F      0        .   .    .       .    .       .

TestP1   2018      2067979   18  17     2      0       M      0        .   .    .       .    .       .

TestP1   2018      2066941   1   3      2      1       F      Pos      .   1974 .       .    .       .
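

The age rules for the PROGENY format can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's implementation. The function name analysis_age and the set gene_cancers are hypothetical. I also assume that "." or an empty cell means missing, that the earliest qualifying diagnosis is used when several cancers qualify, that a person with only unrelated cancers is handled as unaffected, and that "calculated from CreatedYr and YoB" means CreatedYr minus YoB.

```python
MISSING = {"", "."}  # assumption: "." or an empty cell means a missing value


def analysis_age(row, gene_cancers):
    """Return (affected, age) for one PROGENY row given as a dict of strings.

    gene_cancers is a hypothetical set of cancer names caused by the gene.
    """
    # Rule 1: if Cancer# is a disease caused by the gene, the age is Age#.
    diagnosis_ages = []
    found_relevant = False
    n = 1
    while f"Cancer{n}" in row:
        if row[f"Cancer{n}"] in gene_cancers:
            found_relevant = True
            age = row.get(f"Age{n}", ".")
            if age not in MISSING:
                diagnosis_ages.append(int(age))
        n += 1
    if found_relevant:
        # Assumption: use the earliest qualifying diagnosis.
        return True, (min(diagnosis_ages) if diagnosis_ages else None)

    # Rule 2: otherwise use Age, falling back to CreatedYr - YoB (assumption).
    if row.get("Age", ".") not in MISSING:
        return False, int(row["Age"])
    if row.get("CreatedYr", ".") not in MISSING and row.get("YoB", ".") not in MISSING:
        return False, int(row["CreatedYr"]) - int(row["YoB"])
    return False, None  # no usable age information


# Two rows from the example file above (UPN 3 and UPN 1).
upn3 = {"CreatedYr": "2018", "Age": "49", "YoB": ".",
        "Cancer1": "Ovarian", "Age1": "27", "Cancer2": "Breast", "Age2": "43"}
upn1 = {"CreatedYr": "2018", "Age": ".", "YoB": "1974",
        "Cancer1": ".", "Age1": ".", "Cancer2": ".", "Age2": "."}
print(analysis_age(upn3, {"Breast", "Ovarian"}))
print(analysis_age(upn1, {"Breast", "Ovarian"}))
```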



4. LINKAGE Format


The website supports the LINKAGE format before “makeped” (10 columns: PedID, IndivID, FathID, MothID, Sex, AffectionStatus, Liability, Proband, Allele1, Allele2) and the LINKAGE format after “makeped”. A header row is allowed. Because these formats do not have an age column, you need to provide a Penetrance File.



How to cite


Belman S, Parsons MT, Spurdle AB, Goldgar DE, Feng BJ. Considerations in assessing germline variant pathogenicity using cosegregation analysis. Genet Med. 2020



References


  1. Thompson D, Easton DF, Goldgar DE. A Full-Likelihood Method for the Evaluation of Causality of Sequence Variants from Family Data. Am J Hum Genet. 2003;73(3):652-655.

  2. Kuchenbaecker KB et al. Risks of Breast, Ovarian, and Contralateral Breast Cancer for BRCA1 and BRCA2 Mutation Carriers. JAMA. 2017;317(23):2402-2416. PMID:28632866.

  3. Antoniou AC et al. The BOADICEA model of genetic susceptibility to breast and ovarian cancers: updates and extensions. Br J Cancer. 2008;98(8):1457-66. PMID:18349832.

  4. Mocci E et al. Risk of pancreatic cancer in breast cancer families from the breast cancer family registry. Cancer Epidemiol Biomarkers Prev. 2013;22(5):803-11. doi: 10.1158/1055-9965.EPI-12-0195. PMID:23456555.

  5. BOADICEA V7 Release 114.

  6. Belman S, Parsons MT, Spurdle AB, Goldgar DE, Feng BJ. Considerations in assessing germline variant pathogenicity using cosegregation analysis. Genet Med. 2020.




2020-09-18: Support BOADICEA v4 beta


2021-03-29: Allow multiple probands in one pedigree and use them all in cosegregation analysis to adjust for ascertainment. This is useful when different branches of a pedigree are ascertained separately and merged into a big pedigree.



Common errors or warnings


  1. “!!! Error:   different number of columns between lines in the input pedigree file”


Your pedigree file has inconsistent numbers of columns between lines. Check whether some missing values are represented by an empty string, which causes the program to skip a column. Also check whether some, but not all, lines have trailing spaces or tabs.
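

If you want to locate the offending lines before uploading, a check along the following lines may help. This is a minimal sketch, not the check performed by the server. The function name find_ragged_lines is hypothetical, and I assume that the most common column count in the file is the intended one. By default fields are split on any whitespace; pass delimiter="\t" for a strictly tab-delimited file, in which case a trailing tab counts as an extra empty column.

```python
from collections import Counter


def find_ragged_lines(text, delimiter=None):
    """Return (expected_count, [(line_number, count), ...]) for deviating lines.

    delimiter=None splits on any run of whitespace; use "\t" for tab-delimited
    files. Blank lines are ignored.
    """
    counts = [(number, len(line.split(delimiter)))
              for number, line in enumerate(text.splitlines(), start=1)
              if line.strip()]
    if not counts:
        return 0, []
    # Assumption: the most common column count is the intended one.
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    return expected, [(num, n) for num, n in counts if n != expected]


example = "\n".join([
    "ped1  18  11  12  F  0  Unaff  38  .    0",
    "ped1  19  11  12  M  0  Unaff  36  Het  0",
    "ped1  20  11  12  F  0  OvCa   48       0",  # genotype left empty
])
expected, ragged = find_ragged_lines(example)
for line_number, count in ragged:
    print(f"line {line_number}: {count} columns, expected {expected}")
```

In the third line of this example the genotype is left empty instead of being written as a dot, so a whitespace split finds one column fewer than in the other lines.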


  2. “These defined events are treated as unaffected with age being the earliest event date”  or

    “These undefined events are treated as unaffected with age being the latest event date”


The first message is OK: the program recognizes the event name and knows that the person should be treated as unaffected. The second could indicate a problem: the program does not recognize the event name. If you correctly entered an event that is simply not known to the program, that is OK. But if the event is a cancer site, make sure it is one of the site names listed for the “Aff” column; the algorithm treats cancer and non-cancer diseases differently, so an unrecognized cancer site may change the output for a cancer-related gene.


  3.