Help

GRIMMARD performs HLA imputation: given an individual's HLA typing (complete or partial, high- or low-resolution) and the population(s) they belong to, it returns the most probable full genotypes, the haplotype pairs that compose them, and the individual haplotypes, each with an associated probability. It combines py-ard (which normalizes the many ways HLA typing can be written into a single standard form) with the graph-based imputation engine ML-GRIM (the imputation component of GRIMM-II).

This page explains every field on the Home form, how to write the typing input, and how to read the results. New users can jump straight to the worked cases on the Example page.

1. What you provide

The Home form has three parts, filled top to bottom:

Output loci — which loci you want the imputation to return.
Population (race) — the population frequencies used to score solutions.
Typing — the observed HLA alleles, entered as a GL‑string or in the per‑allele boxes.

Then press Submit form.

2. Output loci

GRIMMARD supports up to nine loci. Tick the loci you want included in the imputed result:

Group	Loci
HLA class I	`A`, `B`, `C`
HLA class II	`DRB1`, `DQA1`, `DQB1`, `DRB3/4/5`, `DPA1`, `DPB1`

You do not have to type every locus you select. You may type only a few loci and ask the algorithm to impute (fill in) the rest from population frequencies.
DRB3/4/5 is treated as a single locus. The DRB3, DRB4 and DRB5 genes are present or absent depending on the DRB1 haplotype; their absence is handled as a null allele (see §6).
HLA-DQ and HLA-DP are heterodimers, so the alpha-chain loci (DQA1, DPA1) and beta-chain loci (DQB1, DPB1) are reported separately.
The frequency tables limit which loci can be returned. If you request a locus that is not covered by the chosen population's frequencies, it cannot be imputed.

3. Population (race)

HLA alleles do not occur independently; they travel together on haplotypes whose frequencies differ markedly between populations. Imputation therefore needs a population so it can assign probabilities. Choosing a population that matches the individual is the single biggest factor in getting accurate, well-ranked results.

GRIMMARD uses the NMDP / Be The Match operational categories: five broad populations, each subdividing into more specific detailed populations. Use a detailed population when you know it; otherwise fall back to the broad one.

Broad	Detailed code	Description
AFA African American	`AFA`	African American (broad)
	`AAFA`	African American
	`AFB`	African
	`CARB`	Black Caribbean
	`SCAMB`	Black, South or Central American
API Asian or Pacific Islander	`API`	Asian or Pacific Islander (broad)
	`AINDI`	South Asian Indian
	`FILII`	Filipino
	`HAWI`	Hawaiian or other Pacific Islander
	`JAPI`	Japanese
	`KORI`	Korean
	`NCHI`	Chinese
	`SCSEAI`	Southeast Asian
	`VIET`	Vietnamese
CAU Caucasian	`CAU`	Caucasian (broad)
	`EURCAU`	White European
	`MENAFC`	Middle Eastern or North Coast of Africa
HIS Hispanic	`HIS`	Hispanic (broad)
	`CARHIS`	Hispanic Caribbean
	`MSWHIS`	Mexican or Chicano
	`SCAHIS`	Hispanic, South or Central American
NAM Native American	`CARIBI`	Caribbean Indian
	`AMIND`	North American Indian
	`AISC`	American Indian, South or Central American
	`ALANAM`	Alaska Native or Aleut

Tips on choosing a population

You may select more than one population. Each haplotype is then scored under every selected population, which is useful for individuals of mixed or uncertain ancestry and for comparing how a result is supported across groups.
If you only know the broad category (e.g. the person is Caucasian but not which subgroup), pick the broad code (CAU).
Haplotype frequencies, and with them the likelihood of finding a compatible donor, vary strongly by population, so the population you choose changes both which genotypes appear and how they are ranked.

4. Entering the typing

You can enter the observed alleles in two equivalent ways.

4a. GL‑string (recommended)

A Genotype List string (GL‑string) is the standard text format for HLA typing (Milius et al., 2013). It is built from allele names and a small set of operators:

Operator	Meaning	Example
`*` and `:`	Separate the locus, allele family and protein fields of an allele name	`A*02:01`
`+`	Joins the two alleles at one locus (the two chromosomes)	`A*02:01+A*24:02`
`^`	Separates one locus from the next	`A*02:01+A*24:02^B*40:01+B*57:01`
`/`	Allelic ambiguity: the allele is one of several possibilities	`A*02:01/A*02:02`

A typical well-formed five-locus GL‑string looks like this:

A*02:01+A*24:02^C*03:03+C*07:01^B*40:01+B*57:01^DRB1*04:01+DRB1*07:01^DQB1*03:02+DQB1*03:03
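To make the operator table concrete, here is a minimal Python sketch that splits a GL-string into loci, gene copies and ambiguous alternatives. The function name `parse_gl_string` is hypothetical and not part of GRIMMARD or py-ard; the sketch handles only the operators listed above and does no validation.

```python
def parse_gl_string(gl: str) -> dict[str, list[list[str]]]:
    """Split a GL-string into {locus: [alternatives for copy 1, alternatives for copy 2]}.

    Handles only '^', '+' and '/'; allele names are kept as written.
    """
    genotype: dict[str, list[list[str]]] = {}
    for locus_block in gl.strip().split("^"):
        # '+' joins the two gene copies at this locus;
        # '/' lists the alternatives for one ambiguous copy.
        sides = [side.split("/") for side in locus_block.split("+")]
        # The locus name is everything before '*' in the first allele.
        locus = sides[0][0].split("*", 1)[0]
        genotype[locus] = sides
    return genotype


example = "A*02:01+A*24:02^B*40:01/B*40:02+B*57:01"
parsed = parse_gl_string(example)
print(parsed["A"])  # [['A*02:01'], ['A*24:02']]
print(parsed["B"])  # [['B*40:01', 'B*40:02'], ['B*57:01']]
```

Note that `+` does not imply phase: the order of the two copies within a locus carries no information about which chromosome they sit on.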

GRIMMARD also accepts lower-resolution and ambiguous input, including:

Multiple Allele Codes (MAC / NMDP codes), e.g. A*02:AB, which stand for a defined set of alleles.
ARD groups — G groups and P groups, e.g. A*02:01:01G or A*02:01P.
Serological / antigen-level typing (e.g. A2, B40).
Mixed resolution in a single string — some loci high-resolution, others low-resolution or serological.

py-ard normalizes all of these to a common form before imputation, so you can paste typing exactly as it arrives from the lab.

4b. Per‑allele boxes

If you prefer not to write a GL‑string, fill the allele boxes on the Home form. They are laid out in two rows per locus group, one row per chromosome:

Row	Boxes
Class I, chromosome 1	`A1` `B1` `C1`
Class I, chromosome 2	`A2` `B2` `C2`
DR / DQ / DRB3-4-5, chromosome 1	`DRB1` `DQB1` `DRB3` `DRB4` `DRB5`
DR / DQ / DRB3-4-5, chromosome 2	`DRB1` `DQB1` `DRB3` `DRB4` `DRB5`
DP / DQA1, chromosome 1	`DPB1` `DPA1` `DQA1`
DP / DQA1, chromosome 2	`DPB1` `DPA1` `DQA1`

Leave a box empty for any allele you have not typed. The two methods are interchangeable; use whichever is more convenient.
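The equivalence of the two input methods can be sketched as follows. This is an illustration only: how GRIMMARD itself converts the boxes is not described on this page, and the function name `boxes_to_gl_string` and the dictionary layout are assumptions made for the example.

```python
def boxes_to_gl_string(boxes: dict[str, tuple[str, str]]) -> str:
    """Join per-locus allele pairs into a GL-string, skipping empty boxes."""
    locus_parts = []
    for locus, pair in boxes.items():
        # Keep only the boxes that were actually filled in.
        alleles = [allele.strip() for allele in pair if allele.strip()]
        if alleles:
            locus_parts.append("+".join(alleles))
    return "^".join(locus_parts)


form = {
    "A": ("A*02:01", "A*24:02"),
    "B": ("B*40:01", ""),   # second allele not typed
    "C": ("", ""),          # locus not typed at all
    "DRB1": ("DRB1*04:01", "DRB1*07:01"),
}
print(boxes_to_gl_string(form))
# A*02:01+A*24:02^B*40:01^DRB1*04:01+DRB1*07:01
```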

5. Partial and ambiguous typing

Untyped loci: simply omit them from the GL‑string or leave their boxes blank. The imputation will fill them in from population frequencies. As few as one locus can be provided, though more typed loci yield sharper, better-ranked results.
Untyped second allele at a locus: if only one allele at a locus was reported, give the one you have; the engine considers the possible partners.
Accuracy depends on which loci you type. The most polymorphic, most informative loci (typically HLA-A, HLA-B and HLA-DRB1) constrain the rest of the genotype most strongly. Typing informative loci gives much better recovery of the true genotype than typing the same number of weakly informative loci.

6. Null alleles and DRB3/4/5

The DRB3, DRB4 and DRB5 genes are not present on every haplotype. When a gene is absent, that is handled the same way as a null allele, so blank DRB3/4/5 boxes are expected and correct on many haplotypes.
At most one of DRB3, DRB4 or DRB5 occurs on a given chromosome, which is why the three are grouped as a single DRB3/4/5 locus.
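The grouping can be pictured with a small Python sketch that reduces the three boxes of one chromosome to a single DRB3/4/5 entry. The marker string `DRBX*NNNN` and the function name are assumptions for illustration; the value GRIMMARD uses internally for an absent gene is not stated on this page. The sketch also simplifies by treating three empty boxes as "gene absent", although in practice empty boxes can equally mean "not typed".

```python
# Hypothetical marker for "no DRB3/4/5 gene on this chromosome"; the value
# GRIMMARD uses internally is not stated on this page.
ABSENT_MARKER = "DRBX*NNNN"


def collapse_drb345(drb3: str = "", drb4: str = "", drb5: str = "") -> str:
    """Reduce the three DRB3/4/5 boxes of one chromosome to a single entry."""
    typed = [allele.strip() for allele in (drb3, drb4, drb5) if allele.strip()]
    if len(typed) > 1:
        # More than one of the three genes on one chromosome is not expected.
        raise ValueError(f"expected at most one of DRB3/DRB4/DRB5, got {typed}")
    return typed[0] if typed else ABSENT_MARKER


print(collapse_drb345(drb4="DRB4*01:01"))  # DRB4*01:01
print(collapse_drb345())                   # DRBX*NNNN
```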

7. How the imputation works (in brief)

ML-GRIM uses a fast two-stage procedure. First a blocking stage runs classical graph-based imputation on the (up to three) most informative typed loci, producing every genotype consistent with those loci. Each candidate is then checked for consistency with the remaining typed loci and the missing loci are filled in from the population haplotype frequencies. This keeps memory and run time low while guaranteeing that no consistent genotype is missed. Typical run time is well under one second. Probabilities are computed from the population frequencies and normalized so that the genotype probabilities for the individual sum to 1. By default the most probable solutions are returned (the engine considers a large candidate set and reports the top-ranked genotypes).
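The normalization step can be illustrated with a toy calculation. This is a sketch under stated assumptions, not ML-GRIM's code: it assumes Hardy-Weinberg proportions (a pair's weight is the product of its two haplotype frequencies, doubled when the two haplotypes differ), and the haplotype names and frequencies are invented.

```python
def rank_haplotype_pairs(pairs, freqs):
    """Return (pair, probability) tuples, most probable first, summing to 1."""
    weights = {}
    for hap1, hap2 in pairs:
        weight = freqs[hap1] * freqs[hap2]
        if hap1 != hap2:
            # A heterozygous pair can be inherited in two orders (assumption:
            # Hardy-Weinberg proportions).
            weight *= 2
        weights[(hap1, hap2)] = weight
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    # Dividing by the total makes the probabilities sum to 1.
    return [(pair, weight / total) for pair, weight in ranked]


# Invented frequencies for three hypothetical haplotypes in one population.
freqs = {"hapA": 0.04, "hapB": 0.01, "hapC": 0.002}
# Candidate pairs that survived the consistency check against the typing.
candidates = [("hapA", "hapB"), ("hapA", "hapC"), ("hapB", "hapB")]

ranked = rank_haplotype_pairs(candidates, freqs)
for pair, probability in ranked:
    print(pair, round(probability, 3))
```

Because the probabilities are relative to the candidate set, adding or removing candidates (for example by typing more loci) changes every value, not just the ranking.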

8. Reading the results

After you submit, the results are organized into tabs:

Tab	What it shows
Genotypes	Ranked list of complete unphased genotypes consistent with your input, each with its probability and frequency. The probabilities are normalized to sum to 1 across the returned genotypes.
Haplotype couples	The phased haplotype pairs (the two chromosomes) that make up the genotypes, each with its probability. One unphased genotype can arise from several haplotype pairings.
Haplotype agents	The probability (frequency) of each individual haplotype, reported separately for each population you selected — useful for seeing how strongly each population supports a given haplotype.

A higher probability means a solution is more consistent with the typing under the chosen population frequencies. If the true genotype is rare or the typing is sparse, probability mass spreads over many candidates and the top result may carry only a modest probability — this is expected and reflects genuine ambiguity, not an error.
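The relationship between the Genotypes and Haplotype couples tabs can be sketched in a few lines of Python: several phased pairs collapse to the same unphased genotype, and their probabilities add up. The data layout, function names and probabilities below are illustrative assumptions, not GRIMMARD's output format.

```python
from collections import defaultdict


def unphased_key(hap1, hap2):
    """Drop phase: keep, per locus, the sorted pair of alleles."""
    return tuple(tuple(sorted(pair)) for pair in zip(hap1, hap2))


def genotype_probabilities(pair_probs):
    """Sum the probabilities of all phased pairs giving the same genotype."""
    totals = defaultdict(float)
    for (hap1, hap2), probability in pair_probs.items():
        totals[unphased_key(hap1, hap2)] += probability
    return dict(totals)


# Illustrative phased pairs (each haplotype lists its A and B alleles).
pair_probs = {
    (("A*01:01", "B*08:01"), ("A*02:01", "B*07:02")): 0.5,
    (("A*01:01", "B*07:02"), ("A*02:01", "B*08:01")): 0.25,
    (("A*01:01", "B*08:01"), ("A*03:01", "B*07:02")): 0.25,
}

for genotype, probability in genotype_probabilities(pair_probs).items():
    print(genotype, probability)
```

Here the first two pairs carry the same alleles in a different phase, so they merge into one genotype with probability 0.75, while the third remains a separate genotype with probability 0.25.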

9. Troubleshooting

Symptom	Likely cause / fix
No results / empty output	The typing may be internally inconsistent, or no haplotype with those alleles exists in the chosen population's frequencies. Re-check the alleles and try the broad population code, or add the population that fits the individual.
An allele is rejected	Check spelling and format (`Locus*field:field`, e.g. `DRB1*04:01`). Very new or non-standard allele names may not be in the reference; try a G/P group or MAC equivalent.
A requested locus is missing from the output	That locus is not covered by the selected population's frequencies, or it was not ticked under output loci.
Top genotype has low probability	Normal for sparse or low-resolution typing. Type more (or more informative) loci, or at higher resolution, to concentrate the probability.
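A quick format check before submitting can catch the most common cause of a rejected allele, a missing `*`. The pattern below is a rough sketch, not the validation GRIMMARD or py-ard performs: it accepts numeric allele names with an optional one-letter suffix (such as G, P or N) and MAC-style codes, and it deliberately does not cover serological names such as A2. The name `looks_like_allele_name` is hypothetical.

```python
import re

ALLELE_PATTERN = re.compile(
    r"(?:A|B|C|DRB[1345]|DQA1|DQB1|DPA1|DPB1)"    # locus
    r"\*\d{2,4}:"                                  # '*', first field, ':'
    r"(?:[A-Z]{2,}"                                # MAC-style letter code, or
    r"|\d{2,4}(?::\d{2,4}){0,2}[A-Z]?)"            # 1-3 more fields, optional suffix
)


def looks_like_allele_name(name: str) -> bool:
    """Return True if the string has the Locus*field:field shape."""
    return ALLELE_PATTERN.fullmatch(name.strip()) is not None


for candidate in ["DRB1*04:01", "DRB104:01", "A*02:AB", "A*02:01:01G", "A2"]:
    print(candidate, looks_like_allele_name(candidate))
```

A name that passes this check can still be rejected if it is absent from the reference nomenclature; the check only covers the shape of the string.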

10. Citation and source code

Imputation engine (ML-GRIM): github.com/nmdp-bioinformatics/py-graph-imputation
Typing normalization (py-ard): github.com/nmdp-bioinformatics/py-ard
GRIMMARD web application: github.com/louzounlab/grimmard

If you use GRIMMARD in your work, please cite the GRIMM-II paper (Kirshenboim et al.) describing ML-GRIM and ML-GRMA, together with the underlying GRIMM framework (Maiers et al., 2019; Israeli et al.). For questions, contact louzouy@math.biu.ac.il.