---
layout: rule
id: GORULE:0000027
title: Each identifier in GAF is valid
type: report
fail_mode: soft
status: implemented
contact: "go-quality@lists.stanford.edu"
implementations:
  - language: python
    source: https://github.com/biolink/ontobio/blob/master/ontobio/io/gafparser.py
---
-   DB (GAF and GPAD column 1); and all DB abbreviations in 'with' field (GAF column 8; GPAD column 7) and in the annotation extensions (GAF column 16; GPAD column 11) must be in [db-xrefs.yaml](https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml) (see below)
-   id_syntax information in the [db-xrefs.yaml](https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml) file can be used to validate local identifiers.
-   The 'with' field can either contain GO terms, when the Evidence code is IC. GO terms are checked in GORULE:0000001, or DB:ID, which are checked as Columns 1 & 2.
-   The `assigned_by` field (GAF column 15; GPAD column 10) is checked against [groups.yaml](https://github.com/geneontology/go-site/blob/master/metadata/groups.yaml)
-   The 'extension' field (GAF column 16; GPAD column 11) can either contain GO terms, or DB:ID, which are checked as Columns 1 & 2.
-   TBC (this may be GORULE:0000001) All GO IDs must be extant in current ontology: GO IDs can be present in Columns 5, 8, and 16 of GAF (4, 7, 11 in GPAD).
  
### Additional notes on identifiers

In GAF and GPAD, the identifier is represented using two fields, column 1 is the prefex (DB), and column 2 is the local identifier. 
The global id is formed by concatenating these with `:`.
In all other fields, such as the "With/from" field, the reference, the extensions, a global ID is specified, which MUST always be prefixed; 
i. e. contain a namespace and an identifier, separated by a colon.

In all cases, the prefix MUST be in [db-xrefs.yaml](https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml).
The prefix SHOULD be identical (case-sensitive match) to the `database` field.

When consuming GAF files, programs SHOULD *repair* by replacing prefix synonyms with the canonical form, in addition to reporting on the mismatch. For example, as part of the association file release the submitted files should swap out legacy uses of 'UniProt' with 'UniProtKB'.

### Reference formatting must be correct
References in the GAF (Column 6) should be of the format db_name:db_key. Multiple values can be pipe-separated, 
e.g. SGD_REF:S000047763|PMID:2676709. PMID, DOIs, Agricola, GO_REF and internal MOD references are allowed. 
