Collection of metadata files used by the GOC.

In general we follow the pattern:

 * metadata source in YAML (or YAML embedded inside Markdown)
 * schema for each file also specified in YAML
 * metadata can be edited via the GitHub web interface, followed by a Pull Request
 * Travis-CI checks each file against its schema (see [../.travis.yml](../.travis.yml)); if the check passes, the change can be merged
 * Jenkins jobs publish metadata files (e.g. http://current.geneontology.org/metadata)

# users.yaml

 - [users.yaml](users.yaml) - metadata on GOC members and contributors
 - [users.schema.yaml](users.schema.yaml) - schema

Content:

Each entry is for metadata about a single user. This drives a lot of
behavior, such as who can do what in Noctua or TermGenie (TG), and is also used for
provenance purposes. We want to track all contributions made to any GO
content (ontology, annotations or models) and so we want to be sure we
have a way of uniquely identifying users through their different
aliases and accounts.

_Note_: for historical reasons, some entries in users.yaml are
actually transient _groups_ of users. These will be migrated to
groups.yaml. The main blocker for this is that TG reads users.yaml but
not groups.yaml.

Fields:

 * nickname (REQUIRED) - typically first plus last name (not actually nickname in the usual sense)
 * uri (RECOMMENDED, UNIQUE) - A [Uniform Resource Identifier](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) or Compact URI that uniquely identifies a person.
    * Typically an ORCID http URL
    * If no ORCID available then a GOC Compact URI is used, e.g. GOC:cjm
    * __Noctua__ - uses this field for auto-assigning dc:creator to instances
 * xref (OPTIONAL, UNIQUE) - a compact URI that uniquely identifies the person, e.g. GOC:cjm
    * this is partly historical. The ontology definition xrefs field uses these
    * __TermGenie__ - uses this as a lookup for ontology definition xrefs
 * organization (RECOMMENDED) - the primary organization to which a person belongs
    * although a person may be involved in more than one, typically their GO role will be through one
    * this field is primarily for informational purposes
 * groups (ZERO TO MANY) - the groups a person belongs to (see below for more on groups)
    * __Noctua__ uses this information to allow a person to attribute pav:providedBy annotations
 * accounts (DICT) - a dictionary mapping account type to username
    * __Noctua__ uses this information for login/authentication
    * __TermGenie__ uses this information for login/authentication
 * authorizations (DICT)
    * __Noctua__ uses this information for authorization (determining if your account is allowed to edit)
    * __TermGenie__ uses this information for authorization (determining if your account is allowed to edit)
 * email-md5 __deprecated__
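
As a rough illustration of the REQUIRED and UNIQUE markers above, the sketch below models parsed users.yaml entries as plain Python dicts and reports violations. The sample entries, the `GOC:abc` and `GOC:xyz` identifiers, the `github` account key, and the helper name `check_users` are all hypothetical; [users.schema.yaml](users.schema.yaml) remains the authoritative definition.

```python
from collections import Counter

# Hypothetical entries, shaped like the fields described above.
users = [
    {
        "nickname": "Example Person",
        "uri": "GOC:abc",
        "xref": "GOC:abc",
        "organization": "Example Org",
        "groups": ["http://example.org/group-a"],
        "accounts": {"github": "example-person"},
    },
    {"nickname": "Second Person", "uri": "GOC:abc"},  # duplicate uri
    {"uri": "GOC:xyz"},  # missing nickname
]


def check_users(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    # nickname is the only field marked REQUIRED.
    for i, entry in enumerate(entries):
        if not entry.get("nickname"):
            problems.append(f"entry {i}: missing required field 'nickname'")
    # uri and xref are marked UNIQUE, so no value may appear twice.
    for field in ("uri", "xref"):
        counts = Counter(e[field] for e in entries if field in e)
        for value, n in counts.items():
            if n > 1:
                problems.append(f"{field} {value!r} is used by {n} entries")
    return problems


problems = check_users(users)
for problem in problems:
    print(problem)
```

A check like this only covers the constraints listed in this README; it does not replace the schema validation run by CI.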

Tracking contributions to GO:

In the GO graphstore, we typically have triples:

    <instance> dc:author <user-uri>
    <instance> dc:contributor <user-uri>

These are auto-generated by Noctua.

Additionally, where provenance is added directly in the ontology, the
information is stored as a dbxref "axiom annotation" on top of the
association between the term URI and the definition string. See
[section 5.6 of the obo-syntax spec](http://owlcollab.github.io/oboformat/doc/obo-syntax.html#5.6)
for full details.

# groups.yaml

 - [groups.yaml](groups.yaml) - metadata on GOC groups
 - [groups.schema.yaml](groups.schema.yaml) - schema

Groups encompass organizations, projects, working groups, content
meetings, grants, etc. We call these "groups" as they typically consist
of groups of users. Some groups may be transient (e.g. projects or
working groups). Others may be permanent institutions, such as
Cambridge University.

Fields:

 * id (REQUIRED, UNIQUE) - a URI uniquely identifying the group. Typically the official URL.
 * label (REQUIRED, UNIQUE) - e.g. university name, grant name. Should be unique, but this is not currently checked

__TODO__: each group should have a point of contact, and that POC should be in users

## SOP for adding new groups

Click on [groups.yaml](groups.yaml) and add a new entry. This assumes you are familiar with making pull requests via the GitHub web interface. If you can't do that, file a ticket in this tracker.

The group **must** have a stable URL that directs to a page about the group. See existing entries for details.

## Tracking contributions to GO using groups.yaml

In the GO graphstore, we typically have triples:

    <instance> pav:providedBy <group-uri>

These are added by Noctua. Note that the user must select one or more group roles.
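
The relationship between a user's `groups` field and the resulting triples can be sketched as follows. This is not Noctua's implementation, only a minimal illustration under the assumption that a selected group must be one of the user's own groups; the function name `provided_by_triples` and the example URIs are hypothetical.

```python
def provided_by_triples(instance, user_groups, selected):
    """Build pav:providedBy triples for the groups a user selected.

    Assumes (hypothetically) that every selected group must appear in
    the user's own `groups` list from users.yaml.
    """
    if not selected:
        raise ValueError("at least one group must be selected")
    unknown = [g for g in selected if g not in user_groups]
    if unknown:
        raise ValueError(f"user is not a member of: {unknown}")
    return [f"<{instance}> pav:providedBy <{g}>" for g in selected]


# Hypothetical instance and group URIs.
triples = provided_by_triples(
    "http://example.org/model/1",
    user_groups=["http://example.org/group-a", "http://example.org/group-b"],
    selected=["http://example.org/group-a"],
)
for triple in triples:
    print(triple)
```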

# db-xrefs.yaml

Registry of database prefixes

 - [db-xrefs.yaml](db-xrefs.yaml) - prefix registry
 - [db-xrefs.schema.yaml](db-xrefs.schema.yaml) - schema

# datasets

Metadata about the locations and contents of GAFs and GPADs contributed to GO Central

See the [datasets/](datasets) directory for more details

 - [datasets.schema.yaml](datasets.schema.yaml) - schema

# gorules

Enumerated rules used for QC within the GO

See the [gorules/](gorules) directory for more details

# gorefs.yaml

- [gorefs.yaml](gorefs.yaml) - metadata on GO references (GO_REFs)
- [gorefs.schema.yaml](gorefs.schema.yaml) - schema (uses [LinkML](https://linkml.io/linkml/))

Ad-hoc references and publications referenced within GO, where no PMID or DOI is available.

Fields:

- id (REQUIRED, UNIQUE) - A URI uniquely identifying the reference. Follows the pattern `GO_REF:NNNNNNN` where N is a digit. Typically the number should be the next available number (e.g. `GO_REF:0000119`)
- title (REQUIRED) - The title of the reference.
- description (REQUIRED) - A description or abstract for the reference.
- comments (ZERO TO MANY) - Comments on the reference. These will be displayed separately from the description. Rarely used, except by some old references.
- alt_id (ZERO TO MANY) - Alternative IDs for the reference. Must follow the same pattern as the id field.
- authors (REQUIRED) - Authors of the reference.
- citation (OPTIONAL) - PMID of a published citation for this reference (e.g. `PMID:30272209`)
- evidence_codes (ZERO TO MANY) - Evidence codes that are used in the reference. Must be an ECO term ID (e.g. `ECO:0000501`)
- external_accession (ZERO TO MANY) - Cross references to other databases for the reference. Must be of the form `PFX:ID` where PFX is the database prefix and ID is the accession number (e.g. `SGD_REF:S000148669`)
- is_obsolete (BOOLEAN, OPTIONAL) - Whether the reference is obsolete. If true, the title should also begin with "OBSOLETE".
- url (OPTIONAL) - a URL to get more information about the reference.
- year (OPTIONAL, INTEGER) - The year the reference was created. 
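
The `GO_REF:NNNNNNN` pattern and the "next available number" convention can be sketched as below. The helper names `is_valid_goref_id` and `next_goref_id` are hypothetical and not part of any GO tooling. A caller would probably want to pass `alt_id` values in as well, since they follow the same pattern.

```python
import re

# Seven ASCII digits after the GO_REF: prefix, as described for the id field.
GO_REF_PATTERN = re.compile(r"GO_REF:[0-9]{7}")


def is_valid_goref_id(ref_id):
    """Return True if ref_id matches GO_REF:NNNNNNN exactly."""
    return GO_REF_PATTERN.fullmatch(ref_id) is not None


def next_goref_id(existing_ids):
    """Return the id one above the highest number among existing_ids."""
    highest = max(int(ref_id.split(":")[1]) for ref_id in existing_ids)
    return f"GO_REF:{highest + 1:07d}"


print(is_valid_goref_id("GO_REF:0000119"))
print(is_valid_goref_id("GO_REF:119"))
print(next_goref_id(["GO_REF:0000001", "GO_REF:0000118"]))
```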

# retracted-publications.txt

- PMIDs of retracted publications. Some entries also have an associated PMCID, separated by a comma.
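
A minimal sketch of reading such a file, assuming one entry per line with the PMID first and an optional PMCID after a comma. The exact layout of the real file should be checked before relying on this; the function name `parse_retracted` and the sample lines are hypothetical.

```python
def parse_retracted(lines):
    """Parse lines into (pmid, pmcid) pairs; pmcid is None when absent."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only; the PMCID part is optional.
        pmid, _, pmcid = line.partition(",")
        entries.append((pmid.strip(), pmcid.strip() or None))
    return entries


# Hypothetical sample lines, not taken from the real file.
sample = ["PMID:1", "PMID:2, PMC3", ""]
entries = parse_retracted(sample)
for pmid, pmcid in entries:
    print(pmid, pmcid)
```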


