README file for PRO release 58.0 (18-Apr-2019)

The Protein Ontology Consortium--the Protein Information Resource, The Jackson Laboratory, Reactome, and the Department of Philosophy at the State University of New York at Buffalo--is pleased to announce PRO Release 58.0 (18-Apr-2019).

PRO describes the relationships of proteins and protein evolutionary classes, delineates the multiple protein forms of a gene locus (ontology for protein forms), and interconnects existing ontologies. In addition, PRO formally describes protein complexes. Further information is available at http://purl.obolibrary.org/obo/pr.

The stable URLs for this release are:

http://purl.obolibrary.org/obo/pr/2019-04-18/pr.obo
    (retrieves release_58.0/pro_reasoned.obo)
http://purl.obolibrary.org/obo/pr/2019-04-18/pr.owl
    (retrieves release_58.0/pro_reasoned.owl)
http://purl.obolibrary.org/obo/pr/2019-04-18/pr-asserted.obo
    (retrieves release_58.0/pro_nonreasoned.obo)
http://purl.obolibrary.org/obo/pr/2019-04-18/pr-asserted.owl
    (retrieves release_58.0/pro_nonreasoned.owl)

It is also possible to retrieve the above release-specific files using the release number in place of the date in the URL (for example, http://purl.obolibrary.org/obo/pr/58.0/pr.obo). The latest version can be retrieved using similar PURLs that lack the /pr/2019-04-18 part (for example, http://purl.obolibrary.org/obo/pr.obo).
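The URL pattern described above can be captured in a few lines of Python. This is only a sketch: the helper name pro_purl and its parameters are hypothetical and not part of any PRO tooling. The README gives an explicit "latest version" example only for pr.obo, so the other latest-version URLs the helper produces are assumptions made by analogy.

```python
# Minimal sketch of the PURL pattern described in this README.
# pro_purl is a hypothetical helper, not part of any PRO distribution.

BASE = "http://purl.obolibrary.org/obo"


def pro_purl(release=None, reasoned=True, fmt="obo"):
    """Build a PRO PURL.

    release  -- a release date ("2019-04-18") or number ("58.0");
                None requests the latest version.
    reasoned -- True for pr.<fmt>, False for pr-asserted.<fmt>.
    fmt      -- "obo" or "owl".
    """
    if fmt not in ("obo", "owl"):
        raise ValueError("fmt must be 'obo' or 'owl'")
    name = "pr" if reasoned else "pr-asserted"
    if release is None:
        # Latest version: drop the /pr/<release> part.
        # Only the pr.obo case is spelled out in the README;
        # the other combinations are assumed by analogy.
        return f"{BASE}/{name}.{fmt}"
    return f"{BASE}/pr/{release}/{name}.{fmt}"


if __name__ == "__main__":
    print(pro_purl("2019-04-18"))
    print(pro_purl("58.0", reasoned=False, fmt="owl"))
    print(pro_purl())
```

The function only builds strings and does not contact the server, so the resulting URL can be handed to whatever download tool is preferred.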
=====
Files included with this release are:

pro_reasoned.obo       PRO ontology file containing implied links via the Elk reasoner; OBO 1.2 format
pro_reasoned.owl       PRO ontology file containing implied links via the Elk reasoner; OWL RDF format
pro_nonreasoned.obo    PRO ontology file without reasoning applied; OBO 1.2 format
pro_nonreasoned.owl    PRO ontology file without reasoning applied; OWL RDF format
PAF.txt                Annotations to PRO terms; tab-delimited
promapping.txt         Cross-references to external databases; tab-delimited
pro_readme.txt         This file
pro_release_note.txt   Statistics and changes pertinent to this release

=====
Terms in PRO fall under the following major categories:

Category=family -> Proteins represented by terms distinguished from siblings based on family; for example SMADs vs TLRs
Category=gene -> Proteins represented by terms distinguished from siblings based on encoding gene; for example SMAD1 vs SMAD2
Category=genegroup -> Proteins represented by terms distinguished from siblings based on encoding gene, but when the gene is indicated only for a parent taxon; for example the same flu NA gene is indicated for all strains
Category=seqgroup -> Proteins represented by terms distinguished from siblings based on sets of encoding mRNAs from the same gene; for example HLA-A*24 vs HLA-A*68
Category=sequence -> Proteins represented by terms distinguished from siblings based on encoding mRNA; for example SMAD1 isoform 1 vs SMAD1 isoform 2
Category=modification -> Proteins represented by terms distinguished from siblings based on co- or post-translational modification; for example SMAD1/iso:1/Phos:1 vs SMAD1/iso:1/Phos:2
Category=complex -> Proteins represented by terms distinguished from siblings based on complex components; for example BUB1:BUB3 vs BUB1B:BUB3

At the sequence level, the translation products of the different mature transcripts of a gene are referred to herein as isoforms, whereas sequence polymorphisms are referred to as sequence variants.
Terms that are organism-specific have the organism name displayed in parentheses, e.g. PR:000025483, delphilin isoform 2 (mouse). In such cases, the Category token is preceded by organism-, e.g. Category=organism-gene.

=====
PAF.txt file format (17 tab-delimited fields)

Column  Column Title      Description
1       PRO_ID            PRO identifier, mandatory.
2       Object_term       Name of the PRO term.
3       Object_synonym    Other names by which the described object is known.
4       Modifier          Flags that modify the interpretation of an annotation.
5       Relation          Relation to the corresponding annotation.
6       Ontology_ID       ID for the corresponding annotation.
7       Ontology_term     Term name for the corresponding ontology ID.
8       Relative_to       Modifiers increased, decreased and altered require an entry in this column to indicate what the change is relative to.
9       Interaction_with  Indicates the binding partner.
10      Evidence_source   PubMed ID or database source for the evidence.
11      Evidence_code     Same as evidence code for GO annotations.
12      Taxon             Taxon identifier for the species that the annotation is extracted from.
13      Inferred_from     Used only for evidence codes IPI and ISS in PRO.
14      DB_ID             One or more unique identifiers for a single source cited as an authority for the attribution of the ontology term.
15      Date              Date on which the annotation was made.
16      Assigned_by       The database which made the annotation.
17      Comments          Curator comments, free text.

=====
PRO mapping tab-delimited file (promapping.txt)

Column  Column Title  Description
1       --            PRO identifier
2       --            Cross-referenced database object
3       --            Mapping type

Mappings can be of two types:
i) exact: The database object has the same scope as the object described by PRO.
ii) is_a: The database object is more specific than the object described by PRO.

In general, database objects specific to an organism will map exactly to the corresponding organism-specific PRO term, while those same database objects will map as is_a to the corresponding organism-generic PRO term.
For example, PR:000026465 describes an isoform of 6-phosphofructokinase type C in any organism, so UniProtKB:Q01813-1 (human) and UniProtKB:Q9WUA3-1 (mouse) are mapped to this term.

=====
Reasoning

Reasoned files contain the ontology with implied links produced by the Elk reasoner already in place, and thus show a more-complete hierarchy. For example:

in pro_nonreasoned.obo

[Term]
id: PR:Q61699
name: heat shock protein 105 kDa (mouse)
def: "A heat shock protein 105 kDa that is encoded in the genome of mouse." [PRO:DAN]
comment: Category=organism-gene.
synonym: "mHSPH1" EXACT PRO-short-label [PR:DNx]
xref: UniProtKB:Q61699
intersection_of: PR:000003410 ! heat shock protein 105 kDa
intersection_of: only_in_taxon NCBITaxon:10090 ! Mus musculus

in pro_reasoned.obo

[Term]
id: PR:Q61699
name: heat shock protein 105 kDa (mouse)
def: "A heat shock protein 105 kDa that is encoded in the genome of mouse." [PRO:DAN]
comment: Category=organism-gene.
synonym: "mHSPH1" EXACT PRO-short-label [PR:DNx]
xref: UniProtKB:Q61699
is_a: PR:000003410 ! heat shock protein 105 kDa
is_a: PR:000029032 ! Mus musculus protein
intersection_of: PR:000003410 ! heat shock protein 105 kDa
intersection_of: only_in_taxon NCBITaxon:10090 ! Mus musculus
relationship: only_in_taxon NCBITaxon:10090 ! Mus musculus

Note that the reasoner determined that the indicated term is a Mus musculus protein (owing to the assertion that the protein is only_in_taxon NCBITaxon:10090).
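The difference between the two stanzas above can be checked programmatically. The following Python sketch reads a single [Term] stanza into a dictionary, extracts the Category token from the comment line, and lists the is_a parents that appear only in the reasoned version. The function names (parse_stanza, category, inferred_parents) are hypothetical, and the parser is deliberately simplified: it handles only the tag layout shown in this README and is not a complete OBO 1.2 parser.

```python
import re
from collections import defaultdict

# The two stanzas are copied from the Reasoning example in this README.
NONREASONED = """\
[Term]
id: PR:Q61699
name: heat shock protein 105 kDa (mouse)
comment: Category=organism-gene.
xref: UniProtKB:Q61699
intersection_of: PR:000003410 ! heat shock protein 105 kDa
intersection_of: only_in_taxon NCBITaxon:10090 ! Mus musculus
"""

REASONED = """\
[Term]
id: PR:Q61699
name: heat shock protein 105 kDa (mouse)
comment: Category=organism-gene.
xref: UniProtKB:Q61699
is_a: PR:000003410 ! heat shock protein 105 kDa
is_a: PR:000029032 ! Mus musculus protein
intersection_of: PR:000003410 ! heat shock protein 105 kDa
intersection_of: only_in_taxon NCBITaxon:10090 ! Mus musculus
relationship: only_in_taxon NCBITaxon:10090 ! Mus musculus
"""


def parse_stanza(text):
    """Map each tag of one stanza to the list of its values.

    Simplification: the trailing ' ! label' comment is dropped and
    OBO escape sequences are not interpreted.
    """
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        tags[tag].append(value.split(" ! ")[0].strip())
    return dict(tags)


def category(stanza):
    """Return the Category token (e.g. 'organism-gene'), or None."""
    for comment in stanza.get("comment", []):
        match = re.search(r"Category=([A-Za-z-]+)", comment)
        if match:
            return match.group(1)
    return None


def inferred_parents(nonreasoned, reasoned):
    """Return is_a parents present only in the reasoned stanza, sorted."""
    asserted = set(nonreasoned.get("is_a", []))
    return sorted(set(reasoned.get("is_a", [])) - asserted)


if __name__ == "__main__":
    before = parse_stanza(NONREASONED)
    after = parse_stanza(REASONED)
    print(category(before))
    print(inferred_parents(before, after))
```

For real work on the full ontology files, an established OBO library is likely a better choice than this hand-rolled parser, which is meant only to make the reasoned/non-reasoned difference concrete.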