BH12.12/Identifiers.org integration
From TogoWiki
Latest revision as of 10:18, 21 August 2013 (Wed)
Using Identifiers.org IDs as common URIs
In TogoGenome, we want to standardize cross-references (xrefs) on Identifiers.org URIs so that our data can be merged easily with other RDF datasets.
<http://our.db/db1/entry1> rdfs:seeAlso <http://identifiers.org/db1/entry1> .
<http://our.db/db2/entry2> rdfs:seeAlso <http://identifiers.org/db2/entry2> .
:
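A human can tell which database each of these identifiers.org URIs belongs to, but for automated processing there are two possible approaches: matching with a regular expression, or using class definitions.
The regular-expression approach uses a SPARQL FILTER clause.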
# ex1: SPARQL query to select links to "db1" by FILTER with a regular expression.
SELECT *
WHERE {
  ?entry rdfs:seeAlso ?link .
  FILTER (regex(str(?link), "identifiers.org/db1/"))
}
This approach treats URIs as strings and applies a costly regular-expression match, so it is slow.
The class-definition approach is also used by UniProt.
<http://rdf.identifiers.org/Db1> rdfs:subClassOf <http://rdf.identifiers.org/Database> .
<http://rdf.identifiers.org/Db2> rdfs:subClassOf <http://rdf.identifiers.org/Database> .
:
<http://identifiers.org/db1/entry1> rdf:type <http://rdf.identifiers.org/Db1> .
<http://identifiers.org/db2/entry2> rdf:type <http://rdf.identifiers.org/Db2> .
:
# ex2: SPARQL query to effectively select links to "db1" by the database type.
SELECT *
WHERE {
  ?entry rdfs:seeAlso ?link .
  ?link rdf:type <http://rdf.identifiers.org/Db1> .
}
This approach adds an rdf:type (a) triple for every URI, so the data grows, but because these triples are indexed, queries should be fast.
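The difference between the two approaches can be illustrated outside a triple store. The following is only a minimal Python sketch over an in-memory list of triples. The function names (`links_by_regex`, `build_type_index`, `links_by_type`) are hypothetical, and a plain dictionary stands in for whatever index a real SPARQL endpoint maintains.

```python
import re
from collections import defaultdict

RDFS_SEE_ALSO = "rdfs:seeAlso"
RDF_TYPE = "rdf:type"

# Toy triples mirroring the db1/db2 placeholder examples on this page.
triples = [
    ("http://our.db/db1/entry1", RDFS_SEE_ALSO, "http://identifiers.org/db1/entry1"),
    ("http://our.db/db2/entry2", RDFS_SEE_ALSO, "http://identifiers.org/db2/entry2"),
    ("http://identifiers.org/db1/entry1", RDF_TYPE, "http://rdf.identifiers.org/Db1"),
    ("http://identifiers.org/db2/entry2", RDF_TYPE, "http://rdf.identifiers.org/Db2"),
]


def links_by_regex(triples, pattern):
    """ex1 analogue: test every rdfs:seeAlso object against a regular expression."""
    rx = re.compile(pattern)
    return [(s, o) for s, p, o in triples if p == RDFS_SEE_ALSO and rx.search(o)]


def build_type_index(triples):
    """Group typed URIs by class (a stand-in for a triple store's index)."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            index[o].add(s)
    return index


def links_by_type(triples, type_index, db_class):
    """ex2 analogue: keep links whose target is a known member of the class."""
    members = type_index.get(db_class, set())
    return [(s, o) for s, p, o in triples if p == RDFS_SEE_ALSO and o in members]


type_index = build_type_index(triples)
by_regex = links_by_regex(triples, r"identifiers\.org/db1/")
by_type = links_by_type(triples, type_index, "http://rdf.identifiers.org/Db1")
print(by_regex == by_type, by_type)
```

Both functions return the same link for db1. The class-based version replaces a string match per URI with a set-membership test, at the cost of storing one extra rdf:type triple per URI.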
It would be nice if Identifiers.org defined a database class for each DB prefix. It would be even better if information about each DB were available from a SPARQL endpoint.
We have sent a request about this to the Identifiers.org team.
/db_xref lists for INSDC and RefSeq
The list of DB names used in /db_xref by INSDC (GenBank/DDBJ/EMBL) is available below:
By comparing this list with the Identifiers.org registry, we plan to identify the DBs that are not yet registered in Identifiers.org and ask for them to be added.
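The comparison described here amounts to a set difference. The sketch below is only an illustration: `insdc_db_names` and `insdc_to_namespace` are hypothetical hand-typed excerpts using a few names from this page, whereas a real run would load the full INSDC list and the Identifiers.org registry.

```python
# Hypothetical excerpt of DB names found in INSDC/RefSeq /db_xref qualifiers.
insdc_db_names = [
    "ASAP", "ATCC", "ERIC", "GeneID", "GI", "HMP", "taxon", "Pathema", "PSEUDO",
]

# Hypothetical hand-curated mapping: INSDC DB name -> Identifiers.org namespace.
# Only names already confirmed in the registry appear here.
insdc_to_namespace = {
    "ASAP": "asap",
    "ATCC": "atcc",
    "GeneID": "ncbigene",
    "taxon": "taxonomy",
}


def unregistered(db_names, mapping):
    """Return the DB names with no known Identifiers.org namespace, sorted case-insensitively."""
    return sorted((name for name in set(db_names) if name not in mapping), key=str.lower)


print(unregistered(insdc_db_names, insdc_to_namespace))
```

The names that remain are the candidates to propose for registration.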
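One current problem is that, in some cases, the source database does not provide a URL to which <http://identifiers.org/DB name/entry name> should redirect.
/db_xref entries found in RefSeq (prokaryote) that are not registered in Identifiers.org
The following databases, which appeared in RefSeq, were not yet registered in Identifiers.org, so we are considering registering URIs for them.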
- NCBI GI (includes both nucleic and protein sequence entries)
- ERIC
- Enteropathogen Resource Integration Center
- http://www.ericbrc.org/portal/eric/
- ?? (http://identifiers.org/eric/)
- HMP
- PSEUDO
- EMBL pseudo protein identifier
- ?? (http://identifiers.org/pseudo/)
- Pathema
- Pathema Genome Resource
- http://pathema.jcvi.org/
- ?? (http://identifiers.org/pathema/)
- PseudoCap
- Pseudomonas Genome Database
- http://www.pseudomonas.com/
- http://www.pseudomonas.com/getAnnotation.do?locusID=
- ?? (http://identifiers.org/pseudocap/)
Checking the correspondence between INSDC db_xref and identifiers.org
We are checking whether each entry in the INSDC link list above is already registered in Identifiers.org.
OK
- AceView/WormGenes
- AceView Worm Genome
- /db_xref="AceView/WormGenes:vha-6"
- http://identifiers.org/aceview.worm/
- ApiDB_CryptoDB
- Cryptosporidium Genome Resources
- /db_xref="ApiDB_CryptoDB:cgd7_20"
- http://identifiers.org/cryptodb/
- ApiDB_PlasmoDB
- Plasmodium Genome Resources
- /db_xref="ApiDB_PlasmoDB: PF11_0344"
- http://identifiers.org/plasmodb/
- ApiDB_ToxoDB
- Toxoplasma Genome Resources
- /db_xref="ApiDB_ToxoDB:49.m00014"
- http://identifiers.org/toxoplasma/
- ASAP
- A Systematic Annotation Package for Community Analysis of Genomes
- /db_xref="ASAP:ABE-0000006"
- http://identifiers.org/asap/
- ATCC
- American Type Culture Collection database
- /db_xref="ATCC:123456"
- http://identifiers.org/atcc/
- BEETLEBASE
- Tribolium Genome Database -- Insertion
- /db_xref="BEETLEBASE:TC030551"
- http://identifiers.org/beetlebase/
- dbProbe
- NCBI Probe database Public registry of nucleic acid reagents
- /db_xref="dbProbe:38"
- http://identifiers.org/dbprobe/
- EcoGene
- Database of Escherichia coli Sequence and Function
- /db_xref="EcoGene:EG11277"
- http://identifiers.org/ecogene/
- FLYBASE
- Database of Genetic and molecular data of Drosophila.
- /db_xref="FLYBASE:FBgn0000024"
- http://identifiers.org/flybase/
- GABI
- Network of Different Plant Genomic Research Projects
- /db_xref="GABI:HA05J18"
- http://identifiers.org/gabi/
- GeneDB
- Curated gene database for Schizosaccharomyces pombe, Leishmania major and Trypanosoma brucei
- /db_xref="GeneDB:SPCC285.16c"
- http://identifiers.org/genedb/
- GeneID
- Entrez Gene Database (replaces NCBI Locus Link)
- /db_xref="GeneID:3054987"
- http://identifiers.org/ncbigene/
- GOA
- Gene Ontology Annotation Database Identifier
- /db_xref=" GOA :P01100"
- http://identifiers.org/goa/
- ^([A-N,R-Z][0-9][A-Z][A-Z, 0-9][A-Z, 0-9][0-9])|([O,P,Q][0-9][A-Z, 0-9][A-Z, 0-9][A-Z, 0-9][0-9])$
- Greengenes
- 16S rRNA gene database
- /db_xref="Greengenes:269185"
- http://identifiers.org/greengenes/
- GRIN
- Germplasm Resources Information Network
- /db_xref="GRIN:1005973"
- http://identifiers.org/grin.taxonomy/
- HGNC
- Human Gene Nomenclature Database
- /db_xref="HGNC:2041"
- http://identifiers.org/hgnc/
- http://identifiers.org/hgnc.symbol/
- H-InvDB
- H-Invitational Database
- /db_xref="H-InvDB:HIT000000001"
- /db_xref="H-InvDB:HIX0000001"
- http://identifiers.org/hinv.locus/
- http://identifiers.org/hinv.protein/
- http://identifiers.org/hinv.transcript/
- HSSP
- Database of homology-derived secondary structure of proteins
- /db_xref="HSSP:12GS"
- http://identifiers.org/hssp/
- IMGT/LIGM
- Immunogenetics database, immunoglobulins and T-cell receptors
- /db_xref="IMGT/LIGM:U03895"
- http://identifiers.org/imgt.ligm/
- IMGT/HLA
- Immunogenetics database, human MHC
- /db_xref="IMGT/HLA:HLA00031"
- http://identifiers.org/imgt.hla/
- Interpro
- InterPro protein sequence database
- /db_xref="InterPro:IPR002928"
- http://identifiers.org/interpro/
- ISFinder
- Insertion sequence elements database
- /db_xref="ISFinder:ISA1083-2"
- http://identifiers.org/isfinder/
- JCM
- Japan Collection of Microorganisms
- /db_xref="JCM:1339"
- http://identifiers.org/jcm/
- MaizeGDB
- Maize Genome Database unique identifiers
- /db_xref="MaizeGDB:635633 "
- http://identifiers.org/maizegdb.locus/
- MGI
- Mouse Genome Informatics
- /db_xref="MGI:1894891"
- http://identifiers.org/mgd/
- MIM
- Mendelian Inheritance in Man numbers
- /db_xref="MIM:123456"
- http://identifiers.org/omim/
- miRBase
- The microRNA database
- /db_xref="miRBase: MI0001857"
- http://identifiers.org/mirbase/
- http://identifiers.org/mirbase.mature/
- NBRC
- NITE Biological Resource Center
- /db_xref="NBRC:3189"
- http://identifiers.org/nbrc/
- NextDB
- Nematode Expression Pattern DataBase
- /db_xref="NextDB:CELK01662"
- http://identifiers.org/nextdb/
- niaEST
- NIA Mouse cDNA Project
- /db_xref="niaEST:L0304H12-3"
- http://identifiers.org/niaest/
- PDB
- Biological macromolecule three dimensional structure database
- /db_xref="PDB:12GS"
- http://identifiers.org/pdb/
- PFAM
- Collection of protein families
- /db_xref="PFAM:PF00003"
- http://identifiers.org/pfam/
- PomBase
- Database of Structural and Functional Data for Schizosaccharomyces pombe
- /db_xref="PomBase:SPBC1709.20"
- http://identifiers.org/pombase/
- PseudoCap
- Pseudomonas Genome Database
- /db_xref="PseudoCap:PA0001"
- http://identifiers.org/pseudomonas/
- RGD
- Rat Genome Database
- /db_xref="RGD:620528"
- http://identifiers.org/rgd/
- SoyBase
- Glycine max Genome Database
- /db_xref="SoyBase:Satt005"
- http://identifiers.org/soybase/
- taxon
- NCBI's taxonomic identifier
- /db_xref="taxon:4932"
- http://identifiers.org/taxonomy/
- TAIR
- The Arabidopsis Information Resource
- /db_xref="TAIR:AT1F51370"
- http://identifiers.org/tair.locus/
- TIGRFAM
- TIGR protein families
- /db_xref="TIGRFAM:TIGR00094"
- http://identifiers.org/tigrfam/
- UniProtKB/Swiss-Prot
- section of the UniProt Knowledgebase, containing annotated records, which include curator-evaluated computational analysis, as well as, information extracted from the literature
- /db_xref="UniProtKB/Swiss-Prot:P12345"
- http://identifiers.org/uniprot/
- UniProtKB/TrEMBL
- section of the UniProt Knowledgebase, containing computationally analysed records waiting for full manual annotation
- /db_xref=" UniProtKB/TrEMBL:Q00177"
- http://identifiers.org/uniprot/
- UNITE
- Molecular database for the identification of fungi
- /db_xref=" UNITE:UDB000157"
- http://identifiers.org/unite/
- VBASE2
- Integrative database of germ-line V genes from the immunoglobulin loci of human and mouse
- /db_xref="VBASE2:humIGKV165"
- http://identifiers.org/vbase2/
- WorfDB
- C. elegans ORFeome cloning project
- /db_xref="WorfDB:pos-1"
- http://identifiers.org/worfdb/
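For the databases in the OK list, rewriting a /db_xref qualifier as an Identifiers.org URI is a table lookup plus string concatenation. The following is a minimal sketch, not a complete converter: `NAMESPACE` copies only a few rows from the list above, `db_xref_to_uri` is a hypothetical helper name, and it assumes the entry ID can be appended unchanged, which the registry pattern for each namespace should confirm.

```python
import re

# Excerpt of the "OK" list above: INSDC DB name -> Identifiers.org namespace.
NAMESPACE = {
    "GeneID": "ncbigene",
    "taxon": "taxonomy",
    "MIM": "omim",
    "PDB": "pdb",
    "UniProtKB/Swiss-Prot": "uniprot",
    "UniProtKB/TrEMBL": "uniprot",
}

# /db_xref="DB:ID", tolerating stray spaces around the DB name and the ID.
QUALIFIER = re.compile(r'^/db_xref="\s*([^:"]+?)\s*:\s*([^"]+?)\s*"$')


def db_xref_to_uri(qualifier):
    """Return the Identifiers.org URI for a /db_xref qualifier, or None if unmapped."""
    match = QUALIFIER.match(qualifier.strip())
    if match is None:
        return None
    db, entry = match.groups()
    namespace = NAMESPACE.get(db)
    if namespace is None:
        return None  # DB not in the table (e.g. anything in the NG list)
    return f"http://identifiers.org/{namespace}/{entry}"


for q in ['/db_xref="GeneID:3054987"', '/db_xref=" UniProtKB/TrEMBL:Q00177"', '/db_xref="GI:1234567890"']:
    print(q, "->", db_xref_to_uri(q))
```

Returning None for unmapped names keeps unregistered databases visible instead of silently producing URIs that would not resolve.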
Need Check
- AntWeb
- Ant Database
- /db_xref="AntWeb:CASENT0058943-D01"
- http://identifiers.org/antweb/
- ^casent\d+$
- ApiDB
- Apicomplexan Database Resources
- /db_xref="ApiDB:cgd1_1090"
- CryptoDB is one of the databases that can be accessed through the EuPathDB (http://EuPathDB.org; formerly ApiDB) portal
- http://identifiers.org/cryptodb/cgd1_1090
- How to map ApiDB:ID to one of the following?
- AmoebaDB http://identifiers.org/amoebadb/
- CryptoDB http://identifiers.org/cryptodb/
- GiardiaDB http://identifiers.org/giardiadb/
- MicrosporidiaDB http://identifiers.org/microsporidia/
- PiroplasmaDB http://identifiers.org/piroplasma/
- PlasmoDB http://identifiers.org/plasmodb/
- ToxoDB http://identifiers.org/toxoplasma/
- TriTrypDB http://identifiers.org/tritrypdb/
- TrichDB http://identifiers.org/trichdb/
- BDGP_EST
- Berkeley Drosophila Genome Project EST database
- /db_xref="BDGP_EST:123456"
- http://identifiers.org/bdgp.est/
- ^\w+(\.)?(\d+)?$
- BDGP_INS
- Berkeley Drosophila Genome Project database -- Insertion
- /db_xref="BDGP_INS:123456"
- http://identifiers.org/BDGP.insertion/
- http://www.ebi.ac.uk/miriam/main/collections/MIR:00000156
- ?? (why is this one BDGP.insertion and not http://identifiers.org/bdgp.insertion/ ?)
- BOLD
- Barcode of Life database
- /db_xref="Bold:EPAF263"
- http://identifiers.org/bold.taxonomy/
- ^\d+$
- CABRI
- Common Access to Biological Resources and Information project
- /db_xref="CABRI: ACC 424"
- http://identifiers.org/cabri/
- ^([A-Za-z]+)?(\_)?([A-Za-z-]+)\:([A-Za-z0-9 ]+)$
- CDD
- Conserved Domain Database
- /db_xref="CDD:02194"
- http://identifiers.org/cdd/
- ^cd\d{5}$
- dbEST
- EST database maintained at the NCBI.
- /db_xref="dbEST:123456"
- /db_xref="dbEST:BP535535"
- http://identifiers.org/dbest/
- ^BP\d+$
- dbSNP
- Variation database maintained at the NCBI.
- /db_xref="dbSNP:4647"
- /db_xref="dbSNP:rs133073"
- http://identifiers.org/dbsnp/
- ^\d+$
- dbSTS
- STS database maintained at the NCBI.
- /db_xref="dbSTS:456789"
- /db_xref="dbSTS:BV210161"
- http://identifiers.org/unists/
- ^\d+$
- dictyBase
- Dictyostelium genome database
- /db_xref="dictyBase:DDB0191090"
- http://identifiers.org/dictybase.gene/
- http://identifiers.org/dictybase.est/
- To which URI?
- ENSEMBL
- Database of automatically annotated genomic data
- /db_xref="ENSEMBL:HUMAN-Clone-AC005612"
- /db_xref="ENSEMBL:HUMAN-Gene-ENSG00000007102"
- http://identifiers.org/ensembl/
- ^ENS[A-Z]*[FPTG]\d{11}(\.\d+)?$
- GO
- Gene Ontology Database identifier
- /db_xref="GO:123"
- http://identifiers.org/obo.go/
- ^GO:\d{7}$
- HOMD
- Human Oral Microbiome Database
- /db_xref="HOMD:tax_078"
- /db_xref="HOMD:seq_1603"
- http://identifiers.org/homd.taxon/
- http://identifiers.org/homd.seq/
- need to remove tax_ or seq_ prefix
- IRD
- Influenza Research Database
- /db_xref="IRD:CEIRS-CIP045-123456.2"
- http://identifiers.org/ird.segment/
- ^\w+(\_)?\d+(\.\d+)?$
- JGIDB
- JGI Genome Portal
- /db_xref="JGIDB:Chluvu1_81011"
- http://identifiers.org/img.gene/
- ^\d+$
- MycoBank
- Fungal Databases, Nomenclature and Species Banks
- /db_xref="MycoBank:MB519473"
- http://identifiers.org/mycobank/
- ^\d+$
- PGN
- Plant Genome Network
- /db_xref="PGN:aam01-1ms3-a05"
- http://identifiers.org/pgn/
- ^\d+$
- RAP-DB
- Rice Annotation Project Database
- /db_xref="RAP-DB:Os01g1234567"
- http://identifiers.org/ricegap/
- ^LOC\_Os\d{1,2}g\d{5}$
- SGD
- Saccharomyces Genome Database
- /db_xref="SGD:L0000470"
- http://identifiers.org/sgd/
- ^S\d+$
- SGN
- SOL Genomics Network
- /db_xref="SGN:E553090"
- http://identifiers.org/sgn/
- ^\d+$
- VectorBase
- Bioinformatics Resource Center for Invertebrate Vectors of Human Pathogens
- /db_xref="VectorBase:ENSANGG00000007825"
- http://identifiers.org/vectorbase/
- ^\D{4}\d{6}(\-\D{2})?$
- WormBase
- Caenorhabditis elegans Genome Database
- /db_xref="WormBase:R13H7"
- http://identifiers.org/wormbase/
- ^WBGene\d{8}$
- Xenbase
- Xenopus laevis and tropicalis biology and genomics resource
- /db_xref="Xenbase:XB-GENE-1019547"
- http://identifiers.org/xenbase/
- ^\d+$
- ZFIN
- Zebrafish Information Network
- /db_xref="ZFIN:ZDB-GENE-011205-17"
- http://identifiers.org/zfin/
- ZDB\-GENE\-\d+\-\d+
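Many entries in this list need checking because the INSDC example ID does not match the pattern listed for the namespace. The sketch below shows one way such a check could be automated. `CHECKS` copies a few (example ID, namespace, pattern) rows from the list above, and `mismatches` and `homd_uri` are hypothetical helper names. The HOMD handling assumes that `tax_` maps to homd.taxon and `seq_` to homd.seq, which this page suggests but does not confirm.

```python
import re

# (INSDC example ID, Identifiers.org namespace, pattern) rows from the list above.
CHECKS = [
    ("4647", "dbsnp", r"^\d+$"),
    ("rs133073", "dbsnp", r"^\d+$"),
    ("02194", "cdd", r"^cd\d{5}$"),
    ("123", "obo.go", r"^GO:\d{7}$"),
    ("L0000470", "sgd", r"^S\d+$"),
    ("R13H7", "wormbase", r"^WBGene\d{8}$"),
    ("ZDB-GENE-011205-17", "zfin", r"ZDB\-GENE\-\d+\-\d+"),
]


def mismatches(checks):
    """Return (namespace, entry) pairs whose entry fails the namespace pattern."""
    return [(ns, entry) for entry, ns, pattern in checks if re.search(pattern, entry) is None]


def homd_uri(entry):
    """Strip the tax_/seq_ prefix and choose the namespace (assumed mapping)."""
    kind, sep, number = entry.partition("_")
    namespace = {"tax": "homd.taxon", "seq": "homd.seq"}.get(kind)
    if not sep or namespace is None:
        return None
    return f"http://identifiers.org/{namespace}/{number}"


for ns, entry in mismatches(CHECKS):
    print(f"needs check: {ns} <- {entry}")
print(homd_uri("tax_078"))
```

Each reported pair needs either an ID transformation before the URI is built (as with HOMD) or a decision about which namespace is the right target.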
NG
- AFTOL
- Assembling the Fungal Tree of Life
- /db_xref="AFTOL:959"
- ??
- APHIDBASE
- Aphid Genome Database
- /db_xref="APHIDBASE:ACYPI007424"
- ??
- ATCC(in host)
- American Type Culture Collection database
- /db_xref="ATCC(in host):123456"
- ??
- ATCC(dna)
- American Type Culture Collection database
- /db_xref="ATCC(dna):123456"
- ??
- Axeldb
- A Xenopus laevis database
- /db_xref="Axeldb:32B3.1"
- ??
- BGD
- Bovine Genome Database
- /db_xref="BGD:BT10004"
- ??
- CCAP
- Culture Collection of algae and protozoa
- /db_xref="CCAP: 1460/15"
- ??
- EPD
- Eukaryotic Promoter Database
- /db_xref="EPD: EP00576"
- ??
- ERIC
- Enteropathogen Resource Integration Center
- /db_xref="ERIC:ABY-0246137"
- ??
- ESTLIB
- EBI's EST library identifier
- /db_xref="ESTLIB:1200"
- ??
- FANTOM_DB
- Database of Functional Annotation of Mouse
- /db_xref="FANTOM_DB:0610005A07"
- ??
- FBOL
- International Fungal Working Group Fungal Barcoding
- /db_xref="FBOL:2224"
- ??
- GDB
- Human Genome Database accession numbers
- /db_xref="GDB:G00-128-600"
- ??
- GI
- GenInfo identifier, used as a unique sequence identifier for nucleotide and proteins
- /db_xref="GI:1234567890"
- ??
- HMP
- Human Microbiome Project
- /db_xref="HMP:0536"
- ??
- IMGT/GENE-DB
- Immunogenetics database, immunoglobulin and T-cell receptor genes
- /db_xref="IMGT/GENE-DB:IGKC"
- ??
- JGI's Phytozome
- Comparative genomics of plants
- /db_xref="Phytozome:Glyma0021s00410"
- /db_xref="Phytozome:POPTR_1446s00200"
- ??
- NMPDR
- National Microbial Pathogen Data Resource
- /db_xref="NMPDR:fig|306254.1.peg.183"
- ??
- NRESTdb
- Natural Rubber EST database
- /db_xref="NRESTdb:Y01A01"
- ??
- Osa1
- Rice Genome Annotation Project
- /db_xref="Osa1:LOC_Os01g12345"
- ??
- Pathema
- Pathema Genome Resource
- /db_xref="Pathema:BA_4405"
- /db_xref="Pathema:191218"
- ??
- PBmice
- PiggyBac Mutagenesis Information Center
- /db_xref="PBmice:38"
- ??
- PIR
- Protein Information Resource accession numbers
- /db_xref="PIR:S12345"
- ??
- PSEUDO
- EMBL pseudo protein identifier
- /db_xref="PSEUDO:CAC44644.1"
- ??
- RATMAP
- Rat Genome Database
- /db_xref="RATMAP:5"
- ??
- RFAM
- RNA families database of alignments and CMs
- /db_xref="RFAM:RF00230"
- ??
- RiceGenes
- Rice database accession numbers
- /db_xref="RiceGenes:AA231856"
- ??
- RZPD
- Resource Centre Primary Database Clone Identifiers
- /db_xref="RZPD:IMAGp998I142450Q6"
- ??
- SEED
- The SEED Database
- /db_xref="SEED:fig|83331.1.peg.1"
- ??
- SK-FST
- Saskatoon Arabidopsis T-DNA mutant population - SK Collection
- /db_xref="SK-FST: FST:SK32219"
- ??
- SubtiList
- Bacillus subtilis genome sequencing project
- /db_xref="SubtiList:BG10001"
- ??
- UNILIB
- Unified Library Database, a library-level view of the EST and SAGE libraries present in dbEST, UniGene and SAGEmap
- /db_xref="UNILIB:1002"
- ??
- ViPR
- Virus Pathogen Resource
- /db_xref="ViPR:HRV-A34_p1058_sR263_2008"
- ??
TODO
- LocusID
- NCBI LocusLink ID **Discontinued March 2005
- /db_xref="LocusID:51199"
- NO NEED