BH12.12/SPARQLthon5/INSDC Ontology

From TogoWiki

Revision as of 10:39, 21 August 2013 (Wed) by Tfuji

INSDC

  • insdc.owl update

bioproject

We reviewed the publicly released INSDC data.

Public data

We checked the data on the FTP site as of 2013-02-20. The status for ENA is unknown.

Organism Name   TaxID   Project Accession       Project ID      Project Type    Project Data Type       Date
Borrelia burgdorferi B31        224326  PRJNA3  3       Primary submission      Genome sequencing       2003/02/23
Treponema denticola ATCC 35405  243275  PRJNA4  4       Primary submission      Genome sequencing       2004/04/06
Treponema pallidum subsp. pallidum str. Nichols 243276  PRJNA5  5       Primary submission      Genome sequencing       2003/02/25
Magnetospirillum magnetotacticum MS-1   272627  PRJNA6  6       Primary submission      Genome sequencing       2003/02/25
Campylobacter fetus subsp. venerealis str. Azul-94      593452  PRJNA7  7       Primary submission      Genome sequencing       2009/04/22
.
.
.
68,749 records
Refseq accn,Genbank accn,Organism name,TaxID
PRJNA116,PRJNA10719,Arabidopsis thaliana,3702
PRJNA122,PRJNA12269,Oryza sativa Japonica Group,39947
PRJNA122,PRJNA13141,Oryza sativa Japonica Group,39947
PRJNA127,PRJNA13836,Schizosaccharomyces pombe 972h-,284812
PRJNA127,PRJNA20755,Schizosaccharomyces pombe,4896
PRJNA128,PRJNA13838,Saccharomyces cerevisiae S288c,559292
PRJNA128,PRJNA43747,Saccharomyces cerevisiae S288c,559292
PRJNA132,PRJNA13841,Neurospora crassa OR74A,367110
PRJNA148,PRJNA13173,Plasmodium falciparum 3D7,36329
PRJNA155,PRJNA13833,Encephalitozoon cuniculi GB-M1,284813
.
.
.
7,606 records
Organism Name	Project Accession	Project ID	Project Type	Project Data Type	Released	Updated
Gluconobacter frateurii NBRC 101659	PRJDB2		Primary submission	Genome sequencing	2012/05/14	2012/05/14
Gordonia otitidis NBRC 100426	PRJDB3		Primary submission	Genome sequencing	2011/11/01	2012/02/16
Gordonia rhizosphera NBRC 16068	PRJDB4		Primary submission	Genome sequencing	2011/11/01	2012/08/29
Escherichia coli str. K-12 substr. MDS42	PRJDB5		Primary submission	Genome sequencing	2011/12/02	2012/02/08
.
.
.
448 records

RDF

  • prokaryotes.txt.bioproject_refseq.nt
<http://www.ncbi.nlm.nih.gov/bioproject/43389>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/NZ_CM000855.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/NZ_CM000855.1>     <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/43389> .
<http://www.ncbi.nlm.nih.gov/nuccore/NZ_CM000855.1>     <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/43389>  <http://www.w3.org/2000/01/rdf-schema#label>    "Campylobacter jejuni subsp. jejuni 414" .
<http://www.ncbi.nlm.nih.gov/bioproject/43389>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/CM000855.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/CM000855.1>        <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/43389> .
<http://www.ncbi.nlm.nih.gov/nuccore/CM000855.1>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/43391>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/NZ_CM000854.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/NZ_CM000854.1>     <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/43391> .
<http://www.ncbi.nlm.nih.gov/nuccore/NZ_CM000854.1>     <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/43391>  <http://www.w3.org/2000/01/rdf-schema#label>    "Campylobacter jejuni subsp. jejuni 1336" .
<http://www.ncbi.nlm.nih.gov/bioproject/43391>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/CM000854.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/CM000854.1>        <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/43391> .
<http://www.ncbi.nlm.nih.gov/nuccore/CM000854.1>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/57587>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/NC_002163.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_002163.1>       <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/57587> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_002163.1>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/57587>  <http://www.w3.org/2000/01/rdf-schema#label>    "Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819" .
<http://www.ncbi.nlm.nih.gov/bioproject/57587>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/AL111168.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/AL111168.1>        <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/57587> .
<http://www.ncbi.nlm.nih.gov/nuccore/AL111168.1>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/57899>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/NC_003912.7> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_003912.7>       <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/57899> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_003912.7>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/57899>  <http://www.w3.org/2000/01/rdf-schema#label>    "Campylobacter jejuni RM1221" .
<http://www.ncbi.nlm.nih.gov/bioproject/57899>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/CP000025.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/CP000025.1>        <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/57899> .
<http://www.ncbi.nlm.nih.gov/nuccore/CP000025.1>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/58503>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/NC_008787.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_008787.1>       <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/58503> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_008787.1>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/58503>  <http://www.w3.org/2000/01/rdf-schema#label>    "Campylobacter jejuni subsp. jejuni 81-176" .
<http://www.ncbi.nlm.nih.gov/bioproject/58503>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/CP000538.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/CP000538.1>        <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/58503> .
<http://www.ncbi.nlm.nih.gov/nuccore/CP000538.1>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000340> .
<http://www.ncbi.nlm.nih.gov/bioproject/58503>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/NC_008787.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_008787.1>       <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/58503> .
<http://www.ncbi.nlm.nih.gov/nuccore/NC_008787.1>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000155> .
<http://www.ncbi.nlm.nih.gov/bioproject/58503>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/CP000549.1> .
<http://www.ncbi.nlm.nih.gov/nuccore/CP000549.1>        <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/58503> .
<http://www.ncbi.nlm.nih.gov/nuccore/CP000549.1>        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>       <http://purl.obolibrary.org/obo/SO_0000155> .
<http://www.ncbi.nlm.nih.gov/bioproject/58503>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/nuccore/CP000550.1> .
  • summary.txt.bioproject_taxid_gold.nt
<http://www.ncbi.nlm.nih.gov/bioproject/116>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/taxonomy/3702> .
<http://www.ncbi.nlm.nih.gov/taxonomy/3702>     <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/116> .
<http://www.ncbi.nlm.nih.gov/bioproject/116>    <http://www.w3.org/2000/01/rdf-schema#label>    "RefSeq Genome" .
<http://www.ncbi.nlm.nih.gov/bioproject/116>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/10719> .
<http://www.ncbi.nlm.nih.gov/bioproject/10719>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/116> .
<http://www.ncbi.nlm.nih.gov/bioproject/10719>  <http://www.w3.org/2000/01/rdf-schema#label>    "Genome sequencing" .
<http://www.ncbi.nlm.nih.gov/bioproject/10719>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/taxonomy/3702> .
<http://www.ncbi.nlm.nih.gov/taxonomy/3702>     <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/10719> .
<http://www.ncbi.nlm.nih.gov/bioproject/122>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/taxonomy/39947> .
<http://www.ncbi.nlm.nih.gov/taxonomy/39947>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/122> .
<http://www.ncbi.nlm.nih.gov/bioproject/122>    <http://www.w3.org/2000/01/rdf-schema#label>    "RefSeq Genome" .
<http://www.ncbi.nlm.nih.gov/bioproject/122>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/13141> .
<http://www.ncbi.nlm.nih.gov/bioproject/13141>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/122> .
<http://www.ncbi.nlm.nih.gov/bioproject/13141>  <http://www.w3.org/2000/01/rdf-schema#label>    "Genome sequencing" .
<http://www.ncbi.nlm.nih.gov/bioproject/13141>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/taxonomy/39947> .
<http://www.ncbi.nlm.nih.gov/taxonomy/39947>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/13141> .
<http://www.ncbi.nlm.nih.gov/bioproject/13141>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.genomesonline.org/cgi-bin/GOLD/GOLDCards.cgi?goldstamp=Gc00603> .
<http://www.genomesonline.org/cgi-bin/GOLD/GOLDCards.cgi?goldstamp=Gc00603>     <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/13141> .
<http://www.ncbi.nlm.nih.gov/bioproject/127>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/taxonomy/284812> .
<http://www.ncbi.nlm.nih.gov/taxonomy/284812>   <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/127> .
<http://www.ncbi.nlm.nih.gov/bioproject/127>    <http://www.w3.org/2000/01/rdf-schema#label>    "RefSeq Genome" .
<http://www.ncbi.nlm.nih.gov/bioproject/127>    <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/20755> .
<http://www.ncbi.nlm.nih.gov/bioproject/20755>  <http://www.w3.org/2000/01/rdf-schema#seeAlso>  <http://www.ncbi.nlm.nih.gov/bioproject/127> .
<http://www.ncbi.nlm.nih.gov/bioproject/20755>  <http://www.w3.org/2000/01/rdf-schema#label>    "Genome sequencing" .
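
The triple pattern in summary.txt.bioproject_taxid_gold.nt can be illustrated with a short Ruby sketch. The converter that produced the dump is not shown on this page, so this is only a reconstruction under assumptions: the input is one comma-separated row in the "Refseq accn,Genbank accn,Organism name,TaxID" layout listed under Public data, the numeric part of the accession forms the URI, and the two labels are copied from the dump. The helper names (`link`, `label`, `bioproject_uri`, `row_to_triples`) are hypothetical, and the GOLD links are left out.

```ruby
RDFS = 'http://www.w3.org/2000/01/rdf-schema#'
NCBI = 'http://www.ncbi.nlm.nih.gov'

# One N-Triples statement linking two resources with rdfs:seeAlso.
def link(subject, object)
  "<#{subject}>\t<#{RDFS}seeAlso>\t<#{object}> ."
end

# One N-Triples statement with a plain literal as rdfs:label.
def label(subject, text)
  escaped = text.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
  "<#{subject}>\t<#{RDFS}label>\t\"#{escaped}\" ."
end

# "PRJNA116" -> ".../bioproject/116" (numeric part only, as in the dump).
def bioproject_uri(accession)
  "#{NCBI}/bioproject/#{accession.sub(/\APRJ[A-Z]{2}/, '')}"
end

# Assumed input: "RefSeq accn,GenBank accn,Organism name,TaxID".
def row_to_triples(line)
  fields = line.chomp.split(',')
  ref = bioproject_uri(fields[0])
  gb  = bioproject_uri(fields[1])
  tax = "#{NCBI}/taxonomy/#{fields[-1]}" # TaxID is the last column
  [
    link(ref, tax), link(tax, ref),
    label(ref, 'RefSeq Genome'),
    link(ref, gb), link(gb, ref),
    label(gb, 'Genome sequencing'),
    link(gb, tax), link(tax, gb)
  ]
end
```

Calling `row_to_triples("PRJNA116,PRJNA10719,Arabidopsis thaliana,3702")` yields eight statements in the same order as the first eight lines of the dump. Some rows in the real output (for example bioproject/127) do not follow this per-row pattern exactly, so the actual converter probably combined more than one input.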

XML2RDF converter development

Build a converter that reads from the XML rather than from the flat files.

  • The XML does not contain the BioProject ID to INSDC entry accession mapping
  • Because the XML file is large, parse it with Nokogiri::XML::SAX::Document
  • Project IDs carry the PRJDB prefix; should http://identifiers.org/bioproject/PRJDB116 be used as the subject?
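
The event-driven approach from the list above can be sketched as follows. To keep the example dependency-free it uses REXML's stream parser from the Ruby distribution as a stand-in for Nokogiri's SAX interface; the callback structure is analogous, but this is not the Nokogiri API. The element layout (`Package`, `ProjectID/ArchiveID` with an `accession` attribute, `ProjectDescr/Name`) is an assumption about the BioProject XML, and `BioProjectListener` and `bioproject_label_triples` are hypothetical names. The subject URI follows the identifiers.org form proposed above, which the notes leave as an open question.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Streams through BioProject-style XML and collects one rdfs:label triple
# per <Package>, without building the whole document tree in memory.
class BioProjectListener
  include REXML::StreamListener

  RDFS_LABEL   = 'http://www.w3.org/2000/01/rdf-schema#label'
  SUBJECT_BASE = 'http://identifiers.org/bioproject/'

  attr_reader :triples

  def initialize
    @triples = []
    @path = [] # stack of currently open element names
    reset
  end

  def tag_start(name, attrs)
    @path.push(name)
    # Assumed layout: <ProjectID><ArchiveID accession="PRJDB2"/></ProjectID>
    @accession = attrs['accession'] if @path.last(2) == %w[ProjectID ArchiveID]
  end

  def text(data)
    # Assumed layout: <ProjectDescr><Name>...</Name></ProjectDescr>
    @name << data if @path.last(2) == %w[ProjectDescr Name]
  end

  def tag_end(name)
    @path.pop
    return unless name == 'Package'

    emit
    reset
  end

  private

  def reset
    @accession = nil
    @name = +''
  end

  def emit
    label = @name.strip
    return if @accession.nil? || label.empty?

    escaped = label.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
    @triples << "<#{SUBJECT_BASE}#{@accession}> <#{RDFS_LABEL}> \"#{escaped}\" ."
  end
end

# source may be a String or an IO such as an opened file.
def bioproject_label_triples(source)
  listener = BioProjectListener.new
  REXML::Document.parse_stream(source, listener)
  listener.triples
end
```

For a large dump, passing an opened file as `source` lets the parser read incrementally. A Nokogiri version would subclass Nokogiri::XML::SAX::Document and implement `start_element`, `characters`, and `end_element` with the same state handling.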

Previous session