BH12.12/SPARQLthon8/INSDC Ontology

From TogoWiki

Revision as of 05:08, 2 June 2013 (Sun) by Tfuji

Conversion with refseq2ttl.rb

Latest version as of 2013-05-23 (v6): https://gist.github.com/ktym/5547403

ruby refseq2ttl.rb genome_conf/datasets/cyanobase/data_sources/Synechocystis/genbank/NC_000911.gb

Entry metadata and source feature portion of the script output:

   @prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
   @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
   @prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
   @prefix obo:    <http://purl.obolibrary.org/obo/> .
   @prefix faldo:  <http://biohackathon.org/resource/faldo#> .
   @prefix idorg:  <http://rdf.identifiers.org/database/> .
   @prefix insdc:  <http://insdc.org/owl/> .
   
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdf:type        obo:SO_0000340 .  # SO:chromosome
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:label      "Synechocystis sp. PCC 6803 chromosome, complete genome." .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:sequence_version  "NC_000911.1" .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:sequence_length   3573470 .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:sequence_fasta    <http://togows.dbcls.jp/entry/nucleotide/NC_000911.1.fasta> .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdf:type        obo:SO_0000988 .  # SO:circular
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:sequence_date     "2012-01-19"^^xsd:date .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/ncbigi/16329170> .
   <http://identifiers.org/ncbigi/16329170>        rdfs:label      "GI:16329170" .
   <http://identifiers.org/ncbigi/16329170>        rdf:type        idorg:GI .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/refseq/NC_000911.1> .
   <http://identifiers.org/refseq/NC_000911.1>     rdfs:label      "RefSeq:NC_000911.1" .
   <http://identifiers.org/refseq/NC_000911.1>     rdf:type        idorg:RefSeq .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/bioproject/57659> .
   <http://identifiers.org/bioproject/57659>       rdfs:label      "BioProject:57659" .
   <http://identifiers.org/bioproject/57659>       rdf:type        idorg:BioProject .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/pubmed/9724772> .
   <http://identifiers.org/pubmed/9724772> rdfs:label      "PubMed:9724772" .
   <http://identifiers.org/pubmed/9724772> rdf:type        idorg:PubMed .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/pubmed/8905231> .
   <http://identifiers.org/pubmed/8905231> rdfs:label      "PubMed:8905231" .
   <http://identifiers.org/pubmed/8905231> rdf:type        idorg:PubMed .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/pubmed/8590279> .
   <http://identifiers.org/pubmed/8590279> rdfs:label      "PubMed:8590279" .
   <http://identifiers.org/pubmed/8590279> rdf:type        idorg:PubMed .
   <urn:uuid:69ff7bb6-54cb-47f7-81ce-fc0fb95d246d> insdc:location  "1..3573470" .
   <urn:uuid:69ff7bb6-54cb-47f7-81ce-fc0fb95d246d> rdf:type        faldo:Region .
   <urn:uuid:69ff7bb6-54cb-47f7-81ce-fc0fb95d246d> faldo:begin     <urn:uuid:ce08fa48-a23b-40f8-aa50-553f90a7c177> .
   <urn:uuid:69ff7bb6-54cb-47f7-81ce-fc0fb95d246d> faldo:end       <urn:uuid:326eba1a-590e-4a56-b5fb-addc30a6983d> .
   <urn:uuid:ce08fa48-a23b-40f8-aa50-553f90a7c177> faldo:position  1 .
   <urn:uuid:ce08fa48-a23b-40f8-aa50-553f90a7c177> faldo:reference <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> .
   <urn:uuid:ce08fa48-a23b-40f8-aa50-553f90a7c177> rdf:type        faldo:ExactPosition .
   <urn:uuid:ce08fa48-a23b-40f8-aa50-553f90a7c177> rdf:type        faldo:ForwardStrandPosition .
   <urn:uuid:326eba1a-590e-4a56-b5fb-addc30a6983d> faldo:position  3573470 .
   <urn:uuid:326eba1a-590e-4a56-b5fb-addc30a6983d> faldo:reference <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> .
   <urn:uuid:326eba1a-590e-4a56-b5fb-addc30a6983d> rdf:type        faldo:ExactPosition .
   <urn:uuid:326eba1a-590e-4a56-b5fb-addc30a6983d> rdf:type        faldo:ForwardStrandPosition .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> faldo:location  <urn:uuid:69ff7bb6-54cb-47f7-81ce-fc0fb95d246d> .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> rdfs:seeAlso    <http://identifiers.org/taxonomy/1148> .
   <http://identifiers.org/taxonomy/1148>  rdfs:label      "taxon:1148" .
   <http://identifiers.org/taxonomy/1148>  rdf:type        idorg:Taxonomy .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:source_organism   "Synechocystis sp. PCC 6803" .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:source_mol_type   "genomic DNA" .
   <urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4> insdc:source_strain     "PCC 6803" .
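
The location triples in the output above follow a regular pattern: one faldo:Region node that keeps the original INSDC location string, plus a begin node and an end node that each carry a position, a reference back to the sequence, and two rdf:type statements. The following is a minimal Ruby sketch of that pattern, not code taken from refseq2ttl.rb. The helper names (new_uuid_iri, faldo_triples, to_turtle) are hypothetical, only simple forward-strand "start..end" locations are handled, and the triples come out in a slightly different order than in the listing above.

```ruby
require "securerandom"

# Mint a fresh urn:uuid resource, written as a Turtle IRI.
# (Hypothetical helper; refseq2ttl.rb may generate its identifiers differently.)
def new_uuid_iri
  "<urn:uuid:#{SecureRandom.uuid}>"
end

# Build the triples for a simple forward-strand "start..end" location.
# Returns an array of [subject, predicate, object] strings.
def faldo_triples(sequence_iri, location)
  match = /\A(\d+)\.\.(\d+)\z/.match(location)
  raise ArgumentError, "unsupported location: #{location}" unless match

  region = new_uuid_iri
  triples = [
    [region, "insdc:location", %("#{location}")],
    [region, "rdf:type", "faldo:Region"]
  ]

  # One position node for each end of the region.
  [["faldo:begin", match[1]], ["faldo:end", match[2]]].each do |predicate, pos|
    position = new_uuid_iri
    triples << [region, predicate, position]
    triples << [position, "faldo:position", pos]
    triples << [position, "faldo:reference", sequence_iri]
    triples << [position, "rdf:type", "faldo:ExactPosition"]
    triples << [position, "rdf:type", "faldo:ForwardStrandPosition"]
  end

  # Link the sequence to its region.
  triples << [sequence_iri, "faldo:location", region]
  triples
end

# Serialize the triples one per line, tab-separated, as in the listing above.
def to_turtle(triples)
  triples.map { |s, p, o| "#{s}\t#{p}\t#{o} ." }.join("\n")
end

seq = "<urn:uuid:05b20629-fb41-4fd0-8e41-1011fd4375a4>"
puts to_turtle(faldo_triples(seq, "1..3573470"))
```

Complement or join locations would need reverse-strand position types and several regions, which this sketch deliberately rejects with an ArgumentError.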


insdc: properties currently used in the NC_000911 entry

insdc:
insdc:feature_EC_number
insdc:feature_codon_start
insdc:feature_function
insdc:feature_gene
insdc:feature_gene_synonym
insdc:feature_locus_tag
insdc:feature_note
insdc:feature_product
insdc:feature_transl_table
insdc:feature_translation
insdc:location
insdc:sequence_date
insdc:sequence_fasta → insdc:sequence
insdc:sequence_length
insdc:sequence_version
insdc:source_mol_type
insdc:source_organism
insdc:source_strain

Differences between refseq2ttl.rb and the INSDC ontology

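  1. The INSDC ontology does not use the feature_ prefix.
  2. It does not use the source_ prefix either.
  3. The sequence_ prefix also looks unnecessary, except on the SOURCE and ORGANISM metadata.
    • Check insdseq.dtd [Todo]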
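
Points 1 and 2 above amount to a simple renaming rule, which can be sketched in Ruby as follows. This is only an illustration under stated assumptions: the function name ontology_style_name is hypothetical, the resulting names are not confirmed INSDC ontology terms, and the sequence_ names are left untouched because point 3 is still open pending the insdseq.dtd check.

```ruby
# Property names currently emitted for NC_000911 (without the insdc: prefix).
CURRENT_NAMES = %w[
  feature_EC_number feature_codon_start feature_function feature_gene
  feature_gene_synonym feature_locus_tag feature_note feature_product
  feature_transl_table feature_translation location
  sequence_date sequence_fasta sequence_length sequence_version
  source_mol_type source_organism source_strain
].freeze

# Drop a leading "feature_" or "source_" (points 1 and 2).
# "sequence_" is kept as-is, since point 3 has not been decided yet.
def ontology_style_name(name)
  name.sub(/\A(?:feature|source)_/, "")
end

CURRENT_NAMES.each do |name|
  puts "insdc:#{name} -> insdc:#{ontology_style_name(name)}"
end
```

For the names listed here, stripping the two prefixes produces no collisions, so the rule could be applied mechanically to this entry.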

SPARQL

gene attributes

Example: gene-id=slr0473

    
prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd:    <http://www.w3.org/2001/XMLSchema#>
prefix obo:    <http://purl.obolibrary.org/obo/>
prefix faldo:  <http://biohackathon.org/resource/faldo#>
prefix idorg:  <http://rdf.identifiers.org/database/>
prefix insdc:  <http://insdc.org/owl/>
    
SELECT *
from <http://togogenome.org/refseq/>
from <http://togogenome.org/so/>
where{
#?gene a obo:SO_0000704 .
?gene a ?gene_type.
?gene_type rdfs:label ?gene_type_label.
?gene insdc:feature_locus_tag ?locus_tag.
?gene rdfs:label ?label.
?gene rdfs:seeAlso ?seeAlso.
?gene obo:so_part_of ?parent_so.
#?gene ?v ?o.
?parent_so rdfs:label ?parent_so_label.
#?parent_so ?v ?o.
?gene faldo:location ?faldo.
#?faldo ?faldo_v ?faldo_o.
?faldo insdc:location ?faldo_location.
?faldo faldo:begin ?faldo_begin.
#?faldo_begin ?faldo_begin_v ?faldo_begin_o.
?faldo_begin faldo:position ?faldo_begin_position.
?faldo_begin rdf:type ?faldo_begin_type.
?faldo faldo:end ?faldo_end.
#?faldo_end ?faldo_end_v ?faldo_end_o.
?faldo_end faldo:position ?faldo_end_position.
?faldo_end rdf:type ?faldo_end_type.
?faldo rdf:type ?faldo_type.
FILTER(?gene_type =  obo:SO_0000704 || ?gene_type = obo:SO_0000252 || ?gene_type = obo:SO_0000253)
FILTER(?locus_tag = "slr0473")
}
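
To look up a different gene, only the locus tag in the final FILTER needs to change. The Ruby sketch below builds a reduced version of the query above for a given locus tag and turns it into a GET request URL following the SPARQL protocol's `query` parameter. It makes several assumptions: the endpoint URL is a placeholder (this page does not name one), the function names gene_query and request_url are hypothetical, and the query selects only the label and the begin and end positions rather than everything.

```ruby
require "uri"

# Placeholder endpoint; replace with the actual SPARQL endpoint in use.
ENDPOINT = "http://example.org/sparql"

# Build a reduced gene-attribute query for one locus tag.
# The tag is validated first so it cannot break out of the string literal.
def gene_query(locus_tag)
  unless locus_tag.match?(/\A[A-Za-z0-9_.-]+\z/)
    raise ArgumentError, "unexpected locus tag: #{locus_tag.inspect}"
  end

  <<~SPARQL
    PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX faldo: <http://biohackathon.org/resource/faldo#>
    PREFIX insdc: <http://insdc.org/owl/>

    SELECT ?gene ?label ?begin_position ?end_position
    FROM <http://togogenome.org/refseq/>
    WHERE {
      ?gene insdc:feature_locus_tag ?locus_tag .
      ?gene rdfs:label ?label .
      ?gene faldo:location ?faldo .
      ?faldo faldo:begin ?faldo_begin .
      ?faldo_begin faldo:position ?begin_position .
      ?faldo faldo:end ?faldo_end .
      ?faldo_end faldo:position ?end_position .
      FILTER(?locus_tag = "#{locus_tag}")
    }
  SPARQL
end

# Encode the query as the "query" parameter of a GET request URL.
def request_url(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  uri.to_s
end

puts request_url(ENDPOINT, gene_query("slr0473"))
```

The resulting URL can be fetched with any HTTP client; which result formats are available depends on the endpoint.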