BH12.12/TogoGenome

From TogoWiki

Revision as of 07:24, 20 December 2012 (Thu) by Ktym

Converting genome information to RDF

  • Creating an INSDC ontology
    • SO
    • FALDO
    • INSDC.owl
      • Feature/Qualifier - FT <-> SO
      • DB XREF - Identifiers.org
  • Generating RDF for RefSeq (prokaryote) entries
    • The latest version is the one stored at DDBJ's http://fat:8892/sparql as <http://v5.genome.db/>
      • Source data: ~ktym/project/rdfgenome/wget_prokaryote.v5/**/*.ttl
      • 42GB
      • 455,322,591 triples
      • 19,981,922 URIs
      • 75,363,941 UUIDs
      • 96 predicates
152898623 rdf:type
44102584 rdfs:label
40926808 faldo:reference
40926808 faldo:position
30128855 rdfs:seeAlso
20463404 faldo:end
20463404 faldo:begin
20459536 obo:so_part_of
13973729 insdc:location_string
13973729 faldo:location
13868333 insdc:feature_locus_tag
6701192 insdc:feature_product
6486350 obo:so_has_part
6486350 insdc:feature_transl_table
6486350 insdc:feature_codon_start
6485705 insdc:feature_translation
4419283 insdc:feature_note
2793204 insdc:feature_gene
1706510 insdc:feature_inference
762918 insdc:feature_EC_number
309060 insdc:feature_function
164543 insdc:feature_old_locus_tag
126161 insdc:feature_pseudo
72012 insdc:feature_gene_synonym
20815 insdc:feature_experiment
14092 insdc:feature_codon_recognized
12731 insdc:feature_operon
11131 insdc:feature_rpt_family
10385 insdc:feature_mobile_element_type
6145 insdc:feature_anticodon
4558 insdc:feature_rpt_type
  :
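  • History
    • v1: RefSeq -> Turtle converter written with BioRuby
    • v2: Revised the URIs
    • v3: Switched the URIs to URNs (UUIDs)
    • v4: Adopted Identifiers.org; bug fixes
    • v5: Fixes for the FALDO update; provisional migration to the INSDC ontology
    • v6: Formal adoption of INSDC.owl (planned)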
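
The conventions listed above (UUID URNs since v3, Identifiers.org cross-references since v4, FALDO locations and INSDC qualifier predicates in v5) can be illustrated with a minimal Ruby sketch. This is not the actual converter: the helper names `new_urn`, `literal` and `gene_to_turtle`, the sequence URI and the coordinates are hypothetical. The begin/end/position/reference layout is an assumption inferred from the predicate counts, where faldo:position and faldo:reference each occur exactly twice as often as faldo:begin and faldo:end.

```ruby
require "securerandom"

# Namespaces as they appear in the query results on this page.
PREFIXES = {
  "rdfs"  => "http://www.w3.org/2000/01/rdf-schema#",
  "obo"   => "http://purl.obolibrary.org/obo/",
  "faldo" => "http://biohackathon.org/resource/faldo#",
  "insdc" => "http://rdf.insdc.org/",
}.freeze

# Mint a fresh UUID-based URN for a node (the v3 convention).
def new_urn
  "<urn:uuid:#{SecureRandom.uuid}>"
end

# Quote a string as a Turtle literal, escaping backslashes and double quotes.
def literal(text)
  '"' + text.gsub(/["\\]/) { |char| "\\#{char}" } + '"'
end

# Hypothetical helper: serialize one gene feature as Turtle.
# seq_uri, first and last are assumed inputs, not values from the dataset.
def gene_to_turtle(locus_tag:, gene:, gene_id:, seq_uri:, first:, last:)
  feature, location, begin_pos, end_pos = Array.new(4) { new_urn }
  position = lambda do |node, coordinate|
    "#{node} faldo:position #{coordinate} ;\n  faldo:reference <#{seq_uri}> ."
  end
  [
    *PREFIXES.map { |prefix, uri| "@prefix #{prefix}: <#{uri}> ." },
    "",
    "#{feature} a obo:SO_0000704 ;", # SO_0000704 = gene
    "  rdfs:label #{literal(locus_tag)} ;",
    "  insdc:feature_locus_tag #{literal(locus_tag)} ;",
    "  insdc:feature_gene #{literal(gene)} ;",
    "  rdfs:seeAlso <http://identifiers.org/ncbigene/#{gene_id}> ;",
    "  faldo:location #{location} .",
    "#{location} faldo:begin #{begin_pos} ;\n  faldo:end #{end_pos} .",
    position.call(begin_pos, first),
    position.call(end_pos, last),
  ].join("\n")
end

if __FILE__ == $PROGRAM_NAME
  # The coordinates and the sequence URI below are placeholders.
  puts gene_to_turtle(locus_tag: "slr1311", gene: "psbA2", gene_id: "951890",
                      seq_uri: "http://example.org/sequence/1",
                      first: 1, last: 1083)
end
```

Each call mints four new URNs (feature, location, begin position, end position), so the output differs between runs. That is in line with the roughly 75 million UUIDs reported for v5.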

Stanzathon

UniProt

% wget http://www.uniprot.org/uniprot/P16033.rdf

% rapper -i rdfxml -o turtle P16033.rdf > P16033.ttl 
rapper: Parsing URI file:///Users/ktym/P16033.rdf with parser rdfxml
rapper: Serializing with serializer turtle
rapper: Parsing returned 702 triples

% export SPARQL_ENDPOINT="http://beta.sparql.uniprot.org/sparql"
% sparql.rb query '
prefix up: <http://purl.uniprot.org/core/>   
prefix tax: <http://purl.uniprot.org/taxonomy/>
select *
where {                      
  ?s up:locusName "slr1311" .
  ?s ?p ?o .
}'

s	p	o
_5031363033330011	<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.uniprot.org/core/Gene>
_5031363033330011	<http://purl.uniprot.org/core/locusName>	slr1311
_5031363033330011	<http://www.w3.org/2004/02/skos/core#prefLabel>	psbA2
_5031363033330011	<http://www.w3.org/2004/02/skos/core#altLabel>	psbA-2
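
For readers without sparql.rb, the same query can be sent using only Ruby's standard library. The sketch below follows the SPARQL Protocol (an HTTP GET with a `query` parameter). It is not how sparql.rb itself works internally, and the names `build_sparql_request`, `run_sparql` and `LOCUS_QUERY` are hypothetical. It also assumes the endpoint can return tab-separated values. The endpoint address recorded above dates from 2012 and may no longer resolve.

```ruby
require "net/http"
require "uri"

# The locus-name lookup from the session above.
LOCUS_QUERY = <<~SPARQL
  prefix up: <http://purl.uniprot.org/core/>
  select *
  where {
    ?s up:locusName "slr1311" .
    ?s ?p ?o .
  }
SPARQL

# Build a SPARQL Protocol GET request without sending it.
# The Accept value is an assumption about what the endpoint supports.
def build_sparql_request(endpoint, query, accept: "text/tab-separated-values")
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = accept
  [uri, request]
end

# Send the request and return the raw response body.
def run_sparql(endpoint, query)
  uri, request = build_sparql_request(endpoint, query)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.body
end

# Only touch the network when an endpoint is configured explicitly.
if __FILE__ == $PROGRAM_NAME && ENV["SPARQL_ENDPOINT"]
  puts run_sparql(ENV["SPARQL_ENDPOINT"], LOCUS_QUERY)
end
```

Keeping request construction separate from sending makes it possible to inspect the encoded query without a live endpoint.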

TogoGenome

% export SPARQL_ENDPOINT="http://lod.dbcls.jp/openrdf-sesame/repositories/togogenome"

% sparql.rb query '
select *
where {
  ?s rdfs:label "slr1311" .
  ?s ?p ?o .
}
'
s	p	o
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.obolibrary.org/obo/SO_0000316>
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://biohackathon.org/resource/faldo#location>	<urn:uuid:3114165b-ffee-4816-b9bf-811dbbcb9b06>
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://www.w3.org/2000/01/rdf-schema#seeAlso>	<http://identifiers.org/ncbigene/951890>
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://www.w3.org/2000/01/rdf-schema#seeAlso>	<http://identifiers.org/ncbigi/16329178>
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://www.w3.org/2000/01/rdf-schema#seeAlso>	<http://identifiers.org/ncbiprotein/NP_439906.1>
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://www.w3.org/2000/01/rdf-schema#label>	slr1311
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://rdf.insdc.org/feature_gene>	psbA2
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://rdf.insdc.org/feature_locus_tag>	slr1311
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://purl.obolibrary.org/obo/so_part_of>	<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://rdf.insdc.org/feature_translation>	MTTTLQQRESASLWEQFCQWVTSTNNRIYVGWFGTLMIPTLLTATTCFIIAFIAAPPVDIDGIREPVAGSLLYGNNIISGAVVPSSNAIGLHFYPIWEAASLDEWLYNGGPYQLVVFHFLIGIFCYMGRQWELSYRLGMRPWICVAYSAPVSAATAVFLIYPIGQGSFSDGMPLGISGTFNFMIVFQAEHNILMHPFHMLGVAGVFGGSLFSAMHGSLVTSSLVRETTEVESQNYGYKFGQEEETYNIVAAHGYFGRLIFQYASFNNSRSLHFFLGAWPVIGIWFTAMGVSTMAFNLNGFNFNQSILDSQGRVIGTWADVLNRANIGFEVMHERNAHNFPLDLASGEQAPVALTAPAVNG
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://purl.obolibrary.org/obo/so_has_part>	node9
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://rdf.insdc.org/feature_codon_start>	1
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://rdf.insdc.org/feature_transl_table>	11
<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>	<http://rdf.insdc.org/feature_product>	photosystem II D1 protein
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>	<http://purl.obolibrary.org/obo/SO_0000704>
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://biohackathon.org/resource/faldo#location>	<urn:uuid:d90ec492-6164-4f68-b47c-d89e19506302>
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://www.w3.org/2000/01/rdf-schema#seeAlso>	<http://identifiers.org/ncbigene/951890>
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://www.w3.org/2000/01/rdf-schema#label>	slr1311
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://rdf.insdc.org/feature_gene>	psbA2
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://rdf.insdc.org/feature_locus_tag>	slr1311
<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>	<http://purl.obolibrary.org/obo/so_part_of>	<urn:uuid:182f171a-7928-4324-8d41-f3e820a872fd>
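
The result above contains two features that share the label slr1311: a CDS (SO_0000316) and the gene (SO_0000704) it is linked to through obo:so_part_of. The following sketch shows one way such tab-separated output could be grouped by subject to recover that link. The helper names `group_by_subject` and `part_of_pairs` are hypothetical, and the sample is an abbreviated copy of the rows above.

```ruby
RDF_TYPE   = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SO_PART_OF = "<http://purl.obolibrary.org/obo/so_part_of>"

# Parse tab-separated SPARQL results (header row: s, p, o) into
# { subject => { predicate => [objects] } }.
def group_by_subject(tsv)
  rows = tsv.lines.map(&:chomp).reject(&:empty?)
  rows.shift # drop the header row
  rows.each_with_object({}) do |row, graph|
    subject, predicate, object = row.split("\t", 3)
    ((graph[subject] ||= {})[predicate] ||= []) << object
  end
end

# Return [child, parent] pairs for so_part_of links whose parent
# is itself a subject in the same result set.
def part_of_pairs(graph)
  graph.flat_map do |subject, properties|
    properties.fetch(SO_PART_OF, [])
              .select { |parent| graph.key?(parent) }
              .map { |parent| [subject, parent] }
  end
end

CDS_NODE  = "<urn:uuid:aaf399d2-f84a-4feb-a689-966311a3b116>"
GENE_NODE = "<urn:uuid:8683a33d-e496-43da-a4ce-a454faeb228c>"

# Abbreviated rows taken from the query result on this page.
SAMPLE_TSV = [
  %w[s p o],
  [CDS_NODE, RDF_TYPE, "<http://purl.obolibrary.org/obo/SO_0000316>"],
  [CDS_NODE, SO_PART_OF, GENE_NODE],
  [GENE_NODE, RDF_TYPE, "<http://purl.obolibrary.org/obo/SO_0000704>"],
  [GENE_NODE, SO_PART_OF, "<urn:uuid:182f171a-7928-4324-8d41-f3e820a872fd>"],
].map { |row| row.join("\t") }.join("\n")

if __FILE__ == $PROGRAM_NAME
  graph = group_by_subject(SAMPLE_TSV)
  part_of_pairs(graph).each do |child, parent|
    puts "#{child} is part of #{parent}"
  end
end
```

The gene's own so_part_of target does not appear as a subject in this result set, so only the CDS-to-gene pair is reported.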