SPARQLthon46/DDBJ

From TogoWiki

DDBJ Annotated sequence RDF

Progress toward public release since SPARQLthon45 is as follows:

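  • Coordination, procedures, and preparation for publishing on the FTP site: ftp://ftp.ddbj.nig.ac.jp/rdf (planned)
  • Reimplemented the cron job conversion script so that it dynamically generates UGE array jobs
    • Switched to the UGE cluster used for DDBJ operations
    • Switched to gzip output
    • Notification on conversion errors [Todo]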
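The dynamic generation of an array job can be sketched as follows. This is a minimal illustration, not the actual cron script: the function name `build_array_job`, the list file `inputs.txt`, the output naming, and the way the converter is invoked (reading from standard input) are all assumptions. Only the `#$ -t` directive and the `SGE_TASK_ID` variable are standard Grid Engine features.

```python
def build_array_job(inputs, out_dir, log_dir,
                    converter="ruby insdc2ttl.rb", list_name="inputs.txt"):
    """Return (file_list_text, job_script_text) for a Grid Engine array job.

    One array task is created per input file. Each task picks the line of
    the file list whose number equals $SGE_TASK_ID.
    The converter command line is a placeholder (assumption).
    """
    inputs = sorted(str(p) for p in inputs)
    if not inputs:
        raise ValueError("no input files")
    file_list = "\n".join(inputs) + "\n"
    script = "\n".join([
        "#!/bin/bash",
        "#$ -S /bin/bash",
        "#$ -cwd",
        f"#$ -t 1-{len(inputs)}",  # array range sized to the current input set
        f'INPUT=$(sed -n "${{SGE_TASK_ID}}p" {list_name})',
        'NAME=$(basename "$INPUT" .seq.gz)',
        # stderr goes to a per-file .error log, stdout is gzip-compressed
        f'zcat "$INPUT" | {converter} 2> {log_dir}/"$NAME".error'
        f' | gzip > {out_dir}/"$NAME".ttl.gz',
        "",
    ])
    return file_list, script
```

Because the task range is computed from the file list at generation time, the same code covers releases with different numbers of flat files. The two returned strings would be written to disk and the script submitted with `qsub`.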
[w3sw@t347 tmp]$  egrep -v '^(Warning|Features|Error):' ~/ftp/log/ddbj/105.0/ddbj*.error
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:/home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:590:in `source_link': undefined method `each' for nil:NilClass (NoMethodError)
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:580:in `parse_source'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:343:in `block in parse_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/local/lib/ruby/site_ruby/1.9.1/bio/io/flatfile.rb:336:in `each_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:338:in `parse_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:184:in `initialize'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:854:in `new'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:	from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:854:in `<main>'
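As a possible starting point for the error notification item, the `egrep -v` check above could be automated along these lines. This is a sketch under assumptions: the function name `unexpected_lines` is hypothetical, and treating lines that start with `Warning:`, `Features:`, or `Error:` as expected output simply mirrors the filter used in the command above.

```python
import re
from pathlib import Path

# Same prefixes that the egrep -v command filters out.
EXPECTED = re.compile(r"^(Warning|Features|Error):")

def unexpected_lines(log_dir, pattern="ddbj*.error"):
    """Map each log file name to its lines that do not start with an expected prefix.

    Files whose lines all match the expected prefixes are left out of the result.
    """
    report = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines = [ln.rstrip("\n") for ln in fh if not EXPECTED.match(ln)]
        if lines:
            report[path.name] = lines
    return report
```

A non-empty result would be the trigger for a notification; how the notification is delivered (mail, chat, etc.) is left open here.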

Checking conversion errors

  • Some input entries have a source feature that contains no db_xref at all (no db_xref=taxon: qualifier)
    • Example: ddbjbct30.seq.gz > CP014351.1
[w3sw@t347 tmp]$ zcat /maid01/services/ftp/data/ftp/database/ddbj/ddbjbct30.seq.gz |egrep -A 50 CP014351.1
VERSION     CP014351.1
DBLINK      BioProject: PRJNA311246
            BioSample: SAMN04481062
KEYWORDS    .
SOURCE      Borrelia hermsii HS1
  ORGANISM  Borrelia hermsii HS1
            Bacteria; Spirochaetes; Spirochaetales; Borreliaceae; Borrelia.
REFERENCE   1  (bases 1 to 27881)
  AUTHORS   Barbour,A.G.
  TITLE     Complete genome of the tickborne relapsing fever agent Borrelia
            hermsii strain HS1 Browne Mountain isolate
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 27881)
  AUTHORS   Barbour,A.G.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-2016) Microbiology and Molecular Genetics,
            University of California Irvine, 3012 Hewitt, Irvine, CA
            92697-4028, United States of America
COMMENT     The circular topology of these plasmids of Borrelia hermsii HS1 was
            experimentally demonstrated by Stevenson et al. Infect. Immun.
            68(7):3900-8, 2000 (PMC101665). Source DNA is available from Alan
            Barbour, Department of Microbiology and Molecular Genetics,
            University of California Irvine, Irvine, CA 92697
            (abarbour@uci.edu).
            
            ##Assembly-Data-START##
            Assembly Method       :: SMRT Analysis HGAP v. 2.1; CLC Assembly
                                     Cell v. 8.5
            Coverage              :: 1000X
            Sequencing Technology :: Illumina; PacBio
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..27881
                     /organism="Borrelia hermsii HS1"
                     /mol_type="genomic DNA"
                     /strain="HS1"
                     /isolate="Browne Mountain"
                     /isolation_source="Ornithodoros hermsi"
                     /lab_host="Mus musculus"
                     /plasmid="cp28"
                     /country="USA:WA:Spokane County"
                     /lat_lon="47.6007 N 117.3283 W"
                     /collection_date="1968"
                     /collected_by="Willy Burgdorfer"
     gene            85..1293
                     /locus_tag="AXX13_P01
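The failure mode can be illustrated with a small sketch. It is written in Python for illustration only and is not the fix applied to insdc2ttl.rb (which is Ruby); the function names `source_qualifiers` and `taxon_links` are hypothetical. The point is the guard: when a source feature has no `db_xref`, the lookup should yield an empty list rather than a nil value that is then iterated.

```python
import re

# Matches single-line qualifiers such as /organism="Borrelia hermsii HS1".
# Multi-line values and value-less qualifiers are not handled in this sketch.
QUALIFIER = re.compile(r'^\s+/(\w+)="?(.*?)"?\s*$')

def source_qualifiers(feature_lines):
    """Collect the qualifiers of one feature into {name: [values]}."""
    quals = {}
    for line in feature_lines:
        m = QUALIFIER.match(line)
        if m:
            quals.setdefault(m.group(1), []).append(m.group(2))
    return quals

def taxon_links(quals):
    """Return taxonomy IDs found in db_xref, or [] when db_xref is absent."""
    xrefs = quals.get("db_xref") or []  # guard: never iterate over None
    return [x.split(":", 1)[1] for x in xrefs if x.startswith("taxon:")]
```

For the source feature of CP014351.1 shown above, `taxon_links` returns an empty list, so a caller can skip the taxonomy link instead of aborting the whole file.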

Data generation

  • release 104 (23.9 billion triples; 145 GB compressed, 2.4 TB uncompressed)
  • release 105: regeneration in progress

DDBJ taxonomy.owl

  • Extended the ontology to support conversion of the private version of the tax dump → pull request submitted


+:geneticCodePt
+  a owl:ObjectProperty, owl:FunctionalProperty ;
+  rdfs:label "Plastid genetic code" ;
+  rdfs:domain :Taxon ;
+  rdfs:range :GeneticCode .
+:formalNameIndicator
+  a owl:DatatypeProperty ;
+  rdfs:label "formal name indicator" ;
+  rdfs:domain :Taxon ;
+  rdfs:range xsd:boolean .
+:DummyTaxon
+  a owl:Class ;
+  rdfs:subClassOf :Taxon ;
+  rdfs:label "dummy taxon" .


Example output

taxid:3702      a       :Taxon .
taxid:3702      rdfs:subClassOf taxid:3701 .
taxid:3702      dcterms:identifier      3702 .
taxid:3702      owl:sameAs      taxddbj:3702 .
taxid:3702      owl:sameAs      taxncbi:3702 .
taxid:3702      owl:sameAs      taxobo0:3702 .
taxid:3702      owl:sameAs      taxobo1:3702 .
taxid:3702      owl:sameAs      taxobo2:3702 .
taxid:3702      rdfs:seeAlso    taxup:3702 .
taxid:3702      :rank   :Species .
taxid:3702      :geneticCode    :GeneticCode1 .
taxid:3702      :geneticCodeMt  :GeneticCode1 .  ###← changed to be emitted only when GeneticCode > 0 (otherwise not output)
taxid:3702      :geneticCodePt  :GeneticCode11 . ###← new
taxid:3702      :formalNameIndicator    true .       ###← new; true if scientific name complies with formal name rules for the respective nomenclature code
taxid:3702      rdfs:label      "Arabidopsis thaliana" .
taxid:3702      :scientificName "Arabidopsis thaliana" .
taxid:3702      :authority      "Arabidopsis thaliana (L.) Heynh." .
taxid:3702      :misspelling    "Arabidopsis thaliana (thale cress)" .
taxid:3702      :misspelling    "Arabidopsis_thaliana" .
taxid:3702      :misspelling    "Arbisopsis thaliana" .
taxid:3702      :commonName     "mouse-ear cress" .
taxid:3702      :genbankCommonName      "thale cress" .
taxid:3702      :commonName     "thale-cress" .
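The handling of the genetic code and formal name properties in the output above can be sketched as follows. This is an illustration under assumptions, not the converter from the pull request: the function name `genetic_code_triples` and its parameters are hypothetical, and the rule that the mitochondrial and plastid codes are emitted only when their value is greater than 0 is one reading of the annotation on the `:geneticCodeMt` line.

```python
def genetic_code_triples(taxid, nuclear, mito=0, plastid=0, formal_name=None):
    """Return tab-separated triple lines for the genetic code properties.

    Assumption: a code of 0 means "not specified", so :geneticCodeMt and
    :geneticCodePt are written only for values greater than 0.
    """
    subject = f"taxid:{taxid}"
    rows = [(subject, ":geneticCode", f":GeneticCode{nuclear}")]
    if mito > 0:
        rows.append((subject, ":geneticCodeMt", f":GeneticCode{mito}"))
    if plastid > 0:
        rows.append((subject, ":geneticCodePt", f":GeneticCode{plastid}"))
    if formal_name is not None:
        # xsd:boolean literals are written as bare true/false
        value = "true" if formal_name else "false"
        rows.append((subject, ":formalNameIndicator", value))
    return ["\t".join(row) + " ." for row in rows]
```

Calling `genetic_code_triples(3702, 1, mito=1, plastid=11, formal_name=True)` yields the four corresponding lines of the example, while a taxon with no organelle codes gets only the `:geneticCode` line.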