SPARQLthon46/DDBJ
From TogoWiki
DDBJ Annotated sequence RDF
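Progress toward public release since SPARQLthon45:
- Coordination, paperwork, and preparation for publishing on the FTP site: ftp://ftp.ddbj.nig.ac.jp/rdf (planned)
- Reimplemented the cron-job conversion script so that it dynamically generates UGE array jobs
- Switched to the UGE cluster used for DDBJ operations
- Switched the output to gzip
- Notification on conversion errors [Todo]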
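The array-job reimplementation mentioned in the list above could look roughly like the following Ruby sketch. It only builds the text of a job script. The name `build_array_job`, the list-file layout (one input path per line), the output naming, and above all the `insdc2ttl.rb` command line (flat file on stdin, Turtle on stdout) are assumptions for illustration, not the actual production script. The `#$ -t` directive and the `SGE_TASK_ID` variable are standard Grid Engine array-job features.

```ruby
# Build the text of a UGE array-job script with one task per input file.
# list_file: text file with one input path per line (hypothetical layout).
# The converter invocation is an assumption, not the real insdc2ttl.rb interface.
def build_array_job(list_file, n_tasks, out_dir, log_dir)
  raise ArgumentError, 'n_tasks must be positive' unless n_tasks > 0

  [
    '#!/bin/bash',
    '#$ -S /bin/bash',
    '#$ -cwd',
    '#$ -t 1-' + n_tasks.to_s, # array job: task IDs 1..n_tasks
    '#$ -o ' + log_dir,
    '#$ -e ' + log_dir,
    # Each task picks the line of the list file that matches its task ID.
    'INPUT=$(sed -n "${SGE_TASK_ID}p" ' + list_file + ')',
    'NAME=$(basename "$INPUT" .seq.gz)',
    # Hypothetical converter call; the result is written gzip-compressed.
    'zcat "$INPUT" | ruby insdc2ttl.rb | gzip -c > ' + out_dir + '/"$NAME".ttl.gz'
  ].join("\n") + "\n"
end
```

Calling `build_array_job('inputs.txt', 3, 'out', 'log')` returns a script that could be written to a file and passed to `qsub`. Regenerating the script on every cron run keeps the task range in step with the number of input files in the release.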
[w3sw@t347 tmp]$ egrep -v '^(Warning|Features|Error):' ~/ftp/log/ddbj/105.0/ddbj*.error
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:/home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:590:in `source_link': undefined method `each' for nil:NilClass (NoMethodError)
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:580:in `parse_source'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:343:in `block in parse_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/local/lib/ruby/site_ruby/1.9.1/bio/io/flatfile.rb:336:in `each_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:338:in `parse_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:184:in `initialize'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:854:in `new'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:854:in `<main>'
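The manual check above could be the starting point for the pending error notification. The following is a minimal Ruby sketch that applies the same filter as the `egrep -v` call to each `.error` log and reports only the logs containing unexpected lines. The function names are hypothetical, and how the result would be delivered (mail, chat, and so on) is left open because the source does not specify it.

```ruby
# Lines starting with these labels are treated as routine converter messages,
# mirroring the pattern passed to egrep -v in the manual check.
ROUTINE = /\A(Warning|Features|Error):/

# Return the unexpected lines (for example Ruby backtraces) in one log's text.
# Unlike egrep -v, empty lines are dropped as well.
def unexpected_lines(log_text)
  log_text.each_line.map(&:chomp).reject { |line| line.empty? || line =~ ROUTINE }
end

# Scan every log matching the glob and return { path => [unexpected lines] },
# containing only the logs that need attention.
def failed_logs(glob)
  Dir.glob(glob).sort.each_with_object({}) do |path, found|
    lines = unexpected_lines(File.read(path))
    found[path] = lines unless lines.empty?
  end
end
```

An empty hash from `failed_logs` would mean that no log contains anything beyond the routine `Warning:`, `Features:`, and `Error:` lines.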
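Checking the conversion error
- The input data contained an entry whose source feature has no db_xref qualifier at all (that is, no db_xref=taxon:), which is consistent with the NoMethodError (`each` called on nil) raised in `source_link`
- Example: ddbjbct30.seq.gz > CP014351.1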
[w3sw@t347 tmp]$ zcat /maid01/services/ftp/data/ftp/database/ddbj/ddbjbct30.seq.gz |egrep -A 50 CP014351.1
VERSION     CP014351.1
DBLINK      BioProject: PRJNA311246
            BioSample: SAMN04481062
KEYWORDS    .
SOURCE      Borrelia hermsii HS1
  ORGANISM  Borrelia hermsii HS1
            Bacteria; Spirochaetes; Spirochaetales; Borreliaceae; Borrelia.
REFERENCE   1  (bases 1 to 27881)
  AUTHORS   Barbour,A.G.
  TITLE     Complete genome of the tickborne relapsing fever agent Borrelia
            hermsii strain HS1 Browne Mountain isolate
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 27881)
  AUTHORS   Barbour,A.G.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-2016) Microbiology and Molecular Genetics,
            University of California Irvine, 3012 Hewitt, Irvine, CA
            92697-4028, United States of America
COMMENT     The circular topology of these plasmids of Borrelia hermsii HS1 was
            experimentally demonstrated by Stevenson et al. Infect. Immun.
            68(7):3900-8, 2000 (PMC101665).
            Source DNA is available from Alan Barbour, Department of
            Microbiology and Molecular Genetics, University of California
            Irvine, Irvine, CA 92697 (abarbour@uci.edu).
            ##Assembly-Data-START##
            Assembly Method :: SMRT Analysis HGAP v. 2.1; CLC Assembly Cell v. 8.5
            Coverage :: 1000X
            Sequencing Technology :: Illumina; PacBio
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..27881
                     /organism="Borrelia hermsii HS1"
                     /mol_type="genomic DNA"
                     /strain="HS1"
                     /isolate="Browne Mountain"
                     /isolation_source="Ornithodoros hermsi"
                     /lab_host="Mus musculus"
                     /plasmid="cp28"
                     /country="USA:WA:Spokane County"
                     /lat_lon="47.6007 N 117.3283 W"
                     /collection_date="1968"
                     /collected_by="Willy Burgdorfer"
     gene            85..1293
                     /locus_tag="AXX13_P01
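An entry like the one above, whose source feature has no /db_xref qualifier, could be tolerated by normalizing a missing value to an empty list before iterating. The following Ruby sketch shows the idea. The function `taxon_ids` and its argument are hypothetical and do not reproduce the actual `source_link` code in insdc2ttl.rb, which is not shown in the source.

```ruby
# Extract taxonomy IDs from the db_xref values of a source feature.
# db_xrefs may be nil when the feature has no /db_xref qualifier at all;
# Array(nil) yields [], so the iteration is skipped instead of raising
# NoMethodError.
def taxon_ids(db_xrefs)
  Array(db_xrefs).map { |xref| xref[/\Ataxon:(\d+)\z/, 1] }.compact
end
```

With this guard, `taxon_ids(nil)` returns an empty array, so the converter could continue with the entry (for example after logging a warning) rather than aborting the whole file.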
Data generation
- release 104 (23.9 billion triples; 145 GB compressed, 2.4 TB uncompressed)
- release 105: regeneration in progress
DDBJ taxonomy.owl
- Extensions for converting the private version of the tax dump → pull request submitted
+:geneticCodePt
+    a owl:ObjectProperty, owl:FunctionalProperty ;
+    rdfs:label "Plastid genetic code" ;
+    rdfs:domain :Taxon ;
+    rdfs:range :GeneticCode .

+:formalNameIndicator
+    a owl:DatatypeProperty ;
+    rdfs:label "formal name indicator" ;
+    rdfs:domain :Taxon ;
+    rdfs:range xsd:boolean .
- Definition of the DummyTaxon class
- http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1274375 is a dummy record for indexing unpublished names
+:DummyTaxon
+    a owl:Class ;
+    rdfs:subClassOf :Taxon ;
+    rdfs:label "dummy taxon" .
taxid:3702 a :Taxon .
taxid:3702 rdfs:subClassOf taxid:3701 .
taxid:3702 dcterms:identifier 3702 .
taxid:3702 owl:sameAs taxddbj:3702 .
taxid:3702 owl:sameAs taxncbi:3702 .
taxid:3702 owl:sameAs taxobo0:3702 .
taxid:3702 owl:sameAs taxobo1:3702 .
taxid:3702 owl:sameAs taxobo2:3702 .
taxid:3702 rdfs:seeAlso taxup:3702 .
taxid:3702 :rank :Species .
taxid:3702 :geneticCode :GeneticCode1 .
taxid:3702 :geneticCodeMt :GeneticCode1 . ###← changed: output only when GeneticCode > 0
taxid:3702 :geneticCodePt :GeneticCode11 . ###← new
taxid:3702 :formalNameIndicator true . ###← new; true if scientific name complies with formal name rules for the respective nomenclature code
taxid:3702 rdfs:label "Arabidopsis thaliana" .
taxid:3702 :scientificName "Arabidopsis thaliana" .
taxid:3702 :authority "Arabidopsis thaliana (L.) Heynh." .
taxid:3702 :misspelling "Arabidopsis thaliana (thale cress)" .
taxid:3702 :misspelling "Arabidopsis_thaliana" .
taxid:3702 :misspelling "Arbisopsis thaliana" .
taxid:3702 :commonName "mouse-ear cress" .
taxid:3702 :genbankCommonName "thale cress" .
taxid:3702 :commonName "thale-cress" .
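The genetic-code and formal-name triples in the example above could be generated along the lines of the following Ruby sketch. It assumes that a genetic code ID of 0 (or a missing value) means "not assigned" and is therefore not written out; that reading of the change note, the function name `genetic_code_triples`, and the shape of its arguments are all assumptions, not the converter's actual code.

```ruby
# Build the genetic-code and formal-name triples for one taxon.
# codes: hash with integer code IDs under :nuclear, :mt and :pt.
# Assumption: an ID of 0 or nil means "not assigned", so no triple is emitted.
def genetic_code_triples(tax_id, codes, formal_name)
  subject = "taxid:#{tax_id}"
  props = {
    'geneticCode'   => codes[:nuclear],
    'geneticCodeMt' => codes[:mt],
    'geneticCodePt' => codes[:pt]
  }
  triples = props.select { |_, code| code.to_i > 0 }
                 .map { |prop, code| "#{subject} :#{prop} :GeneticCode#{code} ." }
  # Turtle accepts bare true/false as xsd:boolean literals.
  triples << "#{subject} :formalNameIndicator #{formal_name ? 'true' : 'false'} ."
  triples
end
```

For the Arabidopsis thaliana example, `genetic_code_triples(3702, { nuclear: 1, mt: 1, pt: 11 }, true)` yields the four corresponding lines shown above.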