SPARQLthon46/DDBJ
From TogoWiki
DDBJ Annotated sequence RDF
Progress since SPARQLthon45 toward public release:
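- Coordination, procedures, and preparation for release on the FTP site ftp://ftp.ddbj.nig.ac.jp/rdf (planned)
- Reimplemented the cron-job conversion script so that it generates UGE array jobs dynamically
- Switched to the UGE used for DDBJ production operations
- Switched output to gzip
- Notification on conversion errors [TODO]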
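A minimal sketch of what dynamically generating a UGE array job could look like, written in Ruby like insdc2ttl.rb. It assumes, hypothetically, that the converter takes one flat-file path and writes Turtle to stdout; the function name array_job_script, the output naming, and the directory layout are illustrative, not the actual cron script. Each array task selects its input through SGE_TASK_ID and pipes the result through gzip.

```ruby
require "shellwords"

# Hypothetical sketch: build a UGE array-job script with one task per input
# flat file. The converter invocation (path in, Turtle on stdout) and the
# "<name>.ttl.gz" output naming are assumptions, not the real cron script.
def array_job_script(inputs, out_dir:, converter: "insdc2ttl.rb")
  raise ArgumentError, "no input files" if inputs.empty?

  quoted = inputs.map { |f| Shellwords.escape(f) }.join(" ")
  [
    '#!/bin/bash',
    '#$ -S /bin/bash',
    '#$ -cwd',
    '#$ -t 1-' + inputs.size.to_s,           # task ids 1..N, one per input file
    'set -o pipefail',                        # converter failure fails the task
    "INPUTS=(#{quoted})",
    'IN=${INPUTS[$((SGE_TASK_ID - 1))]}',     # bash arrays are 0-based
    'NAME=$(basename "$IN" .seq.gz)',
    "ruby #{converter} \"$IN\" | gzip > #{Shellwords.escape(out_dir)}/\"$NAME\".ttl.gz"
  ].join("\n") + "\n"
end
```

The generated text can be written to a file and submitted with qsub, e.g. from `array_job_script(Dir.glob("/path/to/ddbj*.seq.gz").sort, out_dir: "/path/to/out")` (paths are placeholders). Without `set -o pipefail` the task's exit status would be gzip's, so a crashing converter would go unnoticed; with it, a failed task gives the error notification something to act on.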
[w3sw@t347 tmp]$ egrep -v '^(Warning|Features|Error):' ~/ftp/log/ddbj/105.0/ddbj*.error
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error:/home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:590:in `source_link': undefined method `each' for nil:NilClass (NoMethodError)
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:580:in `parse_source'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:343:in `block in parse_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/local/lib/ruby/site_ruby/1.9.1/bio/io/flatfile.rb:336:in `each_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:338:in `parse_entry'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:184:in `initialize'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:854:in `new'
/home/w3sw/ftp/log/ddbj/105.0/ddbjbct30.error: from /home/w3sw/tf/rdfsummit/insdc2ttl/insdc2ttl.rb:854:in `<main>'
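The error notification could start from the same filter as the egrep above. A hypothetical Ruby sketch: for each log file it keeps the lines the egrep would print (anything not starting with "Warning:", "Features:" or "Error:") and reports only files with leftovers. The name unexpected_errors is illustrative, and how the report is delivered (mail, chat, etc.) is left open.

```ruby
# Lines matching this prefix are expected noise, mirroring
# egrep -v '^(Warning|Features|Error):'
IGNORED = /\A(Warning|Features|Error):/

# Hypothetical sketch: map each log path to the lines egrep -v would keep,
# omitting files that contain nothing unexpected.
def unexpected_errors(paths)
  paths.each_with_object({}) do |path, found|
    lines = File.foreach(path).map(&:chomp).reject { |l| l.match?(IGNORED) }
    found[path] = lines unless lines.empty?
  end
end
```

For example, `unexpected_errors(Dir.glob(File.expand_path("~/ftp/log/ddbj/105.0/ddbj*.error")))` would return a Hash keyed by ddbjbct30.error for the run above, whose first line is the NoMethodError; a non-empty result is the trigger for sending a notification.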
Investigating the conversion error
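- The input data contains entries whose source feature has no db_xref at all (no db_xref=taxon: qualifier); this appears to be what makes source_link call each on nil in the traceback above
- Example: ddbjbct30.seq.gz > CP014351.1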
[w3sw@t347 tmp]$ zcat /maid01/services/ftp/data/ftp/database/ddbj/ddbjbct30.seq.gz |egrep -A 50 CP014351.1
VERSION     CP014351.1
DBLINK      BioProject: PRJNA311246
            BioSample: SAMN04481062
KEYWORDS    .
SOURCE      Borrelia hermsii HS1
  ORGANISM  Borrelia hermsii HS1
            Bacteria; Spirochaetes; Spirochaetales; Borreliaceae; Borrelia.
REFERENCE   1 (bases 1 to 27881)
  AUTHORS   Barbour,A.G.
  TITLE     Complete genome of the tickborne relapsing fever agent Borrelia
            hermsii strain HS1 Browne Mountain isolate
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 27881)
  AUTHORS   Barbour,A.G.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-2016) Microbiology and Molecular Genetics,
            University of California Irvine, 3012 Hewitt, Irvine, CA
            92697-4028, United States of America
COMMENT     The circular topology of these plasmids of Borrelia hermsii HS1 was
            experimentally demonstrated by Stevenson et al. Infect. Immun.
            68(7):3900-8, 2000 (PMC101665). Source DNA is available from Alan
            Barbour, Department of Microbiology and Molecular Genetics,
            University of California Irvine, Irvine, CA 92697
            (abarbour@uci.edu).
            ##Assembly-Data-START##
            Assembly Method :: SMRT Analysis HGAP v. 2.1; CLC Assembly
            Cell v. 8.5
            Coverage :: 1000X
            Sequencing Technology :: Illumina; PacBio
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..27881
                     /organism="Borrelia hermsii HS1"
                     /mol_type="genomic DNA"
                     /strain="HS1"
                     /isolate="Browne Mountain"
                     /isolation_source="Ornithodoros hermsi"
                     /lab_host="Mus musculus"
                     /plasmid="cp28"
                     /country="USA:WA:Spokane County"
                     /lat_lon="47.6007 N 117.3283 W"
                     /collection_date="1968"
                     /collected_by="Willy Burgdorfer"
     gene            85..1293
                     /locus_tag="AXX13_P01
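A defensive taxon lookup would treat a missing db_xref as an empty list instead of nil. The insdc2ttl.rb code itself is not shown here, so this is a hypothetical Ruby sketch: it assumes the source feature's qualifiers are available as a Hash of qualifier name to an Array of values (similar to what BioRuby's Bio::Feature#to_hash returns), and the name taxon_ids is illustrative.

```ruby
# Hypothetical sketch: qualifiers is a Hash of name => Array of values.
# For a source feature without any /db_xref, qualifiers["db_xref"] is nil;
# Array(nil) turns that into [] so nothing calls #each on nil.
def taxon_ids(qualifiers)
  Array(qualifiers["db_xref"])
    .grep(/\Ataxon:/)                    # keep only db_xref="taxon:NNN"
    .map { |x| x.sub(/\Ataxon:/, "") }   # strip the prefix, keep the id
end
```

With the CP014351.1 source feature above, `taxon_ids({ "organism" => ["Borrelia hermsii HS1"] })` returns `[]`, while `taxon_ids({ "db_xref" => ["taxon:3702"] })` returns `["3702"]`. Whether an entry with no taxon link should then be skipped, logged, or handled some other way is a policy choice this sketch leaves open.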
Data generation
- Release 104: 23.9 billion triples (145 GB compressed, 2.4 TB uncompressed)
- Release 105: regeneration in progress
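For a sense of scale, the release 104 figures work out roughly as follows, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); since the inputs are themselves rounded, so are the results:

```latex
\frac{2.4\times10^{12}\,\mathrm{B}}{2.39\times10^{10}\ \text{triples}} \approx 100\ \mathrm{B/triple}\ (\text{uncompressed}),
\qquad
\frac{1.45\times10^{11}\,\mathrm{B}}{2.39\times10^{10}\ \text{triples}} \approx 6.1\ \mathrm{B/triple}\ (\text{gzip}),
\qquad
\frac{2.4\times10^{12}}{1.45\times10^{11}} \approx 16.6
```

That is, gzip shrinks the Turtle output by a factor of about 16 to 17.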
DDBJ taxonomy.owl
- Extensions for converting the private version of the tax dump → submitted as a pull request
+:geneticCodePt
+    a owl:ObjectProperty, owl:FunctionalProperty ;
+    rdfs:label "Plastid genetic code" ;
+    rdfs:domain :Taxon ;
+    rdfs:range :GeneticCode .
+:formalNameIndicator
+    a owl:DatatypeProperty ;
+    rdfs:label "formal name indicator" ;
+    rdfs:domain :Taxon ;
+    rdfs:range xsd:boolean .
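Declaring :geneticCodePt an owl:FunctionalProperty says each taxon has at most one plastid genetic code. Under OWL's open-world semantics two different values would not be a syntax error (they would entail that the two codes are the same individual), so in practice this is checked as a data-quality rule. A hypothetical Ruby sketch of such a check, assuming triples are available as [subject, predicate, object] string arrays; the name functional_violations is illustrative.

```ruby
# Hypothetical sketch: report subjects with more than one distinct value for
# a property declared owl:FunctionalProperty (e.g. ":geneticCodePt").
def functional_violations(triples, predicate)
  triples
    .select { |_s, p, _o| p == predicate }               # only that property
    .group_by(&:first)                                   # subject => triples
    .transform_values { |ts| ts.map(&:last).uniq }       # subject => values
    .select { |_s, objs| objs.size > 1 }                 # keep conflicts only
end
```

An empty Hash means every taxon in the converted output respects the functional declaration.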
- Definition of the DummyTaxon class
- http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1274375 is a dummy record used to index unpublished names
+:DummyTaxon
+    a owl:Class ;
+    rdfs:subClassOf :Taxon ;
+    rdfs:label "dummy taxon" .
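Because :DummyTaxon is declared rdfs:subClassOf :Taxon, an endpoint with RDFS inference will also treat a record typed :DummyTaxon as a :Taxon, so a query for ?x a :Taxon still returns it and excluding dummy records needs an explicit filter on :DummyTaxon. How the converter recognises dummy records such as 1274375 is not stated here. This hypothetical Ruby sketch only illustrates the entailment (rdfs9, applied transitively as rdfs11 allows); the names are illustrative.

```ruby
# Hypothetical class hierarchy: subclass => direct superclasses.
SUBCLASS_OF = { ":DummyTaxon" => [":Taxon"] }.freeze

# Return the given types plus every superclass reachable via
# rdfs:subClassOf (breadth-first, so chains are followed transitively).
def inferred_types(types, subclass_of = SUBCLASS_OF)
  result = types.dup
  queue = types.dup
  until queue.empty?
    cls = queue.shift
    subclass_of.fetch(cls, []).each do |sup|
      next if result.include?(sup)   # already known; also guards against cycles
      result << sup
      queue << sup
    end
  end
  result
end
```

For instance, `inferred_types([":DummyTaxon"])` returns `[":DummyTaxon", ":Taxon"]`.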
Example output
taxid:3702 a :Taxon .
taxid:3702 rdfs:subClassOf taxid:3701 .
taxid:3702 dcterms:identifier 3702 .
taxid:3702 owl:sameAs taxddbj:3702 .
taxid:3702 owl:sameAs taxncbi:3702 .
taxid:3702 owl:sameAs taxobo0:3702 .
taxid:3702 owl:sameAs taxobo1:3702 .
taxid:3702 owl:sameAs taxobo2:3702 .
taxid:3702 rdfs:seeAlso taxup:3702 .
taxid:3702 :rank :Species .
taxid:3702 :geneticCode :GeneticCode1 .
taxid:3702 :geneticCodeMt :GeneticCode1 .  ###← changed: now output only when GeneticCode > 0
taxid:3702 :geneticCodePt :GeneticCode11 .  ###← new
taxid:3702 :formalNameIndicator true .  ###← new; true if the scientific name complies with the formal name rules of the respective nomenclature code
taxid:3702 rdfs:label "Arabidopsis thaliana" .
taxid:3702 :scientificName "Arabidopsis thaliana" .
taxid:3702 :authority "Arabidopsis thaliana (L.) Heynh." .
taxid:3702 :misspelling "Arabidopsis thaliana (thale cress)" .
taxid:3702 :misspelling "Arabidopsis_thaliana" .
taxid:3702 :misspelling "Arbisopsis thaliana" .
taxid:3702 :commonName "mouse-ear cress" .
taxid:3702 :genbankCommonName "thale cress" .
taxid:3702 :commonName "thale-cress" .
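The "> 0" rule and the two new properties can be sketched as follows. This is hypothetical Ruby, not the converter's code: the input Hash keys (:gc, :mgc, :pgc) are assumed names rather than taxdump column names, and applying the "> 0" rule to all three genetic codes is an assumption (the comment above is attached to the :geneticCodeMt line).

```ruby
# Hypothetical mapping from assumed input keys to output predicates.
PREDICATES = { gc: ":geneticCode", mgc: ":geneticCodeMt", pgc: ":geneticCodePt" }.freeze

# Emit one Turtle line per genetic code, skipping ids that are 0 or absent.
def genetic_code_triples(taxid, codes)
  PREDICATES.each_with_object([]) do |(key, pred), out|
    id = codes[key].to_i           # nil or "" becomes 0
    next unless id > 0             # 0 / missing: treated as "no code assigned"
    out << "taxid:#{taxid} #{pred} :GeneticCode#{id} ."
  end
end

# xsd:boolean shorthand in Turtle is the bare literal true / false.
def formal_name_triple(taxid, formal)
  "taxid:#{taxid} :formalNameIndicator #{formal ? 'true' : 'false'} ."
end
```

`genetic_code_triples(3702, { gc: 1, mgc: 1, pgc: 11 })` reproduces the three genetic-code lines of the example output, and `formal_name_triple(3702, true)` gives `taxid:3702 :formalNameIndicator true .`; a taxon whose mitochondrial code id is 0 simply gets no :geneticCodeMt triple.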