BH12.12/SPARQLthon/TogoAnnotation
From TogoWiki
Work done
- RDF conversion of TogoAnnotation: implemented RDF output in the export script
    script/export rdf -o genome
    script/export rdf -o gene
    script/export rdf -o reference
    script/export rdf -o annotation [Todo]
    script/export rdf -o bookmark [Todo]
    script/export rdf -o tag [Todo]
- Loaded the data into 4store@localhost
    4s-backend-setup tga_v1
    4s-backend tga_v1
    4s-import -v tga_v1 -M http://togo.annotation.jp tga/genome.ttl tga/gene.ttl tga/reference.ttl tga/annotation.ttl
- Iterated between running SPARQL tests that retrieve the reference/annotation counts and fixing bugs in the RDF data
    4s-query tga_v1 -f text < query/genome-reference_count.rq
    4s-query tga_v1 -f text < query/gene-reference_count.rq
    4s-query tga_v1 -f text < query/pubmed-annotations_count.rq
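Because the workflow above alternates between fixing the RDF and re-running the same three count queries, it may be convenient to wrap the queries in a single loop. The following is a minimal sketch, not the script used at the hackathon. The function name `run_count_queries`, the `QUERY_CMD` variable, and the `result` output directory are assumptions introduced for illustration; the knowledge-base name `tga_v1`, the `4s-query ... -f text` invocation, and the query file names come from the commands above.

```shell
# Re-run every count query and keep the text output of each run, so that
# results can be compared between bug-fix iterations.
# QUERY_CMD and the default "result" directory are illustrative assumptions.
QUERY_CMD="${QUERY_CMD:-4s-query}"

run_count_queries() {
  kb="${1:-tga_v1}"        # 4store knowledge-base name
  outdir="${2:-result}"    # where the per-query output files are written
  mkdir -p "$outdir" || return 1
  for name in genome-reference_count gene-reference_count pubmed-annotations_count; do
    # Same invocation as the individual commands, with stdout saved per query.
    "$QUERY_CMD" "$kb" -f text < "query/${name}.rq" > "${outdir}/${name}.txt" || return 1
  done
}
```

Calling `run_count_queries tga_v1 result` after each re-import would then leave one output file per query under `result/`.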
SPARQL examples
- query/genome-reference_count.rq: retrieves reference_count for each genome, in descending order of count
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX tga: <http://togo.annotation.jp/sw/>
    PREFIX obo: <http://purl.obolibrary.org/obo/>

    SELECT *
    WHERE {
      ?genome tga:reference_count ?reference_count .
      ?genome tga:dsn ?dsn .
      ?genome rdf:type obo:SO_000034 .
    }
    ORDER BY DESC(?reference_count)
    LIMIT 30
Results
    ?genome ?reference_count ?dsn
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis> 2394 "Synechocystis"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena> 905 "Anabaena"
    <http://genome.kazusa.or.jp/cyanobase/SYNPCC7942> 775 "SYNPCC7942"
    <http://genome.kazusa.or.jp/rhizobase/Bradyrhizobium> 549 "Bradyrhizobium"
    <http://genome.kazusa.or.jp/cyanobase/SYNPCC7002> 250 "SYNPCC7002"
    <http://genome.kazusa.or.jp/cyanobase/Thermo> 250 "Thermo"
    <http://genome.kazusa.or.jp/rhizobase/Sinorhizobium> 240 "Sinorhizobium"
    <http://genome.kazusa.or.jp/cyanobase/Chlorobium> 143 "Chlorobium"
    <http://genome.kazusa.or.jp/cyanobase/NPUN> 137 "NPUN"
    <http://genome.kazusa.or.jp/rhizobase/Mesorhizobium> 115 "Mesorhizobium"
    <http://streptomyces.nih.go.jp/gview> 111 "NBRC133050"
    <http://genome.kazusa.or.jp/cyanobase/AVA> 109 "AVA"
    <http://genome.kazusa.or.jp/rhizobase/NGR234> 107 "NGR234"
    <http://genome.kazusa.or.jp/rhizobase/Leguminosarum> 83 "Leguminosarum"
    <http://genome.kazusa.or.jp/cyanobase/MED4> 58 "MED4"
    <http://genome.kazusa.or.jp/cyanobase/Gloeobacter> 44 "Gloeobacter"
    <http://genome.kazusa.or.jp/cyanobase/MIT9313> 40 "MIT9313"
    <http://genome.kazusa.or.jp/cyanobase/SS120> 34 "SS120"
    <http://genome.kazusa.or.jp/rhizobase/NGR234abc> 8 "NGR234abc"
    <http://genome.kazusa.or.jp/cyanobase/NIES39> 7 "NIES39"
    <http://genome.kazusa.or.jp/cyanobase/WH8102> 4 "WH8102"
    <http://genome.kazusa.or.jp/cyanobase/TERY> 1 "TERY"
    <http://genome.kazusa.or.jp/cyanobase/PCC6301> 1 "PCC6301"
    <http://genome.kazusa.or.jp/cyanobase/P9303> 0 "P9303"
    <http://genome.kazusa.or.jp/rhizobase/FRANEAN1> 0 "FRANEAN1"
    <http://genome.kazusa.or.jp/rhizobase/FRANCCI3> 0 "FRANCCI3"
    <http://genome.kazusa.or.jp/cyanobase/NATL1> 0 "NATL1"
    <http://genome.kazusa.or.jp/cyanobase/P9215> 0 "P9215"
    <http://genome.kazusa.or.jp/rhizobase/Etli> 0 "Etli"
    <http://genome.kazusa.or.jp/rhizobase/ORS571> 0 "ORS571"
    #EOR
- query/gene-reference_count.rq: retrieves reference_count for each gene, in descending order of count
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX tga: <http://togo.annotation.jp/sw/>
    PREFIX obo: <http://purl.obolibrary.org/obo/>

    SELECT ?gene ?annotation_id ?reference_count ?annotation_count ?last_bookmarked_at ?label
    WHERE {
      ?gene tga:reference_count ?reference_count .
      ?gene tga:annotation_id ?annotation_id .
      ?gene tga:annotation_count ?annotation_count .
      ?gene tga:last_bookmarked_at ?last_bookmarked_at .
      ?gene rdfs:label ?label .
      ?gene rdf:type obo:SO_0000704 .
    }
    ORDER BY DESC(?reference_count)
    LIMIT 30
Results
    ?gene ?annotation_id ?reference_count ?annotation_count ?last_bookmarked_at ?label
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1311> <http://togo.annotation.jp/annotations/86386> 430 2212 1351220054 "psbA2, psba-2, psbA, psbAII"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll1867> <http://togo.annotation.jp/annotations/86385> 362 1587 1340503463 "psbA3, psba-3, psbA, psbAIII"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1181> <http://togo.annotation.jp/annotations/86384> 343 1430 1340503642 "psbA1, psba-1, psbA, psbAI"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0849> <http://togo.annotation.jp/annotations/86417> 289 1129 1321521079 "psbD1, psbD, psbD-1, psbDI"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0851> <http://togo.annotation.jp/annotations/86659> 275 1059 1321521184 "psbC"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0906> <http://togo.annotation.jp/annotations/85206> 257 1043 1351042495 "psbB"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1834> <http://togo.annotation.jp/annotations/85191> 250 1097 1321520766 "psaA"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0927> <http://togo.annotation.jp/annotations/86418> 250 864 1321521272 "psbD2, psbD, psbD-2, psbDII"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1835> <http://togo.annotation.jp/annotations/85420> 240 1096 1321520918 "psaB"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0737> <http://togo.annotation.jp/annotations/85197> 163 671 1318718649 "psaD"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/alr4392> <http://togo.annotation.jp/annotations/83251> 157 732 1350976117 "ntcA, bifA, VF1"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0427> <http://togo.annotation.jp/annotations/85956> 153 675 1351042996 "psbO"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/ssl0563> <http://togo.annotation.jp/annotations/85209> 139 551 1319021431 "psaC, psaC2"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/ssr3451> <http://togo.annotation.jp/annotations/86674> 138 484 1351037253 "psbE"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/alr2339> <http://togo.annotation.jp/annotations/83361> 136 766 1349940899 "hetR, alr2339"
    <http://genome.kazusa.or.jp/cyanobase/SYNPCC7942/genes/Synpcc7942_0424> <http://togo.annotation.jp/annotations/88868> 133 697 1349152383 "psbAI, psbA, psbA1, psbA-I, ps2B, Synpcc7942_0424"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/ssr2831> <http://togo.annotation.jp/annotations/85199> 131 575 1318718592 "psaE"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1455> <http://togo.annotation.jp/annotations/84072> 129 417 1348469613 "nifH, all1455"
    <http://streptomyces.nih.go.jp/> <http://togo.annotation.jp/annotations/148044> 126 912 1351178853 "crt"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/smr0006> <http://togo.annotation.jp/annotations/86686> 121 418 1319710867 "psbF"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1454N> <http://togo.annotation.jp/annotations/84532> 108 390 1348469753 "nifD"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0819> <http://togo.annotation.jp/annotations/89515> 108 426 1316885994 "psaF"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1454C> <http://togo.annotation.jp/annotations/84533> 107 369 1348469927 "nifD"
    <http://genome.kazusa.or.jp/cyanobase/synpcc7942/genes/Synpcc7942_1389> <http://togo.annotation.jp/annotations/63477> 105 502 1348980402 "psbAII, psbA, psbA2, ps2B"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll1577> <http://togo.annotation.jp/annotations/85186> 103 354 1315317095 "cpcB"
    <http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1440> <http://togo.annotation.jp/annotations/65137> 99 256 1347972417 "nifK"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0247> <http://togo.annotation.jp/annotations/86241> 98 446 1349151744 "isiA, psbC"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr2076> <http://togo.annotation.jp/annotations/90605> 97 334 1337685669 "groEL1, cpn60-1, groEL, groEL-1, slr2076"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0009> <http://togo.annotation.jp/annotations/86407> 96 338 1325674864 "rbcL, slr0009"
    <http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll1578> <http://togo.annotation.jp/annotations/85185> 96 350 1325630872 "cpcA, sll1578"
    #EOR
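The `last_bookmarked_at` values in the gene results look like Unix epoch seconds. This is an assumption, since the page does not state the unit. Under that assumption they can be turned into readable UTC timestamps. The following is a minimal sketch that assumes GNU `date` (the `-d @SECONDS` form is not available in BSD/macOS `date`); the helper name `epoch_to_utc` is hypothetical.

```shell
# Convert an epoch-seconds value to an ISO 8601 UTC timestamp.
# Assumes GNU date; epoch_to_utc is an illustrative helper name.
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# last_bookmarked_at of slr1311 (psbA2), taken from the gene results.
epoch_to_utc 1351220054   # prints 2012-10-26T02:54:14Z
```

If the values are indeed epoch seconds, exporting them as `xsd:dateTime` literals instead of bare integers would be one way to make date comparisons in SPARQL more direct.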
- query/pubmed-annotations_count.rq: retrieves annotations_count for each publication, in descending order of count
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX tga: <http://togo.annotation.jp/sw/>
    PREFIX obo: <http://purl.obolibrary.org/obo/>

    SELECT ?journal ?tag_id ?annotations_count ?journal_title
    WHERE {
      ?journal tga:annotations_count ?annotations_count .
      ?journal tga:tag_id ?tag_id .
      ?journal rdf:type tga:JournalPublication .
      ?journal tga:title ?journal_title
    }
    ORDER BY DESC(?annotations_count)
    LIMIT 30
Results
    ?journal ?tag_id ?annotations_count ?journal_title
    <http://pubmed.org/18000013> <http://togo.annotation.jp/tags/24309> 10374 "A large-scale protein protein interaction analysis in Synechocystis sp. PCC6803."
    <http://pubmed.org/12597279> <http://togo.annotation.jp/tags/203400> 8465 "Complete genomic sequence of nitrogen-fixing symbiotic bacterium Bradyrhizobium japonicum USDA110 (supplement)."
    <http://pubmed.org/18192278> <http://togo.annotation.jp/tags/24316> 7984 "A large scale analysis of protein-protein interactions in the nitrogen-fixing bacterium Mesorhizobium loti."
    <http://pubmed.org/14621296> <http://togo.annotation.jp/tags/209836> 4687 "Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids (supplement)."
    <http://pubmed.org/12240836> <http://togo.annotation.jp/tags/205290> 2728 "Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1 (supplement)."
    <http://pubmed.org/14612435> <http://togo.annotation.jp/tags/29654> 1333 "Alterations in global patterns of gene expression in Synechocystis sp. PCC 6803 in response to inorganic carbon limitation and the inactivation of ndhR, a LysR family regulator."
    <http://pubmed.org/12446635> <http://togo.annotation.jp/tags/18871> 1100 "Global gene expression profiles of the cyanobacterium Synechocystis sp. strain PCC 6803 in response to irradiation with UV-B and white light."
    <http://pubmed.org/9163424> <http://togo.annotation.jp/tags/203944> 1085 "Molecular basis of symbiosis between Rhizobium and legumes."
    <http://pubmed.org/14702322> <http://togo.annotation.jp/tags/30019> 1060 "An evolutionary hot spot: the pNGR234b replicon of Rhizobium sp. strain NGR234."
    <http://pubmed.org/19216801> <http://togo.annotation.jp/tags/204622> 958 "The time course of the transcriptomic response of Sinorhizobium meliloti 1021 following a shift to acidic pH."
    <http://pubmed.org/11283337> <http://togo.annotation.jp/tags/17750> 852 "DNA microarray analysis of cyanobacterial gene expression during acclimation to high light."
    <http://pubmed.org/18511436> <http://togo.annotation.jp/tags/203244> 726 "Soybean seed extracts preferentially express genomic loci of Bradyrhizobium japonicum in the initial interaction with soybean, Glycine max (L.) Merr."
    <http://pubmed.org/15289483> <http://togo.annotation.jp/tags/29838> 701 "Elucidation of gene interaction networks through time-lagged correlation analysis of transcriptional data."
    <http://pubmed.org/11858227> <http://togo.annotation.jp/tags/32628> 694 "Characterization of genes encoding multi-domain proteins in the genome of the filamentous nitrogen-fixing Cyanobacterium anabaena sp. strain PCC 7120."
    <http://pubmed.org/17164256> <http://togo.annotation.jp/tags/34005> 666 "Genome-wide analysis of ATP-binding cassette (ABC) proteins in a model legume plant, Lotus japonicus: comparison with Arabidopsis ABC protein family."
    <http://pubmed.org/17600135> <http://togo.annotation.jp/tags/18664> 654 "Long-term response toward inorganic carbon limitation in wild type and glycolate turnover mutants of the cyanobacterium Synechocystis sp. strain PCC 6803."
    <http://pubmed.org/20203057> <http://togo.annotation.jp/tags/208125> 647 "Genomic structure of an economically important cyanobacterium, Arthrospira (Spirulina) platensis NIES-39."
    <http://pubmed.org/12886952> <http://togo.annotation.jp/tags/32575> 598 "Genome-wide expression analysis of the responses to nitrogen deprivation in the heterocyst-forming cyanobacterium Anabaena sp. strain PCC 7120."
    <http://pubmed.org/12480098> <http://togo.annotation.jp/tags/18903> 595 "Genomic analysis of protein kinases, protein phosphatases and two-component regulatory systems of the cyanobacterium Anabaena sp. strain PCC 7120."
    <http://pubmed.org/15000396> <http://togo.annotation.jp/tags/34187> 590 "Global changes in gene expression in Sinorhizobium meliloti 1021 under microoxic and symbiotic conditions."
    <http://pubmed.org/12913140> <http://togo.annotation.jp/tags/18975> 557 "Microarray analysis of the genome-wide response to iron deficiency and iron reconstitution in the cyanobacterium Synechocystis sp. PCC 6803."
    <http://pubmed.org/15141946> <http://togo.annotation.jp/tags/30107> 551 "Genome-wide comparison of the His-to-Asp phosphorelay signaling components of three symbiotic genera of Rhizobia."
    <http://pubmed.org/17337580> <http://togo.annotation.jp/tags/210988> 520 "A-factor and phosphate depletion signals are transmitted to the grixazone biosynthesis genes via the pathway-specific transcriptional activator GriR."
    <http://pubmed.org/15221452> <http://togo.annotation.jp/tags/34120> 507 "Global transcriptional analysis of the phosphate starvation response in Sinorhizobium meliloti strains 1021 and 2011."
    <http://pubmed.org/9298645> <http://togo.annotation.jp/tags/208039> 499 "Towards a proteome project of cyanobacterium Synechocystis sp. strain PCC6803: linking 130 protein spots with their respective genes."
    <http://pubmed.org/16622784> <http://togo.annotation.jp/tags/203775> 482 "High throughput two-dimensional blue-native electrophoresis: a tool for functional proteomics of cytoplasmatic protein complexes from Chlorobium tepidum."
    <http://pubmed.org/8581740> <http://togo.annotation.jp/tags/205899> 466 "Assignment of 82 known genes and gene clusters on the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803."
    <http://pubmed.org/12795377> <http://togo.annotation.jp/tags/67269> 454 "A global analysis of protein expression profiles in Sinorhizobium meliloti: discovery of new genes for nodule occupancy and stress adaptation."
    <http://pubmed.org/16159767> <http://togo.annotation.jp/tags/211441> 454 "Three chymotrypsin genes are members of the AdpA regulon in the A-factor regulatory cascade in Streptomyces griseus."
    <http://pubmed.org/14686584> <http://togo.annotation.jp/tags/19082> 449 "Structural analysis of four large plasmids harboring in a unicellular cyanobacterium, Synechocystis sp. PCC 6803."
    #EOR
Todo
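- Create the RDF [done]
  - Output RDF for annotation, bookmark, and tag with script/export [done]
  - Parent-tag representation: queries that retrieve a set of tags of the same kind are likely to be used often, so express this with rdf:type [done]
  - Import the data into Virtuoso [done]
- Update the RDF
  - Replace the PubMed URIs with http://identifiers.org/pubmed/ to align with Genome-RDF and with the representation of strain reference information [SPARQLthon2]
  - Describe the metadata with VoID
- SPARQL
  - Join with CyanoBase
  - Join with MicrobeDB.jp-related databases
  - Join with literature information (journal title, author, etc.)
  - Join with DBpedia, e.g. http://live.dbpedia.org/page/DNA_polymerase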
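The PubMed URI replacement listed in the Todo could be applied to already exported Turtle as a plain text substitution. The following is a minimal sketch under stated assumptions: the function name `rewrite_pubmed_uris` is hypothetical, and it assumes that PubMed resources always appear as full IRIs of the form `<http://pubmed.org/NNN>` (as in the query results on this page) rather than as prefixed names. The sample triple is built from the first row of the publication results purely for illustration; the actual serialization in the exported files may differ.

```shell
# Rewrite <http://pubmed.org/ID> IRIs to <http://identifiers.org/pubmed/ID>.
# Reads Turtle on stdin and writes the rewritten Turtle to stdout.
# rewrite_pubmed_uris is an illustrative helper name.
rewrite_pubmed_uris() {
  sed -E 's#<http://pubmed\.org/([0-9]+)>#<http://identifiers.org/pubmed/\1>#g'
}

# Illustrative triple; only numeric PubMed IDs inside angle brackets are rewritten.
printf '%s\n' '<http://pubmed.org/18000013> tga:annotations_count 10374 .' | rewrite_pubmed_uris
```

On a real export this would be run as a filter, for example `rewrite_pubmed_uris < in.ttl > out.ttl` (placeholder file names). Changing the URI pattern in the export script itself is likely the more robust fix, since the data would otherwise need rewriting after every export.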