BH12.12/SPARQLthon/TogoAnnotation

From TogoWiki


Work

  • Converting TogoAnnotation to RDF: implemented RDF output in the export script
script/export rdf -o genome
script/export rdf -o gene
script/export rdf -o reference
script/export rdf -o annotation [Todo]
script/export rdf -o bookmark  [Todo]
script/export rdf -o tag  [Todo]
  • Loading the data into 4store@localhost
4s-backend-setup tga_v1
4s-backend tga_v1
4s-import -v tga_v1 -M http://togo.annotation.jp tga/genome.ttl tga/gene.ttl tga/reference.ttl tga/annotation.ttl 
  • Iterating between SPARQL tests that count references/annotations and bug fixes in the RDF data
4s-query tga_v1 -f text < query/genome-reference_count.rq 
4s-query tga_v1 -f text < query/gene-reference_count.rq 
4s-query tga_v1 -f text < query/pubmed-annotations_count.rq 
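The `4s-query ... -f text` output shown in the results below is a tab-separated table: a header line of `?variables`, one line per result row, and a closing `#EOR` line. A minimal Python sketch for loading it, assuming exactly that layout (literals that contain tabs or newlines are not handled); `run_query`, `parse_4s_text` and `term_value` are hypothetical helper names, not part of 4store:

```python
import subprocess
from typing import Dict, List, Union


def run_query(kb: str, query_path: str) -> str:
    """Run a query file the same way as `4s-query KB -f text < FILE`."""
    with open(query_path, encoding="utf-8") as f:
        query = f.read()
    result = subprocess.run(
        ["4s-query", kb, "-f", "text"],
        input=query, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_4s_text(output: str) -> List[Dict[str, str]]:
    """Turn tab-separated 4s-query text output into one dict per row.

    Terms are kept verbatim: URIs as <...>, literals as "...", numbers bare.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    # "?genome\t?reference_count\t?dsn" -> ["genome", "reference_count", "dsn"]
    names = [name.lstrip("?") for name in lines[0].split("\t")]
    rows = []
    for line in lines[1:]:
        if line.startswith("#EOR"):  # end-of-results marker
            break
        rows.append(dict(zip(names, line.split("\t"))))
    return rows


def term_value(term: str) -> Union[str, int]:
    """Strip <> or surrounding quotes from a term; convert bare integers."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return term[1:-1]
    try:
        return int(term)
    except ValueError:
        return term
```

For example, `parse_4s_text(run_query("tga_v1", "query/genome-reference_count.rq"))` would yield one dict per genome keyed by `genome`, `reference_count` and `dsn`.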

SPARQL examples

  • query/genome-reference_count.rq: reference_count for each genome, sorted by count in descending order
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tga: <http://togo.annotation.jp/sw/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT 
 * 
WHERE 
{
?genome tga:reference_count ?reference_count.
?genome tga:dsn ?dsn.
?genome rdf:type obo:SO_000034.
} 
ORDER BY DESC(?reference_count)
LIMIT 30

Results

?genome	?reference_count	?dsn
<http://genome.kazusa.or.jp/cyanobase/Synechocystis>	2394	"Synechocystis"
<http://genome.kazusa.or.jp/cyanobase/Anabaena>	905	"Anabaena"
<http://genome.kazusa.or.jp/cyanobase/SYNPCC7942>	775	"SYNPCC7942"
<http://genome.kazusa.or.jp/rhizobase/Bradyrhizobium>	549	"Bradyrhizobium"
<http://genome.kazusa.or.jp/cyanobase/SYNPCC7002>	250	"SYNPCC7002"
<http://genome.kazusa.or.jp/cyanobase/Thermo>	250	"Thermo"
<http://genome.kazusa.or.jp/rhizobase/Sinorhizobium>	240	"Sinorhizobium"
<http://genome.kazusa.or.jp/cyanobase/Chlorobium>	143	"Chlorobium"
<http://genome.kazusa.or.jp/cyanobase/NPUN>	137	"NPUN"
<http://genome.kazusa.or.jp/rhizobase/Mesorhizobium>	115	"Mesorhizobium"
<http://streptomyces.nih.go.jp/gview>	111	"NBRC133050"
<http://genome.kazusa.or.jp/cyanobase/AVA>	109	"AVA"
<http://genome.kazusa.or.jp/rhizobase/NGR234>	107	"NGR234"
<http://genome.kazusa.or.jp/rhizobase/Leguminosarum>	83	"Leguminosarum"
<http://genome.kazusa.or.jp/cyanobase/MED4>	58	"MED4"
<http://genome.kazusa.or.jp/cyanobase/Gloeobacter>	44	"Gloeobacter"
<http://genome.kazusa.or.jp/cyanobase/MIT9313>	40	"MIT9313"
<http://genome.kazusa.or.jp/cyanobase/SS120>	34	"SS120"
<http://genome.kazusa.or.jp/rhizobase/NGR234abc>	8	"NGR234abc"
<http://genome.kazusa.or.jp/cyanobase/NIES39>	7	"NIES39"
<http://genome.kazusa.or.jp/cyanobase/WH8102>	4	"WH8102"
<http://genome.kazusa.or.jp/cyanobase/TERY>	1	"TERY"
<http://genome.kazusa.or.jp/cyanobase/PCC6301>	1	"PCC6301"
<http://genome.kazusa.or.jp/cyanobase/P9303>	0	"P9303"
<http://genome.kazusa.or.jp/rhizobase/FRANEAN1>	0	"FRANEAN1"
<http://genome.kazusa.or.jp/rhizobase/FRANCCI3>	0	"FRANCCI3"
<http://genome.kazusa.or.jp/cyanobase/NATL1>	0	"NATL1"
<http://genome.kazusa.or.jp/cyanobase/P9215>	0	"P9215"
<http://genome.kazusa.or.jp/rhizobase/Etli>	0	"Etli"
<http://genome.kazusa.or.jp/rhizobase/ORS571>	0	"ORS571"
#EOR
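The query above returns a genome only when all three triple patterns match on the same `?genome`; it then sorts by `?reference_count` in descending order and keeps 30 rows. The pure-Python sketch below mimics that evaluation over an in-memory list of (subject, predicate, object) tuples. It illustrates the join semantics only and is not how 4store evaluates queries. The `int()` conversion assumes the counts are exported as typed integers, which the numeric ordering of the results suggests; plain string literals would generally sort lexically instead. `genome_reference_counts` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TGA = "http://togo.annotation.jp/sw/"
# Class IRI copied verbatim from the query above.
GENOME_CLASS = "http://purl.obolibrary.org/obo/SO_000034"


def genome_reference_counts(triples: List[Triple],
                            limit: int = 30) -> List[Tuple[str, int, str]]:
    """Evaluate the genome-reference_count pattern over in-memory triples."""
    # Index objects by (subject, predicate) so each pattern is a lookup.
    index: Dict[Tuple[str, str], List[str]] = {}
    for s, p, o in triples:
        index.setdefault((s, p), []).append(o)

    rows = []
    # ?genome rdf:type obo:SO_000034 binds the candidate subjects.
    genomes = sorted({s for s, p, o in triples
                      if p == RDF_TYPE and o == GENOME_CLASS})
    for genome in genomes:
        # Inner join: a genome lacking either property yields no row.
        for count in index.get((genome, TGA + "reference_count"), []):
            for dsn in index.get((genome, TGA + "dsn"), []):
                rows.append((genome, int(count), dsn))

    # ORDER BY DESC(?reference_count) LIMIT n
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

Ties (for example the two genomes with 250 references above) come out in IRI order here only because the candidate set is sorted; SPARQL itself leaves the relative order of tied rows unspecified.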
  • query/gene-reference_count.rq: reference_count for each gene, sorted by count in descending order
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tga: <http://togo.annotation.jp/sw/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT 
?gene ?annotation_id ?reference_count ?annotation_count ?last_bookmarked_at ?label 
WHERE 
{
?gene tga:reference_count ?reference_count.
?gene tga:annotation_id ?annotation_id.
?gene tga:annotation_count ?annotation_count.
?gene tga:last_bookmarked_at ?last_bookmarked_at.
?gene rdfs:label ?label.
?gene rdf:type obo:SO_0000704.
} 
ORDER BY DESC(?reference_count)
LIMIT 30

Results

?gene	?annotation_id	?reference_count	?annotation_count	?last_bookmarked_at	?label
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1311>	<http://togo.annotation.jp/annotations/86386>	430	2212	1351220054	"psbA2, psba-2, psbA, psbAII"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll1867>	<http://togo.annotation.jp/annotations/86385>	362	1587	1340503463	"psbA3, psba-3, psbA, psbAIII"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1181>	<http://togo.annotation.jp/annotations/86384>	343	1430	1340503642	"psbA1, psba-1, psbA, psbAI"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0849>	<http://togo.annotation.jp/annotations/86417>	289	1129	1321521079	"psbD1, psbD, psbD-1, psbDI"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0851>	<http://togo.annotation.jp/annotations/86659>	275	1059	1321521184	"psbC"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0906>	<http://togo.annotation.jp/annotations/85206>	257	1043	1351042495	"psbB"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1834>	<http://togo.annotation.jp/annotations/85191>	250	1097	1321520766	"psaA"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0927>	<http://togo.annotation.jp/annotations/86418>	250	864	1321521272	"psbD2, psbD, psbD-2, psbDII"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr1835>	<http://togo.annotation.jp/annotations/85420>	240	1096	1321520918	"psaB"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0737>	<http://togo.annotation.jp/annotations/85197>	163	671	1318718649	"psaD"
<http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/alr4392>	<http://togo.annotation.jp/annotations/83251>	157	732	1350976117	"ntcA, bifA, VF1"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0427>	<http://togo.annotation.jp/annotations/85956>	153	675	1351042996	"psbO"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/ssl0563>	<http://togo.annotation.jp/annotations/85209>	139	551	1319021431	"psaC, psaC2"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/ssr3451>	<http://togo.annotation.jp/annotations/86674>	138	484	1351037253	"psbE"
<http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/alr2339>	<http://togo.annotation.jp/annotations/83361>	136	766	1349940899	"hetR, alr2339"
<http://genome.kazusa.or.jp/cyanobase/SYNPCC7942/genes/Synpcc7942_0424>	<http://togo.annotation.jp/annotations/88868>	133	697	1349152383	"psbAI, psbA, psbA1, psbA-I, ps2B, Synpcc7942_0424"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/ssr2831>	<http://togo.annotation.jp/annotations/85199>	131	575	1318718592	"psaE"
<http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1455>	<http://togo.annotation.jp/annotations/84072>	129	417	1348469613	"nifH, all1455"
<http://streptomyces.nih.go.jp/>	<http://togo.annotation.jp/annotations/148044>	126	912	1351178853	"crt"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/smr0006>	<http://togo.annotation.jp/annotations/86686>	121	418	1319710867	"psbF"
<http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1454N>	<http://togo.annotation.jp/annotations/84532>	108	390	1348469753	"nifD"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0819>	<http://togo.annotation.jp/annotations/89515>	108	426	1316885994	"psaF"
<http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1454C>	<http://togo.annotation.jp/annotations/84533>	107	369	1348469927	"nifD"
<http://genome.kazusa.or.jp/cyanobase/synpcc7942/genes/Synpcc7942_1389>	<http://togo.annotation.jp/annotations/63477>	105	502	1348980402	"psbAII, psbA, psbA2, ps2B"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll1577>	<http://togo.annotation.jp/annotations/85186>	103	354	1315317095	"cpcB"
<http://genome.kazusa.or.jp/cyanobase/Anabaena/genes/all1440>	<http://togo.annotation.jp/annotations/65137>	99	256	1347972417	"nifK"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll0247>	<http://togo.annotation.jp/annotations/86241>	98	446	1349151744	"isiA, psbC"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr2076>	<http://togo.annotation.jp/annotations/90605>	97	334	1337685669	"groEL1, cpn60-1, groEL, groEL-1, slr2076"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/slr0009>	<http://togo.annotation.jp/annotations/86407>	96	338	1325674864	"rbcL, slr0009"
<http://genome.kazusa.or.jp/cyanobase/Synechocystis/genes/sll1578>	<http://togo.annotation.jp/annotations/85185>	96	350	1325630872	"cpcA, sll1578"
#EOR
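The `?last_bookmarked_at` values above (for example `1351220054`) look like Unix epoch seconds. Assuming that is what the export writes, a minimal Python sketch can turn them into UTC datetimes and pick the most recently bookmarked gene from result rows; `bookmarked_at` and `most_recently_bookmarked` are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Iterable, Tuple


def bookmarked_at(epoch_seconds: str) -> datetime:
    """Convert a ?last_bookmarked_at value to an aware UTC datetime.

    Assumes the value counts seconds since 1970-01-01T00:00:00Z.
    """
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


def most_recently_bookmarked(rows: Iterable[Tuple[str, str]]) -> Tuple[str, datetime]:
    """Return (gene, time) for the latest bookmark among (gene, epoch) pairs.

    Raises ValueError if rows is empty.
    """
    gene, epoch = max(rows, key=lambda row: int(row[1]))
    return gene, bookmarked_at(epoch)
```

Note that `?gene tga:last_bookmarked_at ?last_bookmarked_at` is a required pattern in the query, so any gene exported without that property would be missing from the result; wrapping that pattern in `OPTIONAL { }` would keep such genes.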
  • query/pubmed-annotations_count.rq: annotations_count for each publication, sorted by count in descending order
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tga: <http://togo.annotation.jp/sw/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT 
?journal ?tag_id ?annotations_count ?journal_title
WHERE 
{
?journal tga:annotations_count ?annotations_count.
?journal tga:tag_id ?tag_id.
?journal rdf:type tga:JournalPublication.
?journal tga:title ?journal_title
} 
ORDER BY DESC(?annotations_count)
LIMIT 30

Results

?journal	?tag_id	?annotations_count	?journal_title
<http://pubmed.org/18000013>	<http://togo.annotation.jp/tags/24309>	10374	"A large-scale protein protein interaction analysis in Synechocystis sp. PCC6803."
<http://pubmed.org/12597279>	<http://togo.annotation.jp/tags/203400>	8465	"Complete genomic sequence of nitrogen-fixing symbiotic bacterium Bradyrhizobium japonicum USDA110 (supplement)."
<http://pubmed.org/18192278>	<http://togo.annotation.jp/tags/24316>	7984	"A large scale analysis of protein-protein interactions in the nitrogen-fixing bacterium Mesorhizobium loti."
<http://pubmed.org/14621296>	<http://togo.annotation.jp/tags/209836>	4687	"Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids (supplement)."
<http://pubmed.org/12240836>	<http://togo.annotation.jp/tags/205290>	2728	"Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1 (supplement)."
<http://pubmed.org/14612435>	<http://togo.annotation.jp/tags/29654>	1333	"Alterations in global patterns of gene expression in Synechocystis sp. PCC 6803 in response to inorganic carbon limitation and the inactivation of ndhR, a LysR family regulator."
<http://pubmed.org/12446635>	<http://togo.annotation.jp/tags/18871>	1100	"Global gene expression profiles of the cyanobacterium Synechocystis sp. strain PCC 6803 in response to irradiation with UV-B and white light."
<http://pubmed.org/9163424>	<http://togo.annotation.jp/tags/203944>	1085	"Molecular basis of symbiosis between Rhizobium and legumes."
<http://pubmed.org/14702322>	<http://togo.annotation.jp/tags/30019>	1060	"An evolutionary hot spot: the pNGR234b replicon of Rhizobium sp. strain NGR234."
<http://pubmed.org/19216801>	<http://togo.annotation.jp/tags/204622>	958	"The time course of the transcriptomic response of Sinorhizobium meliloti 1021 following a shift to acidic pH."
<http://pubmed.org/11283337>	<http://togo.annotation.jp/tags/17750>	852	"DNA microarray analysis of cyanobacterial gene expression during acclimation to high light."
<http://pubmed.org/18511436>	<http://togo.annotation.jp/tags/203244>	726	"Soybean seed extracts preferentially express genomic loci of Bradyrhizobium japonicum in the initial interaction with soybean, Glycine max (L.) Merr."
<http://pubmed.org/15289483>	<http://togo.annotation.jp/tags/29838>	701	"Elucidation of gene interaction networks through time-lagged correlation analysis of transcriptional data."
<http://pubmed.org/11858227>	<http://togo.annotation.jp/tags/32628>	694	"Characterization of genes encoding multi-domain proteins in the genome of the filamentous nitrogen-fixing Cyanobacterium anabaena sp. strain PCC 7120."
<http://pubmed.org/17164256>	<http://togo.annotation.jp/tags/34005>	666	"Genome-wide analysis of ATP-binding cassette (ABC) proteins in a model legume plant, Lotus japonicus: comparison with Arabidopsis ABC protein family."
<http://pubmed.org/17600135>	<http://togo.annotation.jp/tags/18664>	654	"Long-term response toward inorganic carbon limitation in wild type and glycolate turnover mutants of the cyanobacterium Synechocystis sp. strain PCC 6803."
<http://pubmed.org/20203057>	<http://togo.annotation.jp/tags/208125>	647	"Genomic structure of an economically important cyanobacterium, Arthrospira (Spirulina) platensis NIES-39."
<http://pubmed.org/12886952>	<http://togo.annotation.jp/tags/32575>	598	"Genome-wide expression analysis of the responses to nitrogen deprivation in the heterocyst-forming cyanobacterium Anabaena sp. strain PCC 7120."
<http://pubmed.org/12480098>	<http://togo.annotation.jp/tags/18903>	595	"Genomic analysis of protein kinases, protein phosphatases and two-component regulatory systems of the cyanobacterium Anabaena sp. strain PCC 7120."
<http://pubmed.org/15000396>	<http://togo.annotation.jp/tags/34187>	590	"Global changes in gene expression in Sinorhizobium meliloti 1021 under microoxic and symbiotic conditions."
<http://pubmed.org/12913140>	<http://togo.annotation.jp/tags/18975>	557	"Microarray analysis of the genome-wide response to iron deficiency and iron reconstitution in the cyanobacterium Synechocystis sp. PCC 6803."
<http://pubmed.org/15141946>	<http://togo.annotation.jp/tags/30107>	551	"Genome-wide comparison of the His-to-Asp phosphorelay signaling components of three symbiotic genera of Rhizobia."
<http://pubmed.org/17337580>	<http://togo.annotation.jp/tags/210988>	520	"A-factor and phosphate depletion signals are transmitted to the grixazone biosynthesis genes via the pathway-specific transcriptional activator GriR."
<http://pubmed.org/15221452>	<http://togo.annotation.jp/tags/34120>	507	"Global transcriptional analysis of the phosphate starvation response in Sinorhizobium meliloti strains 1021 and 2011."
<http://pubmed.org/9298645>	<http://togo.annotation.jp/tags/208039>	499	"Towards a proteome project of cyanobacterium Synechocystis sp. strain PCC6803: linking 130 protein spots with their respective genes."
<http://pubmed.org/16622784>	<http://togo.annotation.jp/tags/203775>	482	"High throughput two-dimensional blue-native electrophoresis: a tool for functional proteomics of cytoplasmatic protein complexes from Chlorobium tepidum."
<http://pubmed.org/8581740>	<http://togo.annotation.jp/tags/205899>	466	"Assignment of 82 known genes and gene clusters on the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803."
<http://pubmed.org/12795377>	<http://togo.annotation.jp/tags/67269>	454	"A global analysis of protein expression profiles in Sinorhizobium meliloti: discovery of new genes for nodule occupancy and stress adaptation."
<http://pubmed.org/16159767>	<http://togo.annotation.jp/tags/211441>	454	"Three chymotrypsin genes are members of the AdpA regulon in the A-factor regulatory cascade in Streptomyces griseus."
<http://pubmed.org/14686584>	<http://togo.annotation.jp/tags/19082>	449	"Structural analysis of four large plasmids harboring in a unicellular cyanobacterium, Synechocystis sp. PCC 6803."
#EOR
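The publication URIs above use the `http://pubmed.org/` prefix, while the Todo list below plans a switch to `http://identifiers.org/pubmed/`. A minimal sketch of that rewrite, assuming the PMID is the numeric final path segment and that the exported Turtle writes these URIs as full `<...>` IRIs (prefixed names would need separate handling); `to_identifiers_org` and `rewrite_turtle` are hypothetical names:

```python
import re

OLD_PREFIX = "http://pubmed.org/"
NEW_PREFIX = "http://identifiers.org/pubmed/"

# Matches a full IRI such as <http://pubmed.org/18000013> in Turtle text.
PUBMED_IRI = re.compile(r"<http://pubmed\.org/(\d+)>")


def to_identifiers_org(uri: str) -> str:
    """Rewrite one bare PubMed URI, e.g. http://pubmed.org/18000013."""
    if not uri.startswith(OLD_PREFIX):
        raise ValueError(f"not a pubmed.org URI: {uri}")
    pmid = uri[len(OLD_PREFIX):]
    if not pmid.isdigit():
        raise ValueError(f"unexpected PMID: {pmid!r}")
    return NEW_PREFIX + pmid


def rewrite_turtle(text: str) -> str:
    """Rewrite every <http://pubmed.org/NNN> IRI in a Turtle document."""
    return PUBMED_IRI.sub(lambda m: f"<{NEW_PREFIX}{m.group(1)}>", text)
```

After rewriting the exported `.ttl` files this way, they would need to be re-imported for the queries to see the new URIs.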

Todo

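  • Create RDF [Done]
    • RDF output of annotation, bookmark and tag via script/export [Done]
    • Parent tags: queries that retrieve the set of tags of the same kind look likely to be used often, so express the parent tag with rdf:type [Done]
  • Import the data into Virtuoso [Done]
  • Update RDF
    • Replace PubMed URIs with http://identifiers.org/pubmed/, consistent with Genome-RDF and the representation of strain reference information [SPARQLthon2]
    • Describe the metadata with VoID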
  • SPARQL
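For the VoID item, a dataset description is a small Turtle document using the VoID vocabulary (`void:Dataset`, `void:sparqlEndpoint`, `void:triples`) plus a Dublin Core title. A minimal Python sketch that renders one; `void_description` is a hypothetical helper, and the dataset IRI, title, endpoint and triple count in the demo are placeholders, not real values for TogoAnnotation:

```python
from typing import Optional


def void_description(dataset_iri: str, title: str, triples: int,
                     sparql_endpoint: Optional[str] = None) -> str:
    """Render a minimal VoID description of one dataset as Turtle."""
    # Escape characters that would break a Turtle string literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_iri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
    ]
    if sparql_endpoint is not None:
        lines.append(f"    void:sparqlEndpoint <{sparql_endpoint}> ;")
    # void:triples is the total number of triples in the dataset.
    lines.append(f"    void:triples {triples} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(void_description("http://example.org/tga/void#dataset",
                           "TogoAnnotation (example)", 12345,
                           "http://example.org/sparql"))
```

The triple count would have to come from the store itself (for example a `COUNT(*)` query over the loaded model) rather than being written by hand.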