SPARQLthon61/MicrobeDB.jp-Umakaviewer
Source: TogoWiki
Work log and feedback from a trial of umaka-viewer against the MicrobeDB.jp SPARQL endpoint.
Procedure
- Run SPARQL builder/metadata
- java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18895/sparql graph_18895 http://microbedb.jp/jcm graph_18895
- Run umaka-viewer/umakaparser build_index
- umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
- Run umaka-viewer/umakaparser build
- umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
- Sign in to https://umaka-viewer.dbcls.jp and upload the file
Confirmed that everything works through to display: https://umaka-viewer.dbcls.jp/v/jumBPgtanD6t64ACe3yBsfesXncqY_-T
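The three steps are linked by their output locations: the Turtle file written by the crawler is the input of build, and the --dist directory of build_index is passed to build as --assets. The following is a minimal Python sketch of that wiring. It only assembles the argument lists and runs nothing; the function name pipeline_commands and its parameter names are hypothetical, and the Turtle path is passed in explicitly, not derived.

```python
def pipeline_commands(jar, endpoint, crawl_name, graph_uri, out_dir,
                      ontologies, index_dir, turtle_file, viewer_dir):
    """Return the three commands of the procedure as argv lists (hypothetical helper)."""
    # Step 1: crawl one graph; out_dir is the directory the Turtle file is written to.
    crawl = ["java", "-jar", jar, "-gc", endpoint, crawl_name, graph_uri, out_dir]
    # Step 2: build the ontology index into index_dir.
    build_index = ["umakaparser", "build_index"] + list(ontologies) + ["--dist", index_dir]
    # Step 3: build the viewer data from the crawled Turtle file and the index.
    build = ["umakaparser", "build", turtle_file,
             "--assets", index_dir, "--dist", viewer_dir]
    return [crawl, build_index, build]


commands = pipeline_commands(
    jar="../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar",
    endpoint="http://localhost:18895/sparql",
    crawl_name="graph_18895",
    graph_uri="http://microbedb.jp/jcm",
    out_dir="graph_18895",
    ontologies=["sio.ttl", "ontology/insdc/nucleotide.ttl", "MEO/meo.ttl", "so.ttl"],
    index_dir="umaka_index",
    turtle_file="graph_18895/turtle_graph_18895_jcm_1.ttl",
    viewer_dir="umakaviewer_jcm",
)
for argv in commands:
    print(" ".join(argv))
```

The printed lines should match the three commands listed in the procedure.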
Commands run
metadata help
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Version: 20161212-1
Usage: java org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl [options]
[options]
1. to print a list of graphURIs
-g endpointURL
2. to crawl whole data in the endpoint
-ac endpointURL crawlName outputFileName
3. to crawl the specified graph in the endpoint
-gc endpointURL crawlName graphURI outputFileName
umakaparser help
%umakaparser
Usage: umakaparser [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
build Creates model data from metadata that conforms to SBM.
build_index Creates the assets used to build model data from ontology files.
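Work log and notes
Version mismatch
- When run, metadata prints Version: 20161212-1
- The umaka-viewer README asks for data that conforms to [Sparql Builder Metadata ver.2015](http://www.sparqlbuilder.org/doc/sbm_2015sep/)
→ Not a problem, because the later updates are bug fixes; confirmed with Yamaguchi-san
Unclear where crawlName is used
→ It is used as the prefix of the output file names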
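The output file names that appear in the logs on this page suggest a pattern of turtle_, then crawlName, then the last path segment of the graph URI, then _1.ttl. The Python sketch below reproduces only that observed pattern. The helper name expected_turtle_name is hypothetical, and the meaning of the trailing 1 is not known from this page, so it is hard-coded as an assumption.

```python
def expected_turtle_name(crawl_name, graph_uri):
    """Guess the crawler's output file name from the pattern seen in the logs.

    Assumption: the name is turtle_<crawlName>_<last URI segment>_1.ttl.
    The trailing "1" is copied from the observed names; its meaning is unknown.
    """
    graph_label = graph_uri.rstrip("/").rsplit("/", 1)[-1]
    return "turtle_{}_{}_1.ttl".format(crawl_name, graph_label)


print(expected_turtle_name("mdb", "http://microbedb.jp/assembly"))
print(expected_turtle_name("graph_18895", "http://microbedb.jp/jcm"))
```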
outputFileName actually names a directory; if that directory does not exist, the run fails with an error at the end, after crawling
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -ac http://localhost:18895/sparql mdb mdb_output.txt
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
Graph: http://microbedb.jp/assembly 1 / 33
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/mdb_output.txt/turtle_mdb_assembly_1.ttl
P
properties
http://ddbj.nig.ac.jp/ontologies/nucleotide/dblink
http://www.ncbi.nlm.nih.gov/assembly/refseq_category
http://www.ncbi.nlm.nih.gov/assembly/asm_name
http://www.ncbi.nlm.nih.gov/assembly/assembly_level
http://www.ncbi.nlm.nih.gov/assembly/taxon
.
.
.
java.io.FileNotFoundException: /Users/tf/github/metadata/mdb_20171023/mdb_output.txt/turtle_mdb_assembly_1.ttl (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.sparqlbuilder.metadata.crawler.datastructure.SchemaCategory.write2File(SchemaCategory.java:20)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:270)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:86)
80747 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:86)
- The same happens when crawling a single graph
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18893/sparql graph_18893 http://microbedb.jp/chebi chebi
Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
Graph: http://microbedb.jp/chebi 1 / 1
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/chebi/turtle_graph_18893_chebi_1.ttl
.
.
.
#EndpointAccess: 161
java.io.FileNotFoundException: /Users/tf/github/metadata/mdb_20171023/chebi/turtle_graph_18893_chebi_1.ttl (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.sparqlbuilder.metadata.crawler.datastructure.SchemaCategory.write2File(SchemaCategory.java:20)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:270)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
130204 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
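Both FileNotFoundException traces point at a missing output directory, and the exception is only raised after the crawl itself has finished (80747 msec and 130204 msec in the logs). One way to avoid losing that time is to create the directory before starting the crawler. The Python sketch below does this; the wrapper name crawl_graph and the dry_run flag are hypothetical, and the example runs in dry-run mode so that no endpoint or jar is needed.

```python
import os
import subprocess
import tempfile


def crawl_graph(jar, endpoint, crawl_name, graph_uri, out_dir, dry_run=False):
    """Create out_dir if needed, then run the metadata crawler on one graph."""
    # The crawler treats its last argument as a directory and does not create it.
    os.makedirs(out_dir, exist_ok=True)
    argv = ["java", "-jar", jar, "-gc", endpoint, crawl_name, graph_uri, out_dir]
    if not dry_run:
        subprocess.check_call(argv)  # raises CalledProcessError on a non-zero exit
    return argv


# Dry run in a scratch directory: only the directory creation is exercised.
work_dir = tempfile.mkdtemp()
out_dir = os.path.join(work_dir, "chebi")
argv = crawl_graph(
    jar="../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar",
    endpoint="http://localhost:18893/sparql",
    crawl_name="graph_18893",
    graph_uri="http://microbedb.jp/chebi",
    out_dir=out_dir,
    dry_run=True,
)
print(os.path.isdir(out_dir), " ".join(argv))
```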
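Crawling the whole SPARQL endpoint takes a long time when it also contains large ontologies such as fma
→ Switched to crawling graph by graph, excluding fma, taxonomy, so, sio and the like. The graph list was obtained as follows: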
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -g http://localhost:18895/sparql
Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
. . .
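Selecting the graphs to crawl from that list can be sketched in Python as below. The exclusion set holds only the four names mentioned in the note (the note says "and the like", so the real set was probably longer). Matching on the last path segment of the graph URI is an assumption, and the fma and taxonomy URIs in the sample list are hypothetical; only the assembly, jcm, chebi and refsequence URIs appear on this page.

```python
EXCLUDED_LABELS = {"fma", "taxonomy", "so", "sio"}


def graphs_to_crawl(graph_uris, excluded=EXCLUDED_LABELS):
    """Keep the graph URIs whose last path segment is not in the exclusion set."""
    selected = []
    for uri in graph_uris:
        label = uri.rstrip("/").rsplit("/", 1)[-1]
        if label not in excluded:
            selected.append(uri)
    return selected


sample = [
    "http://microbedb.jp/assembly",
    "http://microbedb.jp/fma",        # hypothetical URI, for illustration only
    "http://microbedb.jp/jcm",
    "http://microbedb.jp/taxonomy",   # hypothetical URI, for illustration only
    "http://microbedb.jp/chebi",
    "http://microbedb.jp/refsequence",
]
for uri in graphs_to_crawl(sample):
    print(uri)
```

Each remaining URI would then be passed to the crawler with -gc, one run per graph.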
The graph that holds the RefSeq genomes also fails, this time with an HTTP 500 from the endpoint
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18895/sparql graph http://microbedb.jp/refsequence graph
Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
Graph: http://microbedb.jp/refsequence 1 / 1
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/graph/turtle_graph_refsequence_1.ttl
HttpException: 500
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:340)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2637)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2586)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.determineSchemaCategory(RDFsCrawlerImpl.java:395)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:266)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
HttpException: 500
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:340)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2637)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2586)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.determineSchemaCategory(RDFsCrawlerImpl.java:395)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:266)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
190810 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
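An unexplained error message was printed when crawling per graph with -gc
ERROR: duplicate PDRD found! (L1144):
→ This error is printed when the Domain and Range of the same property are defined more than once; it is presumably triggered by graphs into which an OWL ontology has been imported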
Error when running build_index on taxonomy.ttl
%umakaparser build_index ontology/taxonomy/taxonomy.ttl --dist test_index
(4, 50300, datetime.timedelta(0, 0, 56585))
(5, 100300, datetime.timedelta(0, 0, 113394))
(6, 150300, datetime.timedelta(0, 0, 181359))
(8, 200300, datetime.timedelta(0, 0, 238139))
.
.
.
(253, 12450300, datetime.timedelta(0, 17, 255336))
(254, 12500300, datetime.timedelta(0, 17, 328418))
Traceback (most recent call last):
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/bin/umakaparser", line 11, in <module>
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 716, in __call__
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 696, in main
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 889, in invoke
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 534, in invoke
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/services.py", line 34, in build_index
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 114, in index_owl
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 50, in separate_large_owl
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 29, in output
IOError: [Errno 24] Too many open files: '/Users/tf/github/metadata/mdb_20171023/tmpgIsfwy/tmp1p1CMl'
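Errno 24 means the process reached its per-process limit on open file descriptors while separate_large_owl was writing temporary files. A possible workaround, not tried in this session, is to raise the soft limit before starting umakaparser. The Python sketch below does that for the current process on Unix-like systems; the helper name raise_open_file_limit and the target value 4096 are assumptions, and whether any limit is high enough for taxonomy.ttl is unknown.

```python
import resource


def raise_open_file_limit(target):
    """Try to raise the soft RLIMIT_NOFILE to target and return the resulting limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit can never exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some systems reject values above a kernel maximum; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = raise_open_file_limit(4096)
print("soft limit:", soft, "hard limit:", hard)
```

Child processes started afterwards from the same Python process inherit the new limit. In an interactive shell, ulimit -n has the same effect for commands started from that shell.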
Error when running build_index on mccv.ttl
%umakaparser build_index ontology/mccv/mccv.ttl --dist test_index
(4, 871, datetime.timedelta(0, 0, 8726))
Traceback (most recent call last):
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/bin/umakaparser", line 11, in <module>
sys.exit(cmd())
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 696, in main
rv = self.invoke(ctx)
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/services.py", line 34, in build_index
output = index_owl(owl_data_ttl, target_properties, dist)
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 127, in index_owl
p.map(output_process, ((prefix, temp_file, output_properties, temp_dir) for temp_file in temp_files))
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-13: ordinal not in range(128)
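The traceback shows non-ASCII text being encoded with the ascii codec under Python 2.7. Which part of mccv.ttl triggers it was not checked; non-ASCII literals such as Japanese labels are one plausible cause. The Python 3 sketch below is a small diagnostic that lists the lines of a file containing non-ASCII characters. The helper name non_ascii_lines and the two sample triples are hypothetical and are not taken from mccv.ttl.

```python
import io
import os
import tempfile


def non_ascii_lines(path, encoding="utf-8"):
    """Return the 1-based numbers of the lines that contain non-ASCII characters."""
    hits = []
    with io.open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            if any(ord(ch) > 127 for ch in line):
                hits.append(number)
    return hits


# Illustrative input: two made-up triples, the second with a Japanese label.
fd, sample_path = tempfile.mkstemp(suffix=".ttl")
os.close(fd)
with io.open(sample_path, "w", encoding="utf-8") as handle:
    handle.write('<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "cell"@en .\n')
    handle.write('<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "細胞"@ja .\n')

found = non_ascii_lines(sample_path)
print(found)
os.remove(sample_path)
```

If the reported lines are only labels, a file without those literals could be tried with build_index to confirm the cause.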
build_index succeeded
%umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
(8, 108788, datetime.timedelta(0, 0, 123357))
/Users/tf/github/metadata/mdb_20171023/tmpXois1v/tmpOyoo_N 10.3637402058
>>> /Users/tf/github/metadata/mdb_20171023/umaka_index
build succeeded
%umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
('classes', 12)
('properties', 27)
>>> umakaviewer_jcm
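Feedback
- In its current state, the tool looks well suited to visualizing class relationship diagrams of roughly tens to a few hundred classes
- To get an overview of the whole SPARQL endpoint, both SPARQL builder/metadata and umakaparser need to be able to handle data that probably has a large number of triples, such as the RefSeq genomes and taxonomy.ttl
- Ontology files can be fetched from the web and are also imported into the SPARQL endpoint, so it would be helpful if build_index were more flexible about where its input comes from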