SPARQLthon61/MicrobeDB.jp-Umakaviewer
From TogoWiki
Work log and feedback from trying umaka-viewer against the MicrobeDB.jp SPARQL endpoint
Procedure
- Run SPARQL builder/metadata
- java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18895/sparql graph_18895 http://microbedb.jp/jcm graph_18895
- Run umaka-viewer's umakaparser build_index
- umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
- Run umaka-viewer's umakaparser build
- umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
- Sign in to https://umaka-viewer.dbcls.jp and upload the file
Confirmed that the result displays: https://umaka-viewer.dbcls.jp/v/jumBPgtanD6t64ACe3yBsfesXncqY_-T
Commands run
metadata help
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Version: 20161212-1
Usage: java org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl [options]
[options]
 1. to print a list of graphURIs
    -g endpointURL
 2. to crawl whole data in the endpoint
    -ac endpointURL crawlName outputFileName
 3. to crawl the specified graph in the endpoint
    -gc endpointURL crawlName graphURI outputFileName
umakaparser help
%umakaparser
Usage: umakaparser [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
  build        Creates model data from metadata conforming to SBM.
  build_index  Creates assets for model-data creation from ontology files.
Work log
Version mismatch
- Running metadata prints Version: 20161212-1
- The umaka-viewer README calls for data conforming to [Sparql Builder Metadata ver.2015]( http://www.sparqlbuilder.org/doc/sbm_2015sep/ )
→ Not a problem; the newer version is only bug fixes (confirmed with Yamaguchi-san)
Unclear where crawlName is used
→ It is used as the prefix of the output file names
outputFileName specifies a directory; if the directory does not exist, the run ends in an error
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -ac http://localhost:18895/sparql mdb mdb_output.txt
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
 Graph: http://microbedb.jp/assembly  1 / 33
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/mdb_output.txt/turtle_mdb_assembly_1.ttl
P properties
http://ddbj.nig.ac.jp/ontologies/nucleotide/dblink
http://www.ncbi.nlm.nih.gov/assembly/refseq_category
http://www.ncbi.nlm.nih.gov/assembly/asm_name
http://www.ncbi.nlm.nih.gov/assembly/assembly_level
http://www.ncbi.nlm.nih.gov/assembly/taxon
. . .
java.io.FileNotFoundException: /Users/tf/github/metadata/mdb_20171023/mdb_output.txt/turtle_mdb_assembly_1.ttl (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at org.sparqlbuilder.metadata.crawler.datastructure.SchemaCategory.write2File(SchemaCategory.java:20)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:270)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:86)
80747 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:86)
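The FileNotFoundException above comes from the last argument being treated as a directory: the crawler writes `turtle_<crawlName>_<graph>_1.ttl` files under it but does not create the directory itself. Creating it first avoids the error (a sketch; `mdb_output` is a hypothetical directory name for illustration):

```shell
# Create the output directory before crawling; the crawler does not create it.
# ("mdb_output" is a hypothetical name for illustration.)
mkdir -p mdb_output
ls -d mdb_output   # confirm the directory exists
# then rerun the crawl, e.g.:
# java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
#   -ac http://localhost:18895/sparql mdb mdb_output
```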
- The same happens per graph
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18893/sparql graph_18893 http://microbedb.jp/chebi chebi
Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
 Graph: http://microbedb.jp/chebi  1 / 1
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/chebi/turtle_graph_18893_chebi_1.ttl
. . .
#EndpointAccess: 161
java.io.FileNotFoundException: /Users/tf/github/metadata/mdb_20171023/chebi/turtle_graph_18893_chebi_1.ttl (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at org.sparqlbuilder.metadata.crawler.datastructure.SchemaCategory.write2File(SchemaCategory.java:20)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:270)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
130204 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
Crawling the whole SPARQL endpoint takes a long time when large ontologies such as fma are included
→ Switched to fetching per graph, excluding fma, taxonomy, so, sio, etc. First, get the list of graphs:
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -g http://localhost:18895/sparql
Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
. . .
The graph containing the RefSeq genomes also errors
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18895/sparql graph http://microbedb.jp/refsequence graph
Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
 Graph: http://microbedb.jp/refsequence  1 / 1
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/graph/turtle_graph_refsequence_1.ttl
HttpException: 500
    at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:340)
    at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
    at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2637)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2586)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.determineSchemaCategory(RDFsCrawlerImpl.java:395)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:266)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
HttpException: 500
    at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:340)
    at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
    at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2637)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2586)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.determineSchemaCategory(RDFsCrawlerImpl.java:395)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:266)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
190810 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
An unknown error was printed when running per graph with -gc
ERROR: duplicate PDRD found! (L1144):
→ This error is printed when Domain and Range are defined repeatedly for the same property; it presumably appears for graphs into which OWL files have been imported
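The situation behind the error can be reproduced with a minimal Turtle file in which a single property declares two domains (the `ex:` names below are made up for illustration):

```shell
# One property with two rdfs:domain declarations -- the duplicated
# Domain/Range pattern described above (ex: names are hypothetical).
cat > dup_domain_demo.ttl <<'EOF'
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .
ex:prop rdfs:domain ex:ClassA .
ex:prop rdfs:domain ex:ClassB .
EOF
grep -c 'rdfs:domain' dup_domain_demo.ttl   # prints 2
```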
Error during build_index of taxonomy.ttl
%umakaparser build_index ontology/taxonomy/taxonomy.ttl --dist test_index
(4, 50300, datetime.timedelta(0, 0, 56585))
(5, 100300, datetime.timedelta(0, 0, 113394))
(6, 150300, datetime.timedelta(0, 0, 181359))
(8, 200300, datetime.timedelta(0, 0, 238139))
. . .
(253, 12450300, datetime.timedelta(0, 17, 255336))
(254, 12500300, datetime.timedelta(0, 17, 328418))
Traceback (most recent call last):
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/bin/umakaparser", line 11, in <module>
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 716, in __call__
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 696, in main
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 889, in invoke
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 534, in invoke
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/services.py", line 34, in build_index
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 114, in index_owl
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 50, in separate_large_owl
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 29, in output
IOError: [Errno 24] Too many open files: '/Users/tf/github/metadata/mdb_20171023/tmpgIsfwy/tmp1p1CMl'
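The `IOError: [Errno 24] Too many open files` suggests the per-process file-descriptor limit was hit while umakaparser split the large taxonomy file into temp files. Raising the soft limit for the session before rerunning is a plausible workaround (a sketch; the value 4096 is an assumption, not a verified requirement):

```shell
# Show the current soft limit on open file descriptors, then try to raise it.
# Raising beyond the hard limit ('ulimit -Hn') fails, hence the guard.
ulimit -n
ulimit -n 4096 2>/dev/null || echo "could not raise limit; check 'ulimit -Hn'"
# then rerun:
# umakaparser build_index ontology/taxonomy/taxonomy.ttl --dist test_index
```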
Error during build_index of mccv.ttl
%umakaparser build_index ontology/mccv/mccv.ttl --dist test_index
(4, 871, datetime.timedelta(0, 0, 8726))
Traceback (most recent call last):
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/bin/umakaparser", line 11, in <module>
    sys.exit(cmd())
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/services.py", line 34, in build_index
    output = index_owl(owl_data_ttl, target_properties, dist)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 127, in index_owl
    p.map(output_process, ((prefix, temp_file, output_properties, temp_dir) for temp_file in temp_files))
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-13: ordinal not in range(128)
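The `UnicodeEncodeError: 'ascii' codec` traceback is Python 2 falling back to its default ASCII codec on the non-ASCII labels in mccv.ttl. A minimal reproduction of the failure mode, plus the common Python 2 workaround of forcing UTF-8 I/O (whether it fixes this particular code path in umakaparser is an assumption):

```shell
# Reproduce the failure mode: the 'ascii' codec cannot encode non-ASCII
# labels (the label below is a stand-in for text in mccv.ttl).
python3 -c '
label = u"\u5fae\u751f\u7269"  # a non-ASCII ontology label
try:
    label.encode("ascii")
except UnicodeEncodeError:
    print("ascii codec fails on non-ASCII labels")
'
# Possible workaround, unverified for umakaparser itself:
# PYTHONIOENCODING=utf-8 umakaparser build_index ontology/mccv/mccv.ttl --dist test_index
```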
build_index succeeded
%umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
(8, 108788, datetime.timedelta(0, 0, 123357))
/Users/tf/github/metadata/mdb_20171023/tmpXois1v/tmpOyoo_N
10.3637402058
>>> /Users/tf/github/metadata/mdb_20171023/umaka_index
build succeeded
%umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
('classes', 12)
('properties', 27)
>>> umakaviewer_jcm
Feedback
- At present it looks well suited to visualizing class-relation diagrams of a few dozen to a few hundred classes
- To get an overview of a whole SPARQL endpoint, both SPARQL builder/metadata and umakaparser need to be able to handle data with presumably very large triple counts, such as the RefSeq genomes and taxonomy.ttl
- Ontology files are fetched from the web and are also imported into the SPARQL endpoint, so it would be nice if build_index handled its input more automatically