SPARQLthon61/MicrobeDB.jp-Umakaviewer

Work log and feedback from a trial run of umaka-viewer against the MicrobeDB.jp SPARQL endpoint

Work procedure

  1. Run SPARQL builder/metadata
  2. Run umaka-viewer/umakaparser build_index
    • umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
  3. Run umaka-viewer/umakaparser build
    • umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
  4. Sign in to https://umaka-viewer.dbcls.jp and upload the file

Confirmed that the result renders: https://umaka-viewer.dbcls.jp/v/jumBPgtanD6t64ACe3yBsfesXncqY_-T
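
For reference, the whole pipeline condensed into one shell sketch. The -gc invocation for the jcm graph is reconstructed from the output file name above (graph URI and output directory are assumptions, not copied verbatim from this log), and step 4 (the upload) is done manually in the browser.

  # 1. crawl one graph with SPARQL builder/metadata
  #    (graph URI and output directory are reconstructed, not taken verbatim from this log)
  mkdir -p graph_18895
  java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
    -gc http://localhost:18895/sparql graph_18895 http://microbedb.jp/jcm graph_18895
  # 2. build assets from the ontology files
  umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
  # 3. build the model data for the viewer
  umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
  # 4. sign in to https://umaka-viewer.dbcls.jp and upload the generated file manually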

Commands executed

metadata help

%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar
  Version: 20161212-1
Usage: java org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl [options]
   [options]
       1. to print a list of graphURIs
            -g endpointURL
       2. to crawl whole data in the endpoint
            -ac endpointURL crawlName outputFileName
       3. to crawl the specified graph in the endpoint
            -gc endpointURL crawlName graphURI outputFileName

umakaparser help

%umakaparser
Usage: umakaparser [OPTIONS] COMMAND [ARGS]...

Options:

 --help  Show this message and exit.

Commands:

 build        Creates model data from metadata conforming to SBM.
 build_index  Creates assets for model-data creation from ontology files.

Work log and notes

The version is different

→ Not a problem; it is a bug-fix update (confirmed with Yamaguchi-san)

It is unclear where crawlName is used

→ It is used as the prefix of the output file names

outputFileName actually specifies a directory; if the directory does not exist, the run fails with an error at the end

%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -ac http://localhost:18895/sparql mdb mdb_output.txt
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
  Graph: http://microbedb.jp/assembly        1 / 33
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/mdb_output.txt/turtle_mdb_assembly_1.ttl
P
properties
http://ddbj.nig.ac.jp/ontologies/nucleotide/dblink
http://www.ncbi.nlm.nih.gov/assembly/refseq_category
http://www.ncbi.nlm.nih.gov/assembly/asm_name
http://www.ncbi.nlm.nih.gov/assembly/assembly_level
http://www.ncbi.nlm.nih.gov/assembly/taxon
.
.
.
java.io.FileNotFoundException: /Users/tf/github/metadata/mdb_20171023/mdb_output.txt/turtle_mdb_assembly_1.ttl (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at org.sparqlbuilder.metadata.crawler.datastructure.SchemaCategory.write2File(SchemaCategory.java:20)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:270)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:86)
80747 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
    at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:86)
  • The same error also occurs when running per graph:
%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18893/sparql graph_18893 http://microbedb.jp/chebi chebi
  Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
  Graph: http://microbedb.jp/chebi        1 / 1
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/chebi/turtle_graph_18893_chebi_1.ttl
.
.
.
#EndpointAccess: 161
java.io.FileNotFoundException: /Users/tf/github/metadata/mdb_20171023/chebi/turtle_graph_18893_chebi_1.ttl (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
	at org.sparqlbuilder.metadata.crawler.datastructure.SchemaCategory.write2File(SchemaCategory.java:20)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:270)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
130204 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
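
A workaround sketch for this: the crawler apparently does not create the output directory itself, so create it beforehand (directory name as in the whole-endpoint run above).

  # create the output directory first, then rerun the crawl
  mkdir -p mdb_output.txt
  java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -ac http://localhost:18895/sparql mdb mdb_output.txt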

When run against the whole SPARQL endpoint, it takes a long time because large ontologies such as fma are also included

→ Switched to fetching per graph, excluding fma, taxonomy, so, sio, etc. First, the graph list is retrieved (a per-graph loop sketch follows the log below):

%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -g http://localhost:18895/sparql
  Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

. . .
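
A sketch of the per-graph loop, assuming the graph URIs printed by -g were saved one per line to graphs.txt (a hypothetical file name) and that the large graphs are filtered out by a rough URI match:

  # crawl every graph except the very large ontologies (fma, taxonomy, so, sio)
  mkdir -p graph_18895
  grep -Ev '(fma|taxonomy|/so$|/sio$)' graphs.txt | while read g; do
    java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
      -gc http://localhost:18895/sparql graph_18895 "$g" graph_18895
  done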

The graph containing the RefSeq genomes also errors

%java -jar ../metadata-0.0.1-SNAPSHOT-jar-with-dependencies.jar -gc http://localhost:18895/sparql graph http://microbedb.jp/refsequence graph
  Version: 20161212-1
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
-----------------------------------------------------------
  Graph: http://microbedb.jp/refsequence        1 / 1
-----------------------------------------------------------
/Users/tf/github/metadata/mdb_20171023/graph/turtle_graph_refsequence_1.ttl
HttpException: 500
	at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:340)
	at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
	at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2637)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2586)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.determineSchemaCategory(RDFsCrawlerImpl.java:395)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:266)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
HttpException: 500
	at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:340)
	at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
	at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2637)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.getRDFProperties(RDFsCrawlerImpl.java:2586)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.determineSchemaCategory(RDFsCrawlerImpl.java:395)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:266)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
190810 msec.
Error occured.
Exception in thread "main" java.lang.Exception: Error occured s(341)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.crawl(RDFsCrawlerImpl.java:286)
	at org.sparqlbuilder.metadata.crawler.sparql.RDFsCrawlerImpl.main(RDFsCrawlerImpl.java:103)
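
The 500 presumably comes from the endpoint itself; a quick sketch to gauge the size of the graph, assuming the endpoint accepts GET requests with a query parameter (a count on a graph this large may itself time out):

  curl -G 'http://localhost:18895/sparql' -H 'Accept: text/csv' \
    --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) FROM <http://microbedb.jp/refsequence> WHERE { ?s ?p ?o }'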

An unexplained error message was printed when running per graph with -gc

ERROR: duplicate PDRD found! (L1144): 

→ This error is printed when domain and range are declared repeatedly for the same property; it is presumably emitted for graphs into which OWL ontologies have been imported
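
A sketch for locating such properties: list the properties that declare more than one rdfs:domain or rdfs:range in a graph (the graph URI is just an example; assumes the endpoint accepts GET requests with a query parameter):

  curl -G 'http://localhost:18895/sparql' -H 'Accept: text/csv' \
    --data-urlencode 'query=
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?p (COUNT(DISTINCT ?d) AS ?domains) (COUNT(DISTINCT ?r) AS ?ranges)
  FROM <http://microbedb.jp/chebi>
  WHERE { ?p rdfs:domain ?d . OPTIONAL { ?p rdfs:range ?r } }
  GROUP BY ?p
  HAVING (COUNT(DISTINCT ?d) > 1 || COUNT(DISTINCT ?r) > 1)'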

Error during build_index on taxonomy.ttl

%umakaparser build_index ontology/taxonomy/taxonomy.ttl --dist test_index
(4, 50300, datetime.timedelta(0, 0, 56585))
(5, 100300, datetime.timedelta(0, 0, 113394))
(6, 150300, datetime.timedelta(0, 0, 181359))
(8, 200300, datetime.timedelta(0, 0, 238139))
.
.
.
(253, 12450300, datetime.timedelta(0, 17, 255336))
(254, 12500300, datetime.timedelta(0, 17, 328418))
Traceback (most recent call last):
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/bin/umakaparser", line 11, in <module>
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 716, in __call__
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 696, in main
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 889, in invoke
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 534, in invoke
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/services.py", line 34, in build_index
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 114, in index_owl
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 50, in separate_large_owl
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 29, in output
IOError: [Errno 24] Too many open files: '/Users/tf/github/metadata/mdb_20171023/tmpgIsfwy/tmp1p1CMl'
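
Errno 24 is the per-process limit on open file descriptors; a workaround sketch is to raise the limit for the shell session before rerunning (the default on macOS is often 256; whether that was the limit on this machine is an assumption):

  ulimit -n          # check the current limit
  ulimit -n 4096     # raise it for this shell session (a smaller value may be needed, depending on the hard limit)
  umakaparser build_index ontology/taxonomy/taxonomy.ttl --dist test_index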

Error during build_index on mccv.ttl

%umakaparser build_index ontology/mccv/mccv.ttl --dist test_index
(4, 871, datetime.timedelta(0, 0, 8726))
Traceback (most recent call last):
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/bin/umakaparser", line 11, in <module>
    sys.exit(cmd())
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/services.py", line 34, in build_index
    output = index_owl(owl_data_ttl, target_properties, dist)
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/site-packages/umakaviewer/scripts/services/assets.py", line 127, in index_owl
    p.map(output_process, ((prefix, temp_file, output_properties, temp_dir) for temp_file in temp_files))
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/tf/.anyenv/envs/pyenv/versions/2.7.10/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-13: ordinal not in range(128)
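
The UnicodeEncodeError is Python 2 falling back to the ascii codec for non-ASCII text in mccv.ttl; a workaround sketch, assuming the encoding goes through the default locale, is to force a UTF-8 environment before rerunning (this may not help if the encode happens inside the library itself):

  export LC_ALL=en_US.UTF-8
  export PYTHONIOENCODING=utf-8
  umakaparser build_index ontology/mccv/mccv.ttl --dist test_index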

build_index succeeded

%umakaparser build_index sio.ttl ontology/insdc/nucleotide.ttl MEO/meo.ttl so.ttl --dist umaka_index
(8, 108788, datetime.timedelta(0, 0, 123357))
/Users/tf/github/metadata/mdb_20171023/tmpXois1v/tmpOyoo_N
10.3637402058
>>> /Users/tf/github/metadata/mdb_20171023/umaka_index

build succeeded

%umakaparser build graph_18895/turtle_graph_18895_jcm_1.ttl --assets umaka_index --dist umakaviewer_jcm
('classes', 12)
('properties', 27)
>>> umakaviewer_jcm

Feedback

  • At present it looks well suited to visualizing class-relationship diagrams of a few dozen to a few hundred classes
  • To get an overview of the whole SPARQL endpoint, both SPARQL builder/metadata and umakaparser need to be able to handle the datasets with presumably very large numbers of triples, such as the RefSeq genomes and taxonomy.ttl
  • Since the ontology files can be fetched from the web and are also imported into the SPARQL endpoint, it would be nice if build_index took care of locating its input a little more automatically