DiseaseID

From TogoWiki

Overview

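  • Diseases are currently mapped with a variety of ID systems (e.g. SNOMED, OMIM, Orphanet, MedDRA, MeSH, ...).
    • Each ID system has its own characteristics and its own perspective, so merging them into a single system is difficult. Ideally, therefore, having any one of these IDs attached to a record should be enough to retrieve all related disease data.
    • In practice, however, licensing, versioning, and platform issues make it hard to convert everything to RDF and maintain it continuously.
    • As a first step toward mapping research data to disease IDs at least partially and presenting them together, we decided to convert ICD-10 to RDF (by correcting the RDF from BioPortal) and to map it to UMLS CUIs.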

ICD-10

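  • ICD-10 is the International Statistical Classification of Diseases and Related Health Problems, 10th Revision.
    • It is widely used in clinical practice and health statistics, and is convenient for viewing diseases systematically.
    • Because it was originally created as a classification of causes of death, treating each ID as a disease is sometimes problematic; for example, diarrhea and enteritis share the same ID.
    • ICD-11 is scheduled to be released next year (as of this writing in 2014).
    • ICD-11 is expected to fix weaknesses of ICD-10 and reportedly adds information on traditional medicine.
    • Still, just as ICD-9 remained in use for a while after ICD-10 was published, ICD-10 should remain in use for some time, so converting it to RDF appears worthwhile.
  • Reference URLs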

UMLS CUI

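  • UMLS (Unified Medical Language System) is an integrated medical terminology system developed by the U.S. National Library of Medicine.
  • UMLS editors apparently group the vocabulary from each source (the separate disease ID systems) by concept and assign each concept a unique ID.
    • That concept ID is the CUI; for a disease, it gathers the names of that disease concept. Apparently, a normalized name is called a Term and is identified by an LUI, each individual character string is a String and gets an SUI, and a string further distinguished by its original resource (MeSH, ICD-10, MedDRA, etc.) is called an Atom and is identified by an AUI.
    • "Using SNOMED CT with the UMLS" among the reference URLs explains this in detail (see around page 44).
  • Reference URLs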
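
The CUI / LUI / SUI / AUI layering described above can be pictured as nested records. The following is only a minimal sketch of that reading, not the UMLS data model itself; every class name, identifier, and code in it is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    """A name as it occurs in one source vocabulary (AUI level)."""
    aui: str
    source: str  # label of the source vocabulary (hypothetical labels below)
    code: str    # code used inside that source


@dataclass
class StringForm:
    """One exact character string (SUI level)."""
    sui: str
    text: str
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Term:
    """A normalized name grouping string variants (LUI level)."""
    lui: str
    strings: List[StringForm] = field(default_factory=list)


@dataclass
class Concept:
    """A concept grouping all names with the same meaning (CUI level)."""
    cui: str
    terms: List[Term] = field(default_factory=list)

    def codes_in(self, source: str) -> List[str]:
        """Collect the codes this concept has in one source vocabulary."""
        return sorted({
            atom.code
            for term in self.terms
            for string in term.strings
            for atom in string.atoms
            if atom.source == source
        })


# Hypothetical identifiers and codes, for illustration only.
concept = Concept("C0000001", [
    Term("L0000001", [
        StringForm("S0000001", "Example disease", [
            Atom("A0000001", "ICD10", "ICD-CODE-1"),
            Atom("A0000002", "MSH", "MESH-CODE-1"),
        ]),
        StringForm("S0000002", "example disease", [
            Atom("A0000003", "MDR", "MEDDRA-CODE-1"),
        ]),
    ]),
])
print(concept.codes_in("ICD10"))
```

Walking down from the concept to its atoms is what lets one CUI act as a hub between ICD-10, MeSH, MedDRA, and other source codes.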

Why ICD-10 to UMLS CUI?

  • Many Bio2RDF entries are mapped to UMLS CUIs, but it is hard to view them from the higher-level disease concepts.
    • Example: SIDER entries (the entry ID of each case is itself a UMLS CUI) -- http://bit.ly/1xu8NZY

Progress

2014-10-27

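  • Corrections to the ICD-10 RDF from BioPortal
    • A sample entry is shown below.
    • For convenience, bf:classification "small" is attached to the lowest-level (leaf) entries. (This may change.)
    • Just in case, the values below are slightly faked (masked with *).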
@prefix bf:    <http://bibframe.org/vocab/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix umls:  <http://bioportal.bioontology.org/ontologies/umls/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .

<http://purl.bioontology.org/ontology/ICD10/Q3*.*>
        a                  owl:Class ;
        rdfs:subClassOf    <http://purl.bioontology.org/ontology/ICD10/Q3*> ;
        bf:classification  "small" ;
        umls:cui           "C04******" ;
        umls:hasSTY        <http://purl.bioontology.org/ontology/STY/T0**> ;
        umls:tui           "T0**" ;
        skos:notation      "Q3*.*" ;
        skos:prefLabel     "... lung"@en .


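  • The entry first declares the class and states that it is a subclass of its parent class. It then gives the UMLS CUI and TUI, the ID in skos:notation, and the actual disease name in skos:prefLabel.
  • Items to check
    • First, confirm that the four-character entries are complete.
    • Work out how to absorb partial discrepancies in the hierarchy.
    • As a rule, use the 2010 version as the reference.
    • Ignore relationships between IDs other than the hierarchy for now.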
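
As a rough illustration of the entry layout described above, the sketch below builds one leaf entry as Turtle text. It assumes that the parent of a four-character code is simply the part before the dot, which may not hold where the hierarchy is partially shifted. The function names and the CUI, TUI, and label values are placeholders introduced here, and the @prefix lines from the sample entry would still have to precede the output.

```python
ICD10_BASE = "http://purl.bioontology.org/ontology/ICD10/"
STY_BASE = "http://purl.bioontology.org/ontology/STY/"


def parent_code(code: str) -> str:
    """Return the three-character category of a four-character code.

    Assumption: the parent is the part before the dot.
    """
    return code.split(".", 1)[0]


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def leaf_entry_turtle(code: str, cui: str, tui: str, label: str) -> str:
    """Build one leaf-entry description shaped like the sample entry."""
    lines = [
        f"<{ICD10_BASE}{code}>",
        "        a                  owl:Class ;",
        f"        rdfs:subClassOf    <{ICD10_BASE}{parent_code(code)}> ;",
        '        bf:classification  "small" ;',
        f'        umls:cui           "{cui}" ;',
        f"        umls:hasSTY        <{STY_BASE}{tui}> ;",
        f'        umls:tui           "{tui}" ;',
        f'        skos:notation      "{code}" ;',
        f'        skos:prefLabel     "{escape_literal(label)}"@en .',
    ]
    return "\n".join(lines)


# Placeholder values only; they do not assert a real ICD-10 to CUI mapping.
entry = leaf_entry_turtle("A09.0", "C0000000", "T000", "example label")
print(entry)
```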


2014-10-28

  • Taking the difference between the leaf IDs in the XML data downloaded from WHO and those in the RDF data from BioPortal gave the following.
# WHO only: 122 IDs
["A09.0", "A09.9", "B17.9", "B33.4", "B98.0", "B98.1", "C79.9", "C80.0", "C80.9", "C81.4", "C82.3", "C82.4", "C82.5", "C82.6", "C84.6", "C84.7", "C84.8", "C84.9", "C85.2", "C86.0", "C86.1", "C86.2", "C86.3", "C86.4", "C86.5", "C86.6", "C88.4", "C90.3", "C91.6", "C91.8", "C92.6", "C92.8", "C93.3", "C94.6", "C96.4", "C96.5", "C96.6", "C96.8", "D46.5", "D46.6", "D47.4", "D47.5", "D68.5", "D68.6", "D89.3", "E88.3", "G21.4", "G90.4", "H54.9", "I27.2", "I72.5", "I98.3", "J12.3", "J21.1", "K12.3", "K22.7", "K35.2", "K35.3", "K35.8", "K52.3", "K85.0", "K85.1", "K85.2", "K85.3", "K85.8", "K85.9", "L89.0", "L89.1", "L89.2", "L89.3", "L89.9", "M31.7", "M72.6", "M79.7", "N18.1", "N18.2", "N18.3", "N18.4", "N18.5", "N42.3", "O14.2", "O43.2", "O60.0", "O60.1", "O60.2", "O60.3", "O96.0", "O96.1", "O96.9", "O97.0", "O97.1", "O97.9", "O98.7", "P91.6", "Q31.5", "R26.3", "R29.6", "R50.2", "R50.8", "R57.2", "R63.6", "R65.0", "R65.1", "R65.2", "R65.3", "R65.9", "X34.0", "X34.1", "X34.8", "X34.9", "X59.0", "X59.9", "Z58.7", "Z92.6", "U04.9", "U80.0", "U80.1", "U80.8", "U81.0", "U81.8", "U89.8", "U89.9"]
# BioPortal only: 1084 IDs
["C83.2", "C83.4", "C83.6", "C84.2", "C84.3", "C85.0", "C88.1", "C91.2", "C93.2", "C94.1", "C94.5", "C95.2", "C96.1", "C96.3", "D46.3", "D75.2", "D76.0", "E10.0", "E10.1", "E10.2", "E10.3", "E10.4", "E10.5", "E10.6", "E10.7", "E10.8", "E10.9", "E11.0", "E11.1", "E11.2", "E11.3", "E11.4", "E11.5", "E11.6", "E11.7", "E11.8", "E11.9", "E12.0", "E12.1", "E12.2", "E12.3", "E12.4", "E12.5", "E12.6", "E12.7", "E12.8", "E12.9", "E13.0", "E13.1", "E13.2", "E13.3", "E13.4", "E13.5", "E13.6", "E13.7", "E13.8", "E13.9", "E14.0", "E14.1", "E14.2", "E14.3", "E14.4", "E14.5", "E14.6", "E14.7", "E14.8", "E14.9", "F10.0", "F10.1", "F10.2", "F10.3", "F10.4", "F10.5", "F10.6", "F10.7", "F10.8", "F10.9", "F11.0", "F11.1", "F11.2", "F11.3", "F11.4", "F11.5", "F11.6", "F11.7", "F11.8", "F11.9", "F12.0", "F12.1", "F12.2", "F12.3", "F12.4", "F12.5", "F12.6", "F12.7", "F12.8", "F12.9", "F13.0", "F13.1", "F13.2", "F13.3", "F13.4", "F13.5", "F13.6", "F13.7", "F13.8", "F13.9", "F14.0", "F14.1", "F14.2", "F14.3", "F14.4", "F14.5", "F14.6", "F14.7", "F14.8", "F14.9", "F15.0", "F15.1", "F15.2", "F15.3", "F15.4", "F15.5", "F15.6", "F15.7", "F15.8", "F15.9", "F16.0", "F16.1", "F16.2", "F16.3", "F16.4", "F16.5", "F16.6", "F16.7", "F16.8", "F16.9", "F17.0", "F17.1", "F17.2", "F17.3", "F17.4", "F17.5", "F17.6", "F17.7", "F17.8", "F17.9", "F18.0", "F18.1", "F18.2", "F18.3", "F18.4", "F18.5", "F18.6", "F18.7", "F18.8", "F18.9", "F19.0", "F19.1", "F19.2", "F19.3", "F19.4", "F19.5", "F19.6", "F19.7", "F19.8", "F19.9", "F70.0", "F70.1", "F70.8", "F70.9", "F71.0", "F71.1", "F71.8", "F71.9", "F72.0", "F72.1", "F72.8", "F72.9", "F73.0", "F73.1", "F73.8", "F73.9", "F78.0", "F78.1", "F78.8", "F78.9", "F79.0", "F79.1", "F79.8", "F79.9", "H54.7", "K25.0", "K25.1", "K25.2", "K25.3", "K25.4", "K25.5", "K25.6", "K25.7", "K25.9", "K26.0", "K26.1", "K26.2", "K26.3", "K26.4", "K26.5", "K26.6", "K26.7", "K26.9", "K27.0", "K27.1", "K27.2", "K27.3", "K27.4", "K27.5", "K27.6", "K27.7", "K27.9", "K28.0", "K28.1", "K28.2", 
"K28.3", "K28.4", "K28.5", "K28.6", "K28.7", "K28.9", "K35.0", "K35.1", "K35.9", "K51.1", "M72.3", "M72.5", "N00.0", "N00.1", "N00.2", "N00.3", "N00.4", "N00.5", "N00.6", "N00.7", "N00.8", "N00.9", "N01.0", "N01.1", "N01.2", "N01.3", "N01.4", "N01.5", "N01.6", "N01.7", "N01.8", "N01.9", "N02.0", "N02.1", "N02.2", "N02.3", "N02.4", "N02.5", "N02.6", "N02.7", "N02.8", "N02.9", "N03.0", "N03.1", "N03.2", "N03.3", "N03.4", "N03.5", "N03.6", "N03.7", "N03.8", "N03.9", "N04.0", "N04.1", "N04.2", "N04.3", "N04.4", "N04.5", "N04.6", "N04.7", "N04.8", "N04.9", "N05.0", "N05.1", "N05.2", "N05.3", "N05.4", "N05.5", "N05.6", "N05.7", "N05.8", "N05.9", "N06.0", "N06.1", "N06.2", "N06.3", "N06.4", "N06.5", "N06.6", "N06.7", "N06.8", "N06.9", "N07.0", "N07.1", "N07.2", "N07.3", "N07.4", "N07.5", "N07.6", "N07.7", "N07.8", "N07.9", "N18.0", "N18.8", "O03.0", "O03.1", "O03.2", "O03.3", "O03.4", "O03.5", "O03.6", "O03.7", "O03.8", "O03.9", "O04.0", "O04.1", "O04.2", "O04.3", "O04.4", "O04.5", "O04.6", "O04.7", "O04.8", "O04.9", "O05.0", "O05.1", "O05.2", "O05.3", "O05.4", "O05.5", "O05.6", "O05.7", "O05.8", "O05.9", "O06.0", "O06.1", "O06.2", "O06.3", "O06.4", "O06.5", "O06.6", "O06.7", "O06.8", "O06.9", "Q31.4", "Q35.6", "R50.0", "R50.1", "R83.0", "R83.1", "R83.2", "R83.3", "R83.4", "R83.5", "R83.6", "R83.7", "R83.8", "R83.9", "R84.0", "R84.1", "R84.2", "R84.3", "R84.4", "R84.5", "R84.6", "R84.7", "R84.8", "R84.9", "R85.0", "R85.1", "R85.2", "R85.3", "R85.4", "R85.5", "R85.6", "R85.7", "R85.8", "R85.9", "R86.0", "R86.1", "R86.2", "R86.3", "R86.4", "R86.5", "R86.6", "R86.7", "R86.8", "R86.9", "R87.0", "R87.1", "R87.2", "R87.3", "R87.4", "R87.5", "R87.6", "R87.7", "R87.8", "R87.9", "R89.0", "R89.1", "R89.2", "R89.3", "R89.4", "R89.5", "R89.6", "R89.7", "R89.8", "R89.9", "V01.0", "V01.1", "V01.9", "V02.0", "V02.1", "V02.9", "V03.0", "V03.1", "V03.9", "V04.0", "V04.1", "V04.9", "V05.0", "V05.1", "V05.9", "V06.0", "V06.1", "V06.9", "V10.0", "V10.1", "V10.2", "V10.3", "V10.4", "V10.5", 
"V10.9", "V11.0", "V11.1", "V11.2", "V11.3", "V11.4", "V11.5", "V11.9", "V12.0", "V12.1", "V12.2", "V12.3", "V12.4", "V12.5", "V12.9", "V13.0", "V13.1", "V13.2", "V13.3", "V13.4", "V13.5", "V13.9", "V14.0", "V14.1", "V14.2", "V14.3", "V14.4", "V14.5", "V14.9", "V15.0", "V15.1", "V15.2", "V15.3", "V15.4", "V15.5", "V15.9", "V16.0", "V16.1", "V16.2", "V16.3", "V16.4", "V16.5", "V16.9", "V17.0", "V17.1", "V17.2", "V17.3", "V17.4", "V17.5", "V17.9", "V18.0", "V18.1", "V18.2", "V18.3", "V18.4", "V18.5", "V18.9", "V20.0", "V20.1", "V20.2", "V20.3", "V20.4", "V20.5", "V20.9", "V21.0", "V21.1", "V21.2", "V21.3", "V21.4", "V21.5", "V21.9", "V22.0", "V22.1", "V22.2", "V22.3", "V22.4", "V22.5", "V22.9", "V23.0", "V23.1", "V23.2", "V23.3", "V23.4", "V23.5", "V23.9", "V24.0", "V24.1", "V24.2", "V24.3", "V24.4", "V24.5", "V24.9", "V25.0", "V25.1", "V25.2", "V25.3", "V25.4", "V25.5", "V25.9", "V26.0", "V26.1", "V26.2", "V26.3", "V26.4", "V26.5", "V26.9", "V27.0", "V27.1", "V27.2", "V27.3", "V27.4", "V27.5", "V27.9", "V28.0", "V28.1", "V28.2", "V28.3", "V28.4", "V28.5", "V28.9", "V30.0", "V30.1", "V30.2", "V30.3", "V30.4", "V30.5", "V30.6", "V30.7", "V30.9", "V31.0", "V31.1", "V31.2", "V31.3", "V31.4", "V31.5", "V31.6", "V31.7", "V31.9", "V32.0", "V32.1", "V32.2", "V32.3", "V32.4", "V32.5", "V32.6", "V32.7", "V32.9", "V33.0", "V33.1", "V33.2", "V33.3", "V33.4", "V33.5", "V33.6", "V33.7", "V33.9", "V34.0", "V34.1", "V34.2", "V34.3", "V34.4", "V34.5", "V34.6", "V34.7", "V34.9", "V35.0", "V35.1", "V35.2", "V35.3", "V35.4", "V35.5", "V35.6", "V35.7", "V35.9", "V36.0", "V36.1", "V36.2", "V36.3", "V36.4", "V36.5", "V36.6", "V36.7", "V36.9", "V37.0", "V37.1", "V37.2", "V37.3", "V37.4", "V37.5", "V37.6", "V37.7", "V37.9", "V38.0", "V38.1", "V38.2", "V38.3", "V38.4", "V38.5", "V38.6", "V38.7", "V38.9", "V40.0", "V40.1", "V40.2", "V40.3", "V40.4", "V40.5", "V40.6", "V40.7", "V40.9", "V41.0", "V41.1", "V41.2", "V41.3", "V41.4", "V41.5", "V41.6", "V41.7", "V41.9", "V42.0", "V42.1", "V42.2", 
"V42.3", "V42.4", "V42.5", "V42.6", "V42.7", "V42.9", "V43.0", "V43.1", "V43.2", "V43.3", "V43.4", "V43.5", "V43.6", "V43.7", "V43.9", "V44.0", "V44.1", "V44.2", "V44.3", "V44.4", "V44.5", "V44.6", "V44.7", "V44.9", "V45.0", "V45.1", "V45.2", "V45.3", "V45.4", "V45.5", "V45.6", "V45.7", "V45.9", "V46.0", "V46.1", "V46.2", "V46.3", "V46.4", "V46.5", "V46.6", "V46.7", "V46.9", "V47.0", "V47.1", "V47.2", "V47.3", "V47.4", "V47.5", "V47.6", "V47.7", "V47.9", "V48.0", "V48.1", "V48.2", "V48.3", "V48.4", "V48.5", "V48.6", "V48.7", "V48.9", "V50.0", "V50.1", "V50.2", "V50.3", "V50.4", "V50.5", "V50.6", "V50.7", "V50.9", "V51.0", "V51.1", "V51.2", "V51.3", "V51.4", "V51.5", "V51.6", "V51.7", "V51.9", "V52.0", "V52.1", "V52.2", "V52.3", "V52.4", "V52.5", "V52.6", "V52.7", "V52.9", "V53.0", "V53.1", "V53.2", "V53.3", "V53.4", "V53.5", "V53.6", "V53.7", "V53.9", "V54.0", "V54.1", "V54.2", "V54.3", "V54.4", "V54.5", "V54.6", "V54.7", "V54.9", "V55.0", "V55.1", "V55.2", "V55.3", "V55.4", "V55.5", "V55.6", "V55.7", "V55.9", "V56.0", "V56.1", "V56.2", "V56.3", "V56.4", "V56.5", "V56.6", "V56.7", "V56.9", "V57.0", "V57.1", "V57.2", "V57.3", "V57.4", "V57.5", "V57.6", "V57.7", "V57.9", "V58.0", "V58.1", "V58.2", "V58.3", "V58.4", "V58.5", "V58.6", "V58.7", "V58.9", "V60.0", "V60.1", "V60.2", "V60.3", "V60.4", "V60.5", "V60.6", "V60.7", "V60.9", "V61.0", "V61.1", "V61.2", "V61.3", "V61.4", "V61.5", "V61.6", "V61.7", "V61.9", "V62.0", "V62.1", "V62.2", "V62.3", "V62.4", "V62.5", "V62.6", "V62.7", "V62.9", "V63.0", "V63.1", "V63.2", "V63.3", "V63.4", "V63.5", "V63.6", "V63.7", "V63.9", "V64.0", "V64.1", "V64.2", "V64.3", "V64.4", "V64.5", "V64.6", "V64.7", "V64.9", "V65.0", "V65.1", "V65.2", "V65.3", "V65.4", "V65.5", "V65.6", "V65.7", "V65.9", "V66.0", "V66.1", "V66.2", "V66.3", "V66.4", "V66.5", "V66.6", "V66.7", "V66.9", "V67.0", "V67.1", "V67.2", "V67.3", "V67.4", "V67.5", "V67.6", "V67.7", "V67.9", "V68.0", "V68.1", "V68.2", "V68.3", "V68.4", "V68.5", "V68.6", "V68.7", "V68.9", 
"V70.0", "V70.1", "V70.2", "V70.3", "V70.4", "V70.5", "V70.6", "V70.7", "V70.9", "V71.0", "V71.1", "V71.2", "V71.3", "V71.4", "V71.5", "V71.6", "V71.7", "V71.9", "V72.0", "V72.1", "V72.2", "V72.3", "V72.4", "V72.5", "V72.6", "V72.7", "V72.9", "V73.0", "V73.1", "V73.2", "V73.3", "V73.4", "V73.5", "V73.6", "V73.7", "V73.9", "V74.0", "V74.1", "V74.2", "V74.3", "V74.4", "V74.5", "V74.6", "V74.7", "V74.9", "V75.0", "V75.1", "V75.2", "V75.3", "V75.4", "V75.5", "V75.6", "V75.7", "V75.9", "V76.0", "V76.1", "V76.2", "V76.3", "V76.4", "V76.5", "V76.6", "V76.7", "V76.9", "V77.0", "V77.1", "V77.2", "V77.3", "V77.4", "V77.5", "V77.6", "V77.7", "V77.9", "V78.0", "V78.1", "V78.2", "V78.3", "V78.4", "V78.5", "V78.6", "V78.7", "V78.9", "V90.0", "V90.1", "V90.2", "V90.3", "V90.4", "V90.5", "V90.6", "V90.7", "V90.8", "V90.9", "V91.0", "V91.1", "V91.2", "V91.3", "V91.4", "V91.5", "V91.6", "V91.7", "V91.8", "V91.9", "V92.0", "V92.1", "V92.2", "V92.3", "V92.4", "V92.5", "V92.6", "V92.7", "V92.8", "V92.9", "V93.0", "V93.1", "V93.2", "V93.3", "V93.4", "V93.5", "V93.6", "V93.7", "V93.8", "V93.9", "V94.0", "V94.1", "V94.2", "V94.3", "V94.4", "V94.5", "V94.6", "V94.7", "V94.8", "V94.9", "Y70.0", "Y70.1", "Y70.2", "Y70.3", "Y70.8", "Y71.0", "Y71.1", "Y71.2", "Y71.3", "Y71.8", "Y72.0", "Y72.1", "Y72.2", "Y72.3", "Y72.8", "Y73.0", "Y73.1", "Y73.2", "Y73.3", "Y73.8", "Y74.0", "Y74.1", "Y74.2", "Y74.3", "Y74.8", "Y75.0", "Y75.1", "Y75.2", "Y75.3", "Y75.8", "Y76.0", "Y76.1", "Y76.2", "Y76.3", "Y76.8", "Y77.0", "Y77.1", "Y77.2", "Y77.3", "Y77.8", "Y78.0", "Y78.1", "Y78.2", "Y78.3", "Y78.8", "Y79.0", "Y79.1", "Y79.2", "Y79.3", "Y79.8", "Y80.0", "Y80.1", "Y80.2", "Y80.3", "Y80.8", "Y81.0", "Y81.1", "Y81.2", "Y81.3", "Y81.8", "Y82.0", "Y82.1", "Y82.2", "Y82.3", "Y82.8"]
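  • As the lists above show (should they be line-wrapped?), neither set contains the other. Where does this difference come from?
    • The reason is not entirely clear (this was not checked exhaustively), but the WHO-only IDs appear to have been added in a newer version (IDs absent in 2003 but present in 2010), and the BioPortal-only IDs appear to have been removed in a version update.
      • Reference URL (PDF) -- http://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf?ua=1
      • Some entries have probably been mapped to IDs without regard to this difference, so provisionally, WHO-only IDs get rdfs:comment "ICD-10_2010_version" and BioPortal-only IDs get rdfs:comment "ICD-10_old_version".
      • → Decided to use dcterms:hasVersion instead, as in <http://purl.org/dc/terms/hasVersion> "2003" (advice from Kawashima-san).
      • Will consider turning the "2010" part into a URI if that looks feasible (advice from Katayama-san).
    • Once these annotations are no longer needed, they can be deleted wholesale by targeting this property.
    • Concretely, for the BioPortal-only IDs, triples like the following are created separately, saved as a ttl file, and uploaded to the triple store.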
@prefix dcterms:  <http://purl.org/dc/terms/> .

<http://purl.bioontology.org/ontology/ICD10/C**.2> dcterms:hasVersion "2003" .
<http://purl.bioontology.org/ontology/ICD10/C**.3> dcterms:hasVersion "2003" .
....
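
The comparison and the version annotation described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the script actually used: the function names are made up here, and the input lists are toy data. A09.0, A09.9, and C83.2 are taken from the lists above, while A00.0 is merely assumed to be present in both sources.

```python
ICD10_BASE = "http://purl.bioontology.org/ontology/ICD10/"
DCTERMS_PREFIX = "@prefix dcterms: <http://purl.org/dc/terms/> ."


def leaf_id_diff(who_ids, bioportal_ids):
    """Return (WHO-only, BioPortal-only) leaf IDs as sorted lists."""
    who, bioportal = set(who_ids), set(bioportal_ids)
    return sorted(who - bioportal), sorted(bioportal - who)


def version_triples(codes, version):
    """Build a small Turtle document tagging each code with a version."""
    lines = [DCTERMS_PREFIX, ""]
    for code in codes:
        lines.append(f'<{ICD10_BASE}{code}> dcterms:hasVersion "{version}" .')
    return "\n".join(lines) + "\n"


# Toy input; the real lists would come from the WHO XML and the BioPortal RDF.
who_ids = ["A00.0", "A09.0", "A09.9"]
bioportal_ids = ["A00.0", "C83.2"]

who_only, bioportal_only = leaf_id_diff(who_ids, bioportal_ids)
ttl = version_triples(bioportal_only, "2003")
print(ttl)
```

Keeping the version triples in their own file means they can later be removed from the triple store without touching the main ICD-10 data.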

2014-10-29 & 30

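  • After painstaking manual mapping, 93 of the 122 entries (the WHO-only IDs above) could be mapped.
    • Of these, 71 could be matched through their ICD-10 IDs, and 22 were mapped by string alone.
    • Entries whose UMLS CUI was not found, or whose mapping is doubtful, will be annotated with e.g. rdfs:comment "Unfound UMLS CUI".
      • A fair number could not be mapped.
      • ↑ These were services rather than diseases, victims of disasters, codes for special purposes, and so on.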
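
The two-stage procedure above (match by ICD-10 ID first, fall back to the label string, and flag whatever remains) might be sketched like this. The mapping here was done by hand, so this is only an illustration of the logic; the lookup tables, function names, codes, and CUIs are all hypothetical.

```python
from collections import Counter

ICD10_BASE = "http://purl.bioontology.org/ontology/ICD10/"


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def map_entries(entries, cui_by_code, cui_by_label):
    """Map (code, label) pairs to CUIs, recording how each match was made."""
    mapped = {}
    for code, label in entries:
        if code in cui_by_code:
            mapped[code] = (cui_by_code[code], "code")
        elif normalize(label) in cui_by_label:
            mapped[code] = (cui_by_label[normalize(label)], "string")
        else:
            mapped[code] = (None, "unfound")
    return mapped


def unfound_comments(mapped):
    """Build rdfs:comment triples for entries without a CUI."""
    return [
        f'<{ICD10_BASE}{code}> rdfs:comment "Unfound UMLS CUI" .'
        for code, (cui, _how) in sorted(mapped.items())
        if cui is None
    ]


# Hypothetical codes, labels, and CUIs, for illustration only.
entries = [
    ("X1", "Example Disease A"),
    ("X2", "Example  disease B"),
    ("X3", "Example service C"),
]
cui_by_code = {"X1": "C0000001"}
cui_by_label = {"example disease b": "C0000002"}

mapped = map_entries(entries, cui_by_code, cui_by_label)
summary = Counter(how for _cui, how in mapped.values())
comments = unfound_comments(mapped)
print(dict(summary))
```

Recording how each match was made keeps the string-only matches easy to review later, since they are the less reliable ones.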

I found some useful material during this survey, so the links are posted here.

Other comments

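  • For the four-character codes, all that remains is the RDF conversion. Codes at the three-character level and above will be addressed later.
  • Five-character codes are pending and will be addressed as needed.
  • In clinical practice in Japan, the 2003 version of ICD-10 appears to be in use.
  • The KEGG BRITE data (including the Japanese data) reportedly conforms to the 2010 version.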

OMIM to ICD-10

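  • Having heard that a mapping from OMIM to ICD-10 would be useful, I examined the Bio2RDF RDF.
  • The XML downloadable from Orphanet also contains OMIM and ICD-10 data linked to Orphanet IDs, so I checked whether the same can be obtained through the SPARQL endpoint published by Bio2RDF.
  • SPARQL Endpoint -- http://orphanet.bio2rdf.org/sparql
  • The following query returned no results for some reason, so I looked elsewhere.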
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dv: <http://bio2rdf.org/bio2rdf.dataset_vocabulary:>
SELECT *
where {?s <http://bio2rdf.org/orphanet_vocabulary:x-icd10> ?icd10;
 <http://bio2rdf.org/orphanet_vocabulary:x-omim> ?omim.
}
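  • Wondering whether the OMIM SPARQL endpoint could be used instead, I found <http://bio2rdf.org/omim_vocabulary:x-icd10> among its object properties.
  • Coverage may not be complete, but a partial mapping looked obtainable, so I tried it with SPARQL.
  • OMIM's downloadable data and API did not provide ICD-10 codes (when checked about six months earlier), so Bio2RDF may be taking them directly from information on the web.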
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dv: <http://bio2rdf.org/bio2rdf.dataset_vocabulary:>

SELECT distinct ?OMIM ?icd10
where {?s <http://bio2rdf.org/omim_vocabulary:x-icd10> ?icd10;
 <http://bio2rdf.org/bio2rdf_vocabulary:identifier> ?OMIM.
}
ORDER BY ?OMIM
  • Success. However, the results also included entries other than those with ordinary OMIM IDs, so I checked rdf:type and found the following types.
type
http://bio2rdf.org/omim_vocabulary:Phenotype
http://bio2rdf.org/omim_vocabulary:Resource
http://bio2rdf.org/omim_vocabulary:Predominantly-phenotypes
http://bio2rdf.org/omim_vocabulary:Gene-phenotype
http://bio2rdf.org/omim_vocabulary:Gene
http://bio2rdf.org/omim_vocabulary:Characteristic
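  • I considered restricting the results by rdf:type, but the triple store rejected the query because the estimated execution time was too long, so the following query is used instead to output the mapping data (and, optionally, the rdf:type).
    • It returns only rows where the OMIM ID bound to ?omim is six characters long.
    • The heavy use of str() is purely cosmetic.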
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dv: <http://bio2rdf.org/bio2rdf.dataset_vocabulary:>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT distinct ?s_OMIM ?s_ICD10 #?type 
where {?s <http://bio2rdf.org/omim_vocabulary:x-icd10> ?icd10;
 <http://bio2rdf.org/bio2rdf_vocabulary:identifier> ?omim;
rdf:type ?type.
?icd10 <http://bio2rdf.org/bio2rdf_vocabulary:identifier> ?icd10s.
 BIND (STRLEN(?omim) AS ?omimlen)
 FILTER(?omimlen = 6)
 BIND (str(?icd10s) as ?s_ICD10)
 BIND (str(?omim) as ?s_OMIM)
}
ORDER BY ?s_OMIM
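
For use from a script, the query above could be sent over HTTP and its results read as SPARQL JSON. The sketch below only prepares the request and parses a result document; it makes no network call and does not check whether the endpoint is still reachable. It assumes the endpoint accepts a GET request with a `query` parameter and returns the standard SPARQL JSON results format. The function names and the sample result values are made-up placeholders, not real OMIM to ICD-10 mappings.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(endpoint: str, query: str) -> Request:
    """Prepare a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def omim_icd10_pairs(results_json: str):
    """Return distinct (OMIM, ICD-10) pairs whose OMIM ID has six characters.

    Mirrors the STRLEN(?omim) = 6 filter of the query on the client side.
    """
    data = json.loads(results_json)
    pairs = set()
    for row in data["results"]["bindings"]:
        omim = row["s_OMIM"]["value"]
        icd10 = row["s_ICD10"]["value"]
        if len(omim) == 6:
            pairs.add((omim, icd10))
    return sorted(pairs)


# Made-up result document in the SPARQL JSON results layout.
sample = json.dumps({
    "head": {"vars": ["s_OMIM", "s_ICD10"]},
    "results": {"bindings": [
        {"s_OMIM": {"type": "literal", "value": "123456"},
         "s_ICD10": {"type": "literal", "value": "Q00.0"}},
        {"s_OMIM": {"type": "literal", "value": "123456"},
         "s_ICD10": {"type": "literal", "value": "Q00.0"}},
        {"s_OMIM": {"type": "literal", "value": "12345"},
         "s_ICD10": {"type": "literal", "value": "Q00.1"}},
    ]},
})

request = build_request("http://orphanet.bio2rdf.org/sparql",
                        "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
pairs = omim_icd10_pairs(sample)
print(pairs)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual call; filtering on the client side is redundant when the query already contains the FILTER, but it guards against endpoints that return extra rows.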