SPARQLthon65/ClinVar JSON-LD

From TogoWiki

Investigating whether JSON-LD can be generated from the latest ClinVar XML

ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/beta/

  • variation_archive_20180125.xsd
  • variation_archive_20180125.xml.gz

From XML to JSON

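require 'json'
require 'active_support'
require 'active_support/core_ext'

# Hash.from_xml turns both attributes and child elements into hash keys.
json = Hash.from_xml(ARGF.read).to_json
# Re-parse the JSON string so it can be pretty-printed.
puts JSON.pretty_generate(JSON.parse(json))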

Original XML

<VariationArchive Accession="VCV000001978" DateCreated="2010-12-01" DateLastUpdated="2017-09-22" NumberOfSubmissions="1" NumberOfSubmitters="1" RecordType="interpreted" VariationID="1978" VariationName="ADA, IVS5DS, G-A, +1" VariationType="single nucleotide variant" Version="1">
  <RecordStatus>current</RecordStatus>
  <Species>Homo sapiens</Species>
  <InterpretedRecord>
    <SimpleAllele AlleleID="17017" VariationID="1978">
      <GeneList>
        <Gene FullName="adenosine deaminase" GeneID="100" HGNC_ID="HGNC:186" RelationshipType="asserted, but not computed" Source="submitted" Symbol="ADA">
  :

JSON

{
  "VariationArchive": {
    "Accession": "VCV000001978",
    "DateCreated": "2010-12-01",
    "DateLastUpdated": "2017-09-22",
    "NumberOfSubmissions": "1",
    "NumberOfSubmitters": "1",
    "RecordType": "interpreted",
    "VariationID": "1978",
    "VariationName": "ADA, IVS5DS, G-A, +1",
    "VariationType": "single nucleotide variant",
    "Version": "1",
    "RecordStatus": "current",
    "Species": "Homo sapiens",
    "InterpretedRecord": {
      "SimpleAllele": {
        "AlleleID": "17017",
        "VariationID": "1978",
        "GeneList": {
          "Gene": {
            "FullName": "adenosine deaminase",
            "GeneID": "100",
            "HGNC_ID": "HGNC:186",
            "RelationshipType": "asserted, but not computed",
            "Source": "submitted",
            "Symbol": "ADA",
  :

Add the JSON-LD-specific changes to this

{
  "@context": "http://med2rdf.org/context/clinvar.jsonld",
  "@id": "http://identifiers.org/clinvar/1978",
  "@type": "cvo:Variation",
  :
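
The change above could be applied to the converted hash in Ruby roughly as follows. This is only a minimal sketch, not the method actually used on this page: the helper name `to_jsonld` and the two constants are hypothetical, and it assumes that `@id` is built from the `VariationID` attribute (as the 1978 in the example suggests) and that the original `VariationArchive` content is kept unchanged after the JSON-LD keys.

```ruby
require 'json'

# Hypothetical constants; the values are taken from the example above.
CONTEXT_URL = 'http://med2rdf.org/context/clinvar.jsonld'
ID_BASE     = 'http://identifiers.org/clinvar/'

# Hypothetical helper: prepend the JSON-LD keys to the hash
# produced by Hash.from_xml. Assumes the top-level key is
# "VariationArchive" and that it carries a "VariationID".
def to_jsonld(hash)
  variation_id = hash.fetch('VariationArchive').fetch('VariationID')
  {
    '@context' => CONTEXT_URL,
    '@id'      => "#{ID_BASE}#{variation_id}",
    '@type'    => 'cvo:Variation'
  }.merge(hash)
end

record = { 'VariationArchive' => { 'Accession' => 'VCV000001978', 'VariationID' => '1978' } }
puts JSON.pretty_generate(to_jsonld(record))
```

Because Ruby hashes keep insertion order, the `@context`, `@id` and `@type` keys are emitted first and the converted ClinVar content follows.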

Here, add various definitions to clinvar.jsonld

{
  "@context": [
    # prefix
    "http://prefix.cc/context",
    {
      "cvo":                "http://med2rdf.org/ontology/clinvar#",
      "owl":                "http://www.w3.org/2002/07/owl#",
      "rdf":                "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "rdfs":               "http://www.w3.org/2000/01/rdf-schema#",
      "xsd":                "http://www.w3.org/2001/XMLSchema#",
      "dc":                 "http://purl.org/dc/elements/1.1/",
      "dct":                "http://purl.org/dc/terms/",
      "foaf":               "http://xmlns.com/foaf/0.1/",
        :
    }, {
      # property
      "VariationArchive": "cvo:variation_archive",
      "Accession": "cvo:accession",
        :
    }, {
      # class
      "Variant": "cvo:Variation",
        :
    }
  ]
}

Could something like this work simply by mapping the XSD tag names to properties and classes?
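
One way to try this idea is to generate the property part of the context mechanically from the names that appear in the converted hash. The sketch below is an assumption-laden illustration: the helpers `snake_case`, `collect_keys` and `property_terms` are hypothetical, and the rule "CamelCase tag name becomes a snake_case `cvo:` property" is inferred only from the two examples above (`VariationArchive` and `Accession`).

```ruby
require 'json'

# Convert an XML name to snake_case,
# e.g. "VariationArchive" -> "variation_archive".
def snake_case(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .downcase
end

# Recursively collect every key (tag or attribute name)
# used in the hash produced by Hash.from_xml.
def collect_keys(node, keys = [])
  case node
  when Hash
    node.each do |key, value|
      keys << key
      collect_keys(value, keys)
    end
  when Array
    node.each { |item| collect_keys(item, keys) }
  end
  keys.uniq
end

# Build the property part of the context: name -> cvo: property.
def property_terms(hash)
  collect_keys(hash).map { |key| [key, "cvo:#{snake_case(key)}"] }.to_h
end

sample = {
  'VariationArchive' => {
    'Accession' => 'VCV000001978',
    'InterpretedRecord' => { 'SimpleAllele' => { 'AlleleID' => '17017' } }
  }
}
puts JSON.pretty_generate(property_terms(sample))
```

This treats every name as a property. Which names should instead be mapped to classes cannot be decided mechanically and still has to be checked against the ontology.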

Verify the correspondence between the XSD and the ClinVar ontology
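
As a starting point for this check, the names declared in the XSD can be compared with the terms already defined in the context to list what is still unmapped. This is a rough sketch under stated assumptions: the helpers `xsd_names` and `unmapped_names` are hypothetical, the schema is assumed to use the `xs:` prefix with double-quoted attributes, and the XSD fragment below is a toy example written for illustration, not copied from variation_archive_20180125.xsd.

```ruby
# Extract the names declared by xs:element / xs:attribute.
# Assumption: "xs:" prefix and double-quoted attribute values.
def xsd_names(xsd_text)
  xsd_text.scan(/<xs:(?:element|attribute)\b[^>]*?\bname="([^"]+)"/).flatten.uniq
end

# Names that appear in the XSD but have no term in the context.
def unmapped_names(xsd_text, terms)
  xsd_names(xsd_text) - terms.keys
end

# Toy XSD fragment (illustrative only).
xsd = <<~XSD
  <xs:element name="VariationArchive">
    <xs:complexType>
      <xs:attribute name="Accession" type="xs:string"/>
      <xs:attribute name="VariationID" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
XSD

terms = {
  'VariationArchive' => 'cvo:variation_archive',
  'Accession'        => 'cvo:accession'
}
p unmapped_names(xsd, terms) # prints ["VariationID"]
```

A regular expression is enough for a quick inventory like this. A real check would parse the schema properly and also confirm that each mapped `cvo:` term exists in the ontology.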