SPARQLthon24/DDBJ

From TogoWiki

Revision as of 08:55, 25 September 2014 (Thu) by Tfuji

Taxonomy OWL

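  • Should the object of owl:versionInfo be changed from the conversion date to the version of the data source, such as the timestamp of the taxdump file? [Done]
    • taxdump can be updated more than once a day. The conversion to OWL runs daily at 4 AM, and the import runs at 6 AM.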
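The sketch below derives the owl:versionInfo value from the taxdump file instead of the conversion date. It is a minimal sketch and assumes that the local file's modification time tracks the NCBI release. Because taxdump can change several times a day, the time of day is kept as well as the date. The function names and the ontology IRI in the usage lines are hypothetical.

```python
import os
from datetime import datetime, timezone


def format_version(timestamp: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 UTC string, keeping the time of day."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def taxdump_version(path: str) -> str:
    """Use the taxdump file's modification time as the data-source version (assumption)."""
    return format_version(os.path.getmtime(path))


def version_info_triple(ontology_iri: str, version: str) -> str:
    """Render an owl:versionInfo triple in Turtle; the owl: prefix must be declared elsewhere."""
    return f'<{ontology_iri}> owl:versionInfo "{version}" .'


if __name__ == "__main__":
    # Hypothetical ontology IRI, for illustration only.
    ts = datetime(2014, 9, 25, 4, 0, tzinfo=timezone.utc).timestamp()
    print(version_info_triple("http://example.org/taxonomy", format_version(ts)))
```

In a real job, `taxdump_version(path)` would be called on the downloaded archive right before the 4 AM conversion. That way two conversions on the same day still get distinct version strings.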

DDBJ OWL

  • Prefix changes [Done]
1c1
< @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
---
> @base <http://ddbj.nig.ac.jp/ontologies/nucleotide/> .
3d2
< @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
5c4,5
< @prefix xml: <http://www.w3.org/XML/1998/namespace> .
---
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
6a7
> @prefix dcterms: <http://purl.org/dc/terms/> .
8c9
  • Ontology header changes [Done]
10,11c11,15
<     rdfs:label "insdc" ;
<     rdfs:seeAlso <http://www.insdc.org/> .
---
>     rdfs:label "DDBJ annotated nucleotide sequence ontology" ;
>     rdfs:comment "DDBJ annotated nucleotide sequence ontology for semantic representation of the INSDC (DDBJ/ENA/GenBank) sequence records" ;
>     rdfs:seeAlso <http://www.insdc.org/documents/feature-table> ;
>     dcterms:license <http://creativecommons.org/licenses/by/4.0/> ;
>     owl:versionInfo "Version 10.3 October 2013" .
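The new preamble in the diff above can also be generated programmatically. This is a minimal standard-library sketch with two assumptions: the ontology IRI is the @base itself (written <>), and the rdfs: and owl: prefixes are declared on lines the diff does not show. The helper names are hypothetical.

```python
from typing import List, Tuple


def literal(text: str) -> str:
    """Quote a string as a Turtle short literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_prefixes(base: str, prefixes: List[Tuple[str, str]]) -> str:
    """Render an @base line followed by @prefix declarations."""
    lines = [f"@base <{base}> ."]
    lines += [f"@prefix {name}: <{ns}> ." for name, ns in prefixes]
    return "\n".join(lines)


def render_header(subject: str, pairs: List[Tuple[str, str]]) -> str:
    """Render one subject with a predicate-object list separated by ' ;'."""
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return f"{subject}\n{body} ."


if __name__ == "__main__":
    print(render_prefixes("http://ddbj.nig.ac.jp/ontologies/nucleotide/", [
        ("xsd", "http://www.w3.org/2001/XMLSchema#"),
        ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
        ("dcterms", "http://purl.org/dc/terms/"),
    ]))
    # Assumption: the ontology IRI equals @base, so the subject is the empty relative IRI <>.
    print(render_header("<>", [
        ("rdfs:label", literal("DDBJ annotated nucleotide sequence ontology")),
        ("rdfs:seeAlso", "<http://www.insdc.org/documents/feature-table>"),
        ("dcterms:license", "<http://creativecommons.org/licenses/by/4.0/>"),
        ("owl:versionInfo", literal("Version 10.3 October 2013")),
    ]))
```

Escaping literals in one place means comments that contain quotation marks, such as the rdfs:comment text, stay valid Turtle.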
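  • Update the rules to Version 10.3 October 2013
    • How to represent newly added and deprecated features/qualifiers
  • Add <Feature Class> rdfs:seeAlso SO_XXXXXXX
    • Retrieve the mappings from ft_so.json and insert them
  • Consistency between the RDF and the metadata block of Annotated Sequence entries
    • There is no parser for Structured Comments and similar fields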
    • Division
    • Keyword
    • Reference
  • Issues specific to the DDBJ format
BASE COUNT      1862754 a      1486687 c      1476489 g      1866935 t
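The BASE COUNT line can be parsed into per-base counts and checked against the sequence length. For AP011615, the four listed counts sum to 6,692,865, which is 95,570 short of the 6,788,435 bp on the LOCUS line. Those residues are presumably ambiguous bases such as n, but that is an assumption. The function names below are hypothetical.

```python
from typing import Dict


def parse_base_count(line: str) -> Dict[str, int]:
    """Parse 'BASE COUNT  <n> a  <n> c ...' into {'a': n, 'c': n, ...}."""
    if not line.startswith("BASE COUNT"):
        raise ValueError(f"not a BASE COUNT line: {line!r}")
    tokens = line[len("BASE COUNT"):].split()
    # Tokens alternate: count, base label, count, base label, ...
    return {label: int(count) for count, label in zip(tokens[::2], tokens[1::2])}


def unlisted_residues(counts: Dict[str, int], length: int) -> int:
    """Residues not covered by the listed counts, given the LOCUS sequence length."""
    return length - sum(counts.values())


if __name__ == "__main__":
    line = "BASE COUNT      1862754 a      1486687 c      1476489 g      1866935 t"
    counts = parse_base_count(line)
    print(counts)
    print(unlisted_residues(counts, 6788435))
```

The parser reads count/label pairs instead of fixed columns, so an extra trailing pair such as an "others" count would also be picked up.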

Example data: AP011615

Flat file format

LOCUS       AP011615             6788435 bp    DNA     circular HTG 16-APR-2010
DEFINITION  Arthrospira platensis NIES-39 DNA, nearly complete genome.
ACCESSION   AP011615
VERSION     AP011615.1
DBLINK      BioProject:PRJDA42161
KEYWORDS    HTG; HTGS_PHASE2.
SOURCE      Arthrospira platensis NIES-39
  ORGANISM  Arthrospira platensis NIES-39
            Bacteria; Cyanobacteria; Oscillatoriophycideae; Oscillatoriales;
            Arthrospira.
REFERENCE   1  (bases 1 to 6788435)
  AUTHORS   Fujisawa,T., Fujita,N. and Sekine,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2009) to the DDBJ/EMBL/GenBank databases.
            Contact:Takatomo Fujisawa
            National Institute of Technology and Evaluation, NITE, Bioresource
            Information Center, Department of Biotechnology; 2-49-10
            Nishihara, Shibuya, Tokyo 151-0066, Japan
            URL    :http://www.bio.nite.go.jp/
REFERENCE   2   
  AUTHORS   Fujisawa,T., Narikawa,R., Okamoto,S., Ehira,S., Yoshimura,H.,
            Suzuki,I., Masuda,T., Mochimaru,M., Takaichi,S., Awai,K.,
            Sekine,M., Horikawa,H., Yashiro,I., Omata,S., Takarada,H.,
            Katano,Y., Kosugi,H., Tanikawa,S., Ohmori,K., Sato,N., Ikeuchi,M.,
            Fujita,N. and Ohmori,M.
  TITLE     Genomic Structure of an Economically Important Cyanobacterium,
            Arthrospira (Spirulina) platensis NIES-39
  JOURNAL   DNA Res. 17, 85-103 (2010)
COMMENT     Genome Coverage: 11x 
            Sequencing Technology: ABI 3730
            The genome structure of A. platensis is estimated to be a single,
            circular chromosome of 6.8 Mb, based on optical mapping.
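Several triples in the converted RDF come straight from the LOCUS line:

- the date 16-APR-2010 becomes insdc:sequence_date "2010-04-16"^^xsd:date;
- the length and topology become insdc:sequence_length and insdc:topology.

The sketch below shows that mapping. It assumes the whitespace-separated LOCUS layout shown above and splits on tokens rather than relying on fixed columns. It also includes a KEYWORDS parser, because keywords are listed among the metadata that still needs a consistent RDF representation. All class and function names are hypothetical, and this is not the converter that produced the RDF.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Month abbreviations used in flat-file dates such as 16-APR-2010.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


@dataclass
class Locus:
    name: str
    length: int
    molecule: str
    topology: str
    division: str
    updated: date


def parse_locus(line: str) -> Locus:
    """Parse a LOCUS line by whitespace-separated tokens rather than fixed columns."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    day, month, year = tokens[-1].split("-")
    return Locus(
        name=tokens[1],
        length=int(tokens[2]),
        molecule=tokens[4],
        # Assumption: a missing topology token means linear.
        topology="circular" if "circular" in tokens else "linear",
        division=tokens[-2],
        updated=date(int(year), MONTHS[month.upper()], int(day)),
    )


def parse_keywords(line: str) -> List[str]:
    """Split 'KEYWORDS    HTG; HTGS_PHASE2.' into ['HTG', 'HTGS_PHASE2']."""
    text = line[len("KEYWORDS"):].strip()
    if text.endswith("."):
        text = text[:-1]
    return [kw.strip() for kw in text.split(";") if kw.strip()]


def locus_triples(version: str, locus: Locus) -> List[str]:
    """Emit the LOCUS-derived triples in the same shape as the converted RDF."""
    entry = f"<http://identifiers.org/insdc/{version}>"
    seq = f"<http://identifiers.org/insdc/{version}#sequence>"
    return [
        f'{entry} insdc:sequence_date "{locus.updated.isoformat()}"^^xsd:date .',
        f"{seq} insdc:sequence_length {locus.length} .",
        f"{seq} insdc:topology insdc:{locus.topology} .",
    ]


if __name__ == "__main__":
    loc = parse_locus("LOCUS       AP011615             6788435 bp    DNA     circular HTG 16-APR-2010")
    print("\n".join(locus_triples("AP011615.1", loc)))
    print(parse_keywords("KEYWORDS    HTG; HTGS_PHASE2."))
```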

Converted RDF

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix sio:  <http://semanticscience.org/resource/> .
@prefix obo:  <http://purl.obolibrary.org/obo/> .
@prefix faldo:  <http://biohackathon.org/resource/faldo#> .
@prefix insdc:  <http://ddbj.nig.ac.jp/ontologies/nucleotide/> .

<http://identifiers.org/insdc/AP011615.1> rdf:type  insdc:Entry .
<http://identifiers.org/insdc/AP011615.1> rdfs:label  "Arthrospira platensis NIES-39 DNA, nearly complete genome." .
<http://identifiers.org/insdc/AP011615.1> insdc:sequence_version  "AP011615.1" .
<http://identifiers.org/insdc/AP011615.1> insdc:sequence_date "2010-04-16"^^xsd:date .
<http://identifiers.org/insdc/AP011615.1> insdc:sequence  <http://identifiers.org/insdc/AP011615.1#sequence> .
<http://identifiers.org/insdc/AP011615.1#sequence>  rdfs:seeAlso  <http://www.ncbi.nlm.nih.gov/nuccore/AP011615.1?report=fasta> .
<http://identifiers.org/insdc/AP011615.1#sequence>  rdfs:seeAlso  <http://www.ebi.ac.uk/ena/data/view/AP011615.1&display=fasta> .
<http://identifiers.org/insdc/AP011615.1#sequence>  rdfs:seeAlso  <http://getentry.ddbj.nig.ac.jp/getentry/na/AP011615.1?format=fasta> .
<http://identifiers.org/insdc/AP011615.1#sequence>  rdfs:subClassOf obo:SO_0000001 .  # SO:sequence
<http://identifiers.org/insdc/AP011615.1#sequence>  insdc:sequence_length 6788435 .
<http://identifiers.org/insdc/AP011615.1#sequence>  insdc:topology  insdc:circular .
<http://identifiers.org/insdc/AP011615.1#sequence>  obo:so_has_quality  obo:SO_0000988 .  # SO:circular
<http://identifiers.org/insdc/AP011615.1> rdfs:seeAlso  <http://identifiers.org/ncbigi/GI:AP011615.1> .
<http://identifiers.org/ncbigi/GI:AP011615.1> rdfs:label  "GI:AP011615.1" .
<http://identifiers.org/ncbigi/GI:AP011615.1> rdf:type  insdc:GI .
<http://identifiers.org/ncbigi/GI:AP011615.1> sio:SIO_000068  <http://identifiers.org/ncbigi> .  # sio:is-part-of
<http://identifiers.org/insdc/AP011615.1> rdfs:seeAlso  <http://identifiers.org/refseq/AP011615.1> .
<http://identifiers.org/refseq/AP011615.1>  rdfs:label  "AP011615.1" .
<http://identifiers.org/refseq/AP011615.1>  rdf:type  insdc:RefSeq .
<http://identifiers.org/refseq/AP011615.1>  sio:SIO_000068  <http://identifiers.org/refseq> .  # sio:is-part-of
<http://identifiers.org/insdc/AP011615.1#sequence>  insdc:location  "1..6788435" .
<http://identifiers.org/insdc/AP011615.1#sequence>  faldo:location  <http://identifiers.org/insdc/AP011615.1#region:1-6788435:1> .
<http://identifiers.org/insdc/AP011615.1#region:1-6788435:1>  rdf:type  faldo:Region .
<http://identifiers.org/insdc/AP011615.1#region:1-6788435:1>  faldo:begin <http://identifiers.org/insdc/AP011615.1#position:1:1> .
<http://identifiers.org/insdc/AP011615.1#region:1-6788435:1>  faldo:end <http://identifiers.org/insdc/AP011615.1#position:6788435:1> .
<http://identifiers.org/insdc/AP011615.1#position:1:1>  faldo:position  1 .
<http://identifiers.org/insdc/AP011615.1#position:1:1>  faldo:reference <http://identifiers.org/insdc/AP011615.1#sequence> .
<http://identifiers.org/insdc/AP011615.1#position:1:1>  rdf:type  faldo:ForwardStrandPosition .
<http://identifiers.org/insdc/AP011615.1#position:1:1>  rdf:type  faldo:ExactPosition .
<http://identifiers.org/insdc/AP011615.1#position:6788435:1>  faldo:position  6788435 .
<http://identifiers.org/insdc/AP011615.1#position:6788435:1>  faldo:reference <http://identifiers.org/insdc/AP011615.1#sequence> .
<http://identifiers.org/insdc/AP011615.1#position:6788435:1>  rdf:type  faldo:ForwardStrandPosition .
<http://identifiers.org/insdc/AP011615.1#position:6788435:1>  rdf:type  faldo:ExactPosition .
<http://identifiers.org/insdc/AP011615.1> rdfs:seeAlso  <http://identifiers.org/taxonomy/696747> .
<http://identifiers.org/taxonomy/696747>  rdfs:label  "696747" .
<http://identifiers.org/taxonomy/696747>  rdf:type  insdc:taxon .
<http://identifiers.org/taxonomy/696747>  sio:SIO_000068  <http://identifiers.org/taxonomy> .  # sio:is-part-of
<http://identifiers.org/insdc/AP011615.1#sequence>  obo:RO_0002162  <http://identifiers.org/taxonomy/696747> .  # RO:in taxon
<http://identifiers.org/insdc/AP011615.1> insdc:mol_type  "genomic DNA" .
<http://identifiers.org/insdc/AP011615.1> insdc:organism  "Arthrospira platensis NIES-39" .
<http://identifiers.org/insdc/AP011615.1> insdc:strain  "NIES-39" .
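The FALDO region and position nodes above follow a regular URI pattern:

- #region:<begin>-<end>:<strand>
- #position:<position>:<strand>

Strand 1 is typed as faldo:ForwardStrandPosition. The sketch below regenerates that block of triples for an arbitrary span. Mapping strand -1 to faldo:ReverseStrandPosition is an assumption that the example does not show, and the function name is hypothetical.

```python
from typing import List

# Strand code used in the URI -> FALDO position class (-1 is an assumption).
STRAND_CLASS = {1: "faldo:ForwardStrandPosition", -1: "faldo:ReverseStrandPosition"}


def faldo_region(version: str, begin: int, end: int, strand: int = 1) -> List[str]:
    """Emit the faldo:Region and its two faldo:ExactPosition nodes as Turtle lines."""
    if strand not in STRAND_CLASS:
        raise ValueError(f"unsupported strand: {strand}")
    base = f"http://identifiers.org/insdc/{version}"
    seq = f"<{base}#sequence>"
    region = f"<{base}#region:{begin}-{end}:{strand}>"
    triples = [
        f"{seq} faldo:location {region} .",
        f"{region} rdf:type faldo:Region .",
    ]
    for role, pos in (("begin", begin), ("end", end)):
        node = f"<{base}#position:{pos}:{strand}>"
        triples.append(f"{region} faldo:{role} {node} .")
        triples += [
            f"{node} faldo:position {pos} .",
            f"{node} faldo:reference {seq} .",
            f"{node} rdf:type {STRAND_CLASS[strand]} .",
            f"{node} rdf:type faldo:ExactPosition .",
        ]
    return triples


if __name__ == "__main__":
    print("\n".join(faldo_region("AP011615.1", 1, 6788435)))
```

Called with AP011615.1, 1 and 6788435, the function produces the same twelve triples as the example, in a slightly different order.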