SPARQLthon22/AssemblyReports
From TogoWiki
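Investigating NCBI Assembly_Reports metadata
- Sequence accessions for eukaryotes cannot be obtained from the GENOME_REPORTS metadata, so we examined the contents of ASSEMBLY_REPORTS instead
- The July 8, 2014 release of assembly_summary_refseq contains 18262 entries, of which 17454 have version_status "latest"
- Each All/*.assembly.txt file lists the sequence accession, sequence label, and sequence type, so we prototyped an RDF conversion from these files
- We also checked the metadata fields relevant to assigning Sequence Ontology terms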
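As a rough illustration of reading the per-assembly reports, the following Python sketch turns the sequence rows of one *.assembly.txt file into dictionaries. It assumes the file is tab-separated, that comment lines start with "#", and that the last comment line before the data carries the column names; the column names and the sample lines below are illustrative and should be checked against the actual files.

```python
from typing import Dict, Iterable, List


def parse_assembly_report(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per sequence row, keyed by the header column names.

    Assumption: the last "#" line seen before a data row is the column header.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Remember the most recent comment line as a candidate header.
            header = line.lstrip("#").strip().split("\t")
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Illustrative excerpt; the values are taken from the Synechocystis sample on this page.
sample = [
    "# Assembly Name:  ASM972v1",
    "# Sequence-Name\tSequence-Role\tAssigned-Molecule\t"
    "Assigned-Molecule-Location/Type\tGenBank-Accn\tRelationship\t"
    "RefSeq-Accn\tAssembly-Unit",
    "ANONYMOUS\tassembled-molecule\tna\tChromosome\tBA000022.2\t=\t"
    "NC_000911.1\tPrimary Assembly",
    "pSYSA\tassembled-molecule\tpSYSA\tPlasmid\tAP004311.1\t=\t"
    "NC_005230.1\tPrimary Assembly",
]

rows = parse_assembly_report(sample)
for row in rows:
    print(row["Sequence-Name"], row["Sequence-Role"], row["RefSeq-Accn"])
```

Taking the header from the file rather than hard-coding it keeps the sketch usable if columns are added in later releases.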
Metadata
- assembly_level
  - "Gapless Chromosome" 2739
  - "Chromosome" 513
  - "Chromosome with gaps" 187
  - "Contig" 9355
  - "Scaffold" 5468
- genome_rep
  - "Full" 18188
  - "Partial" 74
- refseq_category
  - "reference-genome" 91
  - "representative-genome" 3433
  - "na" 14738
- release_type
  - "Patch" 10
  - "Major" 18252
- version_status
  - "latest" 17454
  - "replaced" 808
- relationship
  - "=" 7611212
  - "" 1
  - "<>" 1584971
- sequence_role
  - "assembled-molecule" 10019
  - "novel-patch" 478
  - "" 1
  - "alt-scaffold" 8444
  - "unplaced-scaffold" 8811124
  - "fix-patch" 656
  - "pseudo-scaffold" 15
  - "unlocalized-scaffold" 365447
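Value counts like the ones above can be reproduced with a small tally over a tab-separated table. The Python sketch below is one possible way to do it, assuming the first line is a header (optionally prefixed with "#"); the three-column excerpt is a hypothetical reduction of the summary file, with column names borrowed from the predicates used in the sample RDF on this page.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_column(lines: Iterable[str], column: str) -> Counter:
    """Count the distinct values of one column in a tab-separated table.

    Assumption: the first line is the header, optionally starting with "#".
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("# ")  # drop a leading "# " if present
    index = header.index(column)
    # Rows that are too short (for example blank lines) are ignored.
    return Counter(row[index] for row in reader if len(row) > index)


# Hypothetical, reduced excerpt of a summary table (not the full column set).
summary = [
    "# assembly_id\tversion_status\tassembly_level",
    "GCF_000009725.1\tlatest\tGapless Chromosome",
    "GCF_000002595.1\tlatest\tScaffold",
]

for value, count in tally_column(summary, "assembly_level").most_common():
    print(f'"{value}" {count}')
```

With the real file, `summary` would be replaced by an open file handle, for example `open("assembly_summary_refseq.txt", encoding="utf-8")`.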
Sample RDF
- Synechocystis sp. PCC 6803
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix asm: <http://www.ncbi.nlm.nih.gov/assembly/> .
[
asm:assembly_id "GCF_000009725.1" ;
asm:bioproject_accession "PRJNA57659" ;
asm:bioproject <http://identifiers.org/bioproject/PRJNA57659> ;
asm:biosample_accession "na" ;
asm:wgs_master "na" ;
asm:refseq_category "representative-genome" ;
asm:tax_id "1148" ;
asm:taxon <http://identifiers.org/taxonomy/1148> ;
asm:species_taxid "1148" ;
asm:organism_name "Synechocystis sp. PCC 6803" ;
asm:infraspecific_name "strain=PCC 6803" ;
asm:isolate "na" ;
asm:version_status "latest" ;
asm:assembly_level "Gapless Chromosome" ;
asm:release_type "Major" ;
asm:genome_rep "Full" ;
asm:release_date "2004/05/11" ;
asm:asm_name "ASM972v1" ;
asm:submitter "Kazusa" ;
asm:gbrs_paired_asm "GCA_000009725.1" ;
asm:paired_asm_comp "identical" ;
rdfs:seeAlso <http://www.ncbi.nlm.nih.gov/assembly/GCF_000009725.1> ;
asm:sequnece [
asm:sequence_name "ANONYMOUS" ;
asm:sequence_role "assembled-molecule" ;
asm:assigned_molecule "na" ;
asm:assigned_molecule_location_type "Chromosome" ;
asm:genbank_accession "BA000022.2" ;
asm:genbank <http://identifiers.org/insdc/BA000022.2> ;
asm:relationship "=" ;
asm:refseq_accession "NC_000911.1" ;
asm:refseq <http://identifiers.org/refseq/NC_000911.1> ;
asm:assembly_unit "Primary Assembly" ] ;
asm:sequnece [
asm:sequence_name "pSYSA" ;
asm:sequence_role "assembled-molecule" ;
asm:assigned_molecule "pSYSA" ;
asm:assigned_molecule_location_type "Plasmid" ;
asm:genbank_accession "AP004311.1" ;
asm:genbank <http://identifiers.org/insdc/AP004311.1> ;
asm:relationship "=" ;
asm:refseq_accession "NC_005230.1" ;
asm:refseq <http://identifiers.org/refseq/NC_005230.1> ;
asm:assembly_unit "Primary Assembly" ] ;
asm:sequnece [
asm:sequence_name "pSYSG" ;
asm:sequence_role "assembled-molecule" ;
asm:assigned_molecule "pSYSG" ;
asm:assigned_molecule_location_type "Plasmid" ;
asm:genbank_accession "AP004312.1" ;
asm:genbank <http://identifiers.org/insdc/AP004312.1> ;
asm:relationship "=" ;
asm:refseq_accession "NC_005231.1" ;
asm:refseq <http://identifiers.org/refseq/NC_005231.1> ;
asm:assembly_unit "Primary Assembly" ] ;
asm:sequnece [
asm:sequence_name "pSYSM" ;
asm:sequence_role "assembled-molecule" ;
asm:assigned_molecule "pSYSM" ;
asm:assigned_molecule_location_type "Plasmid" ;
asm:genbank_accession "AP004310.1" ;
asm:genbank <http://identifiers.org/insdc/AP004310.1> ;
asm:relationship "=" ;
asm:refseq_accession "NC_005229.1" ;
asm:refseq <http://identifiers.org/refseq/NC_005229.1> ;
asm:assembly_unit "Primary Assembly" ] ;
asm:sequnece [
asm:sequence_name "pSYSX" ;
asm:sequence_role "assembled-molecule" ;
asm:assigned_molecule "pSYSX" ;
asm:assigned_molecule_location_type "Plasmid" ;
asm:genbank_accession "AP006585.1" ;
asm:genbank <http://identifiers.org/insdc/AP006585.1> ;
asm:relationship "=" ;
asm:refseq_accession "NC_005232.1" ;
asm:refseq <http://identifiers.org/refseq/NC_005232.1> ;
asm:assembly_unit "Primary Assembly" ] ;
]
.
- Chlamydomonas reinhardtii
- http://www.ncbi.nlm.nih.gov/assembly?Db=assembly&DbFrom=bioproject&Cmd=Link&LinkName=bioproject_assembly&LinkReadableName=Assembly&ordinalpos=1&IdsFromResult=21061
[
asm:assembly_id "GCF_000002595.1" ;
asm:bioproject_accession "PRJNA21061" ;
asm:bioproject <http://identifiers.org/bioproject/PRJNA21061> ;
asm:biosample_accession "na" ;
asm:wgs_master "ABCN00000000.1" ;
asm:refseq_category "representative-genome" ;
asm:tax_id "3055" ;
asm:taxon <http://identifiers.org/taxonomy/3055> ;
asm:species_taxid "3055" ;
asm:organism_name "Chlamydomonas reinhardtii" ;
asm:infraspecific_name "strain=CC-503 cw92 mt+" ;
asm:isolate "na" ;
asm:version_status "latest" ;
asm:assembly_level "Scaffold" ;
asm:release_type "Major" ;
asm:genome_rep "Full" ;
asm:release_date "2007/10/15" ;
asm:asm_name "v3.0" ;
asm:submitter "DOE Joint Genome Institute" ;
asm:gbrs_paired_asm "GCA_000002595.2" ;
asm:paired_asm_comp "different" ;
rdfs:seeAlso <http://www.ncbi.nlm.nih.gov/assembly/GCF_000002595.1> ;
asm:sequnece [
asm:sequence_name "CHLREscaffold_1" ;
asm:sequence_role "unplaced-scaffold" ;
asm:assigned_molecule "na" ;
asm:assigned_molecule_location_type "na" ;
asm:genbank_accession "DS496108.1" ;
asm:genbank <http://identifiers.org/insdc/DS496108.1> ;
asm:relationship "=" ;
asm:refseq_accession "NW_001843471.1" ;
asm:refseq <http://identifiers.org/refseq/NW_001843471.1> ;
asm:assembly_unit "Primary Assembly" ] ;
asm:sequnece [
asm:sequence_name "CHLREscaffold_2" ;
asm:sequence_role "unplaced-scaffold" ;
asm:assigned_molecule "na" ;
asm:assigned_molecule_location_type "na" ;
asm:genbank_accession "DS496109.1" ;
asm:genbank <http://identifiers.org/insdc/DS496109.1> ;
asm:relationship "=" ;
asm:refseq_accession "NW_001843642.1" ;
asm:refseq <http://identifiers.org/refseq/NW_001843642.1> ;
asm:assembly_unit "Primary Assembly" ] ;
asm:sequnece [
asm:sequence_name "CHLREscaffold_3" ;
asm:sequence_role "unplaced-scaffold" ;
asm:assigned_molecule "na" ;
asm:assigned_molecule_location_type "na" ;
asm:genbank_accession "DS496110.1" ;
asm:genbank <http://identifiers.org/insdc/DS496110.1> ;
asm:relationship "=" ;
asm:refseq_accession "NW_001843733.1" ;
asm:refseq <http://identifiers.org/refseq/NW_001843733.1> ;
asm:assembly_unit "Primary Assembly" ] ;
# ... (remaining sequences omitted)
]
.
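The per-sequence blocks in the samples above follow a regular pattern, which could be generated along the following lines. This Python sketch is not the script used for the prototype; it only mirrors the layout of the samples. The predicate names (including the spelling asm:sequnece) are copied from the sample RDF, while the field list, the helper names, and the rule of skipping the identifiers.org link when an accession is "na" are assumptions.

```python
from typing import Dict, List

# Literal-valued fields, in the order used by the sample RDF.
FIELDS = [
    "sequence_name", "sequence_role", "assigned_molecule",
    "assigned_molecule_location_type", "genbank_accession",
    "relationship", "refseq_accession", "assembly_unit",
]

# Accession fields that also get a resource link: field -> (predicate, URI base).
LINKS = {
    "genbank_accession": ("genbank", "http://identifiers.org/insdc/"),
    "refseq_accession": ("refseq", "http://identifiers.org/refseq/"),
}


def quote(value: str) -> str:
    """Format a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sequence_to_turtle(row: Dict[str, str]) -> str:
    """Render one sequence row as an asm:sequnece blank-node block."""
    parts: List[str] = []
    for field in FIELDS:
        value = row[field]
        parts.append(f"asm:{field} {quote(value)}")
        # Assumption: no link is emitted when the accession is missing ("na").
        if field in LINKS and value != "na":
            predicate, base = LINKS[field]
            parts.append(f"asm:{predicate} <{base}{value}>")
    # Predicate spelled as in the sample RDF on this page.
    return "asm:sequnece [\n" + " ;\n".join(parts) + " ] ;"


scaffold_1 = {
    "sequence_name": "CHLREscaffold_1",
    "sequence_role": "unplaced-scaffold",
    "assigned_molecule": "na",
    "assigned_molecule_location_type": "na",
    "genbank_accession": "DS496108.1",
    "relationship": "=",
    "refseq_accession": "NW_001843471.1",
    "assembly_unit": "Primary Assembly",
}

print(sequence_to_turtle(scaffold_1))
```

The output for `scaffold_1` has the same shape as the first asm:sequnece block of the Chlamydomonas sample.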
Links
- ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/
- assembly_summary_refseq.txt
- assembly_summary_genbank.txt