Using R functionality from Ruby
From TogoWiki
RSRuby
On Tiger:
$ export LD_LIBRARY_PATH=:/Library/Frameworks/R.framework/Resources/lib
$ ruby187/bin/gem install rsruby-0.5.1.1.gem -- --with-R-dir=$R_HOME
Building native extensions. This could take a while...
Successfully installed rsruby-0.5.1.1
1 gem installed
Installing ri documentation for rsruby-0.5.1.1...
Installing RDoc documentation for rsruby-0.5.1.1...
$ R --version
R version 2.8.0 (2008-10-20)
The code that uses it:
ENV['R_HOME']='/Library/Frameworks/R.framework/Resources/lib'
require 'rsruby'
With Snow Leopard and R 2.12, the following is enough:
sudo gem install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources
ENV['R_HOME']='/Library/Frameworks/R.framework/Resources'
require 'rsruby'
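Since the right value for R_HOME differs between the two setups above, a small check before requiring rsruby may help. The following is only a sketch: find_r_home is a hypothetical helper (not part of RSRuby), and it assumes that a usable R directory contains include/R.h, as the Snow Leopard directory listing further down this page suggests.

```ruby
# Hypothetical helper (not part of RSRuby): return the first candidate
# directory that contains include/R.h, or nil if none does.
# Assumption: a usable R directory has the header at include/R.h.
def find_r_home(candidates)
  candidates.find { |dir| File.exist?(File.join(dir, 'include', 'R.h')) }
end

# Candidates: an already exported R_HOME (if any) and the framework path
# used in the Snow Leopard setup above.
candidates = [
  ENV['R_HOME'],
  '/Library/Frameworks/R.framework/Resources'
].compact

r_home = find_r_home(candidates)
if r_home
  ENV['R_HOME'] = r_home # must be set before require 'rsruby'
  puts "R_HOME=#{r_home}"
else
  warn 'R.h was not found under any candidate directory'
end
```

If nothing is found, passing --with-R-dir to gem install is unlikely to succeed either, so this check can be run before attempting the installation.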
RinRuby
Rserve + Rserve-Ruby-client
gem install rserve-client
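Benchmark
Everything up to this point was preliminary research.
From here on are the results of actually testing.
Comparison of BioRuby, RinRuby, and Rserve, and giving up on RSRuby and RSOAP
The benchmark apparently measures how fast a nucleotide FASTA file (6.8 MB, 5210 entries) can be read and translated.
Input data: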
% head -50 test-dna.fa
>2L52.1
atgtcaatggtaagaaatgtatcaaatcagagcgaaaaattggaaatttt
gtcatgtaaatgggtaggatgtctcaaatcaacagaagtgttcaaaacgg
ttgaaaagttattagatcatgttacggctgatcatattccagaagttatt
gtaaacgatgacgggtcggaggaagtcgtttgtcagtgggattgctgcga
aatgggtgccagtcgtggaaatcttcaaaaaaagaaagagtggatggaga
atcacttcaaaacacgtcatgttcgcaaagcaaaaatattcaaatgctta
attgaggattgccctgtggtaaagtcaagtagtcaggaaattgaaaccca
tctcagaataagtcatccaataaatccgaaaaaagagagactgaaagagt
ttaaaagttctaccgaccacatcgaacctactcaagctaatagagtatgg
acaattgtgaacggagaggttcaatggaagactccaccgcgggttaaaaa
aaagactgtgatatactatgatgatgggccgaggtatgtatttccaacgg
gatgtgcgagatgcaactatgatagtgacgaatcagaactggaatcagat
gagttttggtcagccacagagatgtcagataatgaagaagtatatgtgaa
cttccgtggaatgaactgtatctcaacaggaaagtcggccagtatggtcc
cgagcaaacgaagaaattggccaaaaagagtgaagaaaaggctatcgaca
caaagaaacaatcagaaaactattcgaccaccagagctgaataaaaataa
tatagagataaaagatatgaactcaaataaccttgaagaacgcaacagag
aagaatgcattcagcctgtttctgttgaaaagaacatcctgcattttgaa
aaattcaaatcaaatcaaatttgcattgttcgggaaaacaataaatttag
agaaggaacgagaagacgcagaaagaattctggtgaatcggaagacttga
aaattcatgaaaactttactgaaaaacgaagacccattcgatcatgcaaa
caaaatataagtttctatgaaatggacggggatatagaagaatttgaagt
gtttttcgatactcccacaaaaagcaaaaaagtacttctggatatctaca
gtgcgaagaaaatgccaaaaattgaggttgaagattcattagttaataag
tttcattcaaaacgtccatcaagagcatgtcgagttcttggaagtatgga
agaagtaccatttgatgtggaaataggatattga
>2RSSE.1
atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg
aagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa
gtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa
tcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg
taaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag
tttctttcaaagttgccaaacggcatttttggacgggataatgaggagac
actgttcaatagtgcatcgactggaatggatattgagaagcagagacagg
tgttttataggatatttggatcacttccagtcgcatcccaacacttgctc
gtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca
ttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc
tttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt
caacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt
cggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat
acacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat
aatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc
aaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag
cgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc
gtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga
tcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg
atttaaggacatcggtcagctgtgatttttga
>2RSSE.2
:
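Results: Ruby 1.9 is considerably faster than 1.8. BioRuby on its own is faster than going through R (although BioLib + Ruby is reportedly extremely fast). Rserve is fairly stable in use.
→ With RSRuby, going through R turned out faster than BioRuby on its own. The earlier installation failure was caused by an outdated R. m(__)m
→ However, it turned out that BioRuby can be sped up further by using Ruby 1.9's gsub(/re/, hash). With that version it is slightly faster than RSRuby and the fastest overall (about 2.1 seconds).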
- pure BioRuby
About 3 seconds with Ruby 1.9, about 6 seconds with Ruby 1.8.
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  2.96s user 0.07s system 97% cpu 3.120 total
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  28.25s user 0.29s system 91% cpu 31.250 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  5.36s user 0.18s system 97% cpu 5.666 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  52.98s user 1.48s system 91% cpu 59.319 total
- RinRuby
Execution stalled partway through, so it could not be tested.
- Rserve
About 6 seconds with Ruby 1.9, about 16 seconds with Ruby 1.8.
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null  4.16s user 0.25s system 76% cpu 5.737 total
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  39.71s user 1.90s system 75% cpu 55.061 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null  13.69s user 0.28s system 87% cpu 16.010 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  135.66s user 2.56s system 82% cpu 2:46.68 total
- RSRuby
On Snow Leopard(?), installation was difficult and I could not get as far as testing.
→ Tried again later and it worked. Fastest. There is little difference between 1.8 and 1.9 (1.9 is slightly faster); both take about 2.5 seconds.
% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.35s user 0.12s system 99% cpu 2.481 total
% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null  19.77s user 0.42s system 99% cpu 20.280 total
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.45s user 0.12s system 99% cpu 2.589 total
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null  21.27s user 0.61s system 98% cpu 22.128 total
- RSOAP
Would have to start by finding out how to install the server.
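To compare the single runs with the 10-repetition runs, the wall-clock totals above can be put side by side. This is just a sketch of the arithmetic: the numbers are copied from the time output on this page (2:46.68 written as 166.68 seconds), while TIMINGS and per_pass are names made up for this illustration.

```ruby
# Wall-clock totals in seconds, copied from the `time` output above:
# [single run, 10 repetitions]. 2:46.68 is written as 166.68 seconds.
TIMINGS = {
  'BioRuby / Ruby 1.9' => [3.120, 31.250],
  'BioRuby / Ruby 1.8' => [5.666, 59.319],
  'Rserve / Ruby 1.9'  => [5.737, 55.061],
  'Rserve / Ruby 1.8'  => [16.010, 166.68],
  'RSRuby / Ruby 1.9'  => [2.481, 20.280],
  'RSRuby / Ruby 1.8'  => [2.589, 22.128]
}.freeze

# Average seconds per pass over the file in the 10-repetition run.
def per_pass(total_for_ten)
  total_for_ten / 10.0
end

TIMINGS.each do |label, (single, ten)|
  printf("%-20s single %7.3f s   per pass (x10) %7.3f s\n",
         label, single, per_pass(ten))
end
```

The per-pass figure from the 10-repetition run includes less start-up overhead per pass than a single run does, so it is probably the fairer number to compare across libraries.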
pure BioRuby
% cat DNAtranslate-bioruby.rb
require 'rubygems'
require 'bio'
fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i
repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    puts entry.naseq.translate
  end
end
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  2.96s user 0.07s system 97% cpu 3.120 total
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  28.25s user 0.29s system 91% cpu 31.250 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  5.36s user 0.18s system 97% cpu 5.666 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  52.98s user 1.48s system 91% cpu 59.319 total
Ruby 1.9's gsub(/re/, hash) is worth trying.
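The gsub(/re/, hash) idea can be sketched as follows. This is not BioRuby's actual implementation, only a minimal illustration using the standard genetic code; CODON_TABLE and translate are names chosen for this example. Each three-letter match is looked up in the hash and replaced by its value, so the whole translation happens inside a single gsub call.

```ruby
# Standard genetic code, with codons enumerated in t, c, a, g order
# (third base varying fastest).
BASES = %w[t c a g].freeze
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'.freeze

codons = BASES.product(BASES, BASES).map(&:join)
CODON_TABLE = Hash[codons.zip(AMINO.chars)]
CODON_TABLE.default = 'X' # codons outside the table (e.g. containing n)

# Translate a nucleotide string in frame 1. Whitespace is removed first and
# a trailing partial codon is ignored.
def translate(ntseq)
  seq = ntseq.downcase.gsub(/\s+/, '')
  seq = seq[0, seq.length - seq.length % 3]
  # String#gsub with a Hash (Ruby 1.9 and later) replaces each match with
  # the value stored under the matched text.
  seq.gsub(/.{3}/, CODON_TABLE)
end

puts translate('atgtcaatggtaagaaatgta')
```

The input in the last line is the beginning of entry 2L52.1 from the test file, whose translation elsewhere on this page starts with MSMVRNV.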
RinRuby
A library written in pure Ruby for running R. http://blog.itoshi.tv/2010/09/rinruby/
% sudo gem install rinruby
Successfully installed rinruby-2.0.1
1 gem installed
Installing ri documentation for rinruby-2.0.1...
Installing RDoc documentation for rinruby-2.0.1...
% cat DNAtranslate-rinruby.rb
require 'rubygems'
require 'rinruby'
require 'bio'
R.echo(false)
R.eval('library(GeneR)')
fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i
repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    R.eval(%Q[aaseq <- strTranslate("#{ntseq}")])
    puts R.pull("aaseq")
  end
end
% time ruby-1.9 DNAtranslate-rinruby.rb test-dna.fa
It was slow, and on top of that it stopped after processing a few dozen entries, so no timing could be taken. Perhaps not suited to practical use?
Rserve
Install rserve-client. http://github.com/clbustos/Rserve-Ruby-client
% sudo gem install rserve-client
Successfully installed rserve-client-0.2.5
1 gem installed
Installing ri documentation for rserve-client-0.2.5...
Installing RDoc documentation for rserve-client-0.2.5...
Install Rserve. http://www.rforge.net/Rserve/files/
% wget http://www.rforge.net/Rserve/snapshot/Rserve_0.6-2.tar.gz
% R CMD INSTALL Rserve_0.6-2.tar.gz
* Installing to library ‘/Library/Frameworks/R.framework/Resources/library’
* Installing *source* package ‘Rserve’ ...
checking whether to compile the server... yes
checking whether to compile the client... no
checking for gcc... gcc -arch i386 -std=gnu99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -arch i386 -std=gnu99 accepts -g... yes
checking for gcc -arch i386 -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -arch i386 -std=gnu99 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... rm: conftest.dSYM: is a directory
rm: conftest.dSYM: is a directory
yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for string.h... (cached) yes
checking for memory.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for sys/stat.h... (cached) yes
checking for sys/types.h... (cached) yes
checking sys/socket.h usability... yes
checking sys/socket.h presence... yes
checking for sys/socket.h... yes
checking sys/un.h usability... yes
checking sys/un.h presence... yes
checking for sys/un.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking netinet/tcp.h usability... yes
checking netinet/tcp.h presence... yes
checking for netinet/tcp.h... yes
checking for an ANSI C-conforming const... yes
checking whether byte ordering is bigendian... no
checking whether time.h and sys/time.h may both be included... yes
checking for pid_t... yes
checking vfork.h usability... no
checking vfork.h presence... no
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... yes
checking for working vfork... (cached) yes
checking return type of signal handlers... void
checking for memset... yes
checking for mkdir... yes
checking for rmdir... yes
checking for select... yes
checking for socket... yes
checking for library containing crypt... none required
checking crypt.h usability... no
checking crypt.h presence... no
checking for crypt.h... no
checking for socklen_t... yes
checking for connect... yes
checking for dlopen in -ldl... yes
configure: creating ./config.status
config.status: creating src/Makefile
config.status: creating src/client/cxx/Makefile
config.status: creating src/config.h
** libs
** arch - i386
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC -g -O2 -c Rserv.c -o Rserv.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC -g -O2 -c session.c -o session.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC -g -O2 -c md5.c -o md5.o
gcc -arch i386 -std=gnu99 Rserv.o session.o md5.o -o Rserve -F/Library/Frameworks/R.framework/.. -framework R -ldl
cp Rserve Rserve.so
gcc -arch i386 -std=gnu99 -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -c Rserv.c -o Rserv_d.o -DNODAEMON -DRSERV_DEBUG -g -I/usr/local/include -g -O2
gcc -arch i386 -std=gnu99 Rserv_d.o session.o md5.o -o Rserve.dbg -F/Library/Frameworks/R.framework/.. -framework R -ldl
cp Rserve Rserve-bin.so
cp Rserve.dbg Rserve-dbg.so
./mergefat Rserve "/Library/Frameworks/R.framework/Resources/bin/Rserve"
./mergefat Rserve.dbg "/Library/Frameworks/R.framework/Resources/bin/Rserve.dbg"
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
>>> Building/Updating help pages for package 'Rserve'
Formats: text html latex example
Rclient text html latex example
Rserv text html latex
** building package indices ...
* DONE (Rserve)
% R CMD Rserve
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Rserv started in daemon mode.
The Rserve server is up and running.
% cat rserve-test.rb
require 'rubygems'
require 'rserve'
include Rserve
c = Connection.new
x = c.eval("R.version.string")
puts x.as_string
Run the test code.
% ruby-1.8 rserve-test.rb
R version 2.9.2 (2009-08-24)
OK.
Next, install the BioConductor GeneR package.
% R
> source("http://bioconductor.org/biocLite.R")
> biocLite()
After quite a long wait, the installation finished. However, GeneR does not seem to be included in it.
> library(GeneR)
Error in library(GeneR) : there is no package called 'GeneR'
According to http://www.bioconductor.org/packages/2.3/bioc/html/GeneR.html, the way to install it is apparently:
> source("http://bioconductor.org/biocLite.R")
> biocLite("GeneR")
Using R version 2.9.2, biocinstall version 2.4.13.
Installing Bioconductor version 2.4 packages:
[1] "GeneR"
Please wait...
trying URL 'http://bioconductor.org/packages/2.4/bioc/bin/macosx/universal/contrib/2.9/GeneR_2.14.0.tgz'
Content type 'application/x-gzip' length 411393 bytes (401 Kb)
opened URL
==================================================
downloaded 401 Kb
The downloaded packages are in
/var/folders/lt/ltVmCLsiF3mLKUpLCN3GlU+++TM/-Tmp-//RtmpHRXj4g/downloaded_packages
> library(GeneR)
Attaching package: 'GeneR'
The following object(s) are masked from package:utils :
relist
Try it out briefly.
> seq = "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
+ tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg
+ aagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa
+ gtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa
+ tcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg
+ taaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag
+ tttctttcaaagttgccaaacggcatttttggacgggataatgaggagac
+ actgttcaatagtgcatcgactggaatggatattgagaagcagagacagg
+ tgttttataggatatttggatcacttccagtcgcatcccaacacttgctc
+ gtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca
+ ttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc
+ tttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt
+ caacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt
+ cggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat
+ acacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat
+ aatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc
+ aaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag
+ cgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc
+ gtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga
+ tcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg
+ atttaaggacatcggtcagctgtgatttttga
+ "
> seq
[1] "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca\ntcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg\naagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa\ngtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa\ntcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg\ntaaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag\ntttctttcaaagttgccaaacggcatttttggacgggataatgaggagac\nactgttcaatagtgcatcgactggaatggatattgagaagcagagacagg\ntgttttataggatatttggatcacttccagtcgcatcccaacacttgctc\ngtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca\nttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc\ntttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt\ncaacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt\ncggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat\nacacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat\naatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc\naaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag\ncgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc\ngtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga\ntcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg\natttaaggacatcggtcagctgtgatttttga\n"
> strTranslate(seq)
[1] "MTVASYSMVLCGSSDD-SLSRQNRKSKIRRTHK-KHLPMTFPPRFSCSCS-VNKDGPAKKDIWRAPG-SGSSAKIVASDATRAA-*ISRISRFTRRHLSSK-FLSKLPNGIFGRDNEE-TVQ*CIDWNGY*EAET-CFIGYLDHFQSHPNTC-VLLFGTFRVVADSSDG-FERDEPECDRDFGGTI-FFTLVYTMDGRRE*KT-QRFKLASNIVCSIICS-RRHEALPTRVLRVLRQ-TRVARCESTRIECSLF-NPSNRRARGEEFSALA-KVCGRLLAGRHPPGRR-RHRSPLRQPRSLHVAT-VGRAGSLKQHALTQTT-SSEEKRVDRG*GSVSN-I*GHRSAVIF-"
Looks fine.
The Python test program I was given:
# Read a FASTA file a number of times (default once), translate
# using R/Bioconductor GeneR and print to STDOUT
#
# Usage:
#
# python DNAtranslate.py dna.fa [n]
#
# Example:
#
# python DNAtranslate.py ../../../test/data/test-dna.fa
#
verbose=False
import sys
import time
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.Alphabet import generic_dna
import subprocess
import pyRserve
fn = sys.argv[1]
times = 1
if len(sys.argv) > 2:
    times = int(sys.argv[2])
# Start the RServer
subprocess.Popen([r"R","CMD", "Rserve"], stdout=subprocess.PIPE).wait()
time.sleep(0.5)
conn = pyRserve.rconnect()
conn('library(GeneR)')
if verbose:
    print >> sys.stderr, 'Biopython translate ',fn, ':', times
for i in range(0, times):
    if verbose:
        print >> sys.stderr, i+1
    for seq_record in SeqIO.parse(fn, "fasta", generic_dna):
        print ">",seq_record.id
        ntseq = str(seq_record.seq)
        print conn('strTranslate("'+ntseq+'")')
# Kill the RServer
subprocess.Popen([r"killall", "Rserve"], stdout=subprocess.PIPE)
Translated into Ruby:
require 'rubygems'
require 'rserve'
require 'bio'
rserve = Rserve::Connection.new
rserve.eval('library(GeneR)')
fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i
repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    result = rserve.eval(%Q[strTranslate("#{ntseq}")])
    puts result.as_string
  end
end
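The script interpolates the sequence directly into an R expression. In the interactive R test earlier on this page, the sequence string contained newlines and the translation shows a '-' at regular intervals, which suggests that stray whitespace ends up inside codons. A defensive way to build the expression might look like the sketch below; str_translate_expr is a hypothetical helper, not part of rserve-client or GeneR, and only strTranslate itself comes from the scripts on this page.

```ruby
# Hypothetical helper: build the R expression handed to eval.
# It strips all whitespace and refuses anything that is not a plain letter,
# so a quote or newline in the input cannot alter the R code being sent.
def str_translate_expr(ntseq)
  clean = ntseq.to_s.gsub(/\s+/, '').downcase
  unless clean =~ /\A[a-z]*\z/
    raise ArgumentError, 'sequence contains characters other than letters'
  end
  %Q[strTranslate("#{clean}")]
end

puts str_translate_expr("atgaca\ngtggcg\n")
```

In the script above the call would then read rserve.eval(str_translate_expr(ntseq)); the same helper would fit the RinRuby and RSRuby variants, since they build the expression the same way.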
Run it:
% ruby-1.8 DNAtranslate-rserve.rb test-dna.fa
>2L52.1
MSMVRNVSNQSEKLEILSCKWVGCLKSTEVFKTVEKLLDHVTADHIPEVIVNDDGSEEVVCQWDCCEMGASRGNLQKKKEWMENHFKTRHVRKAKIFKCLIEDCPVVKSSSQEIETHLRISHPINPKKERLKEFKSSTDHIEPTQANRVWTIVNGEVQWKTPPRVKKKTVIYYDDGPRYVFPTGCARCNYDSDESELESDEFWSATEMSDNEEVYVNFRGMNCISTGKSASMVPSKRRNWPKRVKKRLSTQRNNQKTIRPPELNKNNIEIKDMNSNNLEERNREECIQPVSVEKNILHFEKFKSNQICIVRENNKFREGTRRRRKNSGESEDLKIHENFTEKRRPIRSCKQNISFYEMDGDIEEFEVFFDTPTKSKKVLLDIYSAKKMPKIEVEDSLVNKFHSKRPSRACRVLGSMEEVPFDVEIGY*
>2RSSE.1
MTVASYSMVLCGSSDDHRYRGRIEKVKFGVPINEAFAHDIPATLLMLLLKVNKDGPAKKDIWRAPGNQAQVRKLSQVMQHGRLVNIENFTVYTAASVIKKFLSKLPNGIFGRDNEETLFNSASTGMDIEKQRQVFYRIFGSLPVASQHLLVLLFGTFRVVADSSDGHSNAMNPNAIAISVAPSLFHTCIHDGRTARVEDLQRFKLASNIVCSIICSFGDTKLFPRECYEYYARYTGRTLRIDENRMFTFHNPSNRRARGEEFSALAAKCAGAYSLAAIHLAEEASPEPTPTTSKPPRGNGVGRAGSLKQHALTQTTDHPKRSVSIAAKDPYPTDLRTSVSCDF*
:
Timing:
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null  4.16s user 0.25s system 76% cpu 5.737 total
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  39.71s user 1.90s system 75% cpu 55.061 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null  13.69s user 0.28s system 87% cpu 16.010 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  135.66s user 2.56s system 82% cpu 2:46.68 total
RSRuby
% sudo gem install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources
Building native extensions. This could take a while...
Successfully installed rsruby-0.5.1.1
1 gem installed
Installing ri documentation for rsruby-0.5.1.1...
Installing RDoc documentation for rsruby-0.5.1.1...
ENV['R_HOME'] = '/Library/Frameworks/R.framework/Resources'
require 'rubygems'
require 'rsruby'
require 'bio'
r = RSRuby.instance
r.library('GeneR')
fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i
repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    puts r.eval_R(%Q[aaseq <- strTranslate("#{ntseq}")])
  end
end
% R
> source("http://bioconductor.org/biocLite.R")
> biocLite("GeneR")
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa
Attaching package: 'GeneR'
The following object(s) are masked from 'package:utils':
relist
>2L52.1
MSMVRNVSNQSEKLEILSCKWVGCLKSTEVFKTVEKLLDHVTADHIPEVIVNDDGSEEVVCQWDCCEMGASRGNLQKKKEWMENHFKTRHVRKAKIFKCLIEDCPVVKSSSQEIETHLRISHPINPKKERLKEFKSSTDHIEPTQANRVWTIVNGEVQWKTPPRVKKKTVIYYDDGPRYVFPTGCARCNYDSDESELESDEFWSATEMSDNEEVYVNFRGMNCISTGKSASMVPSKRRNWPKRVKKRLSTQRNNQKTIRPPELNKNNIEIKDMNSNNLEERNREECIQPVSVEKNILHFEKFKSNQICIVRENNKFREGTRRRRKNSGESEDLKIHENFTEKRRPIRSCKQNISFYEMDGDIEEFEVFFDTPTKSKKVLLDIYSAKKMPKIEVEDSLVNKFHSKRPSRACRVLGSMEEVPFDVEIGY*
:
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.56s user 0.16s system 68% cpu 3.968 total
% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.48s user 0.17s system 65% cpu 4.061 total
It seems to process the data quite fast.
Sorry, the problem described below turned out to be caused by an outdated version of R. Embarrassing. Thanks to Nishiyama-san for pointing it out. (Added by Katayama, 2010/11/5)
However, the installer complains that R.h cannot be found:
% sudo gem-1.8 install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources --with-R-include=/Library/Frameworks/R.framework/Resources/include
Building native extensions. This could take a while...
ERROR: Error installing rsruby:
ERROR: Failed to build gem native extension.
/usr/local/bin/ruby-1.8 extconf.rb --with-R-dir=/Library/Frameworks/R.framework/Resources --with-R-include=/Library/Frameworks/R.framework/Resources/include
checking for main() in -lR... yes
checking for R.h... no
ERROR: Cannot find the R header, aborting.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers. Check the mkmf.log file for more
details. You may need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/usr/local/bin/ruby-1.8
--with-R-dir
--with-R-include=${R-dir}/include
--with-R-lib
--without-R-lib=${R-dir}/lib
--with-Rlib
--without-Rlib
Gem files will remain installed in /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1 for inspection.
Results logged to /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1/ext/gem_make.out
The options are being passed to extconf.rb correctly, and
% ls /Library/Frameworks/R.framework/Resources/include
R.h Rdefines.h Rinternals.h S.h ppc/
R_ext/ Rembedded.h Rmath.h i386/
Rconfig.h Rinterface.h Rversion.h libintl.h
R.h is right there..
% cd /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1/ext
% sudo ruby-1.8 -rmkmf -e 'create_makefile("rsruby_c")'
creating Makefile
% sudo vi Makefile
#INCFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir)
INCFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir) -I/Library/Frameworks/R.framework/Resources/include
#ldflags = -L.
ldflags = -L. -L/Library/Frameworks/R.framework/Resources/lib
Try to force it through by editing the Makefile by hand.
% make
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c Converters.c
In file included from /Library/Frameworks/R.framework/Resources/include/R.h:40,
from ./rsruby.h:37,
from Converters.c:32:
/Library/Frameworks/R.framework/Resources/include/Rconfig.h:9:28: error: x86_64/Rconfig.h: No such file or directory
Converters.c: In function ‘to_ruby_vector’:
Converters.c:356: warning: assignment discards qualifiers from pointer target type
Converters.c:384: warning: assignment discards qualifiers from pointer target type
Converters.c: In function ‘to_ruby_hash’:
Converters.c:601: warning: assignment discards qualifiers from pointer target type
{standard input}:unknown:FATAL:can't create output file: Converters.o
make: *** [Converters.o] Error 1
It seems the arch is not being set correctly.
% ls /Library/Frameworks/R.framework/Resources/include
R.h Rdefines.h Rinternals.h S.h ppc/
R_ext/ Rembedded.h Rmath.h i386/
Rconfig.h Rinterface.h Rversion.h libintl.h
% less /Library/Frameworks/R.framework/Resources/include/Rconfig.h
/* This is an automatically generated universal stub for architecture-dependent headers. */
#ifdef __i386__
#include "i386/Rconfig.h"
#elif defined __ppc__
#include "ppc/Rconfig.h"
#elif defined __ppc64__
#include "ppc64/Rconfig.h"
#elif defined __x86_64__
#include "x86_64/Rconfig.h"
#elif defined __arm__
#include "arm/Rconfig.h"
#else
#error "Unsupported architecture."
#endif
The R package supports i386 and ppc, but the build seems to be expecting x86_64.
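The mismatch can be checked without reading Rconfig.h by hand. The sketch below lists which architecture subdirectories of the R include directory actually contain an Rconfig.h; available_r_archs is a hypothetical helper written for this page, and the include path is the one from the listing above.

```ruby
# Hypothetical helper: list the architecture subdirectories of an R include
# directory that actually provide an Rconfig.h (for example i386 and ppc).
def available_r_archs(include_dir)
  Dir.glob(File.join(include_dir, '*', 'Rconfig.h'))
     .map { |path| File.basename(File.dirname(path)) }
     .sort
end

include_dir = '/Library/Frameworks/R.framework/Resources/include'
archs = available_r_archs(include_dir)
puts "R headers available for: #{archs.join(', ')}"
puts 'no x86_64/Rconfig.h: a 64-bit build will fail' unless archs.include?('x86_64')
```

If the architecture the compiler targets is not in the list, the stub header's #include fails exactly as in the make output above.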
% sudo vi Makefile
#CPPFLAGS = -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE $(DEFS) $(cppflags)
CPPFLAGS = -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE $(DEFS) $(cppflags)
Specify the architecture by force.
% sudo make
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c Converters.c
Converters.c: In function ‘to_ruby_vector’:
Converters.c:356: warning: assignment discards qualifiers from pointer target type
Converters.c:384: warning: assignment discards qualifiers from pointer target type
Converters.c: In function ‘to_ruby_hash’:
Converters.c:601: warning: assignment discards qualifiers from pointer target type
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c R_eval.c
R_eval.c: In function ‘get_last_error_msg’:
R_eval.c:143: warning: return discards qualifiers from pointer target type
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c robj.c
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c rsruby.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o rsruby_c.bundle Converters.o R_eval.o robj.o rsruby.o -L. -L/usr/local/lib -L. -L/Library/Frameworks/R.framework/Resources/lib -ldl -lobjc
Compilation went through. But I do not know how to install the result..
For now, unpack the gem and look inside.
% tar xvfz /usr/local/lib/ruby/gems/1.8/cache/rsruby-0.5.1.1.gem
data.tar.gz
metadata.gz
% sudo tar xvfz data.tar.gz
x History.txt
x License.txt
x Manifest.txt
x README.txt
x Rakefile.rb
x examples/arrayfields.rb
x examples/bioc.rb
x examples/dataframe.rb
x examples/erobj.rb
x ext/Converters.c
x ext/Converters.h
x ext/R_eval.c
x ext/R_eval.h
x ext/extconf.rb
x ext/robj.c
x ext/rsruby.c
x ext/rsruby.h
x lib/rsruby.rb
x lib/rsruby/dataframe.rb
x lib/rsruby/erobj.rb
x lib/rsruby/robj.rb
x test/table.txt
x test/tc_array.rb
x test/tc_boolean.rb
x test/tc_cleanup.rb
x test/tc_eval.rb
x test/tc_extensions.rb
x test/tc_init.rb
x test/tc_io.rb
x test/tc_library.rb
x test/tc_matrix.rb
x test/tc_modes.rb
x test/tc_robj.rb
x test/tc_sigint.rb
x test/tc_to_r.rb
x test/tc_to_ruby.rb
x test/tc_util.rb
x test/tc_vars.rb
x test/test_all.rb
% sudo gzcat metadata.gz > metadata
So this is how a gem is structured. The metadata file looks like a gemspec. However, I ran out of time and gave up here..
RSOAP
I gave up on RSOAP because SOAP cannot be used with Ruby 1.9 and setting up the RSOAP server looked like a lot of work.