Using R from Ruby

From TogoWiki

RSRuby

On Tiger:

$ export LD_LIBRARY_PATH=:/Library/Frameworks/R.framework/Resources/lib
$ ruby187/bin/gem install rsruby-0.5.1.1.gem -- --with-R-dir=$R_HOME
Building native extensions.  This could take a while...
Successfully installed rsruby-0.5.1.1
1 gem installed
Installing ri documentation for rsruby-0.5.1.1...
Installing RDoc documentation for rsruby-0.5.1.1...
$ R --version
R version 2.8.0 (2008-10-20)

Code to use it:

ENV['R_HOME']='/Library/Frameworks/R.framework/Resources/lib'
require 'rsruby'

On Snow Leopard with R 2.12, the following is enough:

sudo gem install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources
ENV['R_HOME']='/Library/Frameworks/R.framework/Resources'
require 'rsruby'

RinRuby

Blog post by Nikaido-san

Rserve + Rserve-Ruby-client

gem install rserve-client

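Benchmarks

A three-way comparison by the author of Rserve-Ruby-client

Everything up to this point is preliminary research.


Everything from here on is the result of actually testing.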

Comparing BioRuby, RinRuby, and Rserve, and giving up on RSRuby and RSOAP

The benchmark apparently measures how fast a 6.8 MB nucleotide FASTA file with 5210 entries can be read and translated.

Input data:

% head -50 test-dna.fa
>2L52.1
atgtcaatggtaagaaatgtatcaaatcagagcgaaaaattggaaatttt
gtcatgtaaatgggtaggatgtctcaaatcaacagaagtgttcaaaacgg
ttgaaaagttattagatcatgttacggctgatcatattccagaagttatt
gtaaacgatgacgggtcggaggaagtcgtttgtcagtgggattgctgcga
aatgggtgccagtcgtggaaatcttcaaaaaaagaaagagtggatggaga
atcacttcaaaacacgtcatgttcgcaaagcaaaaatattcaaatgctta
attgaggattgccctgtggtaaagtcaagtagtcaggaaattgaaaccca
tctcagaataagtcatccaataaatccgaaaaaagagagactgaaagagt
ttaaaagttctaccgaccacatcgaacctactcaagctaatagagtatgg
acaattgtgaacggagaggttcaatggaagactccaccgcgggttaaaaa
aaagactgtgatatactatgatgatgggccgaggtatgtatttccaacgg
gatgtgcgagatgcaactatgatagtgacgaatcagaactggaatcagat
gagttttggtcagccacagagatgtcagataatgaagaagtatatgtgaa
cttccgtggaatgaactgtatctcaacaggaaagtcggccagtatggtcc
cgagcaaacgaagaaattggccaaaaagagtgaagaaaaggctatcgaca
caaagaaacaatcagaaaactattcgaccaccagagctgaataaaaataa
tatagagataaaagatatgaactcaaataaccttgaagaacgcaacagag
aagaatgcattcagcctgtttctgttgaaaagaacatcctgcattttgaa
aaattcaaatcaaatcaaatttgcattgttcgggaaaacaataaatttag
agaaggaacgagaagacgcagaaagaattctggtgaatcggaagacttga
aaattcatgaaaactttactgaaaaacgaagacccattcgatcatgcaaa
caaaatataagtttctatgaaatggacggggatatagaagaatttgaagt
gtttttcgatactcccacaaaaagcaaaaaagtacttctggatatctaca
gtgcgaagaaaatgccaaaaattgaggttgaagattcattagttaataag
tttcattcaaaacgtccatcaagagcatgtcgagttcttggaagtatgga
agaagtaccatttgatgtggaaataggatattga
>2RSSE.1
atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg
aagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa
gtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa
tcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg
taaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag
tttctttcaaagttgccaaacggcatttttggacgggataatgaggagac
actgttcaatagtgcatcgactggaatggatattgagaagcagagacagg
tgttttataggatatttggatcacttccagtcgcatcccaacacttgctc
gtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca
ttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc
tttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt
caacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt
cggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat
acacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat
aatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc
aaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag
cgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc
gtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga
tcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg
atttaaggacatcggtcagctgtgatttttga
>2RSSE.2
 :
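For reference, the input format above is simple enough to read without any library. The following is a minimal sketch in plain Ruby, not what the benchmark scripts on this page use (they rely on Bio::FlatFile); the method name each_fasta_entry is made up for illustration.

```ruby
require "stringio"

# Hypothetical helper: yields [id, sequence] for each FASTA entry in an IO.
# The id is the first word after ">"; sequence lines are joined without newlines.
def each_fasta_entry(io)
  return enum_for(:each_fasta_entry, io) unless block_given?
  id = nil
  seq = String.new
  io.each_line do |line|
    line = line.chomp
    if line.start_with?(">")
      yield id, seq if id            # emit the previous entry
      id = line[1..-1].split(/\s+/).first
      seq = String.new
    elsif id
      seq << line.strip              # append one line of bases
    end
  end
  yield id, seq if id                # emit the last entry
end

# Usage with a tiny in-memory sample (first bases of the entries shown above).
sample = StringIO.new(">2L52.1\natgtcaatgg\ntaagaaat\n>2RSSE.1\natgacagtgg\n")
each_fasta_entry(sample) { |id, seq| puts "#{id}\t#{seq.length} bases" }
```

With a real file, File.open("test-dna.fa") { |f| each_fasta_entry(f) { |id, seq| ... } } would do the same thing.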

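Results: Ruby 1.9 is considerably faster than 1.8. BioRuby alone is faster than going through R (although BioLib + Ruby is reportedly extremely fast). Rserve is reasonably stable in use.

→ With RSRuby, going through R turned out to be faster than BioRuby alone. The earlier installation failure was caused by an outdated R. Apologies.

→ However, it turned out that BioRuby can be made even faster by using Ruby 1.9's gsub(/re/, hash). With that version it is slightly faster than RSRuby and the fastest overall (about 2.1 seconds).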

  • pure BioRuby

About 3 seconds with Ruby 1.9, about 6 seconds with Ruby 1.8.

% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  2.96s user 0.07s system 97% cpu 3.120 total

% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  28.25s user 0.29s system 91% cpu 31.250 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  5.36s user 0.18s system 97% cpu 5.666 total

% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  52.98s user 1.48s system 91% cpu 59.319 total
  • RinRuby

Execution stalled partway through, so it could not be benchmarked.

  • Rserve

About 6 seconds with Ruby 1.9, about 16 seconds with Ruby 1.8.

% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null 
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null  4.16s user 0.25s system 76% cpu 5.737 total

% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  39.71s user 1.90s system 75% cpu 55.061 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null  13.69s user 0.28s system 87% cpu 16.010 total

% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  135.66s user 2.56s system 82% cpu 2:46.68 total
  • RSRuby

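On Snow Leopard? Installation was difficult and I never got as far as testing.

→ Tried again at a later date and it worked. It was the fastest. There is little difference between 1.8 and 1.9 (1.9 is slightly faster); both take about 2.5 seconds.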

% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa  > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.35s user 0.12s system 99% cpu 2.481 total

% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa 10  > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null  19.77s user 0.42s system 99% cpu 20.280 total
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa  > /dev/null 
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.45s user 0.12s system 99% cpu 2.589 total

% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa 10  > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null  21.27s user 0.61s system 98% cpu 22.128 total


  • RSOAP

Would have to start by finding out how to install the server.
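To compare the runs above numerically, the zsh time lines can be parsed. This is a minimal sketch that assumes every line has the "Ns user Ns system N% cpu T total" shape seen in this page; the names TIME_LINE and parse_zsh_time are made up for illustration.

```ruby
# Matches the tail of a zsh `time` report, e.g.
#   "... 2.96s user 0.07s system 97% cpu 3.120 total"
# The total may also be written as m:ss.ss (e.g. "2:46.68").
TIME_LINE = /([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d:.]+) total\s*\z/

# Hypothetical helper: returns a Hash of seconds/percent, or nil if the line does not match.
def parse_zsh_time(line)
  m = TIME_LINE.match(line)
  return nil unless m
  # Fold "h:m:s" style fields into seconds; a plain "3.120" passes through unchanged.
  total = m[4].split(":").inject(0.0) { |acc, part| acc * 60 + part.to_f }
  { :user => m[1].to_f, :system => m[2].to_f, :cpu_percent => m[3].to_i, :total => total }
end

line = "ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  135.66s user 2.56s system 82% cpu 2:46.68 total"
t = parse_zsh_time(line)
puts format("total %.2f s, %.2f s per pass over 10 passes", t[:total], t[:total] / 10)
```

Here the 2:46.68 total is read as 166.68 seconds, so the ten-pass runs can be compared directly with the single-pass ones.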


pure BioRuby

% cat DNAtranslate-bioruby.rb
require 'rubygems'
require 'bio'

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    puts entry.naseq.translate
  end
end
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  2.96s user 0.07s system 97% cpu 3.120 total

% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  28.25s user 0.29s system 91% cpu 31.250 total

% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null  5.36s user 0.18s system 97% cpu 5.666 total

% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null  52.98s user 1.48s system 91% cpu 59.319 total

Ruby 1.9's gsub(/re/, hash) is worth trying.
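As an illustration of the gsub(/re/, hash) idea, here is a minimal sketch in plain Ruby (1.9 or later). It is not BioRuby's actual implementation. It assumes the standard genetic code, and the constant and method names are made up for illustration.

```ruby
# Standard genetic code, listed in t/c/a/g order for each codon position.
BASES  = %w[t c a g]
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = BASES.product(BASES, BASES).map { |triplet| triplet.join }  # ttt, ttc, tta, ...
CODON_TABLE = Hash[CODONS.zip(AMINOS.chars.to_a)]
CODON_TABLE.default = "X"   # assumption: unknown codons (e.g. containing n) become X

# Hypothetical helper: translates a nucleotide string in frame 1.
def translate(ntseq)
  seq = ntseq.downcase.gsub(/\s+/, "")
  seq = seq[0, seq.length - seq.length % 3]   # drop an incomplete trailing codon
  # With a Hash as the second argument, gsub looks up each matched codon in the
  # table, so no Ruby block is called per codon.
  seq.gsub(/.{3}/, CODON_TABLE)
end

puts translate("atgtcaatggtaagaaat")   # first 18 bases of entry 2L52.1
```

This prints MSMVRN, which matches the start of the 2L52.1 translation shown elsewhere on this page.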

RinRuby

A library for running R, written in pure Ruby. http://blog.itoshi.tv/2010/09/rinruby/

% sudo gem install rinruby
Successfully installed rinruby-2.0.1
1 gem installed
Installing ri documentation for rinruby-2.0.1...
Installing RDoc documentation for rinruby-2.0.1...
% cat DNAtranslate-rinruby.rb
require 'rubygems'
require 'rinruby'
require 'bio'

R.echo(false)  # do not echo R output to the console

R.eval('library(GeneR)')

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    R.eval(%Q[aaseq <- strTranslate("#{ntseq}")])
    puts R.pull("aaseq")
  end
end
% time ruby-1.9 DNAtranslate-rinruby.rb test-dna.fa

It was slow, and it stalled after processing a few dozen entries, so no timing could be taken. It does not seem suited to practical use.

Rserve

Install rserve-client. http://github.com/clbustos/Rserve-Ruby-client

% sudo gem install rserve-client
Successfully installed rserve-client-0.2.5
1 gem installed
Installing ri documentation for rserve-client-0.2.5...
Installing RDoc documentation for rserve-client-0.2.5...

Install Rserve. http://www.rforge.net/Rserve/files/

% wget http://www.rforge.net/Rserve/snapshot/Rserve_0.6-2.tar.gz

% R CMD INSTALL Rserve_0.6-2.tar.gz
* Installing to library ‘/Library/Frameworks/R.framework/Resources/library’
* Installing *source* package ‘Rserve’ ...
checking whether to compile the server... yes
checking whether to compile the client... no
checking for gcc... gcc -arch i386 -std=gnu99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -arch i386 -std=gnu99 accepts -g... yes
checking for gcc -arch i386 -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -arch i386 -std=gnu99 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... rm: conftest.dSYM: is a directory
rm: conftest.dSYM: is a directory
yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for string.h... (cached) yes
checking for memory.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for sys/stat.h... (cached) yes
checking for sys/types.h... (cached) yes
checking sys/socket.h usability... yes
checking sys/socket.h presence... yes
checking for sys/socket.h... yes
checking sys/un.h usability... yes
checking sys/un.h presence... yes
checking for sys/un.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking netinet/tcp.h usability... yes
checking netinet/tcp.h presence... yes
checking for netinet/tcp.h... yes
checking for an ANSI C-conforming const... yes
checking whether byte ordering is bigendian... no
checking whether time.h and sys/time.h may both be included... yes
checking for pid_t... yes
checking vfork.h usability... no
checking vfork.h presence... no
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... yes
checking for working vfork... (cached) yes
checking return type of signal handlers... void
checking for memset... yes
checking for mkdir... yes
checking for rmdir... yes
checking for select... yes
checking for socket... yes
checking for library containing crypt... none required
checking crypt.h usability... no
checking crypt.h presence... no
checking for crypt.h... no
checking for socklen_t... yes
checking for connect... yes
checking for dlopen in -ldl... yes
configure: creating ./config.status
config.status: creating src/Makefile
config.status: creating src/client/cxx/Makefile
config.status: creating src/config.h
** libs
** arch - i386
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -I/usr/local/include   -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC  -g -O2 -c Rserv.c -o Rserv.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -I/usr/local/include   -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC  -g -O2 -c session.c -o session.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386  -I/usr/local/include   -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC  -g -O2 -c md5.c -o md5.o
gcc -arch i386 -std=gnu99 Rserv.o session.o md5.o -o Rserve -F/Library/Frameworks/R.framework/.. -framework R -ldl 
cp Rserve Rserve.so
gcc -arch i386 -std=gnu99 -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -c Rserv.c -o Rserv_d.o -DNODAEMON -DRSERV_DEBUG -g  -I/usr/local/include -g -O2
gcc -arch i386 -std=gnu99 Rserv_d.o session.o md5.o -o Rserve.dbg -F/Library/Frameworks/R.framework/.. -framework R -ldl 
cp Rserve Rserve-bin.so
cp Rserve.dbg Rserve-dbg.so
./mergefat Rserve "/Library/Frameworks/R.framework/Resources/bin/Rserve"
./mergefat Rserve.dbg "/Library/Frameworks/R.framework/Resources/bin/Rserve.dbg"
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
 >>> Building/Updating help pages for package 'Rserve'
     Formats: text html latex example
  Rclient                           text    html    latex   example
  Rserv                             text    html    latex
** building package indices ...
* DONE (Rserve)
% R CMD Rserve
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

Rserv started in daemon mode.

The Rserve server is up and running.

% cat rserve-test.rb
require 'rubygems'
require 'rserve'
include Rserve
c = Connection.new
x = c.eval("R.version.string");
puts x.as_string

Run the test code.

% ruby-1.8 rserve-test.rb
R version 2.9.2 (2009-08-24)

OK.

Next, install the Bioconductor GeneR package.

% R
> source("http://bioconductor.org/biocLite.R")
> biocLite()

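After quite a long wait the installation finished. However, GeneR does not seem to be included in the default set.

> library(GeneR)
 Error in library(GeneR) :  there is no package called 'GeneR'

According to http://www.bioconductor.org/packages/2.3/bioc/html/GeneR.html, the package is installed like this: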

> source("http://bioconductor.org/biocLite.R")
> biocLite("GeneR")
Using R version 2.9.2, biocinstall version 2.4.13.
Installing Bioconductor version 2.4 packages:
[1] "GeneR"
Please wait...

 trying URL 'http://bioconductor.org/packages/2.4/bioc/bin/macosx/universal/contrib/2.9/GeneR_2.14.0.tgz'
Content type 'application/x-gzip' length 411393 bytes (401 Kb)
 opened URL
==================================================
downloaded 401 Kb

The downloaded packages are in
     /var/folders/lt/ltVmCLsiF3mLKUpLCN3GlU+++TM/-Tmp-//RtmpHRXj4g/downloaded_packages

That appears to be the way to do it.

> library(GeneR)
 Attaching package: 'GeneR'
     The following object(s) are masked from package:utils :
      relist

Try it out.

> seq = "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
+ tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg
+ aagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa
+ gtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa
+ tcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg
+ taaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag
+ tttctttcaaagttgccaaacggcatttttggacgggataatgaggagac
+ actgttcaatagtgcatcgactggaatggatattgagaagcagagacagg
+ tgttttataggatatttggatcacttccagtcgcatcccaacacttgctc
+ gtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca
+ ttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc
+ tttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt
+ caacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt
+ cggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat
+ acacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat
+ aatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc
+ aaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag
+ cgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc
+ gtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga
+ tcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg
+ atttaaggacatcggtcagctgtgatttttga
+ "
> seq
[1] "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca\ntcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg\naagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa\ngtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa\ntcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg\ntaaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag\ntttctttcaaagttgccaaacggcatttttggacgggataatgaggagac\nactgttcaatagtgcatcgactggaatggatattgagaagcagagacagg\ntgttttataggatatttggatcacttccagtcgcatcccaacacttgctc\ngtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca\nttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc\ntttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt\ncaacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt\ncggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat\nacacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat\naatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc\naaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag\ncgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc\ngtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga\ntcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg\natttaaggacatcggtcagctgtgatttttga\n"
> strTranslate(seq)
[1] "MTVASYSMVLCGSSDD-SLSRQNRKSKIRRTHK-KHLPMTFPPRFSCSCS-VNKDGPAKKDIWRAPG-SGSSAKIVASDATRAA-*ISRISRFTRRHLSSK-FLSKLPNGIFGRDNEE-TVQ*CIDWNGY*EAET-CFIGYLDHFQSHPNTC-VLLFGTFRVVADSSDG-FERDEPECDRDFGGTI-FFTLVYTMDGRRE*KT-QRFKLASNIVCSIICS-RRHEALPTRVLRVLRQ-TRVARCESTRIECSLF-NPSNRRARGEEFSALA-KVCGRLLAGRHPPGRR-RHRSPLRQPRSLHVAT-VGRAGSLKQHALTQTT-SSEEKRVDRG*GSVSN-I*GHRSAVIF-"

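Looks like it works. Note that the newlines embedded in the pasted string are translated as "-" and shift the reading frame, so this output differs from the translation of the same sequence passed without line breaks.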
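Since a sequence read from a multi-line FASTA record may carry line breaks, it seems safer to strip whitespace before embedding it in the R expression. This is a minimal sketch in plain Ruby; the helper name r_translate_call is made up, and the letters-only check is an assumption about what a sequence may contain.

```ruby
# Hypothetical helper: builds the R expression string for GeneR's strTranslate.
# Whitespace (including newlines) is removed, and anything other than letters is
# rejected so that the sequence cannot break out of the quoted R string.
def r_translate_call(ntseq)
  clean = ntseq.gsub(/\s+/, "")
  unless clean =~ /\A[A-Za-z]*\z/
    raise ArgumentError, "unexpected character in sequence"
  end
  %Q[strTranslate("#{clean}")]
end

puts r_translate_call("atgacagtgg\ncgagttacag\n")
```

The resulting string can be handed to whichever bridge is in use, for example the eval method of an Rserve connection as in the script on this page.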

The Python test program I was given:

# Read a FASTA file a number of times (default once), translate
# using R/Bioconductor GeneR and print to STDOUT
#
# Usage:
#
#   python DNAtranslate.py dna.fa [n]
#
# Example:
#
#   python DNAtranslate.py ../../../test/data/test-dna.fa
#

verbose=False

import sys
import time
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.Alphabet import generic_dna

import subprocess
import pyRserve

fn = sys.argv[1]
times = 1
if len(sys.argv) > 2:
  times = int(sys.argv[2])

# Start the RServer
subprocess.Popen([r"R","CMD", "Rserve"], stdout=subprocess.PIPE).wait()

time.sleep(0.5)
conn = pyRserve.rconnect()
conn('library(GeneR)')

if verbose:
  print >> sys.stderr, 'Biopython translate ',fn, ':', times
for i in range(0, times):
  if verbose:
    print >> sys.stderr, i+1
  for seq_record in SeqIO.parse(fn, "fasta", generic_dna):
    print ">",seq_record.id
    ntseq = str(seq_record.seq)
    print conn('strTranslate("'+ntseq+'")')

# Kill the RServer
subprocess.Popen([r"killall", "Rserve"], stdout=subprocess.PIPE)

Translated into Ruby:

require 'rubygems'
require 'rserve'
require 'bio'

rserve = Rserve::Connection.new

rserve.eval('library(GeneR)')

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    result = rserve.eval(%Q[strTranslate("#{ntseq}")])
    puts result.as_string
  end
end

Run it:

% ruby-1.8 DNAtranslate-rserve.rb test-dna.fa           
>2L52.1
MSMVRNVSNQSEKLEILSCKWVGCLKSTEVFKTVEKLLDHVTADHIPEVIVNDDGSEEVVCQWDCCEMGASRGNLQKKKEWMENHFKTRHVRKAKIFKCLIEDCPVVKSSSQEIETHLRISHPINPKKERLKEFKSSTDHIEPTQANRVWTIVNGEVQWKTPPRVKKKTVIYYDDGPRYVFPTGCARCNYDSDESELESDEFWSATEMSDNEEVYVNFRGMNCISTGKSASMVPSKRRNWPKRVKKRLSTQRNNQKTIRPPELNKNNIEIKDMNSNNLEERNREECIQPVSVEKNILHFEKFKSNQICIVRENNKFREGTRRRRKNSGESEDLKIHENFTEKRRPIRSCKQNISFYEMDGDIEEFEVFFDTPTKSKKVLLDIYSAKKMPKIEVEDSLVNKFHSKRPSRACRVLGSMEEVPFDVEIGY*
>2RSSE.1
MTVASYSMVLCGSSDDHRYRGRIEKVKFGVPINEAFAHDIPATLLMLLLKVNKDGPAKKDIWRAPGNQAQVRKLSQVMQHGRLVNIENFTVYTAASVIKKFLSKLPNGIFGRDNEETLFNSASTGMDIEKQRQVFYRIFGSLPVASQHLLVLLFGTFRVVADSSDGHSNAMNPNAIAISVAPSLFHTCIHDGRTARVEDLQRFKLASNIVCSIICSFGDTKLFPRECYEYYARYTGRTLRIDENRMFTFHNPSNRRARGEEFSALAAKCAGAYSLAAIHLAEEASPEPTPTTSKPPRGNGVGRAGSLKQHALTQTTDHPKRSVSIAAKDPYPTDLRTSVSCDF*
 :

Timing:

% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null 
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null  4.16s user 0.25s system 76% cpu 5.737 total

% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  39.71s user 1.90s system 75% cpu 55.061 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null  13.69s user 0.28s system 87% cpu 16.010 total

% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null  135.66s user 2.56s system 82% cpu 2:46.68 total


RSRuby

% sudo gem install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources
Building native extensions.  This could take a while...
Successfully installed rsruby-0.5.1.1
1 gem installed
Installing ri documentation for rsruby-0.5.1.1...
Installing RDoc documentation for rsruby-0.5.1.1...
ENV['R_HOME'] = '/Library/Frameworks/R.framework/Resources'

require 'rubygems'
require 'rsruby'
require 'bio'

r = RSRuby.instance

r.library('GeneR')

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    puts r.eval_R(%Q[aaseq <- strTranslate("#{ntseq}")])
  end
end
% R
> source("http://bioconductor.org/biocLite.R")
> biocLite("GeneR")

% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa

 Attaching package: 'GeneR'

The following object(s) are masked from 'package:utils':

    relist

>2L52.1
MSMVRNVSNQSEKLEILSCKWVGCLKSTEVFKTVEKLLDHVTADHIPEVIVNDDGSEEVVCQWDCCEMGASRGNLQKKKEWMENHFKTRHVRKAKIFKCLIEDCPVVKSSSQEIETHLRISHPINPKKERLKEFKSSTDHIEPTQANRVWTIVNGEVQWKTPPRVKKKTVIYYDDGPRYVFPTGCARCNYDSDESELESDEFWSATEMSDNEEVYVNFRGMNCISTGKSASMVPSKRRNWPKRVKKRLSTQRNNQKTIRPPELNKNNIEIKDMNSNNLEERNREECIQPVSVEKNILHFEKFKSNQICIVRENNKFREGTRRRRKNSGESEDLKIHENFTEKRRPIRSCKQNISFYEMDGDIEEFEVFFDTPTKSKKVLLDIYSAKKMPKIEVEDSLVNKFHSKRPSRACRVLGSMEEVPFDVEIGY*
 :
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.56s user 0.16s system 68% cpu 3.968 total

% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null  2.48s user 0.17s system 65% cpu 4.061 total

It appears to process the data quite fast.


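Apologies: the problems described below were caused by an outdated version of R. Embarrassing. Thanks to Nishiyama-san for pointing this out. (Added by Katayama, 2010/11/5)

However, the build reports that R.h cannot be found.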

% sudo gem-1.8 install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources --with-R-include=/Library/Frameworks/R.framework/Resources/include
Building native extensions.  This could take a while...
ERROR:  Error installing rsruby:
     ERROR: Failed to build gem native extension.

/usr/local/bin/ruby-1.8 extconf.rb --with-R-dir=/Library/Frameworks/R.framework/Resources --with-R-include=/Library/Frameworks/R.framework/Resources/include
checking for main() in -lR... yes
checking for R.h... no

ERROR: Cannot find the R header, aborting.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers.  Check the mkmf.log file for more
details.  You may need configuration options.

Provided configuration options:
     --with-opt-dir
     --without-opt-dir
     --with-opt-include
     --without-opt-include=${opt-dir}/include
     --with-opt-lib
     --without-opt-lib=${opt-dir}/lib
     --with-make-prog
     --without-make-prog
     --srcdir=.
     --curdir
     --ruby=/usr/local/bin/ruby-1.8
     --with-R-dir
     --with-R-include=${R-dir}/include
     --with-R-lib
     --without-R-lib=${R-dir}/lib
     --with-Rlib
     --without-Rlib

Gem files will remain installed in /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1 for inspection.
Results logged to /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1/ext/gem_make.out

The options are being passed to extconf.rb correctly.

% ls /Library/Frameworks/R.framework/Resources/include
R.h           Rdefines.h    Rinternals.h  S.h           ppc/
R_ext/        Rembedded.h   Rmath.h       i386/
Rconfig.h     Rinterface.h  Rversion.h    libintl.h

And yet R.h is right here...

% cd /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1/ext
% sudo ruby-1.8 -rmkmf -e 'create_makefile("rsruby_c")'
creating Makefile
% sudo vi Makefile

#INCFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir)
INCFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir) -I/Library/Frameworks/R.framework/Resources/include

#ldflags  = -L.
ldflags  = -L. -L/Library/Frameworks/R.framework/Resources/lib

Try forcing it through by hand.

% make
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -g -O2 -pipe -fno-common   -c Converters.c
In file included from /Library/Frameworks/R.framework/Resources/include/R.h:40,
                 from ./rsruby.h:37,
                 from Converters.c:32:
/Library/Frameworks/R.framework/Resources/include/Rconfig.h:9:28: error: x86_64/Rconfig.h: No such file or directory
Converters.c: In function ‘to_ruby_vector’:
Converters.c:356: warning: assignment discards qualifiers from pointer target type
Converters.c:384: warning: assignment discards qualifiers from pointer target type
Converters.c: In function ‘to_ruby_hash’:
Converters.c:601: warning: assignment discards qualifiers from pointer target type
{standard input}:unknown:FATAL:can't create output file: Converters.o
make: *** [Converters.o] Error 1

It seems the arch is not being set correctly.

% ls /Library/Frameworks/R.framework/Resources/include
R.h           Rdefines.h    Rinternals.h  S.h           ppc/
R_ext/        Rembedded.h   Rmath.h       i386/
Rconfig.h     Rinterface.h  Rversion.h    libintl.h

% less /Library/Frameworks/R.framework/Resources/include/Rconfig.h
/* This is an automatically generated universal stub for architecture-dependent
headers. */
#ifdef __i386__
#include "i386/Rconfig.h"
#elif defined __ppc__
#include "ppc/Rconfig.h"
#elif defined __ppc64__
#include "ppc64/Rconfig.h"
#elif defined __x86_64__
#include "x86_64/Rconfig.h"
#elif defined __arm__
#include "arm/Rconfig.h"
#else
#error "Unsupported architecture."
#endif

The R package supports i386 and ppc, but the build seems to expect x86_64.
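A quick way to see this kind of mismatch is to list which architecture subdirectories of the R include directory actually contain Rconfig.h, next to the compiler settings Ruby was built with. This is a minimal sketch; the helper name r_header_archs is made up, and the include path is the one used on this page.

```ruby
require "rbconfig"

# Hypothetical helper: names of the subdirectories of include_dir that contain
# an architecture-specific Rconfig.h (e.g. ["i386", "ppc"]).
def r_header_archs(include_dir)
  Dir.glob(File.join(include_dir, "*", "Rconfig.h")).map do |path|
    File.basename(File.dirname(path))
  end.sort
end

include_dir = "/Library/Frameworks/R.framework/Resources/include"
puts "R headers available for: #{r_header_archs(include_dir).inspect}"
puts "Ruby was built with: #{RbConfig::CONFIG['CC']} #{RbConfig::CONFIG['CFLAGS']}"
```

If the directory does not exist the list is simply empty. The architecture the compiler targets by default still has to be compared by eye against the listed names.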

% sudo vi Makefile

#CPPFLAGS =   -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE $(DEFS) $(cppflags)
CPPFLAGS = -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE $(DEFS) $(cppflags)

Force the architecture explicitly.

% sudo make
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -g -O2 -pipe -fno-common   -c Converters.c
Converters.c: In function ‘to_ruby_vector’:
Converters.c:356: warning: assignment discards qualifiers from pointer target type
Converters.c:384: warning: assignment discards qualifiers from pointer target type
Converters.c: In function ‘to_ruby_hash’:
Converters.c:601: warning: assignment discards qualifiers from pointer target type
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -g -O2 -pipe -fno-common   -c R_eval.c
R_eval.c: In function ‘get_last_error_msg’:
R_eval.c:143: warning: return discards qualifiers from pointer target type
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -g -O2 -pipe -fno-common   -c robj.c
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -g -O2 -pipe -fno-common   -c rsruby.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o rsruby_c.bundle Converters.o R_eval.o robj.o rsruby.o -L. -L/usr/local/lib -L. -L/Library/Frameworks/R.framework/Resources/lib    -ldl -lobjc  

Compilation succeeded, but I do not know how to install the result.

For now, unpack the gem and take a look.

% tar xvfz /usr/local/lib/ruby/gems/1.8/cache/rsruby-0.5.1.1.gem
data.tar.gz
metadata.gz

% sudo tar xvfz data.tar.gz
x History.txt
x License.txt
x Manifest.txt
x README.txt
x Rakefile.rb
x examples/arrayfields.rb
x examples/bioc.rb
x examples/dataframe.rb
x examples/erobj.rb
x ext/Converters.c
x ext/Converters.h
x ext/R_eval.c
x ext/R_eval.h
x ext/extconf.rb
x ext/robj.c
x ext/rsruby.c
x ext/rsruby.h
x lib/rsruby.rb
x lib/rsruby/dataframe.rb
x lib/rsruby/erobj.rb
x lib/rsruby/robj.rb
x test/table.txt
x test/tc_array.rb
x test/tc_boolean.rb
x test/tc_cleanup.rb
x test/tc_eval.rb
x test/tc_extensions.rb
x test/tc_init.rb
x test/tc_io.rb
x test/tc_library.rb
x test/tc_matrix.rb
x test/tc_modes.rb
x test/tc_robj.rb
x test/tc_sigint.rb
x test/tc_to_r.rb
x test/tc_to_ruby.rb
x test/tc_util.rb
x test/tc_vars.rb
x test/test_all.rb

% sudo gzcat metadata.gz > metadata

So this is how a gem is structured; metadata looks like the gemspec. However, I ran out of time and gave up here.

RSOAP

I gave up on RSOAP because SOAP is not available in Ruby 1.9 and setting up an RSOAP server looked like a lot of work.
