Using R from Ruby
From TogoWiki
RSRuby
On Tiger (Mac OS X 10.4):
$ export LD_LIBRARY_PATH=:/Library/Frameworks/R.framework/Resources/lib
$ ruby187/bin/gem install rsruby-0.5.1.1.gem -- --with-R-dir=$R_HOME
Building native extensions. This could take a while...
Successfully installed rsruby-0.5.1.1
1 gem installed
Installing ri documentation for rsruby-0.5.1.1...
Installing RDoc documentation for rsruby-0.5.1.1...
$ R --version
R version 2.8.0 (2008-10-20)
To use it from a script:
ENV['R_HOME']='/Library/Frameworks/R.framework/Resources/lib'
require 'rsruby'
On Snow Leopard with R 2.12, the following is enough:
sudo gem install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources
ENV['R_HOME']='/Library/Frameworks/R.framework/Resources'
require 'rsruby'
RinRuby
Rserve + Rserve-Ruby-client
gem install rserve-client
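Benchmark
Everything up to this point was preliminary investigation.
From here on are the results of actually testing things.
Comparison of BioRuby, RinRuby, and Rserve, and the failed attempts with RSRuby and RSOAP
The benchmark apparently measures how fast a 6.8 MB nucleotide FASTA file with 5,210 entries can be read and translated.
Input data: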
% head -50 test-dna.fa
>2L52.1
atgtcaatggtaagaaatgtatcaaatcagagcgaaaaattggaaatttt
gtcatgtaaatgggtaggatgtctcaaatcaacagaagtgttcaaaacgg
ttgaaaagttattagatcatgttacggctgatcatattccagaagttatt
gtaaacgatgacgggtcggaggaagtcgtttgtcagtgggattgctgcga
aatgggtgccagtcgtggaaatcttcaaaaaaagaaagagtggatggaga
atcacttcaaaacacgtcatgttcgcaaagcaaaaatattcaaatgctta
attgaggattgccctgtggtaaagtcaagtagtcaggaaattgaaaccca
tctcagaataagtcatccaataaatccgaaaaaagagagactgaaagagt
ttaaaagttctaccgaccacatcgaacctactcaagctaatagagtatgg
acaattgtgaacggagaggttcaatggaagactccaccgcgggttaaaaa
aaagactgtgatatactatgatgatgggccgaggtatgtatttccaacgg
gatgtgcgagatgcaactatgatagtgacgaatcagaactggaatcagat
gagttttggtcagccacagagatgtcagataatgaagaagtatatgtgaa
cttccgtggaatgaactgtatctcaacaggaaagtcggccagtatggtcc
cgagcaaacgaagaaattggccaaaaagagtgaagaaaaggctatcgaca
caaagaaacaatcagaaaactattcgaccaccagagctgaataaaaataa
tatagagataaaagatatgaactcaaataaccttgaagaacgcaacagag
aagaatgcattcagcctgtttctgttgaaaagaacatcctgcattttgaa
aaattcaaatcaaatcaaatttgcattgttcgggaaaacaataaatttag
agaaggaacgagaagacgcagaaagaattctggtgaatcggaagacttga
aaattcatgaaaactttactgaaaaacgaagacccattcgatcatgcaaa
caaaatataagtttctatgaaatggacggggatatagaagaatttgaagt
gtttttcgatactcccacaaaaagcaaaaaagtacttctggatatctaca
gtgcgaagaaaatgccaaaaattgaggttgaagattcattagttaataag
tttcattcaaaacgtccatcaagagcatgtcgagttcttggaagtatgga
agaagtaccatttgatgtggaaataggatattga
>2RSSE.1
atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg
aagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa
gtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa
tcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg
taaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag
tttctttcaaagttgccaaacggcatttttggacgggataatgaggagac
actgttcaatagtgcatcgactggaatggatattgagaagcagagacagg
tgttttataggatatttggatcacttccagtcgcatcccaacacttgctc
gtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca
ttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc
tttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt
caacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt
cggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat
acacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat
aatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc
aaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag
cgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc
gtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga
tcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg
atttaaggacatcggtcagctgtgatttttga
>2RSSE.2
:
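For readers unfamiliar with the input format, the following is a minimal sketch of how such a FASTA file breaks down into entries: a header line starting with ">" followed by sequence lines of 50 bases. The helper name each_fasta_entry is hypothetical and only for illustration; the benchmark scripts on this page rely on Bio::FlatFile from BioRuby instead.

```ruby
require 'stringio'

# Minimal FASTA reader: yields [id, sequence] pairs from any IO-like object.
# Hypothetical helper for illustration; the benchmark itself uses Bio::FlatFile.
def each_fasta_entry(io)
  return enum_for(:each_fasta_entry, io) unless block_given?

  id = nil
  seq = +''
  io.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      yield id, seq if id                    # flush the previous entry
      id = line[1..-1].split(/\s+/).first    # first word after '>' is the ID
      seq = +''
    else
      seq << line.strip                      # join sequence lines, dropping newlines
    end
  end
  yield id, seq if id                        # flush the last entry
end

# The first lines of two entries from the input shown above.
sample = StringIO.new(<<~FASTA)
  >2L52.1
  atgtcaatggtaagaaatgtatcaaatcagagcgaaaaattggaaatttt
  gtcatgtaaatgggtaggatgtctcaaatcaacagaagtgttcaaaacgg
  >2RSSE.1
  atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
FASTA

entries = each_fasta_entry(sample).to_a
entries.each { |entry_id, s| puts "#{entry_id}: #{s.length} nt" }
```

Joining the lines while parsing matters later on, because a sequence passed to a translator with its newlines still embedded no longer splits cleanly into codons.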
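Results: Ruby 1.9 is considerably faster than 1.8. BioRuby on its own is faster than going through R (although BioLib + Ruby is reportedly extremely fast). Rserve is fairly stable in use.
→ With RSRuby, going through R turned out to be faster than BioRuby alone. The earlier installation failure was caused by an outdated R; my apologies.
→ However, it turned out that BioRuby can be sped up further by using Ruby 1.9's gsub(/re/, hash). With that version BioRuby is slightly faster than RSRuby and the fastest overall (about 2.1 seconds).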
- pure BioRuby
About 3 seconds on Ruby 1.9, about 6 seconds on Ruby 1.8
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null 2.96s user 0.07s system 97% cpu 3.120 total
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null 28.25s user 0.29s system 91% cpu 31.250 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null 5.36s user 0.18s system 97% cpu 5.666 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null 52.98s user 1.48s system 91% cpu 59.319 total
- RinRuby
Execution stalled partway through, so it could not be tested.
- Rserve
About 6 seconds on Ruby 1.9, about 16 seconds on Ruby 1.8
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null 4.16s user 0.25s system 76% cpu 5.737 total
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null 39.71s user 1.90s system 75% cpu 55.061 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null 13.69s user 0.28s system 87% cpu 16.010 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null 135.66s user 2.56s system 82% cpu 2:46.68 total
- RSRuby
On Snow Leopard(?), installation was difficult and I never got as far as testing.
→ I tried again later and it worked. It was the fastest of the runs measured here. There is little difference between 1.8 and 1.9 (1.9 is slightly faster); both take about 2.5 seconds.
% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null 2.35s user 0.12s system 99% cpu 2.481 total
% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null 19.77s user 0.42s system 99% cpu 20.280 total
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null 2.45s user 0.12s system 99% cpu 2.589 total
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa 10 > /dev/null 21.27s user 0.61s system 98% cpu 22.128 total
- RSOAP
Work would have to start from finding out how to install the server.
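To put the single-pass timings above on a common scale, the wall-clock totals can be converted into entries per second. The sketch below simply divides the 5,210 entries by each "total" figure reported by time; the constant and method names are made up for this example, and the numbers are copied from the runs listed above.

```ruby
ENTRIES = 5210 # entries in test-dna.fa, as stated in the benchmark description

# Wall-clock totals (seconds) for a single pass, copied from the timings above.
TOTALS = {
  'BioRuby (Ruby 1.9)' => 3.120,
  'BioRuby (Ruby 1.8)' => 5.666,
  'Rserve (Ruby 1.9)'  => 5.737,
  'Rserve (Ruby 1.8)'  => 16.010,
  'RSRuby (Ruby 1.9)'  => 2.481,
  'RSRuby (Ruby 1.8)'  => 2.589
}

# Throughput in entries per second (Integer / Float gives a Float).
def entries_per_second(entries, seconds)
  entries / seconds
end

# Print the runs from fastest to slowest.
TOTALS.sort_by { |_, sec| sec }.each do |label, sec|
  printf("%-20s %6.3f s  %7.1f entries/s\n", label, sec, entries_per_second(ENTRIES, sec))
end
```

These totals include interpreter start-up and, for the R-based runs, loading GeneR, so they are only a rough guide to per-entry cost.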
pure BioRuby
% cat DNAtranslate-bioruby.rb
require 'rubygems'
require 'bio'

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    puts entry.naseq.translate
  end
end
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa > /dev/null 2.96s user 0.07s system 97% cpu 3.120 total
% time ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null 28.25s user 0.29s system 91% cpu 31.250 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa > /dev/null 5.36s user 0.18s system 97% cpu 5.666 total
% time ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-bioruby.rb test-dna.fa 10 > /dev/null 52.98s user 1.48s system 91% cpu 59.319 total
Ruby 1.9's gsub(/re/, hash) should be tried here.
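As an illustration of what gsub(/re/, hash) means in this context, here is a minimal sketch of codon translation where the replacement argument of String#gsub is a Hash mapping codons to amino acids. This is not BioRuby's actual implementation; the constant and method names are assumptions for the example, only frame 1 with the standard genetic code is handled, and mapping unknown codons to 'X' is an arbitrary choice.

```ruby
# Standard genetic code (NCBI translation table 1), listed in t, c, a, g order.
BASES = %w[t c a g]
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Codon => amino acid. Codons not in the table (e.g. containing 'n')
# fall back to 'X'; that default is an assumption made for this sketch.
CODON_TABLE = Hash.new('X')
BASES.product(BASES, BASES).each_with_index do |codon, i|
  CODON_TABLE[codon.join] = AMINO_ACIDS[i]
end

# Translate frame 1 with a single gsub call: every 3-character match is
# looked up in the Hash and replaced by its value (Ruby 1.9 and later).
def translate(naseq)
  seq = naseq.downcase
  seq = seq[0, seq.length - seq.length % 3] # drop a trailing partial codon
  seq.gsub(/.{3}/, CODON_TABLE)
end

puts translate('atgtcaatggtaagaaatgtatca')
```

The appeal of this form is that the per-codon loop runs inside the regex engine and Hash lookup rather than in Ruby-level iteration, which is presumably where the speed-up mentioned above comes from.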
RinRuby
A library written in pure Ruby for running R. http://blog.itoshi.tv/2010/09/rinruby/
% sudo gem install rinruby
Successfully installed rinruby-2.0.1
1 gem installed
Installing ri documentation for rinruby-2.0.1...
Installing RDoc documentation for rinruby-2.0.1...
% cat DNAtranslate-rinruby.rb
require 'rubygems'
require 'rinruby'
require 'bio'

R.echo(enable = false)
R.eval('library(GeneR)')

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    R.eval(%Q[aaseq <- strTranslate("#{ntseq}")])
    puts R.pull("aaseq")
  end
end
% time ruby-1.9 DNAtranslate-rinruby.rb test-dna.fa
It was slow, and on top of that it stalled after processing a few dozen entries, so no measurement could be taken. It does not look suitable for practical use.
Rserve
Install rserve-client. http://github.com/clbustos/Rserve-Ruby-client
% sudo gem install rserve-client
Successfully installed rserve-client-0.2.5
1 gem installed
Installing ri documentation for rserve-client-0.2.5...
Installing RDoc documentation for rserve-client-0.2.5...
Install Rserve. http://www.rforge.net/Rserve/files/
% wget http://www.rforge.net/Rserve/snapshot/Rserve_0.6-2.tar.gz
% R CMD INSTALL Rserve_0.6-2.tar.gz
* Installing to library ‘/Library/Frameworks/R.framework/Resources/library’
* Installing *source* package ‘Rserve’ ...
checking whether to compile the server... yes
checking whether to compile the client... no
checking for gcc... gcc -arch i386 -std=gnu99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc -arch i386 -std=gnu99 accepts -g... yes
checking for gcc -arch i386 -std=gnu99 option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -arch i386 -std=gnu99 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... rm: conftest.dSYM: is a directory
rm: conftest.dSYM: is a directory
yes
checking for sys/wait.h that is POSIX.1 compatible... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for string.h... (cached) yes
checking for memory.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for sys/stat.h... (cached) yes
checking for sys/types.h... (cached) yes
checking sys/socket.h usability... yes
checking sys/socket.h presence... yes
checking for sys/socket.h... yes
checking sys/un.h usability... yes
checking sys/un.h presence... yes
checking for sys/un.h... yes
checking netinet/in.h usability... yes
checking netinet/in.h presence... yes
checking for netinet/in.h... yes
checking netinet/tcp.h usability... yes
checking netinet/tcp.h presence... yes
checking for netinet/tcp.h... yes
checking for an ANSI C-conforming const... yes
checking whether byte ordering is bigendian... no
checking whether time.h and sys/time.h may both be included... yes
checking for pid_t... yes
checking vfork.h usability... no
checking vfork.h presence... no
checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... yes
checking for working vfork... (cached) yes
checking return type of signal handlers... void
checking for memset... yes
checking for mkdir... yes
checking for rmdir... yes
checking for select... yes
checking for socket... yes
checking for library containing crypt... none required
checking crypt.h usability... no
checking crypt.h presence... no
checking for crypt.h... no
checking for socklen_t... yes
checking for connect... yes
checking for dlopen in -ldl... yes
configure: creating ./config.status
config.status: creating src/Makefile
config.status: creating src/client/cxx/Makefile
config.status: creating src/config.h
** libs
** arch - i386
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC -g -O2 -c Rserv.c -o Rserv.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC -g -O2 -c session.c -o session.o
gcc -arch i386 -std=gnu99 -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -I/usr/local/include -DDAEMON -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -fPIC -g -O2 -c md5.c -o md5.o
gcc -arch i386 -std=gnu99 Rserv.o session.o md5.o -o Rserve -F/Library/Frameworks/R.framework/.. -framework R -ldl
cp Rserve Rserve.so
gcc -arch i386 -std=gnu99 -Iinclude -I. -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/i386 -c Rserv.c -o Rserv_d.o -DNODAEMON -DRSERV_DEBUG -g -I/usr/local/include -g -O2
gcc -arch i386 -std=gnu99 Rserv_d.o session.o md5.o -o Rserve.dbg -F/Library/Frameworks/R.framework/.. -framework R -ldl
cp Rserve Rserve-bin.so
cp Rserve.dbg Rserve-dbg.so
./mergefat Rserve "/Library/Frameworks/R.framework/Resources/bin/Rserve"
./mergefat Rserve.dbg "/Library/Frameworks/R.framework/Resources/bin/Rserve.dbg"
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
 >>> Building/Updating help pages for package 'Rserve'
Formats: text html latex example
Rclient text html latex example
Rserv text html latex
** building package indices ...
* DONE (Rserve)
% R CMD Rserve
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Rserv started in daemon mode.
The Rserve server is up and running.
% cat rserve-test.rb
require 'rubygems'
require 'rserve'
include Rserve
c = Connection.new
x = c.eval("R.version.string");
puts x.as_string
Run the test code:
% ruby-1.8 rserve-test.rb
R version 2.9.2 (2009-08-24)
OK.
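Because Rserve runs as a separate daemon, a client script fails at connection time if the server was never started. A small pre-flight check, sketched below with only Ruby's standard library, can make that failure easier to diagnose. The helper name rserve_listening? is hypothetical; 6311 is Rserve's default TCP port, and the check only confirms that something accepts connections there, not that it is actually Rserve.

```ruby
require 'socket'
require 'timeout'

# Returns true if something accepts TCP connections on host:port.
# 6311 is Rserve's default port; the helper itself is hypothetical.
def rserve_listening?(host = '127.0.0.1', port = 6311, wait = 1.0)
  Timeout.timeout(wait) do
    TCPSocket.new(host, port).close # connect, then hang up immediately
  end
  true
rescue SystemCallError, SocketError, Timeout::Error
  false # refused, unreachable, unresolvable, or too slow
end

puts(rserve_listening? ? 'Rserve port is open' : 'Nothing is listening on port 6311')
```

A script could call this before Rserve::Connection.new and print a hint to run "R CMD Rserve" when it returns false.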
Next, install the GeneR package from BioConductor.
% R
> source("http://bioconductor.org/biocLite.R")
> biocLite()
After quite a long wait the installation finished. However, GeneR does not seem to be included in it.
> library(GeneR)
Error in library(GeneR) : there is no package called 'GeneR'
According to http://www.bioconductor.org/packages/2.3/bioc/html/GeneR.html, the way to install it is apparently:
> source("http://bioconductor.org/biocLite.R")
> biocLite("GeneR")
Using R version 2.9.2, biocinstall version 2.4.13.
Installing Bioconductor version 2.4 packages:
[1] "GeneR"
Please wait...
trying URL 'http://bioconductor.org/packages/2.4/bioc/bin/macosx/universal/contrib/2.9/GeneR_2.14.0.tgz'
Content type 'application/x-gzip' length 411393 bytes (401 Kb)
opened URL
==================================================
downloaded 401 Kb
The downloaded packages are in
/var/folders/lt/ltVmCLsiF3mLKUpLCN3GlU+++TM/-Tmp-//RtmpHRXj4g/downloaded_packages
> library(GeneR)
Attaching package: 'GeneR'
The following object(s) are masked from package:utils :
relist
A quick try:
> seq = "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca
+ tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg
+ aagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa
+ gtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa
+ tcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg
+ taaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag
+ tttctttcaaagttgccaaacggcatttttggacgggataatgaggagac
+ actgttcaatagtgcatcgactggaatggatattgagaagcagagacagg
+ tgttttataggatatttggatcacttccagtcgcatcccaacacttgctc
+ gtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca
+ ttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc
+ tttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt
+ caacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt
+ cggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat
+ acacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat
+ aatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc
+ aaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag
+ cgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc
+ gtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga
+ tcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg
+ atttaaggacatcggtcagctgtgatttttga
+ "
> seq
[1] "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca\ntcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg\naagcatttgcccatgacattcccgccacgcttctcatgctcttgctcaaa\ngtgaacaaggatggacccgcgaaaaaggatatttggcgagcgcccggaaa\ntcaggctcaagtgcgaaaattgtcgcaagtgatgcaacacgggcggcttg\ntaaatatcgagaatttcacggtttacacggcggcatctgtcatcaaaaag\ntttctttcaaagttgccaaacggcatttttggacgggataatgaggagac\nactgttcaatagtgcatcgactggaatggatattgagaagcagagacagg\ntgttttataggatatttggatcacttccagtcgcatcccaacacttgctc\ngtcctacttttcggcacatttcgggtcgtcgccgactcgtcggacggtca\nttcgaacgcgatgaacccgaatgcgatcgcgatttcggtggcaccatcgc\ntttttcacacttgtatacacgatggacggacggcgcgagtagaagacctt\ncaacggttcaagctggcctcgaacattgtgtgctcgataatttgctcatt\ncggcgacacgaagctcttcccacgcgagtgctacgagtattacgccagat\nacacgggtcgcacgttgcgaatcgacgagaatcgaatgttcacttttcat\naatccatccaaccgtcgtgctcgtggcgaagagttctccgcgttggcggc\naaagtgtgcgggcgcctactcgctggccgccatccacctggccgaagaag\ncgtcaccggagcccactccgacaacctcgaagcctccacgtggcaacggc\ngtcgggcgtgccgggagtctgaagcagcacgcgttgacccagacgacgga\ntcatccgaagagaagcgtgtcgatcgcggctaaggatccgtatccaactg\natttaaggacatcggtcagctgtgatttttga\n"
> strTranslate(seq)
[1] "MTVASYSMVLCGSSDD-SLSRQNRKSKIRRTHK-KHLPMTFPPRFSCSCS-VNKDGPAKKDIWRAPG-SGSSAKIVASDATRAA-*ISRISRFTRRHLSSK-FLSKLPNGIFGRDNEE-TVQ*CIDWNGY*EAET-CFIGYLDHFQSHPNTC-VLLFGTFRVVADSSDG-FERDEPECDRDFGGTI-FFTLVYTMDGRRE*KT-QRFKLASNIVCSIICS-RRHEALPTRVLRVLRQ-TRVARCESTRIECSLF-NPSNRRARGEEFSALA-KVCGRLLAGRHPPGRR-RHRSPLRQPRSLHVAT-VGRAGSLKQHALTQTT-SSEEKRVDRG*GSVSN-I*GHRSAVIF-"
Looks like it works.
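One detail in the output above is worth a closer look: the pasted string still contains the newlines from the FASTA file, and the translation shows a "-" after every 16 residues and internal stop codons, whereas the same entry translated in the Rserve run further down (where the sequence has no newlines) starts MTVASYSMVLCGSSDDHRYRG. Each line holds 50 bases, which is not a multiple of 3, so the newline ends up inside a codon and the reading frame shifts. The sketch below reproduces just the codon splitting in plain Ruby; it makes no call to GeneR, and how strTranslate treats such codons internally is inferred from the output rather than from its documentation.

```ruby
# Two 50-nt lines as they appear in the FASTA file, newlines included.
raw = "atgacagtggcgagttacagtatggtgctgtgtggctcatctgatgatca\n" \
      "tcgctatcgaggcagaatcgaaaaagtaaaattcggcgtacccataaacg\n"

# Split into triplets. With /m, '.' also matches "\n", mimicking a
# translator that consumes every character of the string it is given.
codons_raw   = raw.scan(/.{3}/m)
codons_clean = raw.gsub(/\s+/, '').scan(/.{3}/)

broken = codons_raw.select { |c| c =~ /\s/ }
puts "codons containing whitespace: #{broken.inspect}"
puts "codon 17 with newlines:    #{codons_raw[16].inspect}"
puts "codon 17 without newlines: #{codons_clean[16].inspect}"
```

Stripping whitespace before handing a sequence to the translator, for example with gsub(/\s+/, ''), avoids the problem. The Ruby scripts on this page do not hit it because entry.seq from BioRuby already comes without line breaks.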
The Python test program I was given:
# Read a FASTA file a number of times (default once), translate
# using R/Bioconductor GeneR and print to STDOUT
#
# Usage:
#
#   python DNAtranslate.py dna.fa [n]
#
# Example:
#
#   python DNAtranslate.py ../../../test/data/test-dna.fa
#
verbose=False

import sys
import time
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.Alphabet import generic_dna
import subprocess
import pyRserve

fn = sys.argv[1]
times = 1
if len(sys.argv) > 2:
    times = int(sys.argv[2])

# Start the RServer
subprocess.Popen([r"R","CMD", "Rserve"], stdout=subprocess.PIPE).wait()
time.sleep(0.5)
conn = pyRserve.rconnect()
conn('library(GeneR)')

if verbose:
    print >> sys.stderr, 'Biopython translate ',fn, ':', times
for i in range(0, times):
    if verbose:
        print >> sys.stderr, i+1
    for seq_record in SeqIO.parse(fn, "fasta", generic_dna):
        print ">",seq_record.id
        ntseq = str(seq_record.seq)
        print conn('strTranslate("'+ntseq+'")')

# Kill the RServer
subprocess.Popen([r"killall", "Rserve"], stdout=subprocess.PIPE)
Translated into Ruby:
require 'rubygems'
require 'rserve'
require 'bio'

rserve = Rserve::Connection.new
rserve.eval('library(GeneR)')

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    result = rserve.eval(%Q[strTranslate("#{ntseq}")])
    puts result.as_string
  end
end
Run it:
% ruby-1.8 DNAtranslate-rserve.rb test-dna.fa
>2L52.1
MSMVRNVSNQSEKLEILSCKWVGCLKSTEVFKTVEKLLDHVTADHIPEVIVNDDGSEEVVCQWDCCEMGASRGNLQKKKEWMENHFKTRHVRKAKIFKCLIEDCPVVKSSSQEIETHLRISHPINPKKERLKEFKSSTDHIEPTQANRVWTIVNGEVQWKTPPRVKKKTVIYYDDGPRYVFPTGCARCNYDSDESELESDEFWSATEMSDNEEVYVNFRGMNCISTGKSASMVPSKRRNWPKRVKKRLSTQRNNQKTIRPPELNKNNIEIKDMNSNNLEERNREECIQPVSVEKNILHFEKFKSNQICIVRENNKFREGTRRRRKNSGESEDLKIHENFTEKRRPIRSCKQNISFYEMDGDIEEFEVFFDTPTKSKKVLLDIYSAKKMPKIEVEDSLVNKFHSKRPSRACRVLGSMEEVPFDVEIGY*
>2RSSE.1
MTVASYSMVLCGSSDDHRYRGRIEKVKFGVPINEAFAHDIPATLLMLLLKVNKDGPAKKDIWRAPGNQAQVRKLSQVMQHGRLVNIENFTVYTAASVIKKFLSKLPNGIFGRDNEETLFNSASTGMDIEKQRQVFYRIFGSLPVASQHLLVLLFGTFRVVADSSDGHSNAMNPNAIAISVAPSLFHTCIHDGRTARVEDLQRFKLASNIVCSIICSFGDTKLFPRECYEYYARYTGRTLRIDENRMFTFHNPSNRRARGEEFSALAAKCAGAYSLAAIHLAEEASPEPTPTTSKPPRGNGVGRAGSLKQHALTQTTDHPKRSVSIAAKDPYPTDLRTSVSCDF*
:
Timing:
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa > /dev/null 4.16s user 0.25s system 76% cpu 5.737 total
% time ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.9 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null 39.71s user 1.90s system 75% cpu 55.061 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa > /dev/null 13.69s user 0.28s system 87% cpu 16.010 total
% time ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null
ruby-1.8 DNAtranslate-rserve.rb test-dna.fa 10 > /dev/null 135.66s user 2.56s system 82% cpu 2:46.68 total
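The shell's time command measures the whole process, including interpreter start-up, gem loading, and connecting to R. If only the translation loop is of interest, the passes can be timed inside the script with Ruby's standard Benchmark module. The following is a sketch of that idea with a stand-in workload; time_passes is a hypothetical helper and is not part of the scripts used for the measurements above.

```ruby
require 'benchmark'

# Run the given block `repeat` times and return the wall-clock seconds
# taken by each pass. Hypothetical helper, not part of the benchmark scripts.
def time_passes(repeat)
  Array.new(repeat) { Benchmark.realtime { yield } }
end

# Stand-in workload; in the scripts above this would be the loop over
# Bio::FlatFile.auto(fasta) that translates and prints each entry.
timings = time_passes(3) { 10_000.times { 'atgtca'.upcase } }
timings.each_with_index { |sec, i| printf("pass %d: %.4f s\n", i + 1, sec) }
```

Reporting each pass separately also shows whether the first pass is slower than later ones, which the aggregated 10-pass totals above cannot reveal.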
RSRuby
% sudo gem install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources
Building native extensions. This could take a while...
Successfully installed rsruby-0.5.1.1
1 gem installed
Installing ri documentation for rsruby-0.5.1.1...
Installing RDoc documentation for rsruby-0.5.1.1...
ENV['R_HOME'] = '/Library/Frameworks/R.framework/Resources'

require 'rubygems'
require 'rsruby'
require 'bio'

r = RSRuby.instance
r.library('GeneR')

fasta = ARGV.shift
repeat = (ARGV.shift || 1).to_i

repeat.times do
  Bio::FlatFile.auto(fasta).each do |entry|
    puts ">#{entry.entry_id}"
    ntseq = entry.seq
    puts r.eval_R(%Q[aaseq <- strTranslate("#{ntseq}")])
  end
end
% R
> source("http://bioconductor.org/biocLite.R")
> biocLite("GeneR")
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa
Attaching package: 'GeneR'
The following object(s) are masked from 'package:utils':
relist
>2L52.1
MSMVRNVSNQSEKLEILSCKWVGCLKSTEVFKTVEKLLDHVTADHIPEVIVNDDGSEEVVCQWDCCEMGASRGNLQKKKEWMENHFKTRHVRKAKIFKCLIEDCPVVKSSSQEIETHLRISHPINPKKERLKEFKSSTDHIEPTQANRVWTIVNGEVQWKTPPRVKKKTVIYYDDGPRYVFPTGCARCNYDSDESELESDEFWSATEMSDNEEVYVNFRGMNCISTGKSASMVPSKRRNWPKRVKKRLSTQRNNQKTIRPPELNKNNIEIKDMNSNNLEERNREECIQPVSVEKNILHFEKFKSNQICIVRENNKFREGTRRRRKNSGESEDLKIHENFTEKRRPIRSCKQNISFYEMDGDIEEFEVFFDTPTKSKKVLLDIYSAKKMPKIEVEDSLVNKFHSKRPSRACRVLGSMEEVPFDVEIGY*
:
% time ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.8 DNAtranslate-rsruby.rb test-dna.fa > /dev/null 2.56s user 0.16s system 68% cpu 3.968 total
% time ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null
ruby-1.9 DNAtranslate-rsruby.rb test-dna.fa > /dev/null 2.48s user 0.17s system 65% cpu 4.061 total
It appears to run quite fast.
Sorry: the problem described below was caused by an outdated version of R. Embarrassing. Thanks to Nishiyama-san for pointing it out. (Added by Katayama, 2010/11/5)
However, the build complains that R.h cannot be found:
% sudo gem-1.8 install rsruby -- --with-R-dir=/Library/Frameworks/R.framework/Resources --with-R-include=/Library/Frameworks/R.framework/Resources/include
Building native extensions. This could take a while...
ERROR: Error installing rsruby:
ERROR: Failed to build gem native extension.
/usr/local/bin/ruby-1.8 extconf.rb --with-R-dir=/Library/Frameworks/R.framework/Resources --with-R-include=/Library/Frameworks/R.framework/Resources/include
checking for main() in -lR... yes
checking for R.h... no
ERROR: Cannot find the R header, aborting.
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of
necessary libraries and/or headers. Check the mkmf.log file for more
details. You may need configuration options.
Provided configuration options:
--with-opt-dir
--without-opt-dir
--with-opt-include
--without-opt-include=${opt-dir}/include
--with-opt-lib
--without-opt-lib=${opt-dir}/lib
--with-make-prog
--without-make-prog
--srcdir=.
--curdir
--ruby=/usr/local/bin/ruby-1.8
--with-R-dir
--with-R-include=${R-dir}/include
--with-R-lib
--without-R-lib=${R-dir}/lib
--with-Rlib
--without-Rlib
Gem files will remain installed in /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1 for inspection.
Results logged to /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1/ext/gem_make.out
The options are being passed to extconf.rb correctly, and
% ls /Library/Frameworks/R.framework/Resources/include
R.h        Rdefines.h    Rinternals.h  S.h        ppc/
R_ext/     Rembedded.h   Rmath.h       i386/
Rconfig.h  Rinterface.h  Rversion.h    libintl.h
R.h is right there, though...
% cd /usr/local/lib/ruby/gems/1.8/gems/rsruby-0.5.1.1/ext
% sudo ruby-1.8 -rmkmf -e 'create_makefile("rsruby_c")'
creating Makefile
% sudo vi Makefile
#INCFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir)
INCFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir) -I/Library/Frameworks/R.framework/Resources/include
#ldflags = -L.
ldflags = -L. -L/Library/Frameworks/R.framework/Resources/lib
Trying to force it through by editing the Makefile by hand.
% make
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c Converters.c
In file included from /Library/Frameworks/R.framework/Resources/include/R.h:40,
from ./rsruby.h:37,
from Converters.c:32:
/Library/Frameworks/R.framework/Resources/include/Rconfig.h:9:28: error: x86_64/Rconfig.h: No such file or directory
Converters.c: In function ‘to_ruby_vector’:
Converters.c:356: warning: assignment discards qualifiers from pointer target type
Converters.c:384: warning: assignment discards qualifiers from pointer target type
Converters.c: In function ‘to_ruby_hash’:
Converters.c:601: warning: assignment discards qualifiers from pointer target type
{standard input}:unknown:FATAL:can't create output file: Converters.o
make: *** [Converters.o] Error 1
It seems the arch is not being set properly.
% ls /Library/Frameworks/R.framework/Resources/include
R.h        Rdefines.h    Rinternals.h  S.h        ppc/
R_ext/     Rembedded.h   Rmath.h       i386/
Rconfig.h  Rinterface.h  Rversion.h    libintl.h
% less /Library/Frameworks/R.framework/Resources/include/Rconfig.h
/* This is an automatically generated universal stub for architecture-dependent headers. */
#ifdef __i386__
#include "i386/Rconfig.h"
#elif defined __ppc__
#include "ppc/Rconfig.h"
#elif defined __ppc64__
#include "ppc64/Rconfig.h"
#elif defined __x86_64__
#include "x86_64/Rconfig.h"
#elif defined __arm__
#include "arm/Rconfig.h"
#else
#error "Unsupported architecture."
#endif
The installed R package provides headers for i386 and ppc, but the build apparently expects x86_64.
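The mismatch can be checked before building by listing which architecture subdirectories of the R include directory actually contain an Rconfig.h, since the stub header shown above includes one of them depending on the compiler's target. The sketch below does only that directory check; the helper names are hypothetical, and the architecture to test against has to be supplied by hand because the script does not try to detect what the compiler targets.

```ruby
# List architecture subdirectories of an R include directory that carry
# their own Rconfig.h (for example i386/ and ppc/). Hypothetical helper.
def r_header_archs(include_dir)
  Dir.glob(File.join(include_dir, '*', 'Rconfig.h'))
     .map { |path| File.basename(File.dirname(path)) }
     .sort
end

# True if R ships headers for the given architecture name (e.g. 'x86_64').
def arch_supported?(include_dir, arch)
  r_header_archs(include_dir).include?(arch)
end

include_dir = '/Library/Frameworks/R.framework/Resources/include'
puts "R headers found for: #{r_header_archs(include_dir).inspect}"
puts "x86_64 supported: #{arch_supported?(include_dir, 'x86_64')}"
```

On the installation described here this would list i386 and ppc only, which matches the missing x86_64/Rconfig.h in the compiler error.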
% sudo vi Makefile
#CPPFLAGS = -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE $(DEFS) $(cppflags)
CPPFLAGS = -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE $(DEFS) $(cppflags)
Force the architecture by defining it explicitly.
% sudo make
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c Converters.c
Converters.c: In function ‘to_ruby_vector’:
Converters.c:356: warning: assignment discards qualifiers from pointer target type
Converters.c:384: warning: assignment discards qualifiers from pointer target type
Converters.c: In function ‘to_ruby_hash’:
Converters.c:601: warning: assignment discards qualifiers from pointer target type
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c R_eval.c
R_eval.c: In function ‘get_last_error_msg’:
R_eval.c:143: warning: return discards qualifiers from pointer target type
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c robj.c
gcc -I. -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I/usr/local/lib/ruby/1.8/i686-darwin10.0.0 -I. -I/Library/Frameworks/R.framework/Resources/include -D__i386__ -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -g -O2 -pipe -fno-common -c rsruby.c
cc -dynamic -bundle -undefined suppress -flat_namespace -o rsruby_c.bundle Converters.o R_eval.o robj.o rsruby.o -L. -L/usr/local/lib -L. -L/Library/Frameworks/R.framework/Resources/lib -ldl -lobjc
Compilation succeeded. But I do not know how to install the result.
For now, unpack the gem and take a look.
% tar xvfz /usr/local/lib/ruby/gems/1.8/cache/rsruby-0.5.1.1.gem
data.tar.gz
metadata.gz
% sudo tar xvfz data.tar.gz
x History.txt
x License.txt
x Manifest.txt
x README.txt
x Rakefile.rb
x examples/arrayfields.rb
x examples/bioc.rb
x examples/dataframe.rb
x examples/erobj.rb
x ext/Converters.c
x ext/Converters.h
x ext/R_eval.c
x ext/R_eval.h
x ext/extconf.rb
x ext/robj.c
x ext/rsruby.c
x ext/rsruby.h
x lib/rsruby.rb
x lib/rsruby/dataframe.rb
x lib/rsruby/erobj.rb
x lib/rsruby/robj.rb
x test/table.txt
x test/tc_array.rb
x test/tc_boolean.rb
x test/tc_cleanup.rb
x test/tc_eval.rb
x test/tc_extensions.rb
x test/tc_init.rb
x test/tc_io.rb
x test/tc_library.rb
x test/tc_matrix.rb
x test/tc_modes.rb
x test/tc_robj.rb
x test/tc_sigint.rb
x test/tc_to_r.rb
x test/tc_to_ruby.rb
x test/tc_util.rb
x test/tc_vars.rb
x test/test_all.rb
% sudo gzcat metadata.gz > metadata
So this is how a gem is laid out. The metadata file looks like a gemspec. However, I ran out of time and gave up here.
RSOAP
I gave up on RSOAP because SOAP is not available in Ruby 1.9 and setting up an RSOAP server looked like a lot of work.