
Craig Venter’s first chromosome

By Isomorphismes

is, I think, the one you can find at sacred-texts.com.

curl -O 'http://www.sacred-texts.com/dna/hgp011k.htm'  #get it

#BORING DATA JANITORSHIP
tail -n +15 hgp011k.htm > hgp011k   #remove the HTML head stuff .. up to 
head -n -3 hgp011k | sponge hgp011k #remove the HTML tail

#the `sponge` nonsense is because `command file > file` will just blank your file
#`sponge` holds the output in a temp/swap for a sec, then writes > to file
#you can also slow your shell down by wrapping `command` in this bit of nonsense:
echo "`head -n -3 hgp011k`" > hgp011k
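
The truncation problem is easy to reproduce on a throwaway file. What follows is a minimal sketch, not part of the original cleanup: the helper name `drop_last_lines` and the five-line demo file are made up for illustration, and `head -n -N` assumes GNU coreutils. It uses the plain temp-file idiom, which does roughly the job `sponge` does when moreutils is not installed.

```shell
# Hypothetical helper: drop the last N lines of a file in place,
# without sponge, by going through a temp file.
drop_last_lines() {
    n=$1
    file=$2
    tmp=$(mktemp) || return 1
    # head -n -N (all but the last N lines) is a GNU extension
    if head -n "-$n" "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # safe now: head has already finished reading
    fi
    rm -f "$tmp"
}

# Throwaway demo file with five lines (made-up content)
demo=$(mktemp)
printf '%s\n' one two three four five > "$demo"

# Naive version: the shell truncates the file before head ever reads it
cp "$demo" "$demo.naive"
head -n -3 "$demo.naive" > "$demo.naive"
if [ -s "$demo.naive" ]; then naive_empty=no; else naive_empty=yes; fi

# Safe version: lines "one" and "two" survive
drop_last_lines 3 "$demo"
safe_result=$(cat "$demo")

echo "naive file empty: $naive_empty"
echo "$safe_result"
rm -f "$demo" "$demo.naive"
```

On the real file this would be `drop_last_lines 3 hgp011k`.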

#now it's almost clean … just tattied with needless line endings
tr -d '\r' < hgp011k | sponge hgp011k

So that's a bit of unix 101 / datacleaning 101. Now open up an R terminal for the fun part:

craig.v …
    A     C     G     T 
14941 15080 15210 14769
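
The same kind of tally can be cross-checked without leaving the shell. This is a minimal sketch, assuming the cleaned file holds nothing but base letters and whitespace; the function name `count_symbols` is hypothetical and the sample sequence is made up, not real data.

```shell
# Hypothetical helper: count how often each character occurs on stdin,
# ignoring whitespace. Roughly the tally R's table() reports.
count_symbols() {
    tr -d '[:space:]' | fold -w 1 | sort | uniq -c
}

# Tiny made-up sequence, not real data
printf 'GATTACA\nCAT\n' | count_symbols
# counts: A 4, C 2, G 1, T 3
```

Run against the cleaned chromosome file it would be `count_symbols < hgp011k`.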

A good time was had by all.

Why don't we do the same thing with π? Unlike Dr V's DNA, I don't have to get all wet and bloody acquiring as much of this data as I want. I do have to set some limits on how long to run `bc`, the basic calculator, though.

echo "scale=22222; a(1)*4" | bc -l  > pi.22222 #a(1) = arctan(1) = π/4, so times 4 gives π
less pi.22222   #needs cleanup
echo "scale=22222; a(1)*4" | bc -l | tr -d '\n' | tr -d '\\'  > pi.22222
#one-liner! and it feels so good…
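
The cleanup is needed because bc breaks long numbers across lines and ends each wrapped line with a backslash. Here is a small sketch of what the `tr` step undoes, run on a hand-made stand-in for bc's output rather than on bc itself; the function name `unwrap_bc` is hypothetical.

```shell
# Hypothetical helper: join bc-style wrapped output into one line by
# deleting every backslash and every newline.
unwrap_bc() {
    tr -d '\\\n'
}

# Hand-made sample in bc's wrapped format (not real bc output)
wrapped=$(printf '3.14159\\\n26535\\\n89793\n')
unwrapped=$(printf '%s\n' "$wrapped" | unwrap_bc)
echo "$unwrapped"   # 3.141592653589793
```

With GNU bc, setting `BC_LINE_LENGTH=0` in the environment should switch the wrapping off at the source, though that is an extension and not something every bc honors.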

That was easier than scrolling through the HTML file to find the beginning of what we really wanted. R me the rock:

pi.2 …
    0     1     2     3     4     5     6     7  …

Source: r-bloggers.com

