If you have MiSeq and ONT MinION data, it is now very easy to combine them into a single-contig, high quality assembly. Here I will start with a base Ubuntu 14.04 LTS distribution and take you through every step, including software installation. I am starting with an Amazon EC2 m4.xlarge instance with 4 vCPUs and 16 GB RAM. If you want to repeat the work below, you don’t need to use EC2; any computer running Ubuntu with enough disk and RAM should work.
For MinION data, we will use Nick’s SQK-MAP-006 data at the EBI. Using a simple trick, and because I know the internal structure of the tar archive, we can download only the base-called data:
curl ftp://ftp.sra.ebi.ac.uk/vol1/ERA540/ERA540530/oxfordnanopore_native/MAP006-1.tar | tar -xf - MAP006-1/MAP006-1_downloads &
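The trick is that tar can read an archive from a pipe and extract only the members you name, so the rest of the archive is never written to disk (it is still streamed over the network). Here is a minimal sketch of the same idea on a toy archive, with cat standing in for curl; the demo directory and file names are made up for illustration and are not the real layout of the MAP006-1 tarball:

```shell
# Build a toy archive with one wanted and one unwanted directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/demo/demo_downloads" "$workdir/src/demo/demo_raw"
echo "basecalled" > "$workdir/src/demo/demo_downloads/read1.fast5"
echo "raw" > "$workdir/src/demo/demo_raw/read1.fast5"
tar -cf "$workdir/demo.tar" -C "$workdir/src" demo

# Stream the archive through a pipe and extract only the named directory.
mkdir -p "$workdir/out"
cat "$workdir/demo.tar" | tar -xf - -C "$workdir/out" demo/demo_downloads

# Only files under demo/demo_downloads should be listed.
find "$workdir/out" -type f
```

If you don't know the internal structure of an archive, tar -tf lists the member names without extracting anything.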
We can also get some MiSeq data from ENA from the same strain, E. coli K-12 MG1655:
wget -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR262/005/SRR2627175/SRR2627175_2.fastq.gz &
wget -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR262/005/SRR2627175/SRR2627175_1.fastq.gz &
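Because the downloads run in the background and wget -q prints nothing, it is worth checking the files before using them. Below is a rough sketch of one way to do that; check_fastq_gz is a hypothetical helper name of my own, it assumes standard four-line FASTQ records, and it is demonstrated on a tiny made-up file rather than the real data:

```shell
# Hypothetical helper: test gzip integrity, then report the read count.
# Assumes four lines per FASTQ record.
check_fastq_gz() {
    file=$1
    if ! gzip -t "$file" 2>/dev/null; then
        echo "$file: corrupt or incomplete"
        return 1
    fi
    lines=$(gzip -dc "$file" | wc -l)
    echo "$file: $((lines / 4)) reads"
}

# Toy input with two reads, standing in for a downloaded file.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp/toy_1.fastq.gz"
result=$(check_fastq_gz "$tmp/toy_1.fastq.gz")
echo "$result"
```

On the real data you would first run wait in the same shell, so that the background jobs have finished, and then call the helper on SRR2627175_1.fastq.gz and SRR2627175_2.fastq.gz.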
Now, while that is downloading, we need to install some software on the server:
sudo apt-get update
sudo apt-get install r-base-core git samtools
This installs R, which we will use to extract the MinION data (using poRe), along with git and samtools. Just reply “Y” to any questions.
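A quick way to confirm the install worked is to check that each tool is on your PATH. This is just a sketch; check_tools is a hypothetical helper name, not part of any of the packages above:

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH.
check_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(check_tools R git samtools)
if [ -z "$missing" ]; then
    echo "all tools found"
else
    echo "missing:" $missing
fi
```

If anything is reported missing, re-run the apt-get install line and look for errors.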
We now need to install the poRe dependencies in R, which is very easy:
R
# load the Bioconductor installer script, which defines biocLite()
source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
install.packages(c("shiny","bit64","data.table","svDialogs"))
q()
R may ask if you want to install into a local library; just say Y and accept the defaults. We need to download poRe from SourceForge, and we are using version 0.16.
Once downloaded, and back at the Linux command line:
R CMD INSTALL poRe_0.16.tar.gz
The fastq extraction scripts for poRe are in github, so let’s go get those:
git clone https://github.com/mw55309/poRe_scripts.git
We will assemble using SPAdes, so let’s go get that:
wget http://spades.bioinf.spbau.ru/release3.6.2/SPAdes-3.6.2-Linux.tar.gz
gunzip < SPAdes-3.6.2-Linux.tar.gz | tar xvf -
Now, we are ready to go. First off, let’s extract the 2D sequence data as …
Source: biomickwatson.wordpress.com