ngsShoRT: A Next Generation Sequencing Short Read Trimmer



Startup Tutorial

First, check your CPAN modules

ngsShoRT requires the Perl modules String::Approx and PerlIO::gzip, which can be installed from the CPAN shell as follows (you may need admin permissions):

	perl -MCPAN -e shell
	cpan> install String::Approx
	cpan> install PerlIO::gzip

See the CPAN documentation for more information on installing Perl modules.
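Before installing, you can check whether the modules are already available. The following is a small sketch: `check_perl_modules` is a hypothetical helper name, and it relies only on the fact that `perl -M<Module> -e 1` exits with an error when a module cannot be loaded.

```shell
# Hypothetical helper (not part of ngsShoRT): report which Perl modules load.
# `perl -M<Module> -e 1` exits non-zero if <Module> cannot be loaded.
check_perl_modules() {
  missing=0
  for module in "$@"; do
    if perl -M"$module" -e 1 2>/dev/null; then
      echo "found:   $module"
    else
      echo "missing: $module"
      missing=1
    fi
  done
  # Non-zero status if at least one module is missing
  return "$missing"
}
```

For example, `check_perl_modules String::Approx PerlIO::gzip` prints one line per module and returns a non-zero status if either one is missing.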

Download and untar ngsShoRT_2.1 in a target directory

	tar -xvf /path/to/ngsShoRT_2.1.tar.gz

Run ngsShoRT on the sample_data

Paired-End (PE) fastq files

	cd /path/to/ngsShoRT_2.1
	perl ngsShoRT.pl -pe1 sample_data/fastq/SRR065390_1_1st_2000reads.fastq.gz \
		-pe2 sample_data/fastq/SRR065390_2_1st_2000reads.fastq.gz \
		-o sample_data/output_directory -methods 5adpt
	

This trims the gzipped paired-end files (pe1 = forward reads, pe2 = reverse reads) using the 5adpt method (removal of 5' adapters/primers; by default it trims known Illumina primers) and writes the output to sample_data/output_directory.
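To make the idea of 5' adapter removal concrete, here is a deliberately simplified sketch. It is not ngsShoRT's 5adpt implementation: the helper name `trim_exact_5prime_adapter` is hypothetical, and it only removes an adapter that matches the start of the read exactly, trimming the quality string by the same length so sequence and qualities stay aligned.

```shell
# Simplified sketch, NOT ngsShoRT's 5adpt algorithm: if a read's sequence
# starts with the exact adapter string, remove it from the sequence and
# remove the same number of characters from the quality line.
# Assumes 4-line FASTQ records (no wrapped sequence lines) on stdin.
trim_exact_5prime_adapter() {
  adapter="$1"
  awk -v adpt="$adapter" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      n = length(adpt)
      # Exact prefix match only; no mismatches or partial adapters
      if (n > 0 && substr(seq, 1, n) == adpt) {
        seq = substr(seq, n + 1)
        qual = substr(qual, n + 1)
      }
      print header; print seq; print plus; print qual
    }
  '
}
```

For example, `gzip -dc sample_data/fastq/SRR065390_1_1st_2000reads.fastq.gz | trim_exact_5prime_adapter ACGT` would print the reads with a leading ACGT removed; ACGT is an arbitrary placeholder here, not one of the Illumina primers ngsShoRT uses.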

Your output files should be:

trimmed_SRR065390_1_1st_2000reads.fastq: trimmed pe1 reads
trimmed_SRR065390_2_1st_2000reads.fastq: trimmed pe2 reads
surviving_SE_mates.fastq: trimmed pe1 and pe2 reads whose mate read was filtered out during trimming
extracted_five_prime_adapter_sequences_at_100_percent_match.txt
log.txt
final_PE_report.txt: full report of the ngsShoRT runtime, the number and percentage of trimmed bases and reads, and method-specific trimming statistics
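A quick sanity check on these outputs is to count the reads in each file. The sketch below assumes standard 4-line FASTQ records; `count_fastq_reads` is a hypothetical helper, not part of ngsShoRT. Since reads whose mate was filtered out go to surviving_SE_mates.fastq, you would expect the two trimmed_* files to report the same count.

```shell
# Hypothetical helper: count the reads in a FASTQ file (plain or .gz).
# Assumes standard 4-line records, so reads = lines / 4.
count_fastq_reads() {
  case "$1" in
    *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
    *)    lines=$(wc -l < "$1") ;;
  esac
  echo $(( lines / 4 ))
}
```

For example, `count_fastq_reads sample_data/output_directory/trimmed_SRR065390_1_1st_2000reads.fastq` can be compared with the count for the corresponding input .fastq.gz file.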


Commonly used options include:

-t (number of threads): default is 10
-min_rl (minimum trimmed read length): default is 21
-print_discarded_read yes: default is no
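The effect of -min_rl can be pictured as a length filter applied after trimming. This is a simplified sketch, not ngsShoRT's code: `filter_min_read_length` is a hypothetical helper, it assumes 4-line FASTQ records on stdin, and it assumes that reads of exactly the minimum length are kept.

```shell
# Simplified sketch of a minimum-read-length filter (similar in spirit to
# -min_rl): drop any 4-line FASTQ record whose sequence is shorter than
# the given minimum. 21 mirrors the documented -min_rl default.
filter_min_read_length() {
  min_len="${1:-21}"
  awk -v minlen="$min_len" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      # Keep the record only if the sequence is long enough
      if (length(seq) >= minlen + 0) { print header; print seq; print plus; print $0 }
    }
  '
}
```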

Additional trimming methods can be chained in the -methods argument by joining their names with underscores; e.g., -methods lqr_5adpt filters out low-quality reads before trimming 5' adapters.

We recommend the trimming methods -methods lqr_5adpt_tera for filtering low-quality reads (reads with > 50% of bases having a quality score < 2), removing their adapter/primer sequences, and trimming their low-quality 3'-end bases.
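The low-quality read criterion above can be written out as a short check. This is a sketch under stated assumptions, not ngsShoRT's lqr code: it assumes Phred33-encoded qualities, uses the thresholds quoted above (more than 50% of bases with a quality score below 2), and `flag_low_quality_reads` is a hypothetical helper that only prints the headers of reads that would be flagged.

```shell
# Sketch of the low-quality-read test (assumes Phred33 qualities):
# flag a read when more than 50% of its bases have quality < 2.
# Reads 4-line FASTQ records on stdin; prints headers of flagged reads.
flag_low_quality_reads() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 0 {
      n = length($0); low = 0
      for (j = 1; j <= n; j++)
        if (ord[substr($0, j, 1)] - 33 < 2) low++   # Phred33 decode
      if (n > 0 && low / n > 0.5) print header
    }
  '
}
```

Note that a read with exactly half of its bases below the threshold is not flagged, since the criterion is strictly more than 50%.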

Single-End (SE) fastq file

	perl ngsShoRT.pl -se sample_data/fastq/SRR065390_1_1st_2000reads.fastq \
		-o sample_data/output_directory -methods 5adpt

QSEQ files:

ngsShoRT auto-detects fastq and qseq files, so you can use the same command as for the SE fastq file:

	perl ngsShoRT.pl -se sample_data/qseq/SRR065390_1st_2000_reads_qseq.txt \
		-o sample_data/output_directory -methods 5adpt
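Because the output is written in fastq format, each qseq record ends up as a four-line fastq record. A minimal sketch of that conversion follows; `qseq_to_fastq` is a hypothetical helper, it assumes the usual 11 tab-separated qseq columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter), and the header layout is one convention chosen for illustration, not necessarily the one ngsShoRT writes. The filter column is ignored.

```shell
# Simplified sketch (not ngsShoRT's code): convert qseq records on stdin
# to FASTQ. Assumed columns: 1 machine, 2 run, 3 lane, 4 tile, 5 x, 6 y,
# 7 index, 8 read number, 9 sequence, 10 quality, 11 filter (ignored).
qseq_to_fastq() {
  awk -F '\t' '
    NF >= 11 {
      seq = $9
      gsub(/\./, "N", seq)   # qseq uses "." for uncalled bases
      printf "@%s_%s:%s:%s:%s:%s#%s/%s\n", $1, $2, $3, $4, $5, $6, $7, $8
      print seq
      print "+"
      print $10              # quality string left unchanged (Phred64)
    }
  '
}
```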

However, quality scores in qseq files are Phred64-encoded (the convention of Illumina pipelines before 1.8; Illumina 1.8+ switched to Phred33), so the output files (which will be in fastq format) keep this encoding as well. If you want them to use Sanger (Phred33) scoring, add i2s to your method list to convert them from Illumina to Sanger scoring:

	perl ngsShoRT.pl -se sample_data/qseq/SRR065390_1st_2000_reads_qseq.txt \
		-o sample_data/output_directory -methods 5adpt_i2s
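The Illumina-to-Sanger conversion amounts to shifting each quality character down by 64 - 33 = 31 ASCII positions, so Phred64 'h' (Q40) becomes Phred33 'I' (Q40). A minimal sketch, assuming 4-line fastq records; `phred64_to_phred33` is a hypothetical helper, not the i2s code itself.

```shell
# Sketch of a Phred64 -> Phred33 quality conversion: every character on
# the quality line (4th line of each record) is shifted down by 31.
# Assumes 4-line FASTQ records on stdin with valid Phred64 qualities.
phred64_to_phred33() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 != 0 { print; next }   # header, sequence and "+" lines unchanged
    {
      out = ""
      for (j = 1; j <= length($0); j++)
        out = out sprintf("%c", ord[substr($0, j, 1)] - 31)
      print out
    }
  '
}
```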

Working with compressed files:

ngsShoRT auto-detects and opens files with the extensions .bz2, .gz, and .zip.
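Extension-based detection can be sketched as a simple dispatch on the file name. `open_by_extension` is a hypothetical helper that assumes the standard gzip, bzip2 and unzip command-line tools are installed; for .zip it streams the archive contents with `unzip -p`, which assumes a single file per archive.

```shell
# Hypothetical helper: stream a file to stdout, decompressing it
# according to its extension; anything else is passed through as-is.
open_by_extension() {
  case "$1" in
    *.gz)  gzip -dc "$1" ;;
    *.bz2) bzip2 -dc "$1" ;;
    *.zip) unzip -p "$1" ;;
    *)     cat "$1" ;;
  esac
}
```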

If you want your trimmed output files to be gzipped, add -gzip to the command. For example,

	perl ngsShoRT.pl -se sample_data/qseq/SRR065390_1st_2000_reads_qseq.txt \
		-o sample_data/output_directory -methods 5adpt_i2s -gzip

will produce the output file "trimmed_SRR065390_1st_2000_reads_qseq.txt.gz".