Startup Tutorial
First, check your CPAN modules
ngsShoRT requires the Perl modules String::Approx and PerlIO::gzip, which can be installed as follows (you may need admin permissions):
perl -MCPAN -e shell
cpan> install String::Approx
cpan> install PerlIO::gzip
See here for more info on installing the module.
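Before running ngsShoRT, it can help to confirm that both modules actually load. The following is a minimal sketch, not part of ngsShoRT: the helper name perl_module_status is hypothetical, and it relies only on Perl's standard -M switch, which fails if the named module cannot be loaded.

```shell
# Hypothetical helper (not shipped with ngsShoRT): prints "installed" if
# Perl can load the named module, "missing" otherwise.
perl_module_status() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

# Report on the two modules the tutorial lists as requirements.
for mod in String::Approx PerlIO::gzip; do
    echo "$mod: $(perl_module_status "$mod")"
done
```

If either module is reported as missing, repeat the cpan install step above.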
Download and untar ngsShoRT_2.1 in a target directory
tar -xvf /path/to/ngsShoRT_2.1.tar.gz
Run ngsShoRT on the sample_data
Paired-End (PE) fastq files
cd /path/to/ngsShoRT_2.1
perl ngsShoRT.pl -pe1 sample_data/fastq/SRR065390_1_1st_2000reads.fastq.gz \
-pe2 sample_data/fastq/SRR065390_2_1st_2000reads.fastq.gz \
-o sample_data/output_directory -methods 5adpt
This trims the gzipped paired-end files (pe1 = forward reads, pe2 = reverse reads) using the 5adpt method (removal of 5' adapters/primers; by default it trims known Illumina primers) and writes the output to sample_data/output_directory.
Your output files should be:
trimmed_SRR065390_1_1st_2000reads.fastq | trimmed pe1 reads |
trimmed_SRR065390_2_1st_2000reads.fastq | trimmed pe2 reads |
surviving_SE_mates.fastq | trimmed pe1 and pe2 reads whose mate was filtered out during trimming |
extracted_five_prime_adapter_sequences_at_100_percent_match.txt | |
log.txt | |
final_PE_report.txt | full report of ngsShoRT runtime, percentage and number of trimmed bases and reads, and method-specific trimming statistics |
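A quick way to sanity-check the output files listed above is to count the reads in each one. The sketch below is an illustration rather than an ngsShoRT feature: count_reads is a hypothetical helper, and it assumes standard four-line fastq records (no wrapped sequence lines).

```shell
# Hypothetical helper: print the number of reads in a fastq file,
# plain or gzipped. Assumes exactly four lines per record.
count_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}
```

For example, count_reads sample_data/output_directory/trimmed_SRR065390_1_1st_2000reads.fastq prints the number of surviving pe1 reads. Since reads that lost their mate are written to surviving_SE_mates.fastq, one would expect the two trimmed paired-end files to report the same count.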
Commonly used options include:
-t (number of threads) | default is 10 |
-min_rl (minimum trimmed read length) | default is 21 |
-print_discarded_read yes | default is no |
Additional trimming methods can be added to the -methods option, e.g., -methods lqr_5adpt will filter out low-quality reads before trimming 5'-adapters.
We recommend -methods lqr_5adpt_tera, which filters out low-quality reads (reads in which more than 50% of the bases have a quality score < 2), removes adapter/primer sequences, and trims low-quality 3'-end bases.
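To make the low-quality-read criterion concrete, the sketch below applies the rule as stated in this tutorial (more than 50% of bases with a quality score below 2) to a single quality string. It is an illustration only, not ngsShoRT's implementation: classify_read_quality is a hypothetical name, the thresholds are hard-coded, and the quality offset defaults to Phred33 (pass 64 as the second argument for Phred64 data).

```shell
# Hypothetical helper: print "discard" if more than half of the bases in
# the quality string $1 score below 2, otherwise print "keep".
# $2 is the ASCII offset of the encoding (default 33).
classify_read_quality() {
    printf '%s\n' "$1" | awk -v off="${2:-33}" -v minq=2 -v maxfrac=0.5 '
        # Build a character -> ASCII code lookup for printable characters.
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        {
            low = 0; n = length($0)
            for (i = 1; i <= n; i++)
                if (ord[substr($0, i, 1)] - off < minq) low++
            flagged = (n > 0 && low / n > maxfrac)
        }
        END { print (flagged ? "discard" : "keep") }'
}
```

With Phred33 encoding, the characters ! and " stand for scores 0 and 1, so a quality string such as !!!I (three of four bases below 2) is classified as discard, while !!II (exactly half) is kept.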
Single-End (SE) fastq file
perl ngsShoRT.pl -se sample_data/fastq/SRR065390_1_1st_2000reads.fastq \
-o sample_data/output_directory -methods 5adpt
QSEQ files:
Use the same command as for fastq files, because ngsShoRT auto-detects fastq and qseq files:
perl ngsShoRT.pl -se sample_data/qseq/SRR065390_1st_2000_reads_qseq.txt \
-o sample_data/output_directory -methods 5adpt
However, qseq quality scores are Phred64 based (the older Illumina encoding; Illumina 1.8+ fastq files use Phred33), so the output files (which will be in fastq format) will use Phred64 scoring as well. If you want them to be Sanger (Phred33) based, add i2s to your method list to convert them from Illumina to Sanger scoring:
perl ngsShoRT.pl -se sample_data/qseq/SRR065390_1st_2000_reads_qseq.txt \
-o sample_data/output_directory -methods 5adpt_i2s
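Conceptually, converting Phred64 to Phred33 means shifting every quality character down by 31 ASCII positions (64 minus 33). The sketch below shows that shift on a single quality string using tr. It illustrates the encoding change only and is not ngsShoRT's i2s code; the function name illumina_to_sanger_qual is hypothetical.

```shell
# Hypothetical helper: convert one Phred64 quality string to Phred33.
# tr maps ASCII 64..126 ('@' to '~') onto ASCII 33..95 ('!' to '_'),
# i.e. each character moves down by 31 positions.
illumina_to_sanger_qual() {
    printf '%s' "$1" | LC_ALL=C tr '@-~' '!-_'
}
```

For example, the Phred64 character h (score 40) becomes I, the Phred33 character for the same score. The mapping must only be applied to quality lines, never to sequence or header lines.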
Working with compressed files:
ngsShoRT auto-detects and opens files with the extensions .bz2, .gz, and .zip.
If you want the trimmed output files to be gzipped, add -gzip to the command. For example,
perl ngsShoRT.pl -se sample_data/qseq/SRR065390_1st_2000_reads_qseq.txt \
-o sample_data/output_directory -methods 5adpt_i2s -gzip
will produce the output file "trimmed_SRR065390_1st_2000_reads_qseq.txt.gz".
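After a run with -gzip, it may be worth confirming that the compressed output is intact before passing it to downstream tools. The sketch below uses gzip's standard -t (test) option; gzip_status is a hypothetical helper name and not part of ngsShoRT.

```shell
# Hypothetical helper: print "ok" if the file passes gzip's integrity
# test, "corrupt" otherwise.
gzip_status() {
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt"
    fi
}
```

To look at the first trimmed record without decompressing the whole file to disk, gzip -dc piped into head -n 4 works on the same file.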