This command-line tool lets users query peptide sequences against their own customized protein sequence database.
The tool provides two major functionalities:
- Given a protein sequence database in FASTA format, create a Lucene index for it.
- Query peptide sequences against that index. The query can be:
  - A peptide sequence or a comma-separated list of peptide sequences, or
  - A file, either in FASTA format or as a list of peptide sequences, one sequence per line.
From Native OS
The runnable jar can be downloaded here. The source code is also available here. The software is released under the GNU General Public License.
Run from executable jar
$ java -jar PeptideMatchCMD_1.1.jar -h
Command line options: -h
usage: java -jar PeptideMatchCMD_1.1.jar [options]
Available options:
------------------
-a,--action      The action to perform ("index" or "query").
-d,--dataFile    The path to a FASTA file to be indexed.
-e,--LeqI        Treat Leucine (L) and Isoleucine (I) as equivalent (default: no).
-f,--force       Overwrite the indexDir (default: no).
-h,--help        Print this message.
-i,--indexDir    The directory where the index is stored.
-l,--list        The query peptide sequence file is a list of peptide sequences, one sequence per line (default: no).
-o,--outputFile  The path to the query result file.
-Q,--queryFile   The path to the query peptide sequence file in either FASTA format or a list of peptide sequences, one sequence per line.
-q,--query       One peptide sequence or a comma-separated list of peptide sequences.
Compile from source
$ unzip PeptideMatchCMD_src_1.1.zip
$ cd PeptideMatchCMD_src_1.1
$ ant
$ java -jar PeptideMatchCMD_1.1.jar -h
Tutorial
- Create a Lucene index from a protein sequence database in FASTA format:
$ java -jar PeptideMatchCMD_1.1.jar -a index -d uniprot_sprot.fasta -i sprot_index
Command line: -a index -d uniprot_sprot.fasta -i sprot_index
Indexing to directory "sprot_index" ...
Indexing "uniprot_sprot.fasta" ...
Indexing "uniprot_sprot.fasta" finished
Time used: 00 hours, 06 mins, 31.215 seconds
- Query a peptide sequence:
$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR -o out.txt
Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt
Quering...
AAFGGSGGR has 1 match
Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.457 seconds
$ cat out.txt
#Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt
##Query Subject SubjectLength MatchStart MatchEnd
AAFGGSGGR sp|P35908|K22E_HUMAN 639 516 524
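The result file shown above is easy to post-process. The following is a minimal Python sketch, not part of the tool: it assumes the columns are separated by whitespace (the exact delimiter is not stated here) and that lines beginning with `#` carry only the command line and the column header. The `Match` and `parse_results` names are illustrative.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    query: str
    subject: str
    subject_length: int
    match_start: int
    match_end: int


def parse_results(text: str) -> List[Match]:
    """Parse query output text into Match records."""
    matches = []
    for line in text.splitlines():
        line = line.strip()
        # Lines starting with '#' hold the command line and the column header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumption: whitespace-delimited columns
        query, subject = fields[0], fields[1]
        length, start, end = (int(x) for x in fields[2:5])
        matches.append(Match(query, subject, length, start, end))
    return matches


sample = """#Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt
##Query Subject SubjectLength MatchStart MatchEnd
AAFGGSGGR sp|P35908|K22E_HUMAN 639 516 524
"""

for m in parse_results(sample):
    # Matched span length, treating the coordinates as 1-based and inclusive.
    print(m.query, m.subject, m.match_end - m.match_start + 1)
```

In the sample row the span 516 to 524 covers 9 residues, the length of the query peptide, which suggests that MatchStart and MatchEnd are 1-based and inclusive.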
- Query a list of peptide sequences:
$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt
Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt
Quering...
AAFGGSGGR has 1 match
GVPDIR has 4 matches
Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.493 seconds
$ cat out.txt
#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt
##Query Subject SubjectLength MatchStart MatchEnd
AAFGGSGGR sp|P35908|K22E_HUMAN 639 516 524
GVPDIR sp|Q9CK59|Y1775_PASMU 92 45 50
GVPDIR sp|B1Y8E7|PYRB_LEPCP 320 194 199
GVPDIR sp|B4SHE6|MURD_PELPB 464 252 257
GVPDIR sp|Q6FX42|ATR_CANGA 2379 1135 1140
- Query a list of peptide sequences and treat Leucine (L) and Isoleucine (I) as equivalent:
$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt
Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt
Quering...
AAFGGSGGR has 1 match
GVPDIR has 13 matches
Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.513 seconds
$ cat out.txt
#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt
##Query Subject SubjectLength MatchStart MatchEnd MatchedLEqIPositions
AAFGGSGGR sp|P35908|K22E_HUMAN 639 516 524
GVPDIR sp|Q9CK59|Y1775_PASMU 92 45 50
GVPDIR sp|A0R5Z2|GLFT1_MYCS2 302 182 187 186
GVPDIR sp|Q7D4V6|GLFT1_MYCTU 304 179 184 183
GVPDIR sp|B1Y8E7|PYRB_LEPCP 320 194 199
GVPDIR sp|A5GDX3|RECF_GEOUR 364 126 131 130
GVPDIR sp|P96919|EX5A_MYCTU 575 138 143 142
GVPDIR sp|Q17QV2|MON1A_BOVIN 555 441 446 445
GVPDIR sp|Q2QZ37|OBGM_ORYSJ 528 500 505 504
GVPDIR sp|B4SHE6|MURD_PELPB 464 252 257
GVPDIR sp|Q9M1G3|LRK16_ARATH 669 595 600 599
GVPDIR sp|Q5U3H2|SV421_DANRE 808 575 580 579
GVPDIR sp|A6H5Y3|METH_MOUSE 1253 1147 1152 1151
GVPDIR sp|Q6FX42|ATR_CANGA 2379 1135 1140
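To illustrate what the `-e` option and the MatchedLEqIPositions column appear to report, here is a naive Python sketch. It is not the tool's Lucene-based implementation; it scans a single sequence by plain substring search after mapping I to L, and lists the 1-based subject positions where the query and subject residues differ. The function names and the toy protein are made up for illustration.

```python
from typing import List, Tuple


def normalize(seq: str) -> str:
    """Map Isoleucine (I) to Leucine (L) so the two compare as equal."""
    return seq.upper().replace("I", "L")


def find_leqi_matches(peptide: str, protein: str) -> List[Tuple[int, int, List[int]]]:
    """Return (start, end, leqi_positions) per match, 1-based and inclusive."""
    pep, prot = normalize(peptide), normalize(protein)
    hits = []
    pos = prot.find(pep)
    while pos != -1:
        # After normalization matched, any residue that still differs
        # in the original sequences must be an L/I substitution.
        swapped = [
            pos + k + 1
            for k in range(len(peptide))
            if peptide[k].upper() != protein[pos + k].upper()
        ]
        hits.append((pos + 1, pos + len(peptide), swapped))
        pos = prot.find(pep, pos + 1)  # step by one to allow overlapping matches
    return hits


# Toy protein (hypothetical): one exact match and one match with L in place of I.
toy = "MKGVPDIRAAGVPDLRK"
print(find_leqi_matches("GVPDIR", toy))
# prints [(3, 8, []), (11, 16, [15])]
```

This agrees with the rows above: for GVPDIR the reported position is always MatchStart + 4, the location of the peptide's I, and the column is empty for exact matches.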
- Query peptides in a FASTA file:
$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -Q query.fasta -e -o out_fasta.txt
Command line: -a query -i sprot_index -Q query.fasta -e -o out_fasta.txt
Quering...
example_1 has 1 match
example_2 has 1 match
example_3 has 1 match
example_4 has 1 match
example_5 has 1 match
example_6 has 1 match
example_7 has 1 match
example_8 has 1 match
example_9 has 1 match
example_10 has 1 match
Query is finished.
The result is saved in "out_fasta.txt".
Time used: 00 hours, 00 mins, 00.724 seconds
- Query peptides in a list file, one peptide per line:
$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -Q query.list -l -e -o out_list.txt
Command line: -a query -i sprot_index -Q query.list -l -e -o out_list.txt
Quering...
AAFGGSGGR has 1 match
ELEVQSEDGTFAK has 1 match
FEDPAEGEDTLVEK has 1 match
FSDGLITPDFLAK has 1 match
GAPEFWAAR has 1 match
GVIEANGGKVEK has 1 match
HIPVYVSEEMVGHKFGEFSPTR has 1 match
HNDVNFGTQDHNR has 1 match
IGFYLTTCPR has 1 match
ILVGQGNDGVAFVK has 1 match
Query is finished.
The result is saved in "out_list.txt".
Time used: 00 hours, 00 mins, 00.752 seconds
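The two query file formats differ in how each peptide is labeled in the console output: FASTA queries are reported by identifier (example_1, ...), while list queries are reported by the sequence itself. The Python sketch below reads both formats into the same identifier-to-sequence mapping. It is an illustration only; the file contents are hypothetical, since the actual query.fasta and query.list are not shown here.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited header token as the identifier;
            # fall back to a generated name if the header is empty.
            tokens = line[1:].split()
            name = tokens[0] if tokens else f"seq_{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line  # a sequence may span several lines
    return records


def read_list(text: str) -> Dict[str, str]:
    """Parse a one-peptide-per-line list; each peptide serves as its own label."""
    return {s.strip(): s.strip() for s in text.splitlines() if s.strip()}


# Hypothetical file contents, reusing peptides from the list query above.
fasta_text = ">example_1\nAAFGGSGGR\n>example_2 some description\nELEVQSE\nDGTFAK\n"
list_text = "AAFGGSGGR\nELEVQSEDGTFAK\n"

print(read_fasta(fasta_text))
print(read_list(list_text))
```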
From Docker Container
- Set up a local working directory to hold the input and output files. It will be mounted into the Docker container.
$ mkdir /your/localworkdir/
$ cd /your/localworkdir/
$ ls
uniprot_sprot.fasta query.list query.fasta
- Create a Lucene index from a protein sequence database in FASTA format:
$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
-a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f
Unable to find image 'chenc/peptidematch:latest' locally
latest: Pulling from chenc/peptidematch
7448db3b31eb: Pull complete
c36604fa7939: Pull complete
29e8ef0e3340: Pull complete
a0c934d2565d: Pull complete
a360a17c9cab: Pull complete
cfcc996af805: Pull complete
2cf014724202: Pull complete
4bc402a00dfe: Pull complete
1da5b1324a69: Pull complete
Digest: sha256:923a488fad501b35de6629309a02f6aa786d42edb7aa0666691aa861bbfd831f
Status: Downloaded newer image for chenc/peptidematch:latest
Command line options: -a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f
Indexing to directory "/workdir/uniprot_sprot_index" ...
Indexing "/workdir/uniprot_sprot.fasta" ...
Indexing "/workdir/uniprot_sprot.fasta" finished
Time used: 00 hours, 03 mins, 31.116 seconds
- Query a peptide sequence:
$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
-a query -q NEKKQQMGKEYREKIEAEL -i /workdir/uniprot_sprot_index -o /workdir/single_query_out.txt
Command line options: -a query -q NEKKQQMGKEYREKIEAEL -i /workdir/uniprot_sprot_index -o /workdir/single_query_out.txt
Quering...
NEKKQQMGKEYREKIEAEL has 6 matches
Query is finished.
The result is saved in "/workdir/single_query_out.txt".
Time used: 00 hours, 00 mins, 00.935 seconds
- Query a list of peptide sequences:
$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
-a query -q NEKKQQMGKEYREKIEAEL,EAFEISKKE -i /workdir/uniprot_sprot_index -o /workdir/multi_query_out.txt
Command line options: -a query -q NEKKQQMGKEYREKIEAEL,EAFEISKKE -i /workdir/uniprot_sprot_index -o /workdir/multi_query_out.txt
Quering...
NEKKQQMGKEYREKIEAEL has 6 matches
EAFEISKKE has 15 matches
Query is finished.
The result is saved in "/workdir/multi_query_out.txt".
Time used: 00 hours, 00 mins, 00.685 seconds
- Query peptides in a FASTA file:
$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
-a query -Q /workdir/query.fasta -i /workdir/uniprot_sprot_index -o /workdir/fasta_query_out.txt
Command line options: -a query -Q /workdir/query.fasta -i /workdir/uniprot_sprot_index -o /workdir/fasta_query_out.txt
Quering...
example_1 has 1 match
example_2 has 1 match
example_3 has 1 match
example_4 has 1 match
example_5 has 1 match
example_6 has 1 match
example_7 has 1 match
example_8 has 1 match
example_9 has 1 match
example_10 has 1 match
Query is finished.
The result is saved in "/workdir/fasta_query_out.txt".
Time used: 00 hours, 00 mins, 01.733 seconds
- Query peptides in a list file, one peptide per line:
$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
-a query -Q /workdir/query.list -l -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt
Command line options: -a query -Q /workdir/query.list -l -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt
Quering...
AAFGGSGGR has 1 match
ELEVQSEDGTFAK has 1 match
FEDPAEGEDTLVEK has 1 match
FSDGLITPDFLAK has 1 match
GAPEFWAAR has 1 match
GVIEANGGKVEK has 1 match
HIPVYVSEEMVGHKFGEFSPTR has 1 match
HNDVNFGTQDHNR has 1 match
IGFYLTTCPR has 1 match
ILVGQGNDGVAFVK has 1 match
Query is finished.
The result is saved in "/workdir/list_query_out.txt".
Time used: 00 hours, 00 mins, 01.432 seconds
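When the Docker queries above are run from a script, it can help to build the argument list programmatically. The following Python sketch assembles the same `docker run` command using only the options shown in this document; the `docker_query_command` helper and its parameter names are hypothetical, and it assumes the working directory is mounted at /workdir as in the examples.

```python
import shlex
from typing import List


def docker_query_command(workdir: str, query_file: str, index_dir: str,
                         output_file: str, is_list: bool = False,
                         leq_i: bool = False) -> List[str]:
    """Build the argument list for a file-based query inside the container."""
    mount = "/workdir"  # mount point used throughout the examples above
    cmd = ["docker", "run", "-v", f"{workdir}:{mount}", "chenc/peptidematch",
           "-a", "query", "-Q", f"{mount}/{query_file}"]
    if is_list:
        cmd.append("-l")  # query file is a list, one peptide per line
    if leq_i:
        cmd.append("-e")  # treat Leucine and Isoleucine as equivalent
    cmd += ["-i", f"{mount}/{index_dir}", "-o", f"{mount}/{output_file}"]
    return cmd


cmd = docker_query_command("/your/localworkdir/", "query.list",
                           "uniprot_sprot_index", "list_query_out.txt",
                           is_list=True)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.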