Peptide Match - Command Line Tool [PIR - Protein Information Resource]




Command Line Tool

The command line tool lets users query peptide sequences against their own customized protein sequence database.

The tool provides two main functions:

Given a protein sequence database in FASTA format, create a Lucene index for it.
Query peptide sequences against that index. The query can be:
- A peptide sequence or a comma-separated list of peptide sequences, or
- A file in either FASTA format or list format (one peptide sequence per line).
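
As a rough illustration of the two file-based query forms, the Python sketch below writes the same peptides once as a one-per-line list and once as FASTA. The helper functions are hypothetical and not part of the tool; the `example_` identifier prefix simply mirrors the record names that appear in the tutorial output.

```python
from pathlib import Path


def write_peptide_list(peptides, path):
    """Write one peptide sequence per line (the list form of a query file)."""
    Path(path).write_text("".join(f"{p}\n" for p in peptides))


def write_peptide_fasta(peptides, path, prefix="example_"):
    """Write peptides as FASTA records with generated identifiers.

    The identifiers (prefix + running number) are an assumption made
    for illustration; any unique FASTA header would do.
    """
    records = []
    for n, peptide in enumerate(peptides, start=1):
        records.append(f">{prefix}{n}\n{peptide}\n")
    Path(path).write_text("".join(records))


if __name__ == "__main__":
    peptides = ["AAFGGSGGR", "GVPDIR"]
    write_peptide_list(peptides, "query.list")
    write_peptide_fasta(peptides, "query.fasta")
```

The list file would be passed with `-Q query.list -l`, the FASTA file with `-Q query.fasta`.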

From Native OS

The runnable jar can be downloaded here. The source code is also available here. The software is released under the GNU General Public License.

Run from executable jar

$ java -jar PeptideMatchCMD_1.1.jar -h
Command line options: -h 
usage: java -jar PeptideMatchCMD_1.1.jar [options]
            Available options:
            ------------------
 -a,--action <arg>       The action to perform ("index" or "query").
 -d,--dataFile <arg>     The path to a FASTA file to be indexed.
 -e,--LeqI               Treat Leucine (L) and Isoleucine (I) as
                         equivalent (default: no).
 -f,--force              Overwrite the indexDir (default: no).
 -h,--help               Print this message.
 -i,--indexDir <arg>     The directory where the index is stored.
 -l,--list               The query peptide sequence file is a list of
                         peptide sequences, one sequence per line
                         (default: no).
 -o,--outputFile <arg>   The path to the query result file.
 -Q,--queryFile <arg>    The path to the query peptide sequence file in
                         either FASTA format or a list of peptide
                         sequences, one sequence per line.
 -q,--query <arg>        One peptide sequence or a comma-separated list of
                         peptide sequences.
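
For scripted use, the options above can be assembled programmatically. The following Python sketch is one possible wrapper, not something shipped with the tool: `build_query_command` and `run` are hypothetical names, and only the flags listed in the help text are used.

```python
import subprocess

JAR = "PeptideMatchCMD_1.1.jar"  # jar name as used in this document


def build_query_command(index_dir, output_file, peptides=None,
                        query_file=None, is_list=False, leq_i=False,
                        jar=JAR):
    """Build the argument list for a query run.

    Exactly one of `peptides` (a list of sequences, sent via -q) or
    `query_file` (a path, sent via -Q) must be given.
    """
    if (peptides is None) == (query_file is None):
        raise ValueError("give exactly one of peptides or query_file")
    cmd = ["java", "-jar", jar, "-a", "query", "-i", index_dir]
    if peptides is not None:
        cmd += ["-q", ",".join(peptides)]
    else:
        cmd += ["-Q", query_file]
        if is_list:
            cmd.append("-l")  # file is one peptide per line, not FASTA
    if leq_i:
        cmd.append("-e")  # treat L and I as equivalent
    cmd += ["-o", output_file]
    return cmd


def run(cmd):
    """Run the command; requires Java and the jar to be available."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    print(" ".join(build_query_command("sprot_index", "out.txt",
                                       peptides=["AAFGGSGGR", "GVPDIR"],
                                       leq_i=True)))
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.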

Compile from source

$ unzip PeptideMatchCMD_src_1.1.zip
$ cd PeptideMatchCMD_src_1.1
$ ant
$ java -jar PeptideMatchCMD_1.1.jar -h

Tutorial

Create a Lucene index from a protein sequence database in FASTA format:

$ java -jar PeptideMatchCMD_1.1.jar -a index -d uniprot_sprot.fasta -i sprot_index 
Command line: -a index -d uniprot_sprot.fasta -i sprot_index 
Indexing to directory "sprot_index" ...
Indexing "uniprot_sprot.fasta" ...
Indexing "uniprot_sprot.fasta" finished
Time used: 00 hours, 06 mins, 31.215 seconds

Query a peptide sequence:

$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR -o out.txt 
Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
Quering...

AAFGGSGGR	has 1 match

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.457 seconds

$ cat out.txt 
#Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
##Query	Subject	SubjectLength	MatchStart	MatchEnd
AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524
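
The result file shown above can be read back for downstream processing. The Python sketch below assumes the layout seen in the listing: a `#` line recording the command line, a `##` line naming the columns, and tab-separated data rows. `parse_results` is a hypothetical helper, and the tab delimiter is inferred from the listing rather than from a format specification.

```python
def parse_results(text):
    """Parse Peptide Match result text into a list of row dictionaries."""
    header = None
    rows = []
    for line in text.splitlines():
        if line.startswith("##"):
            # Column names follow the "##" marker.
            header = line[2:].split("\t")
        elif line.startswith("#") or not line.strip():
            # Skip the command-line comment and blank lines.
            continue
        elif header is not None:
            fields = line.split("\t")
            # Rows may carry a trailing tab or omit an empty last
            # column, so pad or trim to the header length.
            fields = (fields + [""] * len(header))[:len(header)]
            row = dict(zip(header, fields))
            for key in ("SubjectLength", "MatchStart", "MatchEnd"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


if __name__ == "__main__":
    with open("out.txt") as handle:
        for row in parse_results(handle.read()):
            print(row["Query"], row["Subject"],
                  row["MatchStart"], row["MatchEnd"])
```

In the example row, `MatchEnd - MatchStart + 1` equals 9, the length of AAFGGSGGR, which suggests the coordinates are 1-based and inclusive.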

Query a list of peptide sequences:

$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
Quering...

AAFGGSGGR	has 1 match
GVPDIR	has 4 matches

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.493 seconds

$ cat out.txt 
#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
##Query	Subject	SubjectLength	MatchStart	MatchEnd
AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140

Query a list of peptide sequences and treat Leucine (L) and Isoleucine (I) as equivalent:

$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
Quering...

AAFGGSGGR	has 1 match
GVPDIR	has 13 matches

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.513 seconds

$ cat out.txt 
#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
##Query	Subject	SubjectLength	MatchStart	MatchEnd	MatchedLEqIPositions
AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
GVPDIR	sp|A0R5Z2|GLFT1_MYCS2	302	182	187	186
GVPDIR	sp|Q7D4V6|GLFT1_MYCTU	304	179	184	183
GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
GVPDIR	sp|A5GDX3|RECF_GEOUR	364	126	131	130
GVPDIR	sp|P96919|EX5A_MYCTU	575	138	143	142
GVPDIR	sp|Q17QV2|MON1A_BOVIN	555	441	446	445
GVPDIR	sp|Q2QZ37|OBGM_ORYSJ	528	500	505	504
GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
GVPDIR	sp|Q9M1G3|LRK16_ARATH	669	595	600	599
GVPDIR	sp|Q5U3H2|SV421_DANRE	808	575	580	579
GVPDIR	sp|A6H5Y3|METH_MOUSE	1253	1147	1152	1151
GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
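
The extra MatchedLEqIPositions column appears to give the subject positions at which L and I were interchanged; for instance, 186 falls on the fifth residue of GVPDIR within the match 182 to 187. The Python sketch below illustrates that interpretation with a naive substring search. It is not the tool's Lucene-based implementation, `find_matches` is a hypothetical helper, and the protein string in the usage example is a made-up toy sequence.

```python
def find_matches(peptide, protein, leq_i=False):
    """Return (start, end, swapped_positions) for each occurrence.

    Coordinates are 1-based and inclusive. With leq_i=True, L and I
    are treated as the same residue, and swapped_positions lists the
    protein positions where the two sequences differ (L versus I).
    """
    def normalize(seq):
        return seq.replace("I", "L") if leq_i else seq

    needle, haystack = normalize(peptide), normalize(protein)
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        swapped = [pos + k + 1 for k in range(len(peptide))
                   if peptide[k] != protein[pos + k]]
        hits.append((pos + 1, pos + len(peptide), swapped))
        pos = haystack.find(needle, pos + 1)  # allow overlapping hits
    return hits


if __name__ == "__main__":
    toy_protein = "MKGVPDLRAA"  # toy sequence, not a real database entry
    print(find_matches("GVPDIR", toy_protein))
    print(find_matches("GVPDIR", toy_protein, leq_i=True))
```

Exact matches come back with an empty position list, which would correspond to the rows above whose last column is blank.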

Query peptides in a FASTA file:

$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -Q query.fasta -e -o out_fasta.txt 
Command line: -a query -i sprot_index -Q query.fasta -e -o out_fasta.txt 
Quering...

example_1	has 1 match
example_2	has 1 match
example_3	has 1 match
example_4	has 1 match
example_5	has 1 match
example_6	has 1 match
example_7	has 1 match
example_8	has 1 match
example_9	has 1 match
example_10	has 1 match

Query is finished.
The result is saved in "out_fasta.txt".
Time used: 00 hours, 00 mins, 00.724 seconds

Query peptides in a list file, one peptide per line:

$ java -jar PeptideMatchCMD_1.1.jar -a query -i sprot_index -Q query.list -l -e -o out_list.txt 
Command line: -a query -i sprot_index -Q query.list -l -e -o out_list.txt 
Quering...

AAFGGSGGR	has 1 match
ELEVQSEDGTFAK	has 1 match
FEDPAEGEDTLVEK	has 1 match
FSDGLITPDFLAK	has 1 match
GAPEFWAAR	has 1 match
GVIEANGGKVEK	has 1 match
HIPVYVSEEMVGHKFGEFSPTR	has 1 match
HNDVNFGTQDHNR	has 1 match
IGFYLTTCPR	has 1 match
ILVGQGNDGVAFVK	has 1 match

Query is finished.
The result is saved in "out_list.txt".
Time used: 00 hours, 00 mins, 00.752 seconds
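
When the tool is driven from a script, the per-peptide summary lines printed to the console can be collected into a dictionary. The Python sketch below assumes the line shape seen in the transcripts above (an identifier, whitespace, then "has N match" or "has N matches"); `match_counts` is a hypothetical helper.

```python
import re

# Identifier, whitespace, "has", a count, then "match" or "matches".
SUMMARY = re.compile(r"^(\S+)\s+has\s+(\d+)\s+match(?:es)?\s*$")


def match_counts(console_text):
    """Map each query identifier to its reported number of matches."""
    counts = {}
    for line in console_text.splitlines():
        found = SUMMARY.match(line)
        if found:
            counts[found.group(1)] = int(found.group(2))
    return counts


if __name__ == "__main__":
    import sys
    # Example: java -jar PeptideMatchCMD_1.1.jar ... | python counts.py
    for query, count in match_counts(sys.stdin.read()).items():
        print(f"{query}\t{count}")
```

Other console lines, such as the echoed command line and the timing line, do not fit the pattern and are ignored.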

From Docker Container

Set up a local working directory to hold the input and output files. It will be mounted into the Docker container.

$ mkdir /your/localworkdir/

$ cd /your/localworkdir/

$ ls 
uniprot_sprot.fasta query.list query.fasta

Create a Lucene index from a protein sequence database in FASTA format:

$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
	-a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f
Unable to find image 'chenc/peptidematch:latest' locally
latest: Pulling from chenc/peptidematch
7448db3b31eb: Pull complete 
c36604fa7939: Pull complete 
29e8ef0e3340: Pull complete 
a0c934d2565d: Pull complete 
a360a17c9cab: Pull complete 
cfcc996af805: Pull complete 
2cf014724202: Pull complete 
4bc402a00dfe: Pull complete 
1da5b1324a69: Pull complete 
Digest: sha256:923a488fad501b35de6629309a02f6aa786d42edb7aa0666691aa861bbfd831f
Status: Downloaded newer image for chenc/peptidematch:latest
Command line options: -a index -d /workdir/uniprot_sprot.fasta -i /workdir/uniprot_sprot_index -f 
Indexing to directory "/workdir/uniprot_sprot_index" ...
Indexing "/workdir/uniprot_sprot.fasta" ...
Indexing "/workdir/uniprot_sprot.fasta" finished
Time used: 00 hours, 03 mins, 31.116 seconds
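
Note that every path passed to the tool in the Docker form refers to the mount point inside the container (/workdir), not to the host directory. The Python sketch below makes that mapping explicit when building the command. `docker_command` and `in_container` are hypothetical helpers; the image name and mount point are taken from the command above.

```python
from pathlib import PurePosixPath

IMAGE = "chenc/peptidematch"          # image name used in this document
MOUNT = PurePosixPath("/workdir")     # mount point inside the container


def in_container(name):
    """Translate a file name in the host work directory to its container path."""
    return str(MOUNT / name)


def docker_command(host_dir, tool_args):
    """Build a `docker run` argument list that mounts host_dir at MOUNT."""
    return ["docker", "run", "-v", f"{host_dir}:{MOUNT}", IMAGE, *tool_args]


if __name__ == "__main__":
    args = ["-a", "index",
            "-d", in_container("uniprot_sprot.fasta"),
            "-i", in_container("uniprot_sprot_index"),
            "-f"]
    print(" ".join(docker_command("/your/localworkdir/", args)))
```

Output files written under /workdir inside the container then appear in the host work directory.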

Query a peptide sequence:

$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
	-a query -q NEKKQQMGKEYREKIEAEL -i /workdir/uniprot_sprot_index -o /workdir/single_query_out.txt
Command line options: -a query -q NEKKQQMGKEYREKIEAEL -i /workdir/uniprot_sprot_index -o /workdir/single_query_out.txt 
Quering...

NEKKQQMGKEYREKIEAEL	has 6 matches

Query is finished.
The result is saved in "/workdir/single_query_out.txt".
Time used: 00 hours, 00 mins, 00.935 seconds

Query a list of peptide sequences:

$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
	-a query -q NEKKQQMGKEYREKIEAEL,EAFEISKKE -i /workdir/uniprot_sprot_index -o /workdir/multi_query_out.txt
Command line options: -a query -q NEKKQQMGKEYREKIEAEL,EAFEISKKE -i /workdir/uniprot_sprot_index -o /workdir/multi_query_out.txt 
Quering...

NEKKQQMGKEYREKIEAEL	has 6 matches
EAFEISKKE	has 15 matches

Query is finished.
The result is saved in "/workdir/multi_query_out.txt".
Time used: 00 hours, 00 mins, 00.685 seconds

Query peptides in a FASTA file:

$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
	-a query -Q /workdir/query.fasta -i /workdir/uniprot_sprot_index -o /workdir/fasta_query_out.txt
Command line options: -a query -Q /workdir/query.fasta -i /workdir/uniprot_sprot_index -o /workdir/fasta_query_out.txt 
Quering...

example_1	has 1 match
example_2	has 1 match
example_3	has 1 match
example_4	has 1 match
example_5	has 1 match
example_6	has 1 match
example_7	has 1 match
example_8	has 1 match
example_9	has 1 match
example_10	has 1 match

Query is finished.
The result is saved in "/workdir/fasta_query_out.txt".
Time used: 00 hours, 00 mins, 01.733 seconds

Query peptides in a list file, one peptide per line:

$ docker run -v /your/localworkdir/:/workdir chenc/peptidematch \
	-a query -Q /workdir/query.list -l -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt
Command line options: -a query -Q /workdir/query.list -l -i /workdir/uniprot_sprot_index -o /workdir/list_query_out.txt 
Quering...

AAFGGSGGR	has 1 match
ELEVQSEDGTFAK	has 1 match
FEDPAEGEDTLVEK	has 1 match
FSDGLITPDFLAK	has 1 match
GAPEFWAAR	has 1 match
GVIEANGGKVEK	has 1 match
HIPVYVSEEMVGHKFGEFSPTR	has 1 match
HNDVNFGTQDHNR	has 1 match
IGFYLTTCPR	has 1 match
ILVGQGNDGVAFVK	has 1 match

Query is finished.
The result is saved in "/workdir/list_query_out.txt".
Time used: 00 hours, 00 mins, 01.432 seconds
