Command Line Tool

The command line tool lets users query peptide sequences against their own custom protein sequence database.

The tool provides two main functions:

  1. Given a protein sequence database in FASTA format, create a Lucene index for it.
  2. Query peptide sequences against that index. The query can be:
    • A single peptide sequence or a comma-separated list of peptide sequences, or
    • A file, either in FASTA format or as a list of peptide sequences, one sequence per line.

The runnable jar can be downloaded here. The source code is also available here.

Run from executable jar

$ java -jar PeptideMatchCMD_1.0.jar -h
Command line options: -h 
usage: java -jar PeptideMatchCMD_1.0.jar [options]
            Available options:
            ------------------
 -a,--action        The action to perform ("index" or "query").
 -d,--dataFile      The path to a FASTA file to be indexed.
 -e,--LeqI               Treat Leucine (L) and Isoleucine (I) as
                         equivalent (default: no).
 -f,--force              Overwrite the indexDir (default: no).
 -h,--help               Print this message.
 -i,--indexDir      The directory where the index is stored.
 -l,--list               The query peptide sequence file is a list of
                         peptide sequences, one sequence per line
                         (default: no).
 -o,--outputFile    The path to the query result file.
 -Q,--queryFile     The path to the query peptide sequence file in
                         either FASTA format or a list of peptide
                         sequences, one sequence per line.
 -q,--query         One peptide sequence or a comma-separated list of
                         peptide sequences.
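
The options above can also be assembled programmatically when the tool is called from a script. The following is a minimal Python sketch, not part of the tool itself: the helper names index_command and query_command are hypothetical, and the rule that exactly one of -q or -Q is supplied is an assumption of this sketch rather than documented behavior. Only flags listed in the help text are used.

```python
from typing import List, Optional, Sequence

JAR = "PeptideMatchCMD_1.0.jar"  # jar name as used throughout this page


def index_command(fasta_path: str, index_dir: str, force: bool = False) -> List[str]:
    """Build the argument list for the "index" action (hypothetical helper)."""
    cmd = ["java", "-jar", JAR, "-a", "index", "-d", fasta_path, "-i", index_dir]
    if force:
        cmd.append("-f")  # overwrite an existing index directory
    return cmd


def query_command(
    index_dir: str,
    output_path: str,
    peptides: Optional[Sequence[str]] = None,
    query_file: Optional[str] = None,
    is_list: bool = False,
    leq_i: bool = False,
) -> List[str]:
    """Build the argument list for the "query" action (hypothetical helper)."""
    # Assumption of this sketch: pass either inline peptides or a file, not both.
    if (peptides is None) == (query_file is None):
        raise ValueError("give exactly one of peptides or query_file")
    cmd = ["java", "-jar", JAR, "-a", "query", "-i", index_dir]
    if peptides is not None:
        cmd += ["-q", ",".join(peptides)]  # comma-separated list for -q
    else:
        cmd += ["-Q", query_file]
        if is_list:
            cmd.append("-l")  # file holds one peptide per line
    if leq_i:
        cmd.append("-e")  # treat L and I as equivalent
    cmd += ["-o", output_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(index_command("uniprot_sprot.fasta", "sprot_index")))
    print(" ".join(query_command("sprot_index", "out.txt", peptides=["AAFGGSGGR", "GVPDIR"])))
```

The returned lists can be passed to subprocess.run(cmd, check=True) once Java and the jar are available in the working directory.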

Compile from source

$ unzip PeptideMatchCMD_src_1.0.zip
$ cd PeptideMatchCMD_src_1.0
$ ant
$ java -jar PeptideMatchCMD_1.0.jar -h

Tutorial

  • Create a Lucene index from a protein sequence database in FASTA format:
    $ java -jar PeptideMatchCMD_1.0.jar -a index -d uniprot_sprot.fasta -i sprot_index 
    Command line: -a index -d uniprot_sprot.fasta -i sprot_index 
    Indexing to directory "sprot_index" ...
    Indexing "uniprot_sprot.fasta" ...
    Indexing "uniprot_sprot.fasta" finished
    Time used: 00 hours, 06 mins, 31.215 seconds
    
  • Query a peptide sequence:
    $ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -q AAFGGSGGR -o out.txt 
    Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.457 seconds
    
    $ cat out.txt 
    #Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
    ##Query	Subject	SubjectLength	MatchStart	MatchEnd
    AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524
    
  • Query a list of peptide sequences:
    $ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
    Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    GVPDIR	has 4 matches
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.493 seconds
    
    $ cat out.txt 
    #Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
    ##Query	Subject	SubjectLength	MatchStart	MatchEnd
    AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
    GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
    GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
    GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
    GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
    
  • Query a list of peptide sequences and treat Leucine (L) and Isoleucine (I) as equivalent:
    $ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
    Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    GVPDIR	has 13 matches
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.513 seconds
    
    $ cat out.txt 
    #Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
    ##Query	Subject	SubjectLength	MatchStart	MatchEnd	MatchedLEqIPositions
    AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
    GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
    GVPDIR	sp|A0R5Z2|GLFT1_MYCS2	302	182	187	186
    GVPDIR	sp|Q7D4V6|GLFT1_MYCTU	304	179	184	183
    GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
    GVPDIR	sp|A5GDX3|RECF_GEOUR	364	126	131	130
    GVPDIR	sp|P96919|EX5A_MYCTU	575	138	143	142
    GVPDIR	sp|Q17QV2|MON1A_BOVIN	555	441	446	445
    GVPDIR	sp|Q2QZ37|OBGM_ORYSJ	528	500	505	504
    GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
    GVPDIR	sp|Q9M1G3|LRK16_ARATH	669	595	600	599
    GVPDIR	sp|Q5U3H2|SV421_DANRE	808	575	580	579
    GVPDIR	sp|A6H5Y3|METH_MOUSE	1253	1147	1152	1151
    GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
    
  • Query peptides in a FASTA file:
    $ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -Q query.fasta -e -o out.txt 
    Command line: -a query -i sprot_index -Q query.fasta -e -o out.txt 
    Quering...
    
    example_1	has 1 match
    example_2	has 1 match
    example_3	has 1 match
    example_4	has 1 match
    example_5	has 1 match
    example_6	has 1 match
    example_7	has 1 match
    example_8	has 1 match
    example_9	has 1 match
    example_10	has 1 match
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.724 seconds
    
  • Query peptides in a list file, one peptide per line:
    $ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -Q query.list -l -e -o out.txt 
    Command line: -a query -i sprot_index -Q query.list -l -e -o out.txt 
    Quering...
    
    AAFGGSGGR	has 1 match
    ELEVQSEDGTFAK	has 1 match
    FEDPAEGEDTLVEK	has 1 match
    FSDGLITPDFLAK	has 1 match
    GAPEFWAAR	has 1 match
    GVIEANGGKVEK	has 1 match
    HIPVYVSEEMVGHKFGEFSPTR	has 1 match
    HNDVNFGTQDHNR	has 1 match
    IGFYLTTCPR	has 1 match
    ILVGQGNDGVAFVK	has 1 match
    
    Query is finished.
    The result is saved in "out.txt".
    Time used: 00 hours, 00 mins, 00.752 seconds
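
The result files shown in the tutorial above follow a simple layout: a line starting with "#" records the command line, a line starting with "##" names the columns, and each remaining line is one match. The Python sketch below reads that layout and counts matches per query. It assumes the columns are tab-separated, as they appear in the examples, and the function names parse_results and count_matches are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def parse_results(text: str) -> List[Dict[str, str]]:
    """Parse the text of a result file into one dict per match line."""
    columns: List[str] = []
    records: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()  # also drops the trailing tab seen on some lines
        if not line:
            continue
        if line.startswith("##"):
            columns = line[2:].split("\t")  # column header line
            continue
        if line.startswith("#"):
            continue  # command line comment
        fields = line.split("\t")
        # Pad short rows, e.g. an empty MatchedLEqIPositions column.
        fields += [""] * (len(columns) - len(fields))
        records.append(dict(zip(columns, fields)))
    return records


def count_matches(records: List[Dict[str, str]]) -> Counter:
    """Count how many match lines each query peptide has."""
    return Counter(record["Query"] for record in records)


if __name__ == "__main__":
    with open("out.txt", encoding="utf-8") as handle:
        for query, n in count_matches(parse_results(handle.read())).items():
            print(query, n)
```

All values are kept as strings; convert SubjectLength, MatchStart and MatchEnd with int() where numeric comparisons are needed.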
    
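
To make the effect of the -e option concrete, the following Python sketch shows what treating Leucine (L) and Isoleucine (I) as equivalent means for a single subject sequence. It is a naive substring scan written for illustration only; the tool itself searches a Lucene index, and its internal algorithm is not described on this page. The function name find_matches and the short test sequence are made up. Positions are 1-based and inclusive, mirroring the MatchStart, MatchEnd and MatchedLEqIPositions columns in the tutorial output.

```python
from typing import List, Tuple


def find_matches(query: str, subject: str, leq_i: bool = False) -> List[Tuple[int, int, List[int]]]:
    """Return (start, end, swapped_positions) for each occurrence of query.

    swapped_positions lists the subject positions where the match relies
    on L and I being treated as the same residue.
    """
    q, s = query.upper(), subject.upper()
    if not q:
        return []
    if leq_i:
        # Collapse L onto I in both sequences before comparing.
        qn, sn = q.replace("L", "I"), s.replace("L", "I")
    else:
        qn, sn = q, s
    hits = []
    start = sn.find(qn)
    while start != -1:
        # Residues that differ in the original sequences can only be L/I swaps.
        swapped = [start + k + 1 for k in range(len(q)) if q[k] != s[start + k]]
        hits.append((start + 1, start + len(q), swapped))
        start = sn.find(qn, start + 1)  # allow overlapping occurrences
    return hits


if __name__ == "__main__":
    subject = "AAGVPDLRAA"  # made-up sequence with L where the query has I
    print(find_matches("GVPDIR", subject))              # exact matching only
    print(find_matches("GVPDIR", subject, leq_i=True))  # L and I equivalent
```

This is consistent with the tutorial row for sp|A0R5Z2|GLFT1_MYCS2: the match spans 182 to 187 and the fifth query residue (I) falls on subject position 186, the value reported in MatchedLEqIPositions.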

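The two query file formats accepted by -Q can be produced with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, the identifier my_peptide_1 is a placeholder, and the contents of the query.fasta and query.list files used in the tutorial are not shown on this page. The FASTA writer uses the standard layout of a ">" header line followed by the sequence.

```python
from typing import Dict, Iterable


def to_list_format(peptides: Iterable[str]) -> str:
    """One peptide sequence per line, for use with -Q together with -l."""
    return "".join(peptide.strip() + "\n" for peptide in peptides)


def to_fasta_format(named_peptides: Dict[str, str]) -> str:
    """A '>' header line per peptide followed by its sequence, for use with -Q."""
    return "".join(
        ">" + name + "\n" + sequence.strip() + "\n"
        for name, sequence in named_peptides.items()
    )


if __name__ == "__main__":
    print(to_list_format(["AAFGGSGGR", "GVPDIR"]), end="")
    print(to_fasta_format({"my_peptide_1": "AAFGGSGGR"}), end="")
```

Writing the returned strings to disk, for example with pathlib.Path("query.list").write_text(...), yields files that can be passed to the query commands shown in the tutorial.
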
© 2016 Protein Information Resource