TI - RESULTS . AB - An alignment of the T.thermophila and T.pyriformis mitochondrial genomes indicates that these genomes are virtually identical with respect to gene content and order ( Fig 1 ) , except for the tandem duplication of nad9 in T.thermophila . Thus , we have labeled the T.thermophila ORFs with the same registered ymf designations used for the T.pyriformis genes (7) . The z scores calculated for each of the ORFs in T.thermophila and T.pyriformis ( Table 1 ) all substantially exceed our criterion ( z > 6 ) for homology between the sequences . The RNA coding genes also show a high degree of sequence similarity . Strong selective pressure on an ORF will favor silent substitutions ( Ks ) over nucleotide substitutions that lead to a change in an amino acid (Ka) . The values for Ks ( Table 1 ) indicate that , while there has been substantial nucleotide substitution in the T.thermophila and T.pyriformis mitochondrial genomes , the silent substitutions have not saturated the potential sites . The Ks/Ka ratio varies dramatically from gene to gene , providing an indication of the selective pressure on each protein . As noted for the T.pyriformis mitochondrial genome , the T.thermophila mitochondrial genome appears to have a limited number of transcriptional units (7) . To a first approximation there appear to be two transcriptional units , transcribed from a nearly central bi-directional promoter set ( Fig 1 ) . The largest intergenic gap in the genome ( 424 bp ) is at the position of the central set of bi-directional promoters . A short transcriptional unit , including the three genes transcribed in the 5'-3' direction , interrupts the large 3'-5' transcriptional unit . The most apparent difference between the two genomes is the tandem duplication of the nad9 gene in the T.thermophila genome . Tetrahymena malaccensis , the closest known relative of T.thermophila , also has tandemly repeated nad9 genes . 
The tandemly repeated genes are highly similar within each species , but diverge considerably between the two species ( Table 3a ) . Similarly , the long inverted repeats in the terminal regions are virtually identical within a species , but diverge significantly between species ( Table 3b ) . The mitochondrial genes in T.pyriformis apparently use four initiation codons ( ATA , ATT , TTG and GTG ) in addition to the standard ATG ( 16 ) . The vast majority of the T.thermophila mitochondrial putative proteins appear to start at the same relative position as their T.pyriformis homolog and use the same initiation codon . Seven putative proteins appear to initiate at somewhat different positions in T.thermophila and T.pyriformis . The most plausible initiation codons for T.thermophila were assigned based on three criteria : ( i ) their proximity to the 5' end of the ORF ; ( ii ) their juxtaposition to the assigned T.pyriformis initiation codons ; and ( iii ) the utilization of only initiation codons assigned in T.pyriformis . In five cases , proteins that initiate with unconventional codons in one Tetrahymena species are initiated in the other species at an identical nucleotide position with ATG ( Table 1 ) . This strongly supports the correct identification of these unconventional initiation codons in these sequences . By far the largest gene in T.thermophila or T.pyriformis is ymf77 . It is highly improbable that a region of >4000 nt would be devoid of TAA stop codons or TAG unused codons in both species by chance . A comparison of the nucleotide substitutions between T.thermophila and T.pyriformis for ymf77 indicates that the selective pressure on this protein is probably not large ( Ks/Ka = 161 , Table 1 ) . Although most of the intergenic regions in T.thermophila and T.pyriformis are of similar length , the intergenic region following the T.pyriformis atp9 gene is exceptionally large ( 95 bp ) compared to the corresponding intergenic region in T.thermophila ( 14 bp ) . 
The atp9 gene and its flanking regions from T.malaccensis were sequenced and compared with T.thermophila and T.pyriformis . This comparison indicates that the T.malaccensis and T.thermophila atp9 genes and intergenic regions are virtually identical . Our objective is to identify the probable function of each of the Tetrahymena genes based on their sequence similarity to proteins of known function . BLAST analysis provided initial identification of the Tetrahymena genes . Thirteen of the T.thermophila mitochondrial ORFs can be immediately assigned function by their similarity to standard proteins of B.taurus ( Atp9 , Cob , Cox1 , Cox2 , Nad1 , Nad2 , Nad3 , Nad4 , Nad5 , Nad7 and Nad10 ) or S.cerevisiae ( Rpl2 and Rps12 ) ( Table 2 ) . Two additional T.thermophila mitochondrial ORFs ( Nad9 1 and Nad9 2 ) are linked via a chain of homology to B.taurus proteins and five further ORFs ( Rpl14 , Rpl16 , Rps13 , Rps14 and Rps19 ) are linked to S.cerevisiae proteins ( Table 2 ) . In this manner , the functions for 21 of the 45 T.thermophila ORFs can be reliably assigned by chains of sequence similarity comparison . The remainder of the T.thermophila ORFs have BLAST hit lists that include a wide variety of proteins having different functions , with no clear consensus protein indicated . At this point the physico-chemical parameters for these protein sequences were examined for clues to their identity . The theoretical pI , ratio of negative to positive residues , aliphatic index and the GRAVY and TM were calculated for the ORFs with assigned function ( Table 2 ) . TM are not present in the ribosomal proteins of Tetrahymena or in virtually any ribosomal proteins found in GenBank . Thus , the presence of TM strongly indicates that the ORF is not a ribosomal protein . In contrast , most of the proteins involved in oxidative phosphorylation and electron transport have TM and many have multiple TM . 
However , some proteins of the oxidative phosphorylation NADH Fo complex ( Nad5 , Nad9 and Nad10 ) do not have TM . Generally , the theoretical pI for ribosomal proteins is high and rarely below 10 , while the theoretical pI for proteins involved in oxidative phosphorylation and electron transport is low , usually below 8 and never above 10 . The ratio of negative to positive residues for ribosomal proteins is low , rarely above 0.5 , while this ratio is usually above 1.0 for proteins involved in oxidative phosphorylation and electron transport . Among the ribosomal proteins the aliphatic index is never above 100 , while the aliphatic index for proteins involved in oxidative phosphorylation and electron transport is seldom below 100 . The GRAVY for ribosomal proteins is always negative , averaging -0.51 , while the GRAVY for proteins involved in oxidative phosphorylation and electron transport is seldom negative , averaging 0.49 . Although these parameters are not totally independent of one another , taken together they present a valuable signature for ribosomal proteins and proteins involved in oxidative phosphorylation and electron transport . The physico-chemical parameters for Ymf60 indicate that it is potentially a ribosomal protein ( Table 2 ) . It has no TM and all of the other parameters match those of ribosomal proteins . Among the weak BLAST hits for Ymf60 were two hits for the large subunit ribosomal protein 6 . We compared the sequence of Ymf60 to virtually all of the Rpl6 sequences in GenBank . The Rpl6 sequence from Thermotoga maritima was found to have sufficient similarity to Ymf60 to indicate homology ( Fig 2a ) , and a chain of homology linked Ymf60 to Rpl6 of S.cerevisiae ( Table 2 ) . Thus , Rpl6 function can be assigned to Ymf60 with confidence . On the basis of the additional physico-chemical parameters , Ymf62 and Ymf58 appear to be proteins involved in oxidative phosphorylation and electron transport . 
They have five and three TM respectively and they each have low pIs , ratios of negative to positive residues above 1.0 , aliphatic indices well above 100 and positive GRAVYs . The TM profiles of Ymf62 and Ymf58 were compared with the TM profiles of representatives of all of the proteins involved in oxidative phosphorylation and electron transport whose function had not already been identified within the T.thermophila mitochondrial genome . ORF Ymf62 has a TM profile very similar to that of Nad6 from B.taurus (Fig 3A) . Each sequence has four TM helices followed by a non-TM region of ~40 amino acids followed by a fifth TM region . A survey of virtually all of the Nad6 proteins in GenBank indicates that the Nad6 protein of Porphyra purpurea has the greatest sequence similarity to the T.thermophila Ymf62 protein ( Fig 2b ) . A chain of homology was established linking Ymf62 to B.taurus Nad6 ( Table 2 ) . The combination of TM profile and sequence similarity allows us to assign Nad6 function to the T.thermophila Ymf62 protein . The TM profile of Ymf58 is very similar to the TM profile of B.taurus Nad4L , both for the TM regions and the inter-TM spacings (Fig 3B) . A survey of virtually all of the Nad4L proteins in GenBank indicates that Nad4L of Nephroselmis olivacea has the greatest sequence similarity to the T.thermophila Ymf58 protein ( Fig 2c ) . Although the sequence similarity is only moderate ( 547 ) , coupled with the similarity of TM profiles , homology is highly probable . A chain of homology extending from T.thermophila to the B.taurus Nad4L sequence via N.olivacea and Penaeus monodon can be readily established ( Table 2 ) . The function of Nad4L can be assigned to the T.thermophila Ymf58 . The ORF identified as yejR by Burger et al . (7) does not have an apparent homolog in either S.cerevisiae or B.taurus . 
However , this ORF is clearly homologous to a Triticum aestivum ( wheat ) gene involved in the biogenesis of c-type cytochromes via a homology chain through Pseudomonas putida ( Table 2 ) . The T.aestivum protein is a member of a family that consists of various proteins involved in cytochrome c assembly from mitochondria and bacteria ( 41 ) . Thus , the T.thermophila YejR protein is assigned a similar function . This protein family differs from the heme lyase found in yeast ( 42 ) . Virtually all of the identified ORFs have reasonably strong hits with the appropriate domains in the ProDom and Pfam databases . This strengthens our confidence that the appropriate function has been assigned to these ORFs . Most of the ORFs without assigned functions do not have any hits with domains in the ProDom and Pfam databases . The few hits that do occur for these ORFs have not led to the identification of proteins with reasonably high sequence similarity . One of the T.pyriformis ORFs , which was identified as Rps3 by Burger et al . (7) , does not have significant sequence similarity with any of the Rps3 genes found in GenBank . No protein has been found that has significant sequence similarity to this ORF . Most of the physico-chemical parameters suggest that this protein is potentially a ribosomal protein . However , it has a predicted TM , which is not characteristic of ribosomal proteins ( Table 4 ) . The T.thermophila mitochondrial genome has 19 ORFs , in addition to Rps3 , that do not have sufficient sequence similarity to proteins of known function to permit the confident assignment of function . On the basis of their physico-chemical parameters these ORFs can be grouped into two broad classes : putative ribosomal proteins and putative non-ribosomal proteins . Table 4 lists the parameters of these ORFs along with the mean values of these parameters from the ORFs for which function has been assigned .
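
The two-class grouping described above can be illustrated with a minimal sketch . It is not the procedure used to produce Table 4 ; it only shows how three of the parameters could be computed and combined . The assumptions are : GRAVY is the mean Kyte-Doolittle hydropathy ; the aliphatic index follows the usual Ikai formula ; negative residues are Asp + Glu and positive residues are Arg + Lys ; the number of TM is supplied by a separate predictor ; the theoretical pI is omitted for brevity . The function names , the 0.75 ratio cutoff ( a midpoint between the 0.5 and 1.0 values quoted above ) and the three-of-four voting rule are hypothetical choices made for illustration only .

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def pct(aa: str) -> float:
        return 100.0 * counts[aa] / len(seq)

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def neg_pos_ratio(seq: str) -> float:
    """Ratio of negative (Asp + Glu) to positive (Arg + Lys) residues.
    Assumption: this residue definition matches the one used in the text."""
    counts = Counter(seq)
    negative = counts["D"] + counts["E"]
    positive = counts["R"] + counts["K"]
    return negative / positive if positive else float("inf")


def classify(seq: str, n_tm: int = 0) -> str:
    """Hypothetical voting rule: count how many ribosomal-like criteria hold.
    Cutoffs are illustrative readings of the trends described in the text."""
    votes = 0
    votes += gravy(seq) < 0               # ribosomal GRAVY is negative
    votes += aliphatic_index(seq) <= 100  # ribosomal index not above 100
    votes += neg_pos_ratio(seq) < 0.75    # assumed midpoint of 0.5 and 1.0
    votes += n_tm == 0                    # ribosomal proteins lack TM
    return "putative ribosomal" if votes >= 3 else "putative non-ribosomal"


if __name__ == "__main__":
    # Toy sequences, invented for illustration; not Tetrahymena ORFs.
    print(classify("MKRKSKRTGKAKRRSGE", n_tm=0))
    print(classify("MLLIVAFLLAVIGLLDE", n_tm=3))
```

The sketch assumes sequences contain only the 20 standard one-letter residue codes ; ambiguous codes such as X would need to be filtered first . Because the parameters are not independent of one another , a simple vote like this should be read as a rough screen rather than as evidence of function .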