Annotated Borrelia burgdorferi B31 Plasmid Nucleotide Sequences

Fraser et al. (1997) and Casjens et al. (2000)

Compiled by Sherwood Casjens, Daniel Haft, Jeremy Peterson, Brian Stevenson and Claire Fraser
Last modified on Feb. 3, 2000.

This document is also available as a Macintosh Microsoft WORD 5.1 or OFFICE ’98 (WORD ’98) document from Sherwood Casjens.

Please send corrections, additions, comments, etc. to sherwood.casjens@hci.utah.edu.

Table of Contents

Introduction and Summary

Part I: Annotated Complete B31 Plasmid Gene/Pseudogene List

Part II: B31 Paralogous Gene Families

Part III: Putative B31 Plasmid Lipoprotein Genes

Part IV: The Pseudo-, Questionable, and Short Genes on the B31 Plasmids

Part V: Short Sequence Repeats in the B31 Plasmids

Part VI: Ambiguous Nucleotides in the B. burgdorferi B31 Genome Sequence

Part VII: Genome Sequence Assembly Methods

Part VIII: References

ORGANIZATION OF THIS DOCUMENT (read me first)

PURPOSE

This document contains a number of tables which cross-annotate the current knowledge of the B. burgdorferi B31 genome in various ways. We hope that this cross-referencing will allow readers to browse through the information profitably, and that it will allow them to become familiar with what is not known as well as what is known about this genome. Major conclusions from this analysis are published in Fraser et al. (1997) and Casjens et al. (2000).

ORGANIZATION

In each section of this document plasmids are listed with circular plasmids ascending in number (approximate size) followed by linear plasmids ascending in number as follows:

cp9, cp26, cp32-1, cp32-3, cp32-4, cp32-6, cp32-7, cp32-8, cp32-9,

lp5, lp17, lp21, lp25, lp28-1, lp28-2, lp28-3, lp28-4, lp36, lp38, lp54, lp56
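The ordering rule above can be sketched as a sort key. This is only an illustration: the function name plasmid_sort_key and the parsing pattern are our own assumptions and are not part of the original annotation pipeline.

```python
import re

def plasmid_sort_key(name):
    """Sort key: circular (cp) before linear (lp), then by size number, then by suffix."""
    match = re.fullmatch(r"(cp|lp)(\d+)(?:-(\d+))?", name)
    if match is None:
        raise ValueError(f"unrecognized plasmid name: {name!r}")
    topology, size, suffix = match.groups()
    # Plasmids without a "-n" suffix (e.g. cp26) get suffix 0.
    return (0 if topology == "cp" else 1, int(size), int(suffix or 0))

shuffled = ["lp54", "cp32-9", "lp5", "cp9", "lp28-2", "cp26", "cp32-1", "lp28-1"]
ordered = sorted(shuffled, key=plasmid_sort_key)
print(ordered)
# ['cp9', 'cp26', 'cp32-1', 'cp32-9', 'lp5', 'lp28-1', 'lp28-2', 'lp54']
```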

OPEN READING FRAMES and PREDICTED GENES

Throughout this document we use the words "gene" and "protein" advisedly to mean putative gene and putative protein that has been predicted from the nucleotide sequence. Since little molecular biology has been done with these organisms, nearly all of the "genes" in this document are currently only identified as open reading frames.

GENBANK ACCESSION NUMBERS and GENE NAME PREFIXES

The B. burgdorferi B31 chromosome and plasmid sequences are available at the TIGR Borrelia web site or from GENBANK. The accession numbers from GENBANK and gene name prefixes are as follows (as reported in Fraser et al. (1997) and Casjens et al. (2000)):

Replicon      Accession #   Gene name prefix

Chromosome    AE000783      BB0 (BBzero)
cp9           AE000791      BBC
cp26          AE000792      BBB
cp32-1        AE001575      BBP
cp32-3        AE001576      BBS
cp32-4        AE001577      BBR
cp32-6        AE001578      BBM
cp32-7        AE001579      BBO (BB"oh")
cp32-8        AE001580      BBL
cp32-9        AE001581      BBN
lp5           AE001583      BBT
lp17          AE000793      BBD
lp21          AE001582      BBU
lp25          AE000785      BBE
lp28-1        AE000794      BBF
lp28-2        AE000786      BBG
lp28-3        AE000784      BBH
lp28-4        AE000789      BBI
lp36          AE000788      BBK
lp38          AE000787      BBJ
lp54          AE000790      BBA
lp56          AE001584      BBQ
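Because each replicon has its own gene name prefix, a gene name alone identifies the replicon it lies on. The following is a minimal sketch of that lookup; the function name replicon_of and the parsing pattern are illustrative assumptions, and the chromosomal name "BB0147" is used only as an example of the BB-zero prefix followed by digits.

```python
import re

# Gene name prefix -> replicon, transcribed from the table above.
PREFIX_TO_REPLICON = {
    "BB0": "chromosome",
    "BBC": "cp9", "BBB": "cp26",
    "BBP": "cp32-1", "BBS": "cp32-3", "BBR": "cp32-4", "BBM": "cp32-6",
    "BBO": "cp32-7", "BBL": "cp32-8", "BBN": "cp32-9",
    "BBT": "lp5", "BBD": "lp17", "BBU": "lp21", "BBE": "lp25",
    "BBF": "lp28-1", "BBG": "lp28-2", "BBH": "lp28-3", "BBI": "lp28-4",
    "BBK": "lp36", "BBJ": "lp38", "BBA": "lp54", "BBQ": "lp56",
}

def replicon_of(gene_name):
    """Return the replicon for a gene name such as 'BBB19' (assumed form: prefix + digits)."""
    match = re.fullmatch(r"(BB[A-Z0])(\d+)", gene_name)
    if match is None or match.group(1) not in PREFIX_TO_REPLICON:
        raise ValueError(f"unrecognized gene name: {gene_name!r}")
    return PREFIX_TO_REPLICON[match.group(1)]

print(replicon_of("BBB19"))   # cp26
print(replicon_of("BBO29"))   # cp32-7 (letter O)
print(replicon_of("BB0147"))  # chromosome (digit zero)
```

Note that the chromosome prefix (BB followed by zero) and the cp32-7 prefix (BB followed by the letter O) differ by a single easily confused character, so names should be copied rather than retyped.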

 

 

 

 

 

 

 

 

B31 Plasmid Open Reading Frame Summary

Sherwood Casjens - 1999

ALL B31 PLASMIDS

898 total gene-like entities. Among these gene-like entities are the following:

836 genes (which are not "questionable") + pseudogenes

167 pseudogenes (+ about 10 others that have marginal similarity to "intact" genes)

62 "questionable" genes (29 in-frame fragments of larger pseudogenes; 33 ≤300 bp genes inside a larger pseudogene in another frame and short genes that were not called in paralogous sequence elsewhere on the plasmids).

669 "intact" genes (which are not "questionable")

39 convincing similarity hits to genes of known function outside of Borrelia among plasmid genes

16 convincing similarity hits to genes of unknown function outside of Borrelia among plasmid genes

535 "intact" genes >300 bp (which are not "questionable")

134 "intact" genes ≤300 bp (which are not "questionable")

472 genes (which are not "questionable") have a paralog (it may not be intact)

197 genes (which are not "questionable") have no paralog (63 of these are >300 bp and 134 are ≤300 bp)

98 plasmid gene-like entities that encode potential lipoproteins

91 intact plasmid genes that encode potential lipoproteins

7 gene-like entities that we defined as pseudogenes have translation start codons that could possibly lead to expression of lipoproteins that are truncated relative to their paralogs

32 intact plasmid genes that are below but close to our lipidation cutoff

162 paralogous gene families, 107 of which have plasmid-borne members

9 paralogous gene families encode only predicted lipoproteins

17 paralogous gene families are heterogeneous in that at least 1 potential LP and at least one non-LP is found in the family

 

 

THE "LOW PSEUDOGENE" or "WELL BEHAVED" B31 PLASMIDS

These plasmids are: cp9, cp26, all seven of the cp32s, lp28-2, lp54 and the cp32-like portion of lp56

498 gene-like entities on the "well behaved" plasmids on which apparent protein-encoding genes occupy >70% of the DNA.

9 "questionable" genes (all are ≤300 bp genes inside a larger pseudogene in another frame or short genes that were not called in paralogous sequence elsewhere on the plasmids).

489 genes (which are not "questionable") + pseudogenes

22 pseudogenes

467 genes (which are not "questionable")

420 genes >300 bp (which are not "questionable")

47 genes ≤300 bp (which are not "questionable")

54 genes that encode potential lipoproteins

12 genes that are below but close to our lipidation cutoff

23 convincing matches to genes of known function outside of Borrelia among plasmid genes (which are not "questionable")

13 convincing matches to genes of unknown function outside of Borrelia among plasmid genes (which are not "questionable")

 

THE "HIGH PSEUDOGENE" or "NOT YET AMELIORATED" B31 PLASMIDS

These plasmids are: lp5, lp17, lp21, lp25, lp28-1, lp28-3, lp28-4, lp36, lp38 and the non-cp32-like portion of lp56

400 gene-like entities on the "bad" plasmids on which apparent protein-encoding genes occupy <75% of the DNA.

53 "questionable" genes (29 in-frame fragments of larger pseudogenes; 24 ≤300 bp genes inside a larger pseudogene in another frame and short genes that were not called in paralogous sequence elsewhere on the plasmids).

347 genes (which are not "questionable") + pseudogenes

145 pseudogenes

202 genes (which are not "questionable")

115 genes >300 bp (which are not "questionable")

87 genes ≤300 bp (which are not "questionable")

37 genes that encode potential lipoproteins

5 genes that are below but close to our lipidation cutoff

16 convincing matches to genes of known function outside of Borrelia among plasmid genes (which are not "questionable")

3 convincing matches to genes of unknown function outside of Borrelia among plasmid genes (which are not "questionable")
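The counts in the three summaries above can be cross-checked arithmetically. The sketch below transcribes the entity counts from the summaries and tests that each group adds up and that the two plasmid groups sum to the all-plasmid totals. The dictionary keys and function name are our own labels; "intact_le_300" stands for intact genes of 300 bp or shorter.

```python
all_plasmids = {"entities": 898, "questionable": 62, "pseudogenes": 167,
                "intact": 669, "intact_gt_300": 535, "intact_le_300": 134}
well_behaved = {"entities": 498, "questionable": 9, "pseudogenes": 22,
                "intact": 467, "intact_gt_300": 420, "intact_le_300": 47}
high_pseudogene = {"entities": 400, "questionable": 53, "pseudogenes": 145,
                   "intact": 202, "intact_gt_300": 115, "intact_le_300": 87}

def internally_consistent(counts):
    """Entities = questionable + pseudogenes + intact; intact = (>300 bp) + (300 bp or shorter)."""
    return (counts["entities"] == counts["questionable"] + counts["pseudogenes"] + counts["intact"]
            and counts["intact"] == counts["intact_gt_300"] + counts["intact_le_300"])

# The two plasmid groups should add up to the all-plasmid totals, category by category.
groups_sum_to_total = all(all_plasmids[key] == well_behaved[key] + high_pseudogene[key]
                          for key in all_plasmids)

# Pseudogenes as a fraction of non-questionable genes + pseudogenes in each group.
well_behaved_fraction = well_behaved["pseudogenes"] / (well_behaved["pseudogenes"] + well_behaved["intact"])
high_pseudogene_fraction = high_pseudogene["pseudogenes"] / (high_pseudogene["pseudogenes"] + high_pseudogene["intact"])

print(internally_consistent(all_plasmids), groups_sum_to_total)
print(f"{well_behaved_fraction:.1%} vs {high_pseudogene_fraction:.1%}")  # 4.5% vs 41.8%
```

The pseudogene fractions computed this way (about 4.5% versus about 41.8%) illustrate why the two groups are labeled "low pseudogene" and "high pseudogene".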

 

Part I

Annotated B. burgdorferi B31 Plasmid Gene List

Compiled by Sherwood Casjens, Dan Haft and Jeremy Peterson - April 1999

Definitions for Gene List

Note that these definitions are NOT necessarily absolutely identical to those used in the other published gene lists and maps for B. burgdorferi or on the TIGR WEB site. In particular we have an expanded definition of "pseudogene" that includes truncated members of paralogous gene families.

Putative genes and gene names column lists all the putative "gene-like entities" - genes and pseudogenes - currently recognized in the twenty-one B. burgdorferi B31 plasmids. We tentatively interpret those genes not indicated to be pseudogenes to be intact and potentially functional, but since the functionality of most Borrelia genes is unknown this may not be true. The gene and plasmid names used here are those used in Fraser et al. (1997) and Casjens et al. (2000). Of course any given putative pseudo-, questionable, short, fragmented or frameshifted genes could in principle have an important function, but it seems likely that a substantial fraction of them are not functional.

Daggers mark computer-recognized ORFs that are in-frame parts of a larger pseudogene entity. To avoid counting the entity twice, these were ignored when compiling gene and pseudogene numbers in Casjens et al. (2000).

Coordinates - these columns list the positions of the 5’ and 3’ ends of the gene or pseudogene on the sequence of the relevant plasmid.
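As an illustration of how the coordinate columns can be read, the sketch below derives a strand and a length from a 5’/3’ pair. It assumes that coordinates are 1-based and inclusive and that a 5’ end larger than the 3’ end means the gene lies on the reverse strand; these conventions are our assumptions (the resulting lengths are multiples of three for the entries tried), and the function name gene_span is hypothetical.

```python
def gene_span(five_prime, three_prime):
    """Return (strand, length in bp), assuming 1-based inclusive coordinates."""
    # Assumption: 5' end > 3' end means the gene is on the reverse strand.
    strand = "+" if five_prime <= three_prime else "-"
    length = abs(three_prime - five_prime) + 1
    return strand, length

print(gene_span(163, 1269))   # BBC01 -> ('+', 1107)
print(gene_span(2700, 2593))  # BBC04 -> ('-', 108)
```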

Database hit outside Borrelia indicates all similarities to non-Borrelia sequences in the extant database as of January 1999. The criteria for inclusion in the list are those of the TIGR protocol, which uses BLAST (Altschul et al., 1997), and alignments can be found on the TIGR Borrelia WEB page. A search using EMOTIF (Nevill-Manning et al., 1998) did not find any additional convincing B31 plasmid gene similarities to previously known genes.

Common name column gives gene names previously used in the literature. If it was previously named in a strain other than B31, the Borrelia strain is given in parentheses. In addition, we and others have suggested more specific, clarifying common names for genes currently under study in the following paralogous families: mlp [family 113], bdr [80], rev [63] and erp [162/163/164] genes.

Paralog family column indicates the family of paralogous genes (homologs within B. burgdorferi B31) to which individual genes belong. A complete list of genes and pseudogenes in each of these paralogous gene families can be found in PART II of this document.

Comments Column

N-terminal lipidation consensus refers to genes whose products are most likely to be lipoproteins.

Near-consensus N-terminal lipidation signal refers to genes whose products may be lipoproteins, but whose N-terminal amino acid sequences did not quite meet the arbitrary cutoff that we set for inclusion in the "probable lipoproteins" category.

See PART III of this document for a discussion of the strategies used to identify possible lipoprotein-encoding genes.

Authentic frameshift genes contain one or a few simple frameshifts relative to their paralogs. It is unlikely that these are actually expressed by programmed frameshifting mechanisms, since they usually do not contain the expected translationally "slippery" sequences. The TIGR computer uses this term for damaged genes (hence it currently replaces "pseudogene" in some parts of the TIGR Borrelia web page). These are considered to be pseudogenes in this analysis (Casjens et al., 2000).

Authentic point mutation gene has an in-frame stop codon relative to its paralogs. These are considered to be pseudogenes in this analysis.
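A premature in-frame stop codon of the kind described above can be located with a simple codon scan. This is a minimal sketch, not the method used in the analysis: it assumes the standard bacterial stop codons (TAA, TAG, TGA), takes a coding sequence that starts at the first base of the start codon, and uses made-up example sequences rather than real B31 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard bacterial stop codons

def first_in_frame_stop(cds):
    """Return the 0-based index of the first in-frame stop codon before the final codon, or None."""
    cds = cds.upper()
    # Start position of the last complete codon; it is excluded from the scan.
    last_codon_start = len(cds) - len(cds) % 3 - 3
    for i in range(0, last_codon_start, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None

intact = "ATGAAAGGTTAA"   # made-up: ATG AAA GGT TAA
damaged = "ATGTAAGGTTAA"  # made-up: ATG TAA GGT TAA
print(first_in_frame_stop(intact))   # None
print(first_in_frame_stop(damaged))  # 3
```

In practice the comparison is made against paralogs: a stop codon is "premature" only relative to the longer reading frame found in other family members.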

Gene fragments or truncated genes are substantially shorter than other members of their paralogous families. Some of these could be expressed and have a function, although they are included in the pseudogene category for ease of discussion in this analysis and to point out that they are truncated.

Pseudogenes are regions of DNA that are similar in sequence to a paralogous Borrelia gene or to a gene from another organism, but which are obviously truncated and/or do not have full open reading frames relative to those homologs. These mostly appear to be mutationally damaged genes - they include "authentic frameshift", "authentic point mutation", fused and truncated genes. These pseudogenes often contain multiple frameshifts, deletions, insertions and inversions (see Casjens et al., 2000).

Exceptions to this definition of a pseudogene are the 15 silent vlsE cassettes on lp28-1; these are not damaged but are apparently "designed" to be a reservoir of antigenic variation for the vlsE protein. They are pseudogenes in that they are incomplete relative to the expressed vlsE gene and are probably not expressed themselves.

Of course the gene fragments whose reading frames are intact, that we include in this category for ease of discussion, could in fact be expressed and if so could perform a function. Nonetheless such fragments are very unusual in prokaryotes, and given the other evidence for many rearrangements in the B31 plasmids (Casjens et al., 2000) it seems likely that many, if not all of such fragments, may no longer have a biological function.

See PART IV of this document for a complete list of pseudogenes and the reasons why each is so classified.

"Questionable genes" were called by TIGR’s standard gene recognition protocol, but there is reason to suspect they may be spurious calls. Examples are "computer-called genes" that lie inside another gene or pseudogene, and small genes that were not called in paralogous sequence elsewhere in the Borrelia sequence. Those marked with daggers (†) lie inside larger pseudogenes but were nonetheless called as genes by the TIGR protocol.

See PART IV of this document for a complete list of questionable genes and the reasons why each is so classified.

Short genes are <300 bp in length but ARE NOT in the "questionable" or "pseudogene" categories. The Borrelia plasmids have an inordinately large fraction of called genes that are <300 bp in length. These are often not tightly packed and fall into regions that contain no larger genes. Of course any given putative short gene could in principle be functional, but it seems likely that a substantial fraction of them are not functional.

See PART IV of this document for a complete list of short plasmid "genes".
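The length criterion for short genes can be applied directly to the coordinate columns. The sketch below uses the cp9 coordinates from the gene list in this document and flags entries shorter than 300 bp. It assumes 1-based inclusive coordinates, ignores the separate "questionable" and "pseudogene" exclusions, and uses variable and function names of our own choosing.

```python
SHORT_GENE_CUTOFF_BP = 300

# (5' end, 3' end) for the cp9 entries, transcribed from the gene list.
cp9_genes = {
    "BBC01": (163, 1269), "BBC02": (1282, 1836), "BBC03": (1892, 2449),
    "BBC04": (2700, 2593), "BBC05": (2804, 3709), "BBC06": (4377, 3856),
    "BBC07": (4788, 4507), "BBC08": (5534, 5977), "BBC10": (6808, 6284),
    "BBC11": (6974, 7768), "BBC12": (9203, 7914),
}

def length_bp(five_prime, three_prime):
    """Gene length in bp, assuming 1-based inclusive coordinates on either strand."""
    return abs(three_prime - five_prime) + 1

short_genes = [name for name, (five, three) in cp9_genes.items()
               if length_bp(five, three) < SHORT_GENE_CUTOFF_BP]
print(short_genes)  # ['BBC04', 'BBC07']
```

The two entries found this way are the same two that carry the "short gene" comment in the cp9 section of the gene list.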

Putative functions were deduced in most cases from homologies to genes of known function.

WE EMPHASIZE ONE MORE TIME! Any given putative pseudo-, questionable, short, fragmented or frameshifted gene (as we have defined them) could in principle be functional. But it seems likely that a substantial fraction of them are not functional. We use the above pseudogene definitions only as terms to describe relevant features of the B31 plasmid genes, not to imply functionality in any specific cases.

A Complete B. burgdorferi B31 Plasmid Gene List

Putative Gene

5’ end

3’ end

Database hit outside Borrelia {organism of best database hit}

Common Name

Paralog Family

Comments/References

cp9

         

A homolog of cp9, called cp8.3, from B. garinii strain Ip21 was completely sequenced by Dunn et al. (1994).

BBC01

163

1269

   

57

 

BBC02

1282

1836

   

50

 

BBC03

1892

2449

   

49

 

BBC04

2700

2593

     

short gene

BBC05

2804

3709

   

161

 

BBC06

4377

3856

 

eppA

95

exported protein (Champion et al., 1994)

BBC07

4788

4507

     

short gene

BBC08

5534

5977

   

55

 

(BBC09)

         

Does not exist; erroneously present in original gene list and map in figure 2 of Fraser et al. (1997)

BBC10

6808

6284

   

63

N-terminal lipidation consensus

BBC11

6974

7768

   

96

 

BBC12

9203

7914

   

165

 
             

cp26

         

Homolog of cp26 present in essentially all isolates (e.g., Tilly et al., 1997)

BBB01

16

321

conserved hypothetical protein {Escherichia coli}

   

weak similarity to acylphosphatase

BBB02

751

311

       

BBB03

2186

840

weak (PSI-BLAST) similarity to phage N15 gene 29

   

The protein encoded by this gene has weak similarity to the putative "protelomerase" encoded by gene 29 of phage N15 (Ravin et al., in preparation). Circumstantial evidence suggests this N15 protein is responsible for hairpin end formation in the N15 prophage plasmid.

BBB04

3807

2479

PTS system, cellobiose-specific IIC component (celB) {Bacillus stearothermophilus}

   

possible chitobiose transporter (Fraser et al., 1997)

BBB05

4084

4428

PTS system, cellobiose-specific IIA component (celC) {Bacillus subtilis}

   

possible chitobiose transporter (Fraser et al., 1997)

BBB06

4440

4754

PTS system, cellobiose-specific IIB component (celA) {Bacillus subtilis}

   

possible chitobiose transporter (Fraser et al., 1997)

BBB07

4769

5863

       

BBB08

6517

5891

     

N-terminal lipidation consensus

BBB09

6677

7711

     

N-terminal lipidation consensus

BBB10

7836

8762

   

62

 

BBB11

8781

9296

   

50

 

BBB12

9275

10033

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBB13

10104

10649

   

49

 

BBB14

11417

10923

     

N-terminal lipidation consensus

BBB15

11636

11737

     

short gene

BBB16

12014

13603

oligopeptide ABC transporter, periplasmic oligopeptide-binding protein {Escherichia coli}

oppAIV

37

N-terminal lipidation consensus, not surface exposed, and not essential in culture (Bono et al., 1998)

BBB17

15107

13896

IMP dehydrogenase {Haemophilus influenzae}

guaB

 

IMP dehydrogenase (Margolis et al., 1994b; Zhou et al., 1997)

BBB18

16718

15135

GMP synthase {Haemophilus influenzae}

guaA

 

putative GMP synthase (Margolis et al., 1994b); an erroneous duplication in cp26 between BBB18 and BBB19 has been corrected in the current gene list; the correction affected the originally released gene coordinates to the right of BBB18

BBB19

16903

17532

 

ospC

 

surface localized (Wilske et al., 1993), N-terminal lipidation consensus (Fuchs et al., 1992; Jauris-Heipke et al., 1993; Jauris-Heipke et al., 1995; Marconi et al., 1993c; Margolis et al., 1994a; Margolis et al., 1994b; Masuzawa et al., 1997; Stevenson and Barthold, 1994; Stevenson et al., 1994; Tilly et al., 1997; Wang et al., 1999; Wilske et al., 1996a; Wilske et al., 1996b); transcription start site (Marconi et al., 1993b); temperature regulation (Schwan et al., 1995; Stevenson et al., 1995)

BBB20

17733

17626

     

short gene

BBB21

17750

17842

     

short gene

BBB22

19321

17969

conserved hypothetical protein MJ0326 {Methanococcus jannaschii}

 

94

12 putative membrane spanning regions; homologs in E. coli

BBB23

20822

19434

conserved hypothetical protein MJ0326 {Methanococcus jannaschii}

 

94

12 putative membrane spanning regions; homologs in E. coli

BBB24

21364

20861

     

near-consensus N-terminal lipidation signal

BBB25

21851

21342

     

N-terminal lipidation consensus

BBB26

21898

22590

       

BBB27

23154

22606

     

N-terminal lipidation consensus

BBB28

23255

24496

       

BBB29

24825

26450

PTS system, maltose and glucose-specific IIABC component (malX) {Escherichia coli}

 

16

putative sugar transport

             

cp32-1

           

BBP01

66

1286

   

146

 

BBP02

1306

1995

   

147

 

BBP03

2011

2565

   

148

 

BBP04

2575

3336

   

148

 

BBP05

3369

3938

   

148

 

BBP06

3948

4919

   

149

(Casjens et al., 1997)

BBP07

4936

5394

   

150

 

BBP08

5379

5777

   

107

 

BBP09

5768

6154

   

108

 

BBP10

6154

6717

   

151

 

BBP11

6701

7810

   

152

 

BBP12

7828

8253

   

153

 

BBP13

8272

8724

   

154

 

BBP14

8724

8957

   

155

short gene

BBP15

8968

10239

   

156

 

BBP16

10265

10945

   

157

 

BBP17

10952

11899

   

159

 

BBP18

11920

12462

   

160

 

BBP19

12495

12824

   

139

 

BBP20

12824

13696

   

140

 

BBP21

13709

14311

   

141

 

BBP22

14324

15136

   

142

 

BBP23

15215

15415

 

orfA-1; blyA-1

109

putative hemolysin; short gene; sequenced for homologous plasmids in strain 297 by Porcella et al. (1996)

BBP24

15422

15766

 

orfB; blyB-1

111

putative hemolysin; sequenced for homologous plasmids in strain 297 by Porcella et al. (1996)

BBP25

15759

16091

 

orfC

112

(Gilmore and Mbow, 1998); sequenced in homologous plasmids of strain 297 by Porcella et al. (1996)

BBP26

16081

16437

 

orfD

143

(Gilmore and Mbow, 1998); sequenced in homologous plasmids of strain 297 by Porcella et al. (1996); near-consensus N-terminal lipidation signal but strain 297 homolog was not lipidated in E. coli.

BBP27

17060

16581

 

rev-1

63

N-terminal lipidation consensus (Gilmore and Mbow, 1998); sequenced in homologous plasmids of strain 297 by Porcella et al. (1996)

BBP28

17232

17675

 

mlpA

113

N-terminal lipidation consensus (Gilmore and Mbow, 1998); sequenced in several homologous plasmids of strain 297 by Porcella et al. (1996); lipidated in E. coli (Porcella et al., 1996); paralog lipidated in B. afzelii (Theisen, 1996)

BBP29

18728

17718

 

orf4-1

161

(Gilmore and Mbow, 1998)

BBP30

19114

20211

 

orf1-1

57

(Zuckert and Meyer, 1996)

BBP31

20224

20787

 

orf2-1

50

(Zuckert and Meyer, 1996)

BBP32

20766

21503

plasmid partition protein {Bacillus subtilis}

orfC-1

32

putative plasmid partition function (Zuckert and Meyer, 1996)

BBP33

21510

22115

 

orf3-1

49

(Zuckert and Meyer, 1996)

BBP34

22131

22760

 

bdrA

80

contains 4.7 repeats of a 54 bp sequence; all "bdr" genes contain direct, tandem repeats (Casjens et al., 1999; Zuckert and Meyer, 1996)

BBP35

23231

24553

 

orf8/7-1

165

(Casjens et al., 1997; Zuckert and Meyer, 1996)

BBP36

24609

25031

 

orf10-1

144

(Casjens et al., 1997)

BBP37

25816

25043

 

orf6-1

96

(Casjens et al., 1997)

BBP38

26235

26765

 

erpA

162

surface exposed (Lam et al., 1994); N-terminal lipidation consensus (Stevenson et al., 1996); lipidated in E. coli (Akins et al., 1995b; Wallich et al., 1995); erp-like genes have been sequenced from several other strains (Akins et al., 1999; Lam et al., 1994; Marconi et al., 1996b; Stevenson et al., 1997; Suk et al., 1995)

BBP39

26796

27929

 

erpB

163

N-terminal lipidation consensus (Stevenson et al., 1996)

BBP40

28074

28652

   

114

 

BBP41

28835

29398

   

115

 

BBP42

29398

30747

conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

(Amouriaux et al., 1993; Casjens et al., 1997); phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

cp32-3

           

BBS01

66

1286

   

146

 

BBS02

1306

1995

   

147

 

BBS03

2011

2565

   

148

 

BBS04

2575

3336

   

148

 

BBS05

3369

3938

   

148

 

BBS06

3963

4919

   

149

(Casjens et al., 1997)

BBS07

4936

5394

   

150

 

BBS08

5379

5777

   

107

 

BBS09

5768

6154

   

108

 

BBS10

6154

6717

   

151

 

BBS11

6701

7810

   

152

 

BBS12

7828

8253

   

153

 

BBS13

8272

8724

   

154

 

BBS14

8724

8957

   

155

short gene

BBS15

8968

10239

   

156

 

BBS16

10265

10945

   

157

 

BBS17

10952

11899

   

159

 

BBS18

11920

12462

   

160

 

BBS19

12495

12824

   

139

 

BBS20

12824

13696

   

140

 

BBS21

13709

14311

   

141

 

BBS22

14324

15133

   

142

 

BBS23

15212

15412

 

blyA-3

109

putative hemolysin; short gene

BBS24

15419

15763

 

blyB-3

111

putative hemolysin

BBS25

15756

16088

   

112

 

BBS26

16078

16434

   

143

near-consensus N-terminal lipidation signal

BBS27

16586

16900

       

BBS28

16915

17046

     

short gene

BBS29

17068

17694

 

bdrF

80

contains 3.6 repeats of a 33 bp sequence

BBS30

17803

18246

 

mlpC

113

N-terminal lipidation consensus

BBS31

19159

18290

 

orf4-3

161

(Zuckert and Meyer, 1996)

BBS32

19198

19392

conserved hypothetical protein {Chlorella vulgaris} (similarity poor)

   

questionable gene; gene not called in paralogous sequence on other cp32s

BBS33

19605

20702

 

orf1-3

57

(Zuckert and Meyer, 1996)

BBS34

20715

21278

   

50

 

BBS35

21257

21994

plasmid partition protein {Bacillus subtilis}

orfC-3

32

putative plasmid partition function; (Stevenson et al., 1998b)

BBS36

22038

22577

 

orf3-3

49

(Stevenson et al., 1998b)

BBS37

22593

23180

 

bdrE

80

contains 4.1 repeats of a 54 bp sequence

BBS38

23649

25013

 

orf8/7-3

165

(Casjens et al., 1997)

BBS39

25069

25491

 

orf10-3

144

(Casjens et al., 1997)

BBS40

26276

25503

 

orf6-3

96

(Casjens et al., 1997)

BBS41

26708

27295

 

erpG; pG

164

N-terminal lipidation consensus; (Stevenson et al., 1996; Wallich et al., 1995)

BBS42

27410

27916

 

bapA

95

(Stevenson et al., 1996; Wallich et al., 1995)

BBS43

28067

28246

     

short gene

BBS44

28236

28871

   

115

 

BBS45

28871

30220

conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

(Amouriaux et al., 1993; Casjens et al., 1997); phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

cp32-4

           

BBR01

66

1286

   

146

 

BBR02

1306

1998

   

147

pseudogene; authentic frameshift

BBR03

2013

2573

   

148

 

BBR04

2580

3344

   

148

 

BBR05

3340

3948

 

orfI

148

(Casjens et al., 1997)

BBR06

3958

4929

 

orfII

149

(Casjens et al., 1997)

BBR07

4952

5404

 

orfIII

150

(Casjens et al., 1997)

BBR08

5389

5787

 

orfIV

107

(Casjens et al., 1997)

BBR09

5778

6164

 

orfV

108

(Casjens et al., 1997)

BBR10

6164

6727

   

151

 

BBR11

6711

7820

   

152

 

BBR12

7838

8263

   

153

 

BBR13

8282

8734

   

154

 

BBR14

8734

8967

   

155

short gene

BBR15

8978

10270

   

156

 

BBR16

10296

10889

   

157

 

BBR17

10896

11843

   

159

 

BBR18

11864

12415

   

160

 

BBR19

12448

12777

   

139

 

BBR20

12777

13649

   

140

 

BBR21

13662

14264

   

141

 

BBR22

14277

15089

   

142

 

BBR23

15167

15367

 

blyA-4

109

putative hemolysin; short gene

BBR24

15374

15718

 

blyB-4

111

putative hemolysin

BBR25

15711

16043

   

112

 

BBR26

16033

16389

   

143

near-consensus N-terminal lipidation signal

BBR27

16467

16994

 

bdrH

80

sequenced in homologous plasmids of strain 297 by Porcella et al. (1996) and in B. afzelii by Theisen (1996)

BBR28

17103

17522

 

mlpD

113

N-terminal lipidation consensus

BBR29

18664

17576

   

161

 

BBR30

18829

18737

     

questionable gene; gene not called in paralogous sequence on other cp32s

BBR31

18960

20054

   

57

 

BBR32

20067

20630

   

50

 

BBR33

20609

21361

plasmid partition protein {Bacillus subtilis}

orfC-4

32

putative plasmid partition function (Stevenson et al., 1998b)

BBR34

21415

21957

 

orf3-4

49

(Stevenson et al., 1998b)

BBR35

21974

22249

 

bdrG

80

authentic point mutation; has an in-frame stop codon

BBR36

22831

24153

   

165

 

BBR37

24210

24632

 

orf10-4

144

(Casjens et al., 1997)

BBR38

25435

24644

 

orf6-4

96

(Casjens et al., 1997); sequence from strain N40 - accession # AF011453

BBR39

25636

25538

     

questionable gene; gene not called in paralogous sequence on other cp32s

BBR40

25865

25966

 

erpH

162

pseudogene; severely truncated relative to other erps; N-terminal lipidation consensus (Stevenson et al., 1996)

BBR41

26077

26817

   

161/162

pseudogene; this is a "fusion" gene - a family [161] gene is fused to a [162] erp gene

BBR42

26853

27524

 

erpY

164

N-terminal lipidation consensus

BBR43

27634

28200

   

114

 

BBR44

28384

28947

   

115

 

BBR45

28947

30296

conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

homolog of Streptococcus thermophilus phage φO1205 gene orf26, which is likely to encode a phage structural protein (Amouriaux et al., 1993; Casjens et al., 1997)

             

cp32-6

           

BBM01

66

1286

   

146

 

BBM02

1306

1995

   

147

 

BBM03

2010

2570

   

148

 

BBM04

2577

3341

   

148

 

BBM05

3337

3945

   

148

 

BBM06

3955

4926

   

149

 

BBM07

4949

5401

   

150

 

BBM08

5386

5784

   

107

 

BBM09

5775

6161

   

108

 

BBM10

6161

6727

   

151

 

BBM11

6711

7820

   

152

 

BBM12

7838

8263

   

153

 

BBM13

8282

8734

   

154

 

BBM14

8734

8967

   

155

short gene

BBM15

8978

10249

   

156

 

BBM16

10275

10955

   

157

 

BBM17

10962

11909

   

159

 

BBM18

11930

12481

   

160

 

BBM19

12514

12843

   

139

 

BBM20

12843

13715

   

140

 

BBM21

13728

14330

   

141

 

BBM22

14343

15152

   

142

 

BBM23

15231

15431

 

blyA-6

109

putative hemolysin; short gene

BBM24

15438

15782

 

blyB-6

111

putative hemolysin

BBM25

15775

16107

   

112

 

BBM26

16097

16453

   

143

near-consensus N-terminal lipidation signal

BBM27

17075

16596

 

rev-6

63

N-terminal lipidation consensus

BBM28

17247

17693

 

mlpF

113

N-terminal lipidation consensus

BBM29

18680

17736

   

161

 

BBM30

19069

20166

   

57

 

BBM31

20179

20742

   

50

 

BBM32

20721

21467

plasmid partition protein {Bacillus subtilis}

orfC-6

32

putative plasmid partition function (Stevenson et al., 1998b)

BBM33

21520

22095

 

orf3-6

49

(Stevenson et al., 1998b)

BBM34

22102

22767

 

bdrK

80

 

BBM35

23241

24563

   

165

 

BBM36

24619

25041

   

144

 

BBM37

25820

25053

   

96

only [96] member with signal sequence

BBM38

26245

27012

 

erpK

164

N-terminal lipidation consensus; (Casjens et al., 1997)

BBM39

27745

27080

       

BBM40

27731

27850

     

questionable gene; gene not called in paralogous sequence on other cp32s

BBM41

27923

28486

   

115

 

BBM42

28486

29835

conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein; (Amouriaux et al., 1993; Casjens et al., 1997)

             

cp32-7

           

BBO01

65

1285

   

146

 

BBO02

1305

1994

   

147

 

BBO03

2010

2564

   

148

 

BBO04

2574

3335

   

148

 

BBO05

3368

3937

   

148

 

BBO06

3962

4918

   

149

 

BBO07

4935

5393

   

150

 

BBO08

5378

5776

   

107

 

BBO09

5767

6153

   

108

 

BBO10

6153

6719

   

151

 

BBO11

6703

7812

   

152

 

BBO12

7830

8255

   

153

 

BBO13

8274

8726

   

154

 

BBO14

8726

8959

   

155

short gene

BBO15

8970

10301

   

156

 

BBO16

10317

10955

   

157

 

BBO17

10962

11900

   

159

 

BBO18

11904

12470

   

160

 

BBO19

12503

12832

   

139

 

BBO20

12832

13707

   

140

 

BBO21

13716

14318

   

141

 

BBO22

14331

15143

   

142

 

BBO23

15222

15422

 

blyA-7

109

putative hemolysin; short gene

BBO24

15429

15782

 

blyB-7

111

putative hemolysin

BBO25

15766

16098

   

112

 

BBO26

16088

16444

   

143

near-consensus N-terminal lipidation signal

BBO27

16522

17136

 

bdrN

80

 

BBO28

17245

17664

 

mlpG

113

N-terminal lipidation consensus

BBO29

18770

17715