The library concentration equivalence was calculated as 2.8 × 10⁹ molecules/µL. The library was stored at -20°C until further use. The shotgun library was clonally amplified with 1 and 2 cpb in two emPCR reactions each, and the paired-end library was amplified with 0.5 cpb in three emPCR reactions using the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The emPCR yields were 6.8% and 9.8%, respectively, for the shotgun library, and 11.29% for the paired-end library. These yields fall within the expected 5 to 20% range according to the Roche protocol. For each library, approximately 790,000 beads for a quarter region were loaded on the GS Titanium PicoTiterPlate PTP kit and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche).
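The copies-per-bead (cpb) ratios quoted above fix how many library molecules go into each emPCR reaction: by definition, molecules = cpb × capture beads. The following is a minimal sketch of that arithmetic, not the kit protocol itself. The function names and the bead count `EXAMPLE_BEADS` are hypothetical placeholders; the number of capture beads per reaction is not stated in the text and should be taken from the kit manual.

```python
def molecules_needed(cpb: float, capture_beads: int) -> float:
    """Library molecules required to reach a given copies-per-bead ratio."""
    if cpb < 0 or capture_beads < 0:
        raise ValueError("cpb and capture_beads must be non-negative")
    return cpb * capture_beads


def library_volume_ul(cpb: float, capture_beads: int, molecules_per_ul: float) -> float:
    """Volume (in µL) of library at the given concentration that supplies those molecules."""
    if molecules_per_ul <= 0:
        raise ValueError("molecules_per_ul must be positive")
    return molecules_needed(cpb, capture_beads) / molecules_per_ul


if __name__ == "__main__":
    STOCK = 2.8e9              # molecules/µL, as reported for this library
    EXAMPLE_BEADS = 1_000_000  # ASSUMPTION: placeholder, not the kit's actual bead count
    for cpb in (0.5, 1.0, 2.0):  # ratios used for the paired-end and shotgun libraries
        vol = library_volume_ul(cpb, EXAMPLE_BEADS, STOCK)
        print(f"cpb={cpb}: {molecules_needed(cpb, EXAMPLE_BEADS):.0f} molecules, {vol:.2e} µL of stock")
```

At the stock concentration the resulting volumes are far too small to pipette, so in practice a working dilution would be prepared first; the same function applies with the diluted concentration passed as `molecules_per_ul`.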

The run was performed overnight and analyzed on a cluster using the gsRunBrowser and Newbler assembler (Roche). For the shotgun sequencing, 188,659 passed-filter wells were obtained. The sequencing generated 129.3 Mb with an average length of 685 bp. For the paired-end sequencing, 106,675 passed-filter wells were obtained. The sequencing generated 35 Mb with an average length of 262 bp. The passed-filter sequences were assembled using Newbler with 90% identity and a 40 bp overlap. The final assembly identified 8 scaffolds and 66 contigs (>1,500 bp) and generated a genome size of 3.79 Mb, which corresponds to a coverage of 54.25 genome equivalents.

Genome annotation

Open reading frames (ORFs) were predicted using Prodigal [42] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region.
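The gap-exclusion step can be illustrated with a short sketch. This is not the authors' pipeline: it assumes that sequencing gaps appear as runs of `N` in the scaffold sequence and that ORFs are given as 0-based, half-open `(start, end)` intervals. All function names are hypothetical.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open: [start, end)


def find_gaps(sequence: str) -> List[Interval]:
    """Locate gap regions, assumed here to be runs of N (case-insensitive)."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def spans_gap(orf: Interval, gaps: List[Interval]) -> bool:
    """True if the ORF interval overlaps any gap interval by at least one base."""
    start, end = orf
    return any(start < gap_end and gap_start < end for gap_start, gap_end in gaps)


def filter_orfs(orfs: List[Interval], gaps: List[Interval]) -> List[Interval]:
    """Keep only ORFs that do not overlap a gap region."""
    return [orf for orf in orfs if not spans_gap(orf, gaps)]


# Toy scaffold: two 9-bp ORFs separated by a 10-bp gap
scaffold = "ATGAAATAG" + "N" * 10 + "ATGCCCTAA"
gaps = find_gaps(scaffold)           # [(9, 19)]
orfs = [(0, 9), (6, 22), (19, 28)]   # the middle ORF runs across the gap
kept = filter_orfs(orfs, gaps)       # [(0, 9), (19, 28)]
```

With Prodigal's real output, the ORF coordinates would first need converting from its 1-based inclusive convention to the half-open intervals used here.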

The predicted bacterial protein sequences were searched against the GenBank database [43] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScan-SE tool [44] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [45] and BLASTn against the GenBank database. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [46] and TMHMM [47], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. These thresholds have already been used in previous studies to define ORFans. Ortholog sets composed of one gene from each of the four genomes H.

