As in [10], the LCB arrangement was plotted as a circular map in CGView [23]. Also following [10], subset datasets were produced by randomly sampling nucleotides from the concatenated LCB alignment for each chromosome
using BioPerl scripts. These subset datasets were 10,000, 20,000, 30,000, 40,000, 50,000, 100,000, 200,000, 300,000, 400,000, 500,000, and 1,000,000 bp long. For the small chromosome, only sizes up to 300,000 bp were used, because its concatenated alignment was only just over 400,000 bp. Each subset dataset was also analyzed in TNT and in either Garli or RAxML, depending on its length.

44-taxon dataset
For this dataset, genomes were downloaded as detailed above or assembled de novo as detailed below. Because some genomes were available only as multiple contigs, the arrangement of these contigs was ignored and the contigs were simply concatenated. Breakpoint analyses could not be completed on this dataset because, after contig concatenation, the arrangement of genes and multi-gene fragments was not necessarily true to life. A different strategy was implemented in
Mauve in order to include all 44 taxa. Because a de novo Mauve analysis of all 44 concatenated genomes was computationally prohibitive, the concatenated contigs were grouped into sets of two to three close relatives, as determined in [9], together with the concatenated LCBs of closely related species from the 19-taxon Mauve results. This strategy works because the Mauve results of interest are the LCBs common to all taxa. Since the 44-taxon dataset contains all the taxa of the 19-taxon dataset plus additional taxa, one would expect the percentage
of base pairs homologized by Mauve to decrease as taxa are added. By starting the Mauve analyses from the LCBs generated by the 19-taxon Mauve analysis, one expects to capture the same homologies that would be captured if all 44 taxa were analyzed in Mauve from scratch. The LCBs resulting from these smaller runs, which together covered all 44 taxa, were then extracted. Because Mauve output collinearizes the LCBs, a final, simpler Mauve run could be performed on all 44 taxa together. This procedure was carried out separately for the large and small chromosomes, and phylogenetic analyses in TNT and Garli were conducted on the resulting alignments for each. V. brasiliensis was removed from the small chromosome dataset because it repeatedly caused Mauve to crash.

New genome sequences
Salinivibrio costicola strain ATCC 33508, Vibrio gazogenes strain ATCC 43941, and Aliivibrio logei strain ATCC 35077 were ordered from the American Type Culture Collection (ATCC) and grown on Difco Marine Agar: S. costicola and V. gazogenes at 26 °C, and A. logei at 18 °C. DNA was extracted using the Qiagen DNeasy DNA extraction kit, and DNA concentration was measured with an Invitrogen Qubit 2.0 Fluorometer.
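The subset datasets described at the start of this section were built by randomly sampling nucleotides from each chromosome's concatenated LCB alignment. The sketch below illustrates that step in Python. It is not the BioPerl script used in the study. It assumes that "randomly sampling nucleotides" means drawing whole alignment columns without replacement, so every taxon keeps the same sites, and that the alignment is held in memory as a dict of equal-length strings. The function names `subsample_columns` and `make_subsets` are hypothetical.

```python
import random


def subsample_columns(alignment, n_sites, seed=None):
    """Return a new alignment built from n_sites randomly chosen columns.

    Assumption (hypothetical input format): `alignment` maps taxon name to an
    aligned sequence string, and all sequences have the same length. Columns
    are drawn without replacement and kept in their original order, so every
    taxon receives exactly the same set of sites.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = lengths.pop()
    if n_sites > total:
        raise ValueError(f"requested {n_sites} sites, alignment has {total}")
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(total), n_sites))
    return {taxon: "".join(seq[i] for i in cols)
            for taxon, seq in alignment.items()}


def make_subsets(alignment, sizes, seed=None):
    """Build one subsampled alignment per requested size (in bp).

    Sizes larger than the alignment are skipped. This is a simplification:
    the study itself stopped at 300,000 bp for the small chromosome.
    """
    total = len(next(iter(alignment.values())))
    rng = random.Random(seed)
    subsets = {}
    for n in sizes:
        if n <= total:
            # Give each subset its own reproducible seed derived from `seed`.
            subsets[n] = subsample_columns(alignment, n, seed=rng.randrange(2**32))
    return subsets


if __name__ == "__main__":
    # Toy three-taxon alignment; a real input would be the concatenated
    # LCB alignment for one chromosome.
    toy = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTTCGTAC", "taxonC": "ACGAACGTTC"}
    for size, aln in make_subsets(toy, [4, 8, 20], seed=1).items():
        print(size, aln)
```

Keeping the sampled columns in their original order is only a convenience. Parsimony in TNT and likelihood under the usual independent-sites models in Garli or RAxML do not depend on column order, but sorted columns make each subset easier to compare against the full alignment.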