About (Visit our FAQs)
This web site was developed so that researchers could easily view and download genomic islands (GIs) for all published sequenced bacterial and archaeal genomes, as predicted by the currently most accurate GI prediction methods. Users can also upload their own unpublished genomes for analysis. The source code and entire GI data sets are available for download, and acknowledgement information is available for those who use our resources. This software and website are developed and maintained by the Brinkman lab at Simon Fraser University, Canada. Please contact us with any questions or comments. Note that this is IslandViewer 4; the older IslandViewer 3 is still available for any old analysis needs.
In Langille et al. (2008), we describe a new method called IslandPick that uses a comparative genomic GI prediction method to develop stringent data sets of GIs and non-GIs. These positive and negative data sets were used to evaluate several sequence composition GI prediction methods and showed that SIGI-HMM and IslandPath-DIMOB had the highest overall accuracy. In addition, IslandPick had the most agreement with an independent data set of previously published genomes, indicating that it is a highly precise method for GI prediction. IslandPick requires several phylogenetically related genomes to have been sequenced before it can make a prediction; therefore, predictions will not be available for many genomes. IslandPick results are highly dependent on the comparison genomes selected and can be customized to include or exclude particular genomes. This is useful in cases where IslandPick doesn't provide any results, or where you would like to compare genomes with a particular phenotype or within a particular phylogenetic distance. These non-default analyses can be run by following the "Show Islandpick Comparison Genomes" link for a given genome. For more information, please see Langille et al., 2008 and/or our overview presentation.
SIGI-HMM (see Waack et al., 2006) is a sequence composition GI prediction method that is part of the Columbo package. This method uses a Hidden Markov Model (HMM) and measures codon usage to identify possible GIs. In an earlier study, SIGI-HMM was shown to have the highest precision and overall accuracy of six tested sequence composition GI prediction methods (Langille et al., 2008).
IslandPath (see Hsiao et al., 2003) was originally designed to aid the identification of prokaryotic genomic islands (GIs) by visualizing several common characteristics of GIs, such as abnormal sequence composition or the presence of genes that are functionally related to mobile elements (termed mobility genes). Our subsequent studies (see Langille et al., 2008) showed that dinucleotide sequence composition bias and the presence of mobility genes were good indicators for identifying GIs. In fact, IslandPath-DIMOB, the method based on these two indicators, was tied with SIGI-HMM for the highest overall accuracy and traded slightly lower precision for higher recall.
More recently, IslandPath-DIMOB has been improved to include newer Pfam profiles for the identification of mobility genes, while using more stringent cutoffs to avoid false positives. A more sensitive dinucleotide score for the identification of potential GIs and the merging of closely positioned regions with dinucleotide biases have also been implemented. Overall, the improved IslandPath-DIMOB features a 19% increase in recall and a 0.6% increase in precision when assessed using the same dataset.
Islander (see Hudson et al., 2015) was designed to find genomic islands based on mechanistic consequences of their typical site-specific integration into tRNA/tmRNA genes. Islander has high precision and precisely defines the boundaries of genomic islands. Islander results are so far available for 1264 genomes (among 2168 analyzed) in the IslandViewer pre-computed results.
Curated virulence factor annotations are incorporated from the 2014 release of the Virulence Factor Database, VFDB (see Chen et al., 2012), PATRIC (see Wattam et al., 2014) and Victors. Antibiotic resistance genes were identified as those with perfect matches using the Resistance Gene Identifier (RGI) and the Comprehensive Antibiotic Resistance Database (CARD) (see Jia et al., 2017). These curated annotations are available for visualization on genome images within the scatter plot, as well as for download in the various downloadable files. Within the linear genome view, links are provided to the respective entries from the annotation source for more information. In downloadable files, the source of each annotation is denoted using its acronym.
Homologs of Virulence Factors and Antibiotic Resistance Genes
Homologs of curated virulence factor genes and resistance genes have been provided for genomes missing such curated data. Resistance gene homologs were determined using the Resistance Gene Identifier tool available through the CARD database, using the strict cutoff. Virulence factor homologs were identified in close relatives of genomes with curated data based on a reciprocal best BLAST hit (RBBH) approach with very stringent cutoff values: an e-value cutoff of 1e-10, >90% sequence similarity, and >80% coverage. Note that virulence factor homologs are predictions and indicate genes of potential interest to check for presence or absence between isolates. Virulence is very contextual, and further manual investigation of predictions is needed. All homologs of curated annotations are indicated within the scatter plot in lighter shades and are also available for download. In downloadable files, the source for these homologs may be indicated by "RGI" for resistance genes and "BLAST" for virulence factors.
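The reciprocal best BLAST hit filter described above can be sketched roughly as follows. This is a minimal illustration, not the IslandViewer pipeline: the Hit record, its field names, and the function names are hypothetical, hits are assumed to have been parsed from BLAST output already, and "best" is assumed to mean the highest bit score among hits that pass the cutoffs.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record; field names are assumptions)."""
    query: str
    subject: str
    evalue: float
    similarity: float  # percent sequence similarity, 0-100
    coverage: float    # percent coverage, 0-100
    bitscore: float


def passes_cutoffs(hit, max_evalue=1e-10, min_similarity=90.0, min_coverage=80.0):
    """Apply the cutoffs quoted in the text: e-value 1e-10, >90% similarity, >80% coverage."""
    return (hit.evalue <= max_evalue
            and hit.similarity > min_similarity
            and hit.coverage > min_coverage)


def best_hits(hits):
    """Map each query to the subject of its best-scoring hit that passes the cutoffs."""
    best = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return (query, subject) pairs that are each other's best hit in both directions."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)
```

Here, forward would hold hits of curated virulence factor genes against a related genome, and reverse the hits of that genome's genes back against the curated set. Only pairs that agree in both directions are kept.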
Pathogen-associated genes are those genes found, to date, only in pathogens and not in non-pathogens, according to set criteria. A study conducted by Ho Sui et al. in 2009 compiled a list of these pathogen-associated genes (see Ho Sui et al., 2009 for more information and find data for download here). An update of this analysis was performed to include more recent genomes (all NCBI complete bacterial genomes from September 3, 2014). These annotations are also indicated within the scatter plot to compare against GI predictions. For robustness, we included only those pathogen-associated genes found conserved across at least three genera (but never in non-pathogens of any genera). In downloadable files, the source for pathogen-associated genes may be indicated by the acronym "PAG".
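The robustness criterion above (present in pathogens from at least three genera, never in a non-pathogen) can be illustrated with a short sketch. The input structures and the function name are hypothetical, and the sketch assumes that gene families and pathogen labels have already been assigned; it does not reproduce the analysis of Ho Sui et al.

```python
def pathogen_associated_genes(gene_to_genomes, genome_info, min_genera=3):
    """Select gene families seen only in pathogens and in at least min_genera genera.

    gene_to_genomes: gene family id -> iterable of genome ids containing it
    genome_info:     genome id -> (genus, is_pathogen)
    Both inputs are hypothetical structures used for illustration only.
    """
    selected = []
    for gene, genomes in gene_to_genomes.items():
        genera = set()
        found_in_non_pathogen = False
        for genome in genomes:
            genus, is_pathogen = genome_info[genome]
            if not is_pathogen:
                # A single non-pathogen occurrence disqualifies the gene family.
                found_in_non_pathogen = True
                break
            genera.add(genus)
        if not found_in_non_pathogen and len(genera) >= min_genera:
            selected.append(gene)
    return sorted(selected)
```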
Analysis Considerations
IslandViewer integrates two sequence composition GI prediction methods, SIGI-HMM and IslandPath-DIMOB, and a single comparative GI prediction method, IslandPick. These methods have varying advantages and disadvantages. Predictions of virulence factor homologs for certain genomes are provided to indicate genes of potential interest, but require further manual investigation of their role in virulence.
Sequence composition GI prediction methods may have difficulty detecting ancient GIs due to the amelioration of the foreign DNA to the host genome over time. Also, these methods will have difficulty detecting GIs that have originated from a genome with similar sequence composition as the host genome. Lastly, these methods can make false predictions due to the normal variation in sequence composition that can occur in bacterial genomes.
Comparative GI prediction methods depend heavily on the genomes that are chosen for comparison. For instance, the selection of very similar genomes will result in the prediction of only recently inserted GIs, while comparing with more distantly related genomes will detect many more GIs (including recent and ancient GIs), but may increase the chance of false prediction. IslandPick uses several different cutoffs to automatically select genomes for comparison, but users have the choice to select different comparison genomes based on their own insights.
In general, we recommend that users take advantage of the ability to view GI predictions from all three methods in a single integrated view. To aid this further, we have highlighted islands that are predicted by one or more tools as red around the outer circle.
Conducting Further Analysis
We strongly encourage researchers to conduct further analyses of any GIs displayed by IslandViewer. As outlined above, in Analysis Considerations, there are many factors that can lead to false prediction by any GI prediction program. In addition, any GIs that are very close to each other in the genome may actually be one large GI insertion (or vice versa) and the exact boundaries or insertion sites should be inspected in more detail. We recommend the use of the following computational tools to aid in further analysis of genomic islands.
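As a starting point for checking whether neighbouring predictions might belong to one larger insertion, nearby intervals can be grouped before manual inspection. The sketch below is illustrative only: the function name and the max_gap threshold are assumptions, not values used by IslandViewer, and a merged interval is a candidate to inspect rather than a confirmed single island.

```python
def merge_nearby_islands(islands, max_gap=1000):
    """Merge (start, end) intervals separated by at most max_gap bases.

    max_gap is a hypothetical threshold chosen for illustration.
    """
    merged = []
    for start, end in sorted(islands):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous interval: extend it.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged
```

Comparing the merged intervals with the original predictions highlights the regions where boundaries and insertion sites deserve a closer look.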
Artemis is a popular and well-developed genome browser and annotation tool. IslandViewer allows users to download predicted GIs in a GenBank file format that can be opened directly in Artemis. The GIs will appear in Artemis using the same colour scheme used in IslandViewer (i.e. green for IslandPick, orange for SIGI-HMM, etc.). An easy-to-view list of islands can be produced in Artemis by using the "Feature Selector" under the "Select" menu and filling in:
- Key = "misc_feature"
- Qualifier = "note"
- Containing this text = "Genomic Island"
Along with the inspection of genes neighbouring and within GIs, sequence composition graphs, such as GC content and Karlin Signature Difference, can be added via the Graph menu. Users with their own GI predictions or other genome features, such as IS or direct repeat elements, can add their data to Artemis after constructing a fairly simple input file (as described in the Artemis manual).
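For readers who want to reproduce a simple composition graph outside Artemis, GC content over sliding windows can be computed as in the sketch below. This is not Artemis's implementation; the function names and the window and step sizes are assumptions chosen for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=1000, step=500):
    """Return (window start, GC fraction) for each full window along the sequence."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Windows whose GC fraction departs noticeably from the genome-wide value are worth comparing against the predicted island boundaries.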
IslandPath displays many features that are often associated with genomic islands, such as various types of sequence composition bias, tRNAs, and integrases and transposases. These features are shown together in a single integrated view and may help in determining the location of GIs. IslandViewer contains a direct link to IslandPath for the genome being viewed in the left column of every results page.
Mauve is a whole genome alignment tool that can be used to view genome rearrangements and large insertions in related genomes. By using the IslandViewer GenBank download file along with your choice of other closely related bacterial genomes, GIs can be viewed in the context of other genomes, and the conserved regions surrounding the GIs can be inspected further.
Websites are welcome to link to island predictions in IslandViewer. The RefSeq accession can be used to link to the page in the format: http://www.pathogenomics.sfu.ca/islandviewer/accession/NC_XXXXXX.X/ or www.pathogenomics.sfu.ca/islandviewer/accession/NC_XXXXXX.X/ with or without the accession version. If the version is left off, the most recent version will be displayed (e.g., NC_003997 is equivalent to NC_003997.3).
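A link in the format above can be built programmatically, for example as in this small sketch. The function name and the include_version parameter are hypothetical; only the URL pattern comes from the text.

```python
BASE_URL = "http://www.pathogenomics.sfu.ca/islandviewer/accession/"


def islandviewer_link(accession, include_version=True):
    """Build an IslandViewer link for a RefSeq accession such as NC_003997.3.

    With include_version=False the version suffix is dropped, which according
    to the text above displays the most recent version.
    """
    accession = accession.strip()
    if not include_version:
        accession = accession.split(".", 1)[0]
    return f"{BASE_URL}{accession}/"
```

For example, islandviewer_link("NC_003997.3", include_version=False) yields the link ending in /accession/NC_003997/.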