Frequently Asked Questions
- Which genomes are available through IslandViewer and how can I run IslandViewer on other genomes?
- Why does my input file fail to upload or run?
- How should I be annotating my genomes prior to submitting to IslandViewer?
- Which prediction method is preferred?
- Why can't I download the GenBank file for my analysis (Server 500 Error)?
- Why is an expected genomic island missed from IslandViewer predictions?
- How do I change IslandPick analysis settings to compare against different genomes?
- What are the issues with running an incomplete genome through IslandViewer?
- What if my microorganism has several replicons?
- What is the accuracy of IslandViewer GI predictions?
- What are the sources of external annotations in the scatter plot? How can I determine which resource to cite if I use any of these annotations?
- Why does my submitted genome from NCBI give different results than the pre-computed version?
- How can I check if a GI might be a pathogenicity or resistance island?
- How can I compare GI predictions across two species?
- How do I cite IslandViewer?
- Where can I find more information about GI prediction using computational methods?
- What is the purpose of the login?
- How do I use the HTTP API for batch submission?
If you are still having problems or concerns, please contact us for further assistance.
All complete bacterial and archaeal genomes available for download on NCBI are pre-computed through IslandViewer. The genomes are updated from NCBI approximately semiannually. If you would like to run your own custom genome, use the Genome Upload page to upload your own GenBank or EMBL file. See here for documentation about GenBank and access to sample records, or see here for a description of the EMBL file format. Unfinished genomes at the contig stage are now accepted by IslandViewer. Contigs must be annotated for analysis and will be ordered against a user-specified reference genome using the Mauve contig mover (Rissman et al., 2009) before running through the GI prediction pipeline.
The most common problem is an incorrectly formatted input file. Please ensure your input file follows the GenBank or EMBL format with all necessary fields, including the full nucleotide sequence and all coding sequences (CDSs). See examples for more details: GenBank, EMBL. In some cases, SIGI-HMM will fail to run due to issues beyond our control (SIGI-HMM software was written by others), but you will still be able to view results for IslandPath-DIMOB and IslandPick in such cases. If you don't appear to have any such problems and you notice your jobs are taking longer than a few hours to complete, please don't hesitate to contact us and we would be happy to help identify the problem.
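As a rough pre-submission check, the presence of the required GenBank sections can be tested with a few lines of Python. This is a minimal sketch and not IslandViewer's own validation: the function name check_genbank_text is hypothetical, it looks at a single record only, and it relies on simple line-prefix heuristics rather than a full parser.

```python
def check_genbank_text(text):
    """Return a list of problems found in GenBank flat-file text (empty if none).

    Heuristic sketch only: it checks that the main sections exist,
    not that their contents are valid.
    """
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("LOCUS"):
        problems.append("record does not start with a LOCUS line")
    if not any(line.startswith("FEATURES") for line in lines):
        problems.append("missing FEATURES table")
    # Feature keys are indented by five spaces, e.g. "     CDS             1..300"
    if not any(line.startswith("     CDS ") for line in lines):
        problems.append("no CDS features found")
    if not any(line.startswith("ORIGIN") for line in lines):
        problems.append("missing ORIGIN section (nucleotide sequence)")
    if not any(line.strip() == "//" for line in lines):
        problems.append("missing record terminator //")
    return problems
```

An empty list means that none of these basic problems were detected; it does not guarantee that the file will run successfully.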
While we do not endorse a specific annotation tool at this time, we urge all users to be consistent in their gene annotations. The integrated GI prediction tools depend on the gene annotations. Therefore, you should use the same annotation software and version for all of the genomes in your analysis to obtain the most consistent results. Further, when annotating your files, ensure that the annotations are in order and that your file adheres to the standards for GenBank and EMBL files.
IslandViewer integrates predictions from four methods for pre-computed genomes: IslandPath-DIMOB, SIGI-HMM, IslandPick and Islander. For user custom genomes, GI predictions are available from three methods (IslandPath-DIMOB, SIGI-HMM, and IslandPick) for complete genomes (one replicon) and from two methods (IslandPath-DIMOB and SIGI-HMM) for draft genomes. Due to important differences in the methods (more details here), the various tools may provide different GI predictions. All methods have high precision (>85%), so false positive predictions are few, but they do occur, and we encourage users to check the results carefully. Genomic islands predicted by several or all methods can be identified with more confidence.
Because Islander precisely locates the direct repeats flanking the acquired region, the boundaries of its genomic island predictions are highly accurate and should be favored over those from gene-based nucleotide bias approaches such as IslandPath-DIMOB or SIGI-HMM. As a downside, Islander only predicts a small subset of canonical genomic islands inserted in tRNAs and tmRNAs and has a very low recall. The comparative genomic approach, IslandPick, also determines rather precise genomic island boundaries from sequence alignment with reference genomes. However, these boundaries depend highly on the selected reference genomes and, for example in cases of old, large, complex genomic islands with further gene acquisition or loss, boundaries might not be predicted entirely accurately. IslandPath-DIMOB and SIGI-HMM sometimes provide fragmented predictions of regions that have likely been acquired by horizontal gene transfer, and their determination of GI boundaries is imprecise. However, both methods provide a much higher recall, and are thus especially useful for identifying a broader set of regions acquired by horizontal gene transfer.
For a nice example, please have a look at the genomic island predictions between 4.4 Mb and 4.53 Mb in this reference genome.
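The idea that islands predicted by several methods deserve more confidence can be sketched as a small interval computation. The code below is an illustration only, not part of IslandViewer: the function names and the input layout (a dictionary mapping each method name to a list of inclusive start and end coordinates) are assumptions made for this example.

```python
def merge(intervals):
    """Merge overlapping or adjacent (start, end) intervals (inclusive coordinates)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_regions(predictions, min_methods=2):
    """Return regions covered by predictions from at least min_methods methods."""
    events = []
    for intervals in predictions.values():
        # Merge within each method first so one method is never counted twice.
        for start, end in merge(intervals):
            events.append((start, 1))      # coverage rises at start
            events.append((end + 1, -1))   # coverage falls just after end
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for position, delta in events:
        previous = depth
        depth += delta
        if previous < min_methods <= depth:
            region_start = position
        elif depth < min_methods <= previous:
            regions.append((region_start, position - 1))
    return merge(regions)


predictions = {
    "IslandPath-DIMOB": [(100, 200), (500, 600)],
    "SIGI-HMM": [(150, 250)],
    "IslandPick": [(180, 220), (550, 700)],
}
print(consensus_regions(predictions, min_methods=2))
```

Raising min_methods gives fewer, more strongly supported regions. Keep in mind that boundary precision still differs between methods, so the consensus coordinates are only a rough guide.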
If your analysis completed but you cannot download the GenBank file with genomic island annotations (the link leads to a Server 500 error message in the web interface), this may be due to the formatting of your input file. Longer comments in your file have been found to lead to this issue. For the time being, we recommend removing comments from your files and re-running them. If you continue to experience problems after doing so, please contact us.
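Removing comments by hand can be tedious for large files, so here is a minimal Python sketch of one way to do it. It assumes that the comments in question are the COMMENT sections of a GenBank flat file, where the keyword starts in the first column and continuation lines are indented; the function name strip_comment_blocks is hypothetical, and EMBL comment lines are not handled.

```python
def strip_comment_blocks(text):
    """Return GenBank text with every COMMENT section removed.

    Assumes the standard flat-file layout: a section begins with its
    keyword in column 1 and continues on indented (or blank) lines.
    """
    kept = []
    in_comment = False
    for line in text.splitlines(keepends=True):
        if line.startswith("COMMENT"):
            in_comment = True
            continue
        if in_comment and (line.startswith(" ") or not line.strip()):
            continue  # continuation of the COMMENT section
        in_comment = False
        kept.append(line)
    return "".join(kept)
```

Write the result to a new file rather than overwriting the original, so the comments can be restored later if needed.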
Genomic islands that were incorporated into a genome long ago can ameliorate into that genome, which makes prediction by SIGI-HMM and IslandPath-DIMOB difficult because these methods rely on differences in sequence composition bias. However, IslandPick can be customized to detect these islands. By default, IslandPick compares genomes within a certain range of phyletic distances, so it will not detect older islands common to some species/strains. You can easily change the comparison genomes in a custom IslandPick analysis to identify islands associated with different time scales (see next question).
From the GI prediction results page, under "Additional Tools" on the left sidebar, click on IslandPick. If the genome was not processed using IslandPick, click on the link for "Manual selection of IslandPick" to run a custom analysis. Otherwise, if IslandPick was previously run but you would like to select different genomes to compare against, click on "Change" on the right-hand side above the listed genomes used in the original IslandPick analysis. This will take you to the page where you can select the genomes you would like to include in the analysis and customize the selection parameters. A measure of phyletic distance, calculated using MASH, is available beside each species. Once you click "Run IslandPick", you will be directed to the page where results will be available once the analysis is complete. You may enter your email address here to be notified when the results are accessible, or bookmark the page and check back later. An IslandPick analysis can take a few hours to run depending on the number of jobs in the queue.
Incomplete genomes are first reordered against a user-selected reference genome. The quality of contig reordering will depend on the sequence similarity between the two organisms and the quality of the draft genome. Contigs unique to the custom genome, or contigs that could be placed in several positions according to the reference genome (such as identical transposases that could not be resolved by short read assembly software), will remain unaligned and be placed at the end of the pseudochromosome. Contigs that could not be ordered are shown in gray in the outermost circle of the circular representation. Please note that potential plasmids, which are not part of the chromosome of the reference strain, will not be placed and will also tend to be found at the end of the pseudochromosome. Moreover, if your genome likely has several replicons, please see the specific section below.
Due to the pitfalls of short read sequencing and the unknown quality of contig reordering against a reference, IslandViewer's GI prediction algorithms could falsely predict GIs, and could miss real GIs. A proper assessment of the accuracy of GI prediction in incomplete genomes is being performed, and until such assessment is complete, all GI predictions in incomplete genomes through IslandViewer should be carefully evaluated for validity.
Individual replicons are available as separate entries in the pre-computed genomes. Each replicon can be used as a reference for contig reordering of custom draft genome submissions.
Users wishing to analyse a draft genome of a microorganism with multiple replicons can submit all contigs once for each chromosome of the reference strain used for reordering. In this case, GI prediction would also be performed every time on all the contigs that are not aligned to the given reference chromosome and are most likely part of another replicon, leading to redundant predictions. Users should therefore only consider GI predictions in reordered contigs that are shown in green on the outermost circle of the circular genome representation.
IslandViewer incorporates three of the most accurate GI prediction methods to complement each other for the best computational prediction of GIs. Note that our definition of a GI for some methods, such as IslandPath-DIMOB, only includes regions of size >8kb. Accuracy was calculated over the range of phyletic distances at which islands can be most accurately identified. In some cases, you may want higher recall, for which we suggest using Alien_Hunter, but the number of false predictions will also increase greatly. IslandPick is the most precise/specific method if comparison genomes are available, though as mentioned, the IslandPick settings can be modified to compare genomes at a different phyletic distance, which changes which islands are predicted, if any. For a more detailed description of accuracy calculations, see this paper: Langille et al., 2008 BMC Bioinformatics 9:329. To learn more about detecting GIs using computational methods, you may also wish to view Langille et al., 2010 Nature Reviews Microbiology 8:5.
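To make the terms precision and recall concrete, here is a minimal Python sketch that scores predicted GI coordinates against a reference set of islands, one nucleotide at a time. It is an illustration under simplifying assumptions and not necessarily the procedure used in the cited paper: the function names are hypothetical, coordinates are inclusive, and every predicted position outside the reference set counts as a false positive.

```python
def covered_positions(intervals):
    """Return the set of nucleotide positions covered by inclusive intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def precision_recall(predicted, reference):
    """Nucleotide-level precision and recall of predicted vs. reference islands.

    precision = TP / (TP + FP): fraction of predicted positions that are correct
    recall    = TP / (TP + FN): fraction of reference positions that were found
    """
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall
```

A method that predicts only a short, well-supported stretch of a long island scores high precision but low recall, which is the trade-off described above. Storing every position in a set is simple but memory-hungry, so for very large islands an interval-based overlap calculation would be a better fit.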
Curated virulence factors have been collected from the Virulence Factor Database, VFDB (see Chen et al., 2012), PATRIC (see Wattam et al., 2014) and Victors. Homologs of virulence factors were determined as reciprocal best BLAST hits (RBBHs) to curated virulence factors. Curated resistance genes were collected from the Comprehensive Antibiotic Resistance Database (CARD) (see Jia et al., 2017). Homologs of resistance genes were determined using the Resistance Gene Identifier. Pathogen-associated genes were determined using an approach as outlined by Ho Sui et al., 2009, with a more recent list of genomes. There are two main ways to determine the source of any curated annotation. Firstly, the circle glyphs in the linear view provide direct links to the source database. Secondly, under Downloads, a file of all "VF/AMR Annotations" shows the source of each annotation that should be cited and also provides hard links to the corresponding entries in the source database. For more information about any of these resources, please refer to the About page and see our Acknowledgments page to correctly cite these resources.
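For readers unfamiliar with reciprocal best BLAST hits, the principle can be sketched in a few lines of Python. This is a generic illustration and not IslandViewer's pipeline: the function names, the input layout (query, subject, bit score) and the use of bit score to rank hits are all assumptions made for this example.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples, e.g. parsed
    from tabular BLAST output.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (query, subject) pairs that are each other's best hit.

    forward: hits from genome proteins searched against curated factors
    reverse: hits from curated factors searched against genome proteins
    """
    forward_best = best_hits(forward)
    reverse_best = best_hits(reverse)
    return sorted(
        (query, subject)
        for query, subject in forward_best.items()
        if reverse_best.get(subject) == query
    )
```

A gene is reported only when it and the curated factor each pick the other as their top hit, which filters out genes that merely share a domain with a curated entry.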
In an effort to standardize identifiers and reduce duplication, NCBI has been re-annotating its bacterial genomes with RefSeq release 70. We have newly computed all GI and AMR predictions on the RefSeq files retrieved on Feb 09, 2017. Moreover, old protein accessions of curated virulence factors have been mapped to their new non-redundant protein accessions to ensure a proper annotation transfer for users working with the latest genome files. However, users submitting older or newer genome files, or GenBank files (as opposed to RefSeq), might obtain different GI predictions and virulence factors because of differences in annotation and in protein accession numbers.
Within the summary table below the interactive display, any GIs containing annotated virulence genes, resistance genes or pathogen-associated genes will be indicated under the "External Annotations" column. You can alternatively determine overlap by manually checking annotations in the genome visualization when you click on a GI. Keep in mind that the annotations available here are by no means complete and are not available for every genome, so even if you do not see any annotated islands, there may still be a pathogenicity or resistance island in your genome of interest. Further investigation of the types of genes within the GI predictions will be crucial for classifying these types of GIs.
You can compare two genomes on a single page by clicking on "Visualize two genomes" under the figure legend and choosing a genome to compare against. A pop-up box should appear at the top of the page where you can select the second genome. This feature will be further improved in the future using sequence-similarity comparisons, but because users differ in their criteria for matching up similar genomic regions, the simple visual comparison was initially strongly preferred.
We would appreciate it if anyone using IslandViewer cited our publications. Please see the acknowledgements page for the appropriate citations.
If you are interested in learning more about the computational prediction of GIs, please see the review article by Langille et al., 2010 in Nature Reviews Microbiology.
The login gives you access to a user management interface that lists previously submitted genomes with direct links to their results. Login is currently possible using Google, Twitter and GitHub credentials, but remains entirely optional. It also enables you to generate an authentication token to use the HTTP API (see below and here) as an authenticated user, which lets you retrieve information about the status and results of past genome submissions.
IslandViewer 4 now integrates an HTTP API (REST API) that enables users to submit a larger number of genomes and retrieve predictions automatically using various programming languages. Please check out our specific HTTP API page.