How To Use BASys
BASys is simple to use. The following steps describe how you can use BASys to generate automated genome-scale annotations for bacterial chromosomes, contigs, or plasmids.
If you have any questions or comments on how to improve BASys, please feel free to contact the authors.
- Verify that BASys is right for you. If you are unsure, you might wish to find out more about BASys, review the documentation and check out some examples before proceeding.
- Have your bacterial sequence data ready. If you are unsure of where to find bacterial chromosome sequences, you might try NCBI's comprehensive archive of microbial genome sequence data.
- Identify the coding sequences. If you don't have this information, you can let BASys predict them for you using Glimmer. If you are annotating a genome from NCBI, you can use their ".ffn" file, which is available in the same FTP folder as the chromosome (".fna") file. The ".ffn" file is a multi-FASTA-formatted file. Its deflines give each identified coding sequence's location and direction along the chromosome; this information is critical for BASys. The file also contains the actual coding sequences, which may differ slightly from the chromosome sequence if there are frame shift corrections and the like. If you don't have a pre-existing .ffn file for your chromosomal data, you can make one fairly easily. Each defline must be in the form >identifier:cstart-end name, where:
| identifier (optional) | specifies a unique gene identifier (this can be used to supply BASys with your own gene/protein accessions). |
| c | specifies that the coding sequence is on the complementary strand. If the coding sequence is on the direct strand, just omit the c character. |
| start | specifies the start location of the first base of the coding sequence on the direct strand of the chromosome. |
| end | specifies the end location of the last base of the coding sequence on the direct strand of the chromosome. |
| name (optional) | specifies the gene name. |
The sequence must contain the correct coding sequence for this gene. You may also want to check out an example .ffn file for reference.
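The defline format described above can be sketched in Python as follows. This is a minimal illustration, not part of BASys: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the identifier is always required here because the text does not show what a defline without one looks like. The identifiers and coordinates in the usage lines are borrowed from the example table on this page.

```python
import re

# Assumed pattern for ">identifier:cstart-end name" (identifier required in this sketch).
DEFLINE_RE = re.compile(
    r"^>(?P<identifier>[^:\s]+):(?P<strand>c?)(?P<start>\d+)-(?P<end>\d+)"
    r"(?:\s+(?P<name>.+))?$"
)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def format_defline(identifier, start, end, complement=False, name=None):
    """Build a defline of the form >identifier:cstart-end name."""
    strand = "c" if complement else ""
    line = f">{identifier}:{strand}{start}-{end}"
    if name:
        line += f" {name}"
    return line


def parse_defline(line):
    """Split a defline back into its fields; raise ValueError if it does not match."""
    match = DEFLINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised defline: {line!r}")
    return {
        "identifier": match["identifier"],
        "complement": match["strand"] == "c",
        "start": int(match["start"]),
        "end": int(match["end"]),
        "name": match["name"],
    }


def coding_sequence(chromosome, start, end, complement=False):
    """Cut a coding sequence out of the chromosome (assumes 1-based, inclusive
    coordinates; start and end may be given in either order)."""
    low, high = sorted((start, end))
    segment = chromosome[low - 1:high]
    return reverse_complement(segment) if complement else segment


print(format_defline("11496586", 332, 805, name="mob"))
print(format_defline("11496606", 4373, 4257, complement=True))
```

Note that a sequence cut directly from the chromosome will not include any frame shift corrections, so check it against the coding sequence you actually intend to submit.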
BASys also lets you submit gene identification information in a TAB-delimited format, similar (but not identical) to the NCBI "protein table" or ".ptt" format:
START END STRAND IDENTIFIER NAME
332 805 + 11496586 mob
4373 4257 - 11496606
The header line is optional, but the table columns must be in the correct order. You may also want to check out an example file for reference.
| START | specifies the start location of the first base of the coding sequence on the direct strand of the chromosome. |
| END | specifies the end location of the last base of the coding sequence on the direct strand of the chromosome. |
| STRAND | specifies that the coding sequence is on the direct strand (+) or on the opposite strand (-) of the chromosome. |
| IDENTIFIER | specifies a unique gene identifier. It will be used for gene/protein accessions. |
| NAME (optional) | specifies the gene name. |
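Before submitting a TAB-delimited file, it can help to check that every row has the expected columns. The following is a minimal Python sketch of such a check, assuming the five-column layout described above; the function name `parse_gene_table` and the returned dictionary keys are hypothetical and not part of BASys.

```python
def parse_gene_table(text):
    """Parse TAB-delimited rows of START, END, STRAND, IDENTIFIER and optional NAME."""
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split("\t")]
        if fields[0].upper() == "START":
            continue  # the header line is optional, so skip it when present
        if len(fields) < 4:
            raise ValueError(f"line {line_number}: expected at least 4 columns")
        start, end, strand, identifier = fields[:4]
        if strand not in ("+", "-"):
            raise ValueError(f"line {line_number}: STRAND must be + or -")
        # NAME is optional: treat a missing or empty fifth column as no name.
        name = fields[4] if len(fields) > 4 and fields[4] else None
        rows.append({
            "start": int(start),
            "end": int(end),
            "strand": strand,
            "identifier": identifier,
            "name": name,
        })
    return rows


example = (
    "START\tEND\tSTRAND\tIDENTIFIER\tNAME\n"
    "332\t805\t+\t11496586\tmob\n"
    "4373\t4257\t-\t11496606\n"
)
for row in parse_gene_table(example):
    print(row)
```

The sketch only checks the column layout; it does not verify that the coordinates fall within the chromosome or that identifiers are unique.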
- Log in or submit your sequence information directly to BASys. Use the login system if you wish to submit and monitor multiple chromosome submissions to BASys, or if you wish to access BASys with a username and password instead of by email. Submit your chromosome information directly without logging in if you simply want to submit a single chromosome.
- Wait. BASys requires about 24 hours to annotate an average-sized bacterial chromosome. If there is a heavy load on the server when you submit, your submission might be queued and annotated once the annotation system becomes free. BASys notifies you by email when your annotations are complete and provides you with a URL so you can view them.
- Examine your annotations. The annotations are provided in a table format and as a browsable graphical genome map. There are also facilities for server-side similarity searches (with BLAST) and text searches for selected annotations. You may want to view some example annotations for reference.
- Download your annotations for offline viewing, manual correction, etc. You can download the complete annotation report or a more compact version without evidence cards. Note that evidence cards can substantially increase the size of a download (up to 1 GB for an average bacterial genome!).