Info • Aspergillus flavus NRRL3357 v1.0

Status

[March 2020] The Aspergillus flavus NRRL 3357 genome was sequenced and assembled not at the Joint Genome Institute but by Jeff Skerker, using a combination of long-read and short-read datasets (PacBio, Oxford Nanopore, and Illumina). Eight chromosomes were assembled using a hybrid assembly method and the Canu assembler (v1.7.1); 7 of the 8 are complete telomere-to-telomere assemblies. The assembly was polished with PacBio data using pbalign (v0.3.1), blasr (v5.3), and the Arrow (v2.2.2) algorithm. Final error correction was performed using Pilon with Illumina data.

This assembly was then annotated using the JGI annotation pipeline, with several modifications. To preserve original gene names as much as possible, previously produced models available on FungiDB were mapped forward onto the new assembly; these mapped models constitute 43.29% of the Filtered Model (FM) set. Additionally, to improve capture of UTRs using RNAseq data available at NCBI, we applied our EST extension procedure to the mapped FungiDB models, producing a new track, estExt_Aspflav1_ExternalModels (20.79% of FM). Our standard filtering parameters were also adjusted to capture more models with transcriptomic support and to prioritize mapped-forward models (and their EST-extended versions) for inclusion in the FM set.
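For a rough sense of scale, the percentages above can be converted to approximate model counts. This is a minimal sketch that assumes the FM set is the 13715-model FilteredModels4 set reported in the gene model statistics on this page. Because the percentages are rounded, the counts are estimates only, and the variable and function names are illustrative.

```python
# Assumption: the FM set is the FilteredModels4 set of 13715 gene models
# listed in the gene model statistics on this page.
FM_TOTAL = 13715

# Shares of the FM set as stated in the text (rounded percentages).
fm_share_percent = {
    "mapped FungiDB models": 43.29,
    "estExt_Aspflav1_ExternalModels": 20.79,
}


def approximate_count(percent, total=FM_TOTAL):
    """Convert a rounded percentage back to an approximate model count."""
    return round(total * percent / 100)


approx_counts = {
    name: approximate_count(percent)
    for name, percent in fm_share_percent.items()
}

for name, count in approx_counts.items():
    print(f"{name}: about {count} of {FM_TOTAL} models")
```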

Summary statistics for the Aspergillus flavus NRRL3357 v1.0 release are below.
Genome Assembly
Genome Assembly size (Mbp) 37.75
Sequencing read coverage depth 650x
# of contigs 8
# of scaffolds 8
# of scaffolds >= 2Kbp 8
Scaffold N50 4
Scaffold L50 (Mbp) 4.81
# of gaps 0
% of scaffold length in gaps 0.0%
Three largest Scaffolds (Mbp) 6.51, 6.31, 5.20
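The scaffold statistics above can be cross-checked against each other. The sketch below follows this table's labeling, in which "Scaffold N50" is a scaffold count and "Scaffold L50 (Mbp)" is a length. It assumes that the fourth-largest scaffold is the 4.81 Mbp scaffold implied by the L50 row; only the three largest scaffold sizes are listed explicitly. The function name is illustrative.

```python
def scaffold_n50_l50(lengths_desc, total):
    """Return (count, length) where cumulative length reaches half of total.

    lengths_desc: scaffold lengths sorted largest first. A prefix of the full
    list is enough as long as it reaches half of the total.
    Following this page's table, the count corresponds to "Scaffold N50"
    and the length to "Scaffold L50 (Mbp)".
    """
    half = total / 2
    running = 0.0
    for count, length in enumerate(lengths_desc, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("lengths do not reach half of the total")


TOTAL_MBP = 37.75
# Three largest scaffolds as reported, plus an ASSUMED fourth scaffold
# of 4.81 Mbp taken from the "Scaffold L50 (Mbp)" row.
largest_mbp = [6.51, 6.31, 5.20, 4.81]

n50_count, l50_length = scaffold_n50_l50(largest_mbp, TOTAL_MBP)
print(n50_count, l50_length)
```

The first three scaffolds sum to 18.02 Mbp, just under half of 37.75 Mbp, so the fourth scaffold is the one that crosses the halfway point.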


Publicly available RNAseq libraries on NCBI for A. flavus NRRL3357 were identified by Jeff Skerker. They include the following SRA libraries: SRR2632952, SRR2632961, SRR2632962, SRR2632963, SRR2632966, SRR2633059, SRR2633060, SRR2633061, SRR2633139, SRR5061895, SRR5061899, SRR5061903, SRR5061905, SRR5061908, SRR5061909, SRR544871, SRR544872, SRR544873, SRR8115610, SRR8115611, SRR8115612, SRR8115613, SRR8115614, and SRR8115615. All individual libraries are available for exploration on the genome browser (some are hidden by default; open the browser toolbar to access them all). Reads across all libraries were depth-normalized using bbnorm.sh (BBMap version 38.79), and the transcriptome was assembled using Trinity. Several contaminants were identified across these libraries, the most abundant being Burkholderia sp., Oryza sativa Nipponbare, Homo sapiens, and Mizuhopecten yessoensis. Because these contaminant sequences do not map to A. flavus, they reduce the total number of transcripts mapped to the genome.
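The normalization and assembly steps can be sketched as two command lines. This page does not state the parameters that were actually used, so everything below is an assumption for illustration: the file names, the depth target and minimum, the memory and CPU settings, and the use of single-end input are all placeholders. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def bbnorm_command(reads_in, reads_out, target_depth=100, min_depth=5):
    """Build a bbnorm.sh call (parameter values are placeholders)."""
    return [
        "bbnorm.sh",
        f"in={reads_in}",
        f"out={reads_out}",
        f"target={target_depth}",  # assumed normalization depth
        f"min={min_depth}",        # assumed minimum depth to keep a read
    ]


def trinity_command(reads, out_dir="trinity_out", max_memory="50G", cpus=8):
    """Build a Trinity call for single-end FASTQ input (an assumption)."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--single", reads,
        "--max_memory", max_memory,
        "--CPU", str(cpus),
        "--output", out_dir,
    ]


# Hypothetical file names; the pooled read file is not named on this page.
norm_cmd = bbnorm_command("all_libraries.fq.gz", "normalized.fq.gz")
asm_cmd = trinity_command("normalized.fq.gz")

print(shlex.join(norm_cmd))
print(shlex.join(asm_cmd))
```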
ESTs    Data set                                    # sequences total    # mapped to genome    % mapped to genome
Ests    est.fasta                                   19038495             18641048              97.9%
Other   Trinity_assembled_Illumina_transcriptome    110843               58815                 53.1%
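The percentages in the table above follow directly from the sequence counts. A minimal sketch of that arithmetic, using only the counts reported here (the dictionary and function names are illustrative):

```python
# (total sequences, sequences mapped to the genome) as reported above.
mapping_counts = {
    "est.fasta": (19038495, 18641048),
    "Trinity_assembled_Illumina_transcriptome": (110843, 58815),
}


def percent_mapped(total, mapped):
    """Percentage of sequences that mapped to the genome."""
    return 100 * mapped / total


percent_by_dataset = {
    name: round(percent_mapped(total, mapped), 1)
    for name, (total, mapped) in mapping_counts.items()
}

for name, percent in percent_by_dataset.items():
    total, mapped = mapping_counts[name]
    print(f"{name}: {percent}% mapped, {total - mapped} unmapped")
```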


Gene Models: FilteredModels4
length (bp) of:         average    median
gene                    1898       1570
transcript              1709       1413
exon                    528        299
intron                  87         62
description:
protein length (aa)     462        390
exons per gene          3.24       3
# of gene models        13715
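The averages above can be sanity-checked against one another. The sketch below is only approximate, because a product of averages is not the average of products. It also assumes that every model is complete and ends with a stop codon, which this page does not state. The variable names are illustrative.

```python
# Average values from the gene model table above.
avg = {
    "gene_bp": 1898,
    "transcript_bp": 1709,
    "exon_bp": 528,
    "intron_bp": 87,
    "protein_aa": 462,
    "exons_per_gene": 3.24,
}

# Transcript length is roughly (exons per gene) x (average exon length).
est_transcript_bp = avg["exons_per_gene"] * avg["exon_bp"]

# Gene length adds one intron between each pair of neighboring exons.
est_gene_bp = est_transcript_bp + (avg["exons_per_gene"] - 1) * avg["intron_bp"]

# ASSUMPTION: coding sequence = 3 bp per amino acid plus one stop codon.
cds_bp = avg["protein_aa"] * 3 + 3

# Whatever remains of the average transcript is attributed to UTRs.
est_utr_bp = avg["transcript_bp"] - cds_bp

print(f"estimated transcript: {est_transcript_bp:.0f} bp")
print(f"estimated gene: {est_gene_bp:.0f} bp")
print(f"estimated UTR per transcript: {est_utr_bp} bp")
```

Both estimates land within about 1% of the reported averages, and the remainder attributed to UTRs is a few hundred bp per transcript.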


Collaborators

Funding

This project was not sequenced at the JGI.