Introduction
Butterfly populations across the contiguous United States have declined markedly in recent decades, with abundance decreasing by 22% between 2000 and 2020 (Edwards et al. 2025). In Florida, which is home to several federally listed butterfly species, active conservation efforts are underway to safeguard these vulnerable taxa. The Miami blue (Cyclargus thomasi bethunebakeri, Figure 1.A) and the Schaus’ swallowtail (Heraclides ponceana, Figure 1.B) are two prominent examples of federally endangered species. In contrast, the Atala butterfly (Eumaeus atala, Figure 1.C) was once considered extinct and was never listed, but has seen a resurgence since its rediscovery (Koi and Daniels 2015).
The Miami blue butterfly was once widespread across southern Florida, but extensive habitat loss led to severe population fragmentation and near extirpation. A remnant breeding population was later discovered, prompting reevaluation of its conservation status and its subsequent listing as endangered (Calhoun, Slotten, and Salvato 2002). Similarly, the Schaus’ swallowtail suffered drastic declines due to habitat degradation and was among the first butterflies protected under the Endangered Species Act. In this study, we present genome assemblies of these three taxa, providing critical resources for their conservation.
Methods
Tissue used for this study was isolated from individuals that were lab-reared at the McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, in Gainesville, Florida. For C. t. bethunebakeri, DNA was extracted from a pool of five females; for E. atala, from a single female; and for H. ponceana, from a pool of two females. DNA extractions were performed with the Qiagen DNeasy genomic extraction kit according to standard protocols. For each sample, a paired-end sequencing library was constructed using the Illumina TruSeq kit following the manufacturer’s instructions and sequenced on an Illumina HiSeq platform in paired-end, 2 × 150 bp format. The resulting FASTQ files were trimmed with Trimmomatic v0.33 (Bolger, Lohse, and Usadel 2014) to remove adapter/primer sequences and low-quality regions, and the trimmed reads were checked with FastQC (Andrews 2014). The trimmed sequences were then assembled with SPAdes v2.5 (Bankevich et al. 2012), followed by a finishing step using Zanfona (Kieras, O’Neill, and Pirro 2021), both set to default parameters. To evaluate the completeness of the genomes, we used BUSCO v5.0.0 (Seppey, Manni, and Zdobnov 2019) to measure the proportion of conserved single-copy orthologs recovered from the insecta_odb10 dataset. A portion of our analyses was conducted using the ACCESS system (Boerner et al. 2023).
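As an illustration of how a completeness figure can be read from BUSCO output, the following minimal Python sketch parses the one-line result string that BUSCO writes to its short summary. It is not part of the pipeline described above; the function name parse_busco_summary and the example values are hypothetical and are not results from this study.

```python
import re

# Pattern for the one-line BUSCO result string, e.g.
# "C:92.0%[S:91.0%,D:1.0%],F:3.0%,M:5.0%,n:1367"
SUMMARY_RE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return BUSCO percentages (floats) and the ortholog count n (int)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no BUSCO result string found in line")
    fields = {
        key: float(value)
        for key, value in match.groupdict().items()
        if key != "n"
    }
    fields["n"] = int(match.group("n"))
    return fields


# Hypothetical example line (made-up values, not from this study).
example = "C:92.0%[S:91.0%,D:1.0%],F:3.0%,M:5.0%,n:1367"
result = parse_busco_summary(example)
print(result["complete"], result["single"], result["n"])
```

In this notation, C is the percentage of complete orthologs (S single-copy, D duplicated), F is fragmented, M is missing, and n is the number of orthologs in the lineage dataset.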
Results and Data Availability
Genome assemblies yielded total sequence lengths ranging from 355 to 522 Mb, with scaffold N50s ranging from 3 to 6.2 Mb. BUSCO analysis recovered 77-92% of the benchmarked, conserved, single-copy orthologous genes within Insecta (Table 1). All raw read data and assembled genomes are available on NCBI (Table 2).
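For readers unfamiliar with the scaffold N50 statistic, the following minimal Python sketch shows the standard definition: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly length. The function name n50 and the toy lengths are illustrative only and do not reproduce the values in Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches at least
    half of the summed length of all scaffolds.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: the total is 54, so half is 27; 10 + 9 + 8 = 27.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # prints 8
```

Because a few long scaffolds can dominate the statistic, N50 is best read alongside total assembly length and BUSCO completeness rather than on its own.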
Although these assemblies were built exclusively from short reads, two of the three genomes (C. t. bethunebakeri and E. atala) contained over 90% of the orthologous genes conserved across insects. Moreover, these two genomes were among the 50 highest-quality genomes in a comprehensive dataset of 822 butterfly genomes (Ellis, Storer, and Kawahara 2021). We expect all three genomes to be an asset for future lepidopteran conservation efforts.
Funding
Funding was provided by Iridian Genomes, grant # IRGEN_RG_2021-1345 Genomic Studies of Eukaryotic Taxa.
Acknowledgements
This work used Bridges-2 at Pittsburgh Supercomputing Center through allocation [MCB130187] from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing computational resources that have contributed to the research results reported within this paper. URL: https://www.tacc.utexas.edu