1. Introduction
The octocoral Xenia umbellata Lamarck, 1816 is a common component of Red Sea coral reef ecosystems, known for its high tolerance to environmental stressors and its rapid proliferation, facilitated by whole-body regeneration from even a single tentacle (Halász et al. 2019; Nadir, Lotan, and Benayahu 2023; Mezger et al. 2022). This remarkable regenerative capacity has also established X. umbellata as an emerging model system for studying regeneration (Nadir, Lotan, and Benayahu 2023). In October 2023, initial reports from recreational divers documented the onset of a Xenia umbellata invasion on Puerto Rico’s already degraded reefs (the species was originally misidentified as Unomia stolonifera, which is actively spreading in Venezuela; Ruiz-Allais et al. 2014; Ruiz-Allais, Benayahu, and Lasso-Alcalá 2021), raising concerns that the species could spread with little natural resistance and take over reef ecosystems both locally and on neighboring islands (Toledo-Rodriguez et al. 2025). Because X. umbellata can colonize available substrate (e.g., coral rubble, rocky substrate, and bare sand) and overgrow native benthic organisms such as ecosystem-engineering stony corals and sponges (Figure 1), Puerto Rico’s Department of Natural and Environmental Resources Emergency Response Unit has dedicated significant effort to tracking and eradicating X. umbellata patches; nevertheless, new patches continue to be found on both shallow (<30 ft, ~9 m) and deeper reefs (up to 55 m). As the region prepares for the long-term management of X. umbellata’s potential impact on Caribbean reefs, genomic and microbial resources for this species are still lacking. Such genomic data, especially when generated during the early phase of the invasion, can provide critical insights into invasion dynamics and establish a foundation for future research.
Invasion genomics provides an effective evolutionary framework for investigating the invasion process, from potential introduction pathways to the establishment, spread, adaptation, and population dynamics of the invader (C. E. Lee 2002; McGaughran et al. 2024; North, McGaughran, and Jiggins 2021). For example, by characterizing an introduced species’ genome, investigators can identify signatures of invasiveness, such as standing genetic variation (e.g., heterozygosity or admixture) or structural variation (e.g., gene duplications), which may facilitate rapid adaptation to the novel environment (Hahn and Rieseberg 2017; Makino and Kawata 2019; McGaughran et al. 2024; O’Donnell et al. 2014; Wu et al. 2019). Such genomic knowledge can help predict invasive potential, forecast invasion success, and support management priority setting in efforts to impede spread (McGaughran et al. 2024). Moreover, genomic resources generated during the early phases of an invasion provide critical baseline data for reconstructing invasion histories and identifying the genomic mechanisms of adaptation underlying the spread. While this information is typically generated only for the invading metazoan, it is equally important to characterize its associated microbial symbionts, which may influence the success and adaptability of the invader.
Xenia umbellata, like all cnidarians, harbors diverse microbial symbionts, including dinoflagellates (Family Symbiodiniaceae), bacteria, archaea, fungi, and viruses; together with the host, these form the holobiont (Stévenne et al. 2021). The acquisition or loss of symbiotic partners can strongly influence host physiology by conferring new traits that may alter holobiont ecology and fitness (Bordenstein and Theis 2015; Hussa and Goodrich-Blair 2013; Pita, Fraune, and Hentschel 2016). For example, in cnidarians, the identity of associated Symbiodiniaceae has been shown to influence host resilience to abiotic and biotic stressors, emphasizing the role of microbial symbionts in driving environmental adaptation and, ultimately, invasion success (Wang et al. 2023; Newkirk et al. 2020; Stévenne et al. 2021). Therefore, it is imperative to investigate the microbial symbionts of invaders and to track changes in their communities as factors shaping invasion outcomes. In addition, co-introduced microbial symbionts may directly affect native species by displacing resident symbionts or acting as parasites (Bojko et al. 2021). Considering the host genome and its microbial symbionts together as an integrated “hologenome” (Bordenstein and Theis 2015) provides a powerful framework for understanding how invasions are mediated at the genomic and ecological levels and underscores the need for hologenomic resources to guide future studies and management strategies.
Here, we describe the production of early-phase hologenomic resources for Xenia umbellata invading Puerto Rico’s coastal waters. Using deep metagenomic sequencing, we generated the first high-quality draft genome of X. umbellata and characterized the taxonomy of its associated Symbiodiniaceae. These resources were derived from a specimen collected within six months of the initial report of the invasion. By integrating host and symbiont data, this study provides the first hologenomic reference for X. umbellata, establishing a foundation for both scientific investigation and management applications.
2. Data description
2.1. Sampling, DNA Extraction, and Sequencing
Several polyps from a Xenia umbellata patch located in the La Parguera Natural Reserve at a depth of 21.6 m were carefully collected with tweezers on SCUBA and placed in a 50 mL tube in March 2024 (Table 1). The tube, containing the tissue in ambient seawater, was transported to the lab on ice. In the lab, the seawater was removed, and the polyps were immediately stored in the same 50 mL tube at -80 °C until DNA extraction. DNA was extracted using the ZymoBIOMICS DNA/RNA Miniprep kit (Zymo Research) with minor modifications to the manufacturer’s protocol (Veglia and Watkins 2025). Sample quality was assessed using TapeStation and NanoDrop, and the input DNA met the sequencing provider’s requirements: ≥10 ng/μL concentration, ≥0.1 μg total quantity, an A260/280 ratio of 1.5–2.2, and a DNA Integrity Number (DIN) of 6.0–10.0. Following library preparation with the Illumina DNA PCR-Free Prep kit, library quality was evaluated using TapeStation and qPCR, and the library was sequenced on the Illumina NovaSeq 6000 platform as 150 bp paired-end reads. Sequencing targeted an output of 40 Gb, corresponding to approximately 133 million read pairs (~266 million reads in total).
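The read numbers quoted above follow from simple arithmetic: a target output in bases divided by the read length gives the number of reads, and half of that gives the number of pairs. The sketch below (the function name `expected_reads` is illustrative only) makes this explicit; it ignores the fact that delivered yields usually differ from the target.

```python
def expected_reads(target_output_bp: float, read_length_bp: int, paired: bool = True):
    """Return (read_pairs, total_reads) expected for a given target sequencing output."""
    # Each read contributes read_length_bp bases to the total output.
    total_reads = target_output_bp / read_length_bp
    # In paired-end sequencing, every pair consists of two reads.
    read_pairs = total_reads / 2 if paired else total_reads
    return read_pairs, total_reads


pairs, reads = expected_reads(40e9, 150)
print(f"{pairs / 1e6:.1f} million read pairs, {reads / 1e6:.1f} million reads")
```

For a 40 Gb target with 150 bp reads this gives roughly 133.3 million pairs and 266.7 million reads, in line with the approximate figures in the text.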
2.2. Sequence Processing and Assembly
Sequencing yielded 278,081,250 raw reads, with 96% of bases having a Phred quality score >20. Raw reads were processed and quality-filtered with fastp (v0.23.2; Chen et al. 2018), resulting in 272,355,896 high-quality cleaned reads (97.9% of raw reads). These reads were assembled with SPAdes (v4.0.0; Prjibelski et al. 2020) using the metaSPAdes algorithm (Nurk et al. 2017), producing 1,092,378 scaffolds. The metagenome assembly thus provides a first look into the X. umbellata hologenome, containing scaffolds derived from the xeniid host as well as from its microbial symbionts, including the endosymbiotic dinoflagellates of the Family Symbiodiniaceae.
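To make the quality and retention figures above concrete, the sketch below decodes FASTQ quality strings (assuming the standard Phred+33 encoding), converts a Phred score into an error probability via Q = -10·log10(P), and computes the fraction of bases at or above a threshold; Q20 is counted here as ≥20, the usual convention. It is an illustration, not a reimplementation of fastp, and the toy quality strings are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string (Phred+33 encoding by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert the Phred definition Q = -10 * log10(P) to get the base-call error probability."""
    return 10 ** (-q / 10)


def fraction_at_least(qualities, threshold: int = 20) -> float:
    """Fraction of all bases across reads whose Phred score is >= threshold."""
    total = passing = 0
    for qual in qualities:
        scores = phred_scores(qual)
        total += len(scores)
        passing += sum(1 for q in scores if q >= threshold)
    return passing / total if total else 0.0


# Read retention after cleaning, using the counts reported in the text.
raw_reads, clean_reads = 278_081_250, 272_355_896
retained = clean_reads / raw_reads

print(f"Q20 error probability: {error_probability(20):.2%}")
print(f"Reads retained after cleaning: {retained:.1%}")
print(f"Toy example Q20 fraction: {fraction_at_least(['IIII#', 'II##I']):.2f}")
```

A Phred score of 20 corresponds to a 1% chance that a base call is wrong, so ">20" in practice means fewer than one error per hundred bases.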
2.3. Xenia umbellata Genome Assembly, Assessment, and Annotation
To extract all xeniid scaffolds from the metagenome assembly, BLASTn (Camacho et al. 2009) was used to align scaffolds against a chromosome-level genome assembly of Ovabunda sp. (Hu et al. 2020). This assembly was originally labeled Xenia sp. but was later confirmed to be Ovabunda sp. (pers. comm. Catherine McFadden), a closely related genus. All sequences with >95% nucleotide identity and alignment lengths >100 bp were retained as putative xeniid scaffolds. The remaining sequences were additionally screened for anthozoan-like scaffolds using CAT (von Meijenfeldt et al. 2019) with the CAT_nr database (v20241212). All likely xeniid sequences identified through BLASTn and/or CAT (n=132,376) were pooled into a single FASTA file. RagTag (v2.1.0; Alonge et al. 2022) was then used to further scaffold these sequences, using the chromosome-level Ovabunda genome assembly (Hu et al. 2020) as a reference. Next, the scaffolded assembly was screened for contaminants (e.g., mitochondrial sequences and sequences from non-target organisms) using NCBI’s Foreign Contamination Screen (Astashyn et al. 2024). Finally, identified contaminants were removed and sequences were length-filtered using the ‘clean’ function of funannotate (v1.8.17; Palmer and Stajich 2020), resulting in a final assembly of 27,739 sequences longer than 500 nucleotides. Assembly quality was assessed using QUAST (v5.2.0; Mikheenko et al. 2023), which showed that the final assembly had a total length of 151,140,580 bp and an N50 of 6,477,837 bp (Figure 2). Genome completeness was assessed using BUSCO (v5.8.0; Manni et al. 2021) with the Anthozoa lineage dataset (anthozoa_odb12.2025-07-01). Of the BUSCOs searched, 91.4% were complete (89.9% single-copy and 1.5% duplicated), 3.9% were fragmented, and 4.7% were missing.
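The BLASTn retention rule described above (>95% identity over alignments >100 bp) can be sketched as a simple filter over BLAST+ tabular output. This is an illustration rather than the authors' script: it assumes results were written with `-outfmt 6` using its default twelve columns, and the scaffold and chromosome names in the example are hypothetical.

```python
import csv
from io import StringIO

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def putative_xeniid_ids(blast_tsv, min_pident=95.0, min_length=100):
    """Return query IDs with at least one hit exceeding both identity and length thresholds."""
    keep = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank or comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["pident"]) > min_pident and int(hit["length"]) > min_length:
            keep.add(hit["qseqid"])
    return keep


# Hypothetical hits: NODE_2 fails on identity, NODE_3 fails on alignment length.
example = StringIO(
    "NODE_1\tchr1\t98.5\t450\t6\t0\t1\t450\t1000\t1449\t1e-200\t800\n"
    "NODE_2\tchr3\t92.0\t900\t72\t0\t1\t900\t5\t904\t1e-150\t700\n"
    "NODE_3\tchr2\t99.0\t80\t1\t0\t1\t80\t10\t89\t1e-30\t140\n"
)
print(sorted(putative_xeniid_ids(example)))
```

Requiring both thresholds guards against short, highly similar hits (e.g., conserved motifs) and long but divergent ones being mistaken for host sequence.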
Taken together, the QUAST and BUSCO results indicate that the Xenia umbellata genome assembly is highly contiguous and complete. We then used the stats.sh program within the BBMap toolkit (v39.15; Bushnell 2014) to assess the base composition of the assembly. The assembly had an overall base composition of 32.60% adenine (A), 32.82% thymine (T), 17.29% cytosine (C), and 17.29% guanine (G), with a GC content of 34.58%; 1.84% of positions were ambiguous bases (Ns), which were likely introduced during scaffolding and are expected in a draft genome assembled from short-read data. Further sequencing or the integration of long reads would likely improve contiguity; nonetheless, this assembly represents a high-quality and biologically informative first reference genome for Xenia umbellata.
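The contiguity and composition metrics reported by QUAST and stats.sh can be approximated in a few lines of Python. The sketch below computes N50 and base composition from in-memory sequences. Expressing A/C/G/T and GC relative to unambiguous bases, and N relative to all positions, is an assumption inferred from the reported A/T/C/G percentages summing to 100%; it may not match BBMap's exact definitions.

```python
from collections import Counter


def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def base_composition(sequences):
    """A/C/G/T and GC as fractions of unambiguous bases; N as a fraction of all positions."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    total = sum(counts.values())
    comp = {b: counts[b] / acgt for b in "ACGT"}
    comp["GC"] = comp["G"] + comp["C"]
    comp["N"] = counts["N"] / total
    return comp


print(n50([100, 80, 50, 30, 20]))
print(base_composition(["ACGTNN", "aatt"]))
```

In the toy example, the two longest sequences (100 + 80 bp) are the first to reach half of the 280 bp total, so the N50 is 80.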
Prior to annotation, repetitive elements in the X. umbellata assembly were identified and classified de novo using RepeatModeler (v2.0.7; Flynn et al. 2020). Repeats were then quantified and soft-masked across the assembly with RepeatMasker (v4.2.1; Smit et al. 2013–2015), using both the custom RepeatModeler library and the Dfam/RepBase databases as references. The Xenia umbellata genome was then annotated with the funannotate (v1.8.17) pipeline. Gene prediction combined the ab initio predictors AUGUSTUS (v3.5.0; Stanke et al. 2006), GeneMark-ES (v4.71; Ter-Hovhannisyan et al. 2008), SNAP (v2006-07-28; Korf 2004), and GlimmerHMM (v3.0.4; Majoros, Pertea, and Salzberg 2004) with protein homology evidence generated by aligning NCBI RefSeq invertebrate proteins (downloaded July 2025) to the soft-masked genome with DIAMOND (Buchfink, Xie, and Huson 2015) and Exonerate (Slater and Birney 2005). Consensus gene models were generated with EVidenceModeler (Haas et al. 2008), and functional annotation incorporated InterProScan (Jones et al. 2014), Pfam (Mistry et al. 2021), UniProt (v2025_03), MEROPS (v12.5; Rawlings et al. 2018), and dbCAN (v13; Zheng et al. 2023).
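Soft-masking lowercases repeat bases instead of replacing them with N. The underlying sequence is therefore preserved, while tools that honor soft-masking can treat those regions differently. The minimal sketch below shows the operation and how the masked fraction of an assembly can be obtained by counting lowercase bases. It assumes repeat coordinates given as hypothetical 0-based, half-open intervals; RepeatMasker itself reports 1-based, inclusive coordinates.

```python
def soft_mask(seq, intervals):
    """Lowercase bases inside 0-based, half-open [start, end) repeat intervals."""
    # Start from an unmasked (uppercase) copy so that existing masking is reset.
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def masked_fraction(sequences):
    """Fraction of all positions that are soft-masked (lowercase)."""
    masked = sum(sum(1 for ch in s if ch.islower()) for s in sequences)
    total = sum(len(s) for s in sequences)
    return masked / total if total else 0.0


masked = soft_mask("ACGTACGTAC", [(2, 5)])
print(masked, masked_fraction([masked]))
```

Applied to a whole assembly, the same lowercase count gives the percentage of the genome flagged as repetitive.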
Annotation of the Xenia umbellata genome with funannotate predicted 21,596 genes, comprising 20,844 protein-coding genes (mRNAs) and 752 tRNA genes, with an average gene length of ~3.1 kb and an average protein length of 399 aa. The annotation comprised 142,785 exons across 16,289 multi-exon and 4,555 single-exon transcripts. Functional annotation assigned GO terms to 11,909 genes and produced 14,275 InterProScan annotations, 11,452 Pfam domain annotations, 692 MEROPS protease annotations, and 197 CAZyme annotations. In addition, 1,143 genes were assigned common names based on similarity to UniProt proteins.
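Summary counts like those above (genes, mRNAs, tRNAs, exons, and single- versus multi-exon transcripts) can be derived from a GFF3 annotation. The sketch below assumes a funannotate-style hierarchy in which each exon points to its transcript through the `Parent` attribute. The toy records are hypothetical, and funannotate's own statistics may be computed differently.

```python
from collections import Counter, defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=g1.t1;Parent=g1' into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if "=" in item)


def summarize_gff3(lines):
    """Count feature types and classify mRNAs as multi- or single-exon."""
    feature_counts = Counter()
    exons_per_transcript = defaultdict(int)
    mrna_ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives such as ##gff-version
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        ftype, attrs = cols[2], parse_attributes(cols[8])
        feature_counts[ftype] += 1
        if ftype == "mRNA":
            mrna_ids.add(attrs["ID"])
        elif ftype == "exon":
            # An exon may belong to several transcripts (comma-separated Parent values).
            for parent in attrs.get("Parent", "").split(","):
                exons_per_transcript[parent] += 1
    multi = sum(1 for t in mrna_ids if exons_per_transcript[t] > 1)
    single = sum(1 for t in mrna_ids if exons_per_transcript[t] == 1)
    return feature_counts, multi, single


rows = [
    ["ctg1", "funannotate", "gene", "1", "500", ".", "+", ".", "ID=g1"],
    ["ctg1", "funannotate", "mRNA", "1", "500", ".", "+", ".", "ID=g1.t1;Parent=g1"],
    ["ctg1", "funannotate", "exon", "1", "200", ".", "+", ".", "ID=g1.t1.exon1;Parent=g1.t1"],
    ["ctg1", "funannotate", "exon", "300", "500", ".", "+", ".", "ID=g1.t1.exon2;Parent=g1.t1"],
    ["ctg1", "funannotate", "gene", "800", "1100", ".", "-", ".", "ID=g2"],
    ["ctg1", "funannotate", "mRNA", "800", "1100", ".", "-", ".", "ID=g2.t1;Parent=g2"],
    ["ctg1", "funannotate", "exon", "800", "1100", ".", "-", ".", "ID=g2.t1.exon1;Parent=g2.t1"],
    ["ctg1", "funannotate", "gene", "1500", "1572", ".", "+", ".", "ID=g3"],
    ["ctg1", "funannotate", "tRNA", "1500", "1572", ".", "+", ".", "ID=g3.t1;Parent=g3"],
    ["ctg1", "funannotate", "exon", "1500", "1572", ".", "+", ".", "ID=g3.t1.exon1;Parent=g3.t1"],
]
gff_lines = ["##gff-version 3"] + ["\t".join(r) for r in rows]
counts, multi_exon, single_exon = summarize_gff3(gff_lines)
print(dict(counts), multi_exon, single_exon)
```

As in the reported annotation, the gene total equals mRNAs plus tRNAs, and multi- plus single-exon transcripts together account for every mRNA.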
2.4. Pairwise Genome Comparison with a Closely Related Ovabunda Species
Pairwise genome comparisons using PyANI-plus (ANIb method; v0.0.1; Pritchard et al. 2015) yielded an average nucleotide identity (ANI) of 93.2% when X. umbellata was aligned to Ovabunda sp. and 92.1% for the reciprocal comparison, with a mean ANI of 92.7%. Alignment coverage was asymmetric: 70.8% of the X. umbellata draft genome (107,045,860 bp aligned) aligned to Ovabunda sp., whereas 63.6% of the Ovabunda sp. assembly (141,702,152 bp aligned) aligned to X. umbellata. Transformed ANI (tANI) values were 0.415 and 0.535, respectively (mean = 0.475), consistent with substantial genomic divergence. These results indicate that, while a large portion of the genome (~64–71%) is shared at ~93% identity, considerable lineage-specific sequence divergence remains, consistent with intergeneric genomic differentiation. Next, we estimated the X. umbellata genome size using GenomeScope2 (v2.0; Ranallo-Benavidez, Jaron, and Schatz 2020). The k-mer histogram used as input was generated with jellyfish (v2.3.1; Marçais and Kingsford 2011) using a k-mer length of 21 (-m 21). Based on 21-mer frequencies, GenomeScope2 estimated the X. umbellata genome size at ~171.7 Mbp (range 171.6–171.9 Mbp; model fit = 93.15%). This estimate is 25.3 Mbp smaller than the reported genome size estimate for the related Ovabunda sp. (~197 Mbp). The difference is likely driven by repeat content (the “repeatome”): GenomeScope2 estimated that the X. umbellata genome is approximately 38.1% repetitive (≈ 66 Mbp), whereas the Ovabunda sp. genome is 46.2% repetitive (≈ 91 Mbp) (Hu et al. 2020).
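The reported tANI values are reproduced, to three decimals, by tANI = -ln(ANI × alignment coverage), with both quantities expressed as proportions. Assuming this is the transformation PyANI-plus applies, the calculation is sketched below. The function name `tani` is illustrative, and the inputs are the ANI and coverage values given in the text.

```python
import math


def tani(ani: float, coverage: float) -> float:
    """Transformed ANI: -ln(ANI x aligned fraction), both given as proportions in (0, 1]."""
    return -math.log(ani * coverage)


# X. umbellata aligned to Ovabunda sp.: coverage from aligned bp / total assembly length.
xu_vs_ov = tani(0.932, 107_045_860 / 151_140_580)
# Ovabunda sp. aligned to X. umbellata: coverage as reported (63.6%).
ov_vs_xu = tani(0.921, 0.636)
mean_tani = (xu_vs_ov + ov_vs_xu) / 2

# Approximately 0.415, 0.535, and 0.475, matching the reported values.
print(round(xu_vs_ov, 3), round(ov_vs_xu, 3), round(mean_tani, 3))
```

Because tANI folds coverage into the identity term, it rises both when aligned regions diverge and when less of the genome aligns at all. This is why the reciprocal comparison with lower coverage yields the larger value.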
Repeat annotation with RepeatModeler and RepeatMasker identified 27.8% of the X. umbellata assembly (~42.1 Mbp) as repetitive (26.97% interspersed repeats), dominated by unclassified elements (20.6%). Classified transposable elements comprised 6.4% of the assembly (LINEs 2.6%, LTRs 1.5%, and DNA transposons 2.3%). This assembly-based estimate is lower than the k-mer-based estimate (38.1%), likely because repetitive regions are often collapsed or left unassembled in short-read assemblies. Transposable element expansions are a major driver of genomic divergence between species and have been reported to facilitate genome evolution in diverse eukaryotes (Castro et al. 2024; Pluess et al. 2016; Shah, Hoffman, and Schielzeth 2020). Future efforts should build on this baseline characterization of the X. umbellata repeatome to assess changes in element abundance and diversity that may signal genomic adaptation to novel environments during its continued spread (C. C. Lee and Wang 2018; Mérel et al. 2021).
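Per-class repeat percentages like those above can be tallied from a RepeatMasker `.out` table. The sketch below assumes the standard whitespace-delimited layout: header lines, then one hit per line with query begin/end in the sixth and seventh fields (1-based, inclusive) and class/family in the eleventh. It sums hit lengths per top-level class. Overlapping hits would be double-counted, so this is only an approximation of RepeatMasker's own summary table. The example lines and `rnd-*` repeat names are hypothetical.

```python
from collections import defaultdict


def repeat_class_percentages(out_lines, genome_length):
    """Percent of the genome covered by each top-level repeat class (e.g., LINE, DNA)."""
    covered = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # skip header and blank lines; data lines start with an integer SW score
        begin, end = int(fields[5]), int(fields[6])
        top_class = fields[10].split("/")[0]  # 'LINE/L2' -> 'LINE'
        covered[top_class] += end - begin + 1  # inclusive coordinates
    return {cls: 100 * bp / genome_length for cls, bp in covered.items()}


example_out = [
    "   SW   perc perc perc  query     position in query    matching   repeat     position in repeat",
    "score   div. del. ins.  sequence  begin  end  (left)   repeat     class/family  begin end (left) ID",
    "",
    " 1200  12.5  2.0  1.0  ctg1   101   400  (9600)  +  rnd-1_family-5  LINE/L2     1  300   (0)  1",
    "  900  15.0  1.0  0.5  ctg1  1001  1200  (8800)  C  rnd-2_family-9  DNA/hAT    (0)  200    1  2",
    " 2000  20.0  3.0  2.0  ctg1  2001  2500  (7500)  +  rnd-3_family-1  Unknown     1  500   (0)  3",
]
print(repeat_class_percentages(example_out, genome_length=10_000))
```

Grouping by the part of the class/family label before the slash mirrors how the text reports LINEs, LTRs, DNA transposons, and unclassified elements.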
2.5. X. umbellata Genome Heterozygosity: A Clue of Invasion Origin?
Genome-wide heterozygosity is hypothesized to provide insight into the adaptive and invasive potential of species (Kołodziejczyk et al. 2025). For example, the marbled crayfish (Procambarus virginalis), an emerging invasive species, possesses a triploid genome with high heterozygosity that is thought to facilitate its ecological success and spread (Gutekunst et al. 2018). In this context, estimating genome-wide heterozygosity for an X. umbellata specimen collected during the early phase of an invasion provides a baseline for future comparisons, offers preliminary insight into adaptive capacity, and may help infer source populations: high diversity could indicate a wild origin, whereas reduced diversity might reflect bottlenecks associated with aquaculture or the aquarium trade. GenomeScope2 estimated the heterozygosity of the early-invasion X. umbellata genome at ~1.3%. This value falls within the range previously reported for cnidarians (0.79–1.96%; Locatelli and Baums 2024; Shinzato et al. 2021; Stephens et al. 2022; Young et al. 2024; Yu et al. 2022) and is slightly higher than values reported for octocorals (0.73–1.2%; Ip et al. 2023; Ledoux et al. 2025). Although this estimate is based on a single individual, it tentatively suggests that the X. umbellata population introduced to Puerto Rican reefs retains relatively high genomic diversity, consistent with high adaptive potential and a wild origin.
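Heterozygosity can be estimated from k-mer frequencies alone because a single heterozygous site alters every k-mer that overlaps it. Those k-mers appear at roughly half the coverage of homozygous k-mers, forming a separate peak in the k-mer spectrum. The sketch below computes only the simple probability that a k-mer overlaps at least one heterozygous site, 1 - (1 - h)^k, assuming independent sites. It is not GenomeScope2's full mixture model. The heterozygosity values are taken from the ranges in the text.

```python
def fraction_kmers_with_het_site(heterozygosity: float, k: int) -> float:
    """P(a k-mer overlaps >= 1 heterozygous site) = 1 - (1 - h)^k, assuming independent sites."""
    return 1 - (1 - heterozygosity) ** k


# Octocoral lower bound, this study, and cnidarian upper bound, evaluated at k = 21.
for h in (0.0073, 0.013, 0.0196):
    print(f"h = {h:.2%}: {fraction_kmers_with_het_site(h, 21):.1%} of 21-mers affected")
```

At ~1.3% heterozygosity, roughly a quarter of 21-mers overlap a heterozygous site. This is why the heterozygous peak is prominent enough in the spectrum for the model to fit.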
2.6. Characterization of Xenia umbellata Dinoflagellate Symbionts
An additional goal of this study was to provide genomic resources for the dinoflagellate symbionts (Family Symbiodiniaceae) associated with Xenia umbellata in Puerto Rico. This baseline knowledge is critical for tracking holobiont adaptation to the region via symbiont switching as X. umbellata continues to spread (Creed, Brown, and Skelton 2022; Sørensen et al. 2021). To conservatively identify Symbiodiniaceae scaffolds in the metagenome, we first used minimap2 to align non-xeniid sequences to a Symbiodiniaceae reference database containing all genomes publicly available on NCBI as of March 2025. Candidate scaffolds were then validated by re-alignment to the same database with BLASTn, and sequences without a BLASTn alignment were removed. Validated scaffolds were assigned genus-level taxonomy according to their best-matching reference genome. In total, 555,596 scaffolds (200–25,187 bp in length) were identified as Symbiodiniaceae, with nucleotide identities ranging from 95% to 100%. Of these, 555,520 (>99.9%) aligned to representative genomes of the genus Durusdinium, suggesting that the co-invading symbiont belongs to Durusdinium. In the Red Sea, previous work reported X. umbellata in symbiosis with Durusdinium at only one of ten sites surveyed, a shallow site at Ras Mohammed (Osman et al. 2020). Our observation of Durusdinium as the dominant symbiont in this Puerto Rican individual suggests that X. umbellata may have retained, during the invasion, a symbiosis previously documented in its native range. This finding provides critical baseline knowledge of the X. umbellata–Durusdinium association, enabling future efforts to track holobiont adaptation and monitor the spread of invasive symbionts.
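Genus-level assignment by best-matching reference, as described above, can be sketched as follows. For each scaffold, keep the BLASTn hit with the highest bitscore, map that reference sequence to its genus, and tally the results. This assumes `-outfmt 6` output with default columns. The reference-to-genus map and the sequence names (`Dtre_scaf1`, `Cgor_scaf9`, `scaf_*`) are hypothetical; this is not the authors' pipeline.

```python
import csv
from collections import Counter
from io import StringIO


def best_hits(blast_tsv):
    """Keep the highest-bitscore hit per query from BLAST+ -outfmt 6 (default columns)."""
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid, bitscore = row[0], row[1], float(row[11])
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return best


def genus_counts(best, subject_to_genus):
    """Tally best hits per genus; subjects missing from the map are labeled 'unassigned'."""
    return Counter(subject_to_genus.get(sseqid, "unassigned") for sseqid, _ in best.values())


# Hypothetical reference-sequence-to-genus map and BLAST output.
subject_to_genus = {"Dtre_scaf1": "Durusdinium", "Cgor_scaf9": "Cladocopium"}
blast_output = StringIO(
    "scaf_1\tDtre_scaf1\t99.1\t800\t7\t0\t1\t800\t10\t809\t0.0\t1400\n"
    "scaf_1\tCgor_scaf9\t95.2\t300\t14\t0\t1\t300\t50\t349\t1e-120\t450\n"
    "scaf_2\tDtre_scaf1\t97.5\t400\t10\t0\t1\t400\t900\t1299\t1e-180\t650\n"
    "scaf_3\tCgor_scaf9\t96.0\t250\t10\t0\t1\t250\t70\t319\t1e-100\t380\n"
)
counts = genus_counts(best_hits(blast_output), subject_to_genus)
print(counts)

# Share of Symbiodiniaceae scaffolds assigned to Durusdinium, from the counts in the text.
print(f"{555_520 / 555_596:.2%}")
```

Using the single best hit per scaffold keeps each scaffold from being counted toward more than one genus when it also aligns, more weakly, to another reference.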
Acknowledgments
We thank Dr. Catherine S. McFadden for her continued consultation on xeniids. We also thank Dr. Nilda Jimenez-Marrero (DNER-PR), Dr. Hector Ruiz-Torres (HJR Reefscaping), and the Department of Marine Sciences at UPRM for field and logistical support. The Department of Biology at UPRM provided consumables and lab space for molecular work. Molecular work and sequencing were personally funded by AJV and DATR. We acknowledge the Puerto Rico Department of Natural and Environmental Resources for granting the sampling permit (O-VS-PVS15-SJ-01444-14032024) used in this study. Finally, we acknowledge the Florida Department of Environmental Protection (Award No. C1E0A5) for funding the Linux workstation used to conduct all computational analyses.
Data accessibility
Raw reads can be accessed through the NCBI Sequence Read Archive (SRR34963192). This genome assembly project has been deposited at DDBJ/ENA/GenBank under the accession GCA_021976095.1, and the assembly can also be downloaded from Zenodo (https://zenodo.org/records/17287826). The data are linked to NCBI BioSample SAMN50581036 and BioProject PRJNA1304975. All Symbiodiniaceae sequences and metadata are available for download on Zenodo (10.5281/zenodo.16849361).
[Figure: close-up view]
[Figure: snail plot illustrating scaffold metrics, completeness, …]