Introduction
The Caspian seal (Pusa caspica), first described by Gmelin in 1788, is the only endemic mammal of the Caspian Sea, where it thrives in brackish conditions (Ranjbar Jafarabadi et al. 2021). Alongside the Baikal seal (Pusa sibirica), it is one of only two completely landlocked species in the Phocidae family. However, the Caspian seal faces mounting threats from climate change, natural processes occurring in the Caspian Sea, habitat degradation and pollution (Rozhnov et al. 2022; S. Goodman and Dmitrieva 2016; Ranjbar Jafarabadi et al. 2021). Over the past century, its population has declined by more than 70%. In recent years, the number of seals has remained at a relatively stable level of about 302 thousand individuals (Sidorov et al. 2023). As a result, the species is currently classified as Endangered (EN) on the IUCN Red List of Threatened Species (S. Goodman and Dmitrieva 2016).
Although Caspian seals are considered a single population distributed throughout the Caspian Sea, genetic data on the species remain scarce (Arnason et al. 2006; Palo and Väinölä 2006), and no comprehensive population genetics study has been conducted to date. The Caspian Sea represents one of the most extreme environments inhabited by any pinniped, with air temperatures ranging from −35°C in winter to +40°C in summer (S. J. Goodman 2018). Water depths vary dramatically, from less than 1 meter in large river deltas to 1,000 meters in the central and southern basins (S. J. Goodman 2018). These harsh conditions have driven several evolutionary adaptations in Caspian seals, including distinct phenotypic traits such as large eyes, robust masticatory muscles, and a unique tooth structure (Endo et al. 2002; S. J. Goodman 2018).
Methods
A muscle tissue sample was obtained from the naturally injured forelimb of a wild Pusa caspica specimen and preserved in 96% ethanol. The sample was collected during a veterinary procedure in which a torn muscle fragment was surgically removed as part of the animal’s treatment. The specimen was sampled in the waters of the Republic of Kazakhstan in November 2020 during a scientific campaign conducted within the framework of the Russia-Kazakhstan “Program for the Study of the Caspian Seal of the Northern Caspian (2019-2023)”. Total DNA was extracted using the Qiagen MagAttract HMW DNA extraction kit (Dobra). Whole-genome sequencing was performed at Macrogen Inc. (Korea) using a TruSeq Nano DNA Illumina library preparation kit, followed by paired-end sequencing (2 × 150 bp) on an Illumina NovaSeq 6000 instrument. Raw sequencing reads were adapter trimmed and quality filtered using Trimmomatic v.0.38 (Bolger, Lohse, and Usadel 2014), and read quality was checked with FastQC before and after trimming. The genome assembly was generated with ABySS 2.0 (Jackman et al. 2017) from the paired and unpaired trimmed reads, using a k-mer length of 96.
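As an illustration of the assembly step only, the following Python sketch builds an abyss-pe command line with a k-mer length of 96 and both paired and unpaired reads as input. It is not the exact command used in this study: the assembly name and all file names are hypothetical, no other ABySS options are shown, and the function only constructs the argument list without running anything.

```python
import shlex


def abyss_pe_command(name, k, paired, unpaired=()):
    """Return an abyss-pe argument list (nothing is executed here).

    name     -- prefix for the output files (hypothetical value below)
    k        -- k-mer length; the text above reports 96
    paired   -- the two files holding the trimmed paired reads
    unpaired -- files holding reads that lost their mate during trimming
    """
    if len(paired) != 2:
        raise ValueError("expected exactly two paired-read files")
    # abyss-pe takes make-style variables: in= for paired reads,
    # se= for single-end (unpaired) reads.
    cmd = ["abyss-pe", f"name={name}", f"k={k}", "in=" + " ".join(paired)]
    if unpaired:
        cmd.append("se=" + " ".join(unpaired))
    return cmd


# Hypothetical file names, for illustration only.
cmd = abyss_pe_command(
    "pusa_caspica",
    96,
    ("trimmed_1P.fastq.gz", "trimmed_2P.fastq.gz"),
    ("trimmed_1U.fastq.gz", "trimmed_2U.fastq.gz"),
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a system where ABySS is installed; resource-related settings would need to be chosen for the machine at hand.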
Results
The genome assembly yielded a total sequence length of 2.36 Gbp over 442,199 scaffolds and a scaffold N50 length of 48,263 bp.
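For readers unfamiliar with the N50 statistic, the sketch below shows one common way to compute it: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly length. The function names and the toy FASTA records are illustrative assumptions, not part of the study's pipeline; the reported values were not produced with this code.

```python
def scaffold_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; store the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example with made-up sequences (not real data).
toy_fasta = [
    ">scaffold_1", "ACGTACGTAC",
    ">scaffold_2", "ACGTAC",
    ">scaffold_3", "ACGT", "A",
    ">scaffold_4", "ACGT",
    ">scaffold_5", "ACG",
    ">scaffold_6", "AC",
]
toy_lengths = scaffold_lengths(toy_fasta)
toy_n50 = n50(toy_lengths)
```

In the toy example the scaffold lengths are 10, 6, 5, 4, 3 and 2 (30 bases in total); the two longest scaffolds together hold 16 bases, which passes the halfway mark of 15, so the N50 is 6.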
Data availability
Raw and assembled data are publicly available via NCBI (SRA and GenBank):
Raw genome data:
https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR32005400
Assembled genome:
https://www.ncbi.nlm.nih.gov/nuccore/JBLQUN010000000
Funding
The collection of material for the study was carried out within the framework of the Russia-Kazakhstan “Program for the Study of the Caspian Seal of the Northern Caspian (2019-2023)” with financial support from “Kazakhstan Agency of Applied Ecology” LLP and North Caspian Operating Company N.V. This research was funded by national funds through FCT – Fundação para a Ciência e a Tecnologia within the scope of the Strategic Funding UIDB/04423/2020 (https://doi.org/10.54499/UIDB/04423/2020), UIDP/04423/2020 (https://doi.org/10.54499/UIDP/04423/2020), and LA/P/0101/2020 (https://doi.org/10.54499/LA/P/0101/2020). Fundação para a Ciência e a Tecnologia also supported AGS (2023.07625.CEECIND), EF (CEECINST/00027/2021/CP2789/CT0003; https://doi.org/10.54499/CEECINST/00027/2021/CP2789/CT0003) and MLL (CEECINSTLA/00020/2022).