Genetic diversity and genomic epidemiology of SARS-CoV-2 during the first 3 years of the pandemic in Morocco: comprehensive sequence analysis, including the unique lineage B.1.528 in Morocco

S. Djorwé, A. Malki, N. Nzoyikorera, J. Nyandwi, S. P. Zebsoubo, K. Bellamine and A. Bousfiha

ABSTRACT
During the 3 years following the emergence of the COVID-19 pandemic, the African continent, like other regions of the world, was substantially impacted by the disease. In Morocco, the pandemic was marked by the emergence and spread of several SARS-CoV-2 variants, leading to a substantial increase in infections and deaths. Nevertheless, understanding of the genetic diversity, evolution, and epidemiology of several viral lineages remained limited in Morocco. This study sought to deepen the understanding of the genomic epidemiology of SARS-CoV-2 through a retrospective analysis. The main objective was to analyse the genetic diversity of SARS-CoV-2, identify distinct lineages, and assess their evolution during the pandemic in Morocco using genomic epidemiology approaches. Furthermore, several key mutations in the functional proteins of different viral lineages were highlighted, and the genetic relationships among these strains were analysed to better understand their evolutionary pathways. A total of 2274 genomic sequences of SARS-CoV-2 isolated in Morocco from 2020 to 2023 were extracted from the GISAID EpiCoV database and analysed. Lineages and clades were classified according to the GISAID, Nextstrain, and Pangolin nomenclatures. The study was conducted and reported in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines. Analysis of the 2274 genomic sequences identified 157 PANGO lineages, including notable lineages such as B.1, B.1.1, B.1.528, and B.1.177, as well as variants such as B.1.1.7, B.1.621, B.1.525, B.1.351, B.1.617.1, and B.1.617.2 with its notable sublineages AY.33, AY.72, AY.112, and AY.121, which evolved over time before being supplanted by Omicron in December 2021. Among the 2274 sequences analysed, Omicron and its subvariants had a prevalence of 59.5%.
The most predominant clades were 21K, 21L, and 22B, which correspond phylogenetically to BA.1, BA.2, and BA.5, respectively. In June 2022, Morocco saw a rapid resurgence of infections, with the emergence and co-circulation of clade 22B subvariants such as BA.5.2.20, BA.5, BA.5.1, BA.5.2.1, and BF.5, supplanting BA.1 (clade 21K) and BA.2 (clade 21L), which became marginal. In contrast, XBB (clade 22F) and its progeny, such as XBB.1.5 (23A), XBB.1.16 (23B), CH.1.1 (23C), XBB.1.9 (23D), XBB.2.3 (23E), EG.5.1 (23F), and XBB.1.5.70 (23G), were detected sporadically. Furthermore, several notable mutations, such as H69del/V70del, G142D, K417N, T478K, E484K, E484A, L452R, F486P, N501Y, Q613H, D614G, and P681H/R, were identified. Some of these SARS-CoV-2 mutations are known to be involved in increased transmissibility, virulence, and antibody escape. This study identified several distinct lineages and mutations underlying the genetic diversity of Moroccan isolates and analysed their evolutionary trends. These findings provide a robust basis for better understanding the distinct mutations and their roles in the variation of transmissibility, pathogenicity, and antigenicity (immune evasion/reinfection). Furthermore, the noteworthy number of distinct lineages identified in Morocco highlights the importance of maintaining continuous surveillance of COVID-19. Moreover, expanding vaccination coverage would help protect patients against more severe clinical disease.

Methods

Study design: This study was conducted and reported in accordance with the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines [10].

Sequence data acquisition: The complete genomic sequences of SARS-CoV-2 isolates collected in Morocco from 2020 to 2023 were extracted in FASTA format from the GISAID EpiCoV database (https://gisaid.org/, accessed on 23 January 2024) [11]. The genomic sequences of the Moroccan isolates were compared with the Wuhan-Hu-1 reference genome (GenBank accession number NC_045512.2).

File S1 contains the digital object identifier (DOI) and EPI_SET identifier of the 2274 SARS-CoV-2 genomic sequences used in this study. The collection dates range from 2 February 2020 to 3 November 2023.
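The acquisition step can be illustrated with a minimal sketch that reads FASTA records and keeps only those collected within the study window. It is not the authors' pipeline: it assumes each header has the form `name|accession|YYYY-MM-DD`, a layout commonly seen in GISAID exports, and the function names and toy records are hypothetical.

```python
from datetime import date
from typing import Dict, Iterator, Tuple

# Study window taken from the collection dates reported above.
START, END = date(2020, 2, 2), date(2023, 11, 3)


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collection_date(header: str) -> date:
    """Parse the date from an ASSUMED 'name|accession|YYYY-MM-DD' header.

    Incomplete dates (e.g. '2021-03') raise ValueError and would need
    separate handling in real data.
    """
    return date.fromisoformat(header.split("|")[2])


def count_by_year(text: str) -> Dict[int, int]:
    """Count records per collection year, restricted to the study window."""
    counts: Dict[int, int] = {}
    for header, _sequence in read_fasta(text):
        collected = collection_date(header)
        if START <= collected <= END:
            counts[collected.year] = counts.get(collected.year, 0) + 1
    return counts


# Hypothetical toy records, not real GISAID entries.
toy = (
    ">hCoV-19/Morocco/X1/2021|EPI_ISL_0000001|2021-03-05\nACGT\nACGT\n"
    ">hCoV-19/Morocco/X2/2024|EPI_ISL_0000002|2024-01-10\nAC\n"
)
yearly = count_by_year(toy)
```

Only the first toy record falls inside the window, so the second is excluded from the yearly counts.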

Sequence alignment and phylogenetic analysis of Moroccan genomes: In this study, we used standard dynamic classification systems to assign genetic lineages and viral clades. Genomic sequences were classified into lineages with the Pangolin COVID-19 lineage assigner version 4.3, the Phylogenetic Assignment of Named Global Outbreak LINeages tool developed by the Centre for Genomic Pathogen Surveillance (https://cov-lineages.org/resources/pangolin.html, accessed on 23 January 2024), and/or the Nextstrain web tool version 3.3.1 (https://clades.nextstrain.org/, accessed on 23 January 2024) [12–14]. Quality checks and the assignment of viral clades were performed using Nextclade Web and GISAID. The phylogenetic tree was generated using the UCSC UShER web interface (https://genome.ucsc.edu/util.html, accessed on 25 January 2024), Microreact (https://microreact.org/upload, accessed on 3 February 2024), and Nextclade. Viral clades were defined based on shared mutation profiles among the analysed genomic sequences [14, 15].
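Once lineages have been assigned, they can be tallied from the assigner's report. The sketch below is an illustration only: it assumes a Pangolin-style CSV report with `taxon`, `lineage`, and `qc_status` columns, and the function name and toy rows are hypothetical.

```python
import csv
import io
from collections import Counter

# A report of this shape could be produced by a command along the lines of
#   pangolin sequences.fasta --outfile lineage_report.csv
# (file names are placeholders; check the options of the installed version).


def lineage_counts(report_csv: str, require_pass: bool = True) -> Counter:
    """Count sequences per lineage in a Pangolin-style CSV report.

    Assumes 'lineage' and (optionally) 'qc_status' columns. Rows whose
    qc_status is not 'pass' are skipped when require_pass is True.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(report_csv)):
        if require_pass and row.get("qc_status", "pass") != "pass":
            continue
        counts[row["lineage"]] += 1
    return counts


# Hypothetical toy report, not real study data.
toy_report = (
    "taxon,lineage,qc_status\n"
    "seq1,B.1.528,pass\n"
    "seq2,BA.5.2.20,pass\n"
    "seq3,B.1.528,pass\n"
    "seq4,Unassigned,fail\n"
)
counts = lineage_counts(toy_report)
n_lineages = len(counts)  # number of distinct lineages among passing rows
```

Applied to a full report, the length of the resulting counter gives the number of distinct lineages, which is the kind of figure reported as 157 PANGO lineages in this study.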

Analysis of mutation profiles and assignment of lineages and clades: The GISAID database and the Nextclade and Coronapp (http://giorgilab.unibo.it/coronannotator/, accessed on 20 February 2024) [16] web tools were used to detect and annotate all mutations, thus establishing the single nucleotide polymorphism (SNP) profile of the 2274 genomic sequences. This was achieved by identifying amino acid substitutions, deletions, and insertions (indels) in structural protein regions, as well as in some regions of the non-structural proteins encoded by ORF1ab (NSP1 to NSP16). Furthermore, several international reference tools and platforms were used to assign SARS-CoV-2 genomic lineages and clades. The GISAID platform was used to assign lineages and clades. The Nextclade web tool was used to align sequences and identify specific mutations in comparison with the Wuhan-Hu-1 reference sequence, as well as for the phylogenetic placement of lineages. Nextclade was also used to assign sequences to lineages and clades according to their specific mutational characteristics. Pangolin, which is available both as a web application and as a command-line tool from the Cov-Lineages website, was used to assign SARS-CoV-2 lineages according to the PANGO nomenclature. Together, these approaches provided a detailed analysis of the 2274 genomic sequences, including their phylogenetic placement and assignment to clades [11, 13, 14].
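To make the idea of a mutation profile concrete, the following sketch parses amino acid mutation labels written in the notation used in this article (for example N501Y, H69del, or the shorthand P681H/R) and computes how often each mutation occurs across a set of sequences. It is a simplified illustration, not the annotation logic of GISAID, Nextclade, or Coronapp; all names and the toy profiles are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class AAMutation(NamedTuple):
    ref: str  # reference amino acid (Wuhan-Hu-1)
    pos: int  # position in the protein
    alt: str  # substituted amino acid, or "del" for a deletion


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z]|del)$")


def parse_mutation(label: str) -> AAMutation:
    """Parse a single label such as 'N501Y' or 'H69del'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    return AAMutation(match.group(1), int(match.group(2)), match.group(3))


def expand(label: str) -> List[AAMutation]:
    """Expand shorthand: 'P681H/R' -> P681H, P681R; 'H69del/V70del' -> both."""
    head, *rest = label.split("/")
    first = parse_mutation(head)
    out = [first]
    for part in rest:
        if _LABEL.match(part):
            out.append(parse_mutation(part))  # a full label after the slash
        else:
            out.append(AAMutation(first.ref, first.pos, part))  # alternative residue
    return out


def prevalence(profiles: Iterable[Iterable[str]]) -> Dict[AAMutation, float]:
    """Fraction of sequences carrying each mutation (one profile per sequence)."""
    counts: Counter = Counter()
    n = 0
    for profile in profiles:
        n += 1
        counts.update({parse_mutation(label) for label in profile})
    return {mutation: c / n for mutation, c in counts.items()} if n else {}


# Hypothetical toy profiles, not real study data.
toy_profiles = [["D614G", "N501Y"], ["D614G"]]
freqs = prevalence(toy_profiles)
```

Using a set per profile ensures that a mutation listed twice for the same sequence is counted once.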

Results

Genomic diversity and demographic distribution of SARS-CoV-2 sequences: A set of 2274 genomic sequences of SARS-CoV-2 collected in Morocco over the first 3 years of the pandemic was analysed, revealing several variants and lineages. Table 1 shows the temporal distribution of variants and lineages among the 2274 sequences analysed. Of the 2274 sequences, 3.9% (89/2274) were sequenced in 2020, 22.4% (511/2274) in 2021, 53.5% (1217/2274) in 2022, and 20.1% (457/2274) in 2023. Among the 2274 sequences analysed, 20.2% (460/2274) were assigned to lineages other than the Alpha, Beta, Delta, Eta, Kappa, Mu, and Omicron variants. Of these 460 sequences, 19.3% (89/460) were identified in 2020, 40% (184/460) in 2021, 4.1% (19/460) in 2022, and 36.5% (168/460) in 2023. The Alpha variant had a prevalence of 7.7% (176/2274) among all sequences analysed, of which 81.2% (143/176) were detected in 2021, 5.1% (9/176) in 2022, and 13.6% (24/176) in 2023. The Delta variant and its subvariants accounted for 11.3% (257/2274) of the analysed sequences, among which 62.6% (161/257) were identified in 2021, 28.4% (73/257) in 2022, and 8.9% (23/257) in 2023. The Omicron variant and its subvariants were identified in 59.5% (1353/2274) of all sequences analysed, of which 1.1% (15/1353) were identified in 2021, 82.3% (1114/1353) in 2022, and 16.5% (224/1353) in 2023 (Table 1). The other variants, namely Beta, Eta, Kappa, and Mu, were less predominant. These findings illustrate the dynamic evolution of SARS-CoV-2 variants and lineages during the pandemic, highlighting the periods in which each variant or lineage predominated.
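The proportions above can be recomputed from the reported counts with a short sketch. The counts are copied from the text; the dictionary layout and function names are hypothetical, and percentages computed with standard rounding may differ in the last digit from the reported values if those were truncated.

```python
from typing import Dict

TOTAL = 2274  # total number of sequences analysed

# Per-year counts as reported in the text.
COUNTS: Dict[str, Dict[int, int]] = {
    "Other lineages": {2020: 89, 2021: 184, 2022: 19, 2023: 168},
    "Alpha": {2021: 143, 2022: 9, 2023: 24},
    "Delta": {2021: 161, 2022: 73, 2023: 23},
    "Omicron": {2021: 15, 2022: 1114, 2023: 224},
}


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


def summarise(counts: Dict[str, Dict[int, int]], total: int) -> Dict[str, dict]:
    """Group size, share of all sequences, and within-group share per year."""
    summary = {}
    for name, by_year in counts.items():
        n = sum(by_year.values())
        summary[name] = {
            "n": n,
            "pct_of_total": share(n, total),
            "pct_by_year": {year: share(c, n) for year, c in by_year.items()},
        }
    return summary


summary = summarise(COUNTS, TOTAL)

# Sequences not covered by the four groups above, i.e. the less
# predominant Beta, Eta, Kappa, and Mu variants combined.
remaining = TOTAL - sum(group["n"] for group in summary.values())
```

The four groups sum to 2246 sequences, which leaves 28 sequences for the Beta, Eta, Kappa, and Mu variants combined.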

 

Conclusion: This study provided a detailed analysis of the genomic epidemiology and genetic diversity of SARS-CoV-2 lineages identified in Morocco during the first 3 years of the pandemic, enabling a better understanding of the evolution and phylogenetic relationships among different lineages. With the exception of lineage B.1.528, the lineages identified in Morocco were closely related to those observed worldwide before their local spread, highlighting the impact of human mobility on the introduction and spread of these lineages during the pandemic. Viral dynamics in Morocco, characterized by a predominance of the Alpha, Delta, and Omicron variants and their subvariants, reflected global trends in their evolution. However, the epidemiological trends of some Delta and Omicron subvariants differed from those observed in other countries. Additionally, several key mutations identified within the lineages analysed were correlated with variations in transmissibility, pathogenicity, and antigenicity, which could have affected vaccine efficacy and pandemic management. Nevertheless, the establishment of the SARS-CoV-2 genomic surveillance consortium in Morocco and the vaccination campaigns have contributed to controlling and reducing infection rates and severe forms of COVID-19, thus mitigating the impact of infections at the national level.
