PI and Co-PIs: Ponce de León, Federico A.; Di Rocco, Florencia; Gutiérrez, Gustavo
F. Abel Ponce de León, Melina Anello, Gustavo Gutiérrez, Florencia Di Rocco.
A major goal of the Alpaca Research Foundation (ARF) is to support genetic research that would help alpaca owners and breeders by developing genetic tests that predict favorable traits, such as coat color and fiber quality, and prevent the birth of animals with disabilities, including choanal atresia and deafness. To this end, considerable work has been conducted to understand the structure of the alpaca genome, which genes control specific alpaca traits, and what variations in the structure of the genome are associated with variations in traits such as coat color. Dr. Kylie Munyard’s research group in Australia developed the first genetic test for alpacas. This test predicts the grey coat color and is based on one nucleotide variation in exon 3 of the KIT gene (Jones et al., 2019). Our study, described in some detail below, is examining structural variations in large segments of the genome, with the intention to further refine our understanding of the alpaca genome and to develop a genetic test that will predict the brown coat color and/or identify the genetic basis of the white coat color.
The advent of new and improved DNA sequencing technologies allows the sequencing of very large DNA fragments that average 20,000 to 30,000 nucleotides or more per fragment. This allows geneticists to compare the sequence of such fragments among animals within a species to assess differences. One type of these differences is known as structural variations (SVs), because they do not result from single-nucleotide changes but from changes in the length and orientation of genome fragments that range from 50 base pairs to, in some cases, millions of base pairs. When there is a loss (deletion), a fragment is missing; it may contain non-coding ("junk") DNA, regulatory sequences, partial gene sequences, or whole gene sequences. On the other hand, when there is a gain (insertion), a duplication of a fragment of DNA has occurred, or a new fragment from an unrelated part of the genome has been added. Similarly, a fragment whose sequence is oriented from A to Z in most animals may be inverted, in some animals, to run from Z to A. In addition, several or all of these variant types can occur together within the same genomic region. Due to the size of these variants, SVs contribute more than single-nucleotide variations, also known as single-nucleotide polymorphisms (SNPs), to animal diversity, evolution, and disease.
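As a rough illustration of the four SV types described above, the following Python sketch applies each one to a short toy sequence. The sequence, the positions, and the function names are hypothetical and chosen only for readability; real SVs are at least 50 base pairs long, and this is not the software used in our analysis. An inversion is modeled here as the reverse complement of the affected segment, which is how an inverted fragment reads when compared with the reference strand.

```python
# Toy model of structural variants on a DNA string (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def delete(seq, start, end):
    """Deletion: the segment seq[start:end] is lost."""
    return seq[:start] + seq[end:]


def insert(seq, pos, fragment):
    """Insertion: a new fragment is added at position pos."""
    return seq[:pos] + fragment + seq[pos:]


def duplicate(seq, start, end):
    """Tandem duplication: a copy of seq[start:end] follows the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def invert(seq, start, end):
    """Inversion: seq[start:end] is replaced by its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


reference = "GATTACAGGC"  # hypothetical 10-base "reference"

print("reference  ", reference)
print("deletion   ", delete(reference, 2, 6))
print("insertion  ", insert(reference, 4, "CCC"))
print("duplication", duplicate(reference, 2, 6))
print("inversion  ", invert(reference, 2, 6))
```

Applying the same inversion twice restores the original sequence, which is one simple way to check that the inversion step behaves as intended.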
Our work focuses on identifying SVs in alpacas. For this, we used the PromethION sequencing platform (Oxford Nanopore, UK) to sequence nine animal samples (three each of white, brown, and black coat color). Sequencing was conducted at BGI-Tech Labs (Hong Kong). Bioinformatic analysis was completed at the Minnesota Supercomputing Institute (MSI, University of Minnesota).
Our intent is to provide a simple account of our findings to make them accessible to a wide audience. Scientific details, including methodology, bioinformatics, and results, will be part of a scientific manuscript that will be peer reviewed and published when we finish the laboratory experiments necessary to validate our bioinformatic results.
Simple statistics demonstrated successful sequencing results. The average length of the fragments sequenced was 25,470 base pairs, and the N50 was 33,366 base pairs, meaning that half of all sequenced bases were contained in fragments at least that long. The number of fragments sequenced per sample was between 4.17 and 6.24 million, giving a sequencing depth of 51x on average, where 'x' denotes one full length of the alpaca genome, approximately 2.5 billion nucleotides. For practical purposes, this depth means that each position in the genome is covered, on average, by about 51 independently sequenced DNA fragments from the same animal, which allows us to assess the accuracy of the sequencing and to significantly reduce or eliminate sequencing errors.
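The two statistics quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in our analysis: the function names and the toy fragment lengths are hypothetical, and the figure of 5 million fragments is simply an illustrative value within the reported range of 4.17 to 6.24 million per sample. N50 is taken here in its usual sense: the length such that fragments at least that long contain half of all sequenced bases.

```python
def n50(lengths):
    """Return the length L such that fragments of length >= L
    together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def mean_depth(num_fragments, mean_fragment_length, genome_size):
    """Average depth = total sequenced bases / genome length."""
    return num_fragments * mean_fragment_length / genome_size


# Toy fragment lengths (hypothetical, not data from this study).
toy_lengths = [2, 3, 4, 5, 6, 10]
print("toy N50:", n50(toy_lengths))

# Rough depth check using figures from the text:
# mean fragment length 25,470 bp, genome of about 2.5 billion nucleotides,
# and an assumed 5 million fragments for one sample.
depth = mean_depth(5_000_000, 25_470, 2_500_000_000)
print(f"approximate depth: {depth:.1f}x")
```

With these inputs the estimate comes out close to the reported average of 51x. The per-sample values will differ, because both the number of fragments and their average length vary from sample to sample.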
The VicPac3.2 alpaca reference genome was used for comparison with the nine sampled genomes of this study. In broad terms, we detected 49,734 SVs across the nine alpaca samples. Insertions and deletions account for 50.83% and 48.59%, respectively, while inversions and duplications are infrequent, accounting for 0.37% and 0.21%, respectively. Of these, 39,026 SVs are located between genes (intergenic regions), while 10,708 SVs are located within at least one gene region. However, due to the lengths of the fragments, 548 of the gene-located SVs interrupt more than one gene; as a result, 12,663 genes are affected by SVs. These include protein-coding genes, long non-coding RNAs, pseudogenes, transfer RNAs, small nuclear RNAs, and miscellaneous RNAs.
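The figures above can be cross-checked with simple arithmetic, sketched below in Python. The variable names are ours, and the per-type counts are approximations back-calculated from the rounded percentages in the text; they are not counts taken from our results files.

```python
# Figures quoted in the text above.
total_svs = 49_734
intergenic_svs = 39_026
genic_svs = 10_708
multi_gene_svs = 548

percent_by_type = {
    "insertion": 50.83,
    "deletion": 48.59,
    "inversion": 0.37,
    "duplication": 0.21,
}

# The two location classes should account for every SV.
assert intergenic_svs + genic_svs == total_svs

# Approximate counts per type, back-calculated from rounded percentages.
approx_counts = {
    sv_type: round(total_svs * pct / 100)
    for sv_type, pct in percent_by_type.items()
}

# Shares derived from the quoted counts.
genic_share = 100 * genic_svs / total_svs
multi_gene_share = 100 * multi_gene_svs / genic_svs

for sv_type, count in approx_counts.items():
    print(f"{sv_type:12s} ~{count}")
print(f"SVs within genes: {genic_share:.1f}% of all SVs")
print(f"Gene-located SVs touching more than one gene: {multi_gene_share:.1f}%")
```

The check shows that roughly one in five SVs falls within a gene region, and that only a small fraction of those touch more than one gene.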
Among the thousands of alpaca SVs that we have uncovered, one is of particular interest to us because it involves genes that control coat color in South American camelids (SACs). This SV is an inversion of a fragment of DNA (486,000 nucleotides long) that contains the coding sequences of five genes (NCOA6, PIGU, ITCH, AHCY, and ASIP). The ASIP gene, together with the MC1R gene, is the main regulator of pigment switching in mammals. When alpha-melanocyte-stimulating hormone (α-MSH) binds to the MC1R cell membrane receptor, a black pigment (eumelanin) is synthesized; however, if the ASIP gene protein product binds to the MC1R receptor, a brown/red pigment (pheomelanin) is synthesized. Sequence changes (mutations) in these genes are responsible for coat color changes.
Bioinformatic sequence comparisons of the DNA fragment carrying the above-described genes, between the animals in our sample and the VicPac3.2 reference genome, indicated that black coat animals present an arrangement of genes similar to that of the reference genome. In contrast, white and brown coat animals present an inverted arrangement (Figure 1).
Figure 1. Tandem arrangement of genes in the NCOA6-ASIP 486 kb DNA fragment. Top: Arrangement of genes in the reference genome and black coat animals. Bottom: Inverted arrangement of genes in the brown and white coat animals. Each gene is represented as a rectangular box with a pointed end indicating the direction of transcription. The NCOA6 and ASIP genes include an area within their respective boxes with a different color to highlight the regulatory area of these genes.
The inversion of the 486 kb fragment relocates the ASIP coding exons under the control of the NCOA6 regulatory region (promoter), which generates an NCOA6-ASIP fusion transcript. This fusion transcript has been observed in llamas and alpacas among other alternative transcripts (Anello et al., 2022; Chandramohan et al., 2013). The study of the ASIP 5’ untranslated region led to the identification of a duplication of the ASIP gene in sheep that places the ASIP gene under the control of the ITCH gene promoter, resulting in white color (Norris and Whan, 2008). Based on this latter observation, it was hypothesized that the NCOA6-ASIP fusion transcript was also the result of a duplication (Anello et al., 2022). However, our current work does not support the duplication hypothesis. Further work is necessary to determine if the inversion we have observed is associated with white or brown coat color and/or if it is present in animals with other coat colors or pigmentation patterns.
We are now working on the laboratory validation of our bioinformatic observation. This validation involves the development of a polymerase chain reaction (PCR) test to identify animals carrying the inversion, which can become a practical tool to facilitate strategic mating and optimize desired coat color outcomes.
Here we have described an SV that involves five genes and, in our sample, appears to be associated with coat color. There are still 547 other SVs that interrupt more than one gene to analyze. We will develop a repository of our findings and information that will serve both alpaca breeders and the scientific community interested in SACs.
Since there is a need to improve the assembly of the current alpaca reference genome, we have shared the sequence data of the nine alpacas used in our work with Drs. Terje Raudsepp and Brian W. Davis's research group at Texas A&M University. They are working on a project entitled “Improvement of the Alpaca Reference Genome to Chromosome Level with Comprehensive Gene Annotation and Variant Database,” sponsored by the Alpaca Research Foundation.
References
Anello, M., Daverio, M.S., Rodríguez, S.S., Romero, S.R., Renieri, C., Vidal Rioja, L., and Di Rocco, F., 2022. The ASIP gene in the llama (Lama glama): Alternative transcripts, expression, and relation with color phenotypes. Gene 809: 146018. www.sciencedirect.com/science/article/abs/pii/S0378111921006132?via%3Dihub
Chandramohan, B., Renieri, C., La Manna, V., and La Terza, A., 2013. The alpaca agouti gene: Genomic locus, transcripts, and causative mutations of eumelanic and pheomelanic coat color. Gene 521: 303-310. doi: 10.1016/j.gene.2013.03.060
Jones, M., Sergeant, C., Richardson, M., Groth, D., Brooks, S., and Munyard, K., 2019. A non-synonymous SNP in exon 3 of the KIT gene is responsible for the classic grey phenotype in alpacas (Vicugna pacos). Animal Genetics 50 (5): 493-500. doi: 10.1111/age.12814
Norris, B.J., and Whan, V.A., 2008. A gene duplication affecting the expression of the ovine ASIP gene is responsible for white and black sheep. Genome Research 18 (8): 1282–1293. genome.cshlp.org/content/18/8/1282
Acknowledgments
We are thankful to the Alpaca Research Foundation for the financial support of this study and for a contribution from Jackie and Derek King, Suri and Company, St. Louis, MO. Similarly, we are thankful to Ms. Amanda VandenBosch (Flying Dutchman Alpacas) for providing samples.
In Memoriam
We dedicate this article to our colleague and friend, Dr. Florencia Di Rocco, a recognized scholar who passed away this past August. Florencia designed the original project and actively participated in the analysis of the data. She was a relentless researcher who dedicated her career to the study of the genetics of South American camelids. With over twenty years of experience, she was the head of the Molecular Genetics Laboratory at the Multidisciplinary Institute of Cellular Biology (IMBICE) in La Plata, Buenos Aires, Argentina, where she made numerous scientific contributions to the field. Among these, the sequencing of the mitochondrial genomes of the vicuña and the guanaco stands out, as well as many population studies assessing diversity or conservation statuses, and studies of the molecular basis of pigmentation. Besides being missed as an outstanding scientist, above all, we will miss her as the excellent person she was.
Abbreviations
NCOA6: Nuclear Receptor Coactivator 6
PIGU: Phosphatidylinositol Glycan Anchor Biosynthesis Class U
ITCH: Itchy E3 Ubiquitin Protein Ligase
AHCY: Adenosylhomocysteinase
ASIP: Agouti Signaling Protein
MC1R: Melanocortin 1 Receptor