A microsatellite is a tract of repetitive DNA in which certain DNA motifs (ranging in length from one to six or more base pairs) are repeated, typically 5–50 times. [1] [2] Microsatellites occur at thousands of locations within an organism’s genome. They have a higher mutation rate than other regions of DNA, [3] leading to high genetic diversity. Microsatellites are often referred to as short tandem repeats (STRs) by forensic geneticists and in genetic genealogy, or as simple sequence repeats (SSRs) by plant geneticists. [4] A microsatellite database is also available online. Microsatellites and their longer cousins, the minisatellites, are classified together as VNTR (variable number of tandem repeats) DNA. The name “satellite” DNA refers to the early observation that centrifugation of genomic DNA in a test tube separates a prominent layer of bulk DNA from accompanying “satellite” layers of repetitive DNA.
They are widely used for DNA profiling in cancer diagnosis, in kinship analysis (especially paternity testing) and in forensic identification. They are also used in genetic linkage analysis to locate a gene or a mutation responsible for a given trait or disease. Microsatellites are also used in population genetics to measure levels of relatedness between subspecies, groups and individuals.
History
Although the first microsatellite was characterized in 1984 at the University of Leicester by Weller, Jeffreys and colleagues as a polymorphic GGAT repeat in the human myoglobin gene, the term “microsatellite” was introduced later, in 1989, by Litt and Luty. [1] The name “satellite” DNA refers to the early observation that centrifugation of genomic DNA in a test tube separates a prominent layer of bulk DNA from accompanying “satellite” layers of repetitive DNA. [6] The increasing availability of DNA amplification by PCR at the beginning of the 1990s triggered a large number of studies using the amplification of microsatellites as genetic markers for forensic medicine, for paternity testing, and for positional cloning to find the gene underlying a trait or disease. Prominent early applications include the identification by microsatellite genotyping of the eight-year-old skeletal remains of a British murder victim (Hagelberg et al. 1991), and of the Auschwitz concentration camp physician Josef Mengele, who fled to South America after World War II (Jeffreys et al. 1992). [1]
Structures, Location and Functions
A microsatellite is a tract of tandemly repeated (i.e. adjacent) DNA motifs that range in length from one to six or up to ten nucleotides (the exact definition and the boundary with the longer minisatellites vary from author to author), [1] [2] and are typically repeated 5–50 times. For example, the sequence TATATATATA is a dinucleotide microsatellite, and GTCGTCGTCGTCGTC is a trinucleotide microsatellite (with A being adenine, G guanine, C cytosine, and T thymine). Repeat units of four and five nucleotides are referred to as tetra- and pentanucleotide motifs, respectively. Most eukaryotes have microsatellites, with the notable exception of some yeast species. Microsatellites are distributed throughout the genome. [7] [1] [8] For example, the human genome contains 50,000–100,000 dinucleotide microsatellites, and smaller numbers of tri-, tetra- and pentanucleotide microsatellites. [9] Many are located in non-coding parts of the human genome and therefore do not produce proteins, but they can also be located in regulatory regions and coding regions.
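The definition above can be turned into a simple sequence scan. The following is a minimal sketch, not a substitute for dedicated repeat-finding software: the function name `find_microsatellites`, its parameters, and the test sequence are illustrative assumptions, and the thresholds (motifs of 1–6 bases, at least 5 copies) are taken from the ranges quoted in the text.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat tract."""
    # ([ACGT]{a,b}?) lazily captures the shortest candidate motif;
    # \1{n,} then requires at least min_repeats - 1 further adjacent copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

# Made-up sequence embedding the two example tracts from the text.
example = "GGCC" + "TATATATATA" + "CCGG" + "GTCGTCGTCGTCGTC" + "AACC"
print(find_microsatellites(example))  # [(4, 'TA', 5), (18, 'GTC', 5)]
```

This sketch only detects perfect repeats; interrupted or imperfect tracts, which are common in real genomes, would need a more tolerant method.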
Microsatellites in non-coding regions may not have any specific function, and therefore may not be selected against; this allows them to accumulate mutations unhindered over the generations and gives rise to variability that can be used for DNA fingerprinting and identification purposes. Other microsatellites are located in regulatory flanking or intronic regions of genes, or directly in the codons of genes; in such cases microsatellite mutations can lead to phenotypic changes and diseases, notably triplet expansion diseases such as fragile X syndrome and Huntington’s disease.
The telomeres at the ends of the chromosomes, thought to be involved in ageing and senescence, consist of repetitive DNA with the hexanucleotide repeat motif TTAGGG. They are thus classified as minisatellites. Similarly, the telomeres of insects have shorter repeat motifs that could arguably be considered microsatellites.
Mutation mechanism and mutation rate
Unlike point mutations, which affect only a single nucleotide, microsatellite mutations lead to the gain or loss of an entire repeat unit, and sometimes of two or more repeats simultaneously. Thus, the mutation rate at microsatellite loci is expected to differ from other mutation rates, such as base substitution rates. The actual cause of mutations in microsatellites is debated.
One proposed cause of such length changes is replication slippage, caused by mismatches between DNA strands while they are being replicated during meiosis. [11] DNA polymerase, the enzyme responsible for reading DNA during replication, can slip while moving along the template strand and continue at the wrong nucleotide. DNA polymerase slippage is more likely to occur when a repetitive sequence (such as CGCGCG) is replicated. Because microsatellites consist of such repetitive sequences, DNA polymerase may make errors at a higher rate in these sequence regions. Several studies have found evidence that slippage is the cause of microsatellite mutations. [12] [13] Typically, slippage in each microsatellite occurs about once per 1,000 generations. [14] Thus, slippage changes in repetitive DNA are three orders of magnitude more common than point mutations in other parts of the genome. [15] Most slippage results in a change of just one repeat unit, and slippage rates vary for different allele lengths and repeat unit sizes, [3] and within different species. [16] If there is a large size difference between individual alleles, then there may be increased instability during recombination at meiosis.
Note: DNA strand slippage during replication of an STR locus. Boxes symbolize repetitive DNA units. Arrows indicate the direction in which a new DNA strand (white boxes) is being replicated from the template strand (black boxes). Three situations during DNA replication are depicted. (a) Replication of the STR locus has proceeded without a mutation. (b) Replication of the STR locus has led to a gain of one unit owing to a loop in the new strand; the aberrant loop is stabilized by flanking units complementary to the opposite strand. (c) Replication of the STR locus has led to a loss of one unit owing to a loop in the template strand. (Forster et al. 2015)
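To make the slippage model concrete, the sketch below follows a single lineage through many generations and applies a one-unit gain or loss whenever a slippage event occurs. It is a toy simulation under stated assumptions: the function name `simulate_slippage` is hypothetical, the default rate of one event per 1,000 generations is taken from the text, and the equal probability of gain and loss is an assumption made purely for simplicity.

```python
import random

def simulate_slippage(start_repeats, generations, rate=1e-3, seed=0):
    """Follow one lineage; return (final_repeat_count, number_of_slippage_events)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    repeats = start_repeats
    events = 0
    for _ in range(generations):
        if rng.random() < rate:
            # Gain (loop in the new strand) or loss (loop in the template
            # strand) of one unit; equal odds are an assumption of this sketch.
            repeats = max(1, repeats + rng.choice((-1, 1)))
            events += 1
    return repeats, events

final, events = simulate_slippage(start_repeats=20, generations=100_000)
print(f"{events} slippage events, final allele has {final} repeats")
```

With the default rate, about 100 events would be expected over 100,000 generations on average, but individual runs vary and the net length change is usually far smaller because gains and losses partly cancel.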
Another possible cause of microsatellite mutations is point mutation, in which only one nucleotide is copied incorrectly during replication. A study comparing human and primate genomes found that most changes in repeat number in short microsatellites appear to be due to point mutations rather than slippage. [17]
Microsatellite Mutation Rate
Microsatellite mutation rates vary with base position relative to the microsatellite, repeat type, and base identity. [17] The mutation rate rises with repeat number, peaking at around six to eight repeats and then decreasing again. [17] Increased heterozygosity in a population will also increase microsatellite mutation rates, [18] especially when there is a large length difference between alleles. This is likely because homologous chromosomes with arms of unequal length cause instability during meiosis. [19]
Direct estimates of microsatellite mutation rates have been made in numerous organisms, from insects to humans. In the desert locust Schistocerca gregaria, the microsatellite mutation rate was estimated at 2.1 × 10⁻⁴ per locus per generation. [20] The microsatellite mutation rate in human male germ lines is five to six times higher than in female germ lines and ranges from 0 to 7 × 10⁻³ per locus per gamete per generation. [3] In the nematode Pristionchus pacificus, the estimated microsatellite mutation rate ranges from 8.9 × 10⁻⁵ to 7.5 × 10⁻⁴ per locus per generation. [21]
Biological Effects of Microsatellite Mutations
Many microsatellites are located in non-coding DNA and are biologically silent. Others are located in regulatory or coding DNA; microsatellite mutations in such cases can lead to phenotypic changes and diseases. A genome-wide study estimated that microsatellite variation contributes 10–15% of heritable gene expression variation in humans. [22]
Effect on protein
In mammals, 20% to 40% of proteins contain repeating sequences of amino acids encoded by short sequence repeats. [23] Most of the short sequence repeats within protein-coding portions of the genome have a repeating unit of three nucleotides, since that length will not cause frame-shifts when mutating. [24] Each trinucleotide repeat sequence is translated into a repeating series of the same amino acid. In yeast, the most common repeated amino acids are glutamine, glutamic acid, asparagine, aspartic acid, and serine.
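The frame-shift argument can be checked directly by translating a toy coding sequence before and after a repeat unit is added. This is an illustrative sketch: the sequences are invented for the example, the helper `translate` is a hypothetical name, and the codon table is the standard genetic code written out compactly (bases ordered T, C, A, G).

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA string in frame 0, ignoring any incomplete final codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

head, tail = "ATG", "GATGAAAAATGA"          # made-up flanking codons
normal = head + "CAG" * 5 + tail            # five CAG (glutamine) repeats
plus_tri = head + "CAG" * 6 + tail          # one extra trinucleotide unit
plus_di = head + "CAG" * 5 + "CA" + tail    # a two-base insertion instead

print(translate(normal))    # MQQQQQDEK*
print(translate(plus_tri))  # MQQQQQQDEK*   (one more Q, downstream unchanged)
print(translate(plus_di))   # MQQQQQQMKN    (downstream residues altered)
```

Adding a whole trinucleotide unit lengthens the glutamine run by one residue and leaves everything downstream intact, whereas an insertion that is not a multiple of three changes every downstream codon.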
Mutations in these repeating segments can affect the physical and chemical properties of proteins, with the potential for producing gradual and predictable changes in protein action. [25] For example, length changes in tandemly repeating regions in the Runx2 gene lead to differences in facial length in domesticated dogs (Canis familiaris), with an association between longer sequence lengths and longer faces. [26] This association also applies to a wider range of Carnivora species. [27] Length changes in polyalanine tracts within the HoxA13 gene are linked to hand-foot-genital syndrome, a developmental disorder in humans. [28] Length changes in other triplet repeats are linked to more than 40 neurological diseases in humans, notably triplet expansion diseases such as fragile X syndrome and Huntington’s disease. [10] Evolutionary changes from replication slippage also occur in simpler organisms. For example, microsatellite length changes are common within surface membrane proteins in yeast, providing rapid evolution in cell properties. [29] Specifically, length changes in the FLO1 gene control the level of adhesion to substrates. [30] Short sequence repeats also provide rapid evolutionary change to surface proteins in pathogenic bacteria; this may allow them to keep up with immunological changes in their hosts. [31] In a fungus (Neurospora crassa), length changes in a short sequence repeat control the duration of its circadian clock cycles. [32]
Effect on gene regulation
Length changes of microsatellites within promoters and other cis-regulatory regions can change gene expression quickly, between generations. The human genome contains many (>16,000) short sequence repeats in regulatory regions, which provide ‘tuning knobs’ on the expression of many genes. [22] [33]
Length changes in bacterial SSRs can affect fimbriae formation in Haemophilus influenzae by altering promoter spacing. [31] Dinucleotide microsatellites are linked to abundant variation in cis-regulatory control regions in the human genome. [33] Microsatellites in control regions of the vasopressin 1a receptor gene in voles influence their social behavior and level of monogamy.
In Ewing sarcoma (a type of painful bone cancer in young humans), a point mutation has created an extended GGAA microsatellite that binds a transcription factor, which in turn activates the EGR2 gene that drives the cancer. [35] In addition, other GGAA microsatellites may influence the expression of genes that contribute to the clinical outcome of Ewing sarcoma patients. [36]
Effects within introns
Microsatellites within introns also influence phenotype, through means that are not currently understood. For example, a GAA triplet expansion in the first intron of the X25 gene appears to interfere with transcription, and causes Friedreich ataxia. [37] Tandem repeats in the first intron of the asparagine synthetase gene are linked to acute lymphoblastic leukemia. [38] A repeat polymorphism in the fourth intron of the NOS3 gene is linked to hypertension in a Tunisian population. [39] Reduced repeat lengths in the EGFR gene are linked with osteosarcomas. [40]
An archaic form of splicing preserved in zebrafish is known to use microsatellite sequences within intronic mRNA for the removal of introns in the absence of U2AF2 and other splicing machinery. It is theorized that these sequences form highly stable cloverleaf configurations that bring the 3′ and 5′ intron splice sites into close proximity, effectively replacing the spliceosome. This method of RNA splicing is believed to have diverged from human evolution at the formation of tetrapods and to represent an artifact of an RNA world. [41]
Effects within transposons
Almost 50% of the human genome is contained in various types of transposable elements (also called transposons, or ‘jumping genes’), and many of them contain repetitive DNA. [42] It is probable that short sequence repeats in those locations are also involved in the regulation of gene expression. [43]
Applications
Microsatellites are used for assessing chromosomal DNA deletions in cancer diagnosis. Microsatellites are widely used for DNA profiling, also known as “genetic fingerprinting”, of crime stains (in forensics) and of tissues (in transplant patients). They are also widely used in kinship analysis (most commonly in paternity testing). In addition, microsatellites are used for mapping locations within the genome, specifically in genetic linkage analysis to locate a gene or a mutation responsible for a given trait or disease. As a special case of mapping, they can be used for studies of gene duplication or deletion. Researchers use microsatellites in population genetics and in species conservation projects. Plant geneticists have proposed the use of microsatellites for marker-assisted selection of desirable traits in plant breeding.
Cancer diagnosis
In tumor cells, whose controls on replication are damaged, microsatellites may be gained or lost at an especially high frequency during each round of mitosis. A tumor cell line may therefore show a different genetic fingerprint from that of the host tissue and, especially in colorectal cancer, may present with loss of heterozygosity. Microsatellites have therefore been used routinely in cancer diagnosis to assess tumor progression.
Forensic and Medical Fingerprinting
Microsatellite analysis became popular in the field of forensics in the 1990s. [47] It is used for the genetic fingerprinting of individuals, where it permits forensic identification (typically matching a crime stain to a victim or perpetrator). It is also used to follow up bone marrow transplant patients. [48] The microsatellites in use today for forensic analysis are all tetra- or pentanucleotide repeats, as these give a high degree of error-free data while being short enough to survive degradation in non-ideal conditions. Even shorter repeat sequences would tend to suffer from artifacts such as PCR stutter and preferential amplification, while longer repeat sequences would suffer more from environmental degradation and would amplify less well by PCR. [49] Another forensic consideration is that the person’s medical privacy must be respected, so forensic STRs are chosen that are non-coding, do not influence gene regulation, and are not usually trinucleotide STRs, which could be involved in triplet expansion diseases such as Huntington’s disease. Forensic STR profiles are stored in DNA databanks such as the UK National DNA Database (NDNAD), the American CODIS or the Australian NCIDD.
Kinship Analysis (Paternity Test)
Autosomal microsatellites are widely used for DNA profiling in kinship analysis (usually in paternity testing). [50] Paternally inherited Y-STRs (microsatellites on the Y chromosome) are often used in genealogical DNA testing.
Genetic linkage analysis
During the 1990s and the first several years of this millennium, microsatellites were the workhorse genetic markers for genome-wide scans to locate any gene responsible for a given phenotype or disease, using segregation observations across generations of a sampled pedigree. Although the rise of higher-throughput and cost-effective single-nucleotide polymorphism (SNP) platforms led to the era of the SNP for genome scans, microsatellites remain highly informative measures of genomic variation for linkage and association studies. Their continued advantage lies in their greater allelic diversity than biallelic SNPs; microsatellites can therefore differentiate alleles within an SNP-defined linkage disequilibrium block of interest. Microsatellites have thus successfully led to the discovery of genes for type 2 diabetes (TCF7L2) and prostate cancer (the 8q21 region). [2] [51]
Population genetics
Microsatellites were popularized in population genetics during the 1990s because, as PCR became ubiquitous in laboratories, researchers were able to design primers and amplify sets of microsatellites at low cost. Their uses are wide-ranging. [53] A microsatellite with a neutral evolutionary history can be applied to measure or infer bottlenecks, [54] local adaptation, [55] the allelic fixation index (FST), [56] population size, [57] and gene flow. [58] With the rise of next-generation sequencing the use of microsatellites has decreased, although they remain an important tool in this area. [59]
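As an illustration of one of the quantities listed above, the fixation index can be estimated from allele frequencies at a microsatellite locus. The sketch below uses the common heterozygosity-based form FST = (HT − HS) / HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. The function names and the frequencies are made up for the example, and equal subpopulation sizes are assumed; published studies generally use more refined estimators.

```python
def expected_het(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for a dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst(subpops):
    """FST = (HT - HS) / HT for a list of {allele: frequency} dicts.

    Assumes equally sized subpopulations (an assumption of this sketch).
    """
    alleles = set().union(*subpops)
    k = len(subpops)
    pooled = {a: sum(p.get(a, 0.0) for p in subpops) / k for a in alleles}
    h_t = expected_het(pooled)
    h_s = sum(expected_het(p) for p in subpops) / k
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Alleles are named by repeat count; frequencies are invented.
pop_a = {10: 0.8, 12: 0.2}
pop_b = {10: 0.2, 12: 0.8}
print(round(fst([pop_a, pop_b]), 2))  # 0.36
```

A value of 0 indicates identical allele frequencies in all subpopulations, while a value of 1 indicates that each subpopulation is fixed for a different allele.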
Plant breeding
Marker-assisted selection or marker-aided selection (MAS) is an indirect selection process in which a trait of interest (e.g. productivity, disease resistance, stress tolerance, and quality) is selected based on a marker (morphological, biochemical or DNA/RNA variation) linked to that trait, rather than on the trait itself. Microsatellites have been proposed for use as such markers to assist plant breeding. [60]
Analysis
Repetitive DNA is not easily analyzed by next-generation DNA sequencing methods, which struggle with homopolymeric tracts. Therefore, microsatellites are normally analyzed by conventional PCR amplification and amplicon size determination, sometimes followed by Sanger DNA sequencing.
In forensics, the analysis is performed by extracting nuclear DNA from the cells of a sample of interest, then amplifying specific polymorphic regions of the extracted DNA by means of the polymerase chain reaction. Once these sequences have been amplified, they are resolved through either gel electrophoresis or capillary electrophoresis, which allows the analyst to determine how many repeats of the microsatellite sequence in question are present. If the DNA was resolved by gel electrophoresis, it can be visualized by silver staining (low sensitivity, safe, inexpensive), by an intercalating dye such as ethidium bromide (fairly sensitive, moderate health risks, inexpensive), or, as in most modern forensic laboratories, by fluorescent dyes (highly sensitive, safe, expensive). [61] Instruments built to resolve microsatellite fragments by capillary electrophoresis also use fluorescent dyes. [61] Forensic profiles are stored in major databanks. The British database for microsatellite loci identification was originally based on the British SGM+ system, [62] [63] which uses 10 loci and a sex marker. The Americans [64] increased this number to 13 loci. [65] The Australian database is called NCIDD, and since 2013 it has been using 18 core markers for DNA profiling. [47]
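The step from a measured fragment size to a repeat count is simple arithmetic: the amplicon consists of the flanking sequence (including the primers) plus the repeat tract, so the number of repeats is the tract length divided by the motif length. The sketch below illustrates this; the function name `repeat_count`, the 120 bp flank, and the fragment sizes are hypothetical values chosen for the example, not figures from any real marker.

```python
def repeat_count(amplicon_bp, flank_bp, motif_bp):
    """Infer (full_repeats, leftover_bases) from a measured fragment length."""
    tract = amplicon_bp - flank_bp
    if tract < 0:
        raise ValueError("amplicon is shorter than its flanking sequence")
    # divmod separates complete repeat units from any partial unit.
    return divmod(tract, motif_bp)

# Hypothetical tetranucleotide locus with 120 bp of combined flanking sequence.
for size in (160, 172, 163):
    full, partial = repeat_count(size, flank_bp=120, motif_bp=4)
    print(size, "bp ->", full, "repeats,", partial, "extra bases")
```

A non-zero remainder points to an incomplete repeat unit or to an insertion or deletion in the flanking sequence, which size measurement alone cannot distinguish.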
Amplification
Microsatellites can be amplified for identification by the polymerase chain reaction (PCR) process, using the unique sequences of the flanking regions as primers. DNA is repeatedly denatured at a high temperature to separate the double strand, then cooled to allow annealing of primers and the extension of nucleotide sequences through the microsatellite. This process produces enough DNA to be visible on agarose or polyacrylamide gels; only small amounts of DNA are needed for amplification, because thermocycling in this way creates an exponential increase in the replicated segment. [66] With the abundance of PCR technology, primers that flank microsatellite loci are simple and quick to use, but the development of correctly functioning primers is often a tedious and costly process.
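The exponential increase can be sketched numerically. In the idealized case every cycle doubles the number of target copies, so n cycles turn N₀ copies into N₀ · 2ⁿ. The function below generalizes this to N₀ · (1 + E)ⁿ with a per-cycle efficiency E between 0 and 1; the name `pcr_copies` and the efficiency parameter are assumptions of this sketch, and real reactions plateau as reagents run out.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Idealized copy number after PCR: start * (1 + efficiency) ** cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles

# A single template molecule after 30 perfect doubling cycles:
print(f"{pcr_copies(1, 30):.3g}")       # 1.07e+09
# The same run at a hypothetical 90% efficiency yields far fewer copies:
print(f"{pcr_copies(1, 30, 0.9):.3g}")
```

This is why only a small amount of starting DNA is needed: a few dozen cycles multiply each template molecule roughly a billion-fold in the ideal case.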
Design of microsatellite primers
If microsatellite markers are being sought in specific regions of a genome, for example within a particular intron, primers can be designed manually. This involves searching the genomic DNA sequence for microsatellite repeats, which can be done by eye or by using automated tools such as RepeatMasker. Once the potentially useful microsatellites have been determined, the flanking sequences can be used to design oligonucleotide primers that will amplify the specific microsatellite repeat in a PCR reaction.
Random microsatellite primers can be developed by cloning random segments of DNA from the focal species. These random segments are inserted into a plasmid or bacteriophage vector, which is in turn implanted into Escherichia coli bacteria. Colonies are then grown and screened with fluorescently labelled oligonucleotide sequences that will hybridize to a microsatellite repeat, if present on the DNA segment. If positive clones can be obtained from this procedure, the DNA is sequenced and PCR primers are chosen from the sequences flanking such regions to determine a specific locus. This process involves significant trial and error on the part of researchers, as microsatellite repeat sequences must be predicted and primers that are randomly isolated may not display significant polymorphism. [15] [67] Microsatellite loci are widely distributed throughout the genome and can be isolated from semi-degraded DNA of older specimens, as all that is needed is a suitable substrate for amplification through PCR.
More recent techniques involve using oligonucleotide sequences consisting of repeats complementary to the repeats in the microsatellite to “enrich” the extracted DNA (microsatellite enrichment). The oligonucleotide probe hybridizes with the repeat in the microsatellite, and the probe/microsatellite complex is then pulled out of solution. The enriched DNA is then cloned as normal, but the proportion of successes will now be much higher, drastically reducing the time required to develop the regions for use. However, deciding which probes to use can be a trial-and-error process in itself. [68]
ISSR PCR
ISSR (for inter-simple sequence repeat) is a general term for a genome region between microsatellite loci. The complementary sequences of two neighboring microsatellites are used as PCR primers; the variable region between them gets amplified. The limited length of amplification cycles during PCR prevents excessive replication of overly long contiguous DNA sequences, so the result is a mix of amplified DNA strands that are generally short but vary greatly in length.
Sequences amplified by ISSR-PCR can be used for DNA fingerprinting. Since an ISSR may be a conserved or non-conserved region, this technique is not useful for distinguishing individuals, but rather for phylogenetic analyses or perhaps for delimiting species; sequence diversity is lower than in SSR-PCR, but still higher than in actual gene sequences. In addition, microsatellite sequencing and ISSR sequencing are mutually helpful, as one produces primers for the other.
Limitations
Repetitive DNA is not easily analyzed by next-generation DNA sequencing methods, which struggle with homopolymeric tracts. [69] Therefore, microsatellites are normally analyzed by conventional PCR amplification and amplicon size determination. The use of PCR means that microsatellite length analysis is prone to the same limitations as any other PCR-amplified DNA locus. Of particular concern is the occurrence of ‘null alleles’:
- Occasionally, within a sample of individuals such as in paternity testing casework, a mutation in the DNA flanking the microsatellite can prevent the PCR primer from binding and producing an amplicon (creating a “null allele” in a gel assay). Only one allele is then amplified (from the non-mutated sister chromosome), and the individual may falsely appear to be homozygous. This can cause confusion in paternity casework. It may then be necessary to amplify the microsatellite using a different set of primers. [15] [70] Null alleles are caused especially by mutations in the 3′ section, where extension begins.
- In species or population analyses, for example in conservation work, PCR primers that amplify microsatellites in one individual or species may also work in other species. However, the risk of applying PCR primers across different species is that null alleles become likely whenever sequence divergence is too great for the primers to bind. The species may then artificially appear to have reduced diversity. Null alleles in this case can sometimes be indicated by an excessive frequency of homozygotes, causing deviations from Hardy–Weinberg equilibrium expectations.
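The homozygote excess mentioned in the last point can be screened for by comparing the observed fraction of homozygotes at a locus with the fraction expected under Hardy–Weinberg equilibrium, which is the sum of the squared allele frequencies. The sketch below shows this comparison on invented genotypes; the function name `homozygote_excess` and the data are assumptions for illustration, and a real analysis would add a significance test and consider other causes of homozygote excess, such as inbreeding or population substructure.

```python
from collections import Counter

def homozygote_excess(genotypes):
    """Return (observed, expected) homozygote fractions for one locus.

    genotypes: list of (allele_a, allele_b) tuples, alleles named by repeat count.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    # Under Hardy-Weinberg equilibrium, P(homozygote) = sum of p_i squared.
    expected = sum((c / total_alleles) ** 2 for c in allele_counts.values())
    observed = sum(1 for a, b in genotypes if a == b) / n
    return observed, expected

# Invented sample: many apparent homozygotes, as a null allele might produce.
sample = [(10, 10)] * 4 + [(12, 12)] * 4 + [(10, 12)] * 2
obs, exp = homozygote_excess(sample)
print(f"observed {obs:.2f} vs expected {exp:.2f}")  # observed 0.80 vs expected 0.50
```

An observed value well above the expectation at one locus, but not at others typed in the same individuals, is the pattern that would suggest a null allele rather than a population-wide effect.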