GC content

In molecular biology and genetics, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure gives the proportion of G and C bases out of an implied four total bases, which also include adenine and thymine in DNA, and adenine and uracil in RNA.

GC-content may be given for a particular fragment of DNA or RNA, or for an entire genome. When it refers to a fragment, it may denote the GC-content of an individual gene or section of a gene (domain), a group of genes or gene clusters, a non-coding region, or a synthetic oligonucleotide such as a primer.

structure

Qualitatively, guanine (G) and cytosine (C) form specific hydrogen bonds with each other, whereas adenine (A) bonds specifically with thymine (T) in DNA and with uracil (U) in RNA. Quantitatively, each GC base pair is held together by three hydrogen bonds, whereas AT and AU base pairs are held together by two. To emphasize this difference, the base pairings are often written as “G≡C” versus “A=T” or “A=U”.

DNA with a low GC-content is less stable than DNA with a high GC-content. However, the hydrogen bonds themselves do not have a particularly significant effect on molecular stability, which is instead mainly due to the molecular interactions of base stacking. [2] Despite the higher thermostability conferred on nucleic acids by a high GC-content, it has been observed that at least some species of bacteria with high GC-content DNA undergo autolysis more readily, thereby reducing the longevity of the cell per se. [3] Because of the thermostability of GC pairs, it was once believed that a high GC-content was a necessary adaptation to high temperatures, but this hypothesis was refuted in 2001. [4] Nevertheless, a strong correlation has been shown between the optimal growth of prokaryotes at high temperatures and the GC-content of structural RNAs such as ribosomal RNA, transfer RNA, and many other non-coding RNAs. [4] [5] AU base pairs are less stable than GC base pairs, which makes high-GC-content RNA structures more resistant to the effects of high temperatures.

More recently, it has been demonstrated that the most important contribution to the thermal stability of double-stranded nucleic acids comes from the stacking of adjacent bases rather than from the number of hydrogen bonds between the bases. GC pairs have a more favorable stacking energy than AT or AU pairs because of the relative positions of their exocyclic groups. In addition, the order in which the bases stack is related to the thermal stability of the molecule as a whole.

determination

GC-content is usually expressed as a percentage, but sometimes as a ratio (called the G+C ratio or GC-ratio). The GC-content percentage is calculated as

\[ \frac{G + C}{A + T + G + C} \times 100\% \]

whereas the AT/GC ratio is calculated as

\[ \frac{A + T}{G + C}. \]

The GC-content percentage and the GC-ratio can be measured in several ways, but one of the simplest is to measure the melting temperature of the DNA double helix using spectrophotometry. The absorbance of DNA at a wavelength of 260 nm increases sharply when the double-stranded DNA molecule is heated enough to separate into two single strands. [9] The most commonly used protocol for determining GC-ratios across a large number of samples uses flow cytometry. [10]

Alternatively, if the DNA or RNA molecule under investigation has been reliably sequenced, the GC-content can be calculated by simple arithmetic or with one of various publicly available software tools, such as the free online GC calculator.
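The "simple arithmetic" for a sequenced molecule can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `base_counts`, `gc_percent` and `at_gc_ratio` are hypothetical, uracil is treated as thymine so that RNA can be handled too, and any symbol other than A, C, G, T or U (for example N) is assumed to be ignored.

```python
def base_counts(seq):
    """Count A, C, G and T in a sequence (case-insensitive; U is counted as T).

    Assumption: ambiguous or unknown symbols such as N are simply ignored.
    """
    s = seq.upper().replace("U", "T")
    return {base: s.count(base) for base in "ACGT"}


def gc_percent(seq):
    """GC-content as a percentage: (G + C) / (A + T + G + C) * 100."""
    c = base_counts(seq)
    total = sum(c.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * (c["G"] + c["C"]) / total


def at_gc_ratio(seq):
    """AT/GC ratio: (A + T) / (G + C)."""
    c = base_counts(seq)
    gc = c["G"] + c["C"]
    if gc == 0:
        raise ValueError("ratio is undefined for a sequence without G or C")
    return (c["A"] + c["T"]) / gc


if __name__ == "__main__":
    example = "ATGCGCGAAT"       # 3 A, 2 T, 3 G, 2 C
    print(gc_percent(example))   # 50.0
    print(at_gc_ratio(example))  # 1.0
```

Raising an error for empty input or for a sequence without G or C is a design choice made here to avoid silently returning a meaningless value.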

genomic material

within-genome variation

The GC-ratio within a genome is markedly variable. In the genomes of more complex organisms, these variations produce a mosaic-like formation of islet regions called isochores. [11] This results in variations in staining intensity along the chromosomes. [12] GC-rich isochores usually contain many protein-coding genes, so determining the GC-ratios of these specific regions contributes to the mapping of gene-rich regions of the genome.
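Variation of this kind is often made visible by computing the GC-content in windows along a sequence. The following is a minimal sketch of that idea, not a method taken from the cited studies: the function name `windowed_gc` and its parameters are hypothetical, and the window length is used as the denominator, so unknown bases such as N are assumed to count as non-GC.

```python
def windowed_gc(seq, window, step):
    """Return (start, GC percent) for each full window along the sequence.

    Assumption: every symbol in a window counts towards its length, so
    unknown bases (e.g. N) lower the reported GC percentage.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    s = seq.upper()
    results = []
    # Only full-length windows are reported; a trailing partial window is skipped.
    for start in range(0, len(s) - window + 1, step):
        chunk = s[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / window))
    return results


if __name__ == "__main__":
    # A toy sequence with a GC-rich half followed by an AT-rich half.
    for start, pct in windowed_gc("GGGCGCCGATATTAAT", window=8, step=8):
        print(start, pct)
```

For real genomes the window would be far larger; the text below, for example, quotes human figures for 100-kb fragments.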

coding sequence

Within a long region of genomic sequence, genes are often characterized by a higher GC-content than the background GC-content of the genome as a whole. Comparisons of GC-ratio with the length of the coding region of a gene have shown that the length of the coding sequence is directly proportional to G+C content. [15] This has been attributed to the fact that stop codons are biased towards A and T nucleotides, so the shorter the sequence, the greater the AT bias. [16]

A comparison of more than 1,000 orthologous genes in mammals revealed marked within-genome variation in GC-content at the third codon position, ranging from less than 30% to more than 80%.
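GC-content at the third codon position can be computed by looking only at every third base of an in-frame coding sequence. The sketch below assumes the input starts at the first base of a codon and has a length that is a multiple of three; the function name `gc3_percent` is hypothetical.

```python
def gc3_percent(cds):
    """GC percentage at third codon positions of an in-frame coding sequence.

    Assumptions: the sequence begins at codon position 1 and its length is a
    multiple of three; U is treated as T so RNA input also works.
    """
    s = cds.upper().replace("U", "T")
    if len(s) == 0 or len(s) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    third_positions = s[2::3]  # bases at indices 2, 5, 8, ...
    gc = third_positions.count("G") + third_positions.count("C")
    return 100.0 * gc / len(third_positions)


if __name__ == "__main__":
    # Codons ATG, GCC, TAA have third bases G, C, A: two of three are G or C.
    print(round(gc3_percent("ATGGCCTAA"), 1))  # 66.7
```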

between-genome variation

GC-content varies between organisms. This variation is thought to arise from differences in selection, mutational bias, and biased recombination-associated DNA repair. [18]

The GC-content of the human genome ranges from 35% to 60% across 100-kb fragments, with a mean of 41%. [19] The GC-content of yeast (Saccharomyces cerevisiae) is 38%, [20] and that of another common model organism, thale cress (Arabidopsis thaliana), is 36%. [21] Because of the nature of the genetic code, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. One species with an extremely low GC-content is Plasmodium falciparum (GC% ≈ 20%), [22] and it is common to refer to such cases as AT-rich rather than GC-poor.

Several mammalian species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in the GC-content of their genes. These changes in GC-content are correlated with species' life-history traits (e.g., body mass or longevity) and with genome size, [17] and may be linked to a molecular phenomenon called GC-biased gene conversion.

Application

molecular biology

In polymerase chain reaction (PCR) experiments, the GC-content of the short oligonucleotides known as primers is often used to predict their annealing temperature to the template DNA. A higher GC-content indicates a relatively higher melting temperature.
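As an illustration of how base composition feeds into such a prediction, the sketch below uses the Wallace rule, a widely quoted rule of thumb that the text above does not itself name: roughly 2 °C per A or T and 4 °C per G or C. It is a rough estimate intended for short oligonucleotides only, and the function name `wallace_tm` is hypothetical; practical primer design generally relies on more detailed thermodynamic models.

```python
def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is a rule of thumb for short
    oligonucleotides, not a substitute for thermodynamic models.
    Assumption: symbols other than A, C, G and T are ignored.
    """
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc == 0:
        raise ValueError("primer contains no A, C, G or T bases")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two primers of equal length: the GC-rich one gets the higher estimate.
    print(wallace_tm("ATATATATATAT"))  # 24
    print(wallace_tm("GCGCGCGCGCGC"))  # 48
```

The two example primers have the same length, so the difference in the estimate comes entirely from their GC-content.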

classification

The species problem in non-eukaryotic taxonomy has led to various proposals for classifying bacteria, and the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics has recommended the use of GC-ratios in higher-level hierarchical classification. [25] For example, the Actinobacteria are characterized as “high GC-content bacteria”. [26] In Streptomyces coelicolor A3(2), the GC-content is 72%.

software tools

GCSpeciesSorter and TopSort are software tools for classifying species based on their GC-content.