‘Adenine, thymine, guanine and cytosine, the A, T, G and C, make up the DNA code’ is the most basic concept of molecular biology, known to everyone connected with the field. But this is not the whole story, as the rise of epigenetics over the past decade has shown. Modifications of the base cytosine in the DNA double helix into 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and several other variants have been identified and studied for their role in the regulation of gene expression. Biologists revealed these additional characters in the mammalian DNA code, paving the way for an entirely new area of research into these variants and their connection to epigenetics, the study of heritable changes in gene function that do not involve changes in the DNA sequence. These modifications then became an area of intense interest, prompting speculation about their roles in the genome and experiments to test it. The most interesting suggestion was that these modified cytosines, believed to be transient products, might be stable DNA modifications in mammals, giving the world new nucleotides in addition to the four accepted ones. This could change basics of genetics that have been taught for generations.
Although this idea is yet to be proven with sufficient evidence, some impressive work and results already exist that might change the accepted facts.

This review revolves around some of these supposedly new nucleotides produced by methylation of cytosine in DNA, their role in epigenetics, and speculation regarding their stability in the genome.

Discovery of DNA modifications:

Several observations led to the discovery of cytosine variants in the DNA of organisms in the first place.
Several investigators, observing the absence of regeneration and mitosis in adult neurons, suggested that the genetic make-up of the brain differs from that of other tissues in a way that needed to be deciphered [10]. Another contributing factor was the reprogramming of somatic cells or nuclei through induced pluripotent stem (iPS) cell generation, cell fusion or somatic cell nuclear transfer (SCNT). After thorough inspection, failures of these techniques were attributed to the absence of proper methylation patterns in the DNA of the new cells [5]. Thus, extensive research was initiated to explain the differences in genetic make-up within an individual and to formulate potential molecular pathways of DNA methylation and demethylation.
Thorough study and experimentation confirmed that the DNA of all mammalian cells and tissues is methylated at specific loci, mainly at 5′-cytosine-phosphate-guanine-3′ (CpG) sites, to control the expression of genes. In the genomic DNA of mouse embryonic stem (mES) cells and several adult mouse tissues, DNA methyltransferases produce 5-methylcytosine (5mC) from cytosine (C), using S-adenosyl methionine (SAM) as the source of the methyl group. Further hydroxylation of 5mC, catalyzed by ten-eleven translocation (TET) enzymes, produces 5-hydroxymethylcytosine (5hmC), with global levels ranging between 0.005% and 0.7% of all Cs. The iron(II)- and 2-oxoglutarate-dependent TET enzymes can also oxidize 5hmC further to the transient 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which were found at levels below 0.002% of all Cs [11]. The latter two are excised by thymine DNA glycosylase (TDG) [2], followed by activation of base excision repair (BER) to restore an unmodified base, which defines the demethylation pathway in mammals. Despite this, 5fC and 5caC are still found in the genome, raising the question of whether they have additional functions [5]. Figure 1 outlines the cytosine methylation and hydroxylation pathway.
Figure 1 also illustrates some recently discovered DNA base modifications of adenine and thymine.

The presence of these transient oxidative products in the genome in spite of the repair mechanisms prompted biologists to explore further. It suggested that these oxidized cytosine bases do more than serve as intermediates of enzyme-mediated DNA demethylation initiated by oxidation of 5mC.
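
The enzymatic steps described above (methylation of C, stepwise TET oxidation, and TDG excision followed by BER) can be summarized as a small lookup structure. The following is only a minimal sketch: the dictionary layout and the function names `oxidation_route` and `is_excised_by_tdg` are illustrative assumptions, while the bases, enzymes and their order are taken from the text.

```python
# Illustrative sketch only: the data layout and function names are assumptions.
# Bases, enzymes and their order follow the pathway described in the text.

# Each base maps to (next base, enzyme class that performs the step).
MODIFICATION_STEPS = {
    "C": ("5mC", "DNA methyltransferase (methyl group from SAM)"),
    "5mC": ("5hmC", "TET"),
    "5hmC": ("5fC", "TET"),
    "5fC": ("5caC", "TET"),
}

# Bases that thymine DNA glycosylase (TDG) excises before base excision
# repair (BER) restores an unmodified cytosine.
TDG_SUBSTRATES = {"5fC", "5caC"}


def oxidation_route(start):
    """Return the list of (substrate, product, enzyme) steps starting at `start`."""
    steps = []
    base = start
    while base in MODIFICATION_STEPS:
        product, enzyme = MODIFICATION_STEPS[base]
        steps.append((base, product, enzyme))
        base = product
    return steps


def is_excised_by_tdg(base):
    """True if the text lists `base` as a TDG substrate."""
    return base in TDG_SUBSTRATES


if __name__ == "__main__":
    for substrate, product, enzyme in oxidation_route("C"):
        print(f"{substrate} -> {product} via {enzyme}")
```

Keeping the steps in one table makes it easy to see that only the last two products of the chain are handed to the repair machinery, which is why their persistence in the genome is unexpected.
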
Thus, techniques for sequencing and mapping these modifications were used for analysis. Their low abundance, transient nature and similar chemical structures made identifying, mapping and characterizing these variants a tedious task. With time and innovation, this was overcome by a number of methods of increasing efficiency and specificity.

The first basic method for global assessment of DNA content was thin layer chromatography (TLC), which has undergone many modifications since its first use, such as combination with radiolabelling. The methods that followed include antibody-based techniques such as immunofluorescence staining, which suffer from uncertain sensitivity and cross-reactivity, the use of T4 β-glucosyltransferase (T4-BGT), and liquid chromatography (LC) coupled with mass spectrometry (MS). LC/MS was considered the gold standard until the underestimated flaw of ion suppression came to light [5].
These techniques helped in global DNA assessment, but understanding the dynamic balance between cytosine methylation and demethylation required novel single-cell, single-base-resolution techniques. Advances in technology made this possible with methods such as CLEVER-seq (chemical-labelling-enabled C-to-T conversion sequencing) [2], bisulfite sequencing (BS-seq)-based methods [8] and a nano high-performance liquid chromatography-tandem high-resolution mass spectrometry (nano-HPLC-MS/HRMS) method [11].

Comparative analysis of single-base-resolution methylation maps obtained by BS-seq methods showed very high variation in the methylation of CpG sites among animal species. This opened a door beyond mammalian methylomes and defined three major categories:

1. Mammalian methylomes: Methylated sites prevail in the genome, with the remaining regions required for the binding of active regulatory proteins. Humans, for instance, have more than 80% of CpG dinucleotides methylated, in various oxidation states. The default state of the genome appears to be “methylated”.

2. Honeybee methylome: Here the default state appears to be “unmethylated”. Only about 60,000 CpG sites are methylated, and these are enriched in exons.

3. Absence of cytosine methylation patterns has been observed in some organisms, such as S. cerevisiae, C. elegans and D. melanogaster. This indicates that 5mC is not essential for any developmental process in these organisms.

Figure 2 illustrates whole-genome bisulfite sequencing analysis of these three methylomes, showing the ubiquitous, sporadic and absent cytosine methylation patterns, respectively.

These methods of sequencing and mapping gave better insight into the function of the newly discovered cytosine-derived nucleotides.
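
To make the idea of a single-base-resolution methylation map concrete, the sketch below calls methylation at CpG sites from one bisulfite-converted read. It rests on the standard principle of bisulfite sequencing, which is not spelled out in the text above: unmethylated cytosine is converted and read as T, whereas methylated cytosine is protected and still read as C. Standard BS-seq does not separate 5mC from 5hmC, so "methylated" here covers both. The function names and the example sequences are hypothetical, and the read is assumed to be already aligned to the same strand of the reference without gaps.

```python
# Minimal sketch under stated assumptions: gap-free, same-strand alignment;
# sequences and function names are hypothetical illustrations.

def cpg_methylation_calls(reference, bisulfite_read):
    """Call methylation at every CpG of `reference` from an aligned bisulfite read.

    Returns a list of (position, call), where position is the 0-based index
    of the cytosine and call is "methylated", "unmethylated" or "ambiguous".
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must have the same length")
    ref = reference.upper()
    read = bisulfite_read.upper()
    calls = []
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":  # CpG site in the reference
            if read[i] == "C":
                calls.append((i, "methylated"))      # protected from conversion
            elif read[i] == "T":
                calls.append((i, "unmethylated"))    # converted and read as T
            else:
                calls.append((i, "ambiguous"))       # mismatch or sequencing error
    return calls


def methylation_fraction(calls):
    """Fraction of informative CpG calls that are methylated, or None if none."""
    informative = [call for _, call in calls if call != "ambiguous"]
    if not informative:
        return None
    return informative.count("methylated") / len(informative)


if __name__ == "__main__":
    calls = cpg_methylation_calls("ACGTCGAC", "ACGTTGAT")
    print(calls)
    print(methylation_fraction(calls))
```

Applied across many reads and sites, per-site fractions of this kind are what distinguish a genome whose default state is "methylated" from one whose default state is "unmethylated".
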
These nucleotides were then studied for their roles and stability in the mammalian genome.

5-methylcytosine (5mC):

The mammalian genome carries epigenetic marks that maintain heritable information about gene function. Depending on the mark, a locus is accessible to transcription factors and activators, or it recruits repressor complexes that produce a closed chromatin structure and prevent activated transcription. Methylation of carbon-5 of cytosine to 5-methylcytosine (5mC, Figure 1) at a CpG site is an example of such a mark: near gene regulatory regions it prevents transcription by modulating the binding of specific 5mC-binding proteins, which recruit co-repressor complexes to methylated sites.

The key feature of cytosine methylation is the symmetry of methylation marks, i.e. their presence on both strands of the parental DNA, and their intact passage through DNA replication, which confirms the stability and heritability of epigenetic information. Methylation is enriched at “symmetric” CpG dinucleotides, which permit the inheritance of methylation patterns through DNA replication, a task carried out by the DNA methyltransferase Dnmt1.
An unusual observation made while studying this modification at CpG sites was the rare occurrence of non-CpG methylation in mouse embryonic stem cells (mESCs). Further work established that in ESCs non-CpG methylation correlates with gene expression, whereas in neurons the correlation is inverse because of the recruitment of methyl-CpG binding protein 2 (MeCP2) [12]. Owing to the high abundance of methylation at CpG sites, 5-methylcytosine has been termed the “fifth base” of the human genome [12].
It represents an epigenetic modification that plays a fundamental role in embryonic development, transcriptional regulation [3] and numerous biological phenomena, such as genomic imprinting, gene silencing and X chromosome inactivation [8]. In some cases it is also considered an origin of mutations in CpG dinucleotides, termed a mutation hotspot, as a result of spontaneous hydrolytic deamination of 5mC to thymine [3].

While the classical epigenetic mark, 5-methylcytosine, was being studied for its abundance and role in epigenetics, the discovery of additional cytosine modifications increased interest in this field. The discovery of enzymes that catalyse the hydroxylation of 5-methylcytosine to 5-hydroxymethylcytosine, believed to be the sixth nucleotide, gave rise to a new epigenetic mark associated with activated transcription [12].

5-hydroxymethylcytosine (5hmC):

This novel epigenetic DNA modification emerged with the discovery of the catalytic dioxygenase activity of ten-eleven translocation (Tet) proteins, which hydroxylate 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC, Figure 1) in mammalian DNA. The base was already known from bacteriophages, but because of less sensitive equipment it took years to identify it in mammals.
Cytosine hydroxymethylation levels are often around 0.1% in mammalian tissues but can vary greatly, with the highest values in the brain, where up to 1% of the cytosines can be hydroxymethylated. The three mammalian Tet homologues generate 5hmC from existing 5mC, which they can further process to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC, Figure 1) [12]. The positional abundance of 5hmC, its contribution to epigenetics and its further oxidation into new products were explored through experiments on animal models.

Table 1. Composition of rat DNA
Formic acid hydrolysates were subjected to chromatography, and phosphorus determinations were performed. Determinations were made on two preparations.

Base                        Content (mol/100 mol of DNA)
                            Brain       Liver
Adenine                     28.9        28.8
Guanine                     21.1        21.5
Cytosine                    18.5        17.7
5-hydroxymethylcytosine     3.3         3.6
Thymine                     28.1        28.2
Ratio P/base                1.01        1.03

That the maximum content of 5hmC occurs in the brain was confirmed by formic acid hydrolysis and two-dimensional chromatography of the DNA components of the rat genome [10]. 5-hydroxymethylcytosine was identified among the chromatographic products by its identity with standards in spectrophotometric analysis. The standard separated poorly from cytosine, but the extinction peaks, at 261 nm and 276 nm respectively, sharply differentiated the two compounds. The results of this analysis are given in Table 1. The sum of the molar percentages of cytosine and 5-hydroxymethylcytosine is required for reasonable correspondence with the values for guanine, indicating that 5-hydroxymethylcytosine is in fact a DNA component. In brain DNA, 5-hydroxymethylcytosine constituted 15% of the total cytosine bases. Identical results were observed for mouse and frog brain. Applying the same preparative method to rat liver and rat spleen gave a similar DNA fraction, although in low yield.
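
The arithmetic behind this inference can be checked directly from Table 1. The short sketch below is only an illustration (the dictionary and function names are assumptions); the numbers are the table values. It compares guanine with cytosine alone and with cytosine plus 5-hydroxymethylcytosine, and computes the share of 5-hydroxymethylcytosine in the total cytosine pool.

```python
# Values copied from Table 1 (mol/100 mol of DNA); names are illustrative.
TABLE_1 = {
    "brain": {"A": 28.9, "G": 21.1, "C": 18.5, "5hmC": 3.3, "T": 28.1},
    "liver": {"A": 28.8, "G": 21.5, "C": 17.7, "5hmC": 3.6, "T": 28.2},
}


def cytosine_total(tissue):
    """Cytosine plus 5-hydroxymethylcytosine for the given tissue."""
    row = TABLE_1[tissue]
    return row["C"] + row["5hmC"]


def guanine_gap(tissue, include_5hmc=True):
    """Difference between the cytosine pool and guanine (0 means perfect pairing)."""
    row = TABLE_1[tissue]
    pool = cytosine_total(tissue) if include_5hmc else row["C"]
    return pool - row["G"]


def hmc_share_percent(tissue):
    """5-hydroxymethylcytosine as a percentage of total cytosine bases."""
    return 100.0 * TABLE_1[tissue]["5hmC"] / cytosine_total(tissue)


if __name__ == "__main__":
    for tissue in TABLE_1:
        print(
            tissue,
            f"C alone vs G: {guanine_gap(tissue, include_5hmc=False):+.1f}",
            f"C + 5hmC vs G: {guanine_gap(tissue):+.1f}",
            f"5hmC share: {hmc_share_percent(tissue):.1f}%",
        )
```

For brain, cytosine alone falls 2.6 mol/100 mol short of guanine, whereas cytosine plus 5-hydroxymethylcytosine comes within 0.7; the 5-hydroxymethylcytosine share works out to about 15.1% of total cytosine, in line with the 15% quoted above.
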
The presence of 5-hydroxymethylcytosine in DNA suggested examining RNA for this base. A very high percentage of 5-hydroxymethylcytosine appeared to be present in the crude RNA fraction of brain [10].

The discovery of 5hmC modifications is a recent development that grew out of the study of how adult neurons differ in behaviour from other tissue cells and out of efforts to formulate proper reprogramming techniques, as mentioned at the start. Earlier failures to observe 5-hydroxymethylcytosine in DNA preparations from animal tissue can be attributed to basic chromatographic systems that could not distinguish cytosine from 5-hydroxymethylcytosine because the differences are minimal. The new high-throughput, sensitive methods allow proper identification of this modification.

5hmC was also found as a relatively stable base at a subset of mammalian promoters and active enhancers, indicating a role in mediating epigenetic regulation [12]. The role of 5hmC as an active mark was supported by mass spectrometric analyses of isotope-labelled DNA from mammalian cell culture and mice, which showed that 5hmC is mostly a stable modification and not a transient intermediate, hence the possible sixth nucleotide [12].

Taken together, this evidence and the comparison with bacteriophage DNA raised the possibility that the DNA preparations are glucosylated, similar to the glucosylated 5-hydroxymethylcytosine of bacteriophage DNA. The high concentration of 5-hydroxymethylcytosine in brain nucleic acids is therefore thought to be related to the central nervous system’s dependence on glucose for primary metabolic processes. 5-methylcytosine, 5-hydroxymethylcytosine and other oxidative products are lost by current techniques for preparing DNA from brain. The native structure could therefore be more complex than the current concept suggests [10]. This led to the identification of 5-formylcytosine (5fC), 5-carboxylcytosine (5caC), 5-hydroxymethyluracil (5hmU) and N6-methyladenine as further new nucleotides to be considered.

5-formylcytosine (5fC):

When 5hmC regions were mapped, a subset of the marked regions showed the presence of 5fC, suggesting a role as an independent epigenetic mark and setting researchers the new task of exploring the stability of 5fC [12].

A single-cell 5fC sequencing technique called CLEVER-seq (chemical-labelling-enabled C-to-T conversion sequencing) was used to map 5fC sites in the genome at various stages before and after embryo implantation in order to study their dynamics. These 5fC sites were observed to be both inherited and newly generated after fertilization [2]. The sites also show a high level of heterogeneity, except at promoters and exons. In these regions they show the least heterogeneity, indicating DNA demethylation activity and thus upregulation of gene expression [5].

5-formylcytosine (5fC) is a rare base in mammalian DNA that was originally thought to be involved only in active DNA demethylation. Studies have since pointed to the stability of this modification in the genome as a controlling factor in epigenetics.
It is also assumed to be integrated in the DNA more stably than 5hmC. This was supported experimentally by monitoring the developmental dynamics of 5fC and 5hmC levels in mouse, with results suggesting that 5fC has functional roles in DNA that go beyond being a demethylation intermediate [11]. The work quantified these rare modifications (5fC and 5caC) with the highest possible sensitivity and accuracy, employing a nano high-performance liquid chromatography-tandem high-resolution mass spectrometry (nano-HPLC-MS/HRMS) method. The method can resolve genuine rare modified bases from potential impurities of the same nominal mass and retention time, and can detect down to 0.1 ppm of total Cs in as little as 100 ng of digested genomic DNA. Stable isotope labelling in vivo substantially improved the quality of the measurements, ensuring excellent reproducibility between technical replicates and excluding spontaneous oxidation of 5hmC as an additional source of 5fC or 5caC, which would give false positives.

The experiments initially analysed global levels of all cytosine modifications in the genomic DNA of tissues from C57BL/6 mice (an inbred strain of laboratory mouse) to establish a relationship between 5fC and its precursors 5mC and 5hmC or its metabolite 5caC. A range of postnatal tissues from newborn (1-day-old), adolescent (21-day-old) and adult (15-week-old) mice was studied, along with embryos at 11.5 days post-fertilization and mES cells for comparison [11].

5mC and 5hmC were present in all tissues. 5fC was also present in all studied tissues, at levels ranging between 0.2 ppm and 15 ppm of all Cs, whereas 5caC was not detected in any postnatal tissue. Overall, no link between the levels of 5fC and the levels of its precursors 5mC or 5hmC was noticed. Instead, the pattern varied between tissues: they can retain their levels of 5fC while gaining 5hmC (e.g., brain), lose 5fC while retaining their levels of 5hmC (e.g., heart) or even lose 5fC while gaining 5hmC [11].

To elucidate the stability of 5fC in genomic DNA toward turnover in vivo, isotope labelling was used: the isotope-labelled oxidative products yield labelling ratios that change according to the dynamics and half-life of the given modification in the genomic DNA [11]. The absence of labelled 5fC in the brain, where 5fC is most abundant, indicates minimal or no further generation of 5fC once it is in place in postmitotic tissues. Moreover, if 5fC were involved in cycles of methylation and demethylation, its labelling ratio would be similar to that of 5mC in RNA. In summary, even though the production of 5fC depends on Tet3, the removal of 5fC is independent of TDG. The loss of 5fC can be attributed either to further oxidation into 5caC or to its deformylation and decarboxylation [2].

Further study of the 5fC modification revealed its role in forming a mutational hotspot at CpG dinucleotides, i.e. it can induce G·C to A·T transition mutations during in vitro DNA replication as a result of spontaneous hydrolytic deamination. The X-ray crystal structure of DNA containing 5fC shows that 5fC alters the helical coiling and trajectory of the canonical B-form of DNA [37]. It can arrest RNA polymerase II transcription elongation, thereby reducing the number of transcripts produced [38]. Base pairings of guanine with imino tautomers of 5fC display a G·T mismatch geometry [3].
Beyond offering the most efficient way of removing 5hmC and salvaging C, the trace bases 5fC and 5caC take part in the DNA methylation interplay, which suggests that they are likely to be functional rather than mere intermediates. It is tempting to speculate on their specific function as determinants of lineage specification in embryonic brain development. This topic of research has sprung up recently and, with more experimental evidence, 5fC may prove to be a new, stable nucleotide in the DNA sequence.

Recently, some unusual variants were also observed in the thymine and adenine bases of the genetic code. With intense study of these developments, we may yet find a seventh, an eighth or many more new stable DNA bases.

5-hydroxymethyluracil (5hmU):

Very recently, it has been shown that Tet proteins can also oxidize thymine to 5-hydroxymethyluracil (5hmU, Figure 1). Tet-dependent 5hmU resembles 5caC in abundance, in its distribution by tissue and age, and in its function as a protein recruiter. On this basis, 5hmU is speculated to contribute to epigenetics as well. Apart from this new role, it is natively targeted by Smug1 DNA glycosylase when base-paired to adenine and undergoes base excision repair, or, in the absence of Smug1, promotes active demethylation by recruiting repair factors to Tet targets.

N6-methyladenine (6mA):

Most recently, N6-methyladenine was described as an additional eukaryotic DNA modification with epigenetic regulatory potential. Interestingly, this modification is found in genomes that lack canonical cytosine methylation patterns (probably the third type of methylome), suggesting independent functions. This newfound diversity of DNA modifications and its potential for combinatorial interactions are yet to be fully studied and understood [12].

CONCLUSION:

Specific readers for 5hmC, 5fC and 5caC have been identified that function in transcription regulation and chromatin remodelling, mostly promoting the active state. In addition, 5fC, 5caC and 5hmU might primarily function in the recruitment of DNA repair-associated complexes and thus enhance demethylation. Finally, these marks might also contribute directly to gene regulation by triggering “scheduled” DNA repair, which has been suggested to be coupled with activated transcription. The recent discovery of 6mA in eukaryotes identified an additional methylation mark [12].
One of the biggest questions is why vertebrates do not have 5mC-specific glycosylases similar to Dme and Ros1, which are found in plants and can directly excise 5mC efficiently. One possible explanation is that the oxidized products of 5mC create additional epigenetic codes. These codes, in turn, allow for a diverse layer of regulation. However, generation of 5fC, 5caC or a single-stranded gap during excision repair could expose cells to deleterious effects unless the processes are completed perfectly and with high fidelity. In this respect, acquiring these epigenetic codes appears to pose some risks for vertebrates [3].

Cytosine constantly undergoes a cycle of methylation and demethylation, which plays a role in the regulation of gene expression. Although passive demethylation occurs mainly through a DNA replication-dependent process, active demethylation is achieved by several players in a DNA replication-independent manner [8]. The chemical modification of DNA bases plays a key role in epigenetic gene regulation [12].

The level of 5hmC is highly enriched in the brain relative to many other tissues and cell types (for example, in Purkinje cells of the cerebellum, 5hmC is approximately 40% as abundant as 5mC), which highlights the potential functional roles of this cytosine modification, and others, in brain development [8].

REFERENCES: