2021, Vol.94, No.7

The Chemical Society of Japan Award for 2019

Nucleic acids form not only the canonical double helix (duplex) but also the non-canonical (non-double helix) structures such as triplexes, G-quadruplexes, and i-motifs. The formation of these non-canonical structures and their stabilities depend on the microscopic environmental conditions around the nucleic acids. The intracellular environments, where various molecules are densely packed, exhibit molecular crowding. The non-canonical structures are very stable under molecular crowding conditions. The functions and structures of these nucleic acids in cells are optimized to enable them to function well in the crowded environments. We envisaged that molecular crowding in cells may play an important role in the reactions involving functionalized biomolecules and discovered a novel regulatory mechanism underlying the role of the non-canonical structures in gene expression. Based on the results of our work, we have developed novel methods to control the gene expression of non-double helical nucleic acids, leading to new insights into the chemistry of such nucleic acids. Our major achievements are summarized in this review.

1.1 Watson-Crick Double Helix.

Nucleic acids are biomacromolecules used for transferring genetic information. However, as it was widely believed until the early 1900s that proteins acted as the genetic material, it took a substantial amount of time before nucleic acids were established as the genetic material. Since the 1950s, several technological advancements have facilitated the study of nucleic acids. E. Chargaff, who is considered one of the pioneers in the field of nucleic acids, used paper chromatography to analyze DNA from various origins and discovered that, in DNA, the amount of guanine equals that of cytosine and the amount of adenine equals that of thymine.1 These relationships are known as Chargaff’s rules, which describe the complementarity of the base composition in DNA. At the beginning of the 1950s, M. H. F. Wilkins, R. E. Franklin, and others at King’s College, University of London extensively studied the structure of DNA by X-ray crystallography. Finally, J. D. Watson and F. H. Crick, who worked at the Cavendish Laboratory in Cambridge, proposed that DNA forms a double helix based on their analysis; this result was published in the Nature issue dated April 25, 1953, as a single-page paper.2 Watson, Crick, and Wilkins were awarded the Nobel Prize in Physiology or Medicine in 1962 for their discovery of the DNA double helix.

The double helical structure of nucleic acids is composed of two complementary strands paired via Watson-Crick base pairing (Figures 1a and 2a).2 Nucleic acids are linear polymers. Each monomer unit contains a phosphate, a sugar (deoxyribose in DNA or ribose in RNA), and a base residue [adenine (A), cytosine (C), guanine (G), and thymine (T, in DNA) or uracil (U, in RNA)]. Based on the rules of Watson-Crick base pairing of A•T (or A•U) and G•C pairs (• indicates Watson-Crick base pairs), each strand binds to its complementary strand to form a double helix. This structure is a right-handed helix in which the stacked base pairs form the core and are surrounded by the phosphate backbone. The B-type DNA duplex has a straight helical axis and ten base pairs in each turn, with a pitch of 3.4 nm per turn. The RNA double helix forms an A-type helix structure, which has a pitch of 3.0 nm and 11 base pairs per turn. These double helices form via specific base pairing between complementary sequences, resulting in the formation of canonical right-handed helical structures.

1.2 Non-Watson-Crick Base Pairing.

It should be noted that Watson and Crick first “proposed” the double helix structure but did not prove the tertiary structure at the atomic level. Moreover, the first model of nucleic acid helices was not proposed by Watson and Crick, but rather by L. Pauling, who earned two Nobel Prizes in his career. Pauling and his associate R. B. Corey first presented a helix structure of nucleic acids; however, it was a faulty structure that consisted of a triplex with the core of the structure modeled as negatively charged phosphates.3 Such a structure could not have existed naturally and hence lost out to the model proposed by Watson and Crick. Once their model was proposed, many attempts were made to confirm the double-helix structure of DNA based on Watson-Crick base pairing using monomer combinations of purines and pyrimidines. In 1959, K. Hoogsteen,4 who was an associate of Corey’s at the California Institute of Technology, reported the first attempt to analyze co-crystals containing 9-methyladenine and 1-methylthymine using X-ray crystallography.4,5 However, the result did not show the Watson-Crick base pair. Instead, the crystal structure suggested that adenine formed a base pair with thymine in an upside-down position. This different type of base pair was later named a Hoogsteen base pair (Figure 1b). For a long time afterward, only Hoogsteen base pairs were known experimentally, until A. Rich first identified Watson-Crick base pairs in co-crystals of AU and GC dinucleoside phosphates in 1973.6,7 Soon after, R. E. Dickerson, who took over Pauling’s lab, first solved the single crystal structure of a DNA dodecamer by X-ray crystallography using heavy atoms.8 Thus, it took almost 20 years to prove the existence of Watson-Crick base pairs after they were originally proposed. These results suggest that the constraints of the helical structure of nucleic acids induce Watson-Crick base pairs, whereas Hoogsteen base pairs are preferred under other structural conditions.
Double helix structures composed of Watson-Crick base pairs are called canonical structures. On the other hand, non-canonical structures include non-Watson–Crick base pairs, such as Hoogsteen base pairs.

1.3 Nucleic Acid Structures with Non-Watson-Crick Base Pairs.

Nucleic acid structures with Hoogsteen base pairs have been known to exist since the 1960s. Rich et al. found that triplexes could be formed with poly(rU) strands and poly(rA)-poly(rU) double helices (Figure 2b).9 Using NMR analysis, it was found that protonated G*C+ Hoogsteen base pairs (Hoogsteen base pairs are represented by *) were formed at cytosine N3 in a poly(dG)-poly(dC) complex with dGMP at low pH (Figure 1b), suggesting triplex formation.10 Moreover, unusual structures from short guanine-rich DNA sequences were identified in 1962.11 The X-ray analysis of poly(guanylic acid) gels suggested that four guanines could form a planar conformation with hydrogen bonds, which is now known as a guanine quartet (G-quartet) (Figure 1b). Stacks of several G-quartets build a tetraplex, or “G-quadruplex” (Figure 2b). X-ray crystal analysis first revealed Hoogsteen base pairs within a polynucleotide structure in tRNA.12 It was observed that the secondary structure of the tRNA was constructed by Watson-Crick base pairs, whereas the tertiary structure of the tRNA was assisted by Hoogsteen base pairs. In tRNA structures, other non-Watson-Crick base pairs have also been identified. This importance of non-Watson-Crick base pairs applies broadly to non-coding RNAs, including tRNAs. Ribozymes are non-coding RNAs discovered by T. R. Cech.13,14 Similar to protein enzymes, ribozymes catalyze chemical reactions. Based on ribozyme structure analysis, it was found that many non-Watson-Crick-type base pairs might be present in the active cores of the ribozymes. Thus, non-canonical base pairing such as Hoogsteen base pairing is an important interaction in the formation of the tertiary structures of nucleic acids other than duplexes.

Since the 1990s, progress in the field of structural analysis technology has revealed that Hoogsteen base pairs are transiently present in the canonical double helix structure, as well as when DNA interacts with small compounds and proteins. In addition, analysis of the cross-intercalation of hemi-protonated cytosine-cytosine (C*C+) base pairs under acidic conditions identified yet another type of tetraplex structure, called the i-motif, formed by cytosine-rich DNA sequences (Figures 1b and 2b).15 Since then, the roles of the non-canonical structures in cells have been increasingly recognized. In particular, G-quadruplexes have been a major topic of interest since the 2000s. The formation of a G-quadruplex affects the interaction of proteins with DNA or RNA, resulting in regulation of gene expression.16 It is commonly recognized that genetic information is uniquely determined from the sequences of DNA and RNA, which Crick termed “the central dogma.” However, G-quadruplex formation can cause deviation from this rule by regulating reactions, including replication, transcription, and translation, without altering the sequence information. In general, the regulation of gene expression is considered to be mediated by proteins such as transcription factors. However, gene expression can also be controlled by nucleic acids themselves via non-canonical structures formed with non-Watson-Crick base pairs, including the Hoogsteen type.17 Therefore, the two types of base pairs can be regarded as serving different roles in nucleic acids: Watson-Crick base pairs are involved in the transfer of “genetic information,” while non-Watson-Crick base pairs are responsible for “genetic function.” Potential G-quadruplex sequences are frequently found in specific regions of the chromosome, such as the telomeric region and the promoter regions of oncogenes.
Since 2013, G-quadruplex and i-motif formation have been detected in cells.18–20 These reports suggest that the formation (or dissociation) of G-quadruplexes on oncogenes may activate these oncogenes and cause cancer. Furthermore, the liquid-liquid phase separated structures in the cells containing G-quadruplex RNA have been suggested to contribute to neurological diseases.21

2.1 Factors underlying the Thermodynamic Stability of Double Helical Nucleic Acids.

From the thermodynamic perspective, there are five main factors that affect the double helix stability (Figure 3). The chemical attractions between the base pairs are as follows: (i) hydrogen bonds, which facilitate base pair formation, and (ii) stacking interactions between neighboring base pairs, which facilitate the formation of the ladder structure. In Watson-Crick base pairs, the donors and acceptors of hydrogen bonds face each other. These complementarities between base pairs allow for high selectivity and stability of base pairing. In addition, bases consisting of electron-rich aromatic rings interact with other bases via π-stacking interactions. However, a structural change from a coil to a helix is an energetically unfavorable reaction, in terms of the penalty of (iii) conformational entropy. On the other hand, as environmental factors, (iv) cations stabilize the helix structure upon its formation according to the theory of counterion condensation. Cations neutralize the anionic charges on phosphate groups to avoid electric repulsion between the strands, thereby ultimately facilitating the duplex formation. Furthermore, (v) hydration water influences formation and stability of the double helix structure.

2.2 Thermodynamic Analysis of the Formation of Double Helix.

To analyze the transition from a coil to a double helix, various spectroscopic techniques, such as ultraviolet (UV), fluorescence and circular dichroism (CD) spectroscopy, are used. Since this transition depends on the temperature, the changes in the spectroscopic signals are monitored to observe whether, with the increasing temperature, the duplexes unfold, which is known as melting. The changes in signals that indicate the melting of the double helices have a sigmoidal profile, as a function of temperature. In this sigmoidal profile, the midpoint is known as the melting temperature (Tm). Under the assumption of the transition through a two-state model between the coil and double helix, the Tm of the double helix depends on the concentration of the oligonucleotide strands, as follows:

\begin{equation} 1/T_{\text{m}} = 2.303\ R\ (\log (C_{\text{t}}/n))/\Delta H^{\circ} + \Delta S^{\circ} /\Delta H^{\circ} \end{equation}
(1)
where R is the gas constant (1.987 cal K−1 mol−1; 1 cal = 4.184 J), and Ct is the total concentration of oligonucleotide strands. The value of n equals 1, if the double helix strands are self-complementary, and equals 4, if the double helix strands are non-self-complementary and are in equal concentrations. Tm values with different Ct values from several measurements give changes in the enthalpy and entropy (ΔH° and ΔS°) for the double helix formation using eq (1). The ΔG° value (free energy change during double helix formation) can be determined adopting eq (2), as follows:
\begin{equation} \Delta G^{\circ} = \Delta H^{\circ} - T\Delta S^{\circ} \end{equation}
(2)
where T is the temperature. Based on the equilibrium constant K, the ΔG° value can be obtained using eq (3),
\begin{equation} K = \exp\ (- \Delta G^{\circ} /RT) \end{equation}
(3)
The thermodynamic parameters can also be analyzed by fitting the sigmoidal profile of strand melting using the theoretical model derived from eqs (2) and (3).22
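As an illustration of how eqs (1)–(3) are used in practice, the following minimal Python sketch fits 1/Tm against log(Ct/n) by ordinary least squares and converts the slope and intercept into ΔH°, ΔS°, and ΔG°37. The function names, the choice of kcal-based units, and the synthetic input values are assumptions made for this example; they are not taken from the cited work.

```python
import math

R = 1.987e-3  # gas constant in kcal K^-1 mol^-1 (1.987 cal K^-1 mol^-1)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vant_hoff(tm_celsius, ct_molar, self_complementary):
    """Estimate (dH, dS, dG37) from Tm values at several strand concentrations.

    dH and dG37 are in kcal mol^-1, dS in kcal mol^-1 K^-1.
    """
    n = 1 if self_complementary else 4          # value of n in eq (1)
    xs = [math.log10(ct / n) for ct in ct_molar]
    ys = [1.0 / (t + 273.15) for t in tm_celsius]
    slope, intercept = fit_line(xs, ys)
    dH = 2.303 * R / slope                      # slope = 2.303 R / dH
    dS = intercept * dH                         # intercept = dS / dH
    dG37 = dH - 310.15 * dS                     # eq (2) at 37 degrees C
    return dH, dS, dG37


def predicted_tm_celsius(dH, dS, ct, self_complementary):
    """Invert eq (1) to obtain Tm (degrees C) at total strand concentration ct."""
    n = 1 if self_complementary else 4
    inv_tm = 2.303 * R * math.log10(ct / n) / dH + dS / dH
    return 1.0 / inv_tm - 273.15


def equilibrium_constant(dG, temperature_kelvin):
    """Eq (3): K = exp(-dG / RT), with dG in kcal mol^-1."""
    return math.exp(-dG / (R * temperature_kelvin))


if __name__ == "__main__":
    # Hypothetical melting data for a non-self-complementary duplex.
    concentrations = [1e-6, 5e-6, 2e-5, 1e-4]
    tms = [predicted_tm_celsius(-60.0, -0.165, c, False) for c in concentrations]
    print(vant_hoff(tms, concentrations, False))
```

The sketch relies on the same two-state assumption as eq (1); if the transition is not two-state, the fitted values should not be interpreted as true enthalpy and entropy changes.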

2.3 Nearest Neighbor (NN) Model and Stability Calculation.

The “nearest neighbor (NN) model” was first proposed and developed by I. Tinoco Jr. and co-workers. This model is the most successful theoretical prediction method for the thermodynamic parameters of duplex formation.23–25 The concept of this model is that the stability of a certain base pair can be defined by interactions between said base pair and its NN base pair (Figure 4). The A•T and G•C base pairs have two and three hydrogen bonds, respectively. Thus, the G•C pair forms a more stable base pair than A•T. However, the energetic contribution of base stacking depends on how the combinations of stacked base pairs overlap with each other.26 Since the stacking interaction scales with 1/r^6, where r is the distance between the interacting groups, only the interaction contributed by the NN base pair requires consideration. Macroscopically, the duplexes look like a uniform helix, suggesting that there is no structural deviation that determines the stability of the double helix.

By using the NN model, it is possible to estimate the thermodynamic parameters (ΔH°, ΔS°, ΔG°, and Tm) of the formation of a double helix, including DNA/DNA, RNA/RNA, and RNA/DNA hybrid duplexes under dilute buffer conditions.23,24,27 Prediction of the stability using the NN model depends on three thermodynamic parameters to obtain the thermodynamic values for duplex formation. The first parameter is the free energy change for helix initiation to form the first base pair at the edge of the double helix. As the terminal bases of the duplexes are partially exposed to the solvent, the energetic contribution of the formation of the first base pair is different from that of the internal base pairs. This term, called “initiation factor,” is valid for both A•T and G•C base pairs.24 The second parameter is the free energy change accumulated by the sum of each subsequent base pair for the helix propagation. All possible NN combinations for DNA/DNA or RNA/RNA base pairs are 10, whereas those for RNA/DNA hybrid base pairs are 16. The third parameter is the free energy change of mixing entropy term for the self-complementary strands. To summarize these terms, ΔG°37(total) (the total free energy change of the formation of the double helix at 37 °C) is given by eq (4):

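\begin{equation} \Delta G^{\circ} {}_{37}(\text{total}) = \sum_{i} n_{i}\,\Delta G^{\circ} {}_{37,\text{NN}(i)} + \Delta G^{\circ} {}_{37}(\text{init}) + \Delta G^{\circ} {}_{37}(\text{sym}) \end{equation}
(4)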
where ΔG°37,NN(i) is the standard free energy change for each of the 10 NN sets (e.g., ΔG°37,NN(1) = ΔG°37(AA/TT), ΔG°37,NN(2) = ΔG°37(TA/AT), etc.), taking DNA/DNA as an example, and ni is the number of occurrences of each NN(i). ΔG°37(init) is the initiation factor, which depends on the terminal base pair (A•T or G•C), and ΔG°37(sym) equals +0.40 kcal mol−1 when the duplex is self-complementary and zero when it is non-self-complementary.
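The sum of propagation, initiation, and symmetry terms described for ΔG°37(total) can be sketched in Python as follows, using the 1 M NaCl values listed in Table 1. Two points are assumptions of this sketch rather than statements from the text: each NN set is keyed by its 5′→3′ dinucleotide on one strand (with the reverse complement used for the six equivalent steps), and one initiation term is added for each of the two terminal base pairs. The function names are hypothetical.

```python
# Nearest-neighbor free energies at 37 degrees C (kcal mol^-1), 1 M NaCl, from Table 1.
# Key "CA" stands for d(CA/GT), i.e. 5'-CA-3' paired with 3'-GT-5'.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_GC = 0.98    # initiation per G.C terminal pair (Table 1)
INIT_AT = 1.03    # initiation per A.T terminal pair (Table 1)
SYMMETRY = 0.40   # symmetry factor for self-complementary duplexes (Table 1)

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the 5'->3' sequence of the complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def nn_delta_g37(seq):
    """Estimate dG37(total) in kcal mol^-1 for a DNA duplex given one strand 5'->3'."""
    seq = seq.upper()
    total = 0.0
    # Propagation: sum over every dinucleotide step along the strand.
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN_DG37:
            # The same stacked pair read from the other strand, e.g. TT -> AA.
            step = reverse_complement(step)
        total += NN_DG37[step]
    # Initiation: assumed here to be applied once per terminal base pair.
    for terminal in (seq[0], seq[-1]):
        total += INIT_GC if terminal in "GC" else INIT_AT
    # Symmetry correction only for self-complementary sequences.
    if seq == reverse_complement(seq):
        total += SYMMETRY
    return total


if __name__ == "__main__":
    print(round(nn_delta_g37("CGTTGA"), 2))
```

Because only dinucleotide steps and the two terminal pairs enter the sum, two sequences with the same NN composition and the same termini receive the same predicted stability, which is the central premise of the model.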

A computational fit, for example with a linear least-squares program, is used to determine the parameter sets for ΔG°37 and ΔH° from experimental data, as shown in Tables 1–3 (e.g., in the case of the DNA/DNA double helix, 13 parameters: 10 Watson-Crick NN base pairs, 2 initiation factors, and a symmetry parameter for self-complementary sequences). To avoid a biased data source, various sequences with an unbiased frequency of NN sets are selected.

Table 1. Nearest neighbor (NN) parameters (ΔG°NN) for the formation of DNA/DNA double helix in different solution conditions at 37 °C

| NN set | 1 M NaCl in the absence of cosolute^a (kcal mol−1) | 100 mM NaCl in the absence of cosolute^b (kcal mol−1) | 100 mM NaCl in the presence of 40 wt% PEG 200^c (kcal mol−1) |
| --- | --- | --- | --- |
| d(AA/TT) | −1.00 | −0.65 | −0.55 |
| d(AT/TA) | −0.88 | −0.60 | −0.28 |
| d(TA/AT) | −0.58 | −0.36 | −0.16 |
| d(CA/GT) | −1.45 | −1.23 | −1.00 |
| d(GT/CA) | −1.44 | −1.20 | −0.89 |
| d(CT/GA) | −1.28 | −1.11 | −0.91 |
| d(GA/CT) | −1.30 | −0.93 | −0.87 |
| d(CG/GC) | −2.17 | −1.85 | −1.38 |
| d(GC/CG) | −2.24 | −2.05 | −1.31 |
| d(GG/CC) | −1.84 | −1.69 | −1.25 |
| Initiation per GC^d | +0.98 | +0.98 | +0.76 |
| Initiation per AT^d | +1.03 | +1.03 | +1.00 |
| Symmetry factor^e | +0.40 | +0.40 | +0.40 |

^a Data reported by J. SantaLucia Jr.28 ^b Values calculated by the method of J. M. Huguet et al.29 ^c Data collected from our recent report.30 ^d Salt concentration mainly affects base-stacking interactions, whereas initiation involves only hydrogen bonding, and not stacking interactions, between base pairs; the initiation parameters are therefore kept the same at low NaCl concentrations. ^e Free energy change due to the entropic penalty for maintaining the C2 symmetry in self-complementary sequences is independent of the environment.

Table 2. Nearest neighbor (NN) parameters (ΔG°NN) for the formation of RNA/RNA double helix in different solution conditions at 37 °C

| NN set | 1 M NaCl in the absence of cosolute^a (kcal mol−1) | 100 mM NaCl in the absence of cosolute^b (kcal mol−1) | 1 M NaCl in the presence of 20 wt% PEG 200^c (kcal mol−1) |
| --- | --- | --- | --- |
| r(AA/UU) | −0.93 | n.d. | −0.88 |
| r(AU/UA) | −1.10 | n.d. | −1.13 |
| r(UA/AU) | −1.33 | n.d. | −1.36 |
| r(CU/GA) | −2.08 | n.d. | −2.03 |
| r(CA/GU) | −2.11 | n.d. | −1.91 |
| r(GU/CA) | −2.24 | n.d. | −2.36 |
| r(GA/CU) | −2.35 | n.d. | −2.36 |
| r(CG/GC) | −2.36 | n.d. | −2.19 |
| r(GG/CC) | −3.26 | n.d. | −3.32 |
| r(GC/CG) | −3.42 | n.d. | −3.44 |
| Initiation per terminal GC | +4.09 | n.d. | +4.63 |
| Initiation per terminal AU | +0.45 | n.d. | +0.55 |
| Symmetry factor | +0.43 | n.d. | +0.68 |

^a Data reported by T. Xia et al.31 ^b No data are available (n.d.).32 ^c Data reported by M. S. Adams et al.33

Table 3. Nearest neighbor (NN) parameters (ΔG°NN) for the formation of RNA/DNA hybrid double helix in solutions with different NaCl concentrations without cosolute at 37 °C

| NN set | 1 M NaCl^a (kcal mol−1) | 1 M NaCl^b (kcal mol−1) | 100 mM NaCl^b (kcal mol−1) |
| --- | --- | --- | --- |
| rAA/dTT | −1.0 | −1.0 | −0.7 |
| rAC/dTG | −2.1 | −1.8 | −1.5 |
| rAG/dTC | −1.8 | −1.6 | −1.3 |
| rAU/dTA | −0.9 | −0.7 | −0.4 |
| rCA/dGT | −0.9 | −1.3 | −1.2 |
| rCC/dGG | −2.1 | −2.0 | −1.7 |
| rCG/dGC | −1.7 | −1.9 | −1.4 |
| rCU/dGA | −0.9 | −1.1 | −0.4 |
| rGA/dCT | −1.3 | −1.8 | −1.5 |
| rGC/dCG | −2.7 | −2.6 | −2.0 |
| rGG/dCC | −2.9 | −2.7 | −2.3 |
| rGU/dCA | −1.1 | −1.5 | −1.4 |
| rUA/dAT | −0.6 | −0.7 | −0.5 |
| rUC/dAG | −1.5 | −1.2 | −1.4 |
| rUG/dAC | −1.6 | −1.4 | −1.6 |
| rUU/dAA | −0.2 | −0.4 | +0.2 |
| Initiation | +3.1 | −0.4 (GC)^c; +3.1 (AU(T))^d | +2.0 (GC)^c; +2.6 (AU(T))^d |

^a Data obtained from our earlier report.34 ^b Data collected from a recent report of our laboratory.35 ^c Initiation parameters for the duplexes that contain at least one rG•dC or rC•dG base pair at any terminal. ^d Initiation parameters for duplexes containing only rA•dT or rU•dA base pairs at both terminals.

Synthetic oligonucleotides used in melting assays usually possess 5′-OH, whereas there exists a phosphate group at the 5′-end of the naturally occurring oligonucleotide. It has been demonstrated that the presence of 5′-phosphate only slightly increases the stability of duplexes,36 although this increase is not negligible in the case of nucleic acids with dangling ends. Therefore, the current parameters can be used without further corrections for the group at the 5′-end.

2.4 Improvement of NN Parameters under Various Cation Concentrations and Molecular Crowding Conditions.

The established NN parameters are used only in those analyses carried out in a solution containing 1 M NaCl. As shown in section 2.1, the presence of cations and hydration largely affects the stability of nucleic acids. For example, a decrease in the cation concentration induces an increase in the electric repulsion between the strands, which results in decreased duplex stability. Crowding agents, such as polyethylene glycol (PEG) and polysaccharides, generally do not interact with the nucleic acids, but destabilize the duplexes due to enhanced dehydration. If the contribution of cations and crowding molecules towards the stability of a certain sequence is elucidated, NN parameters that estimate the stability of a double helix in molecular crowding can be determined. We studied the validity of the NN model for the self-complementary DNA double helices under molecular crowding conditions resulting from the addition of PEG 200.37 In the presence of 40 wt% PEG 200 in a buffer containing 0.1 M NaCl, the thermodynamic parameters for duplex formation for sequences with identical NN base pairs were found to be similar. Thus, the validity of the NN model was established even under molecular crowding conditions. In our previous study,38 the experimental ΔG°37 and Tm showed a linear correlation between the values in a buffer solution containing 100 mM NaCl in the absence of crowders and those predicted from established NN parameters39 obtained from a solution with 1 M NaCl, as follows (Figure 5a):

\begin{align} &\Delta G^{\circ} {}_{37}\ (100\,\text{mM NaCl},\ \text{no crowding}) \\&\quad= 0.63\ \Delta G^{\circ} {}_{37}\ (1\,\text{M NaCl},\ \text{no crowding}) - 1.67 \end{align}
(5)
\begin{align} &T_{m}\ (100\,\text{mM NaCl},\ \text{no crowding}) \\&\quad= 0.88\ T_{m}\ (1\,\text{M NaCl},\ \text{no crowding}) - 5.15 \end{align}
(6)
As shown above, we plotted ΔG°37 and Tm values obtained experimentally from a buffer containing 0.1 M NaCl with 40 wt% PEG 200 against values calculated using eqs (5) and (6), which also showed linear correlations (Figure 5b).
\begin{align} &\Delta G^{\circ} {}_{37}\ (100\,\text{mM NaCl},\ \text{crowding}) \\&\quad= 0.99\ \Delta G^{\circ} {}_{37}\ (100\,\text{mM NaCl},\ \text{no crowding}) + 2.32 \end{align}
(7)
\begin{align} &T_{m}\ (100\,\text{mM NaCl},\ \text{crowding}) \\&\quad= 0.90\ T_{m}\ (100\,\text{mM NaCl},\ \text{no crowding}) - 3.39 \end{align}
(8)
where “crowding” indicates the buffer containing 100 mM NaCl with 40 wt% PEG 200 and “no crowding” indicates the buffer containing 100 mM NaCl without crowders. The average difference of ΔG°37 and Tm was 4.6% and 1.1 °C, respectively. Thus, the values of ΔG°37 and Tm in the presence of 40 wt% PEG 200 in a buffer containing 100 mM NaCl can be accurately predicted using NN parameters established in 1 M NaCl. There is no sequence information in eqs (7) and (8), which suggests that the NN parameters, in the presence of 40 wt% PEG 200 in the buffer containing 100 mM NaCl, are potentially improved by treatment with a simple linear relationship. This concept allowed the established NN parameters for self-complementary DNA duplexes to be improved as follows:
\begin{align} &\Delta G^{\circ} {}_{\text{NN},37}\ (\text{crowding},\ 100\,\text{mM NaCl}) \\&\quad= 0.67\ \Delta G^{\circ} {}_{\text{NN},37}\ (\text{no crowding},\ 1\,\text{M NaCl})+ 0.12 \end{align}
(9)
where ΔG°NN,37 (crowding, 100 mM NaCl) is the “predicted” NN parameter and ΔG°NN,37 (no crowding, 1 M NaCl) is the established NN parameter obtained from the solution containing 1 M NaCl at 37 °C.39 However, we also found that the stability of DNA double helices with G and C rich sequences was not well predicted under crowding conditions.37 Thus, further improvements are needed for the prediction for such biased sequences under crowding conditions.
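Because eqs (5)–(9) are simple linear maps, chaining them is straightforward. The following Python sketch shows one way to carry a value predicted with the 1 M NaCl parameters over to 100 mM NaCl and then to the crowded condition (40 wt% PEG 200). The function names are hypothetical, ΔG°37 is assumed to be in kcal mol−1, and Tm is assumed to be in °C; the text does not state the temperature unit of eqs (6) and (8) explicitly.

```python
def dg37_100mM_no_crowding(dg37_1M):
    """Eq (5): dG37 in 100 mM NaCl from the value predicted for 1 M NaCl."""
    return 0.63 * dg37_1M - 1.67


def tm_100mM_no_crowding(tm_1M):
    """Eq (6): Tm in 100 mM NaCl from the value predicted for 1 M NaCl."""
    return 0.88 * tm_1M - 5.15


def dg37_100mM_crowding(dg37_no_crowding):
    """Eq (7): dG37 with 40 wt% PEG 200 from the 100 mM NaCl value without crowder."""
    return 0.99 * dg37_no_crowding + 2.32


def tm_100mM_crowding(tm_no_crowding):
    """Eq (8): Tm with 40 wt% PEG 200 from the 100 mM NaCl value without crowder."""
    return 0.90 * tm_no_crowding - 3.39


def nn_parameter_crowding(dg_nn_1M):
    """Eq (9): crowding NN parameter from the established 1 M NaCl NN parameter."""
    return 0.67 * dg_nn_1M + 0.12


if __name__ == "__main__":
    # Hypothetical duplex predicted at 1 M NaCl: dG37 = -10.0 kcal/mol, Tm = 60.0.
    dg = dg37_100mM_crowding(dg37_100mM_no_crowding(-10.0))
    tm = tm_100mM_crowding(tm_100mM_no_crowding(60.0))
    print(dg, tm)
```

As noted in the text, these relationships carry no sequence information, so they should be treated with caution for G- and C-rich sequences, for which the prediction under crowding conditions was less reliable.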

We further established the NN parameters for DNA double helices with both self- and non-self-complementary strands in the buffer containing 100 mM NaCl, in the presence of 40 wt% PEG 200 (Table 1).30 Comparison with the NN parameters under the non-crowding condition indicated that the crowding condition destabilized the NN sets consisting only of G•C pairs [d(CG/GC), d(GC/CG), and d(GG/CC)] considerably more than the other NN sets. This trend suggests that PEG 200 decreased the water activity of the solution, which resulted in greater destabilization of NN sets comprising G•C pairs than of those comprising A•T pairs, because G•C pairs require a relatively large amount of hydration for stabilization.40 Moreover, the most remarkable differences were observed among the initiation factors, for which ΔH° and ΔS° changed far more than for any NN parameter under the crowding condition, as compared to those in the solution without the cosolute.30 These changes suggest that hydration of the base pairs at the edges of oligonucleotides is preferentially induced in the crowded environment.

In the case of the RNA duplex, NN parameters for use under crowding conditions have also been developed by another research group, in the presence of 20 wt% PEG 200 and 1 M NaCl.33 Most of these NN parameters were similar to those under the non-crowding condition at the same NaCl concentration (Table 2).31 The greatest effect of crowding was found on the terminal pairs rather than on the propagating NN pairs. The contrast with the DNA NN parameters mentioned above can be explained by hydration: in the B-type structure of DNA double helices, the phosphate backbone is more hydrated and the sugar residues are less exposed than in the A-type structure. This hydration geometry makes the DNA structure unstable under conditions of low water activity.41 On the other hand, RNAs form A-type duplexes comprising shared inter-phosphate water bridges. Less hydration should make the A-type duplexes more stable at low water activity than the B-type duplexes. Therefore, the reduction of water activity in the presence of a cosolute destabilizes DNA duplexes to a greater extent than RNA duplexes.

In the case of the RNA/DNA hybrid duplex, the NN parameters in the presence of 100 mM NaCl under non-crowding conditions are available.32,35 In the comparison between the NN parameters in 1 M and 100 mM NaCl solutions (Table 3) carried out in our recent report,35 the ΔG°NN,37 values in 100 mM NaCl solution were generally higher (less stabilizing) than those in 1 M NaCl solution. For example, the ΔG°NN,37 values for the NNs consisting of purine bases in the RNA strand (e.g., rAA/dTT and rAG/dTC) increased by 0.3 kcal mol−1. In the cases of the NNs with pyrimidine-rich RNA strands, namely, rUU/dAA and rCU/dGA, the ΔG°NN,37 values showed large increases of 0.6 kcal mol−1 and 0.7 kcal mol−1, respectively. Moreover, the NNs with higher contents of rG•dC or rC•dG pairs and purine bases in the RNA strand, such as rGG/dCC, exhibited higher stability in the solution containing 100 mM NaCl (ΔG°NN,37 = −2.3 kcal mol−1). In contrast, the NN with the lowest content of rG•dC or rC•dG pairs and a pyrimidine-rich RNA strand (rUU/dAA) was the least stable.

3.1 Chemical Properties of Non-Double Helical Nucleic Acids.

3.1.1 Effect of Hydration:

Hydration is one of the key factors responsible for stabilizing both the double-helical and non-double-helical nucleic acids. The high electrostatic polarity of the phosphate group makes water molecules tightly bind to the phosphate backbone.42 The hydroxyl groups of sugar (ribose and deoxyribose) are hydrophilic, and thus, can also act as hydration sites. Some polar groups on nucleobases can interact with water molecules. The geometry of the helical structure creates a structure-dependent site of hydration. G-quadruplex DNA has well-ordered hydration spines in its grooves, as is also observed in the canonical duplex DNA (Figure 6).43 The water hydrates both the guanine bases and the sugar-phosphate backbone to bridge them via hydrogen bonding. Moreover, hydration occurs widely in the loop regions of the G-quadruplex (Figure 6). To quantitatively analyze the hydration property of G-quadruplex formation, we used an osmolyte to estimate the magnitude of hydration.44 In general, dependency of the water activity on the equilibrium constant K reflects the number of water molecules per nucleotide released after the reaction, namely Δnw, as presented by the equation (∂log K/∂log aw) = −Δnw, if other interactions between the osmolyte and nucleic acids and between the osmolyte and water molecule are considered negligible.45 We applied the osmotic analysis to the G-quadruplex formation and elucidated the effect on the thermostability of the G-quadruplex.44 As osmolytes decreased the water activity, the equilibrium constant K was increased, indicating that the value of Δnw was found to be positive, representing dehydration. 
For example, 4.5 water molecules per nucleotide are released during the formation of the thrombin-binding aptamer G-quadruplex.44 In contrast, 3.4–3.5 water molecules per nucleotide are taken up during formation of the canonical double helix;44 nonetheless, some studies using molecular dynamics analysis have indicated that water molecules are released, though at a lower level.46,47 Based on these results, the mechanism of dehydration upon G-quadruplex formation can be categorized into several components: (1) dehydration accompanies G-quadruplex formation, (2) a higher charge density on the G-quadruplex than that on the coil structure induces the uptake of electrostricted water molecules, (3) hydration water around metal ions is released upon their incorporation in the G-quadruplex core, and (4) loop regions interacting with the G-quadruplex release or take up water molecules. These trends of stabilization of G-quadruplexes and destabilization of a double helix are also observed in the case of RNA.48 Therefore, dehydration depends on the tertiary structures of nucleic acids.
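The osmotic analysis described above amounts to a linear regression of log K on log aw, with Δnw obtained as the negative of the slope. The following Python sketch illustrates this under the same assumption stated in the text, namely that direct osmolyte interactions are negligible. The function names and the synthetic data are hypothetical, and whether the result is expressed per nucleotide or per structure depends on how K is normalized, which is not spelled out here.

```python
import math


def fit_slope(xs, ys):
    """Least-squares slope of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def water_released(water_activities, equilibrium_constants):
    """Estimate delta n_w from (d log K / d log a_w) = -delta n_w.

    A positive result means water is released (dehydration) upon folding;
    a negative result means water is taken up.
    """
    log_aw = [math.log10(a) for a in water_activities]
    log_k = [math.log10(k) for k in equilibrium_constants]
    return -fit_slope(log_aw, log_k)


if __name__ == "__main__":
    # Synthetic data in which K rises as the water activity falls.
    aw = [1.00, 0.98, 0.96, 0.94]
    ks = [1.0e6 * a ** (-4.5) for a in aw]
    print(water_released(aw, ks))
```

With this sign convention, a structure whose K increases as water activity decreases, as reported for the G-quadruplex, yields a positive Δnw, whereas a structure that is destabilized by lower water activity yields a negative value.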

3.1.2 Effect of Cation Binding:

Similar to duplex structures, non-double helical structures are also negatively charged on the phosphate backbone, which must be shielded with cations to overcome the Coulomb repulsion to form tertiary structures. Furthermore, non-canonical nucleic acid structures exhibit unique cation binding because of their structural arrangements. In the case of G-quadruplexes, cations specifically coordinate with the core of the quadruplex via guanine O6 atoms. This coordination is very important for the formation of the G-quadruplex structure, as the absence of the coordinated cation in the center of the quartet makes the G-quadruplex structure electronically disadvantageous and unstable.49,50 The stability of G-quadruplex structures with different cations is in the order of K+ > Ca2+ > Na+ > Mg2+ > Li+.51

Under cellular conditions, organic cations play a role in the metabolism. Hence, we studied the stability of non-double helical structures in the presence of various organic cations (Figure 7). The cells are rich in 2-hydroxy-N,N,N-trimethylethanaminium (choline) ions and their derivatives, which are involved in various metabolic processes, including DNA methylation. At high concentrations of choline ions (as a hydrated ionic liquid), the stability of the A•T base pair is larger than that of the G•C base pair in a DNA double helix (see section 4.1).52 On the other hand, the stability of G-quadruplexes is decreased by choline ions, as these ions preferentially bind to the coil structure of the G-rich strand.53 Furthermore, choline ions specifically bind to the loop region of an i-motif DNA, which stabilizes the i-motif.53 Trimethylamine N-oxide (TMAO) and trimethylammonium preferentially bind to the single-stranded coil form, resulting in the destabilization of the duplexes.54,55 In contrast, at low concentrations, TMAO interacts with the groove of the duplex and leads to its stabilization.56 Larger cations, such as tetrabutylammonium and tetrapentylammonium ions, stabilize a hairpin DNA with a long loop.57 The interaction also occurs in the loop of G-quadruplex, which increases the stability of G-quadruplexes. Thus, the loop region with sufficient length to access the loop nucleotides can interact with large cations. These findings suggest that organic cations play an essential role in the regulation of gene expression in cells.

Non-canonical structures can change to different conformations depending on the presence of metal cations. For an RNA sequence with equilibrium between G-quadruplex and hairpin structure formation, K+ ions trigger G-quadruplex formation, whereas Mg2+ converts this structure into a hairpin structure.58,59 We found that this type of transition can change the secondary structure of mRNA to control the translation (see section 3.4). These findings suggest that cellular metabolism can be regulated by ionic conditions through the formation of non-double helical nucleic acids.

3.1.3 Effect of pH:

The stability of canonical duplexes shows no pH dependence, except at extremely low and high pH values, because the bases in the pairs are rarely protonated or deprotonated at intermediate pH. G-quadruplex structures containing Hoogsteen base pairs between guanines also do not show any pH dependency with respect to stability. On the other hand, the triplex and i-motif structures contain C•G*C+ units and C*C+ base pairs, respectively. Thus, these structures exhibit pH dependency. As the pKa of the N3 atom of cytidine is 4.2, triplexes and i-motifs form stably under mildly acidic conditions. Consequently, the stabilities of these motifs drop markedly at neutral pH. However, the pKa can be changed by the chemical environment, enabling the formation of these structures at neutral pH. In fact, we have found that choline cations increase the pKa value of cytosine, which stabilizes i-motif structures at neutral pH (see section 4.1).53 Moreover, the molecular environment dramatically affects the protonation of cytosine and stabilizes the i-motif (see section 4.2).60 Therefore, varying pH can modulate the non-canonical structures in living cells. From a technological perspective, tuning the pH can reversibly change the conformation of non-canonical structures. Many nano-devices based on the pH-dependent conformational changes of triplexes and i-motifs might be constructed in the coming years.
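
The pH dependence described above can be illustrated with the Henderson-Hasselbalch relation. The following is a minimal sketch, assuming an isolated cytosine N3 site with the pKa of 4.2 quoted in the text and ignoring any structural context; the function name and the shifted pKa of 6.5 are hypothetical values chosen only for illustration.

```python
def protonated_fraction(pH: float, pKa: float = 4.2) -> float:
    """Fraction of cytosine N3 sites that are protonated at a given pH.

    Uses the Henderson-Hasselbalch relation for a single, isolated site.
    The default pKa of 4.2 is the value quoted in the text.
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    # Compare the unperturbed pKa with a hypothetical, environment-shifted pKa.
    for pKa in (4.2, 6.5):  # 6.5 is an illustrative assumption, not a measured value
        for pH in (5.0, 6.0, 7.0):
            frac = protonated_fraction(pH, pKa)
            print(f"pKa {pKa:.1f}, pH {pH:.1f}: protonated fraction = {frac:.4f}")
```

With the unperturbed pKa, only a very small fraction of sites is protonated at pH 7, which is consistent with the drop in triplex and i-motif stability at neutral pH; an upward pKa shift raises that fraction considerably.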

3.2 Telomere and Non-Double Helical Nucleic Acids.

The termini of chromosomal DNA in eukaryotes, called telomeres, are commonly repetitive and rich in guanine. For example, vertebrates, including humans, have a 5′-TTAGGG-3′ repeat in the telomere.61,62 The telomere length ranges from a few kb to approximately 50 kb. Since DNA polymerase cannot replicate the 5′-terminus of the chromosomal DNA in eukaryotes, chromosomal DNA contains a single-stranded region at each end of the DNA strand, as shown in Figure 8;61,63 these single-stranded regions form G-quadruplexes (Figure 8).
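
As a rough illustration of how such repeats map onto potential G-quadruplex units, the sketch below scans a sequence for a commonly used heuristic motif: four runs of at least three guanines separated by loops of one to seven nucleotides. This pattern is an assumption introduced here for illustration and is not taken from the work reviewed; the function name is hypothetical, and a match indicates only sequence-level potential, not actual folding.

```python
import re

# Heuristic motif (an assumption, not from the reviewed work):
# four G-runs of >= 3 guanines separated by loops of 1-7 nucleotides.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_putative_g4(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) positions of non-overlapping putative G4 motifs."""
    seq = sequence.upper()
    return [(m.start(), m.end()) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    repeat = "TTAGGG"  # vertebrate telomeric repeat quoted in the text
    for n in (3, 4, 8):
        hits = find_putative_g4(repeat * n)
        print(f"{n} repeats: {len(hits)} putative G-quadruplex unit(s) at {hits}")
```

Because each unit needs four G-runs, a single-stranded telomeric region with many repeats can in principle host several units in a row.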

As the G-rich sequence at the telomere is long enough to form multiple G-quadruplexes, we studied the thermodynamics of multiple repeats of human telomere G-quadruplex structures.64 The circular dichroism analysis indicated that the long folded telomeric sequences had an indicative spectrum of anti-parallel, hybrid, or parallel topologies, which depended on the solution conditions, as well as the properties of the individual telomeric G-quadruplex. On the other hand, the thermodynamic stability of multiple repeats of the telomeric G-quadruplex was decreased compared with that of the individual telomeric G-quadruplex. This decrease in thermodynamic stability suggests that G-quadruplex units are connected by TTA loops and are not stacked on each other. Since the diagonal and lateral loops are flexible, the resulting multimer would also not be rigid. This model, called a beads-on-a-string model, is accepted as one of the promising models of the telomere structure (Figure 8).

Due to the inability to replicate the telomere termini, the telomere length becomes shorter after each cell division, and finally, the cells undergo apoptosis. However, cancer cells have an active telomerase that elongates the telomere termini, and thus, can divide without any limitations. As telomerase recognizes the single-stranded region of telomeres, G-quadruplex formation in this region inhibits telomere elongation by telomerase. In cancer therapy, the induction of G-quadruplex formation in the telomeric region is one of the approaches to inhibit the telomerase activity. Thus, a compound that stabilizes the G-quadruplex structure can be used as a drug for cancer treatment. G-quadruplex ligands typically have a large π planar structure that binds to G-quadruplexes with high affinity, as well as a positively charged group that binds to the phosphate backbones. However, due to the positive charge, these ligands also bind to chromosomal DNA duplexes, which decreases the specificity of the ligand for G-quadruplexes in cells. To develop G-quadruplex ligands with high specificity for the G-quadruplex, we focused on anionic phthalocyanines.65 We used copper(II) phthalocyanine 3,4′,4′′,4′′′-tetrasulfonic acid (Cu-APC), which showed relatively low affinity (binding constant, Ka = 2.4 × 10^4 M−1 at 25 °C) for G-quadruplex DNA, as compared to that of the cationic ligand meso-tetrakis-(N-methyl-4-pyridyl)-porphyrin (TMPyP4). In the presence of excess amounts of DNA duplexes mimicking the conditions in the nucleus of the cell, the Ka value was calculated to be 2.8 × 10^4 M−1 at 25 °C, which is almost the same as that in the absence of DNA duplexes. These results suggest that Cu-APC specifically binds to the G-quadruplex. Furthermore, we added PEG 200 as a crowder molecule and found that PEG 200 did not affect the affinity and specificity of the binding of Cu-APC to G-quadruplex DNA; however, the affinity and specificity of TMPyP4 were found to be decreased.
We further tested whether Cu-APC inhibited telomerase activity in human cell lysates. As expected, in the presence of excess DNA duplexes, although TMPyP4 could not repress the telomere elongation, Cu-APC inhibited the telomere elongation as effectively as in the absence of excess DNA duplexes. To test the activity of Cu-APC in the cells, we added Cu-APC to the living cells.65 We found that the antiproliferative activity of Cu-APC on normal cells was lower than that on the HeLa cells. Therefore, our research approach of mimicking the cellular condition was useful for understanding the mechanism of G-quadruplex formation in human telomeres and developing G-quadruplex ligands to inhibit telomere elongation in cancer cells.
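
To put the binding constants reported above for Cu-APC on an energy scale, the sketch below converts an association constant into a standard binding free energy through ΔG° = −RT ln Ka and estimates site occupancy for simple 1:1 binding. It is a minimal illustration, assuming a 1 M standard state and independent sites; the function names are hypothetical and the 1:1 model is an assumption, not the analysis used in the original study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(ka_per_molar: float, temp_c: float = 25.0) -> float:
    """Standard binding free energy (kcal/mol) from Ka, using dG = -RT ln Ka."""
    temp_k = temp_c + 273.15
    return -R_KCAL * temp_k * math.log(ka_per_molar)


def fraction_bound(ka_per_molar: float, free_ligand_molar: float) -> float:
    """Fraction of sites occupied for simple 1:1 binding (an assumed model)."""
    x = ka_per_molar * free_ligand_molar
    return x / (1.0 + x)


if __name__ == "__main__":
    # Ka values quoted in the text, without and with excess duplex DNA.
    for label, ka in (("no duplex", 2.4e4), ("excess duplex", 2.8e4)):
        dg = binding_free_energy(ka)
        print(f"{label}: Ka = {ka:.1e} M^-1, dG = {dg:.2f} kcal/mol")
```

On this scale the two Ka values differ by roughly 0.1 kcal/mol, which matches the statement that the affinity is almost unchanged by excess duplex DNA.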

3.3 Role of Non-Double Helical Nucleic Acids in Transcription.

Transcription is the first process of gene expression, in which the primary sequence information in a template DNA is copied into a new RNA strand by RNA polymerase. The underlying mechanism of how transcription proceeds in eukaryotes was first proposed by R. D. Kornberg.66,67 When transcription factors bind to specific DNA sequences, called enhancers and promoters, RNA polymerase is recruited to an appropriate DNA sequence known as the transcription start site. The RNA polymerase then moves to unwind the double strand of the template DNA and reads the DNA sequence to synthesize RNA. Transcription is terminated when the RNA strand is completely synthesized. Understanding the molecular mechanisms involved in transcription is of fundamental medical importance because disruption of the process may be related to human illnesses such as cancer, heart disease, and various kinds of inflammation. Thus, Kornberg was awarded the Nobel Prize in Chemistry in 2006 for his elucidation of eukaryotic transcription at the molecular level.

RNA elongation during transcription is highly regulated and dependent on the DNA sequence.68,69 The fidelity of transcription is critical for maintaining the accurate flow of genetic information. However, some DNA sequences perturb the transcription elongation.70,71 During the elongation process, the movement of RNA polymerase is not constant, and its irregularities can be divided into the following categories: “Pause,” wherein the RNA polymerase movement rate drops sharply; “Arrest,” wherein the polymerase movement is completely stagnant; and “Slippage,” wherein the polymerase slides on the DNA. In general, RNA production decreases during “Pause,” the strand length of transcribed RNA increases or decreases in “Slippage,” and RNA polymerase stops moving during “Arrest.” As a consequence of these changes in the movement of RNA polymerase, the amount and length of the transcribed RNA change. Such changes related to RNA production are known as transcriptional mutations that potentially cause diseases.72 For example, transcriptional mutations have been reported in genes for β-amyloid precursor proteins in the human central nervous system, as well as in genes related to hypolipoproteinemia and hemophilia.68,73 Thus, studies have investigated the relationship between mutated RNA production and disease onset.74

The site that causes transcriptional mutations is considered to be determined by the DNA sequence, i.e., the primary source of information (Figure 9). DNA structures also affect transcription regulation.75–77 Although the canonical DNA structure is a duplex, it can also form non-canonical structures such as triplexes, G-quadruplexes, and junctions, as mentioned above. Such non-canonical structures are formed by specific DNA sequences (e.g., G-quadruplexes may be formed by a continuous sequence of guanines). In recent years, it has become clear that transcriptional mutations are also caused by non-canonical structures on the template DNA. Transcriptional mutations caused by “structures,” especially G-quadruplexes, are more efficient at inducing “Arrest” than previously reported transcriptional mutation-causing DNA sequences. We found that in an environment that mimics the intracellular environment, a duplex is destabilized, but non-canonical structures such as triplexes and G-quadruplexes are stabilized.78 Since the cellular environment changes significantly depending on the cell cycle, the DNA structure may also change in response to environmental changes (Figure 9). We believe that such a DNA “structure” may play a role in regulating biological phenomena such as gene expression.

To analyze the effect of DNA G-quadruplex structure on transcriptional mutation, we designed loop sequences with varying numbers of G-quartets (2, 3, or 4 G-quartets) and investigated their structures and thermostability (Figure 10). We then designed six template DNAs (Q1–Q6), where these G-quadruplex-forming sequences (Figure 11a, X site) were inserted downstream of the T7 promoter sequence; a control sequence (Linear) was designed such that it did not form a significant structure (Figure 11b).

When the transcription proceeded to the end of the DNA template, a full-length transcript of 70 nt was produced (Figure 12). However, the length of the transcript changed when the template DNA had a G-quadruplex-forming sequence. Template DNAs with an unstable G-quadruplex (Q1, Q2, and Q4) produced a full-length transcript, as well as a transcript 5–10 nt longer than the full-length transcript due to “Slippage” (Figure 12). On the other hand, full-length RNA transcript production was significantly reduced in the template DNAs forming a stable G-quadruplex, namely, Q3, Q5, and Q6. Furthermore, in the products transcribed from these template DNAs, RNA polymerase arrested transcription on the template DNA for a certain period of time, and the presence of short-stranded transcript RNA that appeared to be dissociated from the template DNA was confirmed (Figure 12).

In addition, since the G-quadruplexes in Q3, Q5, and Q6 have different topologies, it was found that transcription arrest is determined not by the topology of G-quadruplexes, but by their stability. In other words, when a stable G-quadruplex is formed in the template DNA, RNA polymerase cannot unwind the G-quadruplex and stops at the template DNA, inducing “Arrest.”77 Physical and chemical analyses revealed that such transcriptional mutations were induced by a stable non-canonical structure formation (−ΔG°37 = 8.2 kcal mol−1 or higher) on the template DNA.77 This information about the effect of the stability of the non-canonical structure on transcription is very useful in predicting transcriptional mutations based on the DNA sequences.
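
The stability criterion above can be expressed as a simple rule of thumb. The sketch below flags a template as arrest-prone when the stability of its non-canonical structure reaches the threshold of −ΔG°37 = 8.2 kcal mol−1 quoted in the text. The function name and the example stabilities are hypothetical placeholders, not measured values for Q1 to Q6, and the rule deliberately ignores topology, in line with the observation that stability rather than topology determines arrest.

```python
ARREST_THRESHOLD_KCAL = 8.2  # -dG(37 C) threshold quoted in the text, kcal/mol


def predicts_arrest(neg_dg37_kcal: float,
                    threshold: float = ARREST_THRESHOLD_KCAL) -> bool:
    """Return True if the structure is stable enough to be flagged as arrest-prone."""
    return neg_dg37_kcal >= threshold


if __name__ == "__main__":
    # Hypothetical stabilities (kcal/mol) used only to exercise the rule.
    examples = {"template_A": 3.5, "template_B": 8.2, "template_C": 11.0}
    for name, stability in examples.items():
        outcome = "arrest-prone" if predicts_arrest(stability) else "below threshold"
        print(f"{name}: -dG37 = {stability:.1f} kcal/mol -> {outcome}")
```

In practice the stability would have to be measured or predicted under conditions that mimic the cellular environment, since crowding shifts the stability of these structures.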

3.4 Role of Non-Double Helical Nucleic Acids in Translation.

The process of translation produces proteins, which are the main functional molecules in cells, based on the genetic code of nucleic acids. This is one of the most important biological reactions that is indispensable for life as we know it. As described in the “Sequence Hypothesis,” proposed by Crick,79 the amino acid sequences of the proteins are defined by the primary sequence of DNA. Messenger RNA (mRNA), which is produced from DNA through transcription and serves as a template for translation, is often seen as a molecule that acts as an intermediate between primary sequences of nucleotides in DNA and amino acids in proteins. Therefore, in textbooks elaborating on the central dogma, mRNA is often illustrated as an extended strand. However, as mRNAs are intrinsically produced as single-stranded nucleic acids, they form complex higher-order structures through intramolecular interactions, as well as interactions with solute molecules, as described in sections 2.1 and 3.1.

To enable translation elongation, the ribosome needs to bring the mRNA structures to a single-stranded state. Therefore, the presence of stable structures in mRNA, especially in the open reading frame (ORF) region, potentially slows down or temporarily halts translation elongation and subsequently impacts gene expression from the mRNA (Figure 13). For example, pseudoknot structures formed on the ORF have the potential to suppress translation elongation.80,81 When the mRNA contains a characteristic sequence, called the slippery sequence, upstream of the pseudoknot structure, the suppression of translation elongation triggers ribosomal frameshift.82,83 In addition, alteration of the translation elongation rate has been suggested to influence not only the gene expression levels but also the structure formation of nascent proteins on the ribosome.84–86 Based on these observations, we envision that the stable structures of non-double helical nucleic acids formed on mRNA might play a role in modulating the rate of translation elongation to efficiently functionalize the translated proteins. However, the effects of the topology and stability of mRNA structures on the translation elongation reaction remain elusive.

The translation process is divided into the following three steps: initiation, elongation, and termination. Initiation is the rate-determining step of translation.87,88 Thus, the extent of elongation by the ribosome after a certain reaction time largely depends on the time at which each ribosome initiates the translation, making the analysis of translation elongation difficult. To overcome the rate-determining step of the initiation, we have developed an analytical technology named “synchronized translation” that enables analyses of ribosome progression, focusing on the elongation step.89 In this technology, the ribosomes, which have started translation reaction, are stalled at a specific codon on mRNA, and the elongation reaction is restarted from that specific codon (Figure 14). In this way, we can analyze the progression of the ribosome, as the rate-determining step at initiation is already crossed when the ribosomes are synchronized at a specific codon.

By means of the “synchronized translation,” it has become possible to analyze the effects of non-double helical nucleic acids on translation elongation. We focused on the G-quadruplex structures formed on mRNA because the RNA G-quadruplexes tend to be more stable compared to those formed by cognate DNA sequences. It has been demonstrated that protein expression levels are reduced when G-quadruplexes form on the 5′-untranslated region (UTR) of mRNAs.90 However, it has not been analyzed whether the translational elongation is suppressed by G-quadruplexes formed on the ORF. It is possible that the effects of G-quadruplexes on ORF have not been studied, as the mRNA sequences with the potential of forming G-quadruplexes in the ORF are less abundant compared to those with the potential of forming G-quadruplexes in the 5′-UTR.91 On the other hand, it is also considered that the appearance of the sequences with G-quadruplex forming potential in ORF has been evolutionarily avoided, as it disturbs translation elongation.91 In this case, if the sequence with G-quadruplex forming potential is conserved at some specific regions within the ORF, the G-quadruplexes formed would play some role in the biological systems. Based on this assumption, we constructed mRNAs with a sequence capable of forming a G-quadruplex in the ORF and analyzed translation elongation using “synchronized translation.”92 As a result, the translation elongation stalled in vitro for up to approximately 5 min. In addition, mass spectrometry of the halted products demonstrated that translation elongation was halted at a codon located 5, 6, or 7 nucleotides before the G-quadruplex (Figure 15).
The suppression of translation elongation caused by the G-quadruplexes on ORF was also observed as decreased protein expression levels in cells.93 The degree of suppression was dependent on the thermodynamic stability of the G-quadruplex elements, which were evaluated under molecular crowding conditions (Figure 16) that mimic intracellular molecular environments. We also demonstrated that in addition to the pseudoknot, translation suppression caused by G-quadruplex on ORF triggers ribosomal frameshift by inserting the slippery sequence at an appropriate position located at the site for binding of peptidyl-tRNA and aminoacyl-tRNA.94 Since the efficiency of translation suppression depends on the position of the G-quadruplex within the reading frame of the ribosome,95 it is considered that the frameshift makes it easier for the ribosome to overcome the G-quadruplex region and continue the elongation reaction. These results suggest that non-double helical structures formed on the ORF of mRNAs modulate not only the gene expression levels but also the process of translation. In particular, an increase or decrease in the elongation rate, which depends on the stability and structural topology, is one of the main factors affecting the modulation mechanisms. In addition, as described in section 3.6, the dynamic behavior of non-double helical nucleic acids is also an important factor in modulating the processes related to gene expression.

3.5 Role of Non-Double Helical Nucleic Acids in Replication.

Intracellular genomic DNA must be accurately copied during the replication in each cell cycle and then distributed to the daughter cells. This is one of the key processes in the central dogma of molecular biology. In the replication reaction, DNA polymerase, along with other supporting proteins, synthesizes a new DNA strand complementary to the template DNA. During the reaction process, the double strand must be unwound to form a single-stranded state for facilitating the replication. At this instance, non-double helical structures can be formed along the template DNA, which can affect replication. Replication errors lead to mutations and recombination of genomic DNA, which can cause serious diseases.

We quantitatively studied the effect of non-double helical nucleic acids on the replication efficiency of DNA strands. We found that the topology of non-double helical nucleic acids greatly affected the efficiency of DNA replication by DNA polymerases (Figure 17a).96 In this study, the i-motif structure derived from Hif1α, which is a cancer-related gene, was stable (−ΔG°37 = 3.1 kcal mol−1) at pH 6.0. The replication stalled immediately before the Hif1α i-motif formed along the template DNA. The rate constant (ks) to overcome the i-motif structure was 0.39 min−1. In contrast, although the hairpin structure formed via Watson-Crick base pairing showed similar stability (−ΔG°37 = 4.0 kcal mol−1), the replication efficiency was much higher, with a ks of 3.7 min−1. Efficiencies of the replication reactions affected by structural properties such as stability and topology of the template DNA were analyzed by a method named quantitative study of topology-dependent replication (QSTR). QSTR enables a quantitative phase diagram analysis of the replication rate vs. stability of the DNA structure and helps determine the dependency of replication on the topology of the template DNA. We collected QSTR plots from the results of various structures with different stabilities and topologies, including G-quadruplexes. A linear relationship was observed between the stability and replication efficiency; however, the linearity differed depending on the topology (Figure 17a, left panel). The activation free energy, ΔG‡, can be expressed as a relationship including RTlnks. Thus, if the QSTR plots are linear, which indicates that the ratio of ΔG‡ and −ΔG°37 is the same for replication of DNA with the same topology, the replication mechanisms through the unfolding of non-double helices may also be the same. From the slope of the QSTR plot, the i-motif, as well as the antiparallel and parallel G-quadruplexes, showed the strongest stalling of replication among the analyzed structures.
However, under crowding conditions, topology-dependent replications showed a different trend from that in the non-crowding condition (Figure 17a, right panel). The human telomere G-quadruplex formed a parallel topology in the crowding condition and effectively repressed replication in this condition, as was also observed in the case of G-quadruplexes with a parallel topology under non-crowding conditions.
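
The QSTR idea of relating RT ln ks to structural stability can be sketched numerically. The example below uses the two rate constants reported above for the Hif1α i-motif and the hairpin to estimate the difference in activation free energy, and includes a plain least-squares slope for a QSTR-style plot. It is a minimal illustration, assuming a temperature of 37 °C and a reference rate of 1 min−1; the function names are hypothetical, and no additional data points from the original study are implied.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def rt_ln_ks(ks_per_min: float, temp_c: float = 37.0) -> float:
    """RT ln(ks) in kcal/mol, with ks taken relative to 1 min^-1 (assumed reference)."""
    return R_KCAL * (temp_c + 273.15) * math.log(ks_per_min)


def least_squares_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    imotif = rt_ln_ks(0.39)   # ks for the Hif1a i-motif, from the text
    hairpin = rt_ln_ks(3.7)   # ks for the hairpin, from the text
    print(f"RT ln ks (i-motif): {imotif:.2f} kcal/mol")
    print(f"RT ln ks (hairpin): {hairpin:.2f} kcal/mol")
    print(f"Barrier difference: {hairpin - imotif:.2f} kcal/mol")
```

For a given topology, `least_squares_slope` would be applied to pairs of −ΔG°37 and RT ln ks values; a steeper negative slope then corresponds to stronger stalling per unit of structural stability.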

In contrast, the replication stall caused by i-motifs was effectively resolved in the presence of 20 wt% PEG 200 (Figure 17a, right panel), which was similar to the trend observed for the replication of hairpin structures under non-crowding condition. However, 20 wt% PEG 1000 effectively repressed the replication of the i-motifs. These results suggest that each crowding condition alters the stability and topology of the non-canonical structures formed on the template DNA, resulting in the regulation of the free energy of activation for unwinding the structure by DNA polymerase. By quantitatively analyzing the data obtained from these thermodynamic analyses, it is possible to understand the influence of the molecular environment on gene replication and expression, which in turn depends on the stability and topology of the template DNA.

As the changes in Watson-Crick and Hoogsteen base pair formation are triggered by the surrounding chemical environment, chemical compounds should also be able to control gene replication and expression. We identified that fisetin, one of the flavonols from plants, showed specific binding to the i-motif derived from the promoter region of the human vascular endothelial growth factor (VEGF) gene.97 The binding to the VEGF i-motif drastically enhanced the photo-induction of the excited state intramolecular proton transfer (ESIPT) reaction of fisetin due to the stabilization of the tautomeric structure that emits strong fluorescence.98 This unique response coincided with the transformation of the i-motif to the hairpin structure, where the putative Watson-Crick base pairs were formed between some guanines and cytosines within the i-motif sequence. The QSTR plot clearly indicated this transformation by fisetin; the plot of the VEGF i-motif shifted towards the plots obtained from the hairpin structures (Figure 17b). The dual effect of fisetin on the VEGF i-motif (fluorescence emission and release of replication stall) suggests that fisetin can be used in theranostics for cancer therapy.

Moreover, we investigated how G-quadruplex formation in human VEGF gene carrying oxidative damage could be recovered using a chemical strategy involving the regulation of G-quadruplex topology. Guanine bases on G-quadruplexes can be oxidized to form 8-oxo-7,8-dihydroguanine (8-oxoG). It has been reported that the presence of 8-oxoG in a sequence affects G-quadruplex formation, which can be related to cancer progression.99 We found that G-quadruplex derived from the promoter region of the human VEGF gene showed a different topology post the introduction of 8-oxoG in the sequence as compared to the unoxidized G-quadruplex structure. Furthermore, QSTR plots indicated that the oxidized G-quadruplex did not efficiently block the replication (Figure 17c).100 For the recovery of the function of the oxidized G-quadruplex, we, in collaboration with the B. H. Kim group, developed a guanine-tract modified with pyrene at the 5′-position (5′-pyrene-UGGGT-3′). This modified oligonucleotide could bind to the oxidized G-quadruplex by replacing the oxidized guanine tract, which resulted in stable intermolecular interactions with other intact guanine-tracts (Figure 17c).100 As shown in the QSTR plots, the properties of the oxidized G-quadruplexes with the pyrene-modified oligonucleotide were similar to those of the unoxidized G-quadruplexes, indicating that the modified oligonucleotide recovered the function of the G-quadruplex.

3.6 Dynamic Regulation of Biosystems by Non-Double Helical Nucleic Acids.

Biomolecular systems in cells maintain homeostasis by dynamically changing and processing gene expression. The formation of the non-double helical structures of nucleic acids on the template strands affects the efficiency and accuracy of the replication, transcription, and translation, as described above, and thus, potentially modulates gene expression.101 One of the characteristic features of the reactions in the central dogma is that the reactions progress concurrently and consecutively in a unidirectional manner. Therefore, events occurring in the range of similar time scales can interfere with and affect each other.102 DNAs and RNAs, which function as the template strands, always repeat the formation and dissociation of the structures with the progression of the reactions. In addition, as described in section 3.1, their structures and stabilities are dependent on various environmental factors.78 Based on the dynamic features of nucleic acids, we envisioned that the non-double helical nucleic acids might play important roles in modulating gene expression, depending on spatiotemporal conditions in cells and cellular life cycles. This type of modulation can be hypothesized as a dimensional code in the central dogma based on the higher-order nucleic acid structures that are different from the regulatory sequence code that is dependent on the primary sequence in the template strand (Figure 18).

Co-transcriptional folding of nascent RNA is one of the reaction processes that provides conformational dynamics in RNA structures.103 Due to the directionality of the transcription process, folding of nascent RNA occurs sequentially from the 5′ to the 3′ end, which provides high potential to form immaturely folded metastable structures during transcription.103,104 The metastable structures would possibly lead to both unfavorable misfolded structures and favorable intermediates of the correct folding process.102 As the folding rates of mRNAs with potential to form G-quadruplexes are generally slower than those of the canonical helices, such as hairpins containing small bulges and internal loops, co-transcriptional G-quadruplex folding is expected to be disturbed by the formation of alternative metastable structures.105–107 We constructed a real-time monitoring system of G-quadruplex formation during and after the transcription, and we demonstrated that the nascent RNA with G-quadruplex forming potential tended to be entrapped in the metastable secondary structure during transcription, especially when the sequence had a cytosine or uracil rich region upstream of the sequence with potential to form the G-quadruplex (Figure 19).108 We also demonstrated that the rates of RNA conformational transitions from the kinetically folded secondary structure to the thermodynamically stable G-quadruplexes depended on environmental factors such as crowding conditions in cells.108

Considering that the RNA G-quadruplexes affect the translation efficiency,92,93 the formation of G-quadruplexes in nascent RNAs, depending on the intracellular chemical and molecular environments, would be an example of the dimensional code in the central dogma. For example, intracellular abundant RNAs could potentially alter the dynamics of the folding of nascent RNAs through RNA-RNA interactions. We have focused on transfer RNAs (tRNAs), which are among the most abundant non-coding RNAs and are known to be expressed at higher levels in particular cancers;109,110 tRNAs act not only as transporters of amino acids into the ribosomes but also as modulators of gene expression through tRNA-RNA interaction.111,112 By using in vitro transcription and translation systems, we have demonstrated that the G-quadruplex formation on the nascent RNAs is well suppressed due to the interaction between the kinetically formed secondary structure and tRNA; further, G-quadruplex-mediated suppression of protein expression was weakened in the presence of tRNA at concentrations similar to the observed cellular levels.59 An increase in the protein expression level by disturbing the G-quadruplex formation through the kinetically formed secondary structure was also demonstrated in Escherichia coli.113 The formation of the G-quadruplex and suppression of translation were observed when the translation elongation rate was reduced by using antibiotics that caused a time lag between transcription and translation of the region with the G-quadruplex-forming potential.
However, the suppression of gene expression was lost when we mutated the sequence at the 5′ flanking region of the sequence with G-quadruplex forming potential to form a relatively stable hairpin structure.113 These observations suggest that G-quadruplex formation on the nascent RNA and its impact on gene expression depends not only on the rate of the conformational transition from the metastable RNA structure, but also on other dynamic characteristics (e.g., co-transcriptional translation in prokaryotes) that are associated with gene expression processes (Figure 20).

Co-translational folding and assembly of nascent proteins is also one of the reaction processes that provides structural diversity in mature proteins.86,114,115 C. Anfinsen, who received the Nobel Prize in Chemistry in 1972, proposed, based on his study of ribonucleases, that proteins spontaneously fold into a structure with the lowest free energy.116 Anfinsen's dogma has been accepted as a general property in protein folding. However, when the proteins are refolded from their denatured state, some proteins fold only with the aid of chaperone proteins under certain conditions.117 This suggests that the processes associated with natural protein folding are not simple. It is considered that the co-translational folding of nascent proteins on the ribosome is important for some proteins to be functional, especially for those consisting of multiple domains.86 The impacts of co-translational folding on mature protein structures and functions have been demonstrated by focusing on synonymous mutations, which change the codon usage from normal to rare or vice versa. Changes in codon usage altered the translation elongation rate at the codon level and kinetically controlled co-translational protein folding and the function of mature proteins, even when the amino acid sequence of the translated product was the same.84,85

If the translation elongation rate is an important factor affecting the process of co-translational folding, the non-double helical structures on mRNA would also affect the elongation rate and protein folding. Based on this hypothesis, we focused on a region of mRNA with G-quadruplex forming potential, which is located on the ORF of human estrogen receptor alpha (hERα) mRNA. The sequence region encodes amino acids at the hinge region of hERα, which connects the DNA binding and estrogen binding domains; G-quadruplex formation in this region would affect the folding of hERα.118,119 Mutational analyses were performed by introducing synonymous codons that alter the stabilities of G-quadruplex structures formed by the sequences.120 As a result, we found different cellular protein expression patterns resulting from proteolysis at specific regions. In addition, the amount of the cleaved protein depended on the stability of the G-quadruplex formed by the sequence variants, even when the translated products had the same amino acid sequence. These results suggest that the G-quadruplex on hERα mRNA controls the protein structure by kinetically connecting translation elongation and co-translational protein folding. In addition to G-quadruplexes, other non-double helical structures alter their stabilities depending on various factors described above. As we have demonstrated that alterations in protein expression patterns depend on the stability of G-quadruplex on hERα mRNA, it is expected that the non-double helical structures on mRNA function like protein folding codes, which could modulate the translation elongation rate and co-translational protein folding (Figure 21). This type of modulation is considered as an evident example of the dimensional code in the central dogma.

4.1 Effect of Ionic Liquid.

Nucleic acids have been widely used as analytes in biological research. In the field, it is important to purify and store high-quality nucleic acids from various biological samples for obtaining reproducible experimental results. Additionally, nucleic acids have attracted attention for their potential nanotechnological applications. Based on the above-mentioned chemical and structural properties of nucleic acids, methods for controlling their conformational polymorphism and enhancing sequence selectivity are important for technological applications. However, the lack of media that support the favorable behavior of nucleic acids has been a bottleneck in accelerating the biological and nanotechnological applications of nucleic acids.

Ionic liquids (ILs) are solvents that can be utilized to regulate the behavior of nucleic acids, because ILs exhibit properties that are not found in water and molecular liquids.121–124 ILs provide favorable environments for chemical reactions, as they are non-volatile, with vapor pressures close to zero.125 Based on these advantageous properties, ILs are widely used in various fields. For example, ILs can be used as electrolytes for batteries without fear of ignition or explosion, and as solvents that dissolve cellulose (the base of bioethanol) without heating.126 These excellent properties of ILs have been extensively utilized in bioscience and nanobiotechnology. Liquids formed by dissolving slightly less than 20% water in choline dihydrogen phosphate (dhp) are considered to be hydrated ILs because they behave as ILs (Figure 22a). Proteins show solubility in hydrated ILs of choline dhp,127 which exhibits chaperone-like activity toward the protein.128 Moreover, the deep eutectic solvent (DES), a liquid formed by mixing choline chloride (2-hydroxyethyl-trimethylammonium chloride) and urea at a 1:2 ratio, also significantly alters the properties of biomolecules (Figure 22b). For example, lysozyme and bacterial enzyme activities are maintained in DES, although urea generally denatures proteins.129,130

The structure and stability of nucleic acids in ILs and DESs have been analyzed in recent years. As a standard aqueous solution for biochemical experiments, a neutral aqueous solution containing 100 mM NaCl or KCl has been commonly used. In this solution, the cations bind to the nucleic acid to shield the negative charge of the phosphate group. In a solution with a high cation concentration, the cations also bind to the bases and sugars, thereby affecting the structure and stability of nucleic acids. It was reported that the long DNA duplexes from salmon testes showed the same B-type duplex in both choline dhp and aqueous solutions. It is also noteworthy that the B-type structure of nucleic acids is retained for a much longer period in choline dhp than in aqueous solution.131 However, some structures, depending on characteristic sequences, differ between standard aqueous solutions and ILs. For example, in the DES solution, a 32 bp duplex forms A-type duplexes; however, the G•C-rich duplex forms a Z-type duplex. Although G-rich DNA sequences typically form a B-type structure in aqueous solution,132 three types of structures are possible, even if the sequence is identical: anti-parallel type, parallel type, and hybrid type. For example, the human telomere sequence dAG3(T2AG3)3 folds into a G-quadruplex of an antiparallel and a hybrid type in aqueous solutions containing Na+ and K+, respectively.53 The human telomere sequence forms an anti-parallel G-quadruplex in choline dhp, but forms a parallel-type structure in DES containing 100 mM KCl (K+/DES); however, no particular structure is formed in DES alone.53,127 Therefore, ILs change the behavior of DNA in a manner distinct from the effects of aqueous solutions.132

In standard aqueous solutions, G•C base pairs in a duplex are much more stable than the A•T base pairs. However, A•T base pairs in choline dhp are stabilized to a greater extent than G•C base pairs.52 Furthermore, triplex and i-motif structures are generally very unstable in neutral aqueous solutions. However, in choline dhp, the structures can be stably formed.52 Such changes in DNA stability due to choline dhp have been examined by thermodynamic analysis to decipher the interaction parameters. By structural analysis, J. Plavec and our group have also revealed, via NMR experiments and molecular dynamics (MD) simulations, that choline ions bind to A•T rich regions in the minor grooves of the B-form DNA duplexes and stabilize the A•T base pairs via hydrogen bonds (Figure 23a).133,134 On the other hand, formation of a G•C base pair was inhibited by choline ions, which interact with guanine nucleobase in the single-stranded state, resulting in the destabilization of duplexes with abundant G•C base pairs.52 Several in vitro and in silico studies have been conducted to investigate the mechanism of this behavior in more detail. S. Senapati et al. analyzed the interactions of cations in IL with DNA base pairs in depth.135 The results obtained from MD simulations further suggested that the electrostatic association of cations in IL with the DNA backbone contributes significantly to the DNA stability.135 They also confirmed the intrusion of IL molecules into the DNA minor groove using the fluorescent dye displacement assay. Using MD simulations, N. V. Hud et al. have shown that differences in the solvation energies of IL between single-stranded DNA with G and C bases and that with A and T bases change the stability of the DNA duplex.136 Thus, specific interactions of IL with the grooves in the nucleic acids affect the stability and structure of duplexes.

Moreover, our group also has demonstrated the formation of relatively stable DNA triplexes under the crowding condition with choline dhp at neutral pH; notably, since formation of triplexes requires protonation of cytosine, they are less stable at neutral pH in an aqueous solution.137 Thermodynamic analyses have revealed that the stability of Hoogsteen base pairs improves to the same level as observed for Watson-Crick base pairs in hydrated ILs. The detailed molecular mechanism of triplex stabilization by ILs was analyzed via MD simulations, which suggested the binding of choline ions in the grooves around the third strand (Figure 23b).138 H. Ohno et al. have also reported the formation of stable G-quadruplexes in choline dhp through internalization of Na+ or K+ ions into G-quartet.127 From a comparison of G-quadruplex and i-motif, we have demonstrated that i-motifs are more stable than G-quadruplexes in the presence of choline dhp. The i-motif structures were found to be dramatically stabilized by choline ions through their binding to loop regions and grooves, leading to the stable formation of i-motifs in neutral aqueous solutions (Figure 23c).53 Thus, ILs can act as good solvents for controlling the stability of nucleic acids.

4.2 Effect of Molecular Crowding.

The stability, structure, and function of nucleic acids are affected by the surrounding conditions. Biomolecules, such as nucleic acids, proteins, metabolites, and polysaccharides, are contained in living cells. Biomolecules occupy a significant proportion in cells, resulting in molecular crowding environments (Figure 24).139,140 In such environments, the chemical potentials and activity coefficients of the biomolecules change due to the volume excluded by biomolecules.140–142 Moreover, the intracellular physical properties such as osmotic pressure, dielectric constant, and viscosity are significantly different from those in an aqueous solution.142 To reproduce intracellular molecular crowding, experiments have been performed in aqueous solutions containing co-solutes (Figure 25). Co-solutes, especially those with a low molecular weight, alter the physical properties of the solution; these changes affect the interactions of nucleic acids with water molecules and cations.44,142–145

The excluded volume effect caused by the addition of large co-solutes reportedly does not have any significant effect on the thermal stability of structures formed by short oligonucleotides. In principle, the structure formation of nucleic acids involves the formation of hydrogen-bonding networks with water molecules that bind nucleic acids. It is known that the formation of this network, called hydration, is greatly influenced by the water activity in the solution. The water activity of a solution is decreased by the addition of co-solutes.146 In molecular crowding conditions, the stability of nucleic acids is changed by the degree of hydration of nucleic acids, which is measured by −Δnw, the number of water molecules taken up during the structure formation of nucleic acids. The decreased water activity induced by the co-solutes is unfavorable for the formation of structures that require hydration and is favorable for the formation of structures that are accompanied by dehydration. For example, the formation of some duplexes is accompanied by hydration; thus, the duplex structure is destabilized in the molecular crowding solution.143,147 By contrast, the formation of triplexes, G-quadruplexes, and junctions is accompanied by dehydration. Thus, these structures are stabilized under crowded conditions.44,147 Changes in the stability of nucleic acids also depend on the chemical structure of the cosolutes. The number and positions of the hydroxyl groups of the cosolutes are important. For example, to investigate the effect of the cosolute chemical structure on the number of water molecules taken up during duplex formation (−Δnw), the stability change of the duplex in the presence of cosolutes, namely ethylene glycol (EG), glycerol, 1,3-propanediol, 2-methoxyethanol, 1,2-dimethoxyethane, or PEG, was investigated.148 The values of −Δnw for EG and glycerol were small.
In contrast, those obtained for 1,2-dimethoxyethane and PEGs were large; the data for 1,3-propanediol and 2-methoxyethanol appeared to be intermediate between the values of −Δnw of the PEGs and glycerol. These trends indicate that the number of water molecules taken up per base pair differs depending on the cosolute structure.
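As a minimal sketch of how −Δnw can be estimated from stability data, the snippet below fits the slope of ln K_obs against ln a_w. It assumes the simple linkage relation d(ln K_obs)/d(ln a_w) = −Δnw, i.e., that the cosolute acts only by lowering water activity; this relation, the function name `waters_taken_up`, and the numbers used are illustrative assumptions and are not taken from the studies cited above.

```python
import math


def waters_taken_up(water_activities, equilibrium_constants):
    """Return the least-squares slope of ln K_obs versus ln a_w.

    Under the assumed linkage relation d(ln K_obs)/d(ln a_w) = -Δn_w,
    the slope equals -Δn_w, the number of water molecules taken up
    when the structure forms (negative values indicate water release).
    """
    if len(water_activities) != len(equilibrium_constants):
        raise ValueError("inputs must have the same length")
    if len(water_activities) < 2:
        raise ValueError("at least two data points are required")

    xs = [math.log(a) for a in water_activities]
    ys = [math.log(k) for k in equilibrium_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0:
        raise ValueError("water activities must not all be identical")
    return sxy / sxx


# Synthetic, noise-free data (not measured values): a structure that
# takes up 20 water molecules, so that K_obs = K0 * a_w**20.
K0 = 1.0e6
N_TRUE = 20
a_w = [1.00, 0.98, 0.96, 0.94]
K_obs = [K0 * a ** N_TRUE for a in a_w]

slope = waters_taken_up(a_w, K_obs)
print(f"estimated -Δn_w = {slope:.2f}")
```

With real data, the equilibrium constants would come from melting experiments at each cosolute concentration, and scatter in the points would make the fitted slope an estimate rather than an exact value.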

There exists a method to predict the hybridization energy and folding structure of nucleic acids by using thermodynamic parameters based on the NN model (see section 2.3). DNA duplexes consisting of the same nearest-neighbor composition have similar thermodynamic parameters, even if a large amount of co-solute is added to the solution. Therefore, the stability of nucleic acids under crowding conditions can be predicted by the nearest neighbor model.37,149 The nearest neighbor parameters that we have developed are very useful for predicting the stability of RNA/RNA, DNA/DNA, and RNA/DNA duplexes, including in environments similar to those inside cells.37
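The nearest-neighbor bookkeeping described above can be sketched as follows. The snippet counts the nearest-neighbor doublets of a fully complementary DNA duplex and sums caller-supplied free-energy increments. The function names and the placeholder parameter values are hypothetical; they are not the published parameters, which would have to be substituted for any real prediction. Terminal and symmetry corrections are folded into a single assumed initiation term.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def nn_composition(strand):
    """Count nearest-neighbor doublets in the duplex of `strand` and its complement.

    A doublet and its reverse complement describe the same stacked pair,
    so each one is stored under the alphabetically smaller label.
    """
    counts = Counter()
    for i in range(len(strand) - 1):
        doublet = strand[i:i + 2]
        counts[min(doublet, reverse_complement(doublet))] += 1
    return counts


def predict_dg(strand, nn_dg, initiation_dg):
    """Sum the initiation term and the per-doublet free-energy increments."""
    counts = nn_composition(strand)
    return initiation_dg + sum(nn_dg[d] * n for d, n in counts.items())


# Placeholder increments (arbitrary numbers, NOT measured parameters).
DOUBLETS = ["AA", "AC", "AG", "AT", "CA", "CC", "CG", "GA", "GC", "TA"]
placeholder_dg = {d: -(1.0 + 0.1 * i) for i, d in enumerate(DOUBLETS)}

# Two different sequences that share the same nearest-neighbor composition.
seq_a = "GACATAG"
seq_b = "GATACAG"
dg_a = predict_dg(seq_a, placeholder_dg, initiation_dg=2.0)
dg_b = predict_dg(seq_b, placeholder_dg, initiation_dg=2.0)
print(nn_composition(seq_a) == nn_composition(seq_b), dg_a, dg_b)
```

Because the prediction depends only on the doublet counts, the two sequences receive the same predicted value whatever parameter set is supplied, which is the property the text relies on when it extends the model to crowded solutions.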

Utilizing the effects of co-solutes on nucleic acids, nucleic acid functions can be enhanced. For example, when the hammerhead ribozyme folds into the active structures that show cleavage activity with dehydration, the activity of the hammerhead ribozyme increases 20-fold following the addition of PEG 8000, relative to that in an aqueous solution.48 Similarly, the activity of the hairpin ribozyme is also increased in solutions containing co-solutes, as dehydration occurs by formation of the active structure.150 Moreover, the thrombin aptamer binding with thrombin protein involves the uptake of water molecules; thus, the binding is promoted in the presence of co-solutes.151 Taking advantage of changes in the behavior of biomolecules via hydration and dehydration in molecular crowding environments can be a powerful tool for the rational design of oligonucleotides such as microRNAs and antisense nucleic acids, which can be used in living cells to regulate biological reactions.

4.3 Phase Separation with Nucleic Acids.

Liquid-liquid phase separation has recently attracted attention as an important phenomenon that characterizes the intracellular environment. Various types of droplets have been reported to form in the nucleus and cytoplasm (Figure 26) and are known as “membrane-less organelles.”152 It is becoming clear that these droplets repeatedly form and disappear reversibly and act as fields for biological reactions, including transcription/translation, protein association reactions involving signaling proteins, and amyloid formation. For example, DNA in the nucleus is under a molecular crowding environment and is densely folded using histone proteins. Transcription factors that control transcription initiation form densely concentrated droplets, and when the droplets contain DNA, transcriptional activity increases.153 In addition, FUS (fused in sarcoma) protein causes amyotrophic lateral sclerosis (ALS) by forming rigid amyloid fibers.154 The FUS protein forms droplets and is concentrated in these droplets to form fibers. The reversibility of the droplets is lost eventually and amyloid fibers are precipitated.155 The behavior of FUS protein may be regulated by droplet formation.

The characteristics of droplets are high reversibility and high fluidity, which allow them to move inside the cell while concentrating liquid inclusions. In an intracellular molecular crowding environment, droplets may accumulate transcription factors and amyloid-forming proteins and then carry out subsequent reactions. For example, the intrinsically disordered domain of prions prevents abnormal protein aggregation (irreversible denaturation) by forming droplets.156 Furthermore, it has also been reported that the formation of such droplets is mitigated by RNA.157 To date, the main component of droplets has been considered to be proteins (or peptides); however, in recent years, it has been reported that RNA forms droplets or aggregates in droplets. For example, RNAs containing hairpins and G-quadruplexes, which are transcribed from genes involved in neurodegenerative diseases, form droplets.158 Since these RNAs also interact with peptides that show cytotoxicity in cells in neurodegenerative diseases, the relationship between RNA phase separation and cytotoxicity has received increasing attention in the past few years.21,158,159 Recently, it has been found that phase separation is regulated by the structure of RNA, as described in section 5.2; the influence of RNA structure on disease progression has attracted attention.160

5.1 Cancer and Non-Double Helical Nucleic Acids.

The canonical DNA duplex structure retains genetic information, depending on the base sequence. In contrast, non-canonical DNA structure formation largely depends on the molecular environment and tends to induce mutations. Focusing on these characteristics of DNA structures, we hypothesized that the DNA “structure” could have a role in “controlling” the expression of genetic information in the cell. Non-canonical structures induce transcriptional mutations. It was also found that transcriptional mutations from the template DNAs with G-quadruplexes were inhibited near membrane surfaces, such as organelle membranes, because G-quadruplexes are unstable on the membrane surface.161 G-quadruplexes are reported to be located in the euchromatin, indicating that G-quadruplexes facilitate transcription.162 Cancer-related genes of c-Myc,163,164 c-Kit,165 KRAS,166 hTERT,167 BCL2,168 and VEGF169 were found to form G-quadruplexes, suggesting that G-quadruplexes may regulate the expression of cancer-related genes.

We analyzed the effect of DNA structure on intracellular gene expression during cancer progression. The intracellular chemical environments change significantly due to cell carcinogenesis and its progression (e.g., changes in the crowding environments due to decreased K+ concentration, decreased dielectric constant, overexpression of specific proteins, etc.). Therefore, we analyzed the effects of changes in the intracellular chemical environments during cancer development and progression on transcriptional activity, targeting the G-quadruplexes that induce transcriptional mutations with higher efficiency. The amounts of RNA transcribed from DNA with a G-quadruplex in normal cells, cancer cells, and malignant cancer cells were compared. It was shown that, in normal cells, the amount of RNA transcribed from DNA with a G-quadruplex structure was less than that obtained from DNA without G-quadruplex, thus revealing that transcription was suppressed by the G-quadruplex.164 It was also found that the amount of RNA transcribed from DNA with G-quadruplex was higher in cancer cells than that in malignant cancer cells. To analyze the changes in the intracellular environment, the expression level of the ion channel protein KCNH1, which releases K+ ions from the cells, was observed. The KCNH1 expression level was observed to have been increased in cancer cells, and the highest expression level was observed in malignant cancer cells. In other words, it was observed that K+ concentration, which contributes to G-quadruplex stabilization, was reduced in cancer cells. The results suggest that in normal cells, G-quadruplexes are formed on the oncogenes and transcription is inhibited (Figure 27, upper panel). However, in malignant cancer cells, G-quadruplexes are destabilized, and transcription of cancer-activating genes is promoted (Figure 27, lower panel). 
In this study, a novel mechanism was identified, in which the G-quadruplex structure was formed in the regulatory region of the oncogene in response to changes in the intracellular environment due to malignant transformation of the cell.164 Several ion channels on the cell surface control the ingress and egress of ions in cells and are considered to be involved in cell cycle control, signal transduction, and disease onset. This study demonstrated that not only the cancerous state but also transient changes in the intracellular ion environment via ion channels may affect nucleic acid “structure” and “control” the disease onset.

5.2 Neurodegenerative Diseases and Non-Double Helical Nucleic Acids.

In cells, phase separation reversibly produces various types of droplets, as described in section 4.3. Droplets are known to serve as reaction fields for many biological reactions, such as transcription/translation, protein association, and amyloid formation; droplet-mediated control mechanisms for these biological reactions have drawn attention. In recent years, J. Wang et al. have found that RNAs with hairpins and G-quadruplex structures, which were transcribed from genes involved in neurodegenerative diseases, formed droplets.21 Since these RNAs also interact with peptides that show cytotoxicity in cells in neurodegenerative diseases, the relationship between RNA phase separation and cytotoxicity has recently begun to draw attention. Abnormalities in protein-protein interactions and protein folding, or dysfunction of key proteolytic mechanisms in the cell, induce protein aggregation and accumulation. Conditions with these characteristics are often referred to as “proteinopathies,” and examples include multiple diseases such as Parkinson’s disease (PD),170 Huntington's disease (HD),171–174 and amyotrophic lateral sclerosis (ALS).175,176 These aberrant proteins are considered to be the main causes of neuronal death and dysfunction, which are the characteristics of degenerative diseases.177,178 Thus, a major underlying cause of neurodegenerative diseases is the aggregation of peptides or proteins via solid-liquid phase separation. Examples of phase separation include intracytoplasmic aggregates of the RNA-binding proteins FUS and TDP-43 in ALS and frontotemporal dementia (FTD), intracytoplasmic aggregates of the microtubule-associated protein tau (MAPT) in Alzheimer's disease (AD), and aggregates of polyglutamine (polyQ) in HD.179 Many of these proteinaceous assemblies also spread from one region of the brain to another, which is consistent with the progressive nature of these diseases.
Recently, RNAs transcribed from genes associated with neurodegenerative diseases have been shown to form non-canonical structures and undergo liquid-liquid phase separation.158 Thus, evidence substantiating the association of aberrant phase separation with various diseases such as cancer and neurodegeneration has been accumulating. Importantly, the efficiency of phase separation was found to be largely dependent on the higher-order structures of RNA,160,180 suggesting that nucleic acids may control not only gene expression but also phase separation.

In neurodegenerative diseases, cellular ion channels are inactivated, and the intracellular ionic environment is considered to be different from that of normal cells. In addition, the intracellular crowding environments are significantly different from those of normal cells due to the overexpression of cytotoxic peptides and changes in cell volume. As described above, the structures of nucleic acids change in response to molecular environments.78 Thus, droplet formation may change depending on the RNA structure. The phase separation mechanism of repeat RNA sequences was analyzed using r(CAG)8, r(CUG)8, r(GGGUUA)8, and r(GGGGCC)8, which are associated with neurodegenerative diseases. Upon analysis of the structures of repeat RNA by CD spectra and native gel electrophoresis, r(CAG)8 and r(CUG)8 were shown to form hairpin structures. Moreover, r(GGGUUA)8 and r(GGGGCC)8 were shown to form a G-quadruplex under similar conditions. Furthermore, the ultraviolet scattering intensity of the solution containing these repeat RNAs at 350 nm showed that the repeat RNAs capable of forming the G-quadruplex structure formed droplets. Droplet formation of r(GGGGCC)8 was analyzed in a molecular crowding environment in the presence of EG, glycerol, PEG 200, PEG 8000, and dextran. It was found that the addition of cosolutes promoted droplet formation (Figure 28a). Furthermore, it was also shown that the droplet formation rate differed depending on the type of the crowding molecule. Therefore, the values of tmax, an index of droplet formation rates; the ΔTm value, an index of G-quadruplex structure stabilization upon addition of crowding molecules; and the change in physical properties of the solution (viscosity, water utilization, coexistence solute, excluded volume, and dielectric constant) were compared.
It was found that the stability of the G-quadruplex and the change in the dielectric constant of the solution correlated with the tmax value (Figures 28b and 28c).160 Decreasing the dielectric constant of the solution favors interaction with K+, and K+ binds to the G-quartet when the G-quadruplex structure is formed. In other words, G-quadruplex stabilization by lowering the dielectric constant of the solution promoted RNA droplet formation.

Since the droplet does not have a membrane separating its inside content from the outside content, RNAs in the droplet rapidly move in and out of the droplet near the droplet surface; the droplet itself repeatedly appears, disappears, and fuses due to an external stimulus. Therefore, the droplet provides an environment in which RNA and proteins that interact with the RNA can get appropriately aggregated and transported to an appropriate reaction field, and various reactions can be easily controlled. As described above, the nucleic acid structure changes greatly depending on the surrounding molecular environment. It is known that π–π stacking and cation–π interactions are important for droplet formation.181 The cellular biomolecules bind to nucleic acids via π–π stacking and cation–π interactions, depending on the structures of nucleic acids. As the intracellular environment changes significantly, depending on the cell cycle, it is presumed that there is a mechanism by which nucleic acid structures change in response to changes in the intracellular environment and in turn control various biological phenomena through droplet formation. Nucleic acid structures may also play an important role in the onset and progression of neurodegenerative diseases, as droplets have been reported to be involved in controlling amyloid aggregation and neurodegenerative disease-associated transcriptional activation.182

5.3 Metabolites and Non-Double Helical Nucleic Acids.

Nucleic acids with unique structures and sequences drastically change their conformation and stability upon specific interactions with other molecules. Aptamers are well-known nucleic acids that specifically recognize the target molecules. Since the concept of aptamers and their selection technology, which is often referred to as systematic evolution of ligands by exponential enrichment (SELEX), was first established in 1990,183,184 aptamers targeting various biologically relevant molecules, such as proteins and small molecules, have been developed.185 In general, aptamers form non-double helical tertiary structures to provide a binding pocket or cavity for the target molecule.

More than a decade after the demonstration of artificial selection of aptamers, researchers noticed the presence of analogous RNAs in nature that directly recognize and interact with metabolites and modulate gene expression. The functional RNAs, which were first discovered by R. R. Breaker et al. in 2002, are known as riboswitches.186–188 The riboswitch is generally divided into an aptamer domain and an expression platform.189 The expression platform modulates gene expression through alteration of RNA structures that define the “on” or “off” state of the downstream gene expression. The aptamer domain triggers structural changes in the expression platform upon binding to a specific metabolite. The aptamer domains in riboswitches generally form unique and complicated tertiary structures, even when compared to the structural aspects of the artificial aptamers. Riboswitches, which target various metabolites such as amino acids, coenzymes, and signal transduction molecules, are categorized into more than 40 classes based on their unique tertiary structures.190

Since the interaction between the aptamer domain and target metabolite is the most important reaction to functionalize the riboswitch, various physicochemical analyses have been performed to understand the mechanism of the riboswitch-mediated gene modulation.191–193 As described above, physicochemical parameters of the solution are important factors that determine the structures and stabilities of non-double helical nucleic acids, and hence, we envisaged that the affinity of the aptamer domain towards the metabolite might largely depend on the molecular environment in cells.78 We have focused on the effects of molecular crowding on the riboswitch functions based on our previous knowledge that ribozyme function involving tertiary interaction was enhanced under the crowding conditions.48 When we analyzed the binding affinity between 2-aminopurine, which is an analogous molecule of adenine, and the aptamer domain derived from adenine-specific riboswitch located on the mRNA of Vibrio vulnificus adenine deaminase, an increase in the affinity was observed in the presence of crowding cosolute (PEG 200); this was especially true in the solution with relatively low magnesium concentration.
Based on the structural analyses using enzymatic RNA cleavage, it was suggested that PEG 200 increased the affinity by stabilizing the process in which the tertiary structure of the aptamer domain is induced upon adenine binding due to favorable contribution from dehydration caused by tertiary structure formation.194 An enhancement in the affinity of the metabolite binding was also observed when we used the aptamer domain derived from the flavin mononucleotide (FMN)-specific riboswitch in the impX gene of Fusobacterium nucleatum, in which the ligand binding process was proposed to occur through the pre-organized aptamer domain.195 Even in the absence of magnesium ions, the FMN-specific aptamer domain of the riboswitch bound to FMN under crowding conditions, with an affinity similar to that observed in the presence of physiological magnesium concentration in the diluted solution. From the thermodynamic parameters of the binding reaction, obtained via isothermal titration calorimetry (ITC) and fluorometric analyses using fluorophore-modified RNA, it was suggested that the aptamer domain shows a more dynamic conformational rearrangement upon FMN binding under the crowding condition than in the diluted solution containing magnesium (Figure 29).195 This dynamic property would play an important role in efficiently switching the gene expression in response to the target metabolite. In addition to the environmental effects, we have also investigated the intrinsic structural components that affect binding dynamics using the FMN-specific aptamer domain. 
We found that conserved tertiary interactions, which are located away from the binding pocket, affected the binding affinity of the aptamer domain.191 In addition, the conserved helical region could contribute to alter kinetic parameters, such as association and dissociation rate constants of the binding reaction, based on the base pair compositions.195,196 As the riboswitches modulate the gene expression involved in the metabolism of their target metabolites, both the non-helical and helical regions of the aptamer domains have been optimized evolutionarily to refine the feedback modulation in each organism.

5.4 Non-Double Helical Nucleic Acids as Therapeutic Targets.

As described in section 3, we envision that non-double helical nucleic acids play important roles in modulating gene expression. The dimensional code enables spatiotemporal and reversible modulation of gene expression in cells based on alterations in structures and stabilities, including topological changes of the non-double helical nucleic acids, depending on the environmental conditions (Figure 18). Even if the alteration of gene expression is transient, deviation from the constitutive gene expression pattern potentially leads to irreparable changes in cellular viabilities that sometimes would adversely affect the health of the organisms. Since various non-double helical structures have been proposed to be involved in the aforementioned human diseases, the nucleic acid regions capable of forming such non-double helical structures are attractive therapeutic targets. Although targeting the primary sequence of the regions related to the diseases by using a designed oligonucleotide is a seemingly straightforward approach, it is not practical when one considers the functions of these regions in normal cells. In this case, directly targeting non-double helical nucleic acids and selectively modulating their topology or stability would be desirable. Thus, we have attempted to construct structure targeting artificial (STAr) materials that target higher-order nucleic acid structures for therapeutic purposes (Figure 30).

G-quadruplexes are one of the non-double helical nucleic acid structures involved in various human diseases, such as cancers and neurological diseases, as described in sections 5.1 and 5.2. Thus, various artificial chemicals that target a unique tertiary structure have been synthesized. Although interactions between various small ligands and G-quadruplexes have been quantitatively investigated in vitro, there are few compounds that can control G-quadruplex-mediated biological reactions inside the cells. This is expected due to the differences between the environmental conditions in vitro and in cells. As described above, intracellular crowding conditions affect the topology and stability of non-double helical nucleic acids. The affinity of the molecular interactions is also influenced by the molecular environment. In addition, the chemical compounds targeting G-quadruplexes inside the cells should have considerably high specificity, so that the normal functions of other DNAs and RNAs are not influenced by the presence of these compounds. 
To achieve the specific targeting of G-quadruplexes, we have proposed that, instead of cationic small molecules with planar backbone structures, which interact with G-quadruplexes through end stacking supported by electrostatic interactions with the negatively charged backbone, electro-negative chemicals might show superior properties as STAr materials for G-quadruplexes.197 For example, as described in section 3.2, anionic copper phthalocyanine (Cu-APC) and hemin maintained their G-quadruplex binding affinities under the molecular crowding environment and inhibited telomerase activities even in the presence of excess amounts of double-stranded DNAs.65,198,199 Furthermore, anionic zinc phthalocyanine (Zn-APC) produced reactive oxygen species upon irradiation with excitation light in the complexed state with a G-quadruplex and cleaved the mRNA at the region with the potential to form a G-quadruplex, which resulted in the inhibition of cellular proliferation.200 These observations strongly suggest the importance of considering the cellular crowding environment when constructing STAr materials.

From the viewpoint of the dynamics of the non-double helical structures, targeting structures that are alternatives to the disease-related structure would be another approach for constructing STAr materials (Figure 31). For example, studies have suggested the importance of equilibria between hairpin and G-quadruplex conformers in several types of RNA regions, including repeated sequences21,201 and precursor miRNAs,202,203 as well as in antisense-mediated splicing switching.204 Thus, an artificial molecule that targets the alternative hairpin region would modulate the G-quadruplex-mediated biological functions in these RNAs. Based on this assumption, we have constructed a triplex-forming artificial oligonucleotide, which consists of peptide nucleic acid (PNA), to specifically target the hairpin region.205–207 The PNA contains chemically modified nucleobases, which were developed by E. Rozners et al., to enable recognition of various base pair compositions at neutral pH,208–211 while natural triplex-forming oligonucleotides only recognize a region of purine or pyrimidine tracts under acidic conditions.
To demonstrate the potential of the triplex-forming PNA, we have simply targeted the hairpin structure formed on the 5′-UTR of mRNA and demonstrated the downregulation of gene expression with sequence specificity.206 The triplex is one of the non-double helical nucleic acid structures in which Hoogsteen base pairs of the third strand are stabilized under the molecular crowding condition;212,213 thus, induction of the triplex is ideal for shifting the conformational equilibrium toward the hairpin conformer against the alternative G-quadruplex conformer, which is also stabilized in the crowding condition.44,214,215 In addition, triplex formation targeting RNAs would be applicable as a therapeutic approach that modulates not only the non-double helical nucleic acids but also the conformational switching between RNA secondary structures involved in disease onset and progression.216–218
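The equilibrium shift described above can be illustrated with a minimal linked-equilibrium sketch. It assumes a simple model that is not taken from the cited studies: the RNA interconverts between a hairpin and a G-quadruplex conformer with a folding ratio `k_fold`, and a triplex-forming ligand binds only the hairpin with an association constant `k_bind`. The function name, parameter names, and numerical values are hypothetical and serve only to show the direction of the shift.

```python
def gq_fraction(k_fold: float, k_bind: float, ligand: float) -> float:
    """Fraction of RNA in the G-quadruplex conformer (assumed three-state model).

    k_fold : [G-quadruplex]/[hairpin] in the absence of ligand (dimensionless)
    k_bind : association constant of the ligand for the hairpin (M^-1)
    ligand : free ligand concentration (M)
    """
    if k_fold < 0 or k_bind < 0 or ligand < 0:
        raise ValueError("all inputs must be non-negative")
    # Statistical weight of the hairpin relative to free hairpin:
    # free hairpin (1) plus ligand-bound hairpin (k_bind * ligand).
    hairpin_weight = 1.0 + k_bind * ligand
    return k_fold / (k_fold + hairpin_weight)


# Hypothetical numbers: equal populations without ligand, then ligand added.
without_ligand = gq_fraction(k_fold=1.0, k_bind=1e6, ligand=0.0)
with_ligand = gq_fraction(k_fold=1.0, k_bind=1e6, ligand=1e-5)
print(f"G-quadruplex fraction without ligand: {without_ligand:.2f}")
print(f"G-quadruplex fraction with ligand:    {with_ligand:.2f}")
```

In this picture, any ligand that binds only the hairpin lowers the G-quadruplex population, and the effect grows with the product of binding constant and ligand concentration.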

5.5 New Devices Based on the Nucleic Acids.

5.5.1 Sensing the Crowding Condition in Cells:

By using the topology changes in G-quadruplexes, we proposed a novel G-quadruplex DNA-based sensing material for crowding conditions.17,219 This sensor material was designed to utilize a Förster resonance energy transfer (FRET) signal through the formation of a G-quadruplex (Figure 32a). G-quadruplex DNA from human telomeres changes its topology depending on the crowding conditions. Thus, the FRET signals unique to each topology provide information regarding changes in the crowding conditions. We applied this technique to monitor different crowding conditions in cellular organelles. The sensor material injected into living cells showed different FRET signals in the nucleus and cytosol (Figure 32b). Furthermore, the nucleolus in the nucleus showed a relatively high FRET, a trend similar to that observed in a solution with highly concentrated PEG 200 in vitro. For the nucleus and cytosol, FRET signals were similar to those observed in concentrated Ficoll-70 and bovine serum albumin, respectively. These results provide useful information for mimicking the crowding conditions in the nucleolus, nucleus, and cytosol in the form of solutions containing PEG 200, Ficoll-70, and bovine serum albumin, respectively.
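The comparison between intracellular and in vitro FRET signals can be sketched as a nearest-reference lookup. The sketch below assumes the signal is summarized as an apparent FRET (proximity) ratio, I_A/(I_A + I_D), computed from acceptor and donor intensities; the actual signal processing in the cited work may differ. The reference ratios are hypothetical placeholders, not measured values.

```python
def fret_ratio(donor: float, acceptor: float) -> float:
    """Apparent FRET (proximity) ratio I_A / (I_A + I_D)."""
    total = donor + acceptor
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return acceptor / total


def closest_condition(ratio: float, references: dict[str, float]) -> str:
    """Name of the reference solution whose FRET ratio is nearest to `ratio`."""
    return min(references, key=lambda name: abs(references[name] - ratio))


# Hypothetical placeholder ratios for in vitro cosolute solutions.
# Replace with ratios measured for the same sensor in each solution.
REFERENCES = {"PEG 200": 0.80, "Ficoll-70": 0.55, "bovine serum albumin": 0.35}

measured = fret_ratio(donor=200.0, acceptor=800.0)
print(closest_condition(measured, REFERENCES))
```

A lookup of this kind only ranks candidate mimicking solutions; it does not by itself establish that the sensor adopts the same topology in the cell and in the matched solution.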

To demonstrate the applicability of this technique, we compared the reported stability of the double helices in cells with the stability predicted using our novel NN parameters, which are applicable under any crowding conditions.30 As a result, we found that 50 wt% PEG 200 in 100 mM NaCl is the best mimicking condition for the nucleolus, while 40 wt% Ficoll-70 in 100 mM NaCl optimally mimicked the crowding condition in the nucleus.30 The formation of the nucleolus in the nucleus is vital, as the ribosomal RNA is specifically transcribed in the nucleolus. Therefore, the PEG 200 condition, which optimally mimics the crowding condition in the nucleolus, is beneficial for the study of ribosomal RNA in a crowding-dependent manner in vitro. Notably, the analysis of local intracellular crowding is important for understanding the behavior of nucleic acids in cells.
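A stability prediction of this kind can be sketched with the generic nearest-neighbor summation, in which the duplex free energy is an initiation term plus the sum of contributions from each adjacent base-pair step. The sketch below assumes this standard form and omits corrections such as terminal and symmetry terms. The parameter table contains hypothetical placeholder values in kcal/mol; it does not reproduce the crowding-dependent parameters of ref 30, and a separate table would be supplied for each solution condition.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical placeholder values (kcal/mol) for the ten unique DNA steps.
# These are NOT published parameters; substitute a table for the condition of interest.
PLACEHOLDER_NN = {
    "AA": -1.0, "AT": -1.0, "TA": -1.0,
    "CA": -1.5, "GT": -1.5, "CT": -1.5, "GA": -1.5,
    "CG": -2.0, "GC": -2.0, "GG": -2.0,
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_free_energy(seq: str, nn_table: dict[str, float], initiation: float) -> float:
    """Sum nearest-neighbor step energies along one strand plus an initiation term."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    total = initiation
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in nn_table:
            # The same base-pair step read from the complementary strand.
            step = reverse_complement(step)
        total += nn_table[step]
    return total


dg = duplex_free_energy("GCAT", PLACEHOLDER_NN, initiation=2.0)
print(f"Predicted free energy (placeholder parameters): {dg:.2f} kcal/mol")
```

Comparing predictions from tables determined in different cosolute solutions with stabilities reported in cells is then a matter of calling the same function with each table.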

5.5.2 RNA-Capturing Microsphere Particles (R-CAMPs):

One of the useful characteristics of non-double helical nucleic acids for constructing functional materials is their ability to recognize specific molecules. The tertiary structure formed by the nucleic acids, as typified by aptamers, provides a binding pocket or cavity for the target molecule, as explained in section 5.3. The aptamers have been utilized for various biotechnological and therapeutic applications such as diagnostics, drug delivery, bioimaging, and synthetic biology.220–224 In these applications, aptamers artificially obtained by a selection technology under certain conditions have been used. However, the molecular environments, in which the aptamers should function during the application, would be different from the conditions in which the aptamer was originally obtained. For example, in the case of diagnostic applications, aptamers should recognize target analytes in buffer solutions containing biological samples. For drug delivery applications, the conditions would include an extracellular fluid and a matrix. For bioimaging and synthetic biology applications, the conditions would be an intracellular crowded milieu. Since the conformational topology and stability of the non-double helical structures formed by the nucleic acids are sensitive to the molecular environments, as mentioned above, the original aptamer obtained under a different set of conditions may not optimally perform in the actual field of application. Thus, fine tuning of aptamers, depending on each conceivable microenvironment in the application field, is required for the efficient functionalization of the molecular devices containing the aptamer.

To enable simple optimization of aptamers in various molecular environments, we have constructed a new device, consisting of microsphere particles, called RNA-capturing microsphere particles (R-CAMPs).225 R-CAMPs were constructed by combining technologies for next generation sequencing (NGS) and for capturing nascent RNA transcripts on spherical particles. We obtained microsphere particles displaying double-stranded template DNA by emulsion PCR and subsequent primer extension on the microsphere particles. RNA transcripts from the particles were co-transcriptionally captured on the template DNA. Through these processes, we obtained microsphere particles, each of which displayed large numbers of DNA and RNA clones of an identical sequence. In addition, after a single reaction at the laboratory level, millions of R-CAMPs, each of which displayed a different sequence derived from the originally designed library, were obtained. By using the R-CAMPs, we have demonstrated the optimization of an RNA aptamer, which binds a small molecule and enhances its fluorescence signal, under molecular crowding conditions (Figure 33).225 The technology used for flexible optimization of RNA sequences in different solution conditions was also applicable to the optimization of junction sequences, which connect two aptamer units, for constructing an efficient signaling aptamer that emits a fluorescence signal in the presence of a target small molecule.226 The R-CAMPs can be constructed not only from artificially designed DNA libraries but also from libraries of natural genomes and transcriptomes. Thus, the R-CAMPs would be useful for the optimization of aptamer sequences, as well as for the screening of natural nucleic acids that could be targets of natural and artificially modified molecules such as proteins, peptides, and therapeutically important small chemicals (Figure 33).

5.5.3 Nucleic Acid Sensors in IL:

Nucleic acids demonstrate sequence selectivity and conformational polymorphism; owing to these characteristics, they are used as powerful tools in biotechnology and nanotechnology. Single-stranded nucleic acids recognize other nucleic acid sequences via sequence-specific binding through the formation of Watson-Crick base pairs and Hoogsteen base pairs. Base pair formation has been utilized in several detection systems, such as microarrays of nucleic acids,227 chips for hybridization-based gene sequencing and phylogenetic studies,228,229 microarrays for single nucleotide polymorphism (SNP) analysis,230 and transcriptome analysis231 (Figure 34a). However, these approaches sometimes lack sequence specificity and tend to result in false-positive detection.232,233 In contrast, the formation of Hoogsteen base pairs is a promising method for recognition of specific duplex targets.234,235 Hoogsteen base pairs, which are very unstable in aqueous solution, are stabilized in choline dhp (see section 4.1), and sequence-specific sensing of duplex targets is possible without complicated processes.

ILs are of interest as solvents in nanotechnology, as described in section 4.1. The sequence specificity is greatly improved via the interactions between DNA and IL.236,237 Systems for sensing specific DNA sequences, especially SNPs, are important in the fields of diagnosis and biotechnology.139,238–242 As described above, the main DNA-sensing systems have been developed based on the formation of Watson-Crick A•T and G•C base pairs between the target sequence and the sensor DNA.243–246 The sensor DNA should recognize a fully matched target sequence owing to the difference in thermodynamic stability between fully matched Watson-Crick base pairs and mismatched base pairs; however, certain mismatches, such as the G•T mismatch, are thermodynamically stable.247–249 Unfortunately, SNPs are often accompanied by G or T mutations, and G•T mismatches are frequently formed between the target and sensor DNA.250,251 Such stable mismatches often induce false-positive test results.
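The link between the stability difference and sequence discrimination can be made explicit with a short sketch. It assumes simple two-state binding at equilibrium, so that the ratio of the binding constants for the matched and mismatched targets equals exp(ΔΔG/RT), where ΔΔG is the free energy penalty of the mismatch. The function names and example values are hypothetical and are not taken from the cited studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def discrimination_ratio(ddg_kcal: float, temp_c: float = 25.0) -> float:
    """K_match / K_mismatch for a mismatch penalty ddg_kcal = dG_mismatch - dG_match."""
    temp_k = temp_c + 273.15
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def required_penalty(ratio: float, temp_c: float = 25.0) -> float:
    """Mismatch penalty (kcal/mol) needed to reach a given discrimination ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(ratio)


# Hypothetical comparison: a weakly penalized mismatch versus a strongly penalized one.
for ddg in (0.5, 3.0):
    print(f"penalty {ddg:.1f} kcal/mol -> ratio {discrimination_ratio(ddg):.1f}")
print(f"penalty for 100-fold discrimination: {required_penalty(100.0):.2f} kcal/mol")
```

Under these assumptions, a mismatch with a small penalty is discriminated only weakly, which is consistent with the false-positive problem described above, whereas any solvent effect that enlarges the penalty raises the discrimination exponentially.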

It was demonstrated that the ILs of choline dhp significantly change the stability of DNA structures.137,138 Using unique interactions of DNA with choline dhp, new DNA sensors have been developed based on the stability difference. In choline dhp, mismatched Hoogsteen base pairs were found to be remarkably destabilized relative to their stabilities in the aqueous solution.236 Furthermore, molecular beacon-type DNA sensors that bind to the duplex by a Hoogsteen base pair could bind to the HIV-1 sequence in choline dhp at low target concentrations (Figure 34b).236 Importantly, the sensor DNA showed high selectivity for the target sequence (10,000 times more than existing methods in an aqueous solution). Moreover, the sensor DNA was more protected from the contaminating nucleases in choline dhp relative to that in aqueous solution.236 Electrochemical sensors, which have been considered highly sensitive, have a detection limit of 10 nM.252 Therefore, a sequence-specific sensor based on the formation of Hoogsteen base pairs is possible without duplex denaturation or complicated instrumentation in IL.236

We envision that nucleic acids not only store the genetic information in canonical B-type double helices but also direct biological functions through non-B-type highly ordered structures.17 In this review, we have described historical research attempts based on physicochemical analyses of the structure and stability of nucleic acids, including both B- and non-B-type helices. We have also described our attempts to understand the biological functions of nucleic acids and tried to explain the underlying mechanisms based on the behaviors of nucleic acids, considering that their dynamics depend on the surrounding molecular environment. Our major achievements are summarized below.

6.1 Quantification of the Stability of Non-Double Helical Nucleic Acids in Molecular Crowding.

Under crowded conditions, the physical properties of the solutions differ completely from those of the diluted aqueous solutions. These differences affect the structure and stability of nucleic acids. We have quantitatively analyzed the contributions of the molecular environments to the structures and stabilities of the non-double helical nucleic acids. Based on the quantitative data in various environmental conditions, we developed an energy database to predict the structures and stabilities of nucleic acids in biological environments, including those in molecular crowding conditions.

6.2 Elucidation of the Functions of Non-canonical Nucleic Acids and the Development of Methods for their Control.

Based on the energetic database, we have investigated the biological functions of non-double helical structures by correlating their structures and stabilities with their effects on gene expression processes. We found that non-canonical nucleic acids play key roles in regulating gene expression. We also revealed that the dynamic behaviors of non-double helical structures in response to changes in the molecular and chemical environments are important factors that determine their biological functions.

6.3 Development of Functional Materials Utilizing Non-Double Helical Nucleic Acids.

The database was applied to design a strategy for controlling the formation of non-double helical nucleic acids. We demonstrated chemical approaches for modulating the biological reactions involving specific structural aspects of non-double helical nucleic acids. These approaches are completely different from those targeting the primary sequences of nucleic acids by sequence complementarity and enable the spatiotemporal regulation of biological functions involved in nucleic acid structures in response to molecular environments in the cells. We envision that the regulation of the dimensional code in the central dogma will elucidate a novel technological aspect in chemical biology.

The authors, especially NS, are grateful to the authors of the cited papers published from FIBER and FIRST, Konan University, and other collaborators. This work was supported by the Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Society for the Promotion of Science (JSPS), especially a Grant-in-Aid for Scientific Research on Innovative Areas “Chemistry for Multimolecular Crowding Biosystems” (JP17H06351), JSPS KAKENHI (JP19H00928 and JP18KK0164), MEXT-Supported Program for the Strategic Research Foundation at Private Universities (2014–2019), Japan, the Hirao Taro Foundation of Konan Gakuen for Academic Research, the Okazaki Kazuo Foundation of Konan Gakuen for Advanced Scientific Research, and the Chubei Itoh Foundation.

Naoki Sugimoto 【Award recipient】

Naoki Sugimoto received his Ph.D. in 1985 from Kyoto University, Japan. After completing his postdoctoral work at the University of Rochester, USA, he joined Konan University, Kobe, Japan in 1988 and has been a full professor since 1994. Since 2003, he has been the director at the Frontier Institute for Biomolecular Engineering Research (FIBER) at Konan University. He has received many awards including Imbach-Townsend Award from International Society for Nucleosides, Nucleotides, and Nucleic Acids in 2018 and the CSJ Award in 2019. His research interests are focused on biophysical chemistry, biomaterials, nanobio-engineering, molecular design, biofunctional chemistry, and chemical biology of nucleic acids.

Tamaki Endoh

Tamaki Endoh received his Ph.D. in 2006 from Tokyo Institute of Technology, Japan. He has worked as an assistant professor at Okayama University. In 2009, he joined the Frontier Institute for Biomolecular Engineering Research (FIBER) at Konan University and was promoted to associate professor in 2016. His research focuses on nucleic acid chemistry and cellular engineering.

Shuntaro Takahashi

Shuntaro Takahashi received his Ph.D. in 2007 from Tokyo Institute of Technology, Japan. After working at Tokyo Institute of Technology as an assistant professor, he joined FIBER, Konan University, Kobe, Japan in 2012 and was promoted to associate professor in 2020. He is currently studying the biophysical aspects of nucleic acids in cells, as well as the effect of molecular crowding on nucleic acid structures affecting the cellular metabolism.

Hisae Tateishi-Karimata

Hisae Tateishi-Karimata received her Ph.D. in 2008 from Konan University, Japan. After working at the University of Illinois, IL, USA as a visiting postdoctoral fellow and at Fine Co. Ltd., Osaka, Japan as a research associate, she joined FIBER, Konan University, Kobe, Japan in 2010 as an assistant professor, and was promoted to associate professor in 2020. Her research interests focus on understanding the biophysical chemistry of nucleic acids in vitro and in cells, as well as the biotechnological applications of nucleic acids.