DNA, PCR and Directionality

@zzalpha

DNA and RNA are … complex, really large molecules that encode the genetic information of organisms. I can’t do justice to everything they do (I know very little), but we can think of them as instruction manuals that describe the organism: when a cell replicates, it creates a copy of this manual for the offspring cell, so that it "can know" what needs to be done and how to do it.

DNA and RNA are long polymers built up from units called nucleotides, each carrying one of five bases: Adenine (A), Guanine (G), Cytosine (C), Thymine (T) and Uracil (U). DNA is made of A, C, G, T nucleotides, and RNA is made of A, C, G, U nucleotides (so we can tell DNA from RNA by whether it contains T or U).
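The T-versus-U rule can be written down as a tiny check. This is only a sketch: the function name `classify` and the "ambiguous" label for sequences with neither T nor U are my own choices for illustration.

```python
def classify(seq: str) -> str:
    """Guess whether a string of base letters is DNA or RNA."""
    letters = set(seq.upper())
    if not letters <= set("ACGTU"):
        raise ValueError("unexpected letter in sequence")
    if "T" in letters and "U" in letters:
        raise ValueError("sequence mixes T and U")
    if "U" in letters:
        return "RNA"
    if "T" in letters:
        return "DNA"
    return "ambiguous"  # only A, C, G: could be either


print(classify("ATTTCGGATTCGGGA"))  # DNA
print(classify("AUUUCGGAUUCGGGA"))  # RNA
```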

We will not talk about RNA much. It is the primary genetic medium in some viruses, such as influenza, the common cold, Ebola etc. It is often considered a precursor to DNA: it’s a simpler, less stable molecule, and scientists in artificial life laboratories have been trying to create RNA from scratch.

Anyway, back to DNA. DNA is a polymer of the A/T/C/G nucleotides, so a single strand of DNA would look like this:

.....ATTTCGGATTCGGGA.....

While single stranded DNA does exist, nucleotides tend to pair up in the following ways: A with T (or U), and C with G. Pairing here refers to hydrogen bonds, a form of attraction that is weaker than a covalent chemical bond (more on this later). Because of this tendency to pair, DNA is more commonly found in a double stranded form, like this:

.....ATTTCGGATTCGGGA.....
.....TAAAGCCTAAGCCCT.....

(the nucleotide below each nucleotide is its corresponding complement. This is called Watson-Crick base pairing, after the guys who discovered the double helical structure of DNA … DNA tends to wind itself in a double helical form, but I can’t draw that in text, so we will stick to this linear representation)
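The pairing rule is easy to play with in code. Here is a minimal sketch that builds the lower strand from the upper one; the names `PAIR` and `complement` are just mine for illustration.

```python
# Watson-Crick partners for DNA
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner letter for each letter, position by position."""
    return "".join(PAIR[base] for base in strand)


top = "ATTTCGGATTCGGGA"
print(top)              # ATTTCGGATTCGGGA
print(complement(top))  # TAAAGCCTAAGCCCT
```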

I don’t know the exact details of how they know it or how it is done, but it is possible to identify the specific subsequences of DNA which are responsible for a certain property or the creation of an enzyme. To study such a subsequence further, scientists would like to generate multiple copies of it, i.e. amplify the subsequence.

PCR

There’s a technique called polymerase chain reaction (PCR) which can be used to amplify these subsequences. The invention of this process won the Nobel Prize in Chemistry in 1993.

Say the subsequence to be amplified is as above (i.e. we want to produce multiple copies of the following sequence of letters):

.....ATTTCGGATTCGGGA.....
.....TAAAGCCTAAGCCCT.....

The PCR process basically involves:

heating to split the strands apart:

.....ATTTCGGATTCGGGA.....             .....TAAAGCCTAAGCCCT.....

adding these things called primers: a primer is a short tag/sequence that binds at a specific location. In this case, one of the primers might be TAAA, so this small fragment will attach itself like so:

.....ATTTCGGATTCGGGA.....
     TAAA

Similarly, a primer for the other end of the other strand is used: maybe the fragment GGGA, which will bind itself like this:

                GGGA
.....TAAAGCCTAAGCCCT.....

Now by sorcery (i.e. organic chemistry) each of these contraptions is grown, one in each direction:

.....ATTTCGGATTCGGGA.....
     TAAA --->

           <--- GGGA
.....TAAAGCCTAAGCCCT.....

tada! you now have two copies of what you started with:

.....ATTTCGGATTCGGGA.....      .....ATTTCGGATTCGGGA.....
.....TAAAGCCTAAGCCCT.....      .....TAAAGCCTAAGCCCT.....
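Repeat the heat/prime/grow cycle and the count doubles each time. A back-of-the-envelope sketch, assuming every cycle doubles perfectly (real reactions fall short of this; `copies_after` is a made-up helper name):

```python
def copies_after(cycles: int, start: int = 1) -> int:
    """Idealized copy count: every cycle doubles what was there before."""
    return start * 2 ** cycles


for n in (1, 10, 30):
    print(n, copies_after(n))
# 1 2
# 10 1024
# 30 1073741824
```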

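(How do they get this process to stop precisely at the other end, i.e. obtain two copies of $[0,1]$ instead of $[0,\infty)$ and $(-\infty,1]$? Ans: In the first cycle they don’t; the new strands do overshoot. But every new strand starts exactly at its primer, so when it serves as the template in a later cycle, the copy made from it runs out of template right there. After a few cycles almost all the copies are exactly $[0,1]$. Terminators, which block further growth, do exist, but they are a tool of DNA sequencing rather than PCR.)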

A question to ponder: why doesn’t the following happen instead?

                GGGA --->         .....ATTTCGGATTCGGGA.....
.....TAAAGCCTAAGCCCT.....         <--- TAAA

(i.e. obtaining $(-\infty,0]$ and $[1,\infty)$, the complement of what we wanted?)

To understand this, we need to look at the actual structure of DNA:

The Composition Of DNA:

Nucleotides have a common structure/framework. They look like:

  (Ph)
    \
     C5    O
      \  /   \
       C4     C1 - (base)
        \      |
         C3 - C2 - [OH]
         |
         OH

(base) is the part of the molecule that differs from nucleotide to nucleotide; it is what makes each one an A, C, T or G nucleotide.
C: represents carbon. They are numbered 1 through 5.
O: represents oxygen.
[OH]: represents an optional hydroxyl group.
(Ph) represents the phosphate group:

        OH
         |
    O -- P -- O -- (C5)
        ||
         O

P represents phosphorus.
|| is supposed to indicate a double bond.
This entire unit is attached to the C5 carbon of the sugar ring above.
All the H atoms attached to the carbons have conveniently been ignored.

Whether [OH] is present at the C2 carbon or not determines whether it is called DNA or RNA. Deoxy-ribonucleic acid (DNA) does not have the OH group at C2, while ribonucleic acid (RNA) does.

Phosphodiester linkage:

These nucleotides assemble into the polymer that is DNA as follows:

The OH of C3 of one nucleotide and the OH in the (Ph) of another meet up, and eliminate a molecule of water between them.

     ...
      C3
      |
      O
      |
      H         }
       `    H   }  this molecule of water is eliminated
        ` /     }
         O      }
         |
    O - P -- O -- C5 ...
        ||
         O

becomes:

     ...
      C3           H     H
      |             \   /
      O        +      O
      |
  O - P -- O -- C5 ...
      ||
       O

(I think of it like dehydration, the water that is "common" between these two molecules is removed)
This is called a phosphodiester bond.

Let us try to see what a pair of nucleotides connected by a phosphodiester bond is going to look like:

 (Ph)
    \
     C5    O
      \  /   \
       C4     C1 - (base)
        \      |
         C3 - C2 - [OH]
         |
         O
         |  }--> (this is the new phosphodiester bond)
       O=P
        / \
       O   O
            \
             C5    O
              \  /   \
               C4     C1 - base
                \      |
                 C3 - C2 - [OH]
                 |
                 OH

This process can be repeated indefinitely … yielding a strand of DNA/RNA!
"Dehydration" was an oversimplified take on this process. For the water molecule to be removed, either a C-O bond or a P-O bond must be broken; this is not an easy task as both these bonds are quite stable and require a lot of energy to be broken.

How does one add a nucleotide anyway?

We don’t add the nucleotide as such, but a more reactive variant called a nucleoside triphosphate (NTP). NTPs are like nucleotides, except that instead of a single phosphate attached to C5 there is a string of three phosphates, which looks like so:

    O       O       O
   ||      ||      ||
O---P---O---P---O---P---O
    |       |       |    \
    O       O       O     \
                           C5    O
                            \  /   \
                             C4     C1 - (base)
                              \      |
                               C3 - C2 - [OH]
                               |
                               OH

The two extra phosphate groups are not very strongly attached, and can break off when convenient, yielding a tremendous glut of energy. In fact, one particular NTP, called Adenosine triphosphate (ATP) is used as the currency of energy in molecular biology. (ATP is, in fact, the NTP with A (adenine) as its base)

With some magnesium ions (Mg++) in the mix, these extra phosphates can be siphoned off, i.e. the following elopes with Mg:

    O       O
   ||      ||
O---P---O---P---O
    |       |
    O       O

This leaves the remaining phosphorus atom free to bond with the oxygen of the OH attached to C3 of the strand's last nucleotide.
Thus, the C5 phosphate of the NTP that we added is going to attach itself to the OH group on that C3 carbon.

What does this imply?

We can add a new nucleotide to our existing string only at one of its ends: the C5 of our reagent (NTP) is going to attach itself to the C3 of the DNA by means of the phosphodiester bond.
In other words, the strand can only be grown from its C3 end.
(The proper terminology for this sense of directionality is 5' to 3')

By the nature of how NTPs add nucleotides, the reverse does not work, i.e. we cannot add a nucleotide to the C5 end (in other words, 3' to 5' does not work).

DNA has some sense of direction/orientation built into it!
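One way to picture this one-way rule is as a data structure that only allows appends at one end. A toy sketch (the `Strand` class and its method names are invented for illustration; a real strand is chemistry, not a string):

```python
class Strand:
    """A strand stored 5' -> 3': the last letter sits at the 3' end."""

    def __init__(self, bases: str = "") -> None:
        self.bases = bases

    def add_at_3_prime(self, base: str) -> None:
        # The incoming NTP's C5 phosphate bonds to the free OH on the last C3.
        self.bases += base

    def add_at_5_prime(self, base: str) -> None:
        # The chemistry described above offers no way to do this.
        raise ValueError("cannot grow a strand from its 5' end")


primer = Strand("TAAA")
for base in "GCC":
    primer.add_at_3_prime(base)
print(primer.bases)  # TAAAGCC
```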

Double Stranded DNA:

As the name implies, it involves two strands oriented and attached as below:

    (Ph)                                       OH
      \                                       /
       C5    O                        C2 -- C3
        \  /   \                      /      |
         C4     C1 - base ... BASE - C1     C4
          \      |                     \   /  \
           C3 - C2                       O     C5
          /                                      \
         O                                    O   O
        /                                      \ /
   O = P                                        P = O
      / \                                      /
     O   O                                    O
      \                                      /
       C5    O                        C2 -- C3
        \  /   \                      /      |
         C4     C1 - base ... BASE - C1     C4
          \      |                     \   /  \
           C3 - C2                       O     C5
          /                                      \
         O                                    O   O
        /                                      \ /
     O=P                                        P = O
      / \                                      /
     O   O                                    O
      \                                      /
       C5    O                        C2 -- C3
        \  /   \                      /      |
         C4     C1 - base ... BASE - C1     C4
          \      |                     \   /  \
           C3 - C2                       O     C5
          /                                      \
         OH                                       (Ph)

base and BASE represent complementary pairs, i.e. A-T and C-G.
Notice that the optional [OH] that was there before has been removed: this is DNA. RNA does not often form a double strand.
The ... above represent hydrogen bonds, which sort of involve a hydrogen atom being shared. They are weaker than covalent bonds, but are still quite strong. (Hydrogen bonding is also the reason why water doesn’t readily evaporate, while something like rubbing alcohol evaporates far more quickly.)

Observe that if we read top to bottom, the left strand goes from (Ph) to OH, while the right-hand strand goes from OH to (Ph). In other words, the left one is running 5' to 3' and the right one is running 3' to 5'. The two strands are said to be antiparallel.
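Because the strands are antiparallel, reading the partner strand in its own 5' to 3' direction means complementing the letters and also reversing them. A small sketch (the name `reverse_complement` is the usual one for this operation, but the code here is just my own illustration):

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Given one strand written 5' -> 3', return its partner written 5' -> 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


one = "ATTTCGGATTCGGGA"
other = reverse_complement(one)
print(other)                      # TCCCGAATCCGAAAT
print(reverse_complement(other))  # ATTTCGGATTCGGGA
```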

Implications for PCR

Back to the question on PCR:
Let’s say we’re performing PCR on the following (meaning we want to create multiple copies of that section):

.....ATTTCGGATTCGGGA.....
.....TAAAGCCTAAGCCCT.....

Now, one of those strands is running from 3' to 5' and the other is opposite, say like so:

     3'   ---->    5'
.....ATTTCGGATTCGGGA.....
.....TAAAGCCTAAGCCCT.....
     5'   <----    3'

and when we split the two strands, we have

     3'   ---->    5'                   5'   <----    3'
.....ATTTCGGATTCGGGA.....   and    .....TAAAGCCTAAGCCCT.....

At which ends of each of these should we add the primer?

Let’s first pick the strand on the left. We could add a primer in either of these ways:

     5' 3'                                     5' 3'
     TAAA                                      CCCT
.....ATTTCGGATTCGGGA.....  or  .....ATTTCGGATTCGGGA.....
     3'            5'               3'            5'

But the directedness of DNA polymerization means that each primer will be grown from its 3' end, i.e. the two will proceed as:

     5' 3'                                     5' 3'
     TAAAGC --->                               CCCTxxxx --->
.....ATTTCGGATTCGGGA..... and  .....ATTTCGGATTCGGGA.....
     3'            5'               3'            5'

(xxxx — I don’t care/know what is added here.)
So one replicates the sequence, the other is useless to us.
Applying this same logic to the other strand, our primer ought to be attached at the other end:

     5'            3'
.....TAAAGCCTAAGCCCT.....
           <--- GGGA
                3' 5'

The polymerization will then happen right to left, replicating the sequence we need.
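The whole argument can be replayed in a few lines. This is a sketch under simplifying assumptions: the template is just the 15 letters with no flanking DNA, a primer binds only at an exact match, and `extend_primer` is a made-up helper, not anything from a real library.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq))


def extend_primer(template: str, primer: str) -> str:
    """Both arguments are written 5' -> 3'. Returns the new strand, 5' -> 3'."""
    site = template.find(reverse_complement(primer))
    if site < 0:
        raise ValueError("primer does not bind this template")
    # The primer's 3' end faces the template's 5' end, so only the
    # template letters before the binding site get copied.
    return primer + reverse_complement(template[:site])


# The figure draws the first strand 3' -> 5', so flip it to get 5' -> 3'.
first = "ATTTCGGATTCGGGA"[::-1]
second = "TAAAGCCTAAGCCCT"  # already drawn 5' -> 3'

print(extend_primer(first, "TAAA"))   # TAAAGCCTAAGCCCT (a full copy)
print(extend_primer(first, "CCCT"))   # CCCT (nothing of our section copied)
print(extend_primer(second, "AGGG"))  # AGGGCTTAGGCTTTA
```

The last primer is the GGGA from the figure, written 5' to 3' as AGGG; its product is the first strand read 5' to 3'.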

And that answers the riddle!

TL;DR: DNA has a sense of direction built into it; and can only be grown/replicated in one direction.

REFERENCES: https://en.wikipedia.org/wiki/Directionality_%28molecular_biology%29

NOTES:

Since addition always happens at the 3' -OH, removing this OH means nothing more can be added here. This is the principle behind terminators: add a nucleotide without this OH, and the replication cannot continue from there onwards.
(but how do they attach the terminator to the precise, desired location?)
I have conveniently ignored the negative charges on the oxygen atoms of the phosphates and triphosphates. A minus sign looks too similar to a bond in this representation.
The explanation is greatly simplified: there are a ton of enzymes (called polymerases) that chaperone the process. They even manage to proofread what is created for errors!
I don’t know how they manage to attach the primer to the sequence in the first place.
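The terminator note above can also be sketched in code. Assume each nucleotide is a (base, has_3_oh) pair; that representation and the `extend` helper are inventions for this illustration only.

```python
def extend(strand, incoming):
    """Append nucleotides until one without a 3' OH has been added.

    Each nucleotide is a (base, has_3_oh) pair; lists run 5' -> 3'.
    """
    grown = list(strand)
    for nucleotide in incoming:
        if grown and not grown[-1][1]:
            break  # the 3' end has no OH, so nothing more can attach
        grown.append(nucleotide)
    return grown


primer = [("T", True), ("A", True)]
supply = [("A", True), ("G", False), ("C", True), ("C", True)]
product = extend(primer, supply)
print("".join(base for base, _ in product))  # TAAG
```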