Faculty of Transportation Sciences, CTU Prague
This article studies the genetic code. A short, simplified introduction to the biochemical principles of genetics is given. Then the genetic code is introduced and the significance of its degeneracy is discussed. Finally, the error-correcting capability of the double-stranded DNA code is described.
In recent years one can often read about research on the genetic code. One interesting question is whether the genetic code can really be regarded as a code in the sense of mathematical coding theory and, if so, what its properties are.
Proteins are the basic elements of all organisms: bacteria, plants and animals. Almost all other organic compounds in living bodies are produced by the work of proteins. We can say, with a bit of overstatement, that an organism (a species, or an individual body) is defined by its proteins.
Proteins consist of amino acids joined by peptide bonds into a linear structure. These amino acid chains are folded into a complex three-dimensional structure, which determines the function of the protein. However, this structure is almost fully determined by the sequence of amino acids (also called the primary structure).
Only 20 amino acids appear in proteins. One protein molecule contains from a few dozen to many thousands of amino acid units.
Nucleic acids are macromolecules whose main role is the conservation, transfer and processing of information. There are two kinds of nucleic acids: DNA and RNA.
Deoxyribonucleic acid (DNA) is a macromolecule that contains the instructions for creating proteins (and hence the whole organism). A single molecule of DNA is a chemically linked chain of nucleotides. A nucleotide consists of a sugar, a phosphate and a nucleobase (or base). There are four kinds of bases: adenine (A), thymine (T), guanine (G) and cytosine (C). These four bases act as the alphabet of the genetic code.
Such single-stranded DNA exists only in some types of viruses. Every higher organism has double-stranded DNA, consisting of two paired strands of DNA. These strands are held together by hydrogen bonds between nucleobases. Each base forms hydrogen bonds with only one other base: A with T and C with G (these pairs are called complementary bases), so the base sequence of one strand is fully determined by the sequence of the other. The two strands of double-stranded DNA are twisted around each other and form a structure described as a double helix.
A part of DNA which codes one protein is called a gene.
Ribonucleic acid (RNA) normally has only one strand. Its chemical composition is slightly different, but its logical structure is the same as that of single-stranded DNA. One of these small differences is that RNA uses the nucleobase uracil (U) instead of thymine. RNA has an important function in processing the information from the DNA.
In the process of protein synthesis, a part of the DNA is transcribed into RNA, which serves as a "working copy". This phase is called transcription. The resulting piece of RNA, the transcript, is used as a recipe for the synthesis of a protein in a process called translation.
To pass the genetic information on to two daughter cells, each DNA strand is used as a template for synthesising a new strand, which is (or should be) complementary to the template and therefore identical to the strand originally opposite it. This process is called replication. The origin of the first molecule of DNA lies somewhere at the very origins of life.
This is the normal pathway of information flow in all living organisms. Exceptions to this scheme exist only in some viruses and in prions. (Prions are infectious protein particles, responsible for some uncommon diseases such as BSE. Neither prions nor viruses are autonomous living organisms.)
After many experiments it was discovered that the genetic code is based on triplets, that is, units of three nucleotides. These triplets are the codewords (called codons) of the genetic code. The genetic code is thus a quaternary block code of length 3 with the code alphabet {A, C, G, U}. (Here U denotes the nucleobase uracil, which is present in RNA instead of thymine. The genetic code is usually described in terms of the RNA transcript.)
Overall, there are 4^3 = 64 different codons, which is more than enough to code all 20 amino acids. From a mathematician's point of view, this would be a good opportunity to give the code some error-detection properties. In fact, however, all codons (except three) are meaningful, and most amino acids are coded by more than one codon: the genetic code is degenerate.
The correspondence between codons and amino acids is irregular (see Table 1). Three amino acids (Leucine, Arginine and Serine) are each coded by six different codons, most of the others are coded by two or four codons, and two amino acids (Methionine and Tryptophan) are each coded by a single codon. The three remaining codons, UAA, UAG and UGA, are the so-called termination or stop codons: they code the end of the protein chain.
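The counts quoted above can be checked with a short sketch. It assumes the standard codon table, written here from general knowledge as a stand-in for Table 1. Amino acids are denoted by their one-letter symbols and a stop codon by *. The names CODON_TABLE and codons_per_amino are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed stand-in for Table 1), bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons per amino acid (stop codons excluded).
codons_per_amino = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
six_fold = sorted(aa for aa, n in codons_per_amino.items() if n == 6)
one_fold = sorted(aa for aa, n in codons_per_amino.items() if n == 1)

print(len(CODON_TABLE))       # 64
print(len(codons_per_amino))  # 20
print(stop_codons)            # ['UAA', 'UAG', 'UGA']
print(six_fold)               # ['L', 'R', 'S']  (Leucine, Arginine, Serine)
print(one_fold)               # ['M', 'W']       (Methionine, Tryptophan)
```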
The genetic code in single-stranded DNA is also a model for the transcription of genetic information into RNA and its translation during protein synthesis.
In genetics, errors in the genetic code are called mutations. A single error is a point mutation.
The genetic code in itself has no error-detection ability. However, the degeneracy of the genetic code makes it slightly fault-tolerant. Table 1 shows that the two or four codons which specify a given amino acid typically differ only in the third position.
For example, the codon GGU (coding Glycine) can tolerate any point mutation in the third position: the codons GGC, GGA and GGG code Glycine as well. The codon AAA can tolerate the mutation A→G in the third position: both AAA and AAG code Lysine. Mutations of this kind, which do not affect the resulting amino acid, are called silent mutations.
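The two examples can be reproduced with a minimal sketch. It again assumes the standard codon table (one-letter amino acid symbols, * for stop). The helper is_silent and its zero-based position argument are hypothetical names introduced only for illustration.

```python
from itertools import product

# Standard genetic code (assumed), bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def is_silent(codon: str, position: int, new_base: str) -> bool:
    """True if replacing the base at `position` (0, 1 or 2) keeps the amino acid."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]

# GGU (Glycine) tolerates every substitution in the third position.
print([is_silent("GGU", 2, b) for b in "CAG"])  # [True, True, True]
# AAA (Lysine) tolerates A->G in the third position, but not A->U.
print(is_silent("AAA", 2, "G"))                 # True
print(is_silent("AAA", 2, "U"))                 # False
```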
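In addition, the mutations A↔G and C↔U are more likely than the others. This adds another measure of fault tolerance, because codons coding the same amino acid often differ by exactly such a substitution in the third position.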
However, a mutation in the third position is just as probable as a mutation in the first or second position of the codon.
Of course, types of mutations other than point mutations can occur as well. These include burst errors, where a shorter or longer part of the nucleic acid strand is damaged (typically the corruption of two adjacent T bases caused by UV light); deletions, where one or more nucleotides are missing; and insertions, where one or more extra nucleotides are added. Insertions and deletions may lead to a loss of synchronisation of the code and are mostly irreversible.
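The loss of synchronisation can be illustrated with a small sketch. It assumes the standard codon table and a simplified translate helper that reads an RNA string in steps of three from its first base and stops at the first stop codon. The example sequence is made up for illustration.

```python
from itertools import product

# Standard genetic code (assumed), bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(rna: str) -> str:
    """Translate complete codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

original = "AUGGGUAAACUGUAA"                    # AUG GGU AAA CUG UAA
substituted = original[:5] + "C" + original[6:]  # point mutation GGU -> GGC
deleted = original[:4] + original[5:]            # one base deleted from GGU

print(translate(original))     # MGKL
print(translate(substituted))  # MGKL  (silent point mutation)
print(translate(deleted))      # MVNC  (every later codon is shifted)
```

A single substitution changes at most one amino acid, whereas the deletion shifts the reading frame and changes everything after it.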
Mutations with loss of synchronisation and large-scale mutations cannot be repaired directly. In such cases the high redundancy of genetic information is exploited: usually another copy of the gene exists somewhere in the cell.
Table 2: Probability of errorless translation (example)
Transcription can be described as a symmetric transmission channel, where
• p1 is the probability that a transmitted character value is received as the other character of the same pair A↔G or C↔U,
• p2 is the probability that a transmitted character value i is received as a character j from the other pair.
Table 2 shows the probability that the received codon denotes the same amino acid as the transmitted one (that is, the given codon is either received unchanged or a silent mutation occurs), under the assumption p1 = 9·10^-5, p2 = 10^-5. These numerical values are chosen only as an example (the real values depend on many conditions, including the influence of the environment), but one can see that the degeneracy of the genetic code really does have a certain positive influence on fault tolerance. For example, the codon AUG (the unique codon for Methionine) has a probability of successful translation of only 0.9997, whereas the codon CUG (one of the six codons for Leucine) has a probability of 0.9999.
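The two quoted values can be reproduced with the following sketch, which rests on several assumptions not stated explicitly in the text: the three positions of a codon are disturbed independently, p1 is the probability of the substitution A↔G or C↔U, p2 is the probability of each of the two remaining substitutions, and the codon table is the standard one. The function names are illustrative.

```python
from itertools import product

# Standard genetic code (assumed), bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed channel model: the more likely substitutions are A<->G and C<->U.
LIKELY_PARTNER = {"A": "G", "G": "A", "C": "U", "U": "C"}

def base_probability(sent: str, received: str, p1: float, p2: float) -> float:
    """Probability that base `sent` is received as base `received`."""
    if received == sent:
        return 1.0 - p1 - 2.0 * p2
    if received == LIKELY_PARTNER[sent]:
        return p1
    return p2

def same_amino_probability(codon: str, p1: float = 9e-5, p2: float = 1e-5) -> float:
    """Probability that the received codon codes the same amino acid."""
    total = 0.0
    for other, amino in CODON_TABLE.items():
        if amino == CODON_TABLE[codon]:
            prob = 1.0
            for sent, received in zip(codon, other):
                prob *= base_probability(sent, received, p1, p2)
            total += prob
    return total

print(f"{same_amino_probability('AUG'):.4f}")  # 0.9997
print(f"{same_amino_probability('CUG'):.4f}")  # 0.9999
```

Under these assumptions both values agree with the ones quoted above to four decimal places.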
In double-stranded DNA, each nucleotide is coupled with its opposite on the second strand. The triplets in double-stranded DNA can be studied as vectors of 6 nucleotides, where the odd positions hold the nucleotides of the first strand and each even position i + 1 holds the nucleotide opposite to the one at position i. For example, the triplet ATG corresponds to the codeword ATTAGC.
From this point of view, the genetic code in double-stranded DNA is a quaternary block code of length 6 with the code alphabet {A, C, G, T}. Its information rate is 1/2: it has 3 information characters in the odd positions and 3 check characters in the even positions.
The minimum distance of the code is 2, because the Hamming distance between the codewords ATCGCG and ATCGAT is 2 and no pair of codewords has Hamming distance 1 (changing one information character always changes its check character as well). This implies that the code detects all single errors. In fact, it detects a much larger set of errors: all errors of up to 3 characters in which all wrong characters are either in the odd positions or in the even positions; and all errors which leave non-complementary bases in some pair of positions 2k − 1 and 2k (k = 1, 2, 3).
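The minimum distance and the detection rule can be checked by brute force over all 64 codewords. The sketch below is only an illustration; the function names encode, hamming and is_codeword are not taken from the text.

```python
from itertools import combinations, product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def encode(triplet: str) -> str:
    """Interleave each base with its complement, e.g. ATG -> ATTAGC."""
    return "".join(base + COMPLEMENT[base] for base in triplet)

def hamming(u: str, v: str) -> int:
    """Number of positions in which two words of equal length differ."""
    return sum(a != b for a, b in zip(u, v))

def is_codeword(word: str) -> bool:
    """A word of length 6 is valid iff each pair (2k-1, 2k) is complementary."""
    return all(COMPLEMENT[word[i]] == word[i + 1] for i in range(0, 6, 2))

codewords = [encode("".join(t)) for t in product("ACGT", repeat=3)]
min_distance = min(hamming(u, v) for u, v in combinations(codewords, 2))

print(encode("ATG"))                 # ATTAGC
print(hamming("ATCGCG", "ATCGAT"))   # 2
print(min_distance)                  # 2
print(is_codeword("ATTAGG"))         # False: single error in the last position
print(is_codeword("CTGAAC"))         # False: all three odd positions changed
```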
The molecular machinery performing translation and replication is very exact. The probability of a single error is about 10^-7. Much more often than one base is replaced by another, a base is damaged by a spontaneous change of its chemical structure or by external influences.
Thus, the real code alphabet for the genetic code should look like {A, C, G, T, X}, where X stands for "something else" or "nothing". Because X is an illegal character (no codeword contains X), the genetic code in double-stranded DNA has an error-correction capability. It can correct every error of up to 3 characters in which each damaged character appears as X and no pair of positions 2k − 1 and 2k (k = 1, 2, 3) contains more than one X. This is of some importance, because one specific type of burst error occurs relatively often: damage to part of one strand.
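The correction rule amounts to restoring each X from the complementary base at the paired position. A minimal sketch follows, assuming that damaged bases are already marked as X and that all other characters are correct. The function name repair is hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def repair(word: str) -> Optional[str]:
    """Replace each X by the complement of its partner base.

    Returns None if some pair (2k-1, 2k) consists of two X characters,
    because then no information about that pair is left.
    """
    bases = list(word)
    for i in range(0, 6, 2):
        first, second = bases[i], bases[i + 1]
        if first == "X" and second == "X":
            return None
        if first == "X":
            bases[i] = COMPLEMENT[second]
        elif second == "X":
            bases[i + 1] = COMPLEMENT[first]
    return "".join(bases)

print(repair("AXTXGX"))  # ATTAGC: the whole second strand was damaged
print(repair("XTTAXC"))  # ATTAGC: damage on the first strand
print(repair("XXTAGC"))  # None: both bases of the first pair are lost
```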
The genetic code on double-stranded DNA has a greater error-detection and error-correction capability than on single-stranded DNA (or on RNA). This is one of the reasons why living organisms use double-stranded DNA for long-term storage of genetic information, while RNA is used for its short-term processing. (Other reasons follow from the greater chemical stability of DNA.)
We have demonstrated that the genetic code really is a code in the mathematical sense. It does not have very strong error-detection and error-correction properties, but this is not only a disadvantage: this fact is the driving force of evolution and leads to the greater variability of life.