
How is information content measured?

10 September 2001

From Stephen Halperin of the Czech Republic, who gave permission for his full name to be used. His letter is printed first in its entirety, then with point-by-point responses by Dr Don Batten [now] of Creation Ministries International–Australia, interspersed as per normal email fashion.

Dear Sirs,

From your website, you say this:

“Information content is measured not by the number of traits, but by what is called the specified complexity of a base sequence or protein amino acid sequence.”

My questions are these:

1) How are you measuring specified complexity?

2) Since DNA codes for proteins, what is the relationship between your definition of complexity and the number of genes contained in DNA?

3) You also claim that mutations reduce complexity. If a mutation causes a single amino acid substitution (e.g., alanine to leucine), or causes a duplication of DNA, why is this a reduction of information content?

Dear Sirs,

From your website, you say this:

“Information content is measured not by the number of traits, but by what is called the specified complexity of a base sequence or protein amino acid sequence.”

My questions are these:

1) How are you measuring specified complexity?

Dear Stephen,

Information theorist Lee Spetner defines specified complexity in mathematical/thermodynamic terms in the Lee Spetner/Edward Max Dialogue. The following is an excerpt from that paper (NDT = neo-Darwinian theory):

In my book [Not by Chance], I did not quantify the information gain or loss in a mutation. I didn’t do it mainly because I was reluctant to introduce equations and scare off the average reader. And anyway, I thought it rather obvious that a mutation that destroys the functionality of a gene (such as a repressor gene) is a loss of information. I also thought it rather obvious that a mutation that reduces the specificity of an enzyme is also a loss of information. But I shall take this opportunity to quantify the information difference before and after mutation in an important special case, which I described in my book.

The information content of the genome is difficult to evaluate with any precision. Fortunately, for my purposes, I need only consider the change in the information in an enzyme caused by a mutation. The information content of an enzyme is the sum of many parts, among which are:

  • Level of catalytic activity
  • Specificity with respect to the substrate
  • Strength of binding to cell structure
  • Specificity of binding to cell structure
  • Specificity of the amino-acid sequence devoted to specifying the enzyme for degradation

These are all difficult to evaluate, but the easiest to get a handle on is the information in the substrate specificity.

To estimate the information in an enzyme I shall assume that the information content of the enzyme itself is at least the maximum information gained in transforming the substrate distribution into the product distribution. (I think this assumption is reasonable, but to be rigorous it should really be proved.) We can think of the substrate specificity of the enzyme as a kind of filter. The entropy of the ensemble of substances separated after filtration is less than the entropy of the original ensemble of the mixture. We can therefore say that the filtration process results in an information gain equal to the decrease in entropy. Let’s imagine a uniform distribution of substrates presented to many copies of an enzyme. I choose a uniform distribution of substrates because that will permit the enzyme to express its maximum information gain. The substrates considered here are restricted to a set of similar molecules on which the enzyme has the same metabolic effect. This restriction not only simplifies our exercise but it applies to the case I discussed in my book.

The products of a substrate on which the enzyme has a higher activity will be more numerous than those of a substrate on which the enzyme has a lower activity. Because of the filtering, the distribution of concentrations of products will have a lower entropy than that of substrates. Note that we are neglecting whatever entropy change stems from the chemical changes of the substrates into products, and we are focusing on the entropy change reflected in the distributions of the products of the substrates acted upon by the enzyme.

The entropy of an ensemble of $n$ elements with fractional concentrations $f_1, \ldots, f_n$ is given by

$$H = -\sum_{i=1}^{n} f_i \log f_i \qquad (1)$$

and if the base of the logarithm is 2, the units of entropy are bits.

As a first illustration of this formula let us take the extreme case where there are n possible substrates, and the enzyme has a nonzero activity on only one of them. This is perfect filtering. The input entropy for a uniform distribution of n elements is, from (1), given by

$$H_I = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n \qquad (2)$$

since the $f_i$ are each $1/n$. The entropy of the output is zero,

$$H_O = 0 \qquad (3)$$

because all the concentrations except one are zero, and the concentration of that one is 1. The decrease in entropy brought about by the selectivity of the enzyme is then the difference between (2) and (3), or

$$H_I - H_O = \log n$$

Another example is the other extreme case in which the enzyme does not discriminate at all among the n substrates. In this case the input and output entropies are the same, namely

$$H_I = H_O = \log n \qquad (4)$$

Therefore, the information gain, which is the difference between $H_I$ and $H_O$, in this case is zero,

$$H_I - H_O = 0 \qquad (5)$$

We normalize the activities of the enzyme on the various substrates and these normalized activities will then be the fractional concentrations of the products. This normalization will eliminate from our consideration the effect of the absolute activity level on the information content, leaving us with only the effect of the selectivity.

Although these simplifications prevent us from calculating the total entropy decrease achieved by action of the enzyme, we are able to calculate the entropy change due to enzyme specificity alone.
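The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `entropy_bits` and `specificity_information` are hypothetical, and the activity values are invented to reproduce the two extreme cases plus one intermediate case. They are not measurements from any paper.

```python
import math


def entropy_bits(fractions):
    """Entropy in bits of a list of fractional concentrations, as in equation (1)."""
    # Terms with f == 0 are skipped: f * log(f) tends to 0 as f tends to 0.
    return -sum(f * math.log2(f) for f in fractions if f > 0)


def specificity_information(activities):
    """Entropy decrease (bits) for an enzyme acting on a uniform substrate mixture."""
    total = sum(activities)
    if total <= 0:
        raise ValueError("at least one activity must be positive")
    # Normalized activities are taken as the fractional product concentrations.
    products = [a / total for a in activities]
    h_in = math.log2(len(activities))  # uniform input: log n, as in equation (2)
    h_out = entropy_bits(products)
    return h_in - h_out


# Hypothetical activity profiles (illustrative values only).
perfect = specificity_information([1.0, 0.0, 0.0, 0.0])   # acts on one substrate only
no_filter = specificity_information([5.0, 5.0, 5.0, 5.0])  # no discrimination at all
partial = specificity_information([2.0, 1.0, 1.0])         # somewhere in between

print(perfect, no_filter, partial)
```

With four substrates, perfect filtering gives log2(4) = 2 bits and no discrimination gives 0 bits, matching the two extreme cases in the text. Because the activities are normalized before use, scaling every activity by the same factor leaves the result unchanged, which is the sense in which the absolute activity level drops out.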

The Dangers of Conclusion Jumping

Figure 1

Spetner: As a final example let me take part of a series of experiments I discussed in my book, which demonstrate the dangers of conclusion jumping. This subject bears emphasis because evolutionists from Darwin on have been guilty of jumping to unwarranted conclusions from inadequate data. I shall here take only a portion of the discussion in my book, namely, what I took from a paper by Burleigh et al. (1974, Biochem. J. 143:341) to illustrate my point.

Ribitol is a naturally occurring sugar that some soil bacteria can normally metabolize, and ribitol dehydrogenase is the enzyme that catalyzes the first step in its metabolism. Xylitol is a sugar very similar in structure to ribitol, but does not occur in nature. Bacteria cannot normally live on xylitol, but when a large population of them were cultured on only xylitol, mutants appeared that were able to metabolize it. The wild-type enzyme was found to have a small activity on xylitol, but not large enough for the bacteria to live on xylitol alone.

The mutant enzyme had an activity large enough to permit the bacterium to live on xylitol alone. Fig. 1 shows the activity of the wild-type enzyme and the mutant enzyme on both ribitol and xylitol. Note that the mutant enzyme has a lower activity on ribitol and a higher activity on xylitol than does the wild-type enzyme. An evolutionist would be tempted to see here the beginning of a trend. He might be inclined to jump to the conclusion that with a series of many mutations of this kind, one after another, evolution could produce an enzyme that would have a high activity on xylitol and a low, or zero, activity on ribitol. Now wouldn’t that be a useful thing for a bacterium that had only xylitol available and no ribitol? Such a series would produce the kind of evolutionary change NDT calls for. It would be an example of the kind of series that would support NDT. The series would have to consist of mutations that would, step by step, lower the activity of the enzyme on the first substrate while increasing it on the second.

But Fig. 1 is misleading in this regard because it provides only a restricted view of the story. Burleigh and his colleagues also measured the activities of the two enzymes on another similar sugar, L-arabitol, and the results of these measurements are shown in Fig. 2. With the additional data on L-arabitol, a different picture emerges. No longer do we see the mutation just swinging the activity away from ribitol and toward xylitol. We see instead a general lowering of the selectivity of the enzyme over the set of substrates. The activity profiles in Fig. 2 show that the wild-type enzyme is more selective than is the mutant enzyme.

Figure 2

In Fig. 1 alone, there appears to be a trend evolving an enzyme with a high activity on xylitol and a low activity on ribitol. But Fig. 2 shows that such an extrapolation is unwarranted. It shows instead a much different trend. An extrapolation of the trend that appears in Fig. 2 would indicate that a series of such mutations could result in an enzyme that had no selectivity at all, but exhibited the same low activity on a wide set of substrates.

The point to be made from this example is that conclusion jumping from the observation of an apparent trend is a risky business. From a little data, the mutation appears to add information to the enzyme. From a little more data, the mutation appears to be degrading the enzyme’s specificity and losing information. Just as we calculated information in the two special cases above, we can calculate the information in the enzyme acting on a uniform mixture of the three substrates for both the wild type and the mutant enzyme. Using the measured activity values reported by Burleigh et al. we find the information in the specificities of the two enzymes to be 0.74 and 0.38 bits respectively. The information in the wild-type enzyme then turns out to be about twice that of the mutant.

The evolutionist community, from Darwin to today, has based its major claims on unwarranted conclusion jumping. Darwin saw that pigeon breeders could achieve a wide variety of forms in their pigeons by selection, and he assumed that the reach of selection was unlimited. Evolutionists, who have seen crops and farm animals bred to have many commercially desirable features, have jumped to the conclusion that natural selection, in the course of millions of years, could achieve many-fold greater adaptive changes than artificial selection has achieved in only tens of years. I have shown in my book that such extrapolations are ill founded because breeding experiments, such as those giving wheat greater protein content or vegetables greater size, result from mutations that disable repressor genes. The conclusions jumped to were false because they were based on data that could not be extrapolated to long sequences. One cannot gain information from a long sequence of steps that all lose information. As I noted in my book, that would be like the merchant who lost a little money on each sale, but thought he could make it up on volume.

2) Since DNA codes for proteins, what is the relationship between your definition of complexity and the number of genes contained in DNA?

There is of course a rough relationship between the number of proteins coded for by a DNA sequence and the level of specified complexity. But “number of genes” is very approximate. For example, the human DNA supposedly contains some 30,000 “genes” and yet the human cell can produce over 100,000 proteins (estimates range up to 150,000 or even more). Obviously, there is much that is not known about how 30,000 “genes” can produce so many different proteins. A more accurate measure of the specified complexity of a given genome would be the number of proteins coded. However, there is also much information not involved directly in protein production—for example, in chromosome structure. And there is probably a huge amount of information present that determines developmental sequences, for example—none of this is really understood. There is also the possibility of error-checking sequences, etc., etc. There is just not enough known yet about the functions of all the DNA sequences to meaningfully quantify the information properly.

3) You also claim that mutations reduce complexity. If a mutation causes a single amino acid substitution (e.g., alanine to leucine), or causes a duplication of DNA, why is this a reduction of information content?

See the Spetner excerpt above for an example that explains the principles involved. However, a mutation does not necessarily reduce specified complexity; it is just so likely to do so that mutation cannot be the mechanism for generating the huge amount of specified complexity that we see in living things. That mutations are known primarily by the defects they cause testifies to the overwhelming tendency for them to reduce the information in living things (just as a mistake on my computer keyboard will decrease the information content of what I am typing). Spetner also discusses gene duplication at the above URL. However, just think: if you buy two copies of the newspaper, do you buy twice as much information? Of course not. Duplication of anything does not constitute an increase of information. Random mutations to change the duplicated gene would not add information unless the mutated sequence coded for some new, useful protein. To illustrate: if “superman” were the duplicated “gene”, and mutations in the letters changed it to “sxyxvawtu”, you have clearly lost information, although you have a new sequence. This is the difference between complexity and specified complexity. A pile of sand is complex, but is information-poor, because it specifies nothing.

I hope this helps.
Don Batten
