# Lab 1: Measuring (uncertainty-based) information

## ISE483/SSIE583: Evolutionary Systems and Biologically Inspired Computing

### Topics

• Python refresher using Python Lab 1 notebook.
• Computing the Hartley and Shannon measures of information of the letter distribution in a text.

1. Go through lectures 2 and 3.
2. Complete the following exercises, worth a total of 10 points. Points for each question are shown below for ISE483 and SSIE583.
3. Submit code as separate Python files in the 'ISE483 & SSIE583 Lab 1' assignment in Brightspace. Summarize your answers in a PDF document including figures, if any.
4. Assignment is due before class on February 13.

Questions or problems? See email and office hours information.

NOTE: Shannon's entropy measures the information we receive when uncertainty is removed from situations described probabilistically. Intuitively, this is the "amount of surprise" we experience when uncovering the correct symbol. When there are no distinguishing characteristics among the alternatives, the uncertainty is best captured by Hartley's measure on the set of alternatives---which yields the same value as the Shannon entropy for a uniform probability distribution, whereby all n alternatives have the same probability 1/n. Thus, the Hartley measure is always an upper bound on the Shannon entropy. Intuitively, this means that when the underlying probability distribution is not uniform, uncovering the correct symbol yields less surprise than when all alternatives are equally likely (or have no distinguishing characteristics).

For example, consider two symbols {a, b} with probability distribution (5/6, 1/6). Shannon's entropy for this distribution is 0.65 bits, whereas the Hartley measure of two alternatives is 1 bit, which is the same as the Shannon entropy if the probability distribution were (1/2, 1/2). So Hartley's measure tells us that we receive 1 bit of information when we discover the correct symbol (we need one yes-no question to settle the symbol). Shannon's entropy, in contrast, tells us that we receive only 0.65 bits of information---we are less surprised, because we know that symbol a is much more likely than symbol b. This means that if we were to guess the correct symbol multiple times, knowing the distribution, on average we would guess the correct symbol more often than if we did not know the distribution. In other words, on average, we would need less than one yes-no question.

1. [3 points][2 points] Calculate Hartley and Shannon (entropy) measures of information: Shannon entropy is computed from a probability distribution defined on a set of choices for a variable (e.g., symbols). Write a function that accepts a vector/array of probability values (that sum to 1) and returns the Shannon entropy of that distribution. To do this, use the package 'numpy' to declare a 1-dimensional array that shall contain the probability values. Compute the logarithm (base 2) of every entry in the probability array, and store the values in a new array. Multiply the entries of the probability array by the corresponding entries in the logarithm array, sum the products, and invert the sign. Refer to item 7 of the Python Lab 1 notebook for useful code snippets. The Hartley measure is simply the logarithm (base 2) of the number of distinct choices, regardless of their probabilities. Write a function to calculate it for the same situation as the one for Shannon entropy above.
• Build your intuition: Hand-code probability distributions for a small number of symbols and plot them (refer to item 15 of the Python Lab 1 notebook for methods). What do the information measures of these distributions say about the organization of the symbols? (You don't have to report on this sub-item.)
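The two functions described above could be sketched as follows (function names are placeholders, and the handling of zero probabilities is my own choice, justified by the limit p·log2(p) → 0 as p → 0):

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy (bits) of a probability distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                      # skip zeros: p*log2(p) -> 0 as p -> 0
    return -np.sum(p * np.log2(p))    # multiply, sum, invert the sign

def hartley_measure(probs):
    """Hartley measure (bits): log2 of the number of alternatives."""
    return np.log2(len(probs))

# The example from the note above: two symbols with distribution (5/6, 1/6).
print(shannon_entropy([5/6, 1/6]))   # ~0.65 bits
print(hartley_measure([5/6, 1/6]))   # 1.0 bit
```

For the uniform distribution (1/2, 1/2), both functions return 1 bit, matching the note's claim that the Hartley measure equals the Shannon entropy exactly when all alternatives are equally likely.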

2. [5 points][4 points] Calculate letter-entropy of a piece of text: Write a program that reads the contents of a text file letter-by-letter, and counts the frequencies of occurrence of each of the 26 English letters and the space character. Refer to item 14 of the Python Lab 1 notebook for ways to generate frequency counts. Store the frequencies in a 'numpy' array. Treat lower-case and upper-case letters as one and the same. Normalize the frequencies to obtain the probability of characters in this text, by dividing each value by the sum of all frequencies. Pass the normalized array to the function you wrote above to compute the Hartley and Shannon measures of information of the text. Report the values.
• Make sure that the text you choose has at least 4000 characters in it. If the text is too short, the Shannon entropy you compute won't be very reliable, as it will vary widely depending on the actual piece (think about why). In the same vein, do you think that the Hartley measure depends as much on the text? Here is the full text of The Ticket That Exploded by W.S. Burroughs (where the idea of Language as a Virus is expressed).
• Lipograms: Quoting Wikipedia, "A lipogram (from Ancient Greek: λειπογράμματος, leipográmmatos, "leaving out a letter") is a kind of constrained writing or word game consisting of writing paragraphs or longer works in which a particular letter or group of letters is avoided—usually a common vowel, and frequently "E", the most common letter in the English language." Compute the information measures of Gadsby, a lipogram novel written by Ernest Vincent Wright in 1939. It has over 50,000 words without using the letter "E". Compare them with those of the normal text above. Is there a difference? Would you always expect a difference? Explain in a few sentences why or why not.
• Compute the two measures of information of a random piece of text with at least 4000 characters (preferably about the same length as the normal text that you chose above). Write a program to generate a random piece of text (including spaces) with a uniform probability distribution over letters, and save it in a file. Refer to items 2 and 11 of the Python Lab 1 notebook for some hints on generating random text. For reading and writing contents to a file, refer to item 12.
• In a sentence or two, describe the intuitive difference in the values of the two measures of information that you observe between the random and normal texts.
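The counting, normalization, and random-text steps above could be sketched as follows (the alphabet constant, function names, and file name are illustrative choices, not prescribed by the lab):

```python
import random
import string
import numpy as np

ALPHABET = string.ascii_lowercase + ' '   # the 26 English letters plus space

def letter_probabilities(text):
    """Count the 27 tracked characters and normalize to a probability vector."""
    text = text.lower()                   # treat upper and lower case as one
    counts = np.array([text.count(ch) for ch in ALPHABET], dtype=float)
    return counts / counts.sum()          # normalize by the total frequency

def shannon_entropy(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                          # p*log2(p) -> 0 as p -> 0
    return -np.sum(p * np.log2(p))

# Hypothetical usage -- replace 'sample.txt' with the text file you chose:
# with open('sample.txt') as f:
#     probs = letter_probabilities(f.read())
# print(shannon_entropy(probs), np.log2(len(ALPHABET)))

# Random text with a uniform letter distribution (for the third sub-item):
random_text = ''.join(random.choices(ALPHABET, k=4000))
```

Note that characters outside the 27 tracked ones (digits, punctuation) are simply ignored by this sketch, which matches the instruction to normalize by the sum of the 27 frequencies.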

3. [2 points][2 points] More about Shannon entropy: If you scramble the letters of the meaningful text that you considered above, and then measure its Shannon entropy, it will be the same as the original. Yet, the scrambled text is gibberish. Describe in a few lines how you would extend the standard Shannon entropy formula so that it may distinguish meaningful text from its letter-shuffled versions.
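The invariance claimed above (shuffling preserves entropy) is easy to verify empirically; a small sketch, with a hypothetical helper that computes the entropy of a string's character distribution directly:

```python
import random
import numpy as np

def shannon_entropy_of(text):
    """Shannon entropy (bits) of the character distribution of a string."""
    chars, counts = np.unique(list(text), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

original = "the quick brown fox jumps over the lazy dog"
shuffled = ''.join(random.sample(original, len(original)))

# Shuffling permutes characters but leaves the letter histogram unchanged,
# so the two entropies are identical.
print(shannon_entropy_of(original) == shannon_entropy_of(shuffled))  # True
```

This is exactly why the standard formula cannot distinguish meaningful text from gibberish: it sees only the histogram, not the order of the symbols.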

4. [Extra 2 points][2 points] Interpreting Shannon entropy: An infant is learning the English alphabet, and has learnt only the first 4 symbols so far: {a,b,c,d}. Her mom observes that she tends to utter certain letters more frequently than others in her babbles. Specifically, she observes that the infant utters symbol 'a' with probability 1/2, symbol 'b' with probability 1/4, and the symbols 'c' and 'd' each with probability 1/8. Calculate the Shannon entropy of this infant's language. We have seen in class that Shannon entropy measures the average number of yes-no questions required to zero in on a symbol. Now imagine that mom is listening to the infant's random babble, and dad, who can't hear the infant, is trying to guess it by asking mom yes-no questions. For the above symbol set, construct the smallest sequence of yes-no questions that dad would have to ask mom, so that the average or expected number of those questions is equal to the Shannon entropy that you calculated. Report the sequence of questions you constructed, and a brief summary of the rationale behind it.