{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This is a quick review of some basic Python features that you will need to do your assignments. \n", "Of course, this is far from being exhaustive. We have put together some code snippets below. \n", "You should be able to run any piece on its own in Python version 3.x: just place the cursor in \n", "the code cell you want to execute, then click on 'Cell/Run' or click on the \"play\" button.\n", "\n", "Python is indentation-sensitive by design, so please follow the left-indentation that I have used. Otherwise, \n", "you will receive an 'IndentationError'. It doesn’t have to be the exact same amount of indentation, however, \n", "as long as it is consistent within a block. \n", "For example, if you see a tab, it means that there needs to be some left-indentation there; \n", "it could be just a single space (I prefer tabs only for clarity). But if you don’t see any indentation, \n", "then there shouldn’t be any.\n", "\n", "Any line that starts with a '#' is a comment; that is, Python does not execute it.\n", "\n", "Official Python documentation - http://python.org/doc/\n", "\n", "A popular book, “A Byte of Python” - http://www.swaroopch.com/notes/Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Review" ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [], "source": [ "#This command opens a Python terminal that gives you access to this notebook's namespace.\n", "#That is, variables you define within this notebook are only accessible through the special console \n", "#invoked by the command below. The default Python kernel or console you see below does not provide \n", "#you that access. 
NOTE that this is a special IPython \"magic\" command, not a regular Python command.\n", "\n", "%qtconsole" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(1)** At the beginning of a program you might want to specify that you want to use certain “packages” in order to \n", "use their special features that Python doesn’t automatically provide. Some packages like ‘math’ and ‘array’ come with Python. Others like ‘pygame’ and ‘numpy’ need to be installed. To tell the program that you want to use a package, say, for example:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import math as m \n", "#'math' is the name of the package, 'm' is its alias that is easier to use\n", "\n", "import random as rnd\n", "#'random' package: Use this package to pull random numbers from a given range." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(2)** Each package has its features implemented in the form of \"functions\" to which \n", "you can pass parameters and get results in return. You can invoke a function with the \n", "name of the package (or its alias) followed by a ‘.’ that is then followed by the name of the function (the function itself has no alias here). 
For example:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.0\n" ] } ], "source": [ "print(m.sqrt(16)) #'sqrt' is the square root function" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-1.0\n" ] } ], "source": [ "print(m.cos(m.pi)) #besides functions, packages also have constants (e.g., 'pi') defined" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "28\n" ] } ], "source": [ "print(rnd.randint(0,100)) #randomly pull an integer from the range 0 through 100 (both ends included)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(3) Conditional logic** statements: Note how the else-if is implemented in Python: it is ‘elif’. \n", "Also, notice the colon at the end of each condition, which is part of the syntax." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x is < 1\n" ] } ], "source": [ "x = 0\n", "\n", "if x == 1:\n", "\tprint(\"x is 1\")\n", "elif x > 1:\n", "\tprint(\"x is > 1\")\n", "else:\n", "\tprint(\"x is < 1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(4) Loops**: Again note the colon at the end of the loop statements." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "canopy_exercise": { "cell_type": "" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "x = 0\n", "\n", "while x < 10:\n", "\tx += 1 #the '+=' operator, which is typical in C, also works in Python\n", "\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-35\n" ] } ], "source": [ "for i in range(0,10): #range(0,10) yields the values 0 through 9 (in Python 3 it is a 'range' object, not a list)\n", "\tx -= i #x starts at 10, its value after the 'while' loop above\n", "\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NOTE: All indexed datatypes in Python (lists, tuples, strings, arrays) have their indices starting from 0 as opposed to 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(5) Lists**: A list is an indexed container where you can store a variety of elements like integers, decimals, and strings in the same list." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10, 'mad', 3.2, 'more']\n" ] } ], "source": [ "L = [10, 'mad',3.2] #list containing 3 different types of elements. 
Note the square brackets.\n", "\n", "L.append('more') #append a string to the end of the list\n", "\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "print(L[0]) #access an element of the list; indices start with 0, and not 1" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "print(L.index('mad')) #retrieves the 'index' of 'mad' in L" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10, 3.2, 'more']\n" ] } ], "source": [ "del L[1] #delete the element stored at index 1\n", "\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\n" ] } ], "source": [ "#A standard way of generating a list of numbers (this one generates a list from 0 through 19).\n", "Lst = list(range(0,20)) \n", "\n", "print(Lst)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "([3, 4, 5, 6, 7, 8, 9], [15, 16, 17, 18])\n" ] } ], "source": [ "print((Lst[3:10], Lst[15:19])) #slicing: Lst[a:b] returns the elements from index a up to, but not including, index b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(5a) List generators** (more commonly called list comprehensions): this is a very useful functionality that you will often need." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]\n" ] } ], "source": [ "L1 = list(n*2 for n in range(10))\n", "\n", "print(L1)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]\n" ] } ], "source": [ "#Equivalently,\n", "L1 = [n*2 for n in range(10)]\n", "\n", "print(L1)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" ] } ], "source": [ "#Generates a list of indices for items in another list\n", "#(index() returns the first match, so this assumes that the items are unique)\n", "L2 = [L1.index(item) for item in L1]\n", "\n", "print(L2)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(0, 'a'), (1, 'b'), (2, 'c')]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#You can also generate a list of tuples in the following way using the function \"enumerate\":\n", "list(enumerate(['a','b','c']))\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5)]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Another way to generate a list of tuples is by making all possible combinations of items from a set of lists:\n", "list((x,y) for x in range(1,4) for y in range(3,6))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(6) Tuples**: A tuple is very similar to a list, in that it is also an indexed container used to store \n", "a variety of elements, but it is an immutable data type. You can have a list of tuples, \n", "for example, but you can’t append to a tuple after it is defined." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2.5, 3.4)\n" ] } ], "source": [ "T1 = (2.5, 3.4) #Note the parentheses (square brackets for a list)\n", "\n", "print(T1)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(2.5, 3.4)]\n" ] } ], "source": [ "L = [] #To declare an empty list, you can also say: L = list()\n", "\n", "L.append(T1) #appends a tuple to a list; note that there is no \"append\" available for a tuple\n", "\n", "print(L)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2.5, 3.4)\n" ] } ], "source": [ "print(L[0])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "((12, 14), 1.5, (12, 13, 54))\n" ] } ], "source": [ "T2 = ((12,14),1.5,(12,13,54)) #tuples within a tuple; 2-D coordinates and velocity, for example\n", "\n", "print(T2) #accessing the contents of a tuple is the same as for a list, e.g., T2[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(7) Arrays**: A popular package for using arrays is “numpy”. \n", "A nice tutorial is here - http://www.scipy.org/Tentative_NumPy_Tutorial." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 30 23 100]\n" ] } ], "source": [ "import numpy as np\n", "\n", "x = np.array([30,23,100]) #you don’t have to mention the datatype (e.g., integer)\n", "\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "23\n" ] } ], "source": [ "print(x[1]) #the second element of x: this is how you index an array" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 90 69 300]\n" ] } ], "source": [ "y = x * 3 #multiplication of the entire array\n", "\n", "print(y) #all elements" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(array([2], dtype=int64),)\n" ] } ], "source": [ "#retrieve the index of an element in array; notice that the result is a tuple of arrays\n", "print(np.where(x == 100)) " ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n" ] } ], "source": [ "ind = np.where(x == 100) \n", "\n", "print(ind[0][0])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", " 0. 0. 0. 
0.]\n" ] } ], "source": [ "z = np.zeros(100) #create an array of length 100 and store 0’s in it\n", "\n", "print(z)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **(7a)** You can also use numpy to perform numerical calculations such as logarithms and exponentiation." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0\n" ] } ], "source": [ "print(np.log2(2)) #base-2 logarithm of a single number" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 5.08746284 8.83289001]\n" ] } ], "source": [ "print(np.log2([2,34,456])) #compute the base-2 logarithm of each number in an array" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "158\n" ] } ], "source": [ "print(np.sum([1,59,98])) #arithmetic sum of an array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(8) Matrices**: Generally, matrices are two-dimensional arrays. \n", "You can also think of a matrix as an array of arrays. Use numpy to define them:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[11 32]\n", " [ 3 94]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "#the variable is named 'mat' so that it does not overwrite 'm', the alias of the 'math' package\n", "mat = np.matrix([[11,32], [3,94]]) #pass a list of equally-sized lists to define a matrix\n", "\n", "print(mat)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "32\n" ] } ], "source": [ "print(mat[0,1]) #retrieves the element in the 0th row and 1st column of the matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(9) Dictionary**: It is defined as a set of \"keys\" and their corresponding \"values\". \n", "Given a key, you can retrieve its value from a dictionary. 
Dictionaries are convenient in \n", "defining relationships between entities, for example between alphabetic letters and numbers:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "Char_to_Num = { 'A' : 10, 'B' : 36, 'C' : 21} #note the use of curly braces\n", "#the 'key' is defined on the left of the colon, and its 'value' on the right\n", "\n", "print(Char_to_Num['A']) #retrieves the value of the key 'A'" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['A', 'B', 'C']\n" ] } ], "source": [ "print(list(Char_to_Num.keys())) #retrieves keys only, and returns them in a list" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10, 36, 21]\n" ] } ], "source": [ "print(list(Char_to_Num.values())) #retrieves values only, and returns them in a list" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C\n" ] } ], "source": [ "print(list(Char_to_Num.keys())[list(Char_to_Num.values()).index(21)]) #retrieve the key for a value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(10) Strings**: They can be considered as arrays and indexed as you index an array" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "y\n" ] } ], "source": [ "s = \"python is\"\n", "\n", "print(s[1]) #the second character of the string" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "python is not a snake\n" ] } ], "source": [ "s = s + \" not a snake\" #the '+' concatenates the two strings\n", "\n", "print(s)" ] }, { "cell_type": "code", "execution_count": 
41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n" ] } ], "source": [ "print(s.index('not')) #retrieves the index at which the substring 'not' starts, just as 'index' works for lists and tuples" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "print(s.index('n')) #retrieves the index of the first occurrence of 'n'" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a,b,c'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#A clever way of generating a string with characters separated by commas (notice that 'join' is not \n", "#a standalone command; it is a method defined for string objects, called here on the separator ','):\n", "','.join(['a','b','c'])" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['s', 't', 'r', 'i', 'n', 'g']" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#You can make a list out of the characters of a string\n", "list(\"string\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(11) ASCII values**: Every keyboard character has a numerical ASCII value associated with it. \n", "Sometimes it is useful to have access to it, as it may be easier to deal with those numbers, \n", "rather than the original characters, in your program. Aside: 'ord' is short for ordinal." 
] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "70\n" ] } ], "source": [ "print(ord('F')) #the ASCII value of upper case F" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "L\n" ] } ], "source": [ "print(chr(76)) #the character whose ASCII value is 76" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s\n" ] } ], "source": [ "print(chr(ord('s'))) #returns the lower case 's '" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(12) Files**: Reading and writing." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9\n" ] } ], "source": [ "# First set the working directory, where you want to read and write files\n", "import os\n", "\n", "# print(os.getcwd()) #retrieves current working directory\n", "\n", "# Switch to a different working directory, if you want to access files from there\n", "# os.chdir('/Users/smanicka/Documents/My IU/Spring 2015/Teaching I485/Lab 0')\n", "\n", "# Now move on to file operations\n", "f = open(\"text.txt\",\"r\") #open file for reading; MAKE SURE THAT THIS FILE EXISTS\n", "\n", "lines = f.readlines() #stores each line of the text in 'lines'\n", "\n", "print(len(lines)) #number of lines in this file" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ipsum faucibus vitae aliquet nec ullamcorper sit. Tellus molestie nunc non blandit massa. Pretium aenean pharetra magna ac placerat. A erat nam at lectus urna duis convallis. Phasellus egestas tellus rutrum tellus. Sed sed risus pretium quam vulputate. Consequat mauris nunc congue nisi vitae suscipit tellus mauris. 
Luctus venenatis lectus magna fringilla urna porttitor. Praesent semper feugiat nibh sed pulvinar proin gravida. Purus viverra accumsan in nisl nisi scelerisque eu. Sit amet risus nullam eget felis eget nunc.a stringa string\n" ] } ], "source": [ "print(lines[8]) #retrieves the 9th line (index 8), assuming there are that many lines" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "f\n" ] } ], "source": [ "print(lines[8][6]) #retrieves the 7th character (index 6) of the 9th line (index 8)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'M'" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#If you want to do a fresh read, you should open the file again!\n", "f = open(\"text.txt\",\"r\")\n", "\n", "chars = f.read() #stores all characters of the text, not single lines, in 'chars'\n", "\n", "chars[124] #retrieves the 125th character (index 124) of the text." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "f = open(\"text.txt\",\"a\") #open file for writing; append to existing contents\n", "\n", "s = 'a string'\n", "\n", "f.write(s)\n", "\n", "f.close() #this is required after a write, else you may not see the new contents in the file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(13) Defining a function**: The following should be self-explanatory. Again, please pay \n", "attention to the indentation of the statements that follow the first ‘def’ statement. \n", "To run the following code, make sure that the file “text.txt” exists in the working directory and has words in it." 
] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Lorem', 'ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'adipiscing', 'elit,', 'sed', 'do', 'eiusmod', 'tempor', 'incididunt', 'ut', 'labore', 'et', 'dolore', 'magna', 'aliqua.', 'Mollis', 'nunc', 'sed', 'id', 'semper', 'risus', 'in', 'hendrerit', 'gravida.', 'Tincidunt', 'augue', 'interdum', 'velit', 'euismod', 'in', 'pellentesque', 'massa.', 'Amet', 'dictum', 'sit', 'amet', 'justo', 'donec.', 'Eu', 'augue', 'ut', 'lectus', 'arcu', 'bibendum', 'at', 'varius.', 'Ut', 'sem', 'viverra', 'aliquet', 'eget', 'sit', 'amet', 'tellus.', 'Rhoncus', 'mattis', 'rhoncus', 'urna', 'neque', 'viverra.', 'Augue', 'lacus', 'viverra', 'vitae', 'congue', 'eu', 'consequat.', 'Imperdiet', 'proin', 'fermentum', 'leo', 'vel', 'orci', 'porta', 'non', 'pulvinar.', 'Cras', 'adipiscing', 'enim', 'eu', 'turpis', 'egestas', 'pretium', 'aenean.', 'Tristique', 'senectus', 'et', 'netus', 'et.', 'Ut', 'lectus', 'arcu', 'bibendum', 'at', 'varius', 'vel', 'pharetra', 'vel.', 'Arcu', 'felis', 'bibendum', 'ut', 'tristique.', 'At', 'tellus', 'at', 'urna', 'condimentum', 'mattis', 'pellentesque.', 'Tellus', 'in', 'metus', 'vulputate', 'eu.', 'Adipiscing', 'elit', 'pellentesque', 'habitant', 'morbi', 'tristique', 'senectus.']\n" ] } ], "source": [ "def getWords():\n", "\tf = open(\"text.txt\",\"r\")\n", "\n", "\twords = f.readline().split() #reads only the first line of the file and splits it into words\n", "\t#NOTE: readline() returns a string, while readlines() returns a list.\n", "\t#Also, split() works only on a string, NOT on a list, and returns a list.\n", "\n", "\tf.close() #close the file once you are done reading\n", "\n", "\treturn words\n", "\n", "# To call a function, you can do the following, for example:\n", "words = getWords()\n", "\n", "print(words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(14) Histograms:** Histograms (frequency plots) are very common plots that you will encounter in any statistics course. 
You can use numpy.histogram() or Python's built-in '.count()' method to generate frequency counts." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2 2 1] [ 97 98 99 100]\n" ] } ], "source": [ "import numpy as np\n", "\n", "letters = ['a','a','b','c','b']\n", "\n", "# numpy.histogram() won't run on a list of strings, so we convert them into ASCII numbers\n", "\n", "LettersInASCII = [ord(x) for x in letters]\n", "\n", "# the bin edges 97, 98, 99, 100 define three bins, one for each of 'a', 'b' and 'c'\n", "h = np.histogram(LettersInASCII, bins = list(range(ord('a'), ord('c') + 2)))\n", "\n", "# h[0] = counts of the numbers of LettersInASCII that fall into each bin\n", "# h[1] = edges of the \"bins\" into which the numbers of LettersInASCII are put before counting\n", "# For more information on what \"bins\" means, see \n", "# http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html\n", "\n", "print(h[0], h[1]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A way to count frequencies directly on a list of strings without converting them into numbers:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "letters = ['a','a','b','c','b']\n", "\n", "letters.count('a')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also make a dictionary of all the letters with their frequencies listed against each:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'a': 2, 'b': 2, 'c': 1}" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dict((x, letters.count(x)) for x in letters) #akin to a list \"generator\" (comprehension) in item (5a) above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(15) Plotting methods:** Use the package called 'matplotlib'. 
Typically, we use the subpackage 'matplotlib.pyplot' to do most of the plotting." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.plot([1,2,4,7],[9,10,13,20]) #by default draws a blue line plot\n", "\n", "plt.xlabel('X axis')\n", "\n", "plt.ylabel('Y axis')\n", "\n", "plt.title('The simplest plot')\n", "\n", "plt.show() #this is essential to display a plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make a bar plot, you can't use the usual 'plot()' function; instead, call the 'bar()' function." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "values = [1,5,9]\n", "barheights = [3,6,7]\n", "\n", "plt.bar(values, barheights)\n", "\n", "plt.xlabel('value')\n", "plt.ylabel('height')\n", "plt.axis([0,10,2,8])\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use 'hist()' to make a histogram by passing to it a list of raw data. Use the 'density' parameter (called 'normed' in old versions of matplotlib) to specify whether you want raw frequency counts or normalized probability values. To annotate a plot, use 'annotate(text, xy, xytext, arrowprops)', where 'xy' is the x-y tuple of the point to be annotated, 'xytext' is the x-y tuple specifying where in the plot area the text is to be placed, and 'arrowprops' holds other parameters like arrow properties." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data = [1,2,3,2,2,1,1,3,4,4,4,4,4,5,5,6,2,3,1,4]\n", "\n", "plt.hist(data, bins=6, facecolor='brown', alpha=0.5) #compute raw frequencies\n", "\n", "plt.axis([0,7,0,7]) #specify the range of values shown on each axis\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample exercises\n", "This is NOT the answer to the exercises listed in Lab1. This is a sample question and answer meant to give an idea of how to approach the exercises listed in Lab1.\n", "\n", "1. Read in the text file.\n", "2. Count the frequencies of characters in the text and plot a histogram.\n", "3. Calculate the Shannon entropy of the text." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First let's open the .txt file and then we can read the text." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "\n", "text = open(\"text.txt\", \"r\")\n", "my_text = text.read()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution 2.\n", "Now let's get the frequencies of the characters in the text.\n", "First we can set an empty dictionary where we'll save the frequencies.\n" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "frequencies = {}\n", "\n", "for character in my_text:\n", "    if character in frequencies:\n", "        frequencies[character] += 1\n", "    else:\n", "        frequencies[character] = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For every character in the text, we check whether the dictionary already has a key (a \"class\") for that character. If it does, we add 1 to its count; otherwise, we create the key with a count of 1 to take the character into account." 
] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "sorted_freqs = dict(sorted(frequencies.items())) #We are sorting the items in the dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's plot a histogram.\n", "To construct the histogram we are basically taking the keys in the x-axis and the respective values of those keys in the y-axis. \n", "NOTE: Since we have not cleaned the text, this dictionary will contain values such as '\\n','\\ufeff' which won't show in the histogram. " ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.xlabel('Characters') \n", "plt.ylabel('Frequency') \n", "plt.title('Frequencies of characters in the text') \n", "plt.bar(sorted_freqs.keys(), sorted_freqs.values()) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solution 3" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.197647087087611" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "prob = {} #a dictionary with probability values instead of frequencies\n", "total = sum(sorted_freqs.values()) #total number of characters, computed once\n", "for key, value in sorted_freqs.items(): #runs through every item of the sorted_freqs dictionary\n", "    prob[key] = value/total\n", "\n", "H = 0 #setting H to zero\n", "for key, value in prob.items():\n", "    H += -value * np.log2(value) #Shannon entropy: H is the sum of -p*log2(p) over all characters\n", "\n", "H" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.197647087087612\n" ] } ], "source": [ "\n", "# An alternative way using lists to count characters in a text and their corresponding probabilities.\n", "prob = [(c, my_text.count(c)/len(my_text)) for c in set(my_text)]\n", "\n", "#calculating the Shannon entropy of the text\n", "H = sum(-p*np.log2(p) for (c, p) in prob)\n", "print(H)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 1 }