Data Types

Evelyn Campbell, Ph.D.

Python offers a number of different data types that can be manipulated and used by various functions. Some important built-in Python data types include booleans, strings, integers, and floats. These data types can be used to build various data structures, such as lists, dictionaries, arrays, and dataframes, which will be covered in Chapters 4 and 6. Here we will explore each data type and corresponding functions that are useful when working with these data types.

Booleans

Booleans are a data type with two possible values: True or False. Under the hood, these values are stored as numbers, where True is equal to 1 and False is equal to 0. Booleans are very commonly used with comparison operators (discussed more in Section 3.4), and because they also have a numeric meaning, they can be used in calculations as well. Let’s start with a simple example of a Boolean.

boolval = 5 < 3
boolval
False

Above, the variable boolval is assigned the result of the expression 5 < 3, which reads “5 is less than 3.” Because 5 is not in fact less than 3, the expression evaluates to False, and this Boolean value is assigned to boolval.

Below, we add 5 to the value of boolval. Recall that False has a numerical value of 0, so essentially, boolval + 5 is the same as 0 + 5:

boolval = boolval + 5
boolval
5

Using the variable directly in a comparison expression, we can see that the value of boolval (now the integer 5) is less than 10, so the comparison returns the Boolean value True:

boolval < 10
True
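Because True and False behave like 1 and 0, adding Booleans together counts how many of them are True. The short sketch below illustrates this with three made-up exam scores; the variable names and the passing mark of 60 are invented for this example.

```python
# Hypothetical exam scores compared against an assumed passing mark of 60
passed_exam1 = 82 >= 60   # True
passed_exam2 = 47 >= 60   # False
passed_exam3 = 91 >= 60   # True

# True counts as 1 and False counts as 0, so the sum is the number of passes
num_passed = passed_exam1 + passed_exam2 + passed_exam3
print(num_passed)         # 2
```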

Python has built-in functions that use values and variables as input to perform a task and produce an output. We have already used some basic functions, such as the print() function, and we will learn about a few more that are associated with data types. Built-in functions will be further discussed in Section 3.5.

For now, we will use a few basic functions associated with data types. The bool() function converts an input (e.g. a numeric value, a string, or even a data structure) to a Boolean value.

j = 5
k = bool(j)
k
True

Any input that has a nonzero or non-empty value will give an output of True when passed to the bool() function. Any input that is zero, empty, or None (the null value in Python) will give a False output.

something = 6542
nothing = 0
print(bool(something))
print(bool(nothing))
True
False
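The rule can be surprising at the edges, so here is a small sketch of a few cases worth checking. The variable names are chosen only for illustration, and the list syntax (square brackets) is covered in a later chapter.

```python
# Zero, empty, and None inputs all convert to False
empty_string = bool("")
print(bool(0.0))      # False
print(empty_string)   # False
print(bool([]))       # False (an empty list)
print(bool(None))     # False

# Non-zero or non-empty inputs convert to True, even if they "look" empty
zero_as_text = bool("0")
print(bool(-1))       # True (any non-zero number)
print(zero_as_text)   # True (a string containing one character)
print(bool(" "))      # True (a space is still a character)
print(bool([0]))      # True (a list holding one item)
```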

Strings

A string is a data type that consists of a sequence of characters, such as letters, digits, punctuation, and spaces, concatenated together. According to the Merriam-Webster dictionary, to concatenate means to link together in a series or chain.

Strings are recognized by Python through the use of single (' '), double (" "), or triple (''' ''' or """ """) quotation marks.

words = 'This is a sentence.'
print(words)
This is a sentence.

Double quotes are recommended as a first choice, as they allow apostrophes (single quotation marks) to be used inside the string.

print("This isn't easy.")
This isn't easy.
print('This isn't easy.')
  Cell In[8], line 1
    print('This isn't easy.')
                           ^
SyntaxError: unterminated string literal (detected at line 1)

The above error can be fixed by an escape sequence. Escape sequences are string modifiers that allow for the use of certain characters that would otherwise be misinterpreted by Python. Because strings are created by the use of quotes, the escape sequences \' and \" allow for the use of quotes as part of a string:

print('This isn\'t easy.')
This isn't easy.

Other useful escape sequences include \n and \t. These allow for a new line and tab spacing to be added to a string, respectively.

sentences = '''This is the first sentence \nThis is the second sentence! \tThis is the third sentence?'''
print(sentences)
This is the first sentence 
This is the second sentence! 	This is the third sentence?
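An escape sequence is typed as two characters but is stored as a single character. The sketch below makes this visible with the built-in len() and repr() functions, which are not covered in this section: len() counts the characters in a string, and repr() shows the string as it would be typed in code. The example strings are made up.

```python
two_lines = "first\nsecond"
print(two_lines)          # prints on two lines
print(len(two_lines))     # 12: five letters + one newline + six letters
print(repr(two_lines))    # 'first\nsecond'

# To include a literal backslash, escape the backslash itself
escaped = "one\\two"
print(escaped)            # one\two
print(len(escaped))       # 7
```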

Strings can be used with the addition (+) and multiplication (*) operators. Adding strings concatenates them, and multiplying a string by an integer repeats it:

print(words+words)

words2 = words * 2
print(words2)
This is a sentence.This is a sentence.
This is a sentence.This is a sentence.

In the above example, we see that Python prints the sentence twice, but these sentences run into each other (i.e. there is no space in between). We have to specifically tell Python to add this space. We can do this by concatenating a space in quotation marks (" ") between the string variables. We can also do this by passing multiple arguments to the print() function, separated by commas; print() places a space between its arguments by default.

print(words + " " + words)
print(words, words)
This is a sentence. This is a sentence.
This is a sentence. This is a sentence.
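The space that print() places between its arguments can be changed with its sep parameter. The sketch below shows a few choices, along with the string method .join(), which builds the combined string without printing it. The separator values are arbitrary examples, and the list passed to .join() uses syntax covered in a later chapter.

```python
words = 'This is a sentence.'

print(words, words)               # default separator is one space
print(words, words, sep='')       # no separator: the sentences run together
print(words, words, sep=' | ')    # any string can serve as the separator

# .join() places the separator between the items and returns a new string
joined = ' '.join([words, words])
print(joined)                     # This is a sentence. This is a sentence.
```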

Escape sequences can also be used in the print() function, either as a separate argument or through concatenation:

print(words, '\t', 'This isn\'t easy.')      # Escape sequence used as an argument in the print function
print('\n')                                  # Escape sequence used to print a blank line
print('This isn\'t easy.' + '\t' + words)    # Escape sequence concatenated to strings in the print function
This is a sentence. 	 This isn't easy.


This isn't easy.	This is a sentence.

When manipulating string variables, data scientists will often use what are called methods. A method is a piece of code that belongs to a particular data type and is called on a variable with dot notation (for example, words.upper()), as opposed to a function, which takes variables as input arguments inside its parentheses. Functions will be further discussed in the upcoming section.

Some methods can be used on strings to quickly and efficiently alter them. A few include the .upper(), .lower(), .capitalize(), .title(), and .swapcase() methods. There are many others, but these few are great to start exploring the different ways string variables can be manipulated:

candy = "candy is my favorite treat. my favorite candy is BUBBLE GUM."
print(candy.upper())
print(candy.lower())
print(candy.capitalize())
print(candy.title())
print(candy.swapcase())
CANDY IS MY FAVORITE TREAT. MY FAVORITE CANDY IS BUBBLE GUM.
candy is my favorite treat. my favorite candy is bubble gum.
Candy is my favorite treat. my favorite candy is bubble gum.
Candy Is My Favorite Treat. My Favorite Candy Is Bubble Gum.
CANDY IS MY FAVORITE TREAT. MY FAVORITE CANDY IS bubble gum.

A really useful method that can be used on strings is the .replace() method. This method allows you to replace a given string expression with a new one of your choosing:

print(candy.replace("BUBBLE GUM", "LICORICE"))
candy is my favorite treat. my favorite candy is LICORICE.
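String methods do not change the original string; they return a new one. To keep the result, assign it to a variable. The sketch below also uses the optional third argument of .replace(), which limits how many occurrences are replaced. The names new_candy and first_only are invented for this example.

```python
candy = "candy is my favorite treat. my favorite candy is BUBBLE GUM."

# The result of .replace() must be assigned to be kept
new_candy = candy.replace("BUBBLE GUM", "LICORICE")
print(candy)       # the original string is unchanged
print(new_candy)   # candy is my favorite treat. my favorite candy is LICORICE.

# A third argument limits the number of replacements (here, only the first)
first_only = candy.replace("candy", "chocolate", 1)
print(first_only)  # chocolate is my favorite treat. my favorite candy is BUBBLE GUM.
```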

Numeric and Boolean values can also be stored as strings by putting them within quotation marks or by using them as an argument in the str() function.

two = "2"
true = str(True)
print(two)
print(true)
2
True

We can confirm that these are indeed strings by calling the type() function on these variables, which can be used on any variable to check its data type.

print(type(two))
print(type(true))
<class 'str'>
<class 'str'>
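A related built-in function, isinstance(), answers a yes-or-no question about a value's type instead of reporting the type. The sketch below uses it to show why Booleans can take part in arithmetic: in Python, the Boolean type is a special kind of integer. The variable names are chosen only for illustration.

```python
flag = True
print(type(flag))                 # <class 'bool'>

# isinstance(value, type) returns True if the value is of that type
flag_is_bool = isinstance(flag, bool)
flag_is_int = isinstance(flag, int)
print(flag_is_bool)               # True
print(flag_is_int)                # True: bool is a subtype of int

# A number stored as a string is not an integer
text_is_int = isinstance("2", int)
print(text_is_int)                # False
```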

Keep in mind that when a numerical value is converted to a string, it can no longer be used to perform certain mathematical calculations, such as division, subtraction, or exponentiation.

two ** 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 two ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

It can be used in addition and multiplication, but more so in a “stringy” way and not a “mathy” way:

two + two
'22'

This is the only time when 2 + 2 equals 22. 🙃
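To do arithmetic with a number stored as a string, convert it back to a numeric type first. A minimal sketch, using the int() function described in the next section:

```python
two = "2"

print(two + two)              # 22 (string concatenation)

# Converting to an integer first gives ordinary arithmetic
total = int(two) + int(two)
squared = int(two) ** 2
print(total)                  # 4
print(squared)                # 4
```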

Integers & Floats

Integers and floats are numerical data types that are often used to perform mathematical operations. Integers are whole numbers, while floats (floating-point numbers) are numbers written with a decimal point. A Python float carries roughly 15 to 17 significant digits in total, counting digits on both sides of the decimal point, so floats can represent fractional values but may include small rounding errors. Integer addition, subtraction, and multiplication, by contrast, are exact, and integer calculations are generally simpler for a computer to carry out. Thus, one must weigh the pros and cons of using these data types when doing calculations and writing functions to obtain outcomes that are most aligned with their end goals. Let’s take a look at these data types in use.

a = 4567
b = 45.67
type(a)
int
type(b)
float
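The limited precision of floats shows up in everyday arithmetic. The sketch below is a common illustration: 0.1 and 0.2 cannot be stored exactly, so their sum differs very slightly from 0.3. The isclose() function from the standard math module is one way to compare floats with a small tolerance.

```python
import math

total = 0.1 + 0.2
print(total)                        # 0.30000000000000004

# An exact equality test fails because of the tiny rounding error
exactly_equal = total == 0.3
print(exactly_equal)                # False

# math.isclose() checks whether two floats are equal within a tolerance
nearly_equal = math.isclose(total, 0.3)
print(nearly_equal)                 # True
```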

These numerical data types can be converted between floats and integers using the float() and int() functions. Let’s see what happens when we convert the integer value 4567 to a float and the float value 45.67 to an integer:

float(a)
4567.0
int(b)
45

We can see that converting an integer to a float simply appends a decimal point and a zero (.0). Moreover, converting a float to an integer discards everything after the decimal point (truncation toward zero); for positive numbers, this is the same as rounding down to the nearest whole number. We can also convert numerical values stored in strings, as well as Boolean values, to integers and floats:
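For negative numbers, int() and rounding down give different answers, because int() drops the fractional part instead of moving to the next lower whole number. The sketch below compares int() with math.floor() from the standard math module and with the built-in round() function; the value -45.67 is just the earlier example with its sign flipped.

```python
import math

negative = -45.67

truncated = int(negative)          # drops the fractional part
floored = math.floor(negative)     # rounds down to the next lower whole number
rounded = round(negative)          # rounds to the nearest whole number

print(truncated)   # -45
print(floored)     # -46
print(rounded)     # -46
print(round(45.67))  # 46
```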

print(int(False))
print(float(True))
print(int('45'))
print(float('45'))
0
1.0
45
45.0

Remember, the int() and float() functions can only convert recognized numerical values. A string of letters cannot be converted to a float or integer.

int('Sorry')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 1
----> 1 int('Sorry')

ValueError: invalid literal for int() with base 10: 'Sorry'
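Two related cases are worth knowing. First, int() also rejects a string that contains a decimal point, so such a string has to pass through float() first. Second, when the input may not be numeric at all, the error can be caught with a try/except block, a construct not covered in this section. The sketch below is one possible way to handle both; the variable names and the printed message are invented for this example.

```python
# int('45.67') would raise a ValueError, so convert to a float first
text = '45.67'
value = int(float(text))
print(value)                 # 45

# Catch the error instead of letting the program stop
user_input = 'Sorry'
try:
    number = int(user_input)
except ValueError:
    number = None            # placeholder meaning "no valid number"
    print('Could not convert', repr(user_input), 'to an integer')

print(number)                # None
```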

By understanding data types, we can begin to use them in other analyses and functionalities in Python. Next, we will learn how to use data types in comparisons, which can help further down the line in functions (Section 3.5), for loops (Section 5.3), and subsetting data from DataFrames (Section 6.6).