# Arrays


*Evelyn Campbell, Ph.D.*

An *array* is a data structure that consists of a collection of elements organized into a grid-like shape. In Python, arrays can be one-dimensional, akin to a list, or multidimensional (2D, 3D, etc.). However, unlike a list, an array consists of elements that are all of the same data type. This makes arrays convenient for storage and manipulation of data elements. Arrays are offered through the `numpy` library, and are often used in conjunction with other Python libraries, such as `pandas`, `scipy`, and `scikit-learn` (linked below). We will explore arrays in this section, along with functions commonly used with arrays.
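As a small illustration of the "same data type" rule, the sketch below builds an array from a list that mixes integers and a float. The variable names `mixed` and `plain` are made up for this example.

```python
import numpy as np

# Mixing integers and a float: NumPy converts every element to one common type.
mixed = np.array([1, 2, 3.5])
print(mixed)        # [1.  2.  3.5]
print(mixed.dtype)  # float64

# A plain list, by contrast, keeps each element's own type.
plain = [1, 2, 3.5]
print([type(x).__name__ for x in plain])  # ['int', 'int', 'float']
```

The `dtype` attribute reports the single data type shared by all elements of the array.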

## Constructing arrays

To make an array, we first need to import `numpy`. This makes the functions in the `numpy` library available to us in our current Python session. Two useful functions in `numpy` are `array()`, which creates numpy arrays from other data types, and `arange()`, which creates arrays containing regularly spaced numbers. After `import numpy as np`, we can use these functions by prepending `np.` to their names: `np.array()` and `np.arange()`.

```
import numpy as np
my_list = [30, 50, 70, 90]
my_array = np.array(my_list)
my_array
```

```
array([30, 50, 70, 90])
```

We see here that `np.array()` can make an array from a list. The `np.arange()` function can also create arrays. `arange()` generates arrays of numbers with uniform spacing (integers when its arguments are integers). When invoked with one argument, `arange()` generates an array of numbers starting with 0 and ending *before* that argument:

```
my_array1 = np.arange(10)
my_array1
```

```
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

When used (“called”) with two arguments, `np.arange(start, stop)` begins at `start` and ends *before* `stop`:

```
my_array2 = np.arange(4, 8)
my_array2
```

```
array([4, 5, 6, 7])
```

And with three arguments, `np.arange(start, stop, step)` begins at `start`, ends *before* `stop`, and has an interval `step` between adjacent elements:

```
my_array3 = np.arange(100,200,10)
my_array3
```

```
array([100, 110, 120, 130, 140, 150, 160, 170, 180, 190])
```
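The data type of the array returned by `np.arange()` follows its arguments. The sketch below (with the illustrative names `ints` and `floats`) compares integer and floating-point arguments.

```python
import numpy as np

ints = np.arange(0, 10, 2)          # integer arguments give an integer array
floats = np.arange(0.0, 1.0, 0.25)  # a float argument gives a float array

print(ints, ints.dtype.kind)        # kind 'i' means signed integer
print(floats, floats.dtype.kind)    # kind 'f' means floating point
```

In both cases the `stop` value itself is excluded from the result.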

So far, these have been one-dimensional arrays. Arrays have an attribute `.shape` that is a tuple that counts the number of elements in each dimension:

```
print(my_array1.shape, my_array2.shape, my_array3.shape)
```

```
(10,) (4,) (10,)
```

Each of these shapes has only one element; this tells us these are 1d numpy arrays.
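For comparison, here is a minimal sketch of how `.shape` looks for a two-dimensional array. The arrays `one_d` and `two_d` are hypothetical examples, and `.ndim` is the attribute that reports the number of dimensions.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5, 6])      # six elements in a single dimension
two_d = np.array([[1, 2, 3], [4, 5, 6]])  # a nested list gives 2 rows x 3 columns

print(one_d.shape, one_d.ndim)  # (6,) 1
print(two_d.shape, two_d.ndim)  # (2, 3) 2
```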

## Mathematical operations with arrays

Arrays also allow for convenient elementwise calculations. For example, we can easily multiply two arrays of the same shape, such as `my_array1` and `my_array3`, to obtain a new array of values.

```
my_array4 = my_array1 * my_array3
print(my_array1)
print(my_array3)
print(my_array4)
```

```
[0 1 2 3 4 5 6 7 8 9]
[100 110 120 130 140 150 160 170 180 190]
[   0  110  240  390  560  750  960 1190 1440 1710]
```

The resulting array consists of the products of element-by-element multiplication of the first two arrays. Keep in mind that when performing calculations with multiple arrays, the dimensions of the arrays must be *compatible*.

Performing elementwise operations on arrays of different shapes is called **broadcasting**, and a discussion on array shape compatibility in mathematical operations can be found in the referenced documentation on *Array broadcasting in numpy* below.
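A minimal sketch of broadcasting, using made-up arrays `col` and `row`: a column of shape (3, 1) and a row of shape (4,) are compatible, while shapes (3,) and (4,) are not.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,)

# (3, 1) and (4,) are compatible: both are stretched to shape (3, 4).
grid = col + row
print(grid.shape)  # (3, 4)
print(grid)

# Shapes (3,) and (4,) are not compatible, so NumPy raises a ValueError.
try:
    np.array([1, 2, 3]) * row
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

Roughly speaking, two dimensions are compatible when they are equal or when one of them is 1.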

More simply, we can also perform a desired calculation on all elements of an array using scalar values:

```
my_array3 / 20 + 7
```

```
array([12. , 12.5, 13. , 13.5, 14. , 14.5, 15. , 15.5, 16. , 16.5])
```

## Reshaping and combining arrays

Arrays can also be reshaped and combined. We can use the `np.reshape()` function to change `my_array` and `my_array2` from 1-dimensional arrays of 4 elements to 2-dimensional 2x2 arrays.

```
print(my_array)
print(my_array2)
```

```
[30 50 70 90]
[4 5 6 7]
```

```
reshape1 = np.reshape(my_array, (2,2))
reshape2 = np.reshape(my_array2, (2,2))
```

```
reshape1
```

```
array([[30, 50],
       [70, 90]])
```

```
reshape2
```

```
array([[4, 5],
       [6, 7]])
```
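Reshaping only works when the new shape holds exactly the same number of elements. The sketch below uses an illustrative 12-element array called `values`. It also shows `-1`, which asks NumPy to infer one dimension from the total size.

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
three_rows = values.reshape(3, -1)
print(three_rows.shape)  # (3, 4)

# 12 elements cannot fill a 5 x 5 grid, so this raises a ValueError.
try:
    values.reshape(5, 5)
    reshaped_ok = True
except ValueError as err:
    reshaped_ok = False
    print("reshape failed:", err)
```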

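When combining arrays that have the same shape, we can use the `np.row_stack()` and `np.column_stack()` functions to concatenate the rows and columns of multiple arrays, respectively. Note that in NumPy 2.0 and later, `np.row_stack()` is deprecated in favor of the equivalent `np.vstack()`.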

```
combined_col = np.column_stack((reshape1, reshape2))
combined_col
```

```
array([[30, 50,  4,  5],
       [70, 90,  6,  7]])
```

```
combined_row = np.row_stack((reshape1, reshape2))
combined_row
```

```
array([[30, 50],
       [70, 90],
       [ 4,  5],
       [ 6,  7]])
```
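The same combinations can be written with other NumPy functions. This sketch rebuilds the two 2x2 arrays under the illustrative names `a` and `b`, then uses `np.vstack()`, `np.hstack()`, and `np.concatenate()`, which for 2-D inputs like these should give the same results as the stacking calls above.

```python
import numpy as np

a = np.array([[30, 50], [70, 90]])
b = np.array([[4, 5], [6, 7]])

stacked_rows = np.vstack((a, b))               # b placed below a: shape (4, 2)
stacked_cols = np.hstack((a, b))               # b placed beside a: shape (2, 4)
same_as_rows = np.concatenate((a, b), axis=0)  # axis=0 joins along rows
same_as_cols = np.concatenate((a, b), axis=1)  # axis=1 joins along columns

print(stacked_rows.shape, stacked_cols.shape)  # (4, 2) (2, 4)
```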

## Array functions

Construction and reshaping of arrays is an important consideration if you wish to perform aggregate functions on them. Some useful aggregate functions that can be performed on arrays include `np.min()`, `np.max()`, `np.sum()`, and `np.average()`. These functions can be applied to the entire array or across rows and columns.

```
print(reshape1)
```

```
[[30 50]
 [70 90]]
```

```
np.min(reshape1)
```

```
30
```

To apply these functions down each column of the array (collapsing the rows), use an `axis=0` argument. To apply them along each row (collapsing the columns), use an `axis=1` argument. The returned array has one element per column or per row, respectively.

```
np.sum(reshape1, axis=0)
```

```
array([100, 140])
```

```
np.average(reshape1, axis=1)
```

```
array([40., 80.])
```
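Because `reshape1` is square, the two axes are easy to confuse. The sketch below uses a hypothetical 2x3 array called `data`, so that the length of each result shows which axis was collapsed.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])      # 2 rows, 3 columns

col_totals = data.sum(axis=0)     # one total per column: [5 7 9]
row_totals = data.sum(axis=1)     # one total per row: [ 6 15]
overall = data.sum()              # no axis: a single total, 21
row_max = np.max(data, axis=1)    # largest value in each row: [3 6]

print(col_totals, row_totals, overall, row_max)
```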

We can retrieve individual elements of arrays with square brackets containing numbers for the indexes. Since `reshape1` is a 2d-array, we need two indexes. The conventional order is *rows then columns*.

```
print(reshape1)
```

```
[[30 50]
 [70 90]]
```

```
reshape1[0,1] # row 0, column 1
```

```
50
```

```
reshape1[1,1] # row 1, column 1
```

```
90
```

Remember that because of **0-based indexing**, the element in the first row and first column is [0,0] and the element in the second row and second column is [1,1].

Subsets of the array can be retrieved by using **slices** for the indexes. Slices are of the form start:stop. If either start or stop is omitted, the slice goes as far as possible (to the beginning or the end of the array on that axis).

```
combined_col
```

```
array([[30, 50,  4,  5],
       [70, 90,  6,  7]])
```

```
combined_col[1, 1:3]
```

```
array([90, 6])
```

If we omit both start and stop, `:` is a symbol for an index that is shorthand for “all the elements”:

```
combined_col[:, 1:3] # This gives columns 1 and 2 for all the rows.
```

```
array([[50,  4],
       [90,  6]])
```

And slicing has one more trick: you can give slices three numbers by adding a second colon. The third number specifies a “step”, causing the slice to take non-adjacent cells:

```
my_array1
```

```
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

```
my_array1[2::] # The array from element 2 to the end
```

```
array([2, 3, 4, 5, 6, 7, 8, 9])
```

```
my_array1[2::2] # starting at 2, taking every other element
```

```
array([2, 4, 6, 8])
```

Like with lists, we can use numbers or slices like this if we want subsets of the arrays. Unlike lists, we can ask for two-dimensional slices by putting two slices inside the square brackets.
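As a sketch of a two-dimensional slice, the example below builds a hypothetical 4x5 array named `grid` and combines a row slice with a stepped column slice inside one pair of square brackets.

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)  # rows are 0-4, 5-9, 10-14, 15-19
print(grid)

# Rows 1 and 2 (stop 3 is excluded), and every other column starting at column 0.
block = grid[1:3, ::2]
print(block)
# [[ 5  7  9]
#  [10 12 14]]
```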

A number of other functions can be applied elementwise to transform the elements within an array. These include `np.sqrt()`, `np.square()`, `np.power()`, `np.log()`, and many others.

```
np.power(reshape2, 3)
```

```
array([[ 64, 125],
       [216, 343]])
```
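To contrast elementwise functions with the aggregate functions shown earlier, here is a small sketch using a made-up array `squares`. The elementwise call keeps the shape of its input, while the aggregate call reduces the array to a single value.

```python
import numpy as np

squares = np.array([[4, 9], [16, 25]])

roots = np.sqrt(squares)  # elementwise: same 2x2 shape, one root per element
total = np.sum(squares)   # aggregate: collapses the whole array to one number

print(roots)        # [[2. 3.]
                    #  [4. 5.]]
print(roots.shape)  # (2, 2)
print(total)        # 54
```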

## Indexing and Slicing

1D arrays can be indexed similarly to how lists are indexed, as described in section 4.1.

To begin demonstrating this, let’s make a new array of string elements called `flowers`:

```
flowers = np.array(['orchid', 'rose', 'lilac', 'peony', 'tulip', 'dahlia', 'lily', 'violet'])
flowers
```

```
array(['orchid', 'rose', 'lilac', 'peony', 'tulip', 'dahlia', 'lily',
       'violet'], dtype='<U6')
```

As seen with lists, arrays utilize *zero-indexing*, meaning that the first element of an array has an index of 0. If we wanted to select the third to the sixth (inclusive) element in `flowers`, we need to recognize that these correspond to indexes 2 and 5, respectively.

A single colon (:) can be used to slice a range of elements in an array. The format for simple slicing of an array is as follows:

```
array[start:end]
```

If used between the indices *j* and *k*, slicing an array will return all elements from index *j* up to, but **excluding**, index *k*.

In this case, we use 2:6 to slice from the third to the sixth element because we want to include the sixth element (which is located at index 5):

```
flowers[2:6]
```

```
array(['lilac', 'peony', 'tulip', 'dahlia'], dtype='<U6')
```

Arrays can also be sliced in intervals. Slicing in intervals takes on the following format:

```
array[start:end:step]
```

The step dictates the spacing between values. If we wanted to slice `flowers` from the second element to the sixth element (inclusive) in steps of two, we would use the following code:

```
flowers[1:6:2]
```

```
array(['rose', 'peony', 'dahlia'], dtype='<U6')
```

Slices can be made without explicitly indicating the start or end of the slice. In these cases, Python will start slicing with the first element if the beginning is not indicated and will stop at the last element if the end is not indicated:

```
flowers[1::2] # Starts at the second element and selects every other element to the end of the array
```

```
array(['rose', 'peony', 'dahlia', 'violet'], dtype='<U6')
```

```
flowers[:4] # Selects every element from the first to the fourth (inclusive)
```

```
array(['orchid', 'rose', 'lilac', 'peony'], dtype='<U6')
```

When arrays become very long, it may be useful to use **negative indexing**. Negative indexing assigns indices of elements starting from the last element. The very last element of an array has an index of -1, the penultimate, -2, and so on. If we want the fourth-from-last element of `flowers`, we could use the following:

```
flowers[-4]
```

```
'tulip'
```

Slicing can also use a negative step to return elements of an array in reverse order. For example, we can list every other flower from `lily` back to `lilac` using the following:

```
flowers[6:1:-2]
```

```
array(['lily', 'tulip', 'lilac'], dtype='<U6')
```

Here, it’s okay that the starting index is greater than the ending index because we indicated the step to go from back to front.
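A few more combinations of negative indexes and negative steps are sketched below. The `flowers` array is recreated so the block runs on its own, and the result names (`last_three`, `reversed_all`, `lily_to_lilac`) are illustrative.

```python
import numpy as np

flowers = np.array(['orchid', 'rose', 'lilac', 'peony', 'tulip', 'dahlia', 'lily', 'violet'])

last_three = flowers[-3:]        # a negative start counts from the end
reversed_all = flowers[::-1]     # a step of -1 walks the whole array backwards
lily_to_lilac = flowers[6:1:-1]  # every flower from index 6 down to index 2

print(last_three)     # ['dahlia' 'lily' 'violet']
print(lily_to_lilac)  # ['lily' 'dahlia' 'tulip' 'peony' 'lilac']
```

As with forward slices, the stop index (here 1) is excluded, which is why the result ends at `lilac` rather than `rose`.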

## Nested arrays

When working with a multidimensional array, the same rules apply, but we have to keep in mind that the arrays within a 2D array themselves have indices. Let’s construct two more arrays called `fruits` and `pantone` and make a combined array with `flowers`.

```
fruits = np.array(['strawberry', 'banana', 'blueberry', 'pineapple', 'cherry', 'papaya', 'lychee', 'mango'])
pantone = np.array(['emerald', 'polignac', 'saffron', 'fuchsia rose', 'marsala', 'ultra violet', 'mustard', 'lapis blue'])
combined = np.row_stack((flowers, fruits, pantone))
combined
```

```
array([['orchid', 'rose', 'lilac', 'peony', 'tulip', 'dahlia', 'lily',
        'violet'],
       ['strawberry', 'banana', 'blueberry', 'pineapple', 'cherry',
        'papaya', 'lychee', 'mango'],
       ['emerald', 'polignac', 'saffron', 'fuchsia rose', 'marsala',
        'ultra violet', 'mustard', 'lapis blue']], dtype='<U12')
```

We can assess the final dimensions of `combined` using the `shape` attribute, which is a tuple indicating the number of rows and columns:

```
combined.shape
```

```
(3, 8)
```

Now, we are working with an array of arrays that is 3 rows by 8 columns. As such, we have to be able to determine the index of each array to be able to access the individual elements within them. For instance, if we wanted to identify the 3rd color within `pantone`, we first have to be able to index the array with colors:

```
combined[2]
```

```
array(['emerald', 'polignac', 'saffron', 'fuchsia rose', 'marsala',
       'ultra violet', 'mustard', 'lapis blue'], dtype='<U12')
```

Once we can get that array, getting the 3rd element is as easy as identifying the index within `pantone`:

```
combined[2][2]
```

```
'saffron'
```

Up to this point, all of these indexing mechanisms are interchangeable between lists and arrays. However, if we wanted to take a particular index in all arrays of `combined`, the following would work for an array of arrays, but not a list of lists:

```
combined[:,2]
```

```
array(['lilac', 'blueberry', 'saffron'], dtype='<U12')
```

Notice that this gives a different result than the following code:

```
combined[:][2]
```

```
array(['emerald', 'polignac', 'saffron', 'fuchsia rose', 'marsala',
       'ultra violet', 'mustard', 'lapis blue'], dtype='<U12')
```

In indexing, a comma separates rows from columns. Hence, `combined[:,2]` is saying “from all rows of `combined`, take the column at index 2,” while `combined[:][2]` is saying “from all elements in `combined`, take the element at index 2.” This difference in syntax, while subtle, greatly affects the result of slicing.
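The difference between an array of arrays and a list of lists can be sketched with a small hypothetical 3x3 example (`nested` and `arr` are illustrative names). The comma syntax works on the array and fails on the list, where a list comprehension is one way to pull out a column instead.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # a list of lists
arr = np.array(nested)                      # the same data as a 2-D array

print(arr[:, 2])   # column at index 2 from every row: [3 6 9]
print(arr[:][2])   # arr[:] is the whole array, so [2] picks row 2: [7 8 9]

# Lists do not understand the comma syntax and raise a TypeError.
try:
    nested[:, 2]
    list_error = False
except TypeError:
    list_error = True
print(list_error)  # True

# With a list of lists, a comprehension collects the column instead.
col_from_list = [row[2] for row in nested]
print(col_from_list)  # [3, 6, 9]
```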

Ranges can also be used to slice specific subsets of columns and rows, as shown below:

```
combined[0:2,2:6]
```

```
array([['lilac', 'peony', 'tulip', 'dahlia'],
       ['blueberry', 'pineapple', 'cherry', 'papaya']], dtype='<U12')
```

We see that the above code returned the elements from index 2 to 5 (inclusive) from the `flowers` and `fruits` arrays within `combined`. The `flowers` and `fruits` arrays were returned because they are the elements at index 0 and 1, respectively, within `combined`.

Using 2-D arrays to store and organize numerical data types can also be very useful. Below, we create a new 2-D array consisting of a collection of even numbers, numbers divisible by five, and numbers divisible by ten:

```
evens = np.arange(2.0, 11.0, 2)
fives = np.arange(5, 26, 5)
tens = np.arange(10, 51, 10)
num_comb = np.column_stack((evens, fives, tens))
num_comb
```

```
array([[ 2.,  5., 10.],
       [ 4., 10., 20.],
       [ 6., 15., 30.],
       [ 8., 20., 40.],
       [10., 25., 50.]])
```

We can use slicing to perform operations on specific subsets of `num_comb`:

```
num_comb[:][-1] + num_comb[3].sum()**2
```

```
array([4634., 4649., 4674.])
```
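The expression above packs several steps into one line. The sketch below rebuilds `num_comb` and splits the calculation into intermediate variables (`last_row`, `row3_total`, and `result` are names introduced only for illustration).

```python
import numpy as np

evens = np.arange(2.0, 11.0, 2)
fives = np.arange(5, 26, 5)
tens = np.arange(10, 51, 10)
num_comb = np.column_stack((evens, fives, tens))

last_row = num_comb[-1]              # same row as num_comb[:][-1]: [10. 25. 50.]
row3_total = num_comb[3].sum()       # 8 + 20 + 40 = 68.0
result = last_row + row3_total ** 2  # ** binds tighter than +, so 68**2 = 4624 is added

print(result)  # [4634. 4649. 4674.]
```

The scalar `row3_total ** 2` is added to every element of `last_row`, as in the scalar operations shown earlier.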

Furthermore, these slices can be used as input to functions:

```
np.power(num_comb[:,0], 2)
```

```
array([  4.,  16.,  36.,  64., 100.])
```

When working with multidimensional arrays, being able to subset portions and do calculations via functions or operations can be an easy way to analyze data, particularly if the array is organized in a meaningful way. From this, new arrays can be created and appended to existing arrays to update or complete a dataset:

```
squared = np.power(num_comb[:,0], 2)
num_comb = np.column_stack((num_comb, squared))
num_comb
```

```
array([[  2.,   5.,  10.,   4.],
       [  4.,  10.,  20.,  16.],
       [  6.,  15.,  30.,  36.],
       [  8.,  20.,  40.,  64.],
       [ 10.,  25.,  50., 100.]])
```

Arrays are a powerful data type that makes for easy and seamless data storage and analysis. The rules around slicing for arrays can be a bit tricky. It’s important to practice to get used to indexing and to understand how to access arrays within arrays vs. *elements* of said arrays.