Accessing Columns
Listing columns#
One DataFrame property is columns, which shows the column names we assigned.
planets.columns
Index(['name', 'solar_distance_km_6', 'mass_kg_24', 'density_kg_m3',
'gravity_m_s2'],
dtype='object')
pandas has constructed an Index object for our columns, but we can get our column list back with the Index method tolist.
planets.columns.tolist()
['name', 'solar_distance_km_6', 'mass_kg_24', 'density_kg_m3', 'gravity_m_s2']
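As a minimal sketch of the two calls above, the snippet below rebuilds a reduced stand-in for planets. It assumes only the two columns whose values appear on this page (name and solar_distance_km_6), so the resulting column list is shorter than the one shown above.

```python
import pandas as pd

# Reduced stand-in for the planets DataFrame (assumption: only two of its
# five columns are reproduced here, using the values shown on this page).
planets = pd.DataFrame({
    "name": ["Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"],
    "solar_distance_km_6": [57.9, 108.2, 149.6, 227.9,
                            778.6, 1433.5, 2872.5, 4495.1],
})

column_index = planets.columns          # a pandas Index object
column_list = planets.columns.tolist()  # a plain Python list of strings

print(type(column_index).__name__)
print(column_list)

# Iterating over a DataFrame yields its column labels, so list(planets)
# produces the same labels as columns.tolist().
print(list(planets) == column_list)
```

The Index is worth keeping when you want pandas operations on the labels; tolist is handy when plain Python code expects a list.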
Extracting features#
Let’s take a look at the first dimension of our data, the features, such as name.
planets.name
0 Mercury
1 Venus
2 Earth
3 Mars
4 Jupiter
5 Saturn
6 Uranus
7 Neptune
Name: name, dtype: object
Above, we’ve extracted from our data a one-dimensional sequence of the names of the planets in our solar system, even though there was no such list in our initial input data planets_data.
Of course, we did specify a sequence like the above when constructing the DataFrame from planets_dict. And, regardless of how the DataFrame is constructed, we can treat it like a dictionary as well.
Above, we extracted the name feature of our data using its namesake property. We can alternatively do the same via dictionary subscription, specifying the feature to extract as a string.
planets['name']
0 Mercury
1 Venus
2 Earth
3 Mars
4 Jupiter
5 Saturn
6 Uranus
7 Neptune
Name: name, dtype: object
Note
Accessing columns via their associated DataFrame properties, as in the first example, can be convenient. But the dictionary subscription syntax is more explicit, and it becomes necessary when a column name cannot be used as a property.
For example, a feature named name-old, accessed as planets.name-old, would be interpreted by Python as planets.name - old: that is, the name feature minus some entity named old.
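The difference can be sketched with a small hypothetical frame. The name-old column and its placeholder values are invented here purely for illustration; they are not part of the planets data.

```python
import pandas as pd

# Hypothetical frame: the "name-old" column and its values are made up
# for this example only.
planets = pd.DataFrame({
    "name": ["Mercury", "Venus"],
    "name-old": ["Mercury (old)", "Venus (old)"],
})

# For a valid identifier, both access styles return the same Series.
same = planets.name.equals(planets["name"])
print(same)

# Subscription works for any column label, including hyphenated ones.
old_names = planets["name-old"].tolist()
print(old_names)

# Property access is parsed as `planets.name - old`; since no variable
# called `old` exists, Python raises a NameError.
raised_name_error = False
try:
    planets.name-old
except NameError as error:
    raised_name_error = True
    print("NameError:", error)
```

Property access also falls short when a column label collides with an existing DataFrame attribute or method, which is another reason to prefer subscription in reusable code.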
The Series#
The sequence of our extracted feature is another data type provided by pandas: the Series.
type(planets.name)
pandas.core.series.Series
The pandas Series bears similarities to the DataFrame, but the Series handles data one-dimensionally, like Python’s list.
And like the list, the DataFrame, and the Index, the Series provides methods of its own.
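To make the one-dimensional versus two-dimensional distinction concrete, here is a short sketch that compares the ndim and shape attributes of a Series and a DataFrame. It assumes a reduced stand-in for planets containing only the two columns shown on this page.

```python
import pandas as pd

# Reduced stand-in for planets (assumption: two columns only).
planets = pd.DataFrame({
    "name": ["Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"],
    "solar_distance_km_6": [57.9, 108.2, 149.6, 227.9,
                            778.6, 1433.5, 2872.5, 4495.1],
})

names = planets["name"]

# A Series has one dimension; a DataFrame has two.
print(type(names).__name__, names.ndim, names.shape)
print(type(planets).__name__, planets.ndim, planets.shape)

# Like the Index, a Series can be converted to a plain Python list.
name_list = names.tolist()
print(name_list)
```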
We can also extract the next feature, representing the distances of these planets from the sun, in units of 10^6 km (millions of kilometers).
planets.solar_distance_km_6
0 57.9
1 108.2
2 149.6
3 227.9
4 778.6
5 1433.5
6 2872.5
7 4495.1
Name: solar_distance_km_6, dtype: float64
And we can compute aggregates of this data, such as the average or mean, thanks to the Series method mean.
planets.solar_distance_km_6.mean()
1265.4125000000001
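The mean is only one of several aggregates a Series offers. The sketch below applies mean, median, min, and max to the distance values listed above; the variable names are illustrative choices, not names from the original notebook.

```python
import pandas as pd

# Distances from the sun in 10^6 km, copied from the output above.
distances = pd.Series(
    [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    name="solar_distance_km_6",
)

mean_distance = distances.mean()      # arithmetic mean
median_distance = distances.median()  # midpoint of the two middle values
nearest = distances.min()
farthest = distances.max()

print(mean_distance, median_distance, nearest, farthest)
```

Because the outer planets are so much farther away, the mean sits well above the median here, which is a useful reminder to look at more than one aggregate.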
Feature selection#
We can also extract multiple features from our DataFrame to produce another two-dimensional DataFrame consisting of only the features specified.
This can also be achieved via dictionary subscription, specifying a list of features to include in the resulting DataFrame.
planets[['name', 'solar_distance_km_6']]
|   | name | solar_distance_km_6 |
|---|---|---|
| 0 | Mercury | 57.9 |
| 1 | Venus | 108.2 |
| 2 | Earth | 149.6 |
| 3 | Mars | 227.9 |
| 4 | Jupiter | 778.6 |
| 5 | Saturn | 1433.5 |
| 6 | Uranus | 2872.5 |
| 7 | Neptune | 4495.1 |
Attention
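Above we doubled our square brackets: the outer set indicates the subscription operation, and the inner set is the list of features to include in our slice.
Omitting one set of brackets, as in planets['name', 'solar_distance_km_6'], will result in an error: pandas then looks for a single column labeled with that pair of names and raises a KeyError.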
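The bracket rules can be checked with a short sketch. It assumes a reduced stand-in for planets, plus a hypothetical derived column, solar_distance_au, added only so that the selection has something to leave out.

```python
import pandas as pd

# Reduced stand-in for planets (assumption: two source columns only).
planets = pd.DataFrame({
    "name": ["Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"],
    "solar_distance_km_6": [57.9, 108.2, 149.6, 227.9,
                            778.6, 1433.5, 2872.5, 4495.1],
})
# Hypothetical extra column: distance relative to Earth's distance.
planets["solar_distance_au"] = planets["solar_distance_km_6"] / 149.6

# A list of labels selects a DataFrame with just those columns.
subset = planets[["name", "solar_distance_km_6"]]
print(subset.columns.tolist())

# A one-element list still yields a DataFrame; a bare string yields a Series.
one_column_frame = planets[["name"]]
one_column_series = planets["name"]
print(type(one_column_frame).__name__, type(one_column_series).__name__)

# With single brackets, the two names form one tuple key, which matches
# no column label, so pandas raises a KeyError.
raised_key_error = False
try:
    planets["name", "solar_distance_km_6"]
except KeyError as error:
    raised_key_error = True
    print("KeyError:", error)
```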
In subsequent sections we’ll learn more methods of slicing a DataFrame, such as loc, in Selection by Label.