How to Select and Modify Data

2021-10-24 Pandas

In this article, we will talk about how to select data at any position of DataFrame and modify it.

[Pandas] How to Select data of DataFrame

at, iat, loc, iloc

If we want to select and modify the data of a DataFrame, we can use at, iat, loc, iloc properties of DataFrame.

Difference

at, loc: using row name and column name to select data
iat, iloc: using row index and column index to select data
at, iat: select single data of DataFrame
loc, iloc: select single or multiple data of DataFrame
at, iat are clearly faster than loc, iloc

Firstly, we will prepare data.

import pandas as pd

df = pd.DataFrame({
    "name": ["Kevin", "Jack", "Mary"],
    "score": [80, 90, 95]
}, index=["K", "J", "M"])

DataFrame.at

Get value
We can use at to access a single value for a row/column label pair.
at provide label-based lookup.

df.at["J", "score"]

Result: 90

Set values
We can also use at to set value at specified row/column labels pair.

df.at["M", "score"] = 93
df.at["M", "score"]

Result: 93

DataFrame.loc

Get value
We can use loc to access a group of rows and columns by label(s) or a boolean array.
loc also provide label-based lookup. Additionally it support boolean array.

Select single element by specifying index and column label
This is same as at.

df.loc["M", "name"]

Result: Mary

Select whole row only specifying row index label.
This is defferent from at and at can be used in this way.

df.loc["K"]

Result:

name     Kevin
score       80
Name: K, dtype: object

Select a column. In DataFrame every series is a attribute. We can use dot syntax to select column from DataFrame.

df.score

Result:

K    80
J    90
M    95
Name: score, dtype: int64

But we can not use attribute syntax when column name includes spaces. Syntax error will occure.

df.my score

  File "path/to/xxx.py", line 1
    df.my score
          ^
SyntaxError: invalid syntax

Select multiple labels
We can use a list or array of labels in loc[].

df.loc[['K', 'M']]

Result

	name	score
K	Kevin	80
M	Mary	95

Slice by labels
We can use labels range to slice data in loc[].

df.loc['J':'M']

Result

	name	score
J	Jack	90
M	Mary	95

Slice by boolean array
We can also use boolean array to slice data in loc[]. For example, if we want to slice first and third row, we can give [True, False, True] boolean array to loc[].

df.loc[[True, False, True]]

Result

	name	score
K	Kevin	80
M	Mary	95

Slice by an alignable boolean Series
We can also use a series of boolean array to slice data in loc[].

df.loc[pd.Series([True, False, True], index=["K", "J", "M"])]

Result

	name	score
K	Kevin	80
M	Mary	95

Slice by an alignable index
We can also use a index to slice data in loc[].

df.loc[pd.Index(['K', 'J'])]

Result

	name	score
K	Kevin	80
J	Jack	90

Use conditon to filter data
We can also use a condition returning a boolean series to select data in loc[].

df.loc[df['score'] > 80]

Result

	name	score
J	Jack	90
M	Mary	95

df.loc[df['score'] > 80, "name"]

Result

J    Jack
M    Mary
Name: name, dtype: object

Use function as conditon to filter data
We can also use a user defined function or lambda function to return a boolean series to select data in loc[].

df.loc[lambda row: row["score"] < 90]

Result

	name	score
K	Kevin	80

def myfunc(row):
    return row['name'] == 'Mary'
df.loc[myfunc]

Result

	name	score
M	Mary	95

Set values
We can use loc to set values for all matching items.

df.loc[["K", "J"], "score"] = 87
df.loc[["K", "J"]]

Result

	name	score
K	Kevin	87
J	Jack	87

df.loc["K"] = "NULL"
df

Result

	name	score
K	NULL	NULL
J	Jack	87
M	Mary	93

Use both at and loc
We also can use the combination of loc and at to access data. Firstly we use loc to get the row series, then use at to get value of some column.

df.loc["K"].at["score"]

Result: 80

But we can’t use the combination of at and loc.

df.at["K"].loc["score"]

TypeError: _get_value() missing 1 required positional argument: 'col'

DataFrame.iat

Select value
The iat attribute accepts row and column indices.

df.iat[0, 1]

Result: 80

Extracting values from Series We can use iat attribute to extract values from Series.

df["score"].iat[1]

Result: 90

Setting value

df.iat[0, 1] = 99

Result

	name	score
K	Kevin	99
J	Jack	90
M	Mary	95

DataFrame.iloc

Select value
Select multixist or array of index to extract multiple rows from DataFrame

df.iloc[[0, 1]]

Result

	name	score
K	Kevin	80
J	Jack	90

We can also use slicing syntax like list to extract multiple rows.
Attention: the end of range is excluded.

df.iloc[0:1]

Result

	name	score
K	Kevin	80

We can only specify the index after colon if we want to extract from first row.
Attention: the end of range is excluded.

df.iloc[:2]

Result

	name	score
K	Kevin	80
J	Jack	90

We can also only specify the index before colon if we want to extract to last row.

df.iloc[1:]

Result

	name	score
J	Jack	90
M	Mary	95

We can specify negative index value if we want to extract rows in reverse order.

df.iloc[-2:-1]

Result

	name	score
J	Jack	90

We can also select subset of columns of a range of rows from DataFrame by specifying both row index range and column index range.

df.iloc[:2, [1]]

Result

	score
K	80
J	90

Setting value
We can also update value extracted by iloc.

df.iloc[0:1, :2] = "Updated"

Result

	name	score
K	Updated	Updated
J	Jack	90
M	Mary	95

That's it ! Code Snippets

Pandas >> How to Select and Modify Data

Table of Contents

at, iat, loc, iloc

Difference

DataFrame.at

DataFrame.loc

DataFrame.iat

DataFrame.iloc

Contact us

Pandas >> How to Select and Modify Data

Table of Contents

at, iat, loc, iloc

Difference

DataFrame.at

DataFrame.loc

DataFrame.iat

DataFrame.iloc

Subscribe and be the FIRST reader of our latest articles

Contact us