Pandas >> How to Select and Modify Data
Table of Contents
In this article, we will talk about how to select data at any position of DataFrame and modify it.
at, iat, loc, iloc
If we want to select and modify the data of a DataFrame
, we can use at, iat, loc, iloc properties of DataFrame
.
Difference
at
,loc
: using row name and column name to select dataiat
,iloc
: using row index and column index to select dataat
,iat
: select single data of DataFrameloc
,iloc
: select single or multiple data of DataFrameat
,iat
are clearly faster thanloc
,iloc
Firstly, we will prepare data.
import pandas as pd
df = pd.DataFrame({
"name": ["Kevin", "Jack", "Mary"],
"score": [80, 90, 95]
}, index=["K", "J", "M"])
DataFrame.at
Get value
We can use at
to access a single value for a row/column label pair.
at
provide label-based lookup.
df.at["J", "score"]
Result: 90
Set values
We can also use at to set value at
specified row/column labels pair.
df.at["M", "score"] = 93
df.at["M", "score"]
Result: 93
DataFrame.loc
Get value
We can use loc
to access a group of rows and columns by label(s) or a boolean array.
loc
also provide label-based lookup. Additionally it support boolean array.
- Select single element by specifying index and column label
This is same asat
.
df.loc["M", "name"]
Result: Mary
- Select whole row only specifying row index label.
This is defferent fromat
andat
can be used in this way.
df.loc["K"]
Result:
name Kevin
score 80
Name: K, dtype: object
- Select a column.
In
DataFrame
everyseries
is a attribute. We can use dot syntax to select column fromDataFrame
.
df.score
Result:
K 80
J 90
M 95
Name: score, dtype: int64
But we can not use attribute syntax when column name includes spaces. Syntax error will occure.
df.my score
File "path/to/xxx.py", line 1
df.my score
^
SyntaxError: invalid syntax
- Select multiple labels
We can use a list or array of labels inloc[]
.
df.loc[['K', 'M']]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
M | Mary | 95 |
- Slice by labels
We can use labels range to slice data inloc[]
.
df.loc['J':'M']
Result
name | score | |
---|---|---|
J | Jack | 90 |
M | Mary | 95 |
- Slice by boolean array
We can also use boolean array to slice data inloc[]
. For example, if we want to slice first and third row, we can give[True, False, True]
boolean array toloc[]
.
df.loc[[True, False, True]]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
M | Mary | 95 |
- Slice by an alignable boolean Series
We can also use a series of boolean array to slice data inloc[]
.
df.loc[pd.Series([True, False, True], index=["K", "J", "M"])]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
M | Mary | 95 |
- Slice by an alignable index
We can also use a index to slice data inloc[]
.
df.loc[pd.Index(['K', 'J'])]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
J | Jack | 90 |
- Use conditon to filter data
We can also use a condition returning a boolean series to select data inloc[]
.
df.loc[df['score'] > 80]
Result
name | score | |
---|---|---|
J | Jack | 90 |
M | Mary | 95 |
df.loc[df['score'] > 80, "name"]
Result
J Jack
M Mary
Name: name, dtype: object
- Use function as conditon to filter data
We can also use a user defined function or lambda function to return a boolean series to select data inloc[]
.
df.loc[lambda row: row["score"] < 90]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
def myfunc(row):
return row['name'] == 'Mary'
df.loc[myfunc]
Result
name | score | |
---|---|---|
M | Mary | 95 |
Set values
We can use loc
to set values for all matching items.
df.loc[["K", "J"], "score"] = 87
df.loc[["K", "J"]]
Result
name | score | |
---|---|---|
K | Kevin | 87 |
J | Jack | 87 |
df.loc["K"] = "NULL"
df
Result
name | score | |
---|---|---|
K | NULL | NULL |
J | Jack | 87 |
M | Mary | 93 |
Use both at
and loc
We also can use the combination of loc
and at
to access data.
Firstly we use loc
to get the row series, then use at
to get value of some column.
df.loc["K"].at["score"]
Result: 80
But we can’t use the combination of at
and loc
.
df.at["K"].loc["score"]
TypeError: _get_value() missing 1 required positional argument: 'col'
DataFrame.iat
Select value
The iat
attribute accepts row and column indices.
df.iat[0, 1]
Result: 80
- Extracting values from Series
We can use
iat
attribute to extract values fromSeries
.
df["score"].iat[1]
Result: 90
Setting value
df.iat[0, 1] = 99
Result
name | score | |
---|---|---|
K | Kevin | 99 |
J | Jack | 90 |
M | Mary | 95 |
DataFrame.iloc
Select value
Select multixist or array of index to extract multiple rows from DataFrame
df.iloc[[0, 1]]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
J | Jack | 90 |
We can also use slicing syntax like list to extract multiple rows.
Attention: the end of range is excluded.
df.iloc[0:1]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
We can only specify the index after colon if we want to extract from first row.
Attention: the end of range is excluded.
df.iloc[:2]
Result
name | score | |
---|---|---|
K | Kevin | 80 |
J | Jack | 90 |
We can also only specify the index before colon if we want to extract to last row.
df.iloc[1:]
Result
name | score | |
---|---|---|
J | Jack | 90 |
M | Mary | 95 |
We can specify negative index value if we want to extract rows in reverse order.
df.iloc[-2:-1]
Result
name | score | |
---|---|---|
J | Jack | 90 |
We can also select subset of columns of a range of rows from DataFrame by specifying both row index range and column index range.
df.iloc[:2, [1]]
Result
score | |
---|---|
K | 80 |
J | 90 |
Setting value
We can also update value extracted by iloc.
df.iloc[0:1, :2] = "Updated"
Result
name | score | |
---|---|---|
K | Updated | Updated |
J | Jack | 90 |
M | Mary | 95 |