Pandas >> How to Select and Modify Data
Table of Contents
In this article, we will talk about how to select data at any position of DataFrame and modify it.
![[Pandas] How to Select data of DataFrame](/img/pandas_00005.png)
at, iat, loc, iloc
If we want to select and modify the data of a DataFrame, we can use at, iat, loc, iloc properties of DataFrame.
Difference
at,loc: using row name and column name to select dataiat,iloc: using row index and column index to select dataat,iat: select single data of DataFrameloc,iloc: select single or multiple data of DataFrameat,iatare clearly faster thanloc,iloc
Firstly, we will prepare data.
import pandas as pd
df = pd.DataFrame({
"name": ["Kevin", "Jack", "Mary"],
"score": [80, 90, 95]
}, index=["K", "J", "M"])
DataFrame.at
Get value
We can use at to access a single value for a row/column label pair.
at provide label-based lookup.
df.at["J", "score"]
Result: 90
Set values
We can also use at to set value at specified row/column labels pair.
df.at["M", "score"] = 93
df.at["M", "score"]
Result: 93
DataFrame.loc
Get value
We can use loc to access a group of rows and columns by label(s) or a boolean array.
loc also provide label-based lookup. Additionally it support boolean array.
- Select single element by specifying index and column label
This is same asat.
df.loc["M", "name"]
Result: Mary
- Select whole row only specifying row index label.
This is defferent fromatandatcan be used in this way.
df.loc["K"]
Result:
name Kevin
score 80
Name: K, dtype: object
- Select a column.
In
DataFrameeveryseriesis a attribute. We can use dot syntax to select column fromDataFrame.
df.score
Result:
K 80
J 90
M 95
Name: score, dtype: int64
But we can not use attribute syntax when column name includes spaces. Syntax error will occure.
df.my score
File "path/to/xxx.py", line 1
df.my score
^
SyntaxError: invalid syntax
- Select multiple labels
We can use a list or array of labels inloc[].
df.loc[['K', 'M']]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
| M | Mary | 95 |
- Slice by labels
We can use labels range to slice data inloc[].
df.loc['J':'M']
Result
| name | score | |
|---|---|---|
| J | Jack | 90 |
| M | Mary | 95 |
- Slice by boolean array
We can also use boolean array to slice data inloc[]. For example, if we want to slice first and third row, we can give[True, False, True]boolean array toloc[].
df.loc[[True, False, True]]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
| M | Mary | 95 |
- Slice by an alignable boolean Series
We can also use a series of boolean array to slice data inloc[].
df.loc[pd.Series([True, False, True], index=["K", "J", "M"])]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
| M | Mary | 95 |
- Slice by an alignable index
We can also use a index to slice data inloc[].
df.loc[pd.Index(['K', 'J'])]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
| J | Jack | 90 |
- Use conditon to filter data
We can also use a condition returning a boolean series to select data inloc[].
df.loc[df['score'] > 80]
Result
| name | score | |
|---|---|---|
| J | Jack | 90 |
| M | Mary | 95 |
df.loc[df['score'] > 80, "name"]
Result
J Jack
M Mary
Name: name, dtype: object
- Use function as conditon to filter data
We can also use a user defined function or lambda function to return a boolean series to select data inloc[].
df.loc[lambda row: row["score"] < 90]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
def myfunc(row):
return row['name'] == 'Mary'
df.loc[myfunc]
Result
| name | score | |
|---|---|---|
| M | Mary | 95 |
Set values
We can use loc to set values for all matching items.
df.loc[["K", "J"], "score"] = 87
df.loc[["K", "J"]]
Result
| name | score | |
|---|---|---|
| K | Kevin | 87 |
| J | Jack | 87 |
df.loc["K"] = "NULL"
df
Result
| name | score | |
|---|---|---|
| K | NULL | NULL |
| J | Jack | 87 |
| M | Mary | 93 |
Use both at and loc
We also can use the combination of loc and at to access data.
Firstly we use loc to get the row series, then use at to get value of some column.
df.loc["K"].at["score"]
Result: 80
But we can’t use the combination of at and loc.
df.at["K"].loc["score"]
TypeError: _get_value() missing 1 required positional argument: 'col'
DataFrame.iat
Select value
The iat attribute accepts row and column indices.
df.iat[0, 1]
Result: 80
- Extracting values from Series
We can use
iatattribute to extract values fromSeries.
df["score"].iat[1]
Result: 90
Setting value
df.iat[0, 1] = 99
Result
| name | score | |
|---|---|---|
| K | Kevin | 99 |
| J | Jack | 90 |
| M | Mary | 95 |
DataFrame.iloc
Select value
Select multixist or array of index to extract multiple rows from DataFrame
df.iloc[[0, 1]]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
| J | Jack | 90 |
We can also use slicing syntax like list to extract multiple rows.
Attention: the end of range is excluded.
df.iloc[0:1]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
We can only specify the index after colon if we want to extract from first row.
Attention: the end of range is excluded.
df.iloc[:2]
Result
| name | score | |
|---|---|---|
| K | Kevin | 80 |
| J | Jack | 90 |
We can also only specify the index before colon if we want to extract to last row.
df.iloc[1:]
Result
| name | score | |
|---|---|---|
| J | Jack | 90 |
| M | Mary | 95 |
We can specify negative index value if we want to extract rows in reverse order.
df.iloc[-2:-1]
Result
| name | score | |
|---|---|---|
| J | Jack | 90 |
We can also select subset of columns of a range of rows from DataFrame by specifying both row index range and column index range.
df.iloc[:2, [1]]
Result
| score | |
|---|---|
| K | 80 |
| J | 90 |
Setting value
We can also update value extracted by iloc.
df.iloc[0:1, :2] = "Updated"
Result
| name | score | |
|---|---|---|
| K | Updated | Updated |
| J | Jack | 90 |
| M | Mary | 95 |