Pandas >> How to Select and Modify Data

2021-10-24 Pandas

Table of Contents

In this article, we will talk about how to select data at any position of DataFrame and modify it.

[Pandas] How to Select data of DataFrame


at, iat, loc, iloc

If we want to select and modify the data of a DataFrame, we can use at, iat, loc, iloc properties of DataFrame.

Difference

  • at, loc: using row name and column name to select data
  • iat, iloc: using row index and column index to select data
  • at, iat: select single data of DataFrame
  • loc, iloc: select single or multiple data of DataFrame
  • at, iat are clearly faster than loc, iloc

Firstly, we will prepare data.

import pandas as pd

df = pd.DataFrame({
    "name": ["Kevin", "Jack", "Mary"],
    "score": [80, 90, 95]
}, index=["K", "J", "M"])

DataFrame.at

Get value
We can use at to access a single value for a row/column label pair.
at provide label-based lookup.

df.at["J", "score"]

Result: 90


Set values
We can also use at to set value at specified row/column labels pair.

df.at["M", "score"] = 93
df.at["M", "score"]

Result: 93


DataFrame.loc

Get value
We can use loc to access a group of rows and columns by label(s) or a boolean array.
loc also provide label-based lookup. Additionally it support boolean array.

  • Select single element by specifying index and column label
    This is same as at.
df.loc["M", "name"]

Result: Mary

  • Select whole row only specifying row index label.
    This is defferent from at and at can be used in this way.
df.loc["K"]

Result:

name     Kevin
score       80
Name: K, dtype: object
  • Select a column. In DataFrame every series is a attribute. We can use dot syntax to select column from DataFrame.
df.score

Result:

K    80
J    90
M    95
Name: score, dtype: int64

But we can not use attribute syntax when column name includes spaces. Syntax error will occure.

df.my score

  File "path/to/xxx.py", line 1
    df.my score
          ^
SyntaxError: invalid syntax
  • Select multiple labels
    We can use a list or array of labels in loc[].
df.loc[['K', 'M']]

Result

name score
K Kevin 80
M Mary 95
  • Slice by labels
    We can use labels range to slice data in loc[].
df.loc['J':'M']

Result

name score
J Jack 90
M Mary 95
  • Slice by boolean array
    We can also use boolean array to slice data in loc[]. For example, if we want to slice first and third row, we can give [True, False, True] boolean array to loc[].
df.loc[[True, False, True]]

Result

name score
K Kevin 80
M Mary 95
  • Slice by an alignable boolean Series
    We can also use a series of boolean array to slice data in loc[].
df.loc[pd.Series([True, False, True], index=["K", "J", "M"])]

Result

name score
K Kevin 80
M Mary 95
  • Slice by an alignable index
    We can also use a index to slice data in loc[].
df.loc[pd.Index(['K', 'J'])]

Result

name score
K Kevin 80
J Jack 90
  • Use conditon to filter data
    We can also use a condition returning a boolean series to select data in loc[].
df.loc[df['score'] > 80]

Result

name score
J Jack 90
M Mary 95
df.loc[df['score'] > 80, "name"]

Result

J    Jack
M    Mary
Name: name, dtype: object
  • Use function as conditon to filter data
    We can also use a user defined function or lambda function to return a boolean series to select data in loc[].
df.loc[lambda row: row["score"] < 90]

Result

name score
K Kevin 80
def myfunc(row):
    return row['name'] == 'Mary'
df.loc[myfunc]

Result

name score
M Mary 95

Set values
We can use loc to set values for all matching items.

df.loc[["K", "J"], "score"] = 87
df.loc[["K", "J"]]

Result

name score
K Kevin 87
J Jack 87
df.loc["K"] = "NULL"
df

Result

name score
K NULL NULL
J Jack 87
M Mary 93

Use both at and loc
We also can use the combination of loc and at to access data. Firstly we use loc to get the row series, then use at to get value of some column.

df.loc["K"].at["score"]

Result: 80


But we can’t use the combination of at and loc.

df.at["K"].loc["score"]

TypeError: _get_value() missing 1 required positional argument: 'col'

DataFrame.iat

Select value
The iat attribute accepts row and column indices.

df.iat[0, 1]

Result: 80

  • Extracting values from Series We can use iat attribute to extract values from Series.
df["score"].iat[1]

Result: 90

Setting value

df.iat[0, 1] = 99

Result

name score
K Kevin 99
J Jack 90
M Mary 95

DataFrame.iloc

Select value
Select multixist or array of index to extract multiple rows from DataFrame

df.iloc[[0, 1]]

Result

name score
K Kevin 80
J Jack 90

We can also use slicing syntax like list to extract multiple rows.
Attention: the end of range is excluded.

df.iloc[0:1]

Result

name score
K Kevin 80

We can only specify the index after colon if we want to extract from first row.
Attention: the end of range is excluded.

df.iloc[:2]

Result

name score
K Kevin 80
J Jack 90

We can also only specify the index before colon if we want to extract to last row.

df.iloc[1:]

Result

name score
J Jack 90
M Mary 95

We can specify negative index value if we want to extract rows in reverse order.

df.iloc[-2:-1]

Result

name score
J Jack 90

We can also select subset of columns of a range of rows from DataFrame by specifying both row index range and column index range.

df.iloc[:2, [1]]

Result

score
K 80
J 90

Setting value
We can also update value extracted by iloc.

df.iloc[0:1, :2] = "Updated"

Result

name score
K Updated Updated
J Jack 90
M Mary 95

Subscribe and be the FIRST reader of our latest articles

* indicates required

Contact us