Pandas DataFrames
What is a DataFrame?
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
Example
Create a simple Pandas DataFrame:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)
Result
   calories  duration
0       420        50
1       380        40
2       390        45
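The two-dimensional structure described above can be checked directly. The following is a minimal sketch that rebuilds the same DataFrame and inspects its shape, column labels, and row labels:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# shape is (number of rows, number of columns)
print(df.shape)          # (3, 2)

# column labels come from the dictionary keys
print(list(df.columns))  # ['calories', 'duration']

# with no index argument, the rows are labelled 0, 1, 2
print(list(df.index))    # [0, 1, 2]
```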
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified rows.
Example
Return row 0:
# refer to the row index:
print(df.loc[0])
Result
calories    420
duration     50
Name: 0, dtype: int64
Note: This example returns a Pandas Series.
Example
Return row 0 and 1:
# use a list of indexes:
print(df.loc[[0, 1]])
Result
   calories  duration
0       420        50
1       380        40
Note: When you pass a list of indexes in [] to loc, the result is a Pandas DataFrame.
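The difference between the two notes above can be seen by checking the type of each result. This is a small sketch using the same data. The variable names one_row and row_list are chosen here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]      # a single label returns a Series
row_list = df.loc[[0]]   # a list, even with one label, returns a DataFrame

print(type(one_row).__name__)   # Series
print(type(row_list).__name__)  # DataFrame
print(row_list.shape)           # (1, 2)
```

The list form is useful when later code expects a table, even if only one row is selected.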
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)
Result
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).
Example
Return “day2”:
# refer to the named index:
print(df.loc["day2"])
Result
calories    380
duration     40
Name: day2, dtype: int64
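As with the default indexes, a list of names can be passed to loc to return several rows as a DataFrame. The sketch below also shows that, once rows are named, the default labels 0, 1, 2 are no longer valid for loc. The variable name subset is an illustrative choice:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# a list of names returns a DataFrame containing those rows
subset = df.loc[["day1", "day3"]]
print(subset)

# 0 is not one of the row labels any more, so loc[0] raises KeyError
try:
    df.loc[0]
except KeyError:
    print("0 is not a row label in this DataFrame")
```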
Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df)
You will learn more about importing files in the next chapters.
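To try read_csv without a data.csv file on disk, the CSV text can be supplied through an in-memory buffer, since read_csv also accepts file-like objects. This is only a sketch: the text in csv_text is made-up sample data that mirrors the earlier examples, not the contents of any real file.

```python
import io

import pandas as pd

# stand-in for the contents of a CSV file (made-up sample values)
csv_text = "calories,duration\n420,50\n380,40\n390,45\n"

# io.StringIO wraps the string so it can be read like an open file
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

The first line of the CSV text becomes the column names, and the rows receive the default indexes 0, 1, 2.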