The pandas DataFrame is one of the most commonly used objects in data analysis. This article introduces four common ways to build a DataFrame: from a NumPy array, a dict, a list, and a CSV file.
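A DataFrame has three main components: the data, the index (row labels), and the columns (column labels).
When columns and index are not specified, both default to type RangeIndex.
From a NumPy array, without setting columns and index:
import pandas as pd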
import numpy as np
df_np1 = pd.DataFrame(np.arange(12).reshape(3, 4))
print(df_np1)
print(type(df_np1.index))
#    0  1   2   3
# 0  0  1   2   3
# 1  4  5   6   7
# 2  8  9  10  11
# <class 'pandas.core.indexes.range.RangeIndex'>
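The default labels on both axes can be checked directly. The following is a minimal sketch; the variable names `index_is_range` and `columns_is_range` are only illustrative and do not come from the article.

```python
import numpy as np
import pandas as pd

# Build a 3x4 DataFrame without passing columns or index.
df = pd.DataFrame(np.arange(12).reshape(3, 4))

# Neither axis was given labels, so both should fall back to RangeIndex.
index_is_range = isinstance(df.index, pd.RangeIndex)
columns_is_range = isinstance(df.columns, pd.RangeIndex)

print(index_is_range, columns_is_range)
print(df.shape)
```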
From a NumPy array, setting columns and index explicitly:
df_np2 = pd.DataFrame(np.arange(12).reshape(3, 4),
                      columns=['col_0', 'col_1', 'col_2', 'col_3'],
                      index=['index_0', 'index_1', 'index_2'])
print(df_np2)
print(type(df_np2.index))
#          col_0  col_1  col_2  col_3
# index_0      0      1      2      3
# index_1      4      5      6      7
# index_2      8      9     10     11
# <class 'pandas.core.indexes.base.Index'>
If you set columns and index explicitly, as with the string labels above, both have type Index.
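One practical consequence of explicit labels is that values can be looked up by name. The sketch below assumes the same 3x4 array and label names as the example above; the use of `.loc` is an illustration added here, not something the article itself covers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['col_0', 'col_1', 'col_2', 'col_3'],
                  index=['index_0', 'index_1', 'index_2'])

# With string labels, neither axis is a RangeIndex any more.
index_is_range = isinstance(df.index, pd.RangeIndex)
columns_is_range = isinstance(df.columns, pd.RangeIndex)

# Label-based lookup: row 'index_1' is [4, 5, 6, 7], so 'col_2' holds 6.
value = df.loc['index_1', 'col_2']
print(index_is_range, columns_is_range, value)
```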
From a dict, where each key becomes a column label:
df_dict = pd.DataFrame({
    'col_0': [0, 1, 2],
    'col_1': [3, 4, 5],
    'col_2': [6, 7, 8]
})
print(df_dict)
#    col_0  col_1  col_2
# 0      0      3      6
# 1      1      4      7
# 2      2      5      8
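When building from a dict, the keys supply the column labels, and row labels can still be passed through the `index` argument. A minimal sketch, reusing the label names from the NumPy example as an assumption:

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values.
df = pd.DataFrame(
    {
        'col_0': [0, 1, 2],
        'col_1': [3, 4, 5],
        'col_2': [6, 7, 8],
    },
    index=['index_0', 'index_1', 'index_2'],  # optional row labels
)

column_labels = list(df.columns)
value = df.loc['index_1', 'col_1']  # second entry of 'col_1'
print(column_labels, value)
```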
From a list of lists, where each inner list becomes a row:
df_list = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(df_list)
#    0  1  2
# 0  0  1  2
# 1  3  4  5
# 2  6  7  8
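Note the difference in orientation from the dict case: a dict's lists are columns, while a list's inner lists are rows. The sketch below makes that visible and also passes `columns` to name the columns; the column names are assumed for illustration.

```python
import pandas as pd

# Each inner list is one row of the resulting DataFrame.
df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                  columns=['col_0', 'col_1', 'col_2'])

first_row = df.iloc[0].tolist()        # the first inner list
first_column = df['col_0'].tolist()    # first element of every inner list
print(first_row, first_column)
```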
From a CSV file, using the first column as the index:
df_csv = pd.read_csv('xxx.csv', index_col=0)
print(df_csv)
#     name  age
# id
# 1    'A'   30
# 2    'B'   31
# 3    'C'   32
# 4    'D'   33
# 5    'E'   34
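To try `read_csv` without a file on disk, the CSV text can be wrapped in `io.StringIO`. This is a self-contained sketch: the CSV content below is made up to resemble the output above and is not the article's actual `xxx.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "id,name,age\n1,A,30\n2,B,31\n3,C,32\n"

# index_col=0 turns the first column ('id') into the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

index_name = df.index.name
column_labels = list(df.columns)
age_of_2 = df.loc[2, 'age']
print(index_name, column_labels, age_of_2)
```

Because `id` becomes the index, it no longer appears among the columns, and rows are looked up by their `id` value.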