Data Science Tips - Pandas & Collection of Data sets
Pandas Usage
- Check column data types
df_raw.dtypes
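- Convert a minutes column to timedelta and add it to a start time
# '기간(분)' = duration (minutes), '기간' = duration, '참가 시간' = join time, '최종 시간' = end time
df_raw['기간'] = pd.to_timedelta(df_raw['기간(분)'], unit='minutes')
df_raw['최종 시간'] = df_raw['참가 시간'] + df_raw['기간']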
https://stackoverflow.com/a/37078012/1349104
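A minimal, self-contained sketch of the same timedelta conversion. The column names and sample values here are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative assumptions.
df = pd.DataFrame({
    "join_time": pd.to_datetime(["2021-03-01 09:00", "2021-03-01 10:30"]),
    "duration_min": [45, 90],
})

# Convert the integer minutes to Timedelta values.
df["duration"] = pd.to_timedelta(df["duration_min"], unit="min")

# datetime64 + timedelta64 gives the end timestamp of each row.
df["end_time"] = df["join_time"] + df["duration"]
print(df)
```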
- Convert date strings to date objects
# '인턴시작일' = internship start date, '인턴종료일' = internship end date
df_raw['인턴시작일'] = pd.to_datetime(df_raw['인턴시작일'], format='%Y-%m-%d', errors='raise').dt.date
df_raw['인턴종료일'] = pd.to_datetime(df_raw['인턴종료일'], format='%Y-%m-%d', errors='raise').dt.date
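A small sketch of how the errors argument behaves, using made-up strings. With errors='raise' a bad value stops the conversion; errors='coerce' is an alternative that turns bad values into NaT.

```python
import pandas as pd

raw = pd.Series(["2021-03-01", "2021-03-15", "not a date"])  # hypothetical data

# errors="raise" stops at the first value that does not match the format.
try:
    pd.to_datetime(raw, format="%Y-%m-%d", errors="raise")
    raised = False
except ValueError:
    raised = True

# errors="coerce" replaces unparseable values with NaT instead.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# .dt.date drops the time part and returns Python date objects.
dates = parsed.dt.date
print(raised)
print(dates)
```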
- TypeError: 'DataFrame' object is not callable
- Change () to [] when selecting a column
- Declare the variable before the loop starts
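A quick reproduction of the "not callable" error, assuming it came from selecting a column with parentheses. The DataFrame here is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"score": [1, 2, 3]})  # hypothetical data

try:
    df("score")            # parentheses try to call the DataFrame like a function
except TypeError as exc:
    message = str(exc)     # the error text from the tip above

column = df["score"]       # square brackets select the column
print(message)
print(column.tolist())
```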
- Previous / next row operation
df.shift(1)  # each row gets the previous row's value; df.shift(-1) gives the next row's
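A short sketch of what shift does in each direction, on made-up values. shift(1) moves values down by one row, so each row sees the previous row; shift(-1) moves them up.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})  # hypothetical data

df["prev"] = df["value"].shift(1)    # value from the previous row (NaN in the first row)
df["next"] = df["value"].shift(-1)   # value from the next row (NaN in the last row)
df["diff_prev"] = df["value"] - df["prev"]  # change since the previous row
print(df)
```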
- Check whether a timedelta is positive or negative
time_deltas < pd.Timedelta(0)
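A runnable sketch of the sign check. The start and end times are invented; time_deltas is built by subtracting two datetime columns.

```python
import pandas as pd

start = pd.to_datetime(pd.Series(["2021-03-01 09:00", "2021-03-01 12:00"]))
end = pd.to_datetime(pd.Series(["2021-03-01 10:00", "2021-03-01 11:30"]))

time_deltas = end - start                    # second row ends before it starts
is_negative = time_deltas < pd.Timedelta(0)  # boolean mask, True where negative
print(is_negative.tolist())
```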
- Grow the DataFrame (collect rows in a list)
Define rows = [], call rows.append(new_row) for each new row, and finally build the frame once with pd.DataFrame(rows)
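- Grow the DataFrame (repeat each row 4 times)
rows = []
for index, row in df_excel_data.iterrows():
    for i in range(4):
        rows.append(row.to_dict())
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_team_data = pd.concat([df_team_data, pd.DataFrame(rows)], ignore_index=True)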
- Grow the DataFrame (build one dict per row)
list_row = []
for i, r in df_final.iterrows():
    dic_row = {}
    for k, v in r.items():  # Series.iteritems() was removed in pandas 2.0
        dic_row[k] = v
    list_row.append(dic_row)
print(list_row)
df_row = pd.DataFrame(list_row)
print(df_row)
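A shorter variant of the row-dict loop above, offered as a sketch: to_dict(orient="records") produces the same list of one dict per row. The sample frame is made up.

```python
import pandas as pd

df_final = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})  # hypothetical data

# One dict per row, the same shape the nested loop builds by hand.
list_row = df_final.to_dict(orient="records")

# Append new rows to the plain list, then build the DataFrame once at the end.
list_row.append({"name": "c", "score": 3})
df_row = pd.DataFrame(list_row)
print(df_row)
```

Appending to a Python list and constructing the DataFrame once avoids copying the whole frame on every new row.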
- Add up the rows in a DataFrame (link)
- Rows that contain NaN
df_raw[df_raw.isnull().any(axis=1)]
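A small sketch of the NaN filter on invented data: isnull() marks missing cells, any(axis=1) collapses that to one flag per row.

```python
import numpy as np
import pandas as pd

df_raw = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", "y", None],
})  # hypothetical data

mask = df_raw.isnull().any(axis=1)  # True for rows with at least one missing value
rows_with_nan = df_raw[mask]
print(rows_with_nan)
```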
- View the first lines of a file
open(fpath).readlines()[0:2]
- Convert value_counts() output to a DataFrame
df_raw['item_name'].value_counts().rename_axis('item_name').reset_index(name='counts')
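A runnable sketch of the value_counts conversion with made-up item names. rename_axis names the index, and reset_index(name=...) turns the counts into a regular column.

```python
import pandas as pd

df_raw = pd.DataFrame(
    {"item_name": ["tea", "coffee", "tea", "tea", "coffee", "milk"]}
)  # hypothetical data

counts = (
    df_raw["item_name"]
    .value_counts()              # Series indexed by item, sorted by count descending
    .rename_axis("item_name")    # name the index so it becomes a named column
    .reset_index(name="counts")  # move the index into a column, name the values
)
print(counts)
```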
- Pandas: Multiple columns into one column
df.stack().reset_index()
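A sketch of what stack().reset_index() produces, using an invented two-column frame. The new column names are my own labels, since reset_index only gives defaults such as level_0.

```python
import pandas as pd

df = pd.DataFrame({"q1": [1, 2], "q2": [3, 4]}, index=["a", "b"])  # hypothetical data

# stack() moves the column labels into an inner index level,
# so every (row, column) pair becomes one row of a single value column.
long_df = df.stack().reset_index()
long_df.columns = ["row", "column", "value"]  # illustrative names
print(long_df)
```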
- Split a cell into multiple columns
df['column'].str.split('delimiter', expand=True)  # .str works on a Series, so select a column first
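A minimal sketch of splitting one column into several. The column names and the comma delimiter are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Kim,Minsu", "Lee,Jiwoo"]})  # hypothetical data

# expand=True returns a DataFrame with one column per piece.
parts = df["full_name"].str.split(",", expand=True)
parts.columns = ["last", "first"]  # illustrative names

df = df.join(parts)  # attach the new columns by index
print(df)
```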
- Packages
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping
- Exclude the label column (DataFrame)
x = df_train_pitcher[df_train_pitcher.columns.difference(['y'])]  # drops only y
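A sketch of the label exclusion on an invented frame. One thing worth knowing: columns.difference returns the remaining names sorted, while drop(columns=...) is an alternative that keeps the original order.

```python
import pandas as pd

df_train = pd.DataFrame({
    "wins": [10, 7],
    "era": [3.1, 4.2],
    "y": [1, 0],
})  # hypothetical data

x = df_train[df_train.columns.difference(["y"])]  # features, column names sorted
x_ordered = df_train.drop(columns=["y"])          # features, original column order
y = df_train["y"]                                 # label column
print(x.columns.tolist(), x_ordered.columns.tolist())
```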