Data Science Tips - Pandas & Collection of Data sets

1 minute read

Pandas Use

  1. column data type check

     df_raw.dtypes
    
  2. pandas time delta transform

     df_raw['기간'] = pd.to_timedelta(df_raw['기간(분)'], unit='minutes')
     df_raw['최종 시간'] = df_raw['참가 시간'] + df_raw['기간']
    

    https://stackoverflow.com/a/37078012/1349104

    transform time date data

     df_raw['인턴시작일'] = pd.to_datetime(df_raw['인턴시작일'], format='%Y-%m-%d', errors='raise').dt.date
     df_raw['인턴종료일'] = pd.to_datetime(df_raw['인턴종료일'], format='%Y-%m-%d', errors='raise').dt.date
    

    [Pandas] 일자와 시간(dt) 처리법 : 네이버 블로그

  3. typeError-dataframe is not callable. ```python
    1. Should be change from () to []
    2. Before start a loop, you need to declare variable `` Watch out!
  4. next row operation
     df.shift(1)
    
  5. timedelta posive, negative check
     time_deltas < pd.Timedelta(0)
    
  6. grow the dataframe
         define row[]
         and row.append(new row)
         finally pd.Dataframe(row)
    
  7. grow the dataframe
     for index, row in df_excel_data.iterrows() :
         for i in range(4) :
             row_dict = row.to_dict()
             df_team_data = df_team_data.append(row_dict, ignore_index=True)
    
  8. grow the dataframe
         list_row = []
         dic_row = {}
         for i, r in df_final.iterrows() :
             for k, v in r.iteritems() :
                 row = {k: v}
                 dic_row.update(row)
             list_row.append(dic_row)
             dic_row = {}
         print(list_row)
         df_row = pd.DataFrame(list_row)
         print(df_row)
    
  9. add up the rows in dataframe 링크

  10. Nan을 포함한 rows
    df_raw[df_raw.isnull().any(axis=1)]
    
  11. file head 보기
    open(fpath).readlines()[0:2]
    
  12. value_counts() to dataframe
    df_raw['item_name'].value_counts().rename_axis('item_name').reset_index(name='counts')
    
  13. Pandas: Multiple columns into one column
       df.stack().reset_index()
    
  14. split the cell
      df.str.split(delimiter', expand=True)
    
  15. Packages
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.callbacks import ModelCheckpoint, EarlyStopping
    
  16. Label 제외 (DataFrame)
    x = df_train_pitcher[df_train_pitcher.columns.difference(['y'])]  # y만 제외함 
    

More…

UCI Machine Learning Repository: Data Sets