반응형

 

날짜 데이터 처리

 

www.youtube.com/watch?v=x7pu1lnBE날짜 데이터 처리

 

import pandas as pd

 

df1 = pd.read_csv('Data01.csv')

print(df1.shape)df1.head()

 

(15000, 7)

 

날짜                 상품명              바코드             발주가능상태      입고수량         카테고리           출고수량

2019-05-16 V2_0 8.801047e+12 발주가능 384 세탁세제 72
2019-05-16 V2_3 8.801047e+12 발주가능 207 세탁세제 5
2019-05-16 V2_7 8.801047e+12 발주가능 520 헤어케어 19
2019-05-16 V2_11 8.801047e+12 발주가능 0 세탁세제 5
2019-05-16 V2_14 8.801046e+12 발주가능 96 세탁세제 1

 

df1.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 15000 entries, 0 to 14999

Data columns (total 7 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 날짜 15000 non-null object

1 상품명 15000 non-null object

2 바코드 14992 non-null float64

3 발주가능상태 15000 non-null object

4 입고수량 15000 non-null int64

5 카테고리 15000 non-null object

6 출고수량 15000 non-null int64

dtypes: float64(1), int64(2), object(4)

memory usage: 820.4+ KB

 

df1.describe()

 

바코드                                          입고수량                                         출고수량

1.499200e+04 15000.000000 15000.000000
1.294839e+13 58.144533 7.205200
5.243929e+12 348.398250 19.819656
6.414100e+12 0.000000 0.000000
8.801046e+12 0.000000 0.000000
8.801047e+12 0.000000 1.000000
1.880105e+13 0.000000 4.000000
2.880105e+13 9777.000000 308.000000

날짜 데이터 처리

 

df1['날짜']

 

0 2019-05-16

1 2019-05-16

2 2019-05-16

3 2019-05-16

4 2019-05-16

...

14995 2019-08-07

14996 2019-08-07

14997 2019-08-07

14998 2019-08-07

14999 2019-08-07

Name: 날짜, Length: 15000, dtype: object

 

pd.to_datetime(df1['날짜'])

 

0 2019-05-16

1 2019-05-16

2 2019-05-16

3 2019-05-16

4 2019-05-16

...

14995 2019-08-07

14996 2019-08-07

14997 2019-08-07

14998 2019-08-07

14999 2019-08-07

Name: 날짜, Length: 15000, dtype: datetime64[ns]

 

df1['날짜_datetime'] = pd.to_datetime(df1['날짜'])

 

df1.head(2)

 

날짜            상품명            바코드           발주가능상태    입고수량       카테고리     출고수량    날짜_datetime01

2019-05-16 V2_0 8.801047e+12 발주가능 384 세탁세제 72 2019-05-16
2019-05-16 V2_3 8.801047e+12 발주가능 207 세탁세제 5 2019-05-16

 

df1['날짜_datetime'].dt.year

 

0 2019

1 2019

2 2019

3 2019

4 2019

...

14995 2019

14996 2019

14997 2019

14998 2019

14999 2019

Name: 날짜_datetime, Length: 15000, dtype: int64

 

df1['날짜_연도'] = df1['날짜_datetime'].dt.yeardf1['날짜_월'] = df1['날짜_datetime'].dt.monthdf1['날짜_일'] = df1['날짜_datetime'].dt.daydf1['날짜_요일'] = df1['날짜_datetime'].dt.day_name()

 

df1.head(2)

 

날짜      상품명     바코드     발주가능상태 입고수량 카테고리 출고 수량날짜_요일

2019-05-16 V2_0 8.801047e+12 발주가능 384 세탁세제 72 2019-05-16 2019 5 16 Thursday
2019-05-16 V2_3 8.801047e+12 발주가능 207 세탁세제 5 2019-05-16 2019 5 16 Thursday
  • 복잡한 처리

df2 = pd.read_csv('Data02.csv')print(df2.shape)

df2.head()

(46749, 9)

 

결제수단      배송번호     배송시작일   배송완료일   상품구매금액 상품번호    수량          주문경로     주문일

무통장입금 D-20181227-0000648-00 2019-01-07 오전 8:56 2019-01-11 오전 5:32 71450 1077.0 1 PC쇼핑몰 20181227
무통장입금 D-20181229-0000119-00 2019-01-03 오전 9:30 2019-01-07 오전 5:32 141240 42.0 2 모바일웹 20181229
무통장입금 D-20181230-0000100-00 2019-01-03 오전 9:30 2019-01-07 오전 5:32 13910 1271.0 1 모바일웹 20181230
무통장입금 D-20181231-0000087-00 2019-01-03 오전 9:30 2019-01-07 오전 5:32 81600 1085.0 1 모바일웹 20181231
무통장입금 D-20181231-0000108-00 2019-01-07 오전 8:56 2019-01-11 오전 5:32 94710 1106.0 1 모바일웹 20181231

 

df2.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 46749 entries, 0 to 46748

Data columns (total 9 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 결제수단 46749 non-null object

1 배송번호 46749 non-null object

2 배송시작일 46749 non-null object

3 배송완료일 46749 non-null object

4 상품구매금액 46749 non-null int64

5 상품번호 46661 non-null float64

6 수량 46749 non-null int64

7 주문경로 46749 non-null object

8 주문일 46749 non-null int64

dtypes: float64(1), int64(3), object(5)

memory usage: 3.2+ MB

  • 정규식을 이용한 처리 1

# pd.to_datetime(df2['배송시작일'])

 

def date_function(row):

    return row[0:10]

df2['배송시작일'].apply(date_function)

 

0 2019-01-07

1 2019-01-03

2 2019-01-03

3 2019-01-03

4 2019-01-07

...

46744 2017-12-29

46745 2018-01-03

46746 2018-01-02

46747 2018-01-03

46748 2018-01-03

Name: 배송시작일, Length: 46749, dtype: object

 

def date_function(row):

  return row[0:10

]df2['배송시작일(datetime)']= pd.to_datetime(df2['배송시작일'].apply(date_function))

 

df2['배송시작일(datetime)']

 

0 2019-01-07

1 2019-01-03

2 2019-01-03

3 2019-01-03

4 2019-01-07

...

46744 2017-12-29

46745 2018-01-03

46746 2018-01-02

46747 2018-01-03

46748 2018-01-03

Name: 배송시작일(datetime), Length: 46749, dtype: datetime64[ns]

 

df2['배송완료일(datetime)']= pd.to_datetime(df2['배송완료일'].apply(date_function))

 

df2.head(2)

 

결제수단배송번호배송시작일배송완료일상품구매금액상품번호수량주문경로주문일배송시작일(datetime)배송완료일(datetime)01

무통장입금 D-20181227-0000648-00 2019-01-07 오전 8:56 2019-01-11 오전 5:32 71450 1077.0 1 PC쇼핑몰 20181227 2019-01-07 2019-01-11
무통장입금 D-20181229-0000119-00 2019-01-03 오전 9:30 2019-01-07 오전 5:32 141240 42.0 2 모바일웹 20181229 2019-01-03 2019-01-07

 

df2['배송완료일(datetime)']-df2['배송시작일(datetime)']

 

0 4 days

1 4 days

2 4 days

3 4 days

4 4 days

...

46744 4 days

46745 4 days

46746 9 days

46747 3 days

46748 4 days

Length: 46749, dtype: timedelta64[ns]

 

df2['배송 기간'] = df2['배송완료일(datetime)']-df2['배송시작일(datetime)']

 

df2['배송 기간'].describecribe()

 

count 46749

mean 4 days 04:38:57.008278251

std 1 days 04:56:12.144368428

min 0 days 00:00:00

25% 4 days 00:00:00

50% 4 days 00:00:00

75% 4 days 00:00:00

max 47 days 00:00:00

Name: 배송 기간, dtype: object

 

  • 정규식을 이용한 처리 2

pd.to_datetime(df2['주문일'])

 

0 1970-01-01 00:00:00.020181227

1 1970-01-01 00:00:00.020181229

2 1970-01-01 00:00:00.020181230

3 1970-01-01 00:00:00.020181231

4 1970-01-01 00:00:00.020181231

...

46744 1970-01-01 00:00:00.020171229

46745 1970-01-01 00:00:00.020171229

46746 1970-01-01 00:00:00.020171230

46747 1970-01-01 00:00:00.020171231

46748 1970-01-01 00:00:00.020171231

Name: 주문일, Length: 46749, dtype: datetime64[ns]

 

def date_function2(row):

  result = str(row)

  return result

 

df2['주문일'].apply(date_function2)

 

0 20181227

1 20181229

2 20181230

3 20181231

4 20181231

...

46744 20171229

46745 20171229

46746 20171230

46747 20171231

46748 20171231

Name: 주문일, Length: 46749, dtype: object

 

def date_function2(row):

  result = str(row)

  return result[0:4] + '-' + result[4:6] + '-' + result[6:8]

 

df2['주문일'].apply(date_function2)

 

0 2018-12-27

1 2018-12-29

2 2018-12-30

3 2018-12-31

4 2018-12-31

...

46744 2017-12-29

46745 2017-12-29

46746 2017-12-30

46747 2017-12-31

46748 2017-12-31

Name: 주문일, Length: 46749, dtype: object

 

def date_function2(row):

  result = str(row)

  return result[0:4] + '-' + result[4:6] + '-' + result[6:8]

 

df2['주문일(datetime)']= pd.to_datetime(df2['주문일'].apply(date_function2) )df2['주문일(datetime)']

 

0 2018-12-27

1 2018-12-29

2 2018-12-30

3 2018-12-31

4 2018-12-31

...

46744 2017-12-29

46745 2017-12-29

46746 2017-12-30

46747 2017-12-31

46748 2017-12-31

Name: 주문일(datetime), Length: 46749, dtype: datetime64[ns]

반응형

+ Recent posts