Pandas - Returning single dates for a date range and match weekday binary values

Dataset:

Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    print(df)

operator - operating company e.g. ABC Airlines, DEF Train Company

from - departing from e.g. London, New York, Narnia

to - destination e.g. Paris

valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01

valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12

day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range

Required:

An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.

Desired Output:

dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})

print(dfout)

So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.

The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.

What I've tried to do:

Using the following for help: Pandas: decompress date range to individual dates

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df.set_index(['operator', 'from','to'], inplace=True)



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

Result looks promising. My final thoughts are to:

add a weekday column that returns the weekday of the date starting with Sunday as 0

add an available column that returns the binary value in day_of_week using weekday as the position index

lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...

madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.

Edit:

Updated the 'What I've tried to do' part above.

Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)

edited Nov 20 '18 at 10:59

asked Nov 15 '18 at 17:36

AK91

757

Nicely asked question

– John H
Nov 15 '18 at 17:37

--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29

add a comment |

Dataset:

Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    print(df)

operator - operating company e.g. ABC Airlines, DEF Train Company

from - departing from e.g. London, New York, Narnia

to - destination e.g. Paris

valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01

valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12

day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range

Required:

Desired Output:

dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})

print(dfout)

So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.

What I've tried to do:

Using the following for help: Pandas: decompress date range to individual dates

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df.set_index(['operator', 'from','to'], inplace=True)



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

Result looks promising. My final thoughts are to:

add a weekday column that returns the weekday of the date starting with Sunday as 0

add an available column that returns the binary value in day_of_week using weekday as the position index

lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...

madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.

Edit:

Updated the 'What I've tried to do' part above.

Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)

edited Nov 20 '18 at 10:59

asked Nov 15 '18 at 17:36

AK91

757

Nicely asked question

– John H
Nov 15 '18 at 17:37

--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29

add a comment |

Dataset:

Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    print(df)

operator - operating company e.g. ABC Airlines, DEF Train Company

from - departing from e.g. London, New York, Narnia

to - destination e.g. Paris

valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01

valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12

day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range

Required:

Desired Output:

dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})

print(dfout)

So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.

What I've tried to do:

Using the following for help: Pandas: decompress date range to individual dates

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df.set_index(['operator', 'from','to'], inplace=True)



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

Result looks promising. My final thoughts are to:

add a weekday column that returns the weekday of the date starting with Sunday as 0

add an available column that returns the binary value in day_of_week using weekday as the position index

lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...

madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.

Edit:

Updated the 'What I've tried to do' part above.

Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)

edited Nov 20 '18 at 10:59

asked Nov 15 '18 at 17:36

AK91

757

Dataset:

Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    print(df)

operator - operating company e.g. ABC Airlines, DEF Train Company

from - departing from e.g. London, New York, Narnia

to - destination e.g. Paris

valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01

valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12

day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range

Required:

Desired Output:

dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})

print(dfout)

So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.

What I've tried to do:

Using the following for help: Pandas: decompress date range to individual dates

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df.set_index(['operator', 'from','to'], inplace=True)



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

Result looks promising. My final thoughts are to:

add a weekday column that returns the weekday of the date starting with Sunday as 0

add an available column that returns the binary value in day_of_week using weekday as the position index

lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...

madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.

Edit:

Updated the 'What I've tried to do' part above.

Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)

python pandas dataframe

edited Nov 20 '18 at 10:59

asked Nov 15 '18 at 17:36

AK91

757

edited Nov 20 '18 at 10:59

asked Nov 15 '18 at 17:36

AK91

757

edited Nov 20 '18 at 10:59

asked Nov 15 '18 at 17:36

AK91

757

asked Nov 15 '18 at 17:36

AK91

757

asked Nov 15 '18 at 17:36

AK91

757

Nicely asked question

– John H
Nov 15 '18 at 17:37

--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29

add a comment |

Nicely asked question

– John H
Nov 15 '18 at 17:37

--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29

Nicely asked question

– John H
Nov 15 '18 at 17:37

--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29

add a comment |

2 Answers
2

active

oldest

votes

If you don't care too much about speed, you can use iterrows() and df.at:

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])

df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday

print df.head()





df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])



idx = 0

for i, row in df.iterrows():

    daterange = row['valid_to'] - row['valid_from']

    print daterange.days



    daystring = 52 * (row['day_of_week'])  # extend string to allow going through multiple weeks



    for j in range(daterange.days+1):

        df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at

            int(daystring[j + row['day']]), # use day of the week as starting position

            row['valid_from']+pd.to_timedelta(j, unit='d'),

            row['from'],

            row['operator'],

            row['to']

            ]



        # row['day_of_week'][j]

        idx += 1



df_out.drop_duplicates(inplace=True) # drop all duplicates

df_0 = df_out[df_out['available']==0]

df_1 = df_out[df_out['available']==1]

df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])

df_out.fillna(0, inplace=True)



df_out['available'] = df_out['available_x'] + df_out['available_y']

df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)

df_out.sort_values(by='date',inplace=True)

print df_out

edited Nov 20 '18 at 19:04

answered Nov 15 '18 at 19:30

warped

1338

thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

– AK91
Nov 16 '18 at 6:19

I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

– AK91
Nov 16 '18 at 9:53

you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

– warped
Nov 16 '18 at 11:32

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

– AK91
Nov 17 '18 at 13:17

It's a bit hacky, but it should work

– warped
Nov 19 '18 at 9:46

|
show 3 more comments

This done the trick:

import pandas as pd

import numpy as np



# dataset

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



# set operator, from, to as index

df.set_index(['operator', 'from','to'], inplace=True)



# convert date ranges to datetime types

df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



# bring individual dates in date ranges and stack

df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))



# remove indexes

df_decomp.reset_index(level=3, drop=True, inplace=True)

df_decomp.reset_index(inplace=True)



# create weekday column

df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6, 

                            df_decomp['date'].dt.weekday - 6, 

                            df_decomp['date'].dt.weekday + 1)



# use weekday to extract availability in day_of_week

df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]

df_decomp['available'] = df_decomp['available'].astype('int')



# sort values and remove duplicates with available=1 taking priority

df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])

df_decomp = df_decomp.sort_values(['operator','from','to','date'])



df_decomp

answered Nov 20 '18 at 14:55

AK91

757

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53325057%2fpandas-returning-single-dates-for-a-date-range-and-match-weekday-binary-values%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

If you don't care too much about speed, you can use iterrows() and df.at:

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])

df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday

print df.head()





df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])



idx = 0

for i, row in df.iterrows():

    daterange = row['valid_to'] - row['valid_from']

    print daterange.days



    daystring = 52 * (row['day_of_week'])  # extend string to allow going through multiple weeks



    for j in range(daterange.days+1):

        df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at

            int(daystring[j + row['day']]), # use day of the week as starting position

            row['valid_from']+pd.to_timedelta(j, unit='d'),

            row['from'],

            row['operator'],

            row['to']

            ]



        # row['day_of_week'][j]

        idx += 1



df_out.drop_duplicates(inplace=True) # drop all duplicates

df_0 = df_out[df_out['available']==0]

df_1 = df_out[df_out['available']==1]

df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])

df_out.fillna(0, inplace=True)



df_out['available'] = df_out['available_x'] + df_out['available_y']

df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)

df_out.sort_values(by='date',inplace=True)

print df_out

edited Nov 20 '18 at 19:04

answered Nov 15 '18 at 19:30

warped

1338

thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

– AK91
Nov 16 '18 at 6:19

I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

– AK91
Nov 16 '18 at 9:53

you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

– warped
Nov 16 '18 at 11:32

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

– AK91
Nov 17 '18 at 13:17

It's a bit hacky, but it should work

– warped
Nov 19 '18 at 9:46

|
show 3 more comments

If you don't care too much about speed, you can use iterrows() and df.at:

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])

df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday

print df.head()





df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])



idx = 0

for i, row in df.iterrows():

    daterange = row['valid_to'] - row['valid_from']

    print daterange.days



    daystring = 52 * (row['day_of_week'])  # extend string to allow going through multiple weeks



    for j in range(daterange.days+1):

        df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at

            int(daystring[j + row['day']]), # use day of the week as starting position

            row['valid_from']+pd.to_timedelta(j, unit='d'),

            row['from'],

            row['operator'],

            row['to']

            ]



        # row['day_of_week'][j]

        idx += 1



df_out.drop_duplicates(inplace=True) # drop all duplicates

df_0 = df_out[df_out['available']==0]

df_1 = df_out[df_out['available']==1]

df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])

df_out.fillna(0, inplace=True)



df_out['available'] = df_out['available_x'] + df_out['available_y']

df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)

df_out.sort_values(by='date',inplace=True)

print df_out

edited Nov 20 '18 at 19:04

answered Nov 15 '18 at 19:30

warped

1338

thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

– AK91
Nov 16 '18 at 6:19

I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

– AK91
Nov 16 '18 at 9:53

you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

– warped
Nov 16 '18 at 11:32

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

– AK91
Nov 17 '18 at 13:17

It's a bit hacky, but it should work

– warped
Nov 19 '18 at 9:46

|
show 3 more comments

If you don't care too much about speed, you can use iterrows() and df.at:

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])

df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday

print df.head()





df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])



idx = 0

for i, row in df.iterrows():

    daterange = row['valid_to'] - row['valid_from']

    print daterange.days



    daystring = 52 * (row['day_of_week'])  # extend string to allow going through multiple weeks



    for j in range(daterange.days+1):

        df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at

            int(daystring[j + row['day']]), # use day of the week as starting position

            row['valid_from']+pd.to_timedelta(j, unit='d'),

            row['from'],

            row['operator'],

            row['to']

            ]



        # row['day_of_week'][j]

        idx += 1



df_out.drop_duplicates(inplace=True) # drop all duplicates

df_0 = df_out[df_out['available']==0]

df_1 = df_out[df_out['available']==1]

df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])

df_out.fillna(0, inplace=True)



df_out['available'] = df_out['available_x'] + df_out['available_y']

df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)

df_out.sort_values(by='date',inplace=True)

print df_out

edited Nov 20 '18 at 19:04

answered Nov 15 '18 at 19:30

warped

1338

If you don't care too much about speed, you can use iterrows() and df.at:

import pandas as pd



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])

df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday

print df.head()





df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])



idx = 0

for i, row in df.iterrows():

    daterange = row['valid_to'] - row['valid_from']

    print daterange.days



    daystring = 52 * (row['day_of_week'])  # extend string to allow going through multiple weeks



    for j in range(daterange.days+1):

        df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at

            int(daystring[j + row['day']]), # use day of the week as starting position

            row['valid_from']+pd.to_timedelta(j, unit='d'),

            row['from'],

            row['operator'],

            row['to']

            ]



        # row['day_of_week'][j]

        idx += 1



df_out.drop_duplicates(inplace=True) # drop all duplicates

df_0 = df_out[df_out['available']==0]

df_1 = df_out[df_out['available']==1]

df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])

df_out.fillna(0, inplace=True)



df_out['available'] = df_out['available_x'] + df_out['available_y']

df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)

df_out.sort_values(by='date',inplace=True)

print df_out

edited Nov 20 '18 at 19:04

answered Nov 15 '18 at 19:30

warped

1338

edited Nov 20 '18 at 19:04

answered Nov 15 '18 at 19:30

warped

1338

answered Nov 15 '18 at 19:30

warped

1338

answered Nov 15 '18 at 19:30

warped

1338

thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

– AK91
Nov 16 '18 at 6:19

I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

– AK91
Nov 16 '18 at 9:53

you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

– warped
Nov 16 '18 at 11:32

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

– AK91
Nov 17 '18 at 13:17

It's a bit hacky, but it should work

– warped
Nov 19 '18 at 9:46

|
show 3 more comments

thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

– AK91
Nov 16 '18 at 6:19

I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

– AK91
Nov 16 '18 at 9:53

you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

– warped
Nov 16 '18 at 11:32

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

– AK91
Nov 17 '18 at 13:17

It's a bit hacky, but it should work

– warped
Nov 19 '18 at 9:46

thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

– AK91
Nov 16 '18 at 6:19

I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

– AK91
Nov 16 '18 at 9:53

you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

– warped
Nov 16 '18 at 11:32

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to

df['day'] = np.where(df['valid_from'].dt.weekday == 6,                      df['valid_from'].dt.weekday - 6,                      df['valid_from'].dt.weekday + 1)

because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

– AK91
Nov 17 '18 at 13:17

yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to

df['day'] = np.where(df['valid_from'].dt.weekday == 6,                      df['valid_from'].dt.weekday - 6,                      df['valid_from'].dt.weekday + 1)

It's a bit hacky, but it should work

– warped
Nov 19 '18 at 9:46

|
show 3 more comments

This done the trick:

import pandas as pd

import numpy as np



# dataset

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



# set operator, from, to as index

df.set_index(['operator', 'from','to'], inplace=True)



# convert date ranges to datetime types

df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



# bring individual dates in date ranges and stack

df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))



# remove indexes

df_decomp.reset_index(level=3, drop=True, inplace=True)

df_decomp.reset_index(inplace=True)



# create weekday column

df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6, 

                            df_decomp['date'].dt.weekday - 6, 

                            df_decomp['date'].dt.weekday + 1)



# use weekday to extract availability in day_of_week

df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]

df_decomp['available'] = df_decomp['available'].astype('int')



# sort values and remove duplicates with available=1 taking priority

df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])

df_decomp = df_decomp.sort_values(['operator','from','to','date'])



df_decomp

answered Nov 20 '18 at 14:55

AK91

757

add a comment |

This done the trick:

import pandas as pd

import numpy as np



# dataset

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



# set operator, from, to as index

df.set_index(['operator', 'from','to'], inplace=True)



# convert date ranges to datetime types

df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



# bring individual dates in date ranges and stack

df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))



# remove indexes

df_decomp.reset_index(level=3, drop=True, inplace=True)

df_decomp.reset_index(inplace=True)



# create weekday column

df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6, 

                            df_decomp['date'].dt.weekday - 6, 

                            df_decomp['date'].dt.weekday + 1)



# use weekday to extract availability in day_of_week

df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]

df_decomp['available'] = df_decomp['available'].astype('int')



# sort values and remove duplicates with available=1 taking priority

df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])

df_decomp = df_decomp.sort_values(['operator','from','to','date'])



df_decomp

answered Nov 20 '18 at 14:55

AK91

757

add a comment |

This done the trick:

import pandas as pd

import numpy as np



# dataset

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



# set operator, from, to as index

df.set_index(['operator', 'from','to'], inplace=True)



# convert date ranges to datetime types

df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



# bring individual dates in date ranges and stack

df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))



# remove indexes

df_decomp.reset_index(level=3, drop=True, inplace=True)

df_decomp.reset_index(inplace=True)



# create weekday column

df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6, 

                            df_decomp['date'].dt.weekday - 6, 

                            df_decomp['date'].dt.weekday + 1)



# use weekday to extract availability in day_of_week

df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]

df_decomp['available'] = df_decomp['available'].astype('int')



# sort values and remove duplicates with available=1 taking priority

df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])

df_decomp = df_decomp.sort_values(['operator','from','to','date'])



df_decomp

answered Nov 20 '18 at 14:55

AK91

757

This done the trick:

import pandas as pd

import numpy as np



# dataset

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],

                   'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 

                   'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 

                   'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],

                   'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'], 

                   'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})



# set operator, from, to as index

df.set_index(['operator', 'from','to'], inplace=True)



# convert date ranges to datetime types

df['valid_from'] = pd.to_datetime(df['valid_from'])

df['valid_to'] = pd.to_datetime(df['valid_to'])



# bring individual dates in date ranges and stack

df['row'] = range(len(df))

starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})

ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})



df_decomp = pd.concat([starts, ends])

df_decomp = df_decomp.set_index('row', append=True)

df_decomp.sort_index()



df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))



# remove indexes

df_decomp.reset_index(level=3, drop=True, inplace=True)

df_decomp.reset_index(inplace=True)



# create weekday column

df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6, 

                            df_decomp['date'].dt.weekday - 6, 

                            df_decomp['date'].dt.weekday + 1)



# use weekday to extract availability in day_of_week

df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]

df_decomp['available'] = df_decomp['available'].astype('int')



# sort values and remove duplicates with available=1 taking priority

df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])

df_decomp = df_decomp.sort_values(['operator','from','to','date'])



df_decomp

answered Nov 20 '18 at 14:55

AK91

757

answered Nov 20 '18 at 14:55

AK91

757

answered Nov 20 '18 at 14:55

AK91

757

answered Nov 20 '18 at 14:55

AK91

757

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ndtyjky