Pandas - Returning single dates for a date range and match weekday binary values












1















Dataset:



Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)


operator - operating company e.g. ABC Airlines, DEF Train Company



from - departing from e.g. London, New York, Narnia



to - destination e.g. Paris



valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01



valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12



day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range



Required:



An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.



Desired Output:



dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)


So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.



The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.



What I've tried to do:



Using the following for help: Pandas: decompress date range to individual dates



import pandas as pd

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

df.set_index(['operator', 'from','to'], inplace=True)

df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])

df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()

df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))


Result looks promising. My final thoughts are to:




  1. add a weekday column that returns the weekday of the date starting with Sunday as 0

  2. add an available column that returns the binary value in day_of_week using weekday as the position index

  3. lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...


madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.



Edit:




  • Updated the 'What I've tried to do' part above.

  • Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)










share|improve this question

























  • Nicely asked question

    – John H
    Nov 15 '18 at 17:37











  • --Thanks @JohnH

    – AK91
    Nov 17 '18 at 21:29


















1















Dataset:



Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)


operator - operating company e.g. ABC Airlines, DEF Train Company



from - departing from e.g. London, New York, Narnia



to - destination e.g. Paris



valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01



valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12



day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range



Required:



An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.



Desired Output:



dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)


So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.



The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.



What I've tried to do:



Using the following for help: Pandas: decompress date range to individual dates



import pandas as pd

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

df.set_index(['operator', 'from','to'], inplace=True)

df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])

df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()

df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))


Result looks promising. My final thoughts are to:




  1. add a weekday column that returns the weekday of the date starting with Sunday as 0

  2. add an available column that returns the binary value in day_of_week using weekday as the position index

  3. lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...


madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.



Edit:




  • Updated the 'What I've tried to do' part above.

  • Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)










share|improve this question

























  • Nicely asked question

    – John H
    Nov 15 '18 at 17:37











  • --Thanks @JohnH

    – AK91
    Nov 17 '18 at 21:29
















1












1








1








Dataset:



Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)


operator - operating company e.g. ABC Airlines, DEF Train Company



from - departing from e.g. London, New York, Narnia



to - destination e.g. Paris



valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01



valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12



day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range



Required:



An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.



Desired Output:



dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)


So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.



The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.



What I've tried to do:



Using the following for help: Pandas: decompress date range to individual dates



import pandas as pd

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

df.set_index(['operator', 'from','to'], inplace=True)

df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])

df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()

df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))


Result looks promising. My final thoughts are to:




  1. add a weekday column that returns the weekday of the date starting with Sunday as 0

  2. add an available column that returns the binary value in day_of_week using weekday as the position index

  3. lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...


madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.



Edit:




  • Updated the 'What I've tried to do' part above.

  • Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)










share|improve this question
















Dataset:



Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)



df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)


operator - operating company e.g. ABC Airlines, DEF Train Company



from - departing from e.g. London, New York, Narnia



to - destination e.g. Paris



valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01



valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12



day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range



Required:



An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.



Desired Output:



dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)


So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.



The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.



What I've tried to do:



Using the following for help: Pandas: decompress date range to individual dates



import pandas as pd

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

df.set_index(['operator', 'from','to'], inplace=True)

df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])

df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()

df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))


Result looks promising. My final thoughts are to:




  1. add a weekday column that returns the weekday of the date starting with Sunday as 0

  2. add an available column that returns the binary value in day_of_week using weekday as the position index

  3. lastly, to somehow remove duplicate operator,from and to rows and keeping available's that have 1 and dropping those that are 0 or if there are no 1's for those operators'/from's/to's then keep the available as 0...


madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.



Edit:




  • Updated the 'What I've tried to do' part above.

  • Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted valid_to dates)







python pandas dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 20 '18 at 10:59







AK91

















asked Nov 15 '18 at 17:36









AK91AK91

757




757













  • Nicely asked question

    – John H
    Nov 15 '18 at 17:37











  • --Thanks @JohnH

    – AK91
    Nov 17 '18 at 21:29





















  • Nicely asked question

    – John H
    Nov 15 '18 at 17:37











  • --Thanks @JohnH

    – AK91
    Nov 17 '18 at 21:29



















Nicely asked question

– John H
Nov 15 '18 at 17:37





Nicely asked question

– John H
Nov 15 '18 at 17:37













--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29







--Thanks @JohnH

– AK91
Nov 17 '18 at 21:29














2 Answers
2






active

oldest

votes


















1














If you don't care too much about speed, you can use iterrows() and df.at:



import pandas as pd

df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
print df.head()


df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])

idx = 0
for i, row in df.iterrows():
daterange = row['valid_to'] - row['valid_from']
print daterange.days

daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks

for j in range(daterange.days+1):
df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
int(daystring[j + row['day']]), # use day of the week as starting position
row['valid_from']+pd.to_timedelta(j, unit='d'),
row['from'],
row['operator'],
row['to']
]

# row['day_of_week'][j]
idx += 1

df_out.drop_duplicates(inplace=True) # drop all duplicates
df_0 = df_out[df_out['available']==0]
df_1 = df_out[df_out['available']==1]
df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
df_out.fillna(0, inplace=True)

df_out['available'] = df_out['available_x'] + df_out['available_y']
df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
df_out.sort_values(by='date',inplace=True)
print df_out





share|improve this answer


























  • thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

    – AK91
    Nov 16 '18 at 6:19













  • I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

    – AK91
    Nov 16 '18 at 9:53











  • you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

    – warped
    Nov 16 '18 at 11:32











  • yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

    – AK91
    Nov 17 '18 at 13:17













  • It's a bit hacky, but it should work

    – warped
    Nov 19 '18 at 9:46



















0














This done the trick:



import pandas as pd
import numpy as np

# dataset
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

# set operator, from, to as index
df.set_index(['operator', 'from','to'], inplace=True)

# convert date ranges to datetime types
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])

# bring individual dates in date ranges and stack
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()

df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

# remove indexes
df_decomp.reset_index(level=3, drop=True, inplace=True)
df_decomp.reset_index(inplace=True)

# create weekday column
df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
df_decomp['date'].dt.weekday - 6,
df_decomp['date'].dt.weekday + 1)

# use weekday to extract availability in day_of_week
df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
df_decomp['available'] = df_decomp['available'].astype('int')

# sort values and remove duplicates with available=1 taking priority
df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
df_decomp = df_decomp.sort_values(['operator','from','to','date'])

df_decomp





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53325057%2fpandas-returning-single-dates-for-a-date-range-and-match-weekday-binary-values%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    If you don't care too much about speed, you can use iterrows() and df.at:



    import pandas as pd

    df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    df['valid_from'] = pd.to_datetime(df['valid_from'])
    df['valid_to'] = pd.to_datetime(df['valid_to'])
    df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
    print df.head()


    df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])

    idx = 0
    for i, row in df.iterrows():
    daterange = row['valid_to'] - row['valid_from']
    print daterange.days

    daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks

    for j in range(daterange.days+1):
    df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
    int(daystring[j + row['day']]), # use day of the week as starting position
    row['valid_from']+pd.to_timedelta(j, unit='d'),
    row['from'],
    row['operator'],
    row['to']
    ]

    # row['day_of_week'][j]
    idx += 1

    df_out.drop_duplicates(inplace=True) # drop all duplicates
    df_0 = df_out[df_out['available']==0]
    df_1 = df_out[df_out['available']==1]
    df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
    df_out.fillna(0, inplace=True)

    df_out['available'] = df_out['available_x'] + df_out['available_y']
    df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
    df_out.sort_values(by='date',inplace=True)
    print df_out





    share|improve this answer


























    • thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

      – AK91
      Nov 16 '18 at 6:19













    • I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

      – AK91
      Nov 16 '18 at 9:53











    • you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

      – warped
      Nov 16 '18 at 11:32











    • yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

      – AK91
      Nov 17 '18 at 13:17













    • It's a bit hacky, but it should work

      – warped
      Nov 19 '18 at 9:46
















    1














    If you don't care too much about speed, you can use iterrows() and df.at:



    import pandas as pd

    df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    df['valid_from'] = pd.to_datetime(df['valid_from'])
    df['valid_to'] = pd.to_datetime(df['valid_to'])
    df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
    print df.head()


    df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])

    idx = 0
    for i, row in df.iterrows():
    daterange = row['valid_to'] - row['valid_from']
    print daterange.days

    daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks

    for j in range(daterange.days+1):
    df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
    int(daystring[j + row['day']]), # use day of the week as starting position
    row['valid_from']+pd.to_timedelta(j, unit='d'),
    row['from'],
    row['operator'],
    row['to']
    ]

    # row['day_of_week'][j]
    idx += 1

    df_out.drop_duplicates(inplace=True) # drop all duplicates
    df_0 = df_out[df_out['available']==0]
    df_1 = df_out[df_out['available']==1]
    df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
    df_out.fillna(0, inplace=True)

    df_out['available'] = df_out['available_x'] + df_out['available_y']
    df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
    df_out.sort_values(by='date',inplace=True)
    print df_out





    share|improve this answer


























    • thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

      – AK91
      Nov 16 '18 at 6:19













    • I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

      – AK91
      Nov 16 '18 at 9:53











    • you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

      – warped
      Nov 16 '18 at 11:32











    • yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

      – AK91
      Nov 17 '18 at 13:17













    • It's a bit hacky, but it should work

      – warped
      Nov 19 '18 at 9:46














    1












    1








    1







    If you don't care too much about speed, you can use iterrows() and df.at:



    import pandas as pd

    df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    df['valid_from'] = pd.to_datetime(df['valid_from'])
    df['valid_to'] = pd.to_datetime(df['valid_to'])
    df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
    print df.head()


    df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])

    idx = 0
    for i, row in df.iterrows():
    daterange = row['valid_to'] - row['valid_from']
    print daterange.days

    daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks

    for j in range(daterange.days+1):
    df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
    int(daystring[j + row['day']]), # use day of the week as starting position
    row['valid_from']+pd.to_timedelta(j, unit='d'),
    row['from'],
    row['operator'],
    row['to']
    ]

    # row['day_of_week'][j]
    idx += 1

    df_out.drop_duplicates(inplace=True) # drop all duplicates
    df_0 = df_out[df_out['available']==0]
    df_1 = df_out[df_out['available']==1]
    df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
    df_out.fillna(0, inplace=True)

    df_out['available'] = df_out['available_x'] + df_out['available_y']
    df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
    df_out.sort_values(by='date',inplace=True)
    print df_out





    share|improve this answer















    If you don't care too much about speed, you can use iterrows() and df.at:



    import pandas as pd

    df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    df['valid_from'] = pd.to_datetime(df['valid_from'])
    df['valid_to'] = pd.to_datetime(df['valid_to'])
    df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
    print df.head()


    df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])

    idx = 0
    for i, row in df.iterrows():
    daterange = row['valid_to'] - row['valid_from']
    print daterange.days

    daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks

    for j in range(daterange.days+1):
    df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
    int(daystring[j + row['day']]), # use day of the week as starting position
    row['valid_from']+pd.to_timedelta(j, unit='d'),
    row['from'],
    row['operator'],
    row['to']
    ]

    # row['day_of_week'][j]
    idx += 1

    df_out.drop_duplicates(inplace=True) # drop all duplicates
    df_0 = df_out[df_out['available']==0]
    df_1 = df_out[df_out['available']==1]
    df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
    df_out.fillna(0, inplace=True)

    df_out['available'] = df_out['available_x'] + df_out['available_y']
    df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
    df_out.sort_values(by='date',inplace=True)
    print df_out






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 20 '18 at 19:04

























    answered Nov 15 '18 at 19:30









    warpedwarped

    1338




    1338













    • thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

      – AK91
      Nov 16 '18 at 6:19













    • I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

      – AK91
      Nov 16 '18 at 9:53











    • you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

      – warped
      Nov 16 '18 at 11:32











    • yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

      – AK91
      Nov 17 '18 at 13:17













    • It's a bit hacky, but it should work

      – warped
      Nov 19 '18 at 9:46



















    • thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

      – AK91
      Nov 16 '18 at 6:19













    • I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

      – AK91
      Nov 16 '18 at 9:53











    • you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

      – warped
      Nov 16 '18 at 11:32











    • yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

      – AK91
      Nov 17 '18 at 13:17













    • It's a bit hacky, but it should work

      – warped
      Nov 19 '18 at 9:46

















    thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

    – AK91
    Nov 16 '18 at 6:19







    thanks for the answer - only thing is that the day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)

    – AK91
    Nov 16 '18 at 6:19















    I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

    – AK91
    Nov 16 '18 at 9:53





    I think (and correct me if I'm wrong) but your row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?

    – AK91
    Nov 16 '18 at 9:53













    you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

    – warped
    Nov 16 '18 at 11:32





    you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.

    – warped
    Nov 16 '18 at 11:32













    yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

    – AK91
    Nov 17 '18 at 13:17







    yes, i think you got it. I had to amend the df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...

    – AK91
    Nov 17 '18 at 13:17















    It's a bit hacky, but it should work

    – warped
    Nov 19 '18 at 9:46





    It's a bit hacky, but it should work

    – warped
    Nov 19 '18 at 9:46













    0














    This done the trick:



    import pandas as pd
    import numpy as np

    # dataset
    df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
    'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
    'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
    'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
    'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
    'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

    # set operator, from, to as index
    df.set_index(['operator', 'from','to'], inplace=True)

    # convert date ranges to datetime types
    df['valid_from'] = pd.to_datetime(df['valid_from'])
    df['valid_to'] = pd.to_datetime(df['valid_to'])

    # bring individual dates in date ranges and stack
    df['row'] = range(len(df))
    starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
    ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

    df_decomp = pd.concat([starts, ends])
    df_decomp = df_decomp.set_index('row', append=True)
    df_decomp.sort_index()

    df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

    # remove indexes
    df_decomp.reset_index(level=3, drop=True, inplace=True)
    df_decomp.reset_index(inplace=True)

    # create weekday column
    df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
    df_decomp['date'].dt.weekday - 6,
    df_decomp['date'].dt.weekday + 1)

    # use weekday to extract availability in day_of_week
    df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
    df_decomp['available'] = df_decomp['available'].astype('int')

    # sort values and remove duplicates with available=1 taking priority
    df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
    df_decomp = df_decomp.sort_values(['operator','from','to','date'])

    df_decomp





    share|improve this answer




























      0














      This done the trick:



      import pandas as pd
      import numpy as np

      # dataset
      df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
      'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
      'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
      'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
      'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
      'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

      # set operator, from, to as index
      df.set_index(['operator', 'from','to'], inplace=True)

      # convert date ranges to datetime types
      df['valid_from'] = pd.to_datetime(df['valid_from'])
      df['valid_to'] = pd.to_datetime(df['valid_to'])

      # bring individual dates in date ranges and stack
      df['row'] = range(len(df))
      starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
      ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

      df_decomp = pd.concat([starts, ends])
      df_decomp = df_decomp.set_index('row', append=True)
      df_decomp.sort_index()

      df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

      # remove indexes
      df_decomp.reset_index(level=3, drop=True, inplace=True)
      df_decomp.reset_index(inplace=True)

      # create weekday column
      df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
      df_decomp['date'].dt.weekday - 6,
      df_decomp['date'].dt.weekday + 1)

      # use weekday to extract availability in day_of_week
      df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
      df_decomp['available'] = df_decomp['available'].astype('int')

      # sort values and remove duplicates with available=1 taking priority
      df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
      df_decomp = df_decomp.sort_values(['operator','from','to','date'])

      df_decomp





      share|improve this answer


























        0












        0








        0







        This done the trick:



        import pandas as pd
        import numpy as np

        # dataset
        df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
        'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
        'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
        'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
        'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
        'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

        # set operator, from, to as index
        df.set_index(['operator', 'from','to'], inplace=True)

        # convert date ranges to datetime types
        df['valid_from'] = pd.to_datetime(df['valid_from'])
        df['valid_to'] = pd.to_datetime(df['valid_to'])

        # bring individual dates in date ranges and stack
        df['row'] = range(len(df))
        starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
        ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

        df_decomp = pd.concat([starts, ends])
        df_decomp = df_decomp.set_index('row', append=True)
        df_decomp.sort_index()

        df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

        # remove indexes
        df_decomp.reset_index(level=3, drop=True, inplace=True)
        df_decomp.reset_index(inplace=True)

        # create weekday column
        df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
        df_decomp['date'].dt.weekday - 6,
        df_decomp['date'].dt.weekday + 1)

        # use weekday to extract availability in day_of_week
        df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
        df_decomp['available'] = df_decomp['available'].astype('int')

        # sort values and remove duplicates with available=1 taking priority
        df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
        df_decomp = df_decomp.sort_values(['operator','from','to','date'])

        df_decomp





        share|improve this answer













        This done the trick:



        import pandas as pd
        import numpy as np

        # dataset
        df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
        'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
        'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
        'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
        'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
        'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})

        # set operator, from, to as index
        df.set_index(['operator', 'from','to'], inplace=True)

        # convert date ranges to datetime types
        df['valid_from'] = pd.to_datetime(df['valid_from'])
        df['valid_to'] = pd.to_datetime(df['valid_to'])

        # bring individual dates in date ranges and stack
        df['row'] = range(len(df))
        starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
        ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})

        df_decomp = pd.concat([starts, ends])
        df_decomp = df_decomp.set_index('row', append=True)
        df_decomp.sort_index()

        df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))

        # remove indexes
        df_decomp.reset_index(level=3, drop=True, inplace=True)
        df_decomp.reset_index(inplace=True)

        # create weekday column
        df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
        df_decomp['date'].dt.weekday - 6,
        df_decomp['date'].dt.weekday + 1)

        # use weekday to extract availability in day_of_week
        df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
        df_decomp['available'] = df_decomp['available'].astype('int')

        # sort values and remove duplicates with available=1 taking priority
        df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
        df_decomp = df_decomp.sort_values(['operator','from','to','date'])

        df_decomp






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 20 '18 at 14:55









        AK91AK91

        757




        757






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53325057%2fpandas-returning-single-dates-for-a-date-range-and-match-weekday-binary-values%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            The Sandy Post

            Danny Elfman

            Pages that link to "Head v. Amoskeag Manufacturing Co."