Pandas - Returning single dates for a date range and match weekday binary values
Dataset:
Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)
operator - operating company e.g. ABC Airlines, DEF Train Company
from - departing from e.g. London, New York, Narnia
to - destination e.g. Paris
valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01
valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12
day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range
Required:
An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.
Desired Output:
dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)
So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.
The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.
What I've tried to do:
Using the following for help: Pandas: decompress date range to individual dates
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df.set_index(['operator', 'from','to'], inplace=True)
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
Result looks promising. My final thoughts are to:
- add a
weekdaycolumn that returns the weekday of thedatestarting withSundayas0
- add an
availablecolumn that returns the binary value inday_of_weekusingweekdayas the position index - lastly, to somehow remove duplicate
operator,fromandtorows and keepingavailable's that have1and dropping those that are0or if there are no1's for thoseoperators'/from's/to's then keep the available as0...
madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.
Edit:
- Updated the 'What I've tried to do' part above.
- Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted
valid_todates)
python pandas dataframe
add a comment |
Dataset:
Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)
operator - operating company e.g. ABC Airlines, DEF Train Company
from - departing from e.g. London, New York, Narnia
to - destination e.g. Paris
valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01
valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12
day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range
Required:
An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.
Desired Output:
dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)
So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.
The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.
What I've tried to do:
Using the following for help: Pandas: decompress date range to individual dates
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df.set_index(['operator', 'from','to'], inplace=True)
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
Result looks promising. My final thoughts are to:
- add a
weekdaycolumn that returns the weekday of thedatestarting withSundayas0
- add an
availablecolumn that returns the binary value inday_of_weekusingweekdayas the position index - lastly, to somehow remove duplicate
operator,fromandtorows and keepingavailable's that have1and dropping those that are0or if there are no1's for thoseoperators'/from's/to's then keep the available as0...
madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.
Edit:
- Updated the 'What I've tried to do' part above.
- Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted
valid_todates)
python pandas dataframe
Nicely asked question
– John H
Nov 15 '18 at 17:37
--Thanks @JohnH
– AK91
Nov 17 '18 at 21:29
add a comment |
Dataset:
Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)
operator - operating company e.g. ABC Airlines, DEF Train Company
from - departing from e.g. London, New York, Narnia
to - destination e.g. Paris
valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01
valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12
day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range
Required:
An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.
Desired Output:
dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)
So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.
The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.
What I've tried to do:
Using the following for help: Pandas: decompress date range to individual dates
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df.set_index(['operator', 'from','to'], inplace=True)
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
Result looks promising. My final thoughts are to:
- add a
weekdaycolumn that returns the weekday of thedatestarting withSundayas0
- add an
availablecolumn that returns the binary value inday_of_weekusingweekdayas the position index - lastly, to somehow remove duplicate
operator,fromandtorows and keepingavailable's that have1and dropping those that are0or if there are no1's for thoseoperators'/from's/to's then keep the available as0...
madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.
Edit:
- Updated the 'What I've tried to do' part above.
- Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted
valid_todates)
python pandas dataframe
Dataset:
Below dataset is supposed to replicate a timetable data set for a travel company (e.g. routes via train or bus or plane etc.)
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
print(df)
operator - operating company e.g. ABC Airlines, DEF Train Company
from - departing from e.g. London, New York, Narnia
to - destination e.g. Paris
valid_from - start of a date range (can be any day of the week) where route is available for purchase for the operator e.g. 2019-11-01
valid_to - end of date range (can be any day of the week) where route is available to purchase for the operator e.g. 2019-11-12
day_of_week - binary representing availability for Sun to Sat e.g. 0101010 means route is available on Mon, Wed, and Fri in the date range
Required:
An output dataset that converts the date range to single individual dates and their availability derived from the day_of_week field. The main goal is to get a clean dataset which can then loaded into Tableau to then build a report that would easily show route availability.
Desired Output:
dfout = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a', 'op_a'], 'from': ['a', 'a', 'a', 'a', 'a', 'a', 'a'], 'to': ['b', 'b', 'b', 'b', 'b', 'b', 'b'], 'date': ['13/11/2018', '14/11/2018', '15/11/2018', '16/11/2018', '17/11/2018', '18/11/2018', '19/11/2018'], 'available': [1, 1, 1, 1, 0, 1, 1]})
print(dfout)
So this would be the output for op_a for the route a to b for date range 2018-11-13 to 2018-11-19.
The dataset is weird as. Date ranges can be quite random, but day_of_week will always show availability for the days of the week in that date range. Some of the same date ranges may even have different day_of_week binary combinations, but essentially if at any point the day_of_week indicates an availability for a given date range, route and operator, then it will be taken to be available for the date.
What I've tried to do:
Using the following for help: Pandas: decompress date range to individual dates
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df.set_index(['operator', 'from','to'], inplace=True)
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
Result looks promising. My final thoughts are to:
- add a
weekdaycolumn that returns the weekday of thedatestarting withSundayas0
- add an
availablecolumn that returns the binary value inday_of_weekusingweekdayas the position index - lastly, to somehow remove duplicate
operator,fromandtorows and keepingavailable's that have1and dropping those that are0or if there are no1's for thoseoperators'/from's/to's then keep the available as0...
madness...apologies for the long-windedness and I hope I'm making some sense. Any help on this would be much appreciated.
Edit:
- Updated the 'What I've tried to do' part above.
- Updated dataset a tad to include a bit more variety in the dates (still the same dataset just adjusted
valid_todates)
python pandas dataframe
python pandas dataframe
edited Nov 20 '18 at 10:59
AK91
asked Nov 15 '18 at 17:36
AK91AK91
757
757
Nicely asked question
– John H
Nov 15 '18 at 17:37
--Thanks @JohnH
– AK91
Nov 17 '18 at 21:29
add a comment |
Nicely asked question
– John H
Nov 15 '18 at 17:37
--Thanks @JohnH
– AK91
Nov 17 '18 at 21:29
Nicely asked question
– John H
Nov 15 '18 at 17:37
Nicely asked question
– John H
Nov 15 '18 at 17:37
--Thanks @JohnH
– AK91
Nov 17 '18 at 21:29
--Thanks @JohnH
– AK91
Nov 17 '18 at 21:29
add a comment |
2 Answers
2
active
oldest
votes
If you don't care too much about speed, you can use iterrows() and df.at:
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
print df.head()
df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])
idx = 0
for i, row in df.iterrows():
daterange = row['valid_to'] - row['valid_from']
print daterange.days
daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks
for j in range(daterange.days+1):
df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
int(daystring[j + row['day']]), # use day of the week as starting position
row['valid_from']+pd.to_timedelta(j, unit='d'),
row['from'],
row['operator'],
row['to']
]
# row['day_of_week'][j]
idx += 1
df_out.drop_duplicates(inplace=True) # drop all duplicates
df_0 = df_out[df_out['available']==0]
df_1 = df_out[df_out['available']==1]
df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
df_out.fillna(0, inplace=True)
df_out['available'] = df_out['available_x'] + df_out['available_y']
df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
df_out.sort_values(by='date',inplace=True)
print df_out
thanks for the answer - only thing is that theday_of_weekstarts on Sunday, and the date range could start on e.g. a Tuesday, so I would need theday_of_weekbinary to match the weekday in the date range (if that makes sense?)
– AK91
Nov 16 '18 at 6:19
I think (and correct me if I'm wrong) but yourrow['day_of_week'][j]within thevalueparam is taking the binary combos from left to right inday_of_weekand assigning them against each individual single dates within the date range. Is there a way to alter that part to match theday_of_weekagainst the weekday within the dates in the daterange?
– AK91
Nov 16 '18 at 9:53
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
yes, i think you got it. I had to amend thedf['day'] = df['valid_from'].dt.weekdaypart todf['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1)because pandas has Mon as 0 and the dataset'sday_of_weekhas Sun starting at 0. The only things left is to remove duplicates and keepavailableas either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastlyset_valueis deprecated so would need to get that fixed somehow...
– AK91
Nov 17 '18 at 13:17
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
|
show 3 more comments
This done the trick:
import pandas as pd
import numpy as np
# dataset
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
# set operator, from, to as index
df.set_index(['operator', 'from','to'], inplace=True)
# convert date ranges to datetime types
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
# bring individual dates in date ranges and stack
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
# remove indexes
df_decomp.reset_index(level=3, drop=True, inplace=True)
df_decomp.reset_index(inplace=True)
# create weekday column
df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
df_decomp['date'].dt.weekday - 6,
df_decomp['date'].dt.weekday + 1)
# use weekday to extract availability in day_of_week
df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
df_decomp['available'] = df_decomp['available'].astype('int')
# sort values and remove duplicates with available=1 taking priority
df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
df_decomp = df_decomp.sort_values(['operator','from','to','date'])
df_decomp
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53325057%2fpandas-returning-single-dates-for-a-date-range-and-match-weekday-binary-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
If you don't care too much about speed, you can use iterrows() and df.at:
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
print df.head()
df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])
idx = 0
for i, row in df.iterrows():
daterange = row['valid_to'] - row['valid_from']
print daterange.days
daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks
for j in range(daterange.days+1):
df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
int(daystring[j + row['day']]), # use day of the week as starting position
row['valid_from']+pd.to_timedelta(j, unit='d'),
row['from'],
row['operator'],
row['to']
]
# row['day_of_week'][j]
idx += 1
df_out.drop_duplicates(inplace=True) # drop all duplicates
df_0 = df_out[df_out['available']==0]
df_1 = df_out[df_out['available']==1]
df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
df_out.fillna(0, inplace=True)
df_out['available'] = df_out['available_x'] + df_out['available_y']
df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
df_out.sort_values(by='date',inplace=True)
print df_out
thanks for the answer - only thing is that theday_of_weekstarts on Sunday, and the date range could start on e.g. a Tuesday, so I would need theday_of_weekbinary to match the weekday in the date range (if that makes sense?)
– AK91
Nov 16 '18 at 6:19
I think (and correct me if I'm wrong) but yourrow['day_of_week'][j]within thevalueparam is taking the binary combos from left to right inday_of_weekand assigning them against each individual single dates within the date range. Is there a way to alter that part to match theday_of_weekagainst the weekday within the dates in the daterange?
– AK91
Nov 16 '18 at 9:53
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
yes, i think you got it. I had to amend thedf['day'] = df['valid_from'].dt.weekdaypart todf['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1)because pandas has Mon as 0 and the dataset'sday_of_weekhas Sun starting at 0. The only things left is to remove duplicates and keepavailableas either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastlyset_valueis deprecated so would need to get that fixed somehow...
– AK91
Nov 17 '18 at 13:17
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
|
show 3 more comments
If you don't care too much about speed, you can use iterrows() and df.at:
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
print df.head()
df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])
idx = 0
for i, row in df.iterrows():
daterange = row['valid_to'] - row['valid_from']
print daterange.days
daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks
for j in range(daterange.days+1):
df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
int(daystring[j + row['day']]), # use day of the week as starting position
row['valid_from']+pd.to_timedelta(j, unit='d'),
row['from'],
row['operator'],
row['to']
]
# row['day_of_week'][j]
idx += 1
df_out.drop_duplicates(inplace=True) # drop all duplicates
df_0 = df_out[df_out['available']==0]
df_1 = df_out[df_out['available']==1]
df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
df_out.fillna(0, inplace=True)
df_out['available'] = df_out['available_x'] + df_out['available_y']
df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
df_out.sort_values(by='date',inplace=True)
print df_out
thanks for the answer - only thing is that theday_of_weekstarts on Sunday, and the date range could start on e.g. a Tuesday, so I would need theday_of_weekbinary to match the weekday in the date range (if that makes sense?)
– AK91
Nov 16 '18 at 6:19
I think (and correct me if I'm wrong) but yourrow['day_of_week'][j]within thevalueparam is taking the binary combos from left to right inday_of_weekand assigning them against each individual single dates within the date range. Is there a way to alter that part to match theday_of_weekagainst the weekday within the dates in the daterange?
– AK91
Nov 16 '18 at 9:53
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
yes, i think you got it. I had to amend thedf['day'] = df['valid_from'].dt.weekdaypart todf['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1)because pandas has Mon as 0 and the dataset'sday_of_weekhas Sun starting at 0. The only things left is to remove duplicates and keepavailableas either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastlyset_valueis deprecated so would need to get that fixed somehow...
– AK91
Nov 17 '18 at 13:17
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
|
show 3 more comments
If you don't care too much about speed, you can use iterrows() and df.at:
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
print df.head()
df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])
idx = 0
for i, row in df.iterrows():
daterange = row['valid_to'] - row['valid_from']
print daterange.days
daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks
for j in range(daterange.days+1):
df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
int(daystring[j + row['day']]), # use day of the week as starting position
row['valid_from']+pd.to_timedelta(j, unit='d'),
row['from'],
row['operator'],
row['to']
]
# row['day_of_week'][j]
idx += 1
df_out.drop_duplicates(inplace=True) # drop all duplicates
df_0 = df_out[df_out['available']==0]
df_1 = df_out[df_out['available']==1]
df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
df_out.fillna(0, inplace=True)
df_out['available'] = df_out['available_x'] + df_out['available_y']
df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
df_out.sort_values(by='date',inplace=True)
print df_out
If you don't care too much about speed, you can use iterrows() and df.at:
import pandas as pd
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'], 'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'], 'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'], 'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '01/05/2019', '01/05/2019'], 'valid_to': ['19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/02/2019', '21/02/2019', '21/02/2019', '21/02/2019', '10/05/2019', '11/05/2019'], 'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
df['day'] = (df['valid_from']+pd.to_timedelta(1, unit='d')).dt.weekday # gives weekdays : ) = Sunday
print df.head()
df_out = pd.DataFrame(columns=['available', 'date', 'from', 'operator', 'to'])
idx = 0
for i, row in df.iterrows():
daterange = row['valid_to'] - row['valid_from']
print daterange.days
daystring = 52 * (row['day_of_week']) # extend string to allow going through multiple weeks
for j in range(daterange.days+1):
df_out.at[idx, ['available', 'date', 'from', 'operator', 'to']] = [ # replaced set_value with df.at
int(daystring[j + row['day']]), # use day of the week as starting position
row['valid_from']+pd.to_timedelta(j, unit='d'),
row['from'],
row['operator'],
row['to']
]
# row['day_of_week'][j]
idx += 1
df_out.drop_duplicates(inplace=True) # drop all duplicates
df_0 = df_out[df_out['available']==0]
df_1 = df_out[df_out['available']==1]
df_out = df_0.merge(df_1, how='outer', left_on=['date', 'from', 'operator', 'to'], right_on=['date', 'from', 'operator', 'to'])
df_out.fillna(0, inplace=True)
df_out['available'] = df_out['available_x'] + df_out['available_y']
df_out.drop(['available_x', 'available_y'], axis=1, inplace=True)
df_out.sort_values(by='date',inplace=True)
print df_out
edited Nov 20 '18 at 19:04
answered Nov 15 '18 at 19:30
warpedwarped
1338
1338
thanks for the answer - only thing is that theday_of_weekstarts on Sunday, and the date range could start on e.g. a Tuesday, so I would need theday_of_weekbinary to match the weekday in the date range (if that makes sense?)
– AK91
Nov 16 '18 at 6:19
I think (and correct me if I'm wrong) but yourrow['day_of_week'][j]within thevalueparam is taking the binary combos from left to right inday_of_weekand assigning them against each individual single dates within the date range. Is there a way to alter that part to match theday_of_weekagainst the weekday within the dates in the daterange?
– AK91
Nov 16 '18 at 9:53
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
yes, i think you got it. I had to amend thedf['day'] = df['valid_from'].dt.weekdaypart todf['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1)because pandas has Mon as 0 and the dataset'sday_of_weekhas Sun starting at 0. The only things left is to remove duplicates and keepavailableas either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastlyset_valueis deprecated so would need to get that fixed somehow...
– AK91
Nov 17 '18 at 13:17
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
|
show 3 more comments
thanks for the answer - only thing is that theday_of_weekstarts on Sunday, and the date range could start on e.g. a Tuesday, so I would need theday_of_weekbinary to match the weekday in the date range (if that makes sense?)
– AK91
Nov 16 '18 at 6:19
I think (and correct me if I'm wrong) but yourrow['day_of_week'][j]within thevalueparam is taking the binary combos from left to right inday_of_weekand assigning them against each individual single dates within the date range. Is there a way to alter that part to match theday_of_weekagainst the weekday within the dates in the daterange?
– AK91
Nov 16 '18 at 9:53
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
yes, i think you got it. I had to amend thedf['day'] = df['valid_from'].dt.weekdaypart todf['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1)because pandas has Mon as 0 and the dataset'sday_of_weekhas Sun starting at 0. The only things left is to remove duplicates and keepavailableas either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastlyset_valueis deprecated so would need to get that fixed somehow...
– AK91
Nov 17 '18 at 13:17
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
thanks for the answer - only thing is that the
day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)– AK91
Nov 16 '18 at 6:19
thanks for the answer - only thing is that the
day_of_week starts on Sunday, and the date range could start on e.g. a Tuesday, so I would need the day_of_week binary to match the weekday in the date range (if that makes sense?)– AK91
Nov 16 '18 at 6:19
I think (and correct me if I'm wrong) but your
row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?– AK91
Nov 16 '18 at 9:53
I think (and correct me if I'm wrong) but your
row['day_of_week'][j] within the value param is taking the binary combos from left to right in day_of_week and assigning them against each individual single dates within the date range. Is there a way to alter that part to match the day_of_week against the weekday within the dates in the daterange?– AK91
Nov 16 '18 at 9:53
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
you are correct. I changed it such that it takes the actual weekday as starting point. see edited version and comments.
– warped
Nov 16 '18 at 11:32
yes, i think you got it. I had to amend the
df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...– AK91
Nov 17 '18 at 13:17
yes, i think you got it. I had to amend the
df['day'] = df['valid_from'].dt.weekday part to df['day'] = np.where(df['valid_from'].dt.weekday == 6, df['valid_from'].dt.weekday - 6, df['valid_from'].dt.weekday + 1) because pandas has Mon as 0 and the dataset's day_of_week has Sun starting at 0. The only things left is to remove duplicates and keep available as either a 1 or 0, giving precedence to 1 if it's there for that operator/route/date. And lastly set_value is deprecated so would need to get that fixed somehow...– AK91
Nov 17 '18 at 13:17
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
It's a bit hacky, but it should work
– warped
Nov 19 '18 at 9:46
|
show 3 more comments
This done the trick:
import pandas as pd
import numpy as np
# dataset
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
# set operator, from, to as index
df.set_index(['operator', 'from','to'], inplace=True)
# convert date ranges to datetime types
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
# bring individual dates in date ranges and stack
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
# remove indexes
df_decomp.reset_index(level=3, drop=True, inplace=True)
df_decomp.reset_index(inplace=True)
# create weekday column
df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
df_decomp['date'].dt.weekday - 6,
df_decomp['date'].dt.weekday + 1)
# use weekday to extract availability in day_of_week
df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
df_decomp['available'] = df_decomp['available'].astype('int')
# sort values and remove duplicates with available=1 taking priority
df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
df_decomp = df_decomp.sort_values(['operator','from','to','date'])
df_decomp
add a comment |
This done the trick:
import pandas as pd
import numpy as np
# dataset
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
# set operator, from, to as index
df.set_index(['operator', 'from','to'], inplace=True)
# convert date ranges to datetime types
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
# bring individual dates in date ranges and stack
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
# remove indexes
df_decomp.reset_index(level=3, drop=True, inplace=True)
df_decomp.reset_index(inplace=True)
# create weekday column
df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
df_decomp['date'].dt.weekday - 6,
df_decomp['date'].dt.weekday + 1)
# use weekday to extract availability in day_of_week
df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
df_decomp['available'] = df_decomp['available'].astype('int')
# sort values and remove duplicates with available=1 taking priority
df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
df_decomp = df_decomp.sort_values(['operator','from','to','date'])
df_decomp
add a comment |
This done the trick:
import pandas as pd
import numpy as np
# dataset
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
# set operator, from, to as index
df.set_index(['operator', 'from','to'], inplace=True)
# convert date ranges to datetime types
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
# bring individual dates in date ranges and stack
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
# remove indexes
df_decomp.reset_index(level=3, drop=True, inplace=True)
df_decomp.reset_index(inplace=True)
# create weekday column
df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
df_decomp['date'].dt.weekday - 6,
df_decomp['date'].dt.weekday + 1)
# use weekday to extract availability in day_of_week
df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
df_decomp['available'] = df_decomp['available'].astype('int')
# sort values and remove duplicates with available=1 taking priority
df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
df_decomp = df_decomp.sort_values(['operator','from','to','date'])
df_decomp
This done the trick:
import pandas as pd
import numpy as np
# dataset
df = pd.DataFrame({'operator': ['op_a', 'op_a', 'op_a', 'op_a', 'op_b', 'op_b', 'op_b', 'op_b', 'op_c', 'op_c', 'op_c', 'op_c', 'op_d', 'op_d'],
'from': ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'a', 'a', 'a', 'a', 'x', 'x'],
'to': ['b', 'b', 'b', 'b', 'd', 'd', 'd', 'd', 'b', 'b', 'b', 'b', 'y', 'y'],
'valid_from': ['13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '13/11/2018', '15/02/2019', '15/02/2019', '15/02/2019', '15/02/2019', '20/05/2019', '21/05/2019'],
'valid_to': ['20/11/2018', '20/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '19/11/2018', '21/11/2018', '21/11/2018', '21/02/2019', '21/02/2019', '20/02/2019', '20/02/2019', '30/05/2019', '29/05/2019'],
'day_of_week': ['0101010', '0100010', '0111100', '1101100', '0101010', '0100010', '0111100', '1101100', '0001101', '1110000', '0000000', '0000001', '1000000', '1000001']})
# set operator, from, to as index
df.set_index(['operator', 'from','to'], inplace=True)
# convert date ranges to datetime types
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
# bring individual dates in date ranges and stack
df['row'] = range(len(df))
starts = df[['valid_from', 'day_of_week', 'row']].rename(columns={'valid_from': 'date'})
ends = df[['valid_to', 'day_of_week', 'row']].rename(columns={'valid_to':'date'})
df_decomp = pd.concat([starts, ends])
df_decomp = df_decomp.set_index('row', append=True)
df_decomp.sort_index()
df_decomp = df_decomp.groupby(level=[0,1,2,3]).apply(lambda x: x.set_index('date').resample('D').fillna(method='pad'))
# remove indexes
df_decomp.reset_index(level=3, drop=True, inplace=True)
df_decomp.reset_index(inplace=True)
# create weekday column
df_decomp['weekday'] = np.where(df_decomp['date'].dt.weekday == 6,
df_decomp['date'].dt.weekday - 6,
df_decomp['date'].dt.weekday + 1)
# use weekday to extract availability in day_of_week
df_decomp['available'] = [b[a] for a, b in zip(df_decomp['weekday'], df_decomp['day_of_week'])]
df_decomp['available'] = df_decomp['available'].astype('int')
# sort values and remove duplicates with available=1 taking priority
df_decomp = df_decomp.sort_values('available', ascending=False).drop_duplicates(['operator','from','to','date'])
df_decomp = df_decomp.sort_values(['operator','from','to','date'])
df_decomp
answered Nov 20 '18 at 14:55
AK91AK91
757
757
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53325057%2fpandas-returning-single-dates-for-a-date-range-and-match-weekday-binary-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Nicely asked question
– John H
Nov 15 '18 at 17:37
--Thanks @JohnH
– AK91
Nov 17 '18 at 21:29