Postgresql generate date series (performance)











up vote
1
down vote

favorite












Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.



I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly combined with a start date of 2018-01-31 or 2018-01-30 as shown in the output below.



I created a solution and wanted to post this here for others to use as I could not find any other solution.



However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?



TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.



Built-in Output



select generate_series(date '2018-01-31', 
date '2018-05-31',
interval '1 month')::date
as frequency;


generates:



 frequency
------------
2018-01-31
2018-02-28
2018-03-28
2018-04-28
2018-05-28


As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.



Expected Output



As a result of this issue I created a custom function:



create or replace function generate_date_series(
startsOn date,
endsOn date,
frequency interval)
returns setof date as $$
declare
intervalOn date := startsOn;
count int := 1;
begin
while intervalOn <= endsOn loop
return next intervalOn;
intervalOn := startsOn + (count * frequency);
count := count + 1;
end loop;
return;
end;
$$ language plpgsql immutable;

select generate_date_series(date '2018-01-31',
date '2018-05-31',
interval '1 month')
as frequency;


generates:



 frequency
------------
2018-01-31
2018-02-28
2018-03-31
2018-04-30
2018-05-31


Performance comparison



No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:



select generate_series(date '1900-01-01', 
date '10000-5-31',
interval '1 month')::date
as frequency;


while the custom function generate_date_series has a performance of 120ms on average for:



select generate_date_series(date '1900-01-01', 
date '10000-5-31',
interval '1 month')::date
as frequency;


Question



In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series will attain the same performance. Although, I do wonder what causes the difference.



Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?



Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?



Improved implementation without loops



(derived from the answer of @eurotrash)



create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
returns setof date as $$
select (startsOn + (frequency * count))::date
from (
select (row_number() over ()) - 1 as count
from generate_series(startsOn, endsOn, frequency)
) series
$$ language sql immutable;


with the improved implementation, the generate_date_series function has a performance of 45ms on average for:



select generate_date_series(date '1900-01-01', 
date '10000-5-31',
interval '1 month')::date
as frequency;


The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.










share|improve this question




























    up vote
    1
    down vote

    favorite












    Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.



    I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly combined with a start date of 2018-01-31 or 2018-01-30 as shown in the output below.



    I created a solution and wanted to post this here for others to use as I could not find any other solution.



    However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?



    TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.



    Built-in Output



    select generate_series(date '2018-01-31', 
    date '2018-05-31',
    interval '1 month')::date
    as frequency;


    generates:



     frequency
    ------------
    2018-01-31
    2018-02-28
    2018-03-28
    2018-04-28
    2018-05-28


    As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.



    Expected Output



    As a result of this issue I created a custom function:



    create or replace function generate_date_series(
    startsOn date,
    endsOn date,
    frequency interval)
    returns setof date as $$
    declare
    intervalOn date := startsOn;
    count int := 1;
    begin
    while intervalOn <= endsOn loop
    return next intervalOn;
    intervalOn := startsOn + (count * frequency);
    count := count + 1;
    end loop;
    return;
    end;
    $$ language plpgsql immutable;

    select generate_date_series(date '2018-01-31',
    date '2018-05-31',
    interval '1 month')
    as frequency;


    generates:



     frequency
    ------------
    2018-01-31
    2018-02-28
    2018-03-31
    2018-04-30
    2018-05-31


    Performance comparison



    No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:



    select generate_series(date '1900-01-01', 
    date '10000-5-31',
    interval '1 month')::date
    as frequency;


    while the custom function generate_date_series has a performance of 120ms on average for:



    select generate_date_series(date '1900-01-01', 
    date '10000-5-31',
    interval '1 month')::date
    as frequency;


    Question



    In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series will attain the same performance. Although, I do wonder what causes the difference.



    Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?



    Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?



    Improved implementation without loops



    (derived from the answer of @eurotrash)



    create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
    returns setof date as $$
    select (startsOn + (frequency * count))::date
    from (
    select (row_number() over ()) - 1 as count
    from generate_series(startsOn, endsOn, frequency)
    ) series
    $$ language sql immutable;


    with the improved implementation, the generate_date_series function has a performance of 45ms on average for:



    select generate_date_series(date '1900-01-01', 
    date '10000-5-31',
    interval '1 month')::date
    as frequency;


    The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.










    share|improve this question


























      up vote
      1
      down vote

      favorite









      up vote
      1
      down vote

      favorite











      Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.



      I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly combined with a start date of 2018-01-31 or 2018-01-30 as shown in the output below.



      I created a solution and wanted to post this here for others to use as I could not find any other solution.



      However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?



      TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.



      Built-in Output



      select generate_series(date '2018-01-31', 
      date '2018-05-31',
      interval '1 month')::date
      as frequency;


      generates:



       frequency
      ------------
      2018-01-31
      2018-02-28
      2018-03-28
      2018-04-28
      2018-05-28


      As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.



      Expected Output



      As a result of this issue I created a custom function:



      create or replace function generate_date_series(
      startsOn date,
      endsOn date,
      frequency interval)
      returns setof date as $$
      declare
      intervalOn date := startsOn;
      count int := 1;
      begin
      while intervalOn <= endsOn loop
      return next intervalOn;
      intervalOn := startsOn + (count * frequency);
      count := count + 1;
      end loop;
      return;
      end;
      $$ language plpgsql immutable;

      select generate_date_series(date '2018-01-31',
      date '2018-05-31',
      interval '1 month')
      as frequency;


      generates:



       frequency
      ------------
      2018-01-31
      2018-02-28
      2018-03-31
      2018-04-30
      2018-05-31


      Performance comparison



      No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:



      select generate_series(date '1900-01-01', 
      date '10000-5-31',
      interval '1 month')::date
      as frequency;


      while the custom function generate_date_series has a performance of 120ms on average for:



      select generate_date_series(date '1900-01-01', 
      date '10000-5-31',
      interval '1 month')::date
      as frequency;


      Question



      In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series will attain the same performance. Although, I do wonder what causes the difference.



      Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?



      Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?



      Improved implementation without loops



      (derived from the answer of @eurotrash)



      create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
      returns setof date as $$
      select (startsOn + (frequency * count))::date
      from (
      select (row_number() over ()) - 1 as count
      from generate_series(startsOn, endsOn, frequency)
      ) series
      $$ language sql immutable;


      with the improved implementation, the generate_date_series function has a performance of 45ms on average for:



      select generate_date_series(date '1900-01-01', 
      date '10000-5-31',
      interval '1 month')::date
      as frequency;


      The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.










      share|improve this question















      Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.



      I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly combined with a start date of 2018-01-31 or 2018-01-30 as shown in the output below.



      I created a solution and wanted to post this here for others to use as I could not find any other solution.



      However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?



      TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.



      Built-in Output



      select generate_series(date '2018-01-31', 
      date '2018-05-31',
      interval '1 month')::date
      as frequency;


      generates:



       frequency
      ------------
      2018-01-31
      2018-02-28
      2018-03-28
      2018-04-28
      2018-05-28


      As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.



      Expected Output



      As a result of this issue I created a custom function:



      create or replace function generate_date_series(
      startsOn date,
      endsOn date,
      frequency interval)
      returns setof date as $$
      declare
      intervalOn date := startsOn;
      count int := 1;
      begin
      while intervalOn <= endsOn loop
      return next intervalOn;
      intervalOn := startsOn + (count * frequency);
      count := count + 1;
      end loop;
      return;
      end;
      $$ language plpgsql immutable;

      select generate_date_series(date '2018-01-31',
      date '2018-05-31',
      interval '1 month')
      as frequency;


      generates:



       frequency
      ------------
      2018-01-31
      2018-02-28
      2018-03-31
      2018-04-30
      2018-05-31


      Performance comparison



      No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:



      select generate_series(date '1900-01-01', 
      date '10000-5-31',
      interval '1 month')::date
      as frequency;


      while the custom function generate_date_series has a performance of 120ms on average for:



      select generate_date_series(date '1900-01-01', 
      date '10000-5-31',
      interval '1 month')::date
      as frequency;


      Question



      In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series will attain the same performance. Although, I do wonder what causes the difference.



      Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?



      Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?



      Improved implementation without loops



      (derived from the answer of @eurotrash)



      create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)
      returns setof date as $$
      select (startsOn + (frequency * count))::date
      from (
      select (row_number() over ()) - 1 as count
      from generate_series(startsOn, endsOn, frequency)
      ) series
      $$ language sql immutable;


      with the improved implementation, the generate_date_series function has a performance of 45ms on average for:



      select generate_date_series(date '1900-01-01', 
      date '10000-5-31',
      interval '1 month')::date
      as frequency;


      The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.







      sql postgresql dateinterval generate-series






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 11 at 12:14

























      asked Nov 10 at 13:55









      chvndb

      14519




      14519
























          3 Answers
          3






          active

          oldest

          votes

















          up vote
          3
          down vote



          accepted










          Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.



          CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
          RETURNS SETOF DATE AS
          $BODY$
          SELECT (starts_on + (frequency * g))::DATE
          FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
          $BODY$
          LANGUAGE SQL IMMUTABLE;


          The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.



          On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.






          share|improve this answer





















          • Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
            – chvndb
            Nov 11 at 9:56












          • @chvndb Nice, I like your new implementation.
            – eurotrash
            Nov 11 at 10:48


















          up vote
          1
          down vote













          REVISED SOLUTION



          This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:



          SELECT      t.day_of_month
          FROM (
          SELECT ds.day_of_month
          , date_part('day', ds.day_of_month) AS day
          , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
          FROM (
          SELECT generate_series( date '1900-01-01',
          date '10000-12-31',
          INTERVAL '1 day')::DATE as day_of_month
          ) AS ds
          ) AS t
          --> REMEMBER to change the day at both places below (eg. 31)
          WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)


          Resulting output:
          Please ensure you change the day on BOTH the RED numbers.
          Performance Output



          The output data:



          Data Output






          share|improve this answer























          • This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
            – chvndb
            Nov 10 at 14:54










          • Revised the solution.
            – Joseph Lee
            Nov 10 at 15:51


















          up vote
          1
          down vote













          You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.



          SELECT 
          (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
          FROM
          generate_series(
          DATE '2018-01-31', DATE '2018-05-31',
          interval '1 MONTH'
          ) AS dt


          Demo



          Test



          knayak=# select generate_series(date '2018-01-31',
          knayak(# date '2018-05-31',
          knayak(# interval '1 month')::date
          knayak-# as frequency;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-28
          2018-04-28
          2018-05-28
          (5 rows)

          Time: 0.303 ms
          knayak=#
          knayak=#
          knayak=# SELECT
          knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
          knayak-# FROM
          knayak-# generate_series(
          knayak(# DATE '2018-01-31', DATE '2018-05-31',
          knayak(# interval '1 MONTH'
          knayak(# ) AS dt
          knayak-# ;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-31
          2018-04-30
          2018-05-31
          (5 rows)

          Time: 0.425 ms





          share|improve this answer























          • The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
            – eurotrash
            Nov 10 at 18:55










          • @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
            – Kaushik Nayak
            Nov 10 at 19:05











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














           

          draft saved


          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53239669%2fpostgresql-generate-date-series-performance%23new-answer', 'question_page');
          }
          );

          Post as a guest
































          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          3
          down vote



          accepted










          Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.



          CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
          RETURNS SETOF DATE AS
          $BODY$
          SELECT (starts_on + (frequency * g))::DATE
          FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
          $BODY$
          LANGUAGE SQL IMMUTABLE;


          The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.



          On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.






          share|improve this answer





















          • Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
            – chvndb
            Nov 11 at 9:56












          • @chvndb Nice, I like your new implementation.
            – eurotrash
            Nov 11 at 10:48















          up vote
          3
          down vote



          accepted










          Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.



          CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
          RETURNS SETOF DATE AS
          $BODY$
          SELECT (starts_on + (frequency * g))::DATE
          FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
          $BODY$
          LANGUAGE SQL IMMUTABLE;


          The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.



          On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.






          share|improve this answer





















          • Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
            – chvndb
            Nov 11 at 9:56












          • @chvndb Nice, I like your new implementation.
            – eurotrash
            Nov 11 at 10:48













          up vote
          3
          down vote



          accepted







          up vote
          3
          down vote



          accepted






          Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.



          CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
          RETURNS SETOF DATE AS
          $BODY$
          SELECT (starts_on + (frequency * g))::DATE
          FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
          $BODY$
          LANGUAGE SQL IMMUTABLE;


          The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.



          On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.






          share|improve this answer












          Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.



          CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)
          RETURNS SETOF DATE AS
          $BODY$
          SELECT (starts_on + (frequency * g))::DATE
          FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;
          $BODY$
          LANGUAGE SQL IMMUTABLE;


          The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.



          On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 10 at 19:06









          eurotrash

          2,3751625




          2,3751625












          • Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
            – chvndb
            Nov 11 at 9:56












          • @chvndb Nice, I like your new implementation.
            – eurotrash
            Nov 11 at 10:48


















          • Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
            – chvndb
            Nov 11 at 9:56












          • @chvndb Nice, I like your new implementation.
            – eurotrash
            Nov 11 at 10:48
















          Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
          – chvndb
          Nov 11 at 9:56






          Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
          – chvndb
          Nov 11 at 9:56














          @chvndb Nice, I like your new implementation.
          – eurotrash
          Nov 11 at 10:48




          @chvndb Nice, I like your new implementation.
          – eurotrash
          Nov 11 at 10:48












          up vote
          1
          down vote













          REVISED SOLUTION



          This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:



          SELECT      t.day_of_month
          FROM (
          SELECT ds.day_of_month
          , date_part('day', ds.day_of_month) AS day
          , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
          FROM (
          SELECT generate_series( date '1900-01-01',
          date '10000-12-31',
          INTERVAL '1 day')::DATE as day_of_month
          ) AS ds
          ) AS t
          --> REMEMBER to change the day at both places below (eg. 31)
          WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)


          Resulting output:
          Please ensure you change the day on BOTH the RED numbers.
          Performance Output



          The output data:



          Data Output






          share|improve this answer























          • This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
            – chvndb
            Nov 10 at 14:54










          • Revised the solution.
            – Joseph Lee
            Nov 10 at 15:51















          up vote
          1
          down vote













          REVISED SOLUTION



          This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:



          SELECT      t.day_of_month
          FROM (
          SELECT ds.day_of_month
          , date_part('day', ds.day_of_month) AS day
          , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
          FROM (
          SELECT generate_series( date '1900-01-01',
          date '10000-12-31',
          INTERVAL '1 day')::DATE as day_of_month
          ) AS ds
          ) AS t
          --> REMEMBER to change the day at both places below (eg. 31)
          WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)


          Resulting output:
          Please ensure you change the day on BOTH the RED numbers.
          Performance Output



          The output data:



          Data Output






          share|improve this answer























          • This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
            – chvndb
            Nov 10 at 14:54










          • Revised the solution.
            – Joseph Lee
            Nov 10 at 15:51













          up vote
          1
          down vote










          up vote
          1
          down vote









          REVISED SOLUTION



          This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:



          SELECT      t.day_of_month
          FROM (
          SELECT ds.day_of_month
          , date_part('day', ds.day_of_month) AS day
          , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
          FROM (
          SELECT generate_series( date '1900-01-01',
          date '10000-12-31',
          INTERVAL '1 day')::DATE as day_of_month
          ) AS ds
          ) AS t
          --> REMEMBER to change the day at both places below (eg. 31)
          WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)


          Resulting output:
          Please ensure you change the day on BOTH the RED numbers.
          Performance Output



          The output data:



          Data Output






          share|improve this answer














          REVISED SOLUTION



          This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:



          SELECT      t.day_of_month
          FROM (
          SELECT ds.day_of_month
          , date_part('day', ds.day_of_month) AS day
          , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom
          FROM (
          SELECT generate_series( date '1900-01-01',
          date '10000-12-31',
          INTERVAL '1 day')::DATE as day_of_month
          ) AS ds
          ) AS t
          --> REMEMBER to change the day at both places below (eg. 31)
          WHERE t.day = 31 OR (t.day = t.eom AND t.day < 31)


          Resulting output:
          Please ensure you change the day on BOTH the RED numbers.
          Performance Output



          The output data:



          Data Output







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 10 at 15:51

























          answered Nov 10 at 14:15









          Joseph Lee

          2,0591223




          2,0591223












          • This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
            – chvndb
            Nov 10 at 14:54










          • Revised the solution.
            – Joseph Lee
            Nov 10 at 15:51


















          • This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
            – chvndb
            Nov 10 at 14:54










          • Revised the solution.
            – Joseph Lee
            Nov 10 at 15:51
















          This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
          – chvndb
          Nov 10 at 14:54




          This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
          – chvndb
          Nov 10 at 14:54












          Revised the solution.
          – Joseph Lee
          Nov 10 at 15:51




          Revised the solution.
          – Joseph Lee
          Nov 10 at 15:51










          up vote
          1
          down vote













          You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.



          SELECT 
          (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
          FROM
          generate_series(
          DATE '2018-01-31', DATE '2018-05-31',
          interval '1 MONTH'
          ) AS dt


          Demo



          Test



          knayak=# select generate_series(date '2018-01-31',
          knayak(# date '2018-05-31',
          knayak(# interval '1 month')::date
          knayak-# as frequency;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-28
          2018-04-28
          2018-05-28
          (5 rows)

          Time: 0.303 ms
          knayak=#
          knayak=#
          knayak=# SELECT
          knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
          knayak-# FROM
          knayak-# generate_series(
          knayak(# DATE '2018-01-31', DATE '2018-05-31',
          knayak(# interval '1 MONTH'
          knayak(# ) AS dt
          knayak-# ;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-31
          2018-04-30
          2018-05-31
          (5 rows)

          Time: 0.425 ms





          share|improve this answer























          • The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
            – eurotrash
            Nov 10 at 18:55










          • @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
            – Kaushik Nayak
            Nov 10 at 19:05















          up vote
          1
          down vote













          You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.



          SELECT 
          (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
          FROM
          generate_series(
          DATE '2018-01-31', DATE '2018-05-31',
          interval '1 MONTH'
          ) AS dt


          Demo



          Test



          knayak=# select generate_series(date '2018-01-31',
          knayak(# date '2018-05-31',
          knayak(# interval '1 month')::date
          knayak-# as frequency;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-28
          2018-04-28
          2018-05-28
          (5 rows)

          Time: 0.303 ms
          knayak=#
          knayak=#
          knayak=# SELECT
          knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
          knayak-# FROM
          knayak-# generate_series(
          knayak(# DATE '2018-01-31', DATE '2018-05-31',
          knayak(# interval '1 MONTH'
          knayak(# ) AS dt
          knayak-# ;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-31
          2018-04-30
          2018-05-31
          (5 rows)

          Time: 0.425 ms





          share|improve this answer























          • The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
            – eurotrash
            Nov 10 at 18:55










          • @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
            – Kaushik Nayak
            Nov 10 at 19:05













          up vote
          1
          down vote










          up vote
          1
          down vote









          You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.



          SELECT 
          (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
          FROM
          generate_series(
          DATE '2018-01-31', DATE '2018-05-31',
          interval '1 MONTH'
          ) AS dt


          Demo



          Test



          knayak=# select generate_series(date '2018-01-31',
          knayak(# date '2018-05-31',
          knayak(# interval '1 month')::date
          knayak-# as frequency;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-28
          2018-04-28
          2018-05-28
          (5 rows)

          Time: 0.303 ms
          knayak=#
          knayak=#
          knayak=# SELECT
          knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
          knayak-# FROM
          knayak-# generate_series(
          knayak(# DATE '2018-01-31', DATE '2018-05-31',
          knayak(# interval '1 MONTH'
          knayak(# ) AS dt
          knayak-# ;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-31
          2018-04-30
          2018-05-31
          (5 rows)

          Time: 0.425 ms





          share|improve this answer














          You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.



          SELECT 
          (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency
          FROM
          generate_series(
          DATE '2018-01-31', DATE '2018-05-31',
          interval '1 MONTH'
          ) AS dt


          Demo



          Test



          knayak=# select generate_series(date '2018-01-31',
          knayak(# date '2018-05-31',
          knayak(# interval '1 month')::date
          knayak-# as frequency;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-28
          2018-04-28
          2018-05-28
          (5 rows)

          Time: 0.303 ms
          knayak=#
          knayak=#
          knayak=# SELECT
          knayak-# (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency
          knayak-# FROM
          knayak-# generate_series(
          knayak(# DATE '2018-01-31', DATE '2018-05-31',
          knayak(# interval '1 MONTH'
          knayak(# ) AS dt
          knayak-# ;
          frequency
          ------------
          2018-01-31
          2018-02-28
          2018-03-31
          2018-04-30
          2018-05-31
          (5 rows)

          Time: 0.425 ms






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 10 at 18:50

























          answered Nov 10 at 18:44









          Kaushik Nayak

          15.7k31128




          15.7k31128












          • The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
            – eurotrash
            Nov 10 at 18:55










          • @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
            – Kaushik Nayak
            Nov 10 at 19:05


















          • The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
            – eurotrash
            Nov 10 at 18:55










          • @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
            – Kaushik Nayak
            Nov 10 at 19:05
















          The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
          – eurotrash
          Nov 10 at 18:55




          The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
          – eurotrash
          Nov 10 at 18:55












          @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
          – Kaushik Nayak
          Nov 10 at 19:05




          @eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
          – Kaushik Nayak
          Nov 10 at 19:05


















           

          draft saved


          draft discarded



















































           


          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53239669%2fpostgresql-generate-date-series-performance%23new-answer', 'question_page');
          }
          );

          Post as a guest




















































































          Popular posts from this blog

          Florida Star v. B. J. F.

          Danny Elfman

          Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues