Postgresql generate date series (performance)

up vote
1
down vote

favorite

Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.

I have many different frequencies (provided by the user) that need to be calculated between a given start and end date. The start date can be any date and thus any day of the month. This generates issues when having frequencies such as monthly combined with a start date of 2018-01-31 or 2018-01-30 as shown in the output below.

I created a solution and wanted to post this here for others to use as I could not find any other solution.

However, after some tests I have seen that my solution has a different performance compared to the built-in generate_series when used on (absurdly) large date ranges. Does anyone have an insight as to how this can be improved?

TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.

Built-in Output

select generate_series(date '2018-01-31', 

                       date '2018-05-31', 

                       interval '1 month')::date

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.

Expected Output

As a result of this issue I created a custom function:

create or replace function generate_date_series(

  startsOn date, 

  endsOn date, 

  frequency interval)

returns setof date as $$

declare

  intervalOn date := startsOn;

  count int := 1;

begin

  while intervalOn <= endsOn loop

    return next intervalOn;

    intervalOn := startsOn + (count * frequency);

    count := count + 1;

  end loop;

  return;

end;

$$ language plpgsql immutable;



select generate_date_series(date '2018-01-31', 

                            date '2018-05-31', 

                            interval '1 month')

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

Performance comparison

No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:

select generate_series(date '1900-01-01', 

                       date '10000-5-31', 

                       interval '1 month')::date 

as frequency;

while the custom function generate_date_series has a performance of 120ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

Question

In reality, such ranges will never occur and thus it is a non-issue. For most queries the custom generate_date_series will attain the same performance. Although, I do wonder what causes the difference.

Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?

Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?

Improved implementation without loops

(derived from the answer of @eurotrash)

create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)

returns setof date as $$

select (startsOn + (frequency * count))::date

from (

  select (row_number() over ()) - 1 as count

  from generate_series(startsOn, endsOn, frequency)

) series

$$ language sql immutable;

with the improved implementation, the generate_date_series function has a performance of 45ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.

edited Nov 11 at 12:14

asked Nov 10 at 13:55

chvndb

14519

add a comment |

up vote
1
down vote

favorite

Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.

I created a solution and wanted to post this here for others to use as I could not find any other solution.

TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.

Built-in Output

select generate_series(date '2018-01-31', 

                       date '2018-05-31', 

                       interval '1 month')::date

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.

Expected Output

As a result of this issue I created a custom function:

create or replace function generate_date_series(

  startsOn date, 

  endsOn date, 

  frequency interval)

returns setof date as $$

declare

  intervalOn date := startsOn;

  count int := 1;

begin

  while intervalOn <= endsOn loop

    return next intervalOn;

    intervalOn := startsOn + (count * frequency);

    count := count + 1;

  end loop;

  return;

end;

$$ language plpgsql immutable;



select generate_date_series(date '2018-01-31', 

                            date '2018-05-31', 

                            interval '1 month')

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

Performance comparison

No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:

select generate_series(date '1900-01-01', 

                       date '10000-5-31', 

                       interval '1 month')::date 

as frequency;

while the custom function generate_date_series has a performance of 120ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

Question

Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?

Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?

Improved implementation without loops

(derived from the answer of @eurotrash)

create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)

returns setof date as $$

select (startsOn + (frequency * count))::date

from (

  select (row_number() over ()) - 1 as count

  from generate_series(startsOn, endsOn, frequency)

) series

$$ language sql immutable;

with the improved implementation, the generate_date_series function has a performance of 45ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.

edited Nov 11 at 12:14

asked Nov 10 at 13:55

chvndb

14519

add a comment |

up vote
1
down vote

favorite

Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.

I created a solution and wanted to post this here for others to use as I could not find any other solution.

TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.

Built-in Output

select generate_series(date '2018-01-31', 

                       date '2018-05-31', 

                       interval '1 month')::date

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.

Expected Output

As a result of this issue I created a custom function:

create or replace function generate_date_series(

  startsOn date, 

  endsOn date, 

  frequency interval)

returns setof date as $$

declare

  intervalOn date := startsOn;

  count int := 1;

begin

  while intervalOn <= endsOn loop

    return next intervalOn;

    intervalOn := startsOn + (count * frequency);

    count := count + 1;

  end loop;

  return;

end;

$$ language plpgsql immutable;



select generate_date_series(date '2018-01-31', 

                            date '2018-05-31', 

                            interval '1 month')

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

Performance comparison

No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:

select generate_series(date '1900-01-01', 

                       date '10000-5-31', 

                       interval '1 month')::date 

as frequency;

while the custom function generate_date_series has a performance of 120ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

Question

Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?

Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?

Improved implementation without loops

(derived from the answer of @eurotrash)

create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)

returns setof date as $$

select (startsOn + (frequency * count))::date

from (

  select (row_number() over ()) - 1 as count

  from generate_series(startsOn, endsOn, frequency)

) series

$$ language sql immutable;

with the improved implementation, the generate_date_series function has a performance of 45ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.

edited Nov 11 at 12:14

asked Nov 10 at 13:55

chvndb

14519

Using postgresql version > 10, I have come to an issue when generating date series using the built-in generate_series function. In essence, it does not accord for the day of the month correctly.

I created a solution and wanted to post this here for others to use as I could not find any other solution.

TL;DR: if possible avoid loops as they are a performance hit, scroll to bottom for improved implementation.

Built-in Output

select generate_series(date '2018-01-31', 

                       date '2018-05-31', 

                       interval '1 month')::date

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

As can be seen from the output, the day of the month is not respected and truncated to the minimum day encountered along the way, in this case: 28 due to the month of februari.

Expected Output

As a result of this issue I created a custom function:

create or replace function generate_date_series(

  startsOn date, 

  endsOn date, 

  frequency interval)

returns setof date as $$

declare

  intervalOn date := startsOn;

  count int := 1;

begin

  while intervalOn <= endsOn loop

    return next intervalOn;

    intervalOn := startsOn + (count * frequency);

    count := count + 1;

  end loop;

  return;

end;

$$ language plpgsql immutable;



select generate_date_series(date '2018-01-31', 

                            date '2018-05-31', 

                            interval '1 month')

as frequency;

generates:

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

Performance comparison

No matter what date range is provided, the built-in generate_series has a performance of 2ms on average for:

select generate_series(date '1900-01-01', 

                       date '10000-5-31', 

                       interval '1 month')::date 

as frequency;

while the custom function generate_date_series has a performance of 120ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

Question

Is there a reason why the built-in function is able to attain a constant performance of 2ms on average no matter what range is provided?

Is there a better way to implement generate_date_series that performs as well as the built-in generate_series?

Improved implementation without loops

(derived from the answer of @eurotrash)

create or replace function generate_date_series(startsOn date, endsOn date, frequency interval)

returns setof date as $$

select (startsOn + (frequency * count))::date

from (

  select (row_number() over ()) - 1 as count

  from generate_series(startsOn, endsOn, frequency)

) series

$$ language sql immutable;

with the improved implementation, the generate_date_series function has a performance of 45ms on average for:

select generate_date_series(date '1900-01-01', 

                            date '10000-5-31', 

                            interval '1 month')::date 

as frequency;

The implementation provided by @eurotrash gives me 80ms on average, which I assume is due to calling the generate_series function twice.

sql postgresql dateinterval generate-series

edited Nov 11 at 12:14

asked Nov 10 at 13:55

chvndb

14519

edited Nov 11 at 12:14

asked Nov 10 at 13:55

chvndb

14519

edited Nov 11 at 12:14

asked Nov 10 at 13:55

chvndb

14519

asked Nov 10 at 13:55

chvndb

14519

asked Nov 10 at 13:55

chvndb

14519

add a comment |

3 Answers
3

active

oldest

votes

up vote
3
down vote

accepted

Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.

CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)

        RETURNS SETOF DATE AS

$BODY$

        SELECT (starts_on + (frequency * g))::DATE

        FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;

$BODY$

        LANGUAGE SQL IMMUTABLE;

The concept is basically the same as your plpgsql function but via a single query instead of a loop. The only problem is deciding how many iterations are needed (i.e. the second parameter to generate_series). Sadly I couldn't think of a better way to get the number of intervals required other than calling generate_series for the dates and using the count of that. Of course if you know your intervals will only ever be certain values then it may be possible to optimise; however this version handles any interval values.

On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.

answered Nov 10 at 19:06

eurotrash

2,3751625

Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56

@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48

add a comment |

up vote
1
down vote

REVISED SOLUTION

This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:

SELECT      t.day_of_month

FROM        (

                SELECT  ds.day_of_month

                        , date_part('day', ds.day_of_month) AS day

                        , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom

                FROM    (

                            SELECT generate_series( date '1900-01-01', 

                                                    date '10000-12-31', 

                                                    INTERVAL '1 day')::DATE as day_of_month

                        ) AS ds

            ) AS t

            --> REMEMBER to change the day at both places below (eg. 31)

WHERE       t.day = 31 OR (t.day = t.eom AND t.day < 31)

Resulting output:
Please ensure you change the day on BOTH the RED numbers.
Performance Output

The output data:

Data Output

edited Nov 10 at 15:51

answered Nov 10 at 14:15

Joseph Lee

2,0591223

This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54

Revised the solution.
– Joseph Lee
Nov 10 at 15:51

add a comment |

up vote
1
down vote

You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.

SELECT 

  (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency 

FROM 

  generate_series(

    DATE '2018-01-31', DATE '2018-05-31', 

    interval '1 MONTH'

  ) AS dt

Demo

Test

knayak=# select generate_series(date '2018-01-31',

knayak(#                        date '2018-05-31',

knayak(#                        interval '1 month')::date

knayak-# as frequency;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

(5 rows)



Time: 0.303 ms

knayak=#

knayak=#

knayak=# SELECT

knayak-#   (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency

knayak-# FROM

knayak-#   generate_series(

knayak(#     DATE '2018-01-31', DATE '2018-05-31',

knayak(#     interval '1 MONTH'

knayak(#   ) AS dt

knayak-# ;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

(5 rows)



Time: 0.425 ms

edited Nov 10 at 18:50

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55

@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53239669%2fpostgresql-generate-date-series-performance%23new-answer', 'question_page');
}
);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
3
down vote

accepted

Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.

CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)

        RETURNS SETOF DATE AS

$BODY$

        SELECT (starts_on + (frequency * g))::DATE

        FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;

$BODY$

        LANGUAGE SQL IMMUTABLE;

On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.

answered Nov 10 at 19:06

eurotrash

2,3751625

Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56

@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48

add a comment |

up vote
3
down vote

accepted

Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.

CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)

        RETURNS SETOF DATE AS

$BODY$

        SELECT (starts_on + (frequency * g))::DATE

        FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;

$BODY$

        LANGUAGE SQL IMMUTABLE;

On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.

answered Nov 10 at 19:06

eurotrash

2,3751625

Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56

@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48

add a comment |

up vote
3
down vote

accepted

Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.

CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)

        RETURNS SETOF DATE AS

$BODY$

        SELECT (starts_on + (frequency * g))::DATE

        FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;

$BODY$

        LANGUAGE SQL IMMUTABLE;

On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.

answered Nov 10 at 19:06

eurotrash

2,3751625

Why your function is slow: you use variables and (more importantly) a loop. Loops are slow. Variables also mean reads from and writes to those variables.

CREATE OR REPLACE FUNCTION generate_date_series_2(starts_on DATE, ends_on DATE, frequency INTERVAL)

        RETURNS SETOF DATE AS

$BODY$

        SELECT (starts_on + (frequency * g))::DATE

        FROM generate_series(0, (SELECT COUNT(*)::INTEGER - 1 FROM generate_series(starts_on, ends_on, frequency))) g;

$BODY$

        LANGUAGE SQL IMMUTABLE;

On my system it's about 50% slower than a pure generate_series, and about 400% faster than your plpgsql version.

answered Nov 10 at 19:06

eurotrash

2,3751625

answered Nov 10 at 19:06

eurotrash

2,3751625

answered Nov 10 at 19:06

eurotrash

2,3751625

answered Nov 10 at 19:06

eurotrash

2,3751625

Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56

@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48

add a comment |

Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56

@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48

Thanks for the input. I like your implementation as it keeps my solution to get the correct dates without any trickery and it removes the loop completely. However, the double call to generate_series did bother me, so I revised your implementation to remove one call. I extended my question to include this implementation.
– chvndb
Nov 11 at 9:56

@chvndb Nice, I like your new implementation.
– eurotrash
Nov 11 at 10:48

add a comment |

up vote
1
down vote

REVISED SOLUTION

This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:

SELECT      t.day_of_month

FROM        (

                SELECT  ds.day_of_month

                        , date_part('day', ds.day_of_month) AS day

                        , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom

                FROM    (

                            SELECT generate_series( date '1900-01-01', 

                                                    date '10000-12-31', 

                                                    INTERVAL '1 day')::DATE as day_of_month

                        ) AS ds

            ) AS t

            --> REMEMBER to change the day at both places below (eg. 31)

WHERE       t.day = 31 OR (t.day = t.eom AND t.day < 31)

Resulting output:
Please ensure you change the day on BOTH the RED numbers.
Performance Output

The output data:

Data Output

edited Nov 10 at 15:51

answered Nov 10 at 14:15

Joseph Lee

2,0591223

This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54

Revised the solution.
– Joseph Lee
Nov 10 at 15:51

add a comment |

up vote
1
down vote

REVISED SOLUTION

This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:

SELECT      t.day_of_month

FROM        (

                SELECT  ds.day_of_month

                        , date_part('day', ds.day_of_month) AS day

                        , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom

                FROM    (

                            SELECT generate_series( date '1900-01-01', 

                                                    date '10000-12-31', 

                                                    INTERVAL '1 day')::DATE as day_of_month

                        ) AS ds

            ) AS t

            --> REMEMBER to change the day at both places below (eg. 31)

WHERE       t.day = 31 OR (t.day = t.eom AND t.day < 31)

Resulting output:
Please ensure you change the day on BOTH the RED numbers.
Performance Output

The output data:

Data Output

edited Nov 10 at 15:51

answered Nov 10 at 14:15

Joseph Lee

2,0591223

This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54

Revised the solution.
– Joseph Lee
Nov 10 at 15:51

add a comment |

up vote
1
down vote

REVISED SOLUTION

This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:

SELECT      t.day_of_month

FROM        (

                SELECT  ds.day_of_month

                        , date_part('day', ds.day_of_month) AS day

                        , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom

                FROM    (

                            SELECT generate_series( date '1900-01-01', 

                                                    date '10000-12-31', 

                                                    INTERVAL '1 day')::DATE as day_of_month

                        ) AS ds

            ) AS t

            --> REMEMBER to change the day at both places below (eg. 31)

WHERE       t.day = 31 OR (t.day = t.eom AND t.day < 31)

Resulting output:
Please ensure you change the day on BOTH the RED numbers.
Performance Output

The output data:

Data Output

edited Nov 10 at 15:51

answered Nov 10 at 14:15

Joseph Lee

2,0591223

REVISED SOLUTION

This gives me 97,212 rows in under 7 seconds (approx 0.7ms per row) and also supports leap-years where February have 29 days:

SELECT      t.day_of_month

FROM        (

                SELECT  ds.day_of_month

                        , date_part('day', ds.day_of_month) AS day

                        , date_part('day', ((day_of_month - date_part('day', ds.day_of_month)::INT + 1) + INTERVAL '1' MONTH) - INTERVAL '1' DAY) AS eom

                FROM    (

                            SELECT generate_series( date '1900-01-01', 

                                                    date '10000-12-31', 

                                                    INTERVAL '1 day')::DATE as day_of_month

                        ) AS ds

            ) AS t

            --> REMEMBER to change the day at both places below (eg. 31)

WHERE       t.day = 31 OR (t.day = t.eom AND t.day < 31)

Resulting output:
Please ensure you change the day on BOTH the RED numbers.
Performance Output

The output data:

Data Output

edited Nov 10 at 15:51

answered Nov 10 at 14:15

Joseph Lee

2,0591223

edited Nov 10 at 15:51

answered Nov 10 at 14:15

Joseph Lee

2,0591223

answered Nov 10 at 14:15

Joseph Lee

2,0591223

answered Nov 10 at 14:15

Joseph Lee

2,0591223

This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54

Revised the solution.
– Joseph Lee
Nov 10 at 15:51

add a comment |

This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54

Revised the solution.
– Joseph Lee
Nov 10 at 15:51

This does not generate the expected output. The problem with this is that it only works for the end of the month, but what if the interval starts on the 30th.
– chvndb
Nov 10 at 14:54

Revised the solution.
– Joseph Lee
Nov 10 at 15:51

add a comment |

up vote
1
down vote

You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.

SELECT 

  (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency 

FROM 

  generate_series(

    DATE '2018-01-31', DATE '2018-05-31', 

    interval '1 MONTH'

  ) AS dt

Demo

Test

knayak=# select generate_series(date '2018-01-31',

knayak(#                        date '2018-05-31',

knayak(#                        interval '1 month')::date

knayak-# as frequency;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

(5 rows)



Time: 0.303 ms

knayak=#

knayak=#

knayak=# SELECT

knayak-#   (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency

knayak-# FROM

knayak-#   generate_series(

knayak(#     DATE '2018-01-31', DATE '2018-05-31',

knayak(#     interval '1 MONTH'

knayak(#   ) AS dt

knayak-# ;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

(5 rows)



Time: 0.425 ms

edited Nov 10 at 18:50

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55

@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05

add a comment |

up vote
1
down vote

You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.

SELECT 

  (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency 

FROM 

  generate_series(

    DATE '2018-01-31', DATE '2018-05-31', 

    interval '1 MONTH'

  ) AS dt

Demo

Test

knayak=# select generate_series(date '2018-01-31',

knayak(#                        date '2018-05-31',

knayak(#                        interval '1 month')::date

knayak-# as frequency;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

(5 rows)



Time: 0.303 ms

knayak=#

knayak=#

knayak=# SELECT

knayak-#   (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency

knayak-# FROM

knayak-#   generate_series(

knayak(#     DATE '2018-01-31', DATE '2018-05-31',

knayak(#     interval '1 MONTH'

knayak(#   ) AS dt

knayak-# ;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

(5 rows)



Time: 0.425 ms

edited Nov 10 at 18:50

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55

@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05

add a comment |

up vote
1
down vote

You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.

SELECT 

  (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency 

FROM 

  generate_series(

    DATE '2018-01-31', DATE '2018-05-31', 

    interval '1 MONTH'

  ) AS dt

Demo

Test

knayak=# select generate_series(date '2018-01-31',

knayak(#                        date '2018-05-31',

knayak(#                        interval '1 month')::date

knayak-# as frequency;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

(5 rows)



Time: 0.303 ms

knayak=#

knayak=#

knayak=# SELECT

knayak-#   (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency

knayak-# FROM

knayak-#   generate_series(

knayak(#     DATE '2018-01-31', DATE '2018-05-31',

knayak(#     interval '1 MONTH'

knayak(#   ) AS dt

knayak-# ;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

(5 rows)



Time: 0.425 ms

edited Nov 10 at 18:50

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

You may use date_trunc and add a month to the output ofgenerate_series, the performance should be almost similar.

SELECT 

  (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day') ::DATE AS frequency 

FROM 

  generate_series(

    DATE '2018-01-31', DATE '2018-05-31', 

    interval '1 MONTH'

  ) AS dt

Demo

Test

knayak=# select generate_series(date '2018-01-31',

knayak(#                        date '2018-05-31',

knayak(#                        interval '1 month')::date

knayak-# as frequency;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-28

 2018-04-28

 2018-05-28

(5 rows)



Time: 0.303 ms

knayak=#

knayak=#

knayak=# SELECT

knayak-#   (date_trunc('month', dt) + INTERVAL '1 MONTH - 1 day' ):: DATE AS frequency

knayak-# FROM

knayak-#   generate_series(

knayak(#     DATE '2018-01-31', DATE '2018-05-31',

knayak(#     interval '1 MONTH'

knayak(#   ) AS dt

knayak-# ;

 frequency

------------

 2018-01-31

 2018-02-28

 2018-03-31

 2018-04-30

 2018-05-31

(5 rows)



Time: 0.425 ms

edited Nov 10 at 18:50

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

edited Nov 10 at 18:50

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

answered Nov 10 at 18:44

Kaushik Nayak

15.7k31128

The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55

@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05

add a comment |

The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55

@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05

The performance is very good, but I assume OP wants his function to handle any interval, not just 1 month.
– eurotrash
Nov 10 at 18:55

@eurotrash : That may be true. But, this method could be cleverly converted to a generalised function with more or less the same performance ( without OP's while loops, of course!)
– Kaushik Nayak
Nov 10 at 19:05

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Name

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ndtyjky