Calculate “position in run” in SQL
I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
hive hiveql
add a comment |
I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
hive hiveql
I'm pretty sure using a variable for representing the previous value, and another variable for keeping track of its run position would be the ideal solution for generating such an output. Honestly though, I'd modify the schema to calculate this kind of stuff when it's inserted/update/delete to help speed up selects.
– Ultimater
Nov 15 '18 at 3:47
add a comment |
I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
hive hiveql
I have a table of consecutive ids (integers, 1 ... n), and values (integers), like this:
Input Table:
id value
-- -----
1 1
2 1
3 2
4 3
5 1
6 1
7 1
Going down the table i.e. in order of increasing id, I want to count how many times in a row the same value has been seen consecutively, i.e. the position in a run:
Output Table:
id value position in run
-- ----- ---------------
1 1 1
2 1 2
3 2 1
4 3 1
5 1 1
6 1 2
7 1 3
Any ideas? I've searched for a combination of windowing functions including lead and lag, but can't come up with it. Note that the same value can appear in the value column as part of different runs, so partitioning by value may not help solve this. I'm on Hive 1.2.
hive hiveql
hive hiveql
asked Nov 15 '18 at 3:28
batteybattey
349212
349212
I'm pretty sure using a variable for representing the previous value, and another variable for keeping track of its run position would be the ideal solution for generating such an output. Honestly though, I'd modify the schema to calculate this kind of stuff when it's inserted/update/delete to help speed up selects.
– Ultimater
Nov 15 '18 at 3:47
add a comment |
I'm pretty sure using a variable for representing the previous value, and another variable for keeping track of its run position would be the ideal solution for generating such an output. Honestly though, I'd modify the schema to calculate this kind of stuff when it's inserted/update/delete to help speed up selects.
– Ultimater
Nov 15 '18 at 3:47
I'm pretty sure using a variable for representing the previous value, and another variable for keeping track of its run position would be the ideal solution for generating such an output. Honestly though, I'd modify the schema to calculate this kind of stuff when it's inserted/update/delete to help speed up selects.
– Ultimater
Nov 15 '18 at 3:47
I'm pretty sure using a variable for representing the previous value, and another variable for keeping track of its run position would be the ideal solution for generating such an output. Honestly though, I'd modify the schema to calculate this kind of stuff when it's inserted/update/delete to help speed up selects.
– Ultimater
Nov 15 '18 at 3:47
add a comment |
1 Answer
1
active
oldest
votes
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number
to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53312001%2fcalculate-position-in-run-in-sql%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number
to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
add a comment |
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number
to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
add a comment |
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number
to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
One way is to use a difference of row numbers approach to classify consecutive same values into one group. Then a row number function to get the desired positions in each group.
Query to assign groups (Running this will help you understand how the groups are assigned.)
select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
Final Query using row_number
to get positions in each group assigned with the above query.
select id,value,row_number() over(partition by value,rnum_diff order by id) as pos_in_grp
from (select t.*
,row_number() over(order by id) - row_number() over(partition by value order by id) as rnum_diff
from tbl t
) t
answered Nov 16 '18 at 16:04
Vamsi PrabhalaVamsi Prabhala
41.5k42039
41.5k42039
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
add a comment |
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
Excellent!!.. this is somewhat complex and you solved it.
– stack0114106
Nov 20 '18 at 21:16
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
hi @Vamsi.. could you pls give a try at stackoverflow.com/questions/53752311/…
– stack0114106
Dec 13 '18 at 17:39
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
@stack0114106 .. been thinking about this..couldn't find a way to solve this yet :) possible with recursive cte in SQL..however, it is not supported in Spark SQL and Hive.
– Vamsi Prabhala
Dec 13 '18 at 18:47
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
thanks for your reply..whatever way you are solving..fine..pls let me know once you complete
– stack0114106
Dec 13 '18 at 19:03
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53312001%2fcalculate-position-in-run-in-sql%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I'm pretty sure using a variable for representing the previous value, and another variable for keeping track of its run position would be the ideal solution for generating such an output. Honestly though, I'd modify the schema to calculate this kind of stuff when it's inserted/update/delete to help speed up selects.
– Ultimater
Nov 15 '18 at 3:47