Get HashMap from RDD
I need to build a single global HashMap from an RDD[HashMap]. For example, the RDD is an RDD[HashMap[Key, Value]]. I want to merge its contents into one HashMap so that I can use it to enrich messages in another RDD.
Could anyone please help me with how to do this?
Thanks
scala apache-spark
Does the RDD have only one element? Or do you want to merge all the maps into one (and if so, which merge strategy do you want)? Or is it an RDD of tuples (Key -> Value)?
– Luis Miguel Mejía Suárez
Nov 13 '18 at 0:48
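The merge-strategy question in this comment matters whenever the same key can appear in more than one map. Here is a minimal sketch in plain Scala (no Spark needed) of two possible strategies. The object and method names (MergeStrategies, lastWins, mergeWith) are made up for illustration and do not come from the question.

```scala
object MergeStrategies {
  // Last-writer-wins: for a repeated key, ++ keeps the value from the right-hand map.
  def lastWins[K, V](a: Map[K, V], b: Map[K, V]): Map[K, V] = a ++ b

  // Resolve clashes explicitly with a caller-supplied combine function.
  def mergeWith[K, V](a: Map[K, V], b: Map[K, V])(combine: (V, V) => V): Map[K, V] =
    b.foldLeft(a) { case (acc, (k, v)) =>
      // If the key already exists, combine old and new values; otherwise insert the new one.
      acc.updated(k, acc.get(k).map(combine(_, v)).getOrElse(v))
    }
}

object MergeStrategiesDemo {
  def main(args: Array[String]): Unit = {
    val left  = Map("a" -> 1, "b" -> 2)
    val right = Map("b" -> 10, "c" -> 3)

    println(MergeStrategies.lastWins(left, right))         // contains a -> 1, b -> 10, c -> 3
    println(MergeStrategies.mergeWith(left, right)(_ + _)) // contains a -> 1, b -> 12, c -> 3
  }
}
```

Either function could be passed to rdd.reduce, for example rdd.reduce((a, b) => MergeStrategies.mergeWith(a, b)(_ + _)). Note that Spark documents reduce as expecting a commutative and associative operator. Last-writer-wins is not commutative when keys clash, so in that case the surviving value may depend on how the data is partitioned.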
asked Nov 13 '18 at 0:30
Indira
1 Answer
As the comment says, you'll need a merge function. Assuming a simple map merge works for you (for example, because the keys are unique across the maps), you can merge everything into a local map on the driver with something as simple as rdd.reduce(_ ++ _). Then you'll want to broadcast it so that it's efficiently sent to each executor once. Once it's in a broadcast variable, you can use it inside operations on other RDDs, such as enriching messages as you described.
val broadcast = sparkContext.broadcast(rdd.reduce(_ ++ _))
You can then read the map with broadcast.value.
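To make the flow concrete, here is a minimal end-to-end sketch, assuming Spark is on the classpath and run in local mode. The sample data, the object name BroadcastLookupSketch, and the "unknown" fallback are all hypothetical. The element type is declared as the immutable Map (which HashMap extends) so that ++ returns the same type the reduce expects.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupSketch {
  // Merges an RDD of maps into one map, broadcasts it, and uses it to enrich a second RDD.
  def run(sc: SparkContext): Seq[String] = {
    // Hypothetical lookup fragments; keys are assumed to be unique across maps.
    val lookupRdd = sc.parallelize(Seq(
      Map("a" -> "alpha"),
      Map("b" -> "beta", "c" -> "gamma")
    ))

    // reduce returns the merged map to the driver. It throws on an empty RDD,
    // and with ++ the right-hand value wins if a key repeats.
    val merged: Map[String, String] = lookupRdd.reduce(_ ++ _)

    // Ship the merged map to each executor once instead of with every task.
    val lookup = sc.broadcast(merged)

    // Hypothetical messages to enrich; "z" has no entry in the lookup map.
    val messages = sc.parallelize(Seq("a", "c", "z"))

    messages
      .map(key => s"$key=${lookup.value.getOrElse(key, "unknown")}")
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try run(sc).foreach(println)
    finally sc.stop()
  }
}
```

This approach only suits a merged map small enough to fit comfortably in driver and executor memory. For a large lookup table, a join between two keyed RDDs is likely a better fit.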
Thanks. It worked
– Indira
Nov 13 '18 at 9:12
answered Nov 13 '18 at 8:26
user1084563