Get HashMap from RDD












I need to build a single, global HashMap from an RDD[HashMap]. For example, the RDD has type RDD[HashMap[Key, Value]]. I want to combine its contents into one HashMap that I can use to enrich messages in another RDD.

Could anyone please help me with how to do this?

Thanks










    Does the RDD have only one element? Or do you want to merge all the maps into one (and if so, which merge strategy do you want)? Or is it an RDD of tuples (Key -> Value)?

    – Luis Miguel Mejía Suárez
    Nov 13 '18 at 0:48
















scala apache-spark






asked Nov 13 '18 at 0:30









Indira

1 Answer
As the comment says, you need a merge function. If a simple map merge works for you (for example, because the keys are unique across the maps), you can combine everything into a single local map on the driver with rdd.reduce(_ ++ _). If keys can repeat, note that ++ keeps only one value per key, so you would need a different merge function. Then broadcast the result so that it is sent efficiently to each executor only once. Once it is in a broadcast variable, you can read it inside operations on other RDDs, such as enriching messages as you described.

    val broadcast = sparkContext.broadcast(rdd.reduce(_ ++ _))

You can then access the map with broadcast.value.
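For a fuller picture, here is a minimal end-to-end sketch of the reduce, broadcast, and enrich steps. It is only an illustration under assumptions that are not in the question: keys and values are Strings, the messages are (userId, text) pairs, Spark runs in local mode, and the names GlobalMapSketch, mergeMaps, and enrich are made up for this example. The merge policy (a later value replaces an earlier one on a duplicate key) is also just one possible choice.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.immutable.HashMap

object GlobalMapSketch {
  // Hypothetical merge policy: on a duplicate key, the value from `b` replaces the one in `a`.
  // Change the body of the fold if colliding values should be combined instead.
  def mergeMaps(a: HashMap[String, String], b: HashMap[String, String]): HashMap[String, String] =
    b.foldLeft(a) { case (acc, (key, value)) => acc.updated(key, value) }

  // Enrich one (userId, text) message with a name looked up in the merged map.
  // The message shape and the "unknown" fallback are assumptions for illustration.
  def enrich(lookup: Map[String, String], message: (String, String)): (String, String, String) = {
    val (userId, text) = message
    (userId, lookup.getOrElse(userId, "unknown"), text)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("global-map-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the asker's RDD[HashMap[Key, Value]].
      val maps: RDD[HashMap[String, String]] = sc.parallelize(Seq(
        HashMap("u1" -> "Alice"),
        HashMap("u2" -> "Bob")
      ))

      // reduce merges the maps and returns one map to the driver,
      // so the merged result has to fit in driver memory.
      val global = maps.reduce(mergeMaps)

      // Broadcast once; tasks read the map through lookup.value.
      val lookup = sc.broadcast(global)

      val messages = sc.parallelize(Seq(("u1", "hello"), ("u3", "hey")))
      val enriched = messages.map(m => enrich(lookup.value, m))
      enriched.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the merge in a named function rather than an inline `_ ++ _` makes the duplicate-key policy explicit and easy to change. If the combined map is too large for the driver, a join between the two RDDs is probably a better fit than a broadcast.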






  • Thanks. It worked

    – Indira
    Nov 13 '18 at 9:12











answered Nov 13 '18 at 8:26









user1084563
