Compute count of unique words using ES6 sets
up vote
6
down vote
favorite
I have written a function which counts how many unique words are included in a given text.
The source-code:
const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");
let validWords = parts.filter((word) => {
return /^w/.test(word);
});
let uniqueWords = new Set(validWords);
return uniqueWords.size;
}
let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";
console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);
I think it has become quite neat and short.
Nevertheless: Is there a better way to solve the described task?
Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?
Looking forward to reading your answers.
javascript regex ecmascript-6
add a comment |
up vote
6
down vote
favorite
I have written a function which counts how many unique words are included in a given text.
The source-code:
const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");
let validWords = parts.filter((word) => {
return /^w/.test(word);
});
let uniqueWords = new Set(validWords);
return uniqueWords.size;
}
let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";
console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);
I think it has become quite neat and short.
Nevertheless: Is there a better way to solve the described task?
Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?
Looking forward to reading your answers.
javascript regex ecmascript-6
add a comment |
up vote
6
down vote
favorite
up vote
6
down vote
favorite
I have written a function which counts how many unique words are included in a given text.
The source-code:
const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");
let validWords = parts.filter((word) => {
return /^w/.test(word);
});
let uniqueWords = new Set(validWords);
return uniqueWords.size;
}
let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";
console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);
I think it has become quite neat and short.
Nevertheless: Is there a better way to solve the described task?
Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?
Looking forward to reading your answers.
javascript regex ecmascript-6
I have written a function which counts how many unique words are included in a given text.
The source-code:
const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");
let validWords = parts.filter((word) => {
return /^w/.test(word);
});
let uniqueWords = new Set(validWords);
return uniqueWords.size;
}
let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";
console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);
I think it has become quite neat and short.
Nevertheless: Is there a better way to solve the described task?
Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?
Looking forward to reading your answers.
const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");
let validWords = parts.filter((word) => {
return /^w/.test(word);
});
let uniqueWords = new Set(validWords);
return uniqueWords.size;
}
let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";
console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);
const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");
let validWords = parts.filter((word) => {
return /^w/.test(word);
});
let uniqueWords = new Set(validWords);
return uniqueWords.size;
}
let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";
console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);
javascript regex ecmascript-6
javascript regex ecmascript-6
asked Nov 11 at 14:23
michael.zech
1,6941433
1,6941433
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
up vote
6
down vote
accepted
Two problems
Your code has two problems.
- It does not remove punctuation from the words resulting in the same words not matching. Eg
text1
has 12 unique words not 13. You countdolor
anddolor.
as different words. - You are ignoring capitalization. You would count
Dolor
anddolor
as different words rather than the same.
String.match
Update I was not paying attention on first posting. There is no /*RegExp.match*/
The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace
as shown.
thus the ideal solution becomes...
const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;
String.replace
You can use a string replace to iterate over RegExp
matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or or RegExp.match
,RegExp.exec
which have an ugly interface and are slower than String.replace
.
Converting the text to lowercase using String.toLowerCase
solves the capitalization problem
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
As a n00b, what's wrong with the straightforwardtext.toLowerCase().match(/w+/g)
?
– JollyJoker
Nov 12 at 9:52
1
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
1
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
add a comment |
up vote
6
down vote
I like the simplicity of your function, it's pretty easy to understand the implementation.
Couple things I would consider:
A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:
const getUniqueWords = (string) => {
...
return uniqueWords;
}
const computeCountUniqueWords = (string) => {
return getUniqueWords(string).length;
}
Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length
.
Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.
But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.
add a comment |
up vote
2
down vote
Here's the match
solution that Blindman67 wanted to avoid for reasons still unclear to me.
If you find it hard to read you could split out something like
const words = s => s.match(/w+/g)
or
const lowerWords = s => s.toLowerCase().match(/w+/g)
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207429%2fcompute-count-of-unique-words-using-es6-sets%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
6
down vote
accepted
Two problems
Your code has two problems.
- It does not remove punctuation from the words resulting in the same words not matching. Eg
text1
has 12 unique words not 13. You countdolor
anddolor.
as different words. - You are ignoring capitalization. You would count
Dolor
anddolor
as different words rather than the same.
String.match
Update I was not paying attention on first posting. There is no /*RegExp.match*/
The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace
as shown.
thus the ideal solution becomes...
const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;
String.replace
You can use a string replace to iterate over RegExp
matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or or RegExp.match
,RegExp.exec
which have an ugly interface and are slower than String.replace
.
Converting the text to lowercase using String.toLowerCase
solves the capitalization problem
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
As a n00b, what's wrong with the straightforwardtext.toLowerCase().match(/w+/g)
?
– JollyJoker
Nov 12 at 9:52
1
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
1
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
add a comment |
up vote
6
down vote
accepted
Two problems
Your code has two problems.
- It does not remove punctuation from the words resulting in the same words not matching. Eg
text1
has 12 unique words not 13. You countdolor
anddolor.
as different words. - You are ignoring capitalization. You would count
Dolor
anddolor
as different words rather than the same.
String.match
Update I was not paying attention on first posting. There is no /*RegExp.match*/
The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace
as shown.
thus the ideal solution becomes...
const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;
String.replace
You can use a string replace to iterate over RegExp
matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or or RegExp.match
,RegExp.exec
which have an ugly interface and are slower than String.replace
.
Converting the text to lowercase using String.toLowerCase
solves the capitalization problem
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
As a n00b, what's wrong with the straightforwardtext.toLowerCase().match(/w+/g)
?
– JollyJoker
Nov 12 at 9:52
1
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
1
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
add a comment |
up vote
6
down vote
accepted
up vote
6
down vote
accepted
Two problems
Your code has two problems.
- It does not remove punctuation from the words resulting in the same words not matching. Eg
text1
has 12 unique words not 13. You countdolor
anddolor.
as different words. - You are ignoring capitalization. You would count
Dolor
anddolor
as different words rather than the same.
String.match
Update I was not paying attention on first posting. There is no /*RegExp.match*/
The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace
as shown.
thus the ideal solution becomes...
const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;
String.replace
You can use a string replace to iterate over RegExp
matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or or RegExp.match
,RegExp.exec
which have an ugly interface and are slower than String.replace
.
Converting the text to lowercase using String.toLowerCase
solves the capitalization problem
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
Two problems
Your code has two problems.
- It does not remove punctuation from the words resulting in the same words not matching. Eg
text1
has 12 unique words not 13. You countdolor
anddolor.
as different words. - You are ignoring capitalization. You would count
Dolor
anddolor
as different words rather than the same.
String.match
Update I was not paying attention on first posting. There is no /*RegExp.match*/
The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace
as shown.
thus the ideal solution becomes...
const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;
String.replace
You can use a string replace to iterate over RegExp
matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or or RegExp.match
,RegExp.exec
which have an ugly interface and are slower than String.replace
.
Converting the text to lowercase using String.toLowerCase
solves the capitalization problem
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
const countUniqueWords = text => {
const words = new Set();
text.toLowerCase().replace(/w+/g, word => words.add(word));
return words.size;
}
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
edited Nov 12 at 15:23
answered Nov 11 at 16:59
Blindman67
6,7641521
6,7641521
As a n00b, what's wrong with the straightforwardtext.toLowerCase().match(/w+/g)
?
– JollyJoker
Nov 12 at 9:52
1
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
1
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
add a comment |
As a n00b, what's wrong with the straightforwardtext.toLowerCase().match(/w+/g)
?
– JollyJoker
Nov 12 at 9:52
1
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
1
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
As a n00b, what's wrong with the straightforward
text.toLowerCase().match(/w+/g)
?– JollyJoker
Nov 12 at 9:52
As a n00b, what's wrong with the straightforward
text.toLowerCase().match(/w+/g)
?– JollyJoker
Nov 12 at 9:52
1
1
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
@JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
– Blindman67
Nov 12 at 14:39
1
1
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
@JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
– Blindman67
Nov 12 at 15:24
add a comment |
up vote
6
down vote
I like the simplicity of your function, it's pretty easy to understand the implementation.
Couple things I would consider:
A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:
const getUniqueWords = (string) => {
...
return uniqueWords;
}
const computeCountUniqueWords = (string) => {
return getUniqueWords(string).length;
}
Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length
.
Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.
But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.
add a comment |
up vote
6
down vote
I like the simplicity of your function, it's pretty easy to understand the implementation.
Couple things I would consider:
A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:
const getUniqueWords = (string) => {
...
return uniqueWords;
}
const computeCountUniqueWords = (string) => {
return getUniqueWords(string).length;
}
Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length
.
Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.
But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.
add a comment |
up vote
6
down vote
up vote
6
down vote
I like the simplicity of your function, it's pretty easy to understand the implementation.
Couple things I would consider:
A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:
const getUniqueWords = (string) => {
...
return uniqueWords;
}
const computeCountUniqueWords = (string) => {
return getUniqueWords(string).length;
}
Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length
.
Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.
But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.
I like the simplicity of your function, it's pretty easy to understand the implementation.
Couple things I would consider:
A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:
const getUniqueWords = (string) => {
...
return uniqueWords;
}
const computeCountUniqueWords = (string) => {
return getUniqueWords(string).length;
}
Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length
.
Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.
But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.
answered Nov 11 at 16:59
Frank
30614
30614
add a comment |
add a comment |
up vote
2
down vote
Here's the match
solution that Blindman67 wanted to avoid for reasons still unclear to me.
If you find it hard to read you could split out something like
const words = s => s.match(/w+/g)
or
const lowerWords = s => s.toLowerCase().match(/w+/g)
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
add a comment |
up vote
2
down vote
Here's the match
solution that Blindman67 wanted to avoid for reasons still unclear to me.
If you find it hard to read you could split out something like
const words = s => s.match(/w+/g)
or
const lowerWords = s => s.toLowerCase().match(/w+/g)
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
add a comment |
up vote
2
down vote
up vote
2
down vote
Here's the match
solution that Blindman67 wanted to avoid for reasons still unclear to me.
If you find it hard to read you could split out something like
const words = s => s.match(/w+/g)
or
const lowerWords = s => s.toLowerCase().match(/w+/g)
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
Here's the match
solution that Blindman67 wanted to avoid for reasons still unclear to me.
If you find it hard to read you could split out something like
const words = s => s.match(/w+/g)
or
const lowerWords = s => s.toLowerCase().match(/w+/g)
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size
const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
info1.textContent = `A has ${countUniqueWords(a)} unique words`;
info2.textContent = `B has ${countUniqueWords(b)} unique words.`;
<code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
<code id="info1"></code><br>
<code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
<code id="info2"></code>
answered Nov 12 at 11:43
JollyJoker
33116
33116
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207429%2fcompute-count-of-unique-words-using-es6-sets%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown