Compute count of unique words using ES6 sets











up vote
6
down vote

favorite












I have written a function which counts how many unique words are included in a given text.



The source-code:






const computeCountUniqueWords = (strToExamine) => {
let parts = strToExamine.split(" ");

let validWords = parts.filter((word) => {
return /^w/.test(word);
});

let uniqueWords = new Set(validWords);

return uniqueWords.size;
}

let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";

console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);





I think it has become quite neat and short.



Nevertheless: Is there a better way to solve the described task?



Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?



Looking forward to reading your answers.










share|improve this question


























    up vote
    6
    down vote

    favorite












    I have written a function which counts how many unique words are included in a given text.



    The source-code:






    const computeCountUniqueWords = (strToExamine) => {
    let parts = strToExamine.split(" ");

    let validWords = parts.filter((word) => {
    return /^w/.test(word);
    });

    let uniqueWords = new Set(validWords);

    return uniqueWords.size;
    }

    let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
    let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";

    console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
    console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);





    I think it has become quite neat and short.



    Nevertheless: Is there a better way to solve the described task?



    Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?



    Looking forward to reading your answers.










    share|improve this question
























      up vote
      6
      down vote

      favorite









      up vote
      6
      down vote

      favorite











      I have written a function which counts how many unique words are included in a given text.



      The source-code:






      const computeCountUniqueWords = (strToExamine) => {
      let parts = strToExamine.split(" ");

      let validWords = parts.filter((word) => {
      return /^w/.test(word);
      });

      let uniqueWords = new Set(validWords);

      return uniqueWords.size;
      }

      let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
      let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";

      console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
      console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);





      I think it has become quite neat and short.



      Nevertheless: Is there a better way to solve the described task?



      Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?



      Looking forward to reading your answers.










      share|improve this question













      I have written a function which counts how many unique words are included in a given text.



      The source-code:






      const computeCountUniqueWords = (strToExamine) => {
      let parts = strToExamine.split(" ");

      let validWords = parts.filter((word) => {
      return /^w/.test(word);
      });

      let uniqueWords = new Set(validWords);

      return uniqueWords.size;
      }

      let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
      let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";

      console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
      console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);





      I think it has become quite neat and short.



      Nevertheless: Is there a better way to solve the described task?



      Moveover: Is my checking with the regular expression sufficient? Or should it be enhanced?



      Looking forward to reading your answers.






      const computeCountUniqueWords = (strToExamine) => {
      let parts = strToExamine.split(" ");

      let validWords = parts.filter((word) => {
      return /^w/.test(word);
      });

      let uniqueWords = new Set(validWords);

      return uniqueWords.size;
      }

      let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
      let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";

      console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
      console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);





      const computeCountUniqueWords = (strToExamine) => {
      let parts = strToExamine.split(" ");

      let validWords = parts.filter((word) => {
      return /^w/.test(word);
      });

      let uniqueWords = new Set(validWords);

      return uniqueWords.size;
      }

      let text1 = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
      let text2 = "Etiam ultricies nisi vel augue. Curabitur ullamcorper";

      console.log(`Text 1 has ${computeCountUniqueWords(text1)} words`);
      console.log(`Text 2 has ${computeCountUniqueWords(text2)} words.`);






      javascript regex ecmascript-6






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 11 at 14:23









      michael.zech

      1,6941433




      1,6941433






















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          6
          down vote



          accepted










          Two problems



          Your code has two problems.




          1. It does not remove punctuation from the words resulting in the same words not matching. Eg text1 has 12 unique words not 13. You count dolor and dolor. as different words.

          2. You are ignoring capitalization. You would count Dolor and dolor as different words rather than the same.


          String.match



          Update I was not paying attention on first posting. There is no /*RegExp.match*/



          The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace as shown.



          thus the ideal solution becomes...



          const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;


          String.replace



          You can use a string replace to iterate over RegExp matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or RegExp.match, or RegExp.exec which have an ugly interface and are slower than String.replace.



          Converting the text to lowercase using String.toLowerCase solves the capitalization problem






          const countUniqueWords = text => {
          const words = new Set();
          text.toLowerCase().replace(/w+/g, word => words.add(word));
          return words.size;
          }




          const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
          const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
          info1.textContent = `A has ${countUniqueWords(a)} unique words`;
          info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

           
          <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
          <code id="info1"></code><br>
          <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
          <code id="info2"></code>








          share|improve this answer























          • As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
            – JollyJoker
            Nov 12 at 9:52






          • 1




            @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
            – Blindman67
            Nov 12 at 14:39






          • 1




            @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
            – Blindman67
            Nov 12 at 15:24


















          up vote
          6
          down vote













          I like the simplicity of your function, it's pretty easy to understand the implementation.



          Couple things I would consider:



          A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:



          const getUniqueWords = (string) => {
          ...
          return uniqueWords;
          }

          const computeCountUniqueWords = (string) => {
          return getUniqueWords(string).length;
          }


          Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length.



          Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.



          But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.






          share|improve this answer




























            up vote
            2
            down vote













            Here's the match solution that Blindman67 wanted to avoid for reasons still unclear to me.



            If you find it hard to read you could split out something like
            const words = s => s.match(/w+/g)
            or
            const lowerWords = s => s.toLowerCase().match(/w+/g)






            const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


            const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
            const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
            info1.textContent = `A has ${countUniqueWords(a)} unique words`;
            info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

             
            <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
            <code id="info1"></code><br>
            <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
            <code id="info2"></code>








            share|improve this answer





















              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "196"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207429%2fcompute-count-of-unique-words-using-es6-sets%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              6
              down vote



              accepted










              Two problems



              Your code has two problems.




              1. It does not remove punctuation from the words resulting in the same words not matching. Eg text1 has 12 unique words not 13. You count dolor and dolor. as different words.

              2. You are ignoring capitalization. You would count Dolor and dolor as different words rather than the same.


              String.match



              Update I was not paying attention on first posting. There is no /*RegExp.match*/



              The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace as shown.



              thus the ideal solution becomes...



              const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;


              String.replace



              You can use a string replace to iterate over RegExp matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or RegExp.match, or RegExp.exec which have an ugly interface and are slower than String.replace.



              Converting the text to lowercase using String.toLowerCase solves the capitalization problem






              const countUniqueWords = text => {
              const words = new Set();
              text.toLowerCase().replace(/w+/g, word => words.add(word));
              return words.size;
              }




              const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
              const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
              info1.textContent = `A has ${countUniqueWords(a)} unique words`;
              info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

               
              <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
              <code id="info1"></code><br>
              <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
              <code id="info2"></code>








              share|improve this answer























              • As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
                – JollyJoker
                Nov 12 at 9:52






              • 1




                @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
                – Blindman67
                Nov 12 at 14:39






              • 1




                @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
                – Blindman67
                Nov 12 at 15:24















              up vote
              6
              down vote



              accepted










              Two problems



              Your code has two problems.




              1. It does not remove punctuation from the words resulting in the same words not matching. Eg text1 has 12 unique words not 13. You count dolor and dolor. as different words.

              2. You are ignoring capitalization. You would count Dolor and dolor as different words rather than the same.


              String.match



              Update I was not paying attention on first posting. There is no /*RegExp.match*/



              The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace as shown.



              thus the ideal solution becomes...



              const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;


              String.replace



              You can use a string replace to iterate over RegExp matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or RegExp.match, or RegExp.exec which have an ugly interface and are slower than String.replace.



              Converting the text to lowercase using String.toLowerCase solves the capitalization problem






              const countUniqueWords = text => {
              const words = new Set();
              text.toLowerCase().replace(/w+/g, word => words.add(word));
              return words.size;
              }




              const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
              const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
              info1.textContent = `A has ${countUniqueWords(a)} unique words`;
              info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

               
              <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
              <code id="info1"></code><br>
              <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
              <code id="info2"></code>








              share|improve this answer























              • As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
                – JollyJoker
                Nov 12 at 9:52






              • 1




                @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
                – Blindman67
                Nov 12 at 14:39






              • 1




                @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
                – Blindman67
                Nov 12 at 15:24













              up vote
              6
              down vote



              accepted







              up vote
              6
              down vote



              accepted






              Two problems



              Your code has two problems.




              1. It does not remove punctuation from the words resulting in the same words not matching. Eg text1 has 12 unique words not 13. You count dolor and dolor. as different words.

              2. You are ignoring capitalization. You would count Dolor and dolor as different words rather than the same.


              String.match



              Update I was not paying attention on first posting. There is no /*RegExp.match*/



              The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace as shown.



              thus the ideal solution becomes...



              const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;


              String.replace



              You can use a string replace to iterate over RegExp matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or RegExp.match, or RegExp.exec which have an ugly interface and are slower than String.replace.



              Converting the text to lowercase using String.toLowerCase solves the capitalization problem






              const countUniqueWords = text => {
              const words = new Set();
              text.toLowerCase().replace(/w+/g, word => words.add(word));
              return words.size;
              }




              const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
              const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
              info1.textContent = `A has ${countUniqueWords(a)} unique words`;
              info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

               
              <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
              <code id="info1"></code><br>
              <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
              <code id="info2"></code>








              share|improve this answer














              Two problems



              Your code has two problems.




              1. It does not remove punctuation from the words resulting in the same words not matching. Eg text1 has 12 unique words not 13. You count dolor and dolor. as different words.

              2. You are ignoring capitalization. You would count Dolor and dolor as different words rather than the same.


              String.match



              Update I was not paying attention on first posting. There is no /*RegExp.match*/



              The better solution is to use the String.match to convert the matches to an array and then directly create the set from that. The code is just one line and the performance is 2.5 times faster than using String.replace as shown.



              thus the ideal solution becomes...



              const uniqueWords = txt => new Set(txt.toLowerCase().match(/w+/g)).size;


              String.replace



              You can use a string replace to iterate over RegExp matches. It's a little hacky but I much prefer it over other methods such a using a RegExp to split the string, or RegExp.match, or RegExp.exec which have an ugly interface and are slower than String.replace.



              Converting the text to lowercase using String.toLowerCase solves the capitalization problem






              const countUniqueWords = text => {
              const words = new Set();
              text.toLowerCase().replace(/w+/g, word => words.add(word));
              return words.size;
              }




              const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
              const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
              info1.textContent = `A has ${countUniqueWords(a)} unique words`;
              info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

               
              <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
              <code id="info1"></code><br>
              <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
              <code id="info2"></code>








              const countUniqueWords = text => {
              const words = new Set();
              text.toLowerCase().replace(/w+/g, word => words.add(word));
              return words.size;
              }




              const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
              const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
              info1.textContent = `A has ${countUniqueWords(a)} unique words`;
              info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

               
              <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
              <code id="info1"></code><br>
              <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
              <code id="info2"></code>





              const countUniqueWords = text => {
              const words = new Set();
              text.toLowerCase().replace(/w+/g, word => words.add(word));
              return words.size;
              }




              const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
              const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
              info1.textContent = `A has ${countUniqueWords(a)} unique words`;
              info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

               
              <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
              <code id="info1"></code><br>
              <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
              <code id="info2"></code>






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 12 at 15:23

























              answered Nov 11 at 16:59









              Blindman67

              6,7641521




              6,7641521












              • As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
                – JollyJoker
                Nov 12 at 9:52






              • 1




                @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
                – Blindman67
                Nov 12 at 14:39






              • 1




                @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
                – Blindman67
                Nov 12 at 15:24


















              • As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
                – JollyJoker
                Nov 12 at 9:52






              • 1




                @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
                – Blindman67
                Nov 12 at 14:39






              • 1




                @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
                – Blindman67
                Nov 12 at 15:24
















              As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
              – JollyJoker
              Nov 12 at 9:52




              As a n00b, what's wrong with the straightforward text.toLowerCase().match(/w+/g)?
              – JollyJoker
              Nov 12 at 9:52




              1




              1




              @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
              – Blindman67
              Nov 12 at 14:39




              @JollyJoker there is nothing wrong with it, don't know what I was thinking,, my answer is flawed. Sorry I will delete it.
              – Blindman67
              Nov 12 at 14:39




              1




              1




              @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
              – Blindman67
              Nov 12 at 15:24




              @JollyJoker Can't delete the answer, so havre updated the answer correcting the errors...
              – Blindman67
              Nov 12 at 15:24












              up vote
              6
              down vote













              I like the simplicity of your function, it's pretty easy to understand the implementation.



              Couple things I would consider:



              A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:



              const getUniqueWords = (string) => {
              ...
              return uniqueWords;
              }

              const computeCountUniqueWords = (string) => {
              return getUniqueWords(string).length;
              }


              Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length.



              Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.



              But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.






              share|improve this answer

























                up vote
                6
                down vote













                I like the simplicity of your function, it's pretty easy to understand the implementation.



                Couple things I would consider:



                A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:



                const getUniqueWords = (string) => {
                ...
                return uniqueWords;
                }

                const computeCountUniqueWords = (string) => {
                return getUniqueWords(string).length;
                }


                Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length.



                Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.



                But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.






                share|improve this answer























                  up vote
                  6
                  down vote










                  up vote
                  6
                  down vote









                  I like the simplicity of your function, it's pretty easy to understand the implementation.



                  Couple things I would consider:



                  A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:



                  const getUniqueWords = (string) => {
                  ...
                  return uniqueWords;
                  }

                  const computeCountUniqueWords = (string) => {
                  return getUniqueWords(string).length;
                  }


                  Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length.



                  Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.



                  But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.






                  share|improve this answer












                  I like the simplicity of your function, it's pretty easy to understand the implementation.



                  Couple things I would consider:



                  A function "computeCountUniqueWords" does a very specific task that counts the number of unique words given a string. But depending on the context of this code(for example, where it's being used, maybe just as a utility library), I probably would prefer a more general function that just gets the unique words in an array and implement "computeCountUniqueWords" based on it. It's a little more functional and have a more general utility. For example:



                  const getUniqueWords = (string) => {
                  ...
                  return uniqueWords;
                  }

                  const computeCountUniqueWords = (string) => {
                  return getUniqueWords(string).length;
                  }


                  Actually a lot of the times, you'll realize that the code reads very well without "computeCountUniqueWords" function by just calling getUniqueWords(paragraph).length.



                  Second thing to consider is what type of string data this function will run on. If performance is not a consideration such as we are processing a small number of strings(even possibly in the millions of words), I would keep the function written as is for the sake for readability and simplicity.



                  But if this is used on the scale of google crawler or something done frequently such as on mouse move event then the implementation as it stands would be inefficient and not ideal. If we think about it, we can do the 3 operations (splitting strings based on spaces, testing if the string is a valid word and remove uniqueness in one loop through the input string). As it stands, it's probably looping through the input string 3 times which can be a big deal for a very large input string or in a DOM environment, it could hurt the frames per second of the page.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 11 at 16:59









                  Frank

                  30614




                  30614






















                      up vote
                      2
                      down vote













                      Here's the match solution that Blindman67 wanted to avoid for reasons still unclear to me.



                      If you find it hard to read you could split out something like
                      const words = s => s.match(/w+/g)
                      or
                      const lowerWords = s => s.toLowerCase().match(/w+/g)






                      const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


                      const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
                      const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
                      info1.textContent = `A has ${countUniqueWords(a)} unique words`;
                      info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

                       
                      <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
                      <code id="info1"></code><br>
                      <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
                      <code id="info2"></code>








                      share|improve this answer

























                        up vote
                        2
                        down vote













                        Here's the match solution that Blindman67 wanted to avoid for reasons still unclear to me.



                        If you find it hard to read you could split out something like
                        const words = s => s.match(/w+/g)
                        or
                        const lowerWords = s => s.toLowerCase().match(/w+/g)






                        const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


                        const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
                        const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
                        info1.textContent = `A has ${countUniqueWords(a)} unique words`;
                        info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

                         
                        <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
                        <code id="info1"></code><br>
                        <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
                        <code id="info2"></code>








                        share|improve this answer























                          up vote
                          2
                          down vote










                          up vote
                          2
                          down vote









                          Here's the match solution that Blindman67 wanted to avoid for reasons still unclear to me.



                          If you find it hard to read you could split out something like
                          const words = s => s.match(/w+/g)
                          or
                          const lowerWords = s => s.toLowerCase().match(/w+/g)






                          const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


                          const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
                          const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
                          info1.textContent = `A has ${countUniqueWords(a)} unique words`;
                          info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

                           
                          <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
                          <code id="info1"></code><br>
                          <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
                          <code id="info2"></code>








                          share|improve this answer












                          Here's the match solution that Blindman67 wanted to avoid for reasons still unclear to me.



                          If you find it hard to read you could split out something like
                          const words = s => s.match(/w+/g)
                          or
                          const lowerWords = s => s.toLowerCase().match(/w+/g)






                          const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


                          const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
                          const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
                          info1.textContent = `A has ${countUniqueWords(a)} unique words`;
                          info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

                           
                          <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
                          <code id="info1"></code><br>
                          <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
                          <code id="info2"></code>








                          const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


                          const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
                          const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
                          info1.textContent = `A has ${countUniqueWords(a)} unique words`;
                          info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

                           
                          <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
                          <code id="info1"></code><br>
                          <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
                          <code id="info2"></code>





                          const countUniqueWords = s => new Set(s.toLowerCase().match(/w+/g)).size


                          const a = "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor.";
                          const b = "Etiam ultricies nisi vel augue. Curabitur ullamcorper.";
                          info1.textContent = `A has ${countUniqueWords(a)} unique words`;
                          info2.textContent = `B has ${countUniqueWords(b)} unique words.`;

                           
                          <code>A: "Lorem ipsum dolor sit amet consectetuer adipiscing elit aenean commodo ligula eget dolor."</code></br>
                          <code id="info1"></code><br>
                          <code>B: "Etiam ultricies nisi vel augue. Curabitur ullamcorper."</code></br>
                          <code id="info2"></code>






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Nov 12 at 11:43









                          JollyJoker

                          33116




                          33116






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Code Review Stack Exchange!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.





                              Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                              Please pay close attention to the following guidance:


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f207429%2fcompute-count-of-unique-words-using-es6-sets%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Florida Star v. B. J. F.

                              Danny Elfman

                              Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues