Only href="#", no onclick(), how do I load this in script?



























I'm in the process of writing a scraper for the articles on the site https://www.welt.de. I'd also like to include the comments. However, when loading the page, not all comments are loaded automatically. Instead one has to click on a link to load more comments, until at some point, all are loaded.



Example: https://www.welt.de/finanzen/immobilien/article183878020/Bundesbank-sieht-im-Immobilienboom-ein-Stabilitaetsrisiko.html



When you scroll down, a button labelled "MEHR KOMMENTARE ANZEIGEN" (German for 'show more comments') appears.



This link looks like:



<div href="#" style="text-align: center; height: 44px; cursor: pointer;">
<a style="font-size: 0.6875rem; font-family: ffmark, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-weight: 800; color: rgb(0, 57, 91); line-height: 5;">
<span style="font-size: 0.6875rem; font-family: ffmark, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-weight: 500; margin-right: 0.625rem; text-align: right; color: rgb(120, 120, 120);">
MEHR KOMMENTARE ANZEIGEN
<span style="width: 14px; height: 8px; margin: 0px 0px 0px 0.625rem; padding-top: 0px; display: inline-block; vertical-align: initial;">
<svg viewBox="0 0 15 9" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<g transform="translate(-608.000000, -4318.000000)" fill="#787878">
<polygon transform="translate(615.205882, 4322.852941) rotate(-90.000000) translate(-615.205882, -4322.852941) " points="618.264706 4315.79412 611.205882 4322.85353 618.264706 4329.91176 619.205882 4328.97059 613.088824 4322.85353 619.205882 4316.73529">
</polygon>
</g>
</g>
</svg>
</span>
</span>
</a>
</div>


However, I do not know how to trigger this link from a script.

I understand that href="#" is used when a link is handled by JavaScript, and that it is considered bad style because it only serves to change the mouse cursor, for which there are other methods.

But where is the onClick() method? Kinda dumbfounded here...



































  • If there's no onclick then I'd guess that a click handler is registered somewhere in the JavaScript the page loads. Any idea what JavaScript frameworks (if any) the page uses?

    – phuzi
    Nov 15 '18 at 14:44











  • Those pages load something like 20 different script files. All the event handlers will be in there somewhere. But as elken has shown below, if you are able to extract all the relevant API endpoints, using those will be far better than actually scraping the site. Be mindful of copyright though; I'm not sure whether they would mind.

    – Shilly
    Nov 15 '18 at 14:47













  • When it comes to web scraping, I'd personally recommend a headless browser such as headless Chrome, because you can programmatically click elements without having to sniff for event listeners. You can also wait for the DOM to change or for a network request to be made before proceeding, all of which sounds like it would benefit your use case. You can't do that with a content script, which is what I assume you're using?

    – Khauri McClain
    Nov 15 '18 at 14:53
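
To make the headless-browser suggestion concrete, here is a minimal sketch of the click-until-nothing-is-left loop such a script would run. The names clickUntilGone, findButton and waitForUpdate are hypothetical and not part of any library; with a tool such as Puppeteer, findButton would wrap the library's own element lookup and return null once the "MEHR KOMMENTARE ANZEIGEN" control has disappeared.

```javascript
// Clicks a "load more" control repeatedly until it disappears or a safety cap is hit.
// findButton and waitForUpdate are hypothetical callbacks supplied by the caller:
//   findButton()    -> resolves to an object with a click() method, or null if gone
//   waitForUpdate() -> resolves once the page has finished loading the next batch
async function clickUntilGone(findButton, waitForUpdate, maxClicks = 100) {
  let clicks = 0;
  while (clicks < maxClicks) {
    const button = await findButton();
    if (!button) break; // control is gone, so everything is loaded
    await button.click();
    clicks += 1;
    await waitForUpdate(); // give the page time to append the new comments
  }
  return clicks; // number of clicks that were performed
}
```

The maxClicks cap is only a safety net so that a control which never disappears cannot keep the script running forever.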























javascript html web-scraping web-crawler href














edited Nov 15 '18 at 15:32 by Khauri McClain

asked Nov 15 '18 at 14:38 by Thomas Kaltfuss

























2 Answers
































Clicking the "show more comments" link twice triggers requests to the following URLs:



https://api-co.la.welt.de/api/comments?document-id=183878020&created-cursor=2018-11-15T13:52:41.714&sort=NEWEST
https://api-co.la.welt.de/api/comments?document-id=183878020&created-cursor=2018-11-15T12:23:26.896&sort=NEWEST


These requests return the comments. So you could take the article ID you already have (the document-id parameter) and keep advancing created-cursor until you have all the comments.
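
A minimal sketch of that cursor loop, under stated assumptions: the URL and parameter names are the ones observed above, but the response shape (a comments array whose items carry a created timestamp) is a guess and needs checking against a real response. The names buildCommentsUrl, fetchAllComments and fetchJson are hypothetical.

```javascript
// Endpoint as observed in the browser's network tab.
const API_BASE = "https://api-co.la.welt.de/api/comments";

// Builds the request URL; the cursor and sort parameters are only added when a cursor is given.
function buildCommentsUrl(documentId, createdCursor) {
  const params = new URLSearchParams({ "document-id": String(documentId) });
  if (createdCursor) {
    params.set("created-cursor", createdCursor);
    params.set("sort", "NEWEST");
  }
  return `${API_BASE}?${params.toString()}`;
}

// Pages through the comments. fetchJson is a caller-supplied function that
// takes a URL and resolves to the parsed JSON body.
async function fetchAllComments(documentId, fetchJson, maxPages = 50) {
  const all = [];
  let cursor = null;
  for (let page = 0; page < maxPages; page++) {
    const data = await fetchJson(buildCommentsUrl(documentId, cursor));
    // ASSUMPTION: the response looks like { comments: [ { created: "...", ... } ] }.
    const batch = (data && data.comments) || [];
    if (batch.length === 0) break;
    all.push(...batch);
    // ASSUMPTION: the oldest comment of a batch is last and serves as the next cursor.
    const next = batch[batch.length - 1].created;
    if (!next || next === cursor) break;
    cursor = next;
  }
  return all;
}
```

With Node.js 18 or later, which ships a global fetch, fetchJson could be as simple as (url) => fetch(url).then((res) => res.json()). The maxPages cap keeps the loop from running away if the cursor assumption turns out to be wrong.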



EDIT:
Removing the created-cursor parameter should give you all the comments:



https://api-co.la.welt.de/api/comments?document-id=183878020


EDIT 2:



As someone else mentioned, this might not be a good idea without first contacting the owner of the site.




















As for finding the click handler:

If you inspect this element in the browser's developer tools, you can see that it has a click event handler that calls into communityweb.js.

The handler is almost certainly attached from JavaScript somewhere else, e.g. document.getElementById('something').addEventListener("click", function(){ ... });
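
A small sketch of why no onclick shows up in the markup. It uses a bare EventTarget as a stand-in for the real element so that it also runs in Node.js; the variable names are made up for illustration.

```javascript
// Stand-in for the "show more comments" element (EventTarget exists in browsers and Node.js).
const button = new EventTarget();
const log = [];

// The page's script attaches the handler in code, so nothing appears as an
// onclick attribute in the HTML source.
button.addEventListener("click", () => log.push("load more comments"));

// Dispatching a click event runs the handler, just as a real click would.
button.dispatchEvent(new Event("click"));

console.log(log.length, button.onclick); // the handler ran once; onclick was never set
```

In a browser, calling click() on the real element dispatches such an event, which is why a script can trigger the handler without knowing where it was registered.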



If you want, you can follow the link to the source and see the code it calls (be sure to use the 'pretty print' feature, as the file is minified).

It gets complicated from there, but if you're determined enough you could step through it in the debugger and see what's being called.














answered Nov 15 '18 at 14:42 by elken (edited Nov 15 '18 at 14:49)



































answered Nov 15 '18 at 14:51 by gregmac





























