Only href="#", no onclick(): how do I load this in a script?

I'm writing a scraper for the articles on https://www.welt.de, and I'd also like to include the comments. However, not all comments are loaded together with the page. Instead, you have to click a link to load more comments, repeatedly, until all of them are loaded.

Example: https://www.welt.de/finanzen/immobilien/article183878020/Bundesbank-sieht-im-Immobilienboom-ein-Stabilitaetsrisiko.html

When you scroll down, a button labeled "MEHR KOMMENTARE ANZEIGEN" (German for "show more comments") appears.

Its markup looks like this:

<div href="#" style="text-align: center; height: 44px; cursor: pointer;">
  <a style="font-size: 0.6875rem; font-family: ffmark, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-weight: 800; color: rgb(0, 57, 91); line-height: 5;">
    <span style="font-size: 0.6875rem; font-family: ffmark, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; font-weight: 500; margin-right: 0.625rem; text-align: right; color: rgb(120, 120, 120);">
      MEHR KOMMENTARE ANZEIGEN
      <span style="width: 14px; height: 8px; margin: 0px 0px 0px 0.625rem; padding-top: 0px; display: inline-block; vertical-align: initial;">
        <svg viewBox="0 0 15 9" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
          <g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
            <g transform="translate(-608.000000, -4318.000000)" fill="#787878">
              <polygon transform="translate(615.205882, 4322.852941) rotate(-90.000000) translate(-615.205882, -4322.852941) " points="618.264706 4315.79412 611.205882 4322.85353 618.264706 4329.91176 619.205882 4328.97059 613.088824 4322.85353 619.205882 4316.73529">
              </polygon>
            </g>
          </g>
        </svg>
      </span>
    </span>
  </a>
</div>

However, I do not know how to trigger this link from a script.

I understand that href="#" is used when a link is handled by JavaScript, and that it is considered bad style, since it only serves to change the appearance of the mouse cursor, for which there are other methods.

But where is the onclick handler? I'm kind of dumbfounded here...

asked Nov 15 '18 at 14:38 by Thomas Kaltfuss

edited Nov 15 '18 at 15:32 by Khauri McClain

If there's no onclick then I'd guess that a click handler is registered somewhere in the JavaScript the page loads. Any idea what JavaScript frameworks (if any) the page uses?

– phuzi
Nov 15 '18 at 14:44

Those pages load around 20 different script files, and all the event handlers will be in there somewhere. But as elken has shown below, if you are able to extract the relevant API endpoints, using those will be much better than scraping the rendered page. Be mindful of copyright though; I'm not sure whether they would mind.

– Shilly
Nov 15 '18 at 14:47

When it comes to web scraping, I'd personally recommend a headless browser such as headless Chrome, because you can programmatically click elements without having to hunt for event listeners. You can also wait for the DOM to change or for a network request to be made before proceeding, all of which sounds like it would benefit your use case. You can't do that with a content script, which is what I assume you're using?

– Khauri McClain
Nov 15 '18 at 14:53
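
The headless-browser approach from the last comment could be sketched roughly as follows. This is a minimal sketch, not a tested scraper for welt.de: it assumes Puppeteer as the driver for headless Chrome, and the function name loadAllComments, the click limit, and the fixed pause are illustrative choices. It also assumes that clicking the inner <a> is enough, because the click bubbles up to the enclosing <div>.

```javascript
// Clicks the "load more" link until it can no longer be found or until
// maxClicks is reached, and returns the number of clicks performed.
// `page` is expected to be a Puppeteer Page; only page.evaluate is used.
async function loadAllComments(page, label, maxClicks = 200, pauseMs = 1000) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // This callback runs inside the page: find an <a> whose text contains
    // the label and click it. The click bubbles to its parent elements.
    const clicked = await page.evaluate((text) => {
      const link = Array.from(document.querySelectorAll("a"))
        .find((a) => a.textContent.includes(text));
      if (!link) return false;
      link.click();
      return true;
    }, label);
    if (!clicked) break;
    clicks += 1;
    // Crude fixed pause so the next batch of comments can arrive (assumption:
    // one second is enough; waiting for a DOM change would be more robust).
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return clicks;
}

// Hypothetical usage (requires `npm install puppeteer`):
//   const puppeteer = require("puppeteer");
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await page.goto(articleUrl, { waitUntil: "networkidle2" });
//   await loadAllComments(page, "MEHR KOMMENTARE ANZEIGEN");
//   const html = await page.content();
//   await browser.close();
```

Once the loop finishes, the page's HTML contains whatever comments were loaded and can be parsed like any static document.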

javascript html web-scraping web-crawler href


2 Answers

Clicking "show more comments" twice gives me the following URLs:

https://api-co.la.welt.de/api/comments?document-id=183878020&created-cursor=2018-11-15T13:52:41.714&sort=NEWEST

https://api-co.la.welt.de/api/comments?document-id=183878020&created-cursor=2018-11-15T12:23:26.896&sort=NEWEST

These return the comments. So just use the document ID you have and keep adjusting created-cursor until you have all the comments?

EDIT:
Removing the created-cursor parameter should give you all the comments:

https://api-co.la.welt.de/api/comments?document-id=183878020

EDIT 2:

As someone else mentioned, this might not be a good idea without first contacting the owner of the site.
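
To make the URL pattern above concrete, here is a small sketch that builds the request URLs and fetches one page. Only the endpoint and the three query parameters come from the URLs above; the function names are made up for illustration, the response is assumed (not verified) to be JSON, and the global fetch assumes Node 18+ or a browser. Deriving the next created-cursor value from a response is left out, because the response format is not shown in this thread.

```javascript
const COMMENTS_API = "https://api-co.la.welt.de/api/comments";

// Builds the comments URL for one article. Without a cursor, only the
// document-id parameter is sent, as in the shortest URL above.
function buildCommentsUrl(documentId, createdCursor) {
  const url = new URL(COMMENTS_API);
  url.searchParams.set("document-id", String(documentId));
  if (createdCursor) {
    url.searchParams.set("created-cursor", createdCursor);
    url.searchParams.set("sort", "NEWEST");
  }
  return url.toString();
}

// Extracts the numeric ID from an article URL of the form
// .../article183878020/....html, or returns null if there is none.
// (Observation from the example: this number matches document-id.)
function documentIdFromArticleUrl(articleUrl) {
  const match = /\/article(\d+)\//.exec(articleUrl);
  return match ? match[1] : null;
}

// Fetches one page of comments; assumes the endpoint answers with JSON.
async function fetchCommentsPage(documentId, createdCursor) {
  const response = await fetch(buildCommentsUrl(documentId, createdCursor));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Note that URLSearchParams percent-encodes the colons in the cursor timestamp, so the generated URL looks slightly different from the ones above while carrying the same parameter values.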

answered Nov 15 '18 at 14:42 by elken

edited Nov 15 '18 at 14:49


As for finding the click handler:

If you inspect this element in the browser's developer tools, you can see that it has a click event handler that calls something in communityweb.js.

This handler is almost certainly attached with JavaScript somewhere else, e.g. document.getElementById('something').addEventListener("click", function(){ ... });

If you want, you can follow through and see the code it's calling (be sure to use the 'pretty print' feature, as the file is minified).

It gets complicated from there, but if you're determined enough you could step through in the debugger and see what's being called.
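
As a small illustration of why there is no onclick to find, and of how a script can still trigger the handler: a listener registered with addEventListener does not appear as an onclick attribute or property, but it runs whenever a click event is dispatched on the element. The sketch below uses a bare EventTarget as a stand-in for the real <div> so that it runs outside a browser (Node 15 or later is assumed), and the counter is purely illustrative.

```javascript
// Stand-in for the DOM element; in a browser you would obtain the real
// element with document.querySelector(...) instead.
const link = new EventTarget();

let loadMoreCalls = 0;

// Roughly how a page script wires up such a link: no onclick attribute
// is involved at all.
link.addEventListener("click", () => {
  loadMoreCalls += 1; // the real handler would request more comments here
});

// The listener is invisible to anyone looking for an onclick function...
const hasInlineHandler = typeof link.onclick === "function";

// ...but dispatching a click event still runs it.
link.dispatchEvent(new Event("click"));
link.dispatchEvent(new Event("click"));

console.log({ hasInlineHandler, loadMoreCalls });
```

In a page, calling click() on the real element dispatches a click event in the same way, so the handler does not need to be located before it can be triggered.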

answered Nov 15 '18 at 14:51 by gregmac
