Website section not appearing with BeautifulSoup

I'm trying to scrape the abstract section of this website:

    from bs4 import BeautifulSoup

    urlLink = 'https://www.cfapubs.org/doi/abs/10.2469/faj.v74.n4.2'
    page_response = requests.get(page_link, timeout=5, verify=False, headers={'User-Agent': 'Mozilla/5.0'})
    soup2 = BeautifulSoup(page_response.content, 'html.parser')

and when I search for:

    soup2.find_all("div", {"class": "abstractSection"})

I do not get anything back, even though this is the part I'm interested in.
Any ideas?

asked Nov 14 '18 at 8:56 by sammtt

python-3.x web-scraping beautifulsoup


1 Answer

I'm not sure where page_link in your code comes from: the URL is stored in urlLink, and page_link is never defined in the snippet you posted. Try the approach below to get the content you want to parse.

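    from bs4 import BeautifulSoup
    import requests

    urlLink = 'https://www.cfapubs.org/doi/abs/10.2469/faj.v74.n4.2'

    # Pass the variable that actually holds the URL (urlLink, not page_link)
    page_response = requests.get(urlLink, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(page_response.content, 'html.parser')

    # The author name is the link inside the hlFld-ContribAuthor element
    name = soup.find(class_="hlFld-ContribAuthor").find("a").text
    # The abstract is the paragraph inside the abstractSection element
    abstract = soup.find(class_="abstractSection").find("p").text
    print(f'Name : {name}\nAbstract : {abstract}')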
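For what it's worth, the dictionary filter from the question and the class_ keyword used here should match the same elements, so the lookup syntax itself was probably not what went wrong. A minimal offline sketch, assuming a made-up HTML fragment that only borrows the abstractSection class name (the real page's markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; it only mimics the class name from the question.
html = '<div class="abstractSection"><p>Sample abstract.</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Three ways of asking for the same <div>
by_dict = soup.find_all("div", {"class": "abstractSection"})
by_kwarg = soup.find_all("div", class_="abstractSection")
by_css = soup.select("div.abstractSection")

# Each lookup returns the same Tag object from the parsed tree
same_tag = by_dict[0] is by_kwarg[0] is by_css[0]
print(len(by_dict), len(by_kwarg), len(by_css), same_tag)
```

If all three come back empty on the real page, the response that was parsed most likely did not contain that element at all, which points at the request rather than at the filter.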

If you prefer CSS selectors, try:

    from bs4 import BeautifulSoup
    import requests

    urlLink = 'https://www.cfapubs.org/doi/abs/10.2469/faj.v74.n4.2'

    page_response = requests.get(urlLink, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(page_response.content, 'html.parser')

    # select_one returns the first element matching the CSS selector
    name = soup.select_one(".hlFld-ContribAuthor a").text
    abstract = soup.select_one(".abstractSection p").text
    print(f'Name : {name}\nAbstract : {abstract}')
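One caveat with both versions: find and select_one return None when nothing matches, so chaining .text directly raises AttributeError if the page layout differs. The sketch below shows one way to guard against that. It is only an illustration: select_text is a hypothetical helper, and SAMPLE_HTML is invented markup that reuses the class names above rather than the real page.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; the real page's structure may differ.
SAMPLE_HTML = """
<div class="hlFld-ContribAuthor"><a href="#">Charles D. Ellis, CFA</a></div>
<div class="abstractSection"><p>Sample abstract text.</p></div>
"""


def select_text(soup, css, default=None):
    """Return the stripped text of the first match for css, or default."""
    node = soup.select_one(css)
    return node.get_text(strip=True) if node is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
name = select_text(soup, ".hlFld-ContribAuthor a")
missing = select_text(soup, ".noSuchSection p", default="(not found)")
print(name)
print(missing)
```

Returning a default instead of raising makes it easier to tell "the element is not in the HTML I received" apart from a bug elsewhere in the script.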

Output:

Name : Charles D. Ellis, CFA

Abstract :  One of the consequences of the shift in corporate retirement plans from defined benefit           to defined contribution is widespread retirement insecurity. Although most people in the           top one-third of economic affluence will be fine, for the other two-thirds—particularly           the bottom one-third—the problem is a serious threat. We can prevent this painful           future if we act sensibly and soon by raising the alarm with our corporate and government           leaders.

Finally, if you do not want the runs of extra whitespace inside the abstract, replace the abstract line with:

    abstract = ' '.join(soup.find(class_="abstractSection").find("p").text.split())
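The split-and-join idiom works because str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines) and discards leading and trailing whitespace. A small standalone sketch, using a made-up string in place of the scraped abstract and a hypothetical helper name:

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return " ".join(text.split())


# Made-up sample standing in for the scraped paragraph text
raw = " One of the consequences           of the shift\n   in plans. "
clean = collapse_whitespace(raw)
print(clean)
```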

answered Nov 14 '18 at 9:29 by SIM (edited Nov 14 '18 at 10:47)

Thank you very much. Would you know how to get the name just above the abstract too? He's called "Charles D. Ellis, CFA".

– sammtt
Nov 14 '18 at 10:03

Sure. Check out the edit, @sammtt.

– SIM
Nov 14 '18 at 10:35


