Website section not appearing with BeautifulSoup












I'm trying to scrape the abstract section of this page:

    from bs4 import BeautifulSoup
    import requests
    urlLink = 'https://www.cfapubs.org/doi/abs/10.2469/faj.v74.n4.2'
    page_response = requests.get(page_link, timeout=5, verify=False, headers={'User-Agent': 'Mozilla/5.0'})
    soup2 = BeautifulSoup(page_response.content, 'html.parser')

When I search for:

    soup2.find_all("div", {"class": "abstractSection"})

I get nothing back, even though that is the part I'm interested in.
Any ideas?

      python-3.x web-scraping beautifulsoup






      asked Nov 14 '18 at 8:56 by sammtt

          1 Answer
          I'm not sure where page_link in your snippet comes from; the URL you define is stored in urlLink. Try the approach below to get the content you want to parse.



              from bs4 import BeautifulSoup
              import requests

              urlLink = 'https://www.cfapubs.org/doi/abs/10.2469/faj.v74.n4.2'

              # Pass the variable that actually holds the URL, with a browser-like User-Agent.
              page_response = requests.get(urlLink, headers={'User-Agent': 'Mozilla/5.0'})
              soup = BeautifulSoup(page_response.content, 'html.parser')
              # find() returns the first element carrying the given class.
              name = soup.find(class_="hlFld-ContribAuthor").find("a").text
              abstract = soup.find(class_="abstractSection").find("p").text
              print(f'Name : {name}\nAbstract : {abstract}')


          If you prefer CSS selectors, try:



              from bs4 import BeautifulSoup
              import requests

              urlLink = 'https://www.cfapubs.org/doi/abs/10.2469/faj.v74.n4.2'

              page_response = requests.get(urlLink, headers={'User-Agent': 'Mozilla/5.0'})
              soup = BeautifulSoup(page_response.content, 'html.parser')
              # select_one() returns the first element matching the CSS selector.
              name = soup.select_one(".hlFld-ContribAuthor a").text
              abstract = soup.select_one(".abstractSection p").text
              print(f'Name : {name}\nAbstract : {abstract}')


          Output:



          Name : Charles D. Ellis, CFA
          Abstract : One of the consequences of the shift in corporate retirement plans from defined benefit to defined contribution is widespread retirement insecurity. Although most people in the top one-third of economic affluence will be fine, for the other two-thirds—particularly the bottom one-third—the problem is a serious threat. We can prevent this painful future if we act sensibly and soon by raising the alarm with our corporate and government leaders.
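
Both versions above assume the elements exist. If a class is missing from the response, find() and select_one() return None and the chained .text access raises AttributeError. A minimal sketch of a more defensive lookup follows; the helper name text_or_none and the SAMPLE_HTML string are hypothetical and only mimic the class names used above, so the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML that only mimics the class names used in the answer;
# it is not the real page source.
SAMPLE_HTML = """
<div class="hlFld-ContribAuthor"><a href="#">Charles D. Ellis, CFA</a></div>
<div class="abstractSection"><p>One of the consequences of the shift.</p></div>
"""


def text_or_none(soup, css_selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(css_selector)  # select_one() returns None on no match
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
name = text_or_none(soup, ".hlFld-ContribAuthor a")
abstract = text_or_none(soup, ".abstractSection p")
missing = text_or_none(soup, ".noSuchClass p")  # selector with no match

print(f"Name : {name}\nAbstract : {abstract}\nMissing : {missing}")
```

A None result is a hint to inspect page_response.text directly, since the section may be absent from the HTML the server actually returned.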


          Finally, if the abstract text contains stray gaps (runs of spaces or line breaks), collapse them by replacing the abstract assignment with abstract = ' '.join(soup.find(class_="abstractSection").find("p").text.split()).
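
The whitespace cleanup in that last line can be tried in isolation. This is a small sketch using only the standard library; the raw_abstract string is invented sample text, not the scraped abstract, and collapse_whitespace is a hypothetical helper name.

```python
# Invented sample text standing in for a scraped abstract with uneven spacing.
raw_abstract = "One of the   consequences\n      of the shift in\tcorporate retirement plans "


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so joining on ' ' normalizes the text.
    return " ".join(text.split())


clean_abstract = collapse_whitespace(raw_abstract)
print(clean_abstract)
```

Because split() is called without a separator, tabs and newlines are treated the same as spaces, which is what makes the one-liner work on text pulled out of nested HTML.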






          • Thank you very much. Would you know how to get the name just above the abstract too? He's called "Charles D. Ellis, CFA".

            – sammtt
            Nov 14 '18 at 10:03











          • Sure. Check out the edit @sammtt!!

            – SIM
            Nov 14 '18 at 10:35











          answered Nov 14 '18 at 9:29 by SIM, edited Nov 14 '18 at 10:47












