Python XPath returns empty list - Exalead



























I'm fairly new to scraping with Python.
I am trying to obtain the number of search results from a query on Exalead. In this example, I would like to get "586,564 results".



This is the code I am running:



import requests
from lxml import html

r = requests.get(URL, headers=headers)
tree = html.fromstring(r.text)
stats = tree.xpath('//*[@id="searchform"]/div/div/small/text()')


This returns an empty list.



I copy-pasted the XPath directly from the browser's Elements panel.

As an alternative, I have tried using Beautiful Soup:



from bs4 import BeautifulSoup

html = r.text
soup = BeautifulSoup(html, 'xml')
stats = soup.find('small', {'class': 'pull-right'}).text


which raises AttributeError: 'NoneType' object has no attribute 'text'.



When I checked the HTML source, I realised I cannot actually find the element I am looking for (the number of results) in it.



Does anyone know why this is happening and how this can be resolved?
Thanks a lot!










python xpath web-scraping beautifulsoup empty-list






asked Nov 14 '18 at 21:37









Elisa Macchi
  • Did you try the XPath without /text()? Then get the innerHTML.

    – Ywapom
    Nov 14 '18 at 21:52
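Following that suggestion, a minimal lxml sketch. Two points are shown: the copied path needs the `*` node test (`//*[@id="searchform"]`), and selecting the element itself rather than its `/text()` lets `text_content()` collect text from nested tags as well. The sample markup is hypothetical and only mimics the structure implied by the question's XPath.

```python
from lxml import html

# Hypothetical markup mimicking the path in the question (not Exalead's real page)
SAMPLE = """
<html><body>
  <div id="searchform">
    <div><div><small class="pull-right"><b>586,564</b> results</small></div></div>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# /text() returns only the text nodes directly under <small>, so the <b> part is missed
print(tree.xpath('//*[@id="searchform"]/div/div/small/text()'))

# Selecting the element and calling text_content() includes text from nested tags
matches = tree.xpath('//*[@id="searchform"]/div/div/small')
stats = matches[0].text_content().strip() if matches else None
print(stats)  # 586,564 results
```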



















2 Answers
"When I checked the HTML source, I realised I cannot actually find the element I am looking for (the number of results) in it."

This suggests that the data you're looking for is generated dynamically with JavaScript. requests only downloads the initial HTML and does not run scripts, so for this approach to work the element has to be visible in the HTML source.

To confirm whether this is the cause of your error, you could try something as simple as:

from bs4 import BeautifulSoup

html_source = r.text  # r is the response from requests.get() in the question
soup = BeautifulSoup(html_source, 'lxml')

Note the 'lxml' (HTML) parser above, rather than the 'xml' parser used in the question.

Then inspect soup manually to see whether the element you want is there.
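A minimal sketch of that check using only the standard library: search the raw response text for the result-count pattern before reaching for a JavaScript-capable tool. The regex and the helper name `find_result_count` are hypothetical, chosen for illustration; they assume the count appears as digits (with commas) followed by the word "results".

```python
import re

# Assumption: the count appears in the raw HTML as "<digits with commas> results"
RESULT_COUNT_RE = re.compile(r"(\d[\d,]*)\s+results")


def find_result_count(html_source):
    """Return the count text (e.g. '586,564') if it is in the raw HTML, else None."""
    match = RESULT_COUNT_RE.search(html_source)
    return match.group(1) if match else None


# In practice, pass r.text from requests.get(...); these two inputs are hypothetical
static_page = '<small class="pull-right">586,564 results</small>'
js_page = '<div id="app"></div><script src="app.js"></script>'

print(find_result_count(static_page))  # 586,564
print(find_result_count(js_page))      # None
```

If this returns None for r.text while the count is visible in the browser, the number is most likely filled in by JavaScript; if it returns the count, the problem lies in the selector or parser instead.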







answered Nov 14 '18 at 21:51
Matthew5Johnson
I can get that with the CSS selector small.pull-right, which targets the element by its tag name (small) and class name (pull-right).

from bs4 import BeautifulSoup
import requests

url = 'https://www.exalead.com/search/web/results/?q=lead+poisoning'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")  # parse as HTML, not 'xml'
# select_one returns None if nothing matches, which would raise AttributeError here
print(soup.select_one('small.pull-right').text)
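Since the question title is about XPath, here is a sketch of the same lookup with lxml, assuming the count sits in a `<small>` element whose class list includes `pull-right` (as the selector above implies). The `concat`/`normalize-space` expression is the usual XPath 1.0 idiom for matching one class among several; the sample markup and the `parse_count` helper are hypothetical.

```python
from lxml import html

# Hypothetical markup; the real page has much more around this element
SAMPLE = '<html><body><small class="text-muted pull-right">586,564 results</small></body></html>'

# XPath 1.0 equivalent of the CSS selector small.pull-right
XPATH = '//small[contains(concat(" ", normalize-space(@class), " "), " pull-right ")]'


def parse_count(text):
    """Turn a string like '586,564 results' into the integer 586564."""
    return int(text.split()[0].replace(",", ""))


tree = html.fromstring(SAMPLE)
nodes = tree.xpath(XPATH)
if nodes:
    label = nodes[0].text_content().strip()
    print(label, parse_count(label))  # 586,564 results 586564
else:
    print("small.pull-right not found in the HTML")
```

A plain `@class="pull-right"` test would miss this element because its class attribute holds two names; padding with spaces also avoids false matches on names such as `pull-right-lg`.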






answered Nov 14 '18 at 22:03
QHarr
  • This one works.

    – Kamikaze_goldfish
    Nov 14 '18 at 22:32

  • This did the trick! Thanks a lot! :)

    – Elisa Macchi
    Nov 14 '18 at 22:48

  • You are most welcome.

    – QHarr
    Nov 14 '18 at 22:49

  • Please remember to consider clicking the check mark next to the answer to mark it as accepted. stackoverflow.com/help/someone-answers

    – QHarr
    Nov 15 '18 at 19:50













