Python XPath returns empty list - Exalead
I'm fairly new to scraping with Python.
I am trying to obtain the number of search results for a query on Exalead. In this example, I would like to get "586,564 results".
This is the code I am running:
import requests
from lxml import html
r = requests.get(URL, headers=headers)
tree = html.fromstring(r.text)
stats = tree.xpath('//*[@id="searchform"]/div/div/small/text()')
This returns an empty list.
I copied and pasted the XPath directly from the browser's Elements panel.
As an alternative, I tried using Beautiful Soup:
from bs4 import BeautifulSoup
html = r.text
soup = BeautifulSoup(html, 'xml')
stats = soup.find('small', {'class': 'pull-right'}).text
which raises AttributeError: 'NoneType' object has no attribute 'text'.
When I checked the HTML source, I realised that I cannot actually find the element I am looking for (the number of results) in it.
Does anyone know why this is happening and how this can be resolved?
Thanks a lot!
python xpath web-scraping beautifulsoup empty-list
asked Nov 14 '18 at 21:37
Elisa Macchi
Did you try the XPath without the /text()? Then get the innerHTML.
– Ywapom
Nov 14 '18 at 21:52
2 Answers
When I checked the html source I realised I actually cannot find the element I am looking for (the number of results) on the source.
This suggests that the data you're looking for is generated dynamically with JavaScript, which requests does not execute. To scrape it with requests and BeautifulSoup, the element has to be present in the HTML source that the server returns.
To confirm that this is the cause of your error, you could try something as simple as:
from bs4 import BeautifulSoup
html = r.text
soup = BeautifulSoup(html, 'lxml')
Note the 'lxml' above: your code passed 'xml', which selects lxml's XML parser instead of its HTML parser.
Then inspect soup manually to see whether the element you want is there.
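A minimal sketch of that check, wrapped in a hypothetical helper named result_count_in_source and run against two made-up HTML strings instead of the live site (to test a real response, pass r.text instead). If the helper returns None for the real page, the element is not in the raw HTML, which is consistent with it being added by JavaScript.

```python
from bs4 import BeautifulSoup


def result_count_in_source(raw_html):
    """Return the text of <small class="pull-right"> if it is in raw_html, else None.

    requests does not execute JavaScript, so content that a script inserts
    after page load will never appear in raw_html.
    """
    soup = BeautifulSoup(raw_html, "lxml")  # lxml's HTML parser, not 'xml'
    tag = soup.find("small", {"class": "pull-right"})
    return tag.get_text().strip() if tag is not None else None


# Hypothetical pages: one served with the count already in the HTML...
static_page = '<html><body><small class="pull-right">586,564 results</small></body></html>'
# ...and one where an (imaginary) script would insert it later.
dynamic_page = '<html><body><div id="stats"></div><script>/* fills #stats */</script></body></html>'

print(result_count_in_source(static_page))   # 586,564 results
print(result_count_in_source(dynamic_page))  # None
```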
answered Nov 14 '18 at 21:51
Matthew5Johnson
I can get that with the CSS selector small.pull-right, which combines the tag name and the class name of the element.
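from bs4 import BeautifulSoup
import requests
url = 'https://www.exalead.com/search/web/results/?q=lead+poisoning'
res = requests.get(url)
# "lxml" selects lxml's HTML parser (the question used "xml", the XML parser)
soup = BeautifulSoup(res.content, "lxml")
# small.pull-right: a <small> element that has the class "pull-right"
print(soup.select_one('small.pull-right').text)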
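A slightly more defensive sketch of the same selector, run on hypothetical markup (the extra class "text-muted" and the nesting are assumptions, not taken from the live page). It shows that small.pull-right still matches when the element has other classes, guards against select_one() returning None, and converts the text into an integer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the live page may use different classes or nesting.
SAMPLE = """
<form id="searchform">
  <div><div>
    <small class="text-muted pull-right">586,564 results</small>
  </div></div>
</form>
"""

soup = BeautifulSoup(SAMPLE, "lxml")

# "small.pull-right" matches a <small> tag that has the class "pull-right",
# even when the element carries other classes as well.
tag = soup.select_one("small.pull-right")

# select_one() returns None when nothing matches; checking first avoids the
# AttributeError ('NoneType' object has no attribute 'text') from the question.
count_text = tag.text.strip() if tag is not None else None

# Turn "586,564 results" into an integer.
count = int(count_text.split()[0].replace(",", "")) if count_text else None

print(count_text)  # 586,564 results
print(count)       # 586564
```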
answered Nov 14 '18 at 22:03
QHarr
This one works.
– Kamikaze_goldfish
Nov 14 '18 at 22:32
This did the trick! Thanks a lot! :)
– Elisa Macchi
Nov 14 '18 at 22:48
You are most welcome.
– QHarr
Nov 14 '18 at 22:49
Please remember to click the check mark next to the answer to mark the question as resolved. stackoverflow.com/help/someone-answers
– QHarr
Nov 15 '18 at 19:50