find_elements_by_xpath() not producing the desired output python selenium scraping
I'm trying to find the tr rows by their class, tableone. Here is my code:
from selenium import webdriver

browser = webdriver.Chrome(executable_path=path, options=options)
cells = browser.find_elements_by_xpath('//*[@class="tableone"]')
But the cells variable comes back as [], an empty list.
Here is the HTML of the page:
<tbody class="tableUpper">
<tr class="tableone">
<td><a class="studentName" href="//www.abc.com"> student one</a></td>
<td><a href="//www.abc.com/overview"> <span class="id_one"></span> <span class="long">Place</span> <span class="short">Place</span></a></td>
<td class="hide-s">
<span class="state"></span> <span class="studentState">student_state</span>
</td>
</tr>
<tr class="tableone">..</tr>
<tr class="tableone">..</tr>
<tr class="tableone">..</tr>
<tr class="tableone">..</tr>
</tbody>
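A quick way to check the XPath itself (a sketch, not from the original thread) is to run the same predicate offline against the posted markup with Python's standard-library xml.etree.ElementTree, which supports a small XPath subset that includes [@attr='value']. The elided ".." rows are replaced with a placeholder cell, which is an assumption. If the rows match here, the empty list in the browser more likely comes from when or where the lookup runs (for example, before the rows exist) than from the expression. Attribute comparison is exact and case-sensitive, so "tableOne" finds nothing.

```python
import xml.etree.ElementTree as ET

# The posted table; the elided ".." rows are filled with a placeholder <td> (assumption).
HTML = """
<tbody class="tableUpper">
  <tr class="tableone">
    <td><a class="studentName" href="//www.abc.com"> student one</a></td>
    <td><a href="//www.abc.com/overview"> <span class="id_one"></span> <span class="long">Place</span> <span class="short">Place</span></a></td>
    <td class="hide-s">
      <span class="state"></span> <span class="studentState">student_state</span>
    </td>
  </tr>
  <tr class="tableone"><td>..</td></tr>
  <tr class="tableone"><td>..</td></tr>
  <tr class="tableone"><td>..</td></tr>
  <tr class="tableone"><td>..</td></tr>
</tbody>
"""

root = ET.fromstring(HTML.strip())

# Same predicate as the question's '//*[@class="tableone"]', relative to <tbody>.
rows = root.findall(".//*[@class='tableone']")

# Attribute comparison is exact and case-sensitive, so this matches nothing.
rows_wrong_case = root.findall(".//*[@class='tableOne']")

print(len(rows), len(rows_wrong_case))  # prints: 5 0
```

An exact @class comparison also fails if the live element carries more than one class (for example a hypothetical class="tableone active"); contains(concat(' ', normalize-space(@class), ' '), ' tableone ') is the usual XPath 1.0 idiom for matching a single class token.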
python-3.x selenium-webdriver xpath
edited Nov 12 at 11:43
asked Nov 12 at 10:44 by Praveen
Hello @Praveen, could you include the HTML in your post as text instead of as an image? Also, one thought: it might be that you need to wait for the page to load before getting elements from it, but I'm not really familiar with Selenium, so I can't say for sure that's the problem.
– Mike
Nov 12 at 10:54
<tbody class="tableUpper"> <tr class="tableone"> <td><a class="studentName" href="//www.abc.com"> student one</a></td> <td><a href="//www.abc.com/overview"> <span class="id_one"></span> <span class="long">Place</span> <span class="short">Place</span></a> </td> <td class="hide-s"><span class="state"></span> <span class="studentState">student_state</span> </td> </tr> <tr class="tableone">..</tr> <tr class="tableone">..</tr> <tr class="tableone">..</tr> <tr class="tableone">..</tr> </tbody>
– Praveen
Nov 12 at 11:26
There is a problem with editing the question; I am working on it. Since I am new to this site, I am having some trouble.
– Praveen
Nov 12 at 11:27
Ah, that's OK. I've just edited your question and put the code inline; it will show once you accept my edit.
– Mike
Nov 12 at 11:38
Thank you for editing the code, I have approved it. Hope to see an answer soon.
– Praveen
Nov 12 at 11:41
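On the first comment's suggestion about waiting: if the rows are rendered by JavaScript after the initial load, find_elements returns [] immediately rather than waiting (unless an implicit wait has been set). Selenium's explicit wait, WebDriverWait(browser, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//tr[@class="tableone"]'))), with WebDriverWait from selenium.webdriver.support.ui, EC from selenium.webdriver.support.expected_conditions and By from selenium.webdriver.common.by, polls until the condition returns something truthy or raises TimeoutException. The sketch below is a dependency-free stand-in that only illustrates that polling idea: wait_until is a hypothetical helper, not a Selenium function, and FakePage is an assumed stub that simulates rows appearing late.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper that mirrors the idea behind WebDriverWait.until();
    it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # e.g. a non-empty list of rows
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


class FakePage:
    """Simulated page whose rows only appear after a few lookups (assumption)."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_rows(self):
        self.calls += 1
        # Before the rows are "rendered", the lookup returns [] like find_elements does.
        return ["row"] * 5 if self.calls > self.ready_after else []


page = FakePage(ready_after=2)
rows = wait_until(page.find_rows, timeout=2.0, poll_frequency=0.01)
print(len(rows), page.calls)  # prints: 5 3
```

The first two lookups return [] and the third returns the rows, which is the situation an immediate find_elements call would miss.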
1 Answer
Please try this:
import re

# Selenium 3 API, as in the question (find_elements_by_* was removed in Selenium 4.3)
cells = browser.find_elements_by_xpath("//*[local-name()='tr' and contains(@class, 'tableone')]")
for e in cells:
    # every <td> child of the current row
    insides = e.find_elements_by_xpath("./td")
    for i in insides:
        # non-greedy: the text between the first '">' and the next '</'
        result = re.search(r'">(.*?)</', i.get_attribute("outerHTML"))
        if result:  # re.search returns None when nothing matches
            print(result.group(1).strip())
What this does is get all the tr elements that have the class tableone, then iterate through each row and list all of its td cells. It then searches the outerHTML of each td with a regular expression to pull out the text value.
It's quite unrefined and will, I think, return empty strings for some cells. You might need to put some more work into the final product.
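To see where the regex approach returns empty or partial strings, here is a small offline sketch using the three cells of the first row as posted in the question (the exact outerHTML a browser returns may differ in whitespace, which is an assumption). It compares the greedy pattern '">(.*)</', its non-greedy variant '">(.*?)</', and plain text extraction with ElementTree's itertext(). In Selenium itself, WebElement.text returns an element's rendered text, which is usually a sturdier choice than parsing outerHTML.

```python
import re
import xml.etree.ElementTree as ET

# outerHTML of the three cells in the first row, as posted in the question
# (assumption: a real browser may normalize whitespace differently).
CELLS = [
    '<td><a class="studentName" href="//www.abc.com"> student one</a></td>',
    '<td><a href="//www.abc.com/overview"> <span class="id_one"></span> '
    '<span class="long">Place</span> <span class="short">Place</span></a></td>',
    '<td class="hide-s">\n<span class="state"></span> '
    '<span class="studentState">student_state</span>\n</td>',
]


def regex_text(outer_html, pattern):
    """Return the first capture group of pattern, or None if it does not match."""
    match = re.search(pattern, outer_html)
    return match.group(1) if match else None


def element_text(outer_html):
    """Join every text node of the cell and collapse runs of whitespace."""
    root = ET.fromstring(outer_html)
    return " ".join("".join(root.itertext()).split())


greedy = [regex_text(c, r'">(.*)</') for c in CELLS]
lazy = [regex_text(c, r'">(.*?)</') for c in CELLS]
text = [element_text(c) for c in CELLS]

for g, l, t in zip(greedy, lazy, text):
    print(repr(g), repr(l), repr(t), sep=" | ")
```

On these strings the non-greedy pattern yields ' student one', ' <span class="id_one">' and an empty string, while itertext() gives 'student one', 'Place Place' and 'student_state'. The greedy pattern runs on to the last '</' on a line, so the first cell comes back as ' student one</a>'.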
edited Nov 13 at 13:29
answered Nov 13 at 13:16 by Alichino