Issue with BeautifulSoup .find(text=True)
```python
for row in soup.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 10:  # Only extract the table body, not the heading
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find('div').get('title'))
        D.append(cells[3].find('a', href=True).get_text())
        E.append(cells[4].find('a', href=True).get_text())
        if cells[5].find(text=True) is None or cells[5].find('a', href=True) is None:
            F.append(cells[5].find(text=True))
        else:
            Output = '-'.join([item.get_text() for item in cells[5].find_all('a')])
            F.append(Output)
        if cells[6].find(text=True) is None or cells[6].find('a', href=True) is None:
            G.append(cells[6].find(text=True))
        else:
            G.append(cells[6].find('a', href=True).get_text())
        if cells[7].find(text=True) is None or cells[7].find('a', href=True) is None:
            H.append(cells[7].find(text=True))
        else:
            H.append(cells[7].find('a', href=True).get_text())
        I.append(cells[8].find('span').get_text())
        J.append(cells[9].find(Title=True))
```
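The `len(cells)==10` check relies on the heading row not containing ten `<td>` cells. A minimal sketch of why that works, assuming the heading row is built from `<th>` cells (the real table's HTML is not shown in the question, so the markup below is hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical table: one heading row of <th> cells, one body row of <td> cells.
html = """
<table>
  <tr><th>Col 1</th><th>Col 2</th></tr>
  <tr><td>a</td><td>b</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Count the <td> cells in each row, as the loop in the question does.
counts = [len(row.find_all("td")) for row in soup.find_all("tr")]
print(counts)  # [0, 2]
```

With the real ten-column table, only body rows would produce 10 here, so a `<th>`-only heading row is skipped.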
The problem is that in cells 5, 6 and 7 the desired value is sometimes inside an `<a href>` tag and sometimes directly inside the `<td>` tag. The code runs, but list F, for example, looks something like this:
```
0
T-001
1
TD-U1B
2 BMA-D2-USA
3 BMU-D3-USA
4
```
Positions 2 and 3 are correct. They come from this branch:

```python
else:
    Output = '-'.join([item.get_text() for item in cells[5].find_all('a')])
    F.append(Output)
```
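That branch relies on `find_all('a')` returning every link in the cell, whereas cells 6 and 7 use `find('a', href=True)`, which returns only the first link. A minimal sketch on hypothetical markup (the real cell HTML is not shown), assuming a value such as `BMA-D2-USA` is stored as three separate links:

```python
from bs4 import BeautifulSoup

# Hypothetical cell: the value is split across three links.
cell = BeautifulSoup(
    '<td><a href="#1">BMA</a> <a href="#2">D2</a> <a href="#3">USA</a></td>',
    "html.parser",
).td

# find_all() returns every <a>, so the join sees all three texts.
joined = "-".join(a.get_text() for a in cell.find_all("a"))

# find() stops at the first matching <a>.
first_only = cell.find("a", href=True).get_text()

print(joined)      # BMA-D2-USA
print(first_only)  # BMA
```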
Positions 0 and 1 are incorrect. They come from:

```python
F.append(cells[5].find(text=True))
```
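`find(text=True)` returns the first text node inside the cell exactly as it appears in the HTML, including any newline before or after the value, which would explain the line break in positions 0 and 1. A minimal sketch, assuming the cell markup resembles the hypothetical `<td>` below (the real HTML is not shown), comparing it with `get_text(strip=True)`. It uses `string=True`, the name Beautiful Soup 4.4+ uses for the older `text=True` argument:

```python
from bs4 import BeautifulSoup

# Hypothetical cell markup: the value sits on its own line inside the <td>.
cell = BeautifulSoup("<td>\nT-001\n</td>", "html.parser").td

# find(string=True) returns the first text node verbatim,
# so the surrounding newlines come along with the value.
raw = cell.find(string=True)

# get_text(strip=True) strips whitespace from each text fragment before joining them.
clean = cell.get_text(strip=True)

print(repr(str(raw)))  # '\nT-001\n'
print(repr(clean))     # 'T-001'

# An empty cell: find() gives None, get_text() gives an empty string.
empty = BeautifulSoup("<td></td>", "html.parser").td
print(empty.find(string=True))           # None
print(repr(empty.get_text(strip=True)))  # ''
```

Under that assumption, using `cells[5].get_text(strip=True)` in this branch (or calling `.strip()` on the result when it is not `None`) would drop the stray newlines; note that empty cells would then yield `''` instead of `None`.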
python python-3.x beautifulsoup
edited Nov 13 '18 at 21:03 by Scott Hunter
asked Nov 13 '18 at 21:01 by nule
Hey @nule, welcome to SO, your code is a little cryptic, if you post a sample of your data and desired output that will help people better understand and help. Minimal, Complete, and Verifiable example
– Dalvenjia
Nov 13 '18 at 21:14
without knowing your HTML it's hard to fix
– ewwink
Nov 14 '18 at 10:39