BeautifulSoup not letting me get the text
I'm trying to get all the text inside the <article> tag.
The text is printed to the console, but it doesn't get written to the .txt file.
It works with body.text, but not with article.text. I don't know what to do.
import bs4 as bs
import urllib.request
#import re
sauce = urllib.request.urlopen('http://www.bodoniparavia.it/index.php/it/amministrazione-trasparente/bandi-di-gara-e-contratti.html')
soup = bs.BeautifulSoup(sauce,'lxml')
body = soup.body
article = body.find('article')
article1 = article.text
print(article1)
x = open('file.txt','w')
x.write(article1)
x.close
python beautifulsoup
asked Nov 11 at 18:10
Florentin Udrea
145
The html on that page might be malformed. The lxml parser is quite strict, and will simply ignore parts of the document if the html is invalid. Have you tried using a different parser? crummy.com/software/BeautifulSoup/bs4/doc/…
– Håken Lid
Nov 11 at 18:15
It prints it to me, tho. I get all the text and everything is fine, but it doesn't get into the txt file
– Florentin Udrea
Nov 11 at 18:24
Don't know. x.close should be x.close(). But it ought to work even if you didn't close the file. What type is article.text? Maybe try x.write(article.get_text())?
– Håken Lid
Nov 11 at 18:32
It works for me. No adjustments needed. (Not even an error because of that x.close, but that may be coincidence.)
– usr2564301
Nov 11 at 21:55
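To illustrate the x.close versus x.close() point from the comments, here is a minimal sketch. The temporary path and the sample string are hypothetical stand-ins, not part of the original script. It shows that x.close without parentheses only references the method and leaves the file open, while x.close() actually closes it and flushes the buffered text.

```python
import os
import tempfile

# Hypothetical temporary location, used so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "file.txt")

x = open(path, "w", encoding="utf-8")
x.write("some text")

x.close  # only references the bound method; nothing is called
closed_without_call = x.closed  # the file is still open here

x.close()  # actually closes the file and flushes buffered data
closed_after_call = x.closed

# Read the file back to confirm the text reached the disk.
with open(path, encoding="utf-8") as f:
    content = f.read()

print(closed_without_call, closed_after_call, content)
```

Whether the missing call matters in practice depends on when the interpreter happens to flush and close the file on its own, which may be why the original script worked for some people.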
1 Answer
accepted
It seems to be working fine for me, but try adding encoding='utf-8' to the open() call. The code would then look like this:
import bs4 as bs
import urllib.request
#import re
sauce = urllib.request.urlopen('http://www.bodoniparavia.it/index.php/it/amministrazione-trasparente/bandi-di-gara-e-contratti.html')
soup = bs.BeautifulSoup(sauce,'lxml')
body = soup.body
article = body.find('article')
article1 = article.text
print(article1)
x = open('file.txt','w',encoding = 'utf-8')
x.write(article1)
x.close()
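As a variation on the code above, the file-writing step could also be wrapped in a with block, so the file is closed even if the write fails. This is only a sketch: save_text is a hypothetical helper, sample_text is a made-up stand-in for article.text, and the temporary path is there so the example runs without the network or bs4.

```python
import os
import tempfile


def save_text(text, path, encoding="utf-8"):
    """Write text to path using an explicit encoding."""
    # The with block closes (and flushes) the file even if write() raises.
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


# Hypothetical stand-in for article.text, with non-ASCII characters.
sample_text = "Attività negoziale: modalità e qualità"

# A temporary directory keeps the example from touching real files.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
save_text(sample_text, path)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    roundtrip = f.read()

print(roundtrip == sample_text)
```

In the original script this would amount to calling save_text(article1, 'file.txt') in place of the three lines that open, write and close the file.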
answered Nov 11 at 21:16
Nate Mahalingam
263
Adding the encoding does not do anything else for me, although I do admit it might be that my system is set to use UTF-8 by default. Or, since that page's metadata says it is content="text/html; charset=utf-8", possibly the text is passed on unchanged.
– usr2564301
Nov 11 at 21:58
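One way to check the "system default" guess from this comment is to ask Python which encoding it would normally pick when open() gets no encoding argument. The sketch below does that, and also shows how a character can be fine in UTF-8 yet impossible to encode in a legacy code page. The choice of cp1252 and the sample string are assumptions for illustration only; nothing in the thread confirms which encoding the asker's system used.

```python
import locale

# Encoding that open() typically falls back to when none is given
# (it depends on the platform and locale settings).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical sample: the arrow character has no cp1252 representation.
sample_text = "bandi \u2192 contratti"

try:
    sample_text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    # Writing this text to a file opened with cp1252 would fail the same way.
    cp1252_ok = False

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = sample_text.encode("utf-8")

print(cp1252_ok, len(utf8_bytes))
```

If the default printed on a given machine is already UTF-8, passing encoding='utf-8' explicitly should make no difference there, which would match what this comment describes.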
This got it working for me :) Thanks
– Florentin Udrea
Nov 13 at 17:49