Scrapy - Simple div[@class] response.xpath attribute not returning data

I have written some Scrapy code to obtain HTML links from Indeed search results. My start URL is an HTTP address that returns a list of job ads, and I am trying to scrape the URL and the job title for each job shown on the page. My problem appears to be the titles = response.xpath selector. If I use an attribute specific to a single job I get data, but with the @class attribute shown in the code below I get nothing (not even the column headers), even though the element it targets encloses everything I need. Any help is welcome, as I am just a beginner.

I'm outputting to a CSV file and I've used this code successfully elsewhere, so I'm wondering if it is something about the way the target page has been coded. It's driving me nuts!



from scrapy.spiders import Spider
from scrapy.selector import Selector
from ICcom4.items import Scrape4Item
from scrapy.linkextractors import LinkExtractor
from scrapy.utils.response import get_base_url
from scrapy.spiders import CSVFeedSpider
import requests

class MySpider(Spider):
    name = "Scrape4"
    allowed_domains = ["indeed.co.uk"]

    start_urls = ['http://www.indeed.co.uk/jobs?as_and=a&as_phr=&as_any=&as_not=IT+construction&as_ttl=Project+Manager&as_cmp=&jt=contract&st=&salary=%C2%A310K-%C2%A3999K&radius=25&l=&fromage=2&limit=50&sort=date&psf=advsrch',]

    def parse(self, response):
        titles = response.xpath('//div[@class="jobsearch-SerpJobCard row result clickcard"]')

        items = []
        for titles in titles:
            item = Scrape4Item()
            base_url = get_base_url(response)
            home_url = ("http://www.indeed.co.uk")
            item['_pageURL'] = base_url
            item['role_titletext'] = titles.xpath('//h2/a/text()').extract()

            items.append(item)
        return items


Thanks for the guidance, Elena, but I'm afraid your suggestions made no difference: I still get no data returned. I have resolved the duplicate variable (for titles in titles1), which I tested successfully as a standalone change, but the other suggestions made no difference. I also tried running the scrape with only the page URL being returned, and it still didn't work. The revised example is below.



from scrapy.spiders import Spider
from scrapy.selector import Selector
from ICcom4.items import Scrape4Item
from scrapy.linkextractors import LinkExtractor
from scrapy.utils.response import get_base_url
from scrapy.spiders import CSVFeedSpider
import requests

class MySpider(Spider):
    name = "Scrape4"
    allowed_domains = ["indeed.co.uk"]

    start_urls = ['http://www.indeed.co.uk/jobs?as_and=a&as_phr=&as_any=&as_not=IT+construction&as_ttl=Project+Manager&as_cmp=&jt=contract&st=&salary=%C2%A310K-%C2%A3999K&radius=25&l=&fromage=2&limit=50&sort=date&psf=advsrch',]

    def parse(self, response):
        titles1 = response.css('div.jobsearch-SerpJobCard.row.result.clickcard')
        # also tried as titles = response.css('div.jobsearch-SerpJobCard row result clickcard')

        items = []
        for titles in titles1:
            item = Scrape4Item()
            base_url = get_base_url(response)
            home_url = ("http://www.indeed.co.uk")
            item['_pageURL'] = base_url
            item['role_titletext'] = titles.xpath('.//h2/a/text()').extract()
            # also tried as item['role_titletext'] = titles.css('h2 a::text').extract()
            items.append(item)
        return items


EDIT:
Thank you, Thiago. That's cracked it! You're a superstar!
Thanks to you and Elena for having patience with a newbie.
To complete the circle for anybody else, the final code that worked for me is below. It returns the search page URL and the job title :-)



from scrapy.spiders import Spider
from scrapy.selector import Selector
from ICcom4.items import Scrape4Item
from scrapy.linkextractors import LinkExtractor
from scrapy.utils.response import get_base_url
from scrapy.spiders import CSVFeedSpider
import requests

class MySpider(Spider):
    name = "Scrape4"
    allowed_domains = ["indeed.co.uk"]
    start_urls = ['http://www.indeed.co.uk/jobs?as_and=a&as_phr=&as_any=&as_not=IT+construction&as_ttl=Project+Manager&as_cmp=&jt=contract&st=&salary=%C2%A310K-%C2%A3999K&radius=25&l=&fromage=2&limit=50&sort=date&psf=advsrch',]

    def parse(self, response):
        titles = response.css('.jobsearch-SerpJobCard')
        items = []
        for title in titles:  # loop variable named `title`, matching its use below
            item = Scrape4Item()
            base_url = get_base_url(response)
            home_url = ("http://www.indeed.co.uk")
            item['_pageURL'] = base_url
            item['role_titletext'] = title.xpath('.//h2/a/@title').extract()
            items.append(item)
        return items


python scrapy


edited Jan 17 at 21:37 by Thiago Curvelo

asked Nov 14 '18 at 21:47 by Jamwg

  • Try response.css('div.jobsearch-SerpJobCard.row.result.clickcard') if you want to match on all the classes, although you can use fewer of them. You also have a duplicate variable in for titles in titles:. And the extraction is wrong: use .xpath('.//h2/a/text()').extract() or .css('h2 a::text').extract()

    – vezunchik
    Nov 15 '18 at 11:58


1 Answer

I noticed that there is no clickcard class in the downloaded HTML, but it is there after the page loads, so it is most likely added by JavaScript.
Since Scrapy doesn't execute JavaScript, when a selector fails unexpectedly it is worth checking the page source (rather than the browser's 'Inspect element' view).
Besides that, a shorter selector like '.jobsearch-SerpJobCard' does the job.



Regarding the question in the title: to extract attribute data you can use xpath('.//div/@class') or css('div::attr(class)'). For example:



def parse(self, response):
    titles = response.css('.jobsearch-SerpJobCard')
    for title in titles:
        item = {}
        item['role_titletext'] = title.xpath('.//h2/a/@title').get()
        # or
        # item['role_titletext'] = title.css('h2 a::attr(title)').get()
        yield item

edited Jan 17 at 21:38

answered Nov 16 '18 at 3:28 by Thiago Curvelo