"No module named 'protego'" error

This is the code I entered in the Spyder IDE:

```python
import scrapy


class HorseSpider(scrapy.Spider):
    name = "ike"

    def start_requests(self):
        # Pages to download from the Treehouse horse-land sample site
        urls = [
            "https://treehouse-projects.github.io/horse-land/index.html",
            "https://treehouse-projects.github.io/horse-land/mustang.html",
        ]
        return [scrapy.Request(url=url, callback=self.parse) for url in urls]

    def parse(self, response):
        # Save each response body to a local file named after the page
        url = response.url
        page = url.split("/")[-1]
        filename = "horses-{}".format(page)
        print("URL is: {}".format(url))
        with open(filename, "wb") as file:
            file.write(response.body)
        print("Saved file {}".format(filename))
```

But this is the output I am getting in the terminal. Please help:

```
C:\Users\User\Desktop\ScrapyTests\AraneaSpyder\AraneaSpyder\spiders>scrapy crawl ike
2021-01-07 20:13:18 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: AraneaSpyder)
2021-01-07 20:13:18 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1i  8 Dec 2020), cryptography 3.3.1, Platform Windows-10-10.0.18362-SP0
2021-01-07 20:13:18 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-01-07 20:13:18 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'AraneaSpyder',
 'NEWSPIDER_MODULE': 'AraneaSpyder.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['AraneaSpyder.spiders']}
2021-01-07 20:13:18 [scrapy.extensions.telnet] INFO: Telnet Password: 14ae6acfbbea13ed
2021-01-07 20:13:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
2021-01-07 20:13:19 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 36, in from_crawler
    return cls(crawler)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\downloadermiddlewares\robotstxt.py", line 32, in __init__
    self._parserimpl.from_crawler(self.crawler, b'')
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\robotstxt.py", line 124, in from_crawler
    o = cls(robotstxt_body, spider)
  File "C:\Users\User\anaconda3\envs\Scrapy enviroment\lib\site-packages\scrapy\robotstxt.py", line 116, in __init__
    from protego import Protego
builtins.ModuleNotFoundError: No module named 'protego'
```

[MOD: added ```python formatting -cf]

NB: protego is already installed
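
If protego was installed into a different conda environment than the one the `scrapy` command runs from, the import would still fail: the traceback ends inside Scrapy's robots.txt middleware, which imports protego because `ROBOTSTXT_OBEY` is `True`. A minimal check, assuming it is run with the environment from the traceback activated:

```python
# Quick check: run this with the SAME interpreter the "scrapy" command
# uses (activate the conda environment from the traceback first).
import sys

print(sys.executable)    # which Python is actually running

import protego           # raises ModuleNotFoundError if protego is missing
print(protego.__file__)  # from this environment; otherwise prints its path
```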

1 Answer

ryantalbot2
12,536 Points

Do you have the package installed already? In PyCharm, go to the top left, then Preferences, click the plus sign, then search for and add protego.
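
Note that the traceback above shows an Anaconda environment rather than PyCharm, so the command-line equivalent would be to install protego into that specific environment. A sketch, with the environment name copied verbatim from the paths in the traceback:

```
:: From an Anaconda Prompt. The environment name "Scrapy enviroment"
:: (spelling included) is copied from the site-packages paths in the
:: traceback; it is quoted because the name contains a space.
conda activate "Scrapy enviroment"
python -m pip install protego
scrapy crawl ike
```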