
Josh Gold
12,206 Points

On the previous page it says "Captchas can be worked around w/ various technologies". Can someone elaborate a bit?

How exactly do scrapers get around captchas? I am particularly interested in the limitations of captchas from a defensive standpoint, as someone defending against certain types of bots. So don't hate me! Anyhow, at this point it's purely for my own interest, as I'm not defending any particular pages at the moment.

Thank you!

1 Answer

Captcha stands for "Completely Automated Public Turing test for telling Computers and Humans Apart."

However, programs become smarter all the time, so captchas need to be constantly updated to keep up with AI.

Captchas are also often used to train AI. It's no coincidence that modern captchas are usually related to traffic (click all the images containing a bus, car, crosswalk, sidewalk, traffic light, etc.). The labels people supply are widely believed to help train image-recognition models, such as those used in self-driving cars.

At some point, AI will be able to solve these captchas just as well as or better than most humans, and then new captchas will need to be made.

Most of the time, problems like these are tackled with AI that learns from examples (machine learning); in other words, the AI gets trained. It guesses more or less randomly at first, but it becomes increasingly good at guessing as it gets feedback on which guesses are right and which are wrong. It starts to learn which information is relevant and which isn't. Given enough examples, it will eventually figure out the trick.
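To make the "guess, get feedback, adjust" loop a bit more concrete, here is a minimal sketch in Python. It uses a perceptron, one of the simplest learning rules, on made-up toy data; it is not how any real captcha system or solver works, and those rely on far larger models. The names `train`, `predict`, and `samples`, as well as the feature values, are all assumptions for illustration.

```python
import random


def predict(weights, bias, features):
    """Guess 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(samples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Adjust weights using only right/wrong feedback on each guess."""
    rng = random.Random(seed)
    # Random starting weights, so the first guesses are essentially random.
    weights = [rng.uniform(-1, 1) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)  # 0 when right
            if error != 0:
                mistakes += 1
                # Nudge each weight toward the answer the feedback says was right.
                for i, x in enumerate(features):
                    weights[i] += learning_rate * error * x
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no wrong guesses: stop
            break
    return weights, bias


# Made-up toy data (an assumption): (feature_a, feature_b) -> label.
# Only feature_a actually determines the label; feature_b is noise.
samples = [
    ((1.0, 0.2), 1), ((0.9, 0.8), 1), ((1.0, 0.5), 1),
    ((0.1, 0.9), 0), ((0.0, 0.3), 0), ((0.2, 0.6), 0),
]

weights, bias = train(samples)
print([predict(weights, bias, f) for f, _ in samples])
```

After training, the guesses on the six toy examples match their labels (1, 1, 1, 0, 0, 0). The weights only change when a guess is wrong, which is the feedback step described above.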