

Josh Gold

The previous page says "Captchas can be worked around w/ various technologies". Can someone elaborate a bit?

How exactly do scrapers get around captchas? I'm particularly interested in the limitations of captchas from a defensive standpoint, as a defender against certain types of bots. So don't hate me! Anyhow, at this point it's purely for my own interest, as I'm not defending any particular pages at the moment.

Thank you!

1 Answer

Captcha stands for "Completely Automated Public Turing test to tell Computers and Humans Apart."

However, computers and programs get smarter all the time, so captchas need to be updated constantly to keep up with ever-smarter AI.

Captchas are often also used to train AI. It's no coincidence that modern captchas usually involve traffic scenes (click all the images containing a bus, car, crosswalk, sidewalk, traffic light, etc.): the answers people give are used as labeled data to train self-driving cars.

At some point, AI will be able to solve these captchas as well as or better than most humans, and then new captchas will need to be made.

For problems like these, people usually use self-learning AI (machine learning); in other words, the AI gets trained. It starts out guessing more or less randomly, but it gets increasingly good at guessing once it receives feedback on which guesses are right and which are wrong. Over time it learns which information is relevant and which isn't, and with enough examples it eventually figures out the trick.
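To make the "guess, get feedback, improve" loop concrete, here is a minimal sketch of a perceptron, about the simplest model that learns this way. It is only an illustration of learning from feedback, not how real captcha solvers are built (those typically use neural networks trained on large sets of labeled images). The toy data and the names `make_toy_data`, `train_perceptron`, `predict`, and `accuracy` are hypothetical: each example has two meaningful features `a` and `b` plus one irrelevant `noise` feature, and the label is 1 when `a` is larger than `b`.

```python
import random


def predict(weights, bias, features):
    """Return the model's guess: 1 if the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train_perceptron(examples, lr=0.1, max_epochs=1000, seed=0):
    """Train a perceptron on (features, label) pairs with labels 0 or 1."""
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    # Random starting weights: before training, guesses are essentially random.
    weights = [rng.uniform(-1, 1) for _ in range(n_features)]
    bias = rng.uniform(-1, 1)

    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            # Feedback: error is 0 for a right guess, +1 or -1 for a wrong one.
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                # Nudge each weight toward the right answer; weights on
                # features that actually predict the label grow over time.
                for i, x in enumerate(features):
                    weights[i] += lr * error * x
                bias += lr * error
        if mistakes == 0:
            break  # a full pass with no wrong guesses: stop training
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples the model labels correctly."""
    correct = sum(predict(weights, bias, f) == y for f, y in examples)
    return correct / len(examples)


def make_toy_data(n, seed=1):
    """Hypothetical toy data: label is 1 when a > b; `noise` is irrelevant."""
    rng = random.Random(seed)
    examples = []
    while len(examples) < n:
        a, b, noise = rng.random(), rng.random(), rng.random()
        if abs(a - b) < 0.3:
            continue  # keep a clear gap between the two classes
        examples.append(([a, b, noise], 1 if a > b else 0))
    return examples


if __name__ == "__main__":
    data = make_toy_data(200)
    weights, bias = train_perceptron(data)
    print("trained accuracy:", accuracy(weights, bias, data))
    print("weights (a, b, noise):", [round(w, 2) for w in weights])
```

Because the model starts from random weights, its first predictions are essentially coin flips; each wrong guess nudges the weights toward the correct answer, and training stops after a full pass with no mistakes. The clear gap between the two toy classes guarantees the perceptron eventually reaches such a pass; on messier, real-world data that may never happen, which is why the loop is capped by `max_epochs`.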