Welcome to the Treehouse Community

Y B
14,136 Points

Tools: scraping and parsing PDFs
Can anyone suggest the best tools or libraries for automating logging in to a website, then scraping and parsing a downloaded PDF? (Any guides or tips would be useful too.)
Thanks.
1 Answer

Iain Simmons
Treehouse Moderator 32,305 Points
I've used dryscrape for other scraping and found it pretty useful. It uses WebKit, so it lets JavaScript and other dynamic content load.
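For the login step, a minimal dryscrape sketch might look like the one below. It sticks to the calls shown in the dryscrape README (`Session`, `set_attribute`, `visit`, `at_xpath`, `set`, `form().submit()`, `xpath`). The login path and the form field names `username` and `password` are hypothetical, so inspect the real login form for its actual field names.

```python
# Minimal dryscrape login sketch. The login path and the form field names
# "username"/"password" are hypothetical placeholders.
import sys


def field_xpath(name):
    """XPath matching any element whose name attribute equals `name`."""
    return '//*[@name="%s"]' % name


def login_and_list_pdf_links(base_url, login_path, username, password):
    """Log in through the site's HTML form, then return hrefs ending in .pdf."""
    import dryscrape  # needs dryscrape and webkit_server installed

    if 'linux' in sys.platform:
        # Headless Linux machines need a virtual X display (xvfb) for WebKit.
        dryscrape.start_xvfb()

    sess = dryscrape.Session(base_url=base_url)
    sess.set_attribute('auto_load_images', False)  # skip images to load faster
    sess.visit(login_path)

    # Fill in the (hypothetical) form fields and submit the enclosing form.
    user_field = sess.at_xpath(field_xpath('username'))
    user_field.set(username)
    sess.at_xpath(field_xpath('password')).set(password)
    user_field.form().submit()

    # Collect links on the post-login page that point at PDF files.
    return [link['href'] for link in sess.xpath('//a[@href]')
            if link['href'].lower().endswith('.pdf')]
```

The returned hrefs may be relative to `base_url`, so join them before downloading.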
You could probably use dryscrape in combination with something like pdfquery.
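Once the PDF is saved locally, pdfquery lets you select text with jQuery-style selectors. The sketch below follows the find-a-label-then-read-a-box pattern from the pdfquery README. The helper names, the label text you pass in, and the default 150 by 30 point box are assumptions that depend entirely on the PDF's layout.

```python
# Sketch of the pdfquery "label, then bounding box" pattern. Assumes pdfquery
# is installed; box size and label text are hypothetical and layout-specific.


def bbox_selector(x0, y0, x1, y1):
    """pdfquery selector for horizontal text lines inside a bounding box."""
    return 'LTTextLineHorizontal:in_bbox("%s, %s, %s, %s")' % (x0, y0, x1, y1)


def text_below_label(path, label_text, width=150, height=30):
    """Return the text in the box just below the line containing `label_text`."""
    import pdfquery

    pdf = pdfquery.PDFQuery(path)
    pdf.load()  # parses every page; load() also accepts specific page numbers

    label = pdf.pq('LTTextLineHorizontal:contains("%s")' % label_text)
    # PDF coordinates start at the bottom-left corner, so "below" is smaller y.
    x0 = float(label.attr('x0'))
    y0 = float(label.attr('y0'))
    return pdf.pq(bbox_selector(x0, y0 - height, x0 + width, y0)).text()
```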
There are some articles specifically about getting text out of a PDF, if that's also what you're looking to do:
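If plain text is all you need, one option (not necessarily what those articles cover) is `extract_text` from pdfminer.six, the library pdfquery is built on. This sketch downloads the file with the standard library's `urllib`, checks the `%PDF-` header, and extracts the text layer. The helper names are hypothetical, and passing in an opener that carries a logged-in session cookie is an assumption about how the site works. It only helps with PDFs that have a real text layer; scanned pages need OCR instead.

```python
# Sketch: download a PDF and pull out its text with pdfminer.six.
# Assumes pdfminer.six is installed; helper names and URLs are hypothetical.
import io
import urllib.request


def looks_like_pdf(data):
    """PDF files begin with the '%PDF-' header bytes."""
    return data[:5] == b"%PDF-"


def fetch_pdf(url, opener=None):
    """Download `url` and return its bytes, rejecting non-PDF responses.

    For files behind a login, pass an opener built with
    urllib.request.HTTPCookieProcessor that holds the session cookie.
    """
    opener = opener or urllib.request.build_opener()
    with opener.open(url) as resp:
        data = resp.read()
    if not looks_like_pdf(data):
        raise ValueError("response from %s is not a PDF" % url)
    return data


def pdf_text(data):
    """Extract the text layer of a PDF given as bytes."""
    from pdfminer.high_level import extract_text  # pdfminer.six
    return extract_text(io.BytesIO(data))
```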
Robert Richey
Courses Plus Student 16,352 Points
Sorry Y B, I don't have a good answer for you. The best I can offer is to do your own research with the help of Google.
Best Regards