
PHP

How to crawl a website that has the same URL for all its pages

Hello Everyone.

I have just finished the PHP course and let's say I loved it ;) To be specific, I loved the part about PHP + MySQL databases.

Since I work in the sports betting business as my main occupation, I would like to develop something related to it.

I have thought about starting with a simple PHP crawler that can check the odds a certain bookmaker offers for a certain sporting event.

The web is full of sites that do what I am trying to code, but since I do not know much I would like to hear opinions and suggestions.

But...

I got stuck quite early in the project. The reason?

I think the first thing I have to deal with is what kind of values I will get from the website I want to crawl, and here I have a problem. I have been able to download values when the URL is fully specified, like (totally random example) http://www.bets.com/id=12230 and so on, but it turns out most sites have just one URL for everything they offer.

So basically something like http://www.bets.com/bets

And this URL is the same for a ton of sports + leagues + games + markets.

I am wondering how I can access all that data?

Maybe Randy Hoyt can help? :D

Thanks

Vittorio

To make it clear:

I am at the point where I can get the URLs from most sites, but for example I want to target http://www.snai.it/scommesse (which is an Italian bookmaker) and I can't even get the URLs of that site...

There must be something I am missing...

I thought retrieving the URLs would be a good start, and then moving on to only the URLs I need would be the next step, but, I repeat, this site only has one URL for all its pages...

1 Answer

The URLs are not the issue I see here. The real work is downloading the data and parsing through it to retrieve the information you want. The only way I know to do this in PHP is with regular expressions, which are a way of matching patterns in text and pulling out the values that match. Since each website will have a different page layout, you would need a separate set of regular expressions for each site you want to retrieve information from. Lastly, regular expressions are one of the more difficult concepts to learn in programming, but once you learn them they are among the most powerful tools you can have for scraping data like you're trying to do.
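To make that a bit more concrete, here is a minimal sketch of the regular-expression approach using PHP's `preg_match_all`. The HTML snippet, the class names (`event`, `teams`, `odd`), and the function name `extractOdds` are all made up for illustration; a real bookmaker page will be laid out differently, so the patterns would have to be rewritten for each site.

```php
<?php
// Hypothetical markup standing in for a downloaded page.
// Real sites will use different tags and class names.
$html = <<<HTML
<table>
  <tr class="event"><td class="teams">Juventus - Milan</td><td class="odd">1.85</td><td class="odd">3.40</td><td class="odd">4.20</td></tr>
  <tr class="event"><td class="teams">Roma - Napoli</td><td class="odd">2.10</td><td class="odd">3.25</td><td class="odd">3.50</td></tr>
</table>
HTML;

// Hypothetical helper: returns one entry per event row found in $html.
function extractOdds(string $html): array
{
    $events = [];

    // First pass: one match per event row. Group 1 is the teams cell,
    // group 2 is the rest of the row (the odds cells).
    $rowPattern = '~<tr class="event">\s*<td class="teams">(.*?)</td>(.*?)</tr>~s';
    preg_match_all($rowPattern, $html, $rows, PREG_SET_ORDER);

    foreach ($rows as $row) {
        // Second pass: pull every odds value out of that row.
        preg_match_all('~<td class="odd">([\d.]+)</td>~', $row[2], $odds);

        $events[] = [
            'teams' => trim($row[1]),
            'odds'  => array_map('floatval', $odds[1]),
        ];
    }

    return $events;
}

$result = extractOdds($html);
print_r($result);
```

For a live page you would replace the hard-coded string with the downloaded HTML, for example `$html = file_get_contents($url);` (this needs `allow_url_fopen` to be enabled on your server). Patterns like these break as soon as the site changes its markup, so expect to maintain them per site.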