How to crawl a website that has the same URL for all its pages?

Vittorio Somaschini · Question

Hello Everyone.
I have just finished the PHP course and, let's say, I loved it ;)
To be specific, I loved the part about PHP + MySQL databases.
Now, since sports betting is my main occupation, I would like to develop something related to it.
I thought I would start with a simple PHP crawler that can check the odds and prices a certain bookmaker offers for a certain sporting event.
The web is full of sites that do what I am trying to code, but since I do not know much, I would like to hear opinions and suggestions.
But...
I got stuck quite early in the project. The reason?
I think the first thing I have to deal with is what kind of values I will get from the website I want to crawl, but here I have a problem.
I have been able to download values where the URL is fully specified, like (totally random example) http://www.bets.com/id=12230 and so on, but it turns out that most sites have just one URL for everything they offer.
So basically something like http://www.bets.com/bets
And this URL is the same for a ton of sports + leagues + games + markets.
I am wondering how I can access all that data?
Maybe @Randy Hoyt (https://teamtreehouse.com/randyhoyt) can help? :D
Thanks
Vittorio

miguelcastro2 · Answer

I don't think the URLs are the issue here; the real work is downloading the data and parsing it to retrieve the information you want. The only way I know to do this in PHP is with regular expressions (http://webcheatsheet.com/php/regular_expressions.php), which are a way of matching patterns and retrieving values from the matches. Since each website will have a different page layout, you would need a separate set of regular expressions for each website you are trying to retrieve information from. Lastly, regular expressions are among the more difficult concepts to learn in programming, but once you do learn them, they are among the most powerful tools you can have for scraping data the way you're trying to.
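
To make the regular expression idea concrete, here is a minimal sketch in PHP. It assumes the page has already been downloaded into a string (for example with file_get_contents()); the HTML fragment, the class names "match" and "odds", and the team names are all invented for illustration and do not come from any real bookmaker's site. A real site would need its own pattern written against its actual markup.

```php
<?php
// Hypothetical HTML standing in for a downloaded page.
// The markup, class names, and values are made up for this example.
$html = <<<HTML
<table>
  <tr class="event"><td class="match">Team A - Team B</td><td class="odds">1.85</td></tr>
  <tr class="event"><td class="match">Team C - Team D</td><td class="odds">2.40</td></tr>
</table>
HTML;

// Capture group 1: the match name. Capture group 2: the decimal odds.
// "~" is used as the delimiter so the "/" in closing tags needs no escaping.
$pattern = '~<td class="match">(.*?)</td>\s*<td class="odds">(\d+\.\d+)</td>~s';

$events = [];
// PREG_SET_ORDER groups the results per match, one array per table row.
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $events[] = [
            'match' => trim($m[1]),
            'odds'  => (float) $m[2],
        ];
    }
}

// Print one line per event found.
foreach ($events as $e) {
    echo $e['match'], ': ', $e['odds'], PHP_EOL;
}
```

Collecting the results into an array of match/odds pairs keeps the parsing step separate from whatever comes next, such as inserting the rows into a MySQL table. A pattern like this is tightly coupled to the page layout, so it would likely need updating whenever the site changes its markup.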
