10 Dec
Parsing HTML in PHP
Have you ever wanted to get a list of the links contained in an HTML page? Or a list of images, the title, or any other non-nested tag for that matter? Then this is the class for you!

Example:

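<?php
include("phpHTMLParser.php");

// Fetch the page and hand the raw HTML to the parser
$content = file_get_contents("http://www.onderstekop.nl/");
$parser = new phpHTMLParser($content);

// Only keep track of the 'a' and 'title' tags
$HTMLObject = $parser->parse_tags(array("a", "title"));

// Print the target and the contents of every link that has an href
$aTags = $HTMLObject->getTagsByName("a");
foreach ($aTags as $a) {
   if ($a->href != "") {
      echo $a->href . "<br/>";
      echo $a->innerHTML . "<br/><br/>";
   }
}
?>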


In this example the parser only keeps track of the 'a' and 'title' tags, and only the 'a' tag objects are requested afterwards. Running this code parses the HTML page obtained from http://www.onderstekop.nl/, returns an object containing the requested tags and outputs a list of links with their descriptions. This makes dealing with web pages pretty simple, because you can work with a page in an object-oriented way instead of having to go through it character by character or with sophisticated and error-prone regular expressions.
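
If you would rather not depend on an extra class, roughly the same link listing can be sketched with PHP's bundled DOM extension (DOMDocument), which some of the commenters below also suggest. Treat this as a sketch under a few assumptions: the sample markup in $html is made up so the snippet runs without a network connection, and it collects textContent (the link's plain text), which may differ from whatever phpHTMLParser stores in innerHTML.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not phpHTMLParser.
// The sample markup below is invented for illustration.
$html = '<html><head><title>Demo page</title></head><body>'
      . '<a href="/first">First <b>link</b></a>'
      . '<a name="anchor-only">No href here</a>'
      . '<a href="/second">Second link</a>'
      . '</body></html>';

// Real-world pages are rarely valid, so collect libxml warnings quietly
// instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Collect the href and the text of every <a> that actually has an href.
// getAttribute() returns an empty string when the attribute is missing.
$links = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    if ($href !== '') {
        $links[] = array('href' => $href, 'text' => $a->textContent);
    }
}

// The title is just the first <title> element, if there is one.
$titleNode = $dom->getElementsByTagName('title')->item(0);
$title = $titleNode ? $titleNode->textContent : '';

echo $title . "\n";
foreach ($links as $link) {
    echo $link['href'] . ' => ' . $link['text'] . "\n";
}
```

With the sample markup, this prints the page title followed by one "href => text" line for each of the two links that carry an href; the anchor without an href is skipped, just like in the example above.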

Some other features

Each tag object returned by a getTagsByName call currently supports href and innerHTML (as shown above), as well as id, src and innerTag (which returns all of the tag's attributes as a single string).

Another feature, most useful for dumping results and debugging, is the output() function available on the object returned by parse() or parse_tags() ($HTMLObject in our example). For even more debugging output, you can set $debug=True in the PHP file itself.

Download phpHTMLParser



16 Comments


1
RE: Parsing HTML in PHP
Written by: Dave
2007-12-12 10:38:30
You seriously wrote this? Yourself? Obvious case of NIH syndrome, I'd say. How about a DomDocument::loadHTML(), huh? Oh wait, that would imply you know what the DOM is...

2
RE: Parsing HTML in PHP
Written by: Rajasekaran
2009-10-05 07:11:28
thanks for sharing the coding


3
RE: Parsing HTML in PHP
Written by: Dell Charger
2009-10-31 13:55:05
Thanks for this, it's really helpful.

Did you write this class?


4
RE: Parsing HTML in PHP
Written by: Victor
2009-11-06 13:12:11
This code is SHIT!!! It does not work. Shame on you for publishing this kind of codes. UUUUUUUUUU!!!!


5
Do not use this code
Written by: Paco
2009-11-20 14:02:36
Use any of the standard PHP libraries for parsing XML-type documents.

The intention may be good, but the result is disastrous.


6
RE: Parsing HTML in PHP
Written by: prashant nalawade
2009-12-15 18:30:20
Nice script!
Like the other people here, I'm asking: is it yours?


7
RE: Parsing HTML in PHP
Written by: Tayfun Demirbilek
2010-01-25 10:41:42
Good work.
It was very useful for parsing an HTML form as part of a daily routine task.



8
RE: Parsing HTML in PHP
Written by: Mariuss
2010-02-07 12:51:56
Good work.
I used it to parse TD tags inside an HTML page.
There were some notice warnings to patch, but it works.


9
RE: Parsing HTML in PHP
Written by: Israr
2010-02-24 10:39:31
Great piece of code. Thanks a lot for sharing it.


10
RE: Parsing HTML in PHP
Written by: Salim
2010-03-23 10:26:27
Victor, if you could not get this script to work, that was your fault. It is really great and it works.


11
RE: Parsing HTML in PHP
Written by: amph
2010-03-28 19:34:07
I second Dave, please use the built-in HTML DOM functionality:

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');

It's 10 times as easy. Hope it helps someone.


12
RE: Parsing HTML in PHP
Written by: iguess
2010-05-25 22:27:43
This throws the fewest errors when I parse a URL...

Unfortunately it doesn't have getElemById... :)

Can you add that, please?


13
RE: Parsing HTML in PHP
Written by: Gustavo Bellino
2010-05-27 20:09:33
It would be very helpful to put more comments into the code. Nice code anyway.
Regards.


14
RE: Parsing HTML in PHP
Written by: StuR
2010-06-03 09:45:18
Thanks for sharing, this has saved me some time.


15
RE: Parsing HTML in PHP
Written by: Leo
2010-06-04 15:55:18
Thanks for this.


16
RE: Parsing HTML in PHP
Written by: Sumit
2010-06-09 16:45:20
Yes, I second the other fellows who advise you to use DOM.

