Parse XML or HTML

Here is a simple XML parser built on the DOM extension. It can fetch deeply nested values in just a few lines of code.

It uses XPath with a registered namespace prefix, so the XML document must declare its namespaces.

<?php
$xml = <<<EOT
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:other="http://other.w3.org/other" >
        <id>uYG7-sPwjFg</id>
        <published>2009-05-17T18:29:31.000Z</published>
</entry>
EOT;
$doc = new DOMDocument;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', "http://www.w3.org/2005/Atom");

$xpath_str = '//atom:entry/atom:published/text()';

$entries = $xpath->evaluate($xpath_str);

print $entries->item(0)->nodeValue ."\n";

?>
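
To see why the prefix matters, here is a minimal sketch that runs the same query with and without it. It assumes a trimmed copy of the Atom entry above; the variable names $plain and $prefixed are only illustrative.

```php
<?php
$xml = <<<EOT
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
    <id>uYG7-sPwjFg</id>
</entry>
EOT;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

// Unprefixed names only match elements in no namespace, so this finds nothing.
$plain = $xpath->query('//entry/id');

// The registered prefix maps to the document's default namespace.
$prefixed = $xpath->query('//atom:entry/atom:id');

echo $plain->length . "\n";                 // 0
echo $prefixed->length . "\n";              // 1
echo $prefixed->item(0)->nodeValue . "\n";  // uYG7-sPwjFg
?>
```

The prefix itself is arbitrary. What must match is the namespace URI passed to registerNamespace().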

You can also work with HTML:

<?php
$file = "htmlfilename.html";
$doc = new DOMDocument();    //DOM Object (XML/HTML)
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc); //XPATH Object
//$elements = $xpath->query("//*[@id]");  //for everything with an id
//$elements = $xpath->query("/html/body/div[@id='yourTagIdHere']");  //for node data in a selected id
$elements = $xpath->query("/*/*/div[@id='yourTagIdHere']"); // same as above with wildcards

if ($elements !== false) {         // query() returns false on an invalid expression, so check before entering the loop
  foreach ($elements as $element) {
    echo "<br/>[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}
?>
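
Real-world HTML is often not well formed, and loadHTML() then emits warnings. The sketch below shows one way to handle that, assuming the markup arrives as a string rather than a file. The $html sample and the id "main" are made up for illustration.

```php
<?php
// Hypothetical markup for illustration; note the unclosed <p> tag.
$html = '<html><body><div id="main"><p>Hello<span>World</span></div></body></html>';

libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();             // discard the collected warnings

$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='main']");

$text = '';
if ($elements !== false && $elements->length > 0) {
    // textContent concatenates the text of all descendant nodes.
    $text = $elements->item(0)->textContent;
}
echo $text . "\n";
?>
```

If you need to inspect the problems rather than discard them, libxml_get_errors() returns the collected list before it is cleared.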

Another simple way to walk an XML document, without hardcoding element names:

<?php
function list_xml($str) {
  $root = simplexml_load_string($str);
  list_node($root);
}

function list_node($node) {
  foreach ($node as $element) {
    echo $element. "\n";
    if ($element->children()) {
      echo "<br/>";
      list_node($element);
    }
  }
}
?>
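
If you would rather collect the values than echo them, the same recursion can fill an array. This is only a sketch: collect_leaves() is a hypothetical helper, not part of SimpleXML, and the sample XML string is made up.

```php
<?php
// Hypothetical helper: store each leaf value under its element path.
function collect_leaves(SimpleXMLElement $node, string $path, array &$out): void {
    foreach ($node->children() as $child) {
        $childPath = $path . '/' . $child->getName();
        if (count($child->children()) > 0) {
            // The element has child elements, so descend into it.
            collect_leaves($child, $childPath, $out);
        } else {
            // Leaf element: cast to string to get its text.
            $out[$childPath] = (string) $child;
        }
    }
}

$root = simplexml_load_string('<root><node>123</node><foo><bar>456</bar></foo></root>');
$leaves = [];
collect_leaves($root, $root->getName(), $leaves);
print_r($leaves);
?>
```

Note that repeated sibling elements share the same path, so in this sketch a later value overwrites an earlier one.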

Another approach, simple but with hardcoded element names. Be careful when using nested SimpleXML properties inside double-quoted strings: PHP parses only the first property access, so you need curly braces around the whole expression.

<?php
$xmlstring = '<root><node>123</node><foo><bar>456</bar></foo></root>'; 

$root = simplexml_load_string($xmlstring); 

echo "Node is: $root->node"; // Works: Node is: 123
echo "Bar is: $root->foo->bar"; // Doesn't work, outputs: Bar is: ->bar  (use curly braces to fix)
echo "Bar is: {$root->foo->bar}"; // Works: Bar is: 456

?>
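
Another way around the interpolation problem is to cast the element to a string first. A minimal sketch, reusing the same sample XML; the variable $bar is only illustrative:

```php
<?php
$root = simplexml_load_string('<root><node>123</node><foo><bar>456</bar></foo></root>');

// The cast turns the SimpleXMLElement into a plain PHP string.
$bar = (string) $root->foo->bar;

echo "Bar is: $bar\n";                       // simple variable, safe to interpolate
echo 'Bar is: ' . $root->foo->bar . "\n";    // concatenation also converts to string
?>
```

The cast is also useful when storing values in arrays or comparing them, since an uncast value is still a SimpleXMLElement object.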
