Parsing HTML tags with PHP Simple HTML DOM Parser
Jun 13, 2016 10:53 AM
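I tried PHP Simple HTML DOM Parser for parsing HTML pages and was pleased with it: it builds a DOM tree that makes it easy to parse the contents of an HTML document, which is handy for scraping.
Here is an example; you can also download the archive from SourceForge and look at the examples it contains:
Scraping data with PHP Simple HTML DOM Parser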
PHP Simple HTML DOM Parser, written in PHP5+, allows you to manipulate HTML in a very easy way. Because it supports invalid HTML, this parser is better than other PHP scripts that use complicated regexes to extract information from web pages.
Before getting the necessary info, a DOM should be created from either a URL or a file. The following script extracts links and images from a website:
// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');
// Extract links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
// Extract images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
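The listing above assumes the page loads and that every link carries an href. As a minimal sketch of a more defensive variant, the snippet below parses an inline string with str_get_html() so that it needs no network access. The helper name collect_links() and the sample markup are hypothetical. The check on the return value assumes a release of the parser in which str_get_html() returns false for input it cannot load.

```php
<?php
// Assumes simple_html_dom.php is on the include path.
require_once 'simple_html_dom.php';

// Hypothetical helper: return the distinct href values found in an HTML string.
function collect_links($markup)
{
    $html = str_get_html($markup);
    if (!$html) {
        // Nothing could be parsed (for example, empty input).
        return array();
    }

    $links = array();
    foreach ($html->find('a') as $element) {
        $href = $element->href;
        // A missing attribute does not come back as a string, so skip it.
        if (is_string($href) && $href !== '' && !in_array($href, $links, true)) {
            $links[] = $href;
        }
    }

    $html->clear(); // release the DOM once the values are copied out
    return $links;
}

print_r(collect_links('<a href="/a">A</a><a href="/b">B</a><a href="/a">A again</a>'));
```

Collecting the values into an array, rather than echoing them directly, makes it easier to deduplicate or filter the links before printing them.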
The parser can also be used to modify HTML elements:
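// Create DOM from string
// NOTE: the markup argument was lost from this listing; the two <div> elements below are an assumed stand-in.
$html = str_get_html('<div id="simple">Hello</div><div id="advanced">World</div>');
// Set the class of the second <div>
$html->find('div', 1)->class = 'bar';
// Replace the inner text of the <div> whose id is "simple"
$html->find('div[id=simple]', 0)->innertext = 'Foo';
// Output the modified HTML
echo $html;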
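The same attribute assignment can be applied to every element that find() returns. The sketch below is one possible illustration: it adds rel="nofollow" to each link in a fragment and returns the result as a string through save(). The helper name add_nofollow() and the sample markup are assumptions made for this example.

```php
<?php
// Assumes simple_html_dom.php is on the include path.
require_once 'simple_html_dom.php';

// Hypothetical helper: set rel="nofollow" on every <a> in an HTML fragment.
function add_nofollow($markup)
{
    $html = str_get_html($markup);
    if (!$html) {
        // Nothing to modify; hand the input back unchanged.
        return $markup;
    }

    foreach ($html->find('a') as $link) {
        $link->rel = 'nofollow'; // creates the attribute or overwrites an existing one
    }

    $result = $html->save(); // with no file path, save() just returns the document as a string
    $html->clear();
    return $result;
}

echo add_nofollow('<p><a href="/x">x</a> and <a href="/y">y</a></p>');
```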
Do you wish to retrieve content without any tags?
echo file_get_html('http://www.yahoo.com/')->plaintext;
In the package files of this parser (http://simplehtmldom.sourceforge.net/) you can find some scraping examples for digg, imdb and slashdot. Let’s create one that extracts the first 10 results (titles only) for the keyword “php” from Google:
$url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';
// Create DOM from URL
$html = file_get_html($url);
// Match all 'A' tags whose class attribute equals 'l'
foreach ($html->find('a[class=l]') as $key => $info)
{
    echo ($key + 1) . '. ' . $info->plaintext . "<br>\n";
}
NOTE: Make sure to include the parser before using any of its functions:
include 'simple_html_dom.php';
For more information on using the parser, consider checking the ‘PHP Simple HTML DOM Parser’ manual. To download the package files use the following URL: [url]
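The Google listing prints every matching link, while the text asks for the first 10 results only. A minimal sketch of one way to cap the count is shown below. It works on an inline string so that it runs without a network request. The helper name first_titles() and the sample markup are hypothetical, and the selector a[class=l] is simply the one used in the listing.

```php
<?php
// Assumes simple_html_dom.php is on the include path.
require_once 'simple_html_dom.php';

// Hypothetical helper: plain-text titles of the first $limit elements matching $selector.
function first_titles($markup, $selector, $limit = 10)
{
    $html = str_get_html($markup);
    if (!$html) {
        return array();
    }

    $titles = array();
    // find() without an index returns an array, so it can be sliced directly.
    foreach (array_slice($html->find($selector), 0, $limit) as $node) {
        $titles[] = trim($node->plaintext);
    }

    $html->clear();
    return $titles;
}

$sample = '<a class="l" href="#1">One</a><a class="l" href="#2">Two</a>'
        . '<a class="l" href="#3">Three</a><a href="#4">Other</a>';

foreach (first_titles($sample, 'a[class=l]', 2) as $key => $title) {
    echo ($key + 1) . '. ' . $title . "\n";
}
```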