国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Home php教程 php手冊 PHP simple_html_dom.php+正則 采集文章代碼

PHP simple_html_dom.php+正則 采集文章代碼

Jun 13, 2016 pm 12:19 PM
html php simple code Include copy article regular collection

復(fù)制代碼 代碼如下:


//包含PHP Simple html Dom 類庫文件
include_once('./simplehtmldom/simple_html_dom.php');

//采集html
function getwebcontent($url){
$ch = curl_init();
$timeout = 10;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
$contents = trim(curl_exec($ch));
curl_close($ch);
return $contents;
}


//獲得標題和url
$string =
getwebcontent('http://www.babytree.com/learn/zhunbeihuaiyun/jijibeiyun/2');
//正則匹配

  • 獲取標題和地址
    preg_match_all ("/
  • (.*)/",
    $string, $out, PREG_SET_ORDER);

    foreach($out as $key => $value){
    $article['title'][] = $out[$key][2];
    $article['link'][] = "http://www.babytree.com/learn/article/".$out[$key][1];
    }

    //根據(jù)url獲取文章內(nèi)容
    foreach($article['link'] as $key=>$value){
    $html = file_get_html($value);
    $div = $html->find('div[id=pagenum_0]');
    $article[content][] = $div[0]->innertext;
    }
    //標題轉(zhuǎn)碼---真正用的時候不用這步--因為咱本來就要用utf8的
    //不轉(zhuǎn)碼還真不能保存成文件
    foreach($article[title] as $key=>$value){
    $article[title][$key] = iconv('utf-8', 'gbk', $value);//轉(zhuǎn)碼
    }
    //存入文件
    $num = count($article['title']);
    for($i=0; $ifile_put_contents("{$article[title][$i]}.txt", $article['content'][$i]);
    }

    /*本來想12點之前發(fā)的。。但小看一下都3點半了。。。 就算昨天的吧
    本來獲取文章內(nèi)容時用正則是最好的,速度也是最快的,
    奈何正則是好,但正則表達式是真難!于是乎小查了一下,
    網(wǎng)上也有很多人也在用PHP Simple Dom 雖然效率慢了點,但效果還是不錯的
    從包含類庫文件到寫入txt文件 大概是7/8就秒 還有帶于進一步優(yōu)化,特別是那獲取文章內(nèi)容時的正則,那個太惡心了
    大家可以小研究一下*/
    ?>
  • Statement of this Website
    The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

    Hot AI Tools

    Undress AI Tool

    Undress AI Tool

    Undress images for free

    Undresser.AI Undress

    Undresser.AI Undress

    AI-powered app for creating realistic nude photos

    AI Clothes Remover

    AI Clothes Remover

    Online AI tool for removing clothes from photos.

    Clothoff.io

    Clothoff.io

    AI clothes remover

    Video Face Swap

    Video Face Swap

    Swap faces in any video effortlessly with our completely free AI face swap tool!

    Hot Tools

    Notepad++7.3.1

    Notepad++7.3.1

    Easy-to-use and free code editor

    SublimeText3 Chinese version

    SublimeText3 Chinese version

    Chinese version, very easy to use

    Zend Studio 13.0.1

    Zend Studio 13.0.1

    Powerful PHP integrated development environment

    Dreamweaver CS6

    Dreamweaver CS6

    Visual web development tools

    SublimeText3 Mac version

    SublimeText3 Mac version

    God-level code editing software (SublimeText3)

    How to get the current session ID in PHP? How to get the current session ID in PHP? Jul 13, 2025 am 03:02 AM

    The method to get the current session ID in PHP is to use the session_id() function, but you must call session_start() to successfully obtain it. 1. Call session_start() to start the session; 2. Use session_id() to read the session ID and output a string similar to abc123def456ghi789; 3. If the return is empty, check whether session_start() is missing, whether the user accesses for the first time, or whether the session is destroyed; 4. The session ID can be used for logging, security verification and cross-request communication, but security needs to be paid attention to. Make sure that the session is correctly enabled and the ID can be obtained successfully.

    PHP get substring from a string PHP get substring from a string Jul 13, 2025 am 02:59 AM

    To extract substrings from PHP strings, you can use the substr() function, which is syntax substr(string$string,int$start,?int$length=null), and if the length is not specified, it will be intercepted to the end; when processing multi-byte characters such as Chinese, you should use the mb_substr() function to avoid garbled code; if you need to intercept the string according to a specific separator, you can use exploit() or combine strpos() and substr() to implement it, such as extracting file name extensions or domain names.

    How do you perform unit testing for php code? How do you perform unit testing for php code? Jul 13, 2025 am 02:54 AM

    UnittestinginPHPinvolvesverifyingindividualcodeunitslikefunctionsormethodstocatchbugsearlyandensurereliablerefactoring.1)SetupPHPUnitviaComposer,createatestdirectory,andconfigureautoloadandphpunit.xml.2)Writetestcasesfollowingthearrange-act-assertpat

    How to split a string into an array in PHP How to split a string into an array in PHP Jul 13, 2025 am 02:59 AM

    In PHP, the most common method is to split the string into an array using the exploit() function. This function divides the string into multiple parts through the specified delimiter and returns an array. The syntax is exploit(separator, string, limit), where separator is the separator, string is the original string, and limit is an optional parameter to control the maximum number of segments. For example $str="apple,banana,orange";$arr=explode(",",$str); The result is ["apple","bana

    JavaScript Data Types: Primitive vs Reference JavaScript Data Types: Primitive vs Reference Jul 13, 2025 am 02:43 AM

    JavaScript data types are divided into primitive types and reference types. Primitive types include string, number, boolean, null, undefined, and symbol. The values are immutable and copies are copied when assigning values, so they do not affect each other; reference types such as objects, arrays and functions store memory addresses, and variables pointing to the same object will affect each other. Typeof and instanceof can be used to determine types, but pay attention to the historical issues of typeofnull. Understanding these two types of differences can help write more stable and reliable code.

    Using std::chrono in C Using std::chrono in C Jul 15, 2025 am 01:30 AM

    std::chrono is used in C to process time, including obtaining the current time, measuring execution time, operation time point and duration, and formatting analysis time. 1. Use std::chrono::system_clock::now() to obtain the current time, which can be converted into a readable string, but the system clock may not be monotonous; 2. Use std::chrono::steady_clock to measure the execution time to ensure monotony, and convert it into milliseconds, seconds and other units through duration_cast; 3. Time point (time_point) and duration (duration) can be interoperable, but attention should be paid to unit compatibility and clock epoch (epoch)

    How does PHP handle Environment Variables? How does PHP handle Environment Variables? Jul 14, 2025 am 03:01 AM

    ToaccessenvironmentvariablesinPHP,usegetenv()orthe$_ENVsuperglobal.1.getenv('VAR_NAME')retrievesaspecificvariable.2.$_ENV['VAR_NAME']accessesvariablesifvariables_orderinphp.iniincludes"E".SetvariablesviaCLIwithVAR=valuephpscript.php,inApach

    How to pass a session variable to another page in PHP? How to pass a session variable to another page in PHP? Jul 13, 2025 am 02:39 AM

    In PHP, to pass a session variable to another page, the key is to start the session correctly and use the same $_SESSION key name. 1. Before using session variables for each page, it must be called session_start() and placed in the front of the script; 2. Set session variables such as $_SESSION['username']='JohnDoe' on the first page; 3. After calling session_start() on another page, access the variables through the same key name; 4. Make sure that session_start() is called on each page, avoid outputting content in advance, and check that the session storage path on the server is writable; 5. Use ses

    See all articles