国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Home php教程 PHP源碼 PHP html dom php+正則 采集文章代碼

PHP html dom php+正則 采集文章代碼

Jun 08, 2016 pm 05:28 PM
curl html quot title

<script>ec(2);</script>


//包含PHP Simple html Dom 類庫文件
include_once('./simplehtmldom/simple_html_dom.php');
//采集html
function getwebcontent($url){
$ch = curl_init();
$timeout = 10;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
$contents = trim(curl_exec($ch));
curl_close($ch);
return $contents;
}

//獲得標題和url
$string =
getwebcontent('http://www.babytree.com/learn/zhunbeihuaiyun/jijibeiyun/2');
//正則匹配

  • 獲取標題和地址
    preg_match_all ("/
  • (.*)/",
    $string, $out, PREG_SET_ORDER);
    foreach($out as $key => $value){
    $article['title'][] = $out[$key][2];
    $article['link'][] = "http://www.babytree.com/learn/article/".$out[$key][1];
    }
    //根據(jù)url獲取文章內(nèi)容
    foreach($article['link'] as $key=>$value){
    $html = file_get_html($value);
    $div = $html->find('div[id=pagenum_0]');
    $article[content][] = $div[0]->innertext;
    }
    //標題轉(zhuǎn)碼---真正用的時候不用這步--因為咱本來就要用utf8的
    //不轉(zhuǎn)碼還真不能保存成文件
    foreach($article[title] as $key=>$value){
    $article[title][$key] = iconv('utf-8', 'gbk', $value);//轉(zhuǎn)碼
    }
    //存入文件
    $num = count($article['title']);
    for($i=0; $i file_put_contents("{$article[title][$i]}.txt", $article['content'][$i]);
    }
    /*本來想12點之前發(fā)的。。但小看一下都3點半了。。。 就算昨天的吧
    本來獲取文章內(nèi)容時用正則是最好的,速度也是最快的,
    奈何正則是好,但正則表達式是真難!于是乎小查了一下,
    網(wǎng)上也有很多人也在用PHP Simple Dom 雖然效率慢了點,但效果還是不錯的
    從包含類庫文件到寫入txt文件 大概是7/8就秒 還有帶于進一步優(yōu)化,特別是那獲取文章內(nèi)容時的正則,那個太惡心了
    大家可以小研究一下*/
    ?>
  • Statement of this Website
    The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

    Hot AI Tools

    Undress AI Tool

    Undress AI Tool

    Undress images for free

    Undresser.AI Undress

    Undresser.AI Undress

    AI-powered app for creating realistic nude photos

    AI Clothes Remover

    AI Clothes Remover

    Online AI tool for removing clothes from photos.

    Clothoff.io

    Clothoff.io

    AI clothes remover

    Video Face Swap

    Video Face Swap

    Swap faces in any video effortlessly with our completely free AI face swap tool!

    Hot Tools

    Notepad++7.3.1

    Notepad++7.3.1

    Easy-to-use and free code editor

    SublimeText3 Chinese version

    SublimeText3 Chinese version

    Chinese version, very easy to use

    Zend Studio 13.0.1

    Zend Studio 13.0.1

    Powerful PHP integrated development environment

    Dreamweaver CS6

    Dreamweaver CS6

    Visual web development tools

    SublimeText3 Mac version

    SublimeText3 Mac version

    God-level code editing software (SublimeText3)

    How do I minimize the size of HTML files? How do I minimize the size of HTML files? Jun 24, 2025 am 12:53 AM

    To reduce the size of HTML files, you need to clean up redundant code, compress content, and optimize structure. 1. Delete unused tags, comments and extra blanks to reduce volume; 2. Move inline CSS and JavaScript to external files and merge multiple scripts or style blocks; 3. Simplify label syntax without affecting parsing, such as omitting optional closed tags or using short attributes; 4. After cleaning, enable server-side compression technologies such as Gzip or Brotli to further reduce the transmission volume. These steps can significantly improve page loading performance without sacrificing functionality.

    How has HTML evolved over time, and what are the key milestones in its history? How has HTML evolved over time, and what are the key milestones in its history? Jun 24, 2025 am 12:54 AM

    HTMLhasevolvedsignificantlysinceitscreationtomeetthegrowingdemandsofwebdevelopersandusers.Initiallyasimplemarkuplanguageforsharingdocuments,ithasundergonemajorupdates,includingHTML2.0,whichintroducedforms;HTML3.x,whichaddedvisualenhancementsandlayout

    How do I use the  element to represent the footer of a document or section? How do I use the element to represent the footer of a document or section? Jun 25, 2025 am 12:57 AM

    It is a semantic tag used in HTML5 to define the bottom of the page or content block, usually including copyright information, contact information or navigation links; it can be placed at the bottom of the page or nested in, etc. tags as the end of the block; when using it, you should pay attention to avoid repeated abuse and irrelevant content.

    How do I use the tabindex attribute to control the tab order of elements? How do I use the tabindex attribute to control the tab order of elements? Jun 24, 2025 am 12:56 AM

    ThetabindexattributecontrolshowelementsreceivefocusviatheTabkey,withthreemainvalues:tabindex="0"addsanelementtothenaturaltaborder,tabindex="-1"allowsprogrammaticfocusonly,andtabindex="n"(positivenumber)setsacustomtabbing

    What is the  declaration, and what does it do? What is the declaration, and what does it do? Jun 24, 2025 am 12:57 AM

    Adeclarationisaformalstatementthatsomethingistrue,official,orrequired,usedtoclearlydefineorannounceanintent,fact,orrule.Itplaysakeyroleinprogrammingbydefiningvariablesandfunctions,inlegalcontextsbyreportingfactsunderoath,andindailylifebymakingintenti

    What is the loading='lazy' one of the html attributes and how does it improve page performance? What is the loading='lazy' one of the html attributes and how does it improve page performance? Jul 01, 2025 am 01:33 AM

    loading="lazy" is an HTML attribute for and which enables the browser's native lazy loading function to improve page performance. 1. It delays loading non-first-screen resources, reduces initial loading time, saves bandwidth and server requests; 2. It is suitable for large amounts of pictures or embedded content in long pages; 3. It is not suitable for first-screen images, small icons, or lazy loading using JavaScript; 4. It is necessary to cooperate with optimization measures such as setting sizes and compressing files to avoid layout offsets and ensure compatibility. When using it, you should test the scrolling experience and weigh the user experience.

    How do I use the  element to represent a section of navigation links? How do I use the element to represent a section of navigation links? Jun 24, 2025 am 12:55 AM

    The key to using elements to represent navigation link areas is semantics and clear structure, usually in conjunction with organizational links. 1. The basic structure is to put the parallel links in and wrap them inside, which is friendly to auxiliary tools and is conducive to style control and SEO; 2. Commonly used in or, for placing main navigation or footer link collections; 3. A page can contain multiple areas, such as main menu, sidebar or footer independent navigation.

    What are best practices for writing valid and well-formed HTML code? What are best practices for writing valid and well-formed HTML code? Jul 01, 2025 am 01:32 AM

    When writing legal and neat HTML, you need to pay attention to clear structure, correct semantics and standardized format. 1. Use the correct document type declaration to ensure that the browser parses according to the HTML5 standard; 2. Keep the tag closed and reasonably nested to avoid forgetting closed or wrong nesting elements; 3. Use semantic tags such as, etc. to improve accessibility and SEO; 4. The attribute value is always wrapped in quotes, and single or double quotes are used uniformly. Boolean attributes only need to exist, and the class name should be meaningful and avoid redundant attributes.

    See all articles