File: examples/case-studies/websites/resource_aware_batch_crawler/resource_aware_batch_crawler-el.md

Role: Documentation
Content type: text/markdown
Class: Ascoos OS
A PHP Web 5.0 Kernel for decentralized web and IoT
Author: Christos Drogidis
Date: 6 months ago
Size: 5,552 bytes

Case Study: Resource-Aware Batch Web Crawler

A smart, safe, and resource-aware crawler that adapts to the load of the server it runs on.

This case study shows how Ascoos OS can implement a production-grade, scalable web crawler that does not overload the server, even when running on shared hosting or a VPS with limited resources. The script monitors CPU and RAM in real time, switches to "light mode" when needed, and saves reports under quota control, all with only 3 kernel classes.

Purpose

  • Smart management of system resources during crawling
  • Automatic switching between full and light mode
  • Safe file writing with a quota (100 MB default)
  • Generation of JSON reports for Web5 indexing / AI training
  • Zero memory leaks (via Free() calls)

Main Ascoos OS classes used

| Class                        | Role                                                                          |
|------------------------------|-------------------------------------------------------------------------------|
| TCoreSystemHandler           | Monitors CPU load & memory usage in real time                                 |
| TWebsiteHandler              | Availability check, load-time measurement, HTML retrieval, keyword extraction |
| TFilesHandler                | Directory creation, quota check, safe writing of JSON reports                 |
| $utf8 (global helper)        | Safe UTF-8 substring for excerpts                                             |
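
The API of the `$utf8` helper is not shown in this document. To illustrate why excerpts need a UTF-8-aware substring, the sketch below uses PHP's standard `mb_substr()` (mbstring extension) instead. `makeExcerpt()` is a hypothetical name, not part of Ascoos OS.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not an Ascoos OS API): cut an excerpt by characters,
 * not bytes, so multi-byte UTF-8 sequences are never split in the middle.
 */
function makeExcerpt(string $html, int $maxChars = 200): string
{
    if (mb_strlen($html, 'UTF-8') <= $maxChars) {
        return $html; // short enough, return unchanged
    }
    return mb_substr($html, 0, $maxChars, 'UTF-8') . '...';
}

$sample = 'Καλημέρα κόσμε';
echo makeExcerpt($sample, 8), PHP_EOL; // prints "Καλημέρα..."
```

A byte-based `substr($sample, 0, 8)` would return only the first four Greek letters here, and an odd byte count would cut a character in half and produce invalid UTF-8 that `json_encode()` rejects.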

File structure

examples/
└── case-studies/
    └── websites/
        └── resource_aware_batch_crawler/
            └── resource_aware_batch_crawler.php   ← the case-study script

Direct link: https://github.com/ascoos/os/blob/main/examples/case-studies/websites/resource_aware_batch_crawler/resource_aware_batch_crawler.php

Prerequisites

  1. PHP ≥ 8.2
  2. Ascoos OS 26 or AWES 26 installed (https://awes.ascoos.com)

How it works (step by step)

  1. Thresholds are set: CPU 70 % and memory 80 %.
  2. For each URL in the list:
     - the current CPU + RAM load is read
     - if either threshold is exceeded, light mode is activated
  3. Always executed:
     - `checkAvailability()`
     - `analyzeLoadTime()`
  4. Executed only in normal mode:
     - `getHTMLContent()`
     - `extractKeywords()` (for sentiment/analysis)
  5. A detailed report is saved as JSON with a timestamp
  6. Cleanup runs via `Free()` for zero memory leaks
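
The steps above can be sketched in plain PHP as follows. This is an illustration of the control flow only: `crawlBatch()` and its callable parameters are hypothetical stand-ins for the kernel handlers, not Ascoos OS APIs, and the canned values exist only to make the sketch runnable.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the per-URL flow. All callables are hypothetical stand-ins
 * for the kernel handlers; none of these names are Ascoos OS APIs.
 *
 * @param string[] $urls
 * @param callable $readLoad  returns ['cpu' => float, 'mem' => float]
 * @param callable $probe     returns ['availability' => bool, 'load_time' => float]
 * @param callable $fetchHtml returns the page HTML as a string
 */
function crawlBatch(
    array $urls,
    callable $readLoad,
    callable $probe,
    callable $fetchHtml,
    float $cpuLimit = 70.0,
    float $memLimit = 80.0
): array {
    $reports = [];
    foreach ($urls as $url) {
        // Step 2: sample the load before each URL and pick the mode.
        $load = $readLoad();
        $lightMode = $load['cpu'] > $cpuLimit || $load['mem'] > $memLimit;

        // Step 3: the cheap checks always run.
        $basic = $probe($url);

        // Step 4: the expensive work runs only in normal mode.
        $excerpt = $lightMode
            ? ['light_mode' => true, 'basic' => $basic['load_time']]
            : $fetchHtml($url);

        $reports[] = [
            'url'             => $url,
            'cpu_load'        => $load['cpu'],
            'mem_load'        => $load['mem'],
            'light_mode'      => $lightMode,
            'availability'    => $basic['availability'],
            'load_time'       => $basic['load_time'],
            'content_excerpt' => $excerpt,
        ];
    }
    return $reports;
}

// Usage with canned values: the second URL is crawled under high load.
$loads = [['cpu' => 34.5, 'mem' => 62.1], ['cpu' => 88.2, 'mem' => 91.4]];
$reports = crawlBatch(
    ['https://ascoos.com', 'https://example.com'],
    function () use (&$loads): array { return array_shift($loads); },
    fn (string $url): array => ['availability' => true, 'load_time' => 0.5],
    fn (string $url): string => '<!DOCTYPE html>...'
);
echo json_encode(array_column($reports, 'light_mode')), PHP_EOL; // [false,true]
```

Passing the load reader and the probes in as callables keeps the mode decision testable without a live server, which is why the sketch does not call the handlers directly.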

Code example (excerpts)

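// Sample the current load (compared below against percentage thresholds).
$cpuLoad = $system->get_cpu_load(0);
$memLoad = $system->get_memory_stats()['percent'];

// Switch to light mode when either threshold (CPU 70 %, memory 80 %) is exceeded.
$lightMode = $cpuLoad > 70 || $memLoad > 80;

if (!$lightMode) {
    // Normal mode: fetch the full HTML and extract keywords.
    $content  = $website->getHTMLContent($url);
    $keywords = $website->extractKeywords($url);
} else {
    // Light mode: skip the expensive calls and keep only the measured load time.
    $content = ['light_mode' => true, 'basic' => $loadTime];
}

// Write the collected reports as pretty-printed JSON with Unicode left unescaped.
$files->writeToFileWithCheck(
    json_encode($reports, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE),
    $reportFile
);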
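
How `writeToFileWithCheck()` enforces the quota internally is not shown in this document. The sketch below illustrates one way a quota-guarded write could work using only standard PHP functions. `directorySize()` and `writeWithQuota()` are hypothetical names, the 100 MB limit mirrors the default mentioned earlier, and the timestamp format is inferred from the report filename shown in this document.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: total size in bytes of the regular files directly in $dir. */
function directorySize(string $dir): int
{
    $total = 0;
    foreach (glob($dir . '/*') ?: [] as $path) {
        if (is_file($path)) {
            $total += (int) filesize($path);
        }
    }
    return $total;
}

/**
 * Hypothetical helper (not the Ascoos OS implementation): write $data only if
 * the directory would stay within $quotaBytes afterwards.
 */
function writeWithQuota(string $data, string $file, int $quotaBytes = 100 * 1024 * 1024): bool
{
    $dir = dirname($file);
    if (!is_dir($dir) && !mkdir($dir, 0775, true)) {
        return false; // could not create the report directory
    }
    if (directorySize($dir) + strlen($data) > $quotaBytes) {
        return false; // writing would exceed the quota
    }
    // LOCK_EX avoids interleaved writes from concurrent runs.
    return file_put_contents($file, $data, LOCK_EX) !== false;
}

// Demo in a throwaway directory; date('Ymd_His') yields e.g. 20251127_142310.
$dir  = sys_get_temp_dir() . '/crawl_reports_demo_' . uniqid();
$file = $dir . '/batch_crawl_' . date('Ymd_His') . '.json';
$json = json_encode(
    [['url' => 'https://example.com', 'light_mode' => true]],
    JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE
);

var_dump(writeWithQuota($json, $file));     // bool(true)
var_dump(writeWithQuota($json, $file, 10)); // bool(false): a 10-byte quota is too small
```

The sketch counts an existing file at the target path towards the quota even when it is about to be overwritten, which errs on the side of refusing a write.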

Expected result (example output)

[
  {
    "url": "https://ascoos.com",
    "cpu_load": 34.5,
    "mem_load": 62.1,
    "light_mode": false,
    "availability": true,
    "load_time": 0.842,
    "content_excerpt": "<!DOCTYPE html><html lang=\"el\">..."
  },
  {
    "url": "https://example.com",
    "cpu_load": 88.2,
    "mem_load": 91.4,
    "light_mode": true,
    "availability": true,
    "load_time": 0.317,
    "content_excerpt": { "light_mode": true, "basic": 0.317 }
  }
]

→ Saved as batch_crawl_20251127_142310.json inside /tmp/crawl_reports/
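
Because the report is plain JSON, downstream tooling can read it with standard functions. A minimal sketch, assuming the field names shown in the example output above; `summarizeReport()` is a hypothetical helper, not part of Ascoos OS.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: summarize a report that uses the fields shown above.
 *
 * @return array{total: int, light: int, avg_load_time: float}
 */
function summarizeReport(string $json): array
{
    // JSON_THROW_ON_ERROR turns malformed input into a JsonException.
    $entries = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Count the entries that were crawled in light mode.
    $light = count(array_filter($entries, fn (array $e): bool => $e['light_mode']));
    $times = array_column($entries, 'load_time');

    return [
        'total'         => count($entries),
        'light'         => $light,
        'avg_load_time' => $times ? array_sum($times) / count($times) : 0.0,
    ];
}

$json = '[
  {"url": "https://ascoos.com",  "light_mode": false, "load_time": 0.842},
  {"url": "https://example.com", "light_mode": true,  "load_time": 0.317}
]';

print_r(summarizeReport($json));
```

A high share of light-mode entries in such a summary suggests the batch ran while the host was under pressure and may be worth repeating at a quieter time.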

Resources

  • Ascoos OS documentation → https://docs.ascoos.com/os (under construction)
  • Official website → https://os.ascoos.com (under construction)
  • AWES Studio (online IDE) → https://awes.ascoos.com
  • GitHub Repository → https://github.com/ascoos/os

Contributing

You can:
  - add more metrics (e.g. disk I/O)
  - implement parallel execution with TThreadHandler
  - integrate TNeuralNetworkHandler for semantic scoring of URLs

Guidelines → CONTRIBUTING-GR.md

License

Ascoos General License (AGL), see LICENSE-GR.md