Despite ubiquitous Internet access, users often have good reason to create offline copies of websites – be it for archiving or to provide the content on your intranet. However, manual mirroring can be time-consuming and cumbersome. Tools like WebHTTrack can help, and they allow convenient updating of the content.

On our lab machine with Linux Mint 12, the installation was easy: to install the packages and dependencies, I typed sudo apt-get install httrack webhttrack. The program website offers packages for Debian, Ubuntu, Gentoo, Red Hat, Mandriva, Fedora, and FreeBSD, and versions are also available for Windows and Mac OS X. Each package contains a command-line variant called HTTrack (useful for scripting) and a graphical interface called WebHTTrack (or WinHTTrack on Windows).

To launch the interface, either find it directly through the Applications menu or simply type webhttrack at the command line to launch a local web server on port 8080, open the default browser, and load a graphical wizard that guides you through the process (Figure 1). On the next page, enter the website to be mirrored; I'm using the Document Foundation website as an example. The relevant addresses can either be typed directly in the appropriate fields, or you can point to a text file with one URL per line. The tool supports FTP, HTTP, and HTTPS addresses, for which you can either enter a complete path or restrict the download to individual subdirectories. Password-protected pages are best added by clicking the Add a URL button.

WebHTTrack offers several modes for downloading the source content. Automatic Website Copy runs without asking you any questions, whereas Website Copy Prompt is more verbose and asks you questions if in doubt. Load special files lets you secure individual files without following the links they contain, and Branch to all links is useful for saving bookmarks because it saves all the links on the first page in each case. In contrast, Test links on pages doesn't download anything, but only checks whether links are valid.
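For scripting, the same modes are available from the command-line variant. The following is a minimal sketch, assuming the packages installed above; the target directories and the urls.txt file are placeholders, and the option names are taken from the httrack manual:

    # Mirror a site into a local directory (-O sets the output path)
    httrack "http://www.documentfoundation.org/" -O ~/mirrors/tdf

    # Only test whether the links on the pages are valid, without
    # downloading content (comparable to the Test links on pages mode)
    httrack --testlinks "http://www.documentfoundation.org/"

    # Read the start URLs from a text file with one URL per line
    httrack --list urls.txt -O ~/mirrors/batch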
Hiding behind the inconspicuous Settings button are numerous options that let you set up almost every detail. Among other things, you can specify the order in which the files are loaded. Also, you can configure the way in which WebHTTrack stores the documents locally in Build (Figure 2). By default, the directory structure is mirrored 1:1 in the corresponding subdirectories, but you can also choose to structure by file type – for example, to keep images and PDF files separate. If the given structures are insufficient, you can simply enter custom paths based on variables.

Figure 2: Restructure the site with Build options.

Depending on the available bandwidth, you might want to use Flow Control and customize the number of simultaneous connections, as well as the timeouts and retries in the event of an error. These settings help back up slow servers without exposing them to excessive access attempts. As a kind of built-in airbag, you can set Limits for the overall size, the transfer rate, and the transfer time. WebHTTrack takes care of rewriting links, and it removes error pages or passwords on request. In some networks, the use of a Proxy can be relevant. Further settings, primarily intended for advanced users, are available in the MIME Types, Browser ID, Spider, and Log, Index, Cache tabs.

The Scan Rules are a powerful feature that lets you specify the desired content precisely. By default, all pages below the specified URL, including the links they contain, are backed up. But on the LibreOffice website, for example, which also contains download links, this would mean that, in addition to the actual homepage, numerous program files would also be grabbed. To specify more precisely what you want to download – and what you want checked for links – you can define filters. For example, to download all links, except those that point to PDF files, you would filter for -*.pdf. However, to exclude PDFs on the Document Foundation site only, you need a rule that names the host, such as -www.documentfoundation.org/*.pdf. Similarly, -*.css skips not only all CSS files, but also the images to which they link. Instead of providing a negative list, you can define a positive list to designate explicitly the content to be backed up; for example, -* +*.htm* +*.pdf only grabs PDF documents published on the Document Foundation homepage. The preliminary -* excludes all types not specifically listed; however, to parse the individual pages for links, you additionally need to specify +*.htm*. Filters are processed from left to right; the element listed last has the highest priority.
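The limits, flow control, and scan rules also have command-line equivalents, so the whole job can be scripted. A sketch with example values, assuming the options behave as described in the httrack manual:

    # Mirror with flow control and limits:
    #   -c4      four simultaneous connections
    #   -T30     30-second timeout; -R3: three retries on error
    #   -A25000  limit the transfer rate to 25,000 bytes per second
    #   -E7200   stop the mirror after two hours
    # The trailing patterns are the scan rules: exclude everything first,
    # then re-include HTML pages (for link parsing) and PDF documents.
    httrack "http://www.documentfoundation.org/" -O ~/mirrors/tdf \
        -c4 -T30 -R3 -A25000 -E7200 \
        "-*" "+*.htm*" "+*.pdf"

    # Update an existing mirror later without further prompting; this
    # assumes httrack finds its cached options in the mirror directory,
    # as the manual's description of --update suggests.
    cd ~/mirrors/tdf && httrack --update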