#hide
from nbdev import *
Check for broken external and internal links.
fastlinkcheck checks for broken links in HTML documents. Checks run in parallel, so performance is fast. Both external and internal links are checked; internal links are checked by verifying local files.
pip install fastlinkcheck
from fastlinkcheck import link_check
show_doc(link_check)
link_check(
    path:"Root directory searched recursively for HTML files",
    host:"Host and path (without protocol) of web server"='',
    config_file:"Location of file with urls to ignore"=None,
    actions_output:"Toggle GitHub Actions output on/off"=False,
    exit_on_found:"(CLI Only) Exit with status code 1 if broken links are found"=False,
    print_logs:"Toggle printing logs to stdout."=False
)

Check for broken links recursively in path.
The _example/ directory in this repo contains sample HTML files which we can use for demonstration.
The path parameter specifies the directory that will be searched recursively for the HTML files you wish to check.
Specifying the host parameter allows you to detect internal links by identifying links with that host name. External links are verified by making a request to the appropriate website. Internal links, on the other hand, are verified by inspecting the presence and content of local files.
from fastlinkcheck import link_check
broken_links = link_check(path='_example', host='fastlinkcheck.com')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
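To make the internal/external distinction concrete, here is a minimal sketch of how a link could be classified against a host name. The helper `is_internal` is hypothetical and is not part of fastlinkcheck; it compares only the network location, whereas the real host parameter may also carry a path, so the library's own logic may differ.

```python
from urllib.parse import urlparse


def is_internal(link: str, host: str) -> bool:
    """Return True if `link` should be treated as internal for `host`.

    Hypothetical helper for illustration; not fastlinkcheck's implementation.
    """
    parsed = urlparse(link)
    # Relative links (no scheme, no network location) point at local files.
    if not parsed.scheme and not parsed.netloc:
        return True
    # Absolute links count as internal only when a host was given and matches.
    return bool(host) and parsed.netloc == host


links = [
    "http://somecdn.com/doesntexist.html",
    "https://fastlinkcheck.com/test.js",
    "test.js",
]
for link in links:
    kind = "internal" if is_internal(link, "fastlinkcheck.com") else "external"
    print(f"{link}: {kind}")
```

Under this sketch, a link classified as internal would be checked against the files under path, and anything else would be checked with a web request.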
You can optionally print logs to stdout with the print_logs parameter. This can be useful for debugging:
broken_links = link_check(path='_example', host='fastlinkcheck.com', print_logs=True)
ERROR: The Following Broken Links or Paths were found:
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
print(f'Number of broken links found {len(broken_links)}')
Number of broken links found 2
You can choose to ignore files with a plain-text file containing a list of urls to ignore. For example, the file linkcheck.rc contains a list of urls I want to ignore:
with open('_example/linkcheck.rc', 'r') as f: print(f.read())
test.js
https://www.google.com
In this case, _example/test.js will be filtered out from the list:
broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
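The filtering step can be pictured with a small sketch. The helpers `read_ignore_file` and `drop_ignored` are hypothetical names, not part of fastlinkcheck, and resolving non-URL entries relative to the ignore file's directory is an assumption made for illustration rather than documented behavior.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def read_ignore_file(config_file):
    """Split an ignore file into URL entries and local-path entries.

    Hypothetical helper. Resolving paths against the config file's
    directory is an assumption, not documented fastlinkcheck behavior.
    """
    config_path = Path(config_file)
    lines = config_path.read_text().splitlines()
    entries = [line.strip() for line in lines if line.strip()]
    urls = {e for e in entries if urlparse(e).scheme in ("http", "https")}
    paths = {(config_path.parent / e).resolve() for e in entries if e not in urls}
    return urls, paths


def drop_ignored(broken, urls, paths):
    """Keep only the broken links that are not listed in the ignore file."""
    ignored = urls | paths
    return [link for link in broken if link not in ignored]


# Demo using a temporary copy of the two-line ignore file shown above.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "linkcheck.rc"
    config.write_text("test.js\nhttps://www.google.com\n")
    urls, paths = read_ignore_file(config)
    broken = [
        "http://somecdn.com/doesntexist.html",
        (Path(tmp) / "test.js").resolve(),
    ]
    remaining = drop_ignored(broken, urls, paths)

print(remaining)
```

In the demo, only the somecdn.com URL remains, which mirrors the filtered result printed above.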
link_check can also be called from the command line. We can see the available options by passing the --help flag. These options correspond to the parameters of the link_check function described above.
link_check --help
usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE]
                  [--actions_output] [--exit_on_found] [--print_logs] [--pdb]
                  [--xtra XTRA]
                  path

Check for broken links recursively in `path`.

positional arguments:
  path                  Root directory searched recursively for HTML files

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           Host and path (without protocol) of web server
                        (default: )
  --config_file CONFIG_FILE
                        Location of file with urls to ignore
  --actions_output      Toggle GitHub Actions output on/off (default: False)
  --exit_on_found       Exit with status code 1 if broken links are
                        found (default: False)
  --print_logs          Toggle printing logs to stdout. (default: False)
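As a rough illustration of how the command-line options line up with the function parameters, the sketch below builds a link_check command from keyword options, using only flags listed in the help text above. The helpers `build_link_check_command` and `run_link_check` are hypothetical wrappers, not part of fastlinkcheck.

```python
import subprocess


def build_link_check_command(path, host="", config_file=None, exit_on_found=False):
    """Assemble an argv list for the link_check CLI (hypothetical wrapper)."""
    cmd = ["link_check", path]
    if host:
        cmd += ["--host", host]
    if config_file:
        cmd += ["--config_file", config_file]
    if exit_on_found:
        cmd.append("--exit_on_found")
    return cmd


def run_link_check(cmd):
    """Run the command and return its exit status.

    Requires fastlinkcheck to be installed; not called in this sketch.
    """
    return subprocess.run(cmd).returncode


cmd = build_link_check_command(
    "_example",
    host="fastlinkcheck.com",
    config_file="_example/linkcheck.rc",
    exit_on_found=True,
)
print(" ".join(cmd))
# prints: link_check _example --host fastlinkcheck.com --config_file _example/linkcheck.rc --exit_on_found
```

With --exit_on_found set, the help text states that the command exits with status code 1 when broken links are found, so a script or CI step can fail the build on a non-zero return value from `run_link_check`.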