Check for broken external and internal links.
Check for broken links in HTML documents. Links are checked in parallel, so checks run quickly. Both external and internal links are checked, with intelligent handling of internal links.
pip install fastlinkcheck
from fastlinkcheck import link_check
link_check
link_check(path: "Root directory searched recursively for HTML files",
           host: "Host and path (without protocol) of web server" = '',
           config_file: "Location of file with urls to ignore" = None,
           actions_output: "Toggle GitHub Actions output on/off" = False,
           exit_on_found: "(CLI Only) Exit with status code 1 if broken links are found" = False,
           print_logs: "Toggle printing logs to stdout." = False)
Check for broken links recursively in path.
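To make the role of the host parameter more concrete, here is a minimal sketch of how a link on the site's own host might be mapped to a local file under path instead of being fetched over the network. This is an illustration only and not fastlinkcheck's actual implementation; the helper name to_local_path and its matching rules are assumptions.

```python
from pathlib import Path
from urllib.parse import urlparse

def to_local_path(url, root, host):
    """Return the local file an internal URL might map to, or None if external.

    Hypothetical helper for illustration; not part of fastlinkcheck.
    """
    parsed = urlparse(url)
    # `host` carries no protocol, so compare it against netloc + path.
    target = parsed.netloc + parsed.path
    host = host.rstrip('/')
    if target != host and not target.startswith(host + '/'):
        return None  # external link: would need an HTTP request instead
    relative = target[len(host):].lstrip('/')
    return Path(root) / relative

# An internal link resolves to a file under the root directory.
print(to_local_path('https://fastlinkcheck.com/test.js', '_example', 'fastlinkcheck.com'))
# A link on another host is treated as external.
print(to_local_path('http://somecdn.com/doesntexist.html', '_example', 'fastlinkcheck.com'))
```

Under this assumption, an internal link counts as broken when the resulting local file does not exist, while an external link counts as broken when the request for it fails.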
The _example/ directory in this repo contains sample HTML files which we can use for demonstration:
from fastlinkcheck import link_check
broken_links = link_check(path='_example', host='fastlinkcheck.com')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
You can optionally print logs to stdout with the print_logs parameter. This can be useful for debugging:
broken_links = link_check(path='_example', host='fastlinkcheck.com', print_logs=True)
ERROR: The Following Broken Links or Paths were found:
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
- Path('/Users/hamelsmu/github/fastlinkcheck/_example/test.js') was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
print(f'Number of broken links found {len(broken_links)}')
Number of broken links found 2
You can choose to ignore links by passing a plain-text file containing a list of urls to ignore. For example, the file linkcheck.rc contains a list of urls I want to ignore:
with open('_example/linkcheck.rc', 'r') as f: print(f.read())
test.js
https://www.google.com
In this case _example/test.js will be filtered out from the list:
broken_links = link_check(path='_example', host='fastlinkcheck.com', config_file='_example/linkcheck.rc')
print(broken_links)
- 'http://somecdn.com/doesntexist.html' was found in the following pages:
  - `/Users/hamelsmu/github/fastlinkcheck/_example/test.html`
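If you want to generate an ignore file from a script, something like the following sketch may help. It assumes the file holds one entry per line, which is an assumption and not something the docs here confirm. The temporary location is hypothetical, and the link_check call is left as a comment so the snippet runs without fastlinkcheck installed.

```python
import tempfile
from pathlib import Path

# Entries to ignore; these two mirror the contents of _example/linkcheck.rc.
ignore_entries = ['test.js', 'https://www.google.com']

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / 'linkcheck.rc'  # hypothetical location
    # Assumption: one entry per line.
    config_path.write_text('\n'.join(ignore_entries) + '\n')

    # Read the file back to confirm what was written.
    written = config_path.read_text().split()
    print(written)

    # With fastlinkcheck installed, the file could then be passed along:
    # broken_links = link_check(path='_example', host='fastlinkcheck.com',
    #                           config_file=str(config_path))
```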
link_check can also be called from the command line. The -h or --help flag shows the command line docs:
!link_check -h
usage: link_check [-h] [--host HOST] [--config_file CONFIG_FILE]
                  [--actions_output] [--exit_on_found] [--print_logs] [--pdb]
                  [--xtra XTRA]
                  path

Check for broken links recursively in `path`.

positional arguments:
  path                  Root directory searched recursively for HTML files

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           Host and path (without protocol) of web server (default: )
  --config_file CONFIG_FILE
                        Location of file with urls to ignore
  --actions_output      Toggle GitHub Actions output on/off (default: False)
  --exit_on_found       (CLI Only) Exit with status code 1 if broken links are found (default: False)
  --print_logs          Toggle printing logs to stdout. (default: False)
  --pdb                 Run in pdb debugger (default: False)
  --xtra XTRA           Parse for additional args (default: '')
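The flags listed in the help text can be combined in a single invocation, for example in a CI job that should fail when broken links are found. The sketch below only assembles the argument list, so it runs without fastlinkcheck installed. The helper build_link_check_command is hypothetical and not part of the library; the flag names are taken from the help output.

```python
def build_link_check_command(path, host='', config_file=None,
                             exit_on_found=False, print_logs=False):
    """Assemble an argument list for the link_check CLI (hypothetical helper)."""
    cmd = ['link_check']
    if host:
        cmd += ['--host', host]
    if config_file:
        cmd += ['--config_file', config_file]
    if exit_on_found:
        cmd.append('--exit_on_found')
    if print_logs:
        cmd.append('--print_logs')
    cmd.append(path)  # positional argument: root directory to search
    return cmd

cmd = build_link_check_command('_example', host='fastlinkcheck.com',
                               config_file='_example/linkcheck.rc',
                               exit_on_found=True)
print(' '.join(cmd))
# With fastlinkcheck installed, subprocess.run(cmd).returncode could then be
# inspected; per the help text it is 1 when broken links are found.
```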