Scraping is often the only way to get values from websites that don't provide an API. It can be tricky to extract the right values, but this example should help you get started. The work-flow is very similar to the one used by the scrape sensor.
First, import the needed modules.
import requests
from bs4 import BeautifulSoup
We want to scrape the counter for all our implementations from the Component overview.
The section (extracted from the source) which is relevant for this example is shown below.
...
<div class="grid__item one-sixth lap-one-whole palm-one-whole">
<div class="filter-button-group">
<a href='#all' class="btn">All (444)</a>
<a href='#featured' class="btn featured">Featured</a>
<a href='#alarm' class="btn">
Alarm
(9)
</a>
...
The line <a href='#all' class="btn">All (444)</a>
contains the counter.
URL = 'https://home-assistant.io/components/'
The website is retrieved with requests and parsed with BeautifulSoup.
raw_html = requests.get(URL).text
data = BeautifulSoup(raw_html, 'html.parser')
Now you have the complete content of the page. CSS selectors can be used to identify the counter, and there are several options to get the part in question. As BeautifulSoup returns a list of matches, we only need to identify the position in the list.
print(data.select('a')[10])
<a class="btn" href="#all">All (791)</a>
print(data.select('.btn')[0])
<a class="btn" href="#all">All (791)</a>
nth-of-type(x) gives you element x back.
print(data.select('a:nth-of-type(11)'))
[<a class="btn" href="#all">All (791)</a>]
To make your selector as robust as possible, it's recommended to look for unique attributes like an id, the URL, etc.
print(data.select('a[href="#all"]'))
[<a class="btn" href="#all">All (791)</a>]
The value extraction is handled by the scrape sensor's value_template. The next two steps are shown here only to demonstrate the full manual process.
We only need the actual text.
print(data.select('a[href="#all"]')[0].text)
All (791)
This is a string and can be manipulated. We focus on the number.
print(data.select('a[href="#all"]')[0].text[5:8])
791
This is the number of the current platforms/components from the Component overview which are available in Home Assistant.
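The slice [5:8] assumes the counter is exactly three digits long. As a more robust alternative (a sketch, not part of the scrape sensor itself), a regular expression can pull out the number regardless of its length. The link text below is the value extracted above:

```python
import re

# Extract the digits inside the parentheses instead of relying on a
# fixed slice, so this keeps working if the counter changes length.
text = 'All (791)'  # link text extracted above
count = re.search(r'\((\d+)\)', text).group(1)
print(count)  # prints: 791
```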
The details you identified here can be re-used to configure the scrape sensor's select option. This means that the most efficient way is to apply nth-of-type(x) to your selector.
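Based on these findings, a scrape sensor configuration could look roughly like the sketch below. The option names follow the scrape sensor's configuration, but the value_template shown is only an assumption about one way to cut out the number:

```yaml
sensor:
  - platform: scrape
    resource: https://home-assistant.io/components/
    name: HA Implementations
    select: 'a[href="#all"]'
    # Assumed template: split on the parentheses to keep only the number
    value_template: '{{ value.split("(")[1].split(")")[0] }}'
```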
The "Using the Home Assistant Python API" notebook contains an intro to the Python API of Home Assistant and Jupyter notebooks. Here we are sending the scraped value to the Home Assistant frontend.
import homeassistant.remote as remote
HOST = '127.0.0.1'
PASSWORD = 'YOUR_PASSWORD'
api = remote.API(HOST, PASSWORD)
new_state = data.select('a[href="#all"]')[0].text[5:8]
attributes = {
"friendly_name": "Home Assistant Implementations",
"unit_of_measurement": "Count"
}
remote.set_state(api, 'sensor.ha_implement', new_state=new_state, attributes=attributes)
True