This notebook shows how to use CARTOframes for discovering and downloading premium datasets from CARTO's Data Observatory.
In particular, we will download touristic POI's from Pitney Bowes within a specific bounding box.
The notebook is organized in the following sections:
Note for this notebook we are using the premium dataset of Pitney Bowes POI's in Spain.
import geopandas as gpd
import pandas as pd
pd.set_option('display.max_columns', None)
from cartoframes import to_carto
from cartoframes.auth import set_default_credentials
from cartoframes.data.observatory import *
from cartoframes.viz import *
from shapely.geometry import box
In order to be able to use the Data Observatory via CARTOframes, you need to set your CARTO account credentials first.
Please, visit the Authentication guide for further detail.
from cartoframes.auth import set_default_credentials
set_default_credentials('creds.json')
Note about credentials
For security reasons, we recommend storing your credentials in an external file to prevent publishing them by accident when sharing your notebooks. You can get more information in the section Setting your credentials of the Authentication guide.
When working with very large datasets, you might need to explore the dataset in detail to decide if you need the whole data or just part of it. In order to speed up your time to identifying the exact data, it might be very helpful to download just a small sample of your data, to later decide what you need. In this section, we will show how to identify toutistic POI's from a dataset we are already subscribed to.
First, we check we're already subscribed to the dataset we want to use for our analysis. In this case, we would like to use Pitney Bowes POI's in Spain. The dataset is pb_points_of_i_94bda91b
.
You can subscribe to this premium dataset on your CARTO dashboard or contacting CARTO.
Catalog().subscriptions().datasets.to_dataframe()
slug | name | description | category_id | country_id | data_source_id | provider_id | geography_name | geography_description | temporal_aggregation | time_coverage | update_frequency | is_public_data | lang | version | category_name | provider_name | geography_id | id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ags_sociodemogr_a7e14220 | Sociodemographics - United States of America (... | Census and ACS sociodemographic data estimated... | demographics | usa | sociodemographics | ags | Census Block Group - United States of America | None | yearly | None | yearly | False | eng | 2020 | Demographics | Applied Geographic Solutions | carto-do.ags.geography_usa_blockgroup_2015 | carto-do.ags.demographics_sociodemographics_us... |
1 | ags_retailpoten_aaf25a8c | Retail Potential - United States of America (C... | The retail potential database consists of aver... | demographics | usa | retailpotential | ags | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | yearly | [2018-01-01, 2019-01-01) | yearly | False | eng | 2019 | Demographics | Applied Geographic Solutions | carto-do-public-data.carto.geography_usa_block... | carto-do.ags.demographics_retailpotential_usa_... |
2 | pb_consumer_po_62cddc04 | Points Of Interest - Consumer - United States ... | Consumer Point of interest database per catego... | points_of_interest | usa | consumer_points_of_interest | pitney_bowes | Latitude/Longitude - United States of America | Location of Points of Interest | monthly | None | monthly | False | eng | v1 | Points of Interest | Pitney Bowes | carto-do.pitney_bowes.geography_usa_latlon_v1 | carto-do.pitney_bowes.pointsofinterest_consume... |
3 | ags_sociodemogr_f510a947 | Sociodemographics - United States of America (... | Census and ACS sociodemographic data estimated... | demographics | usa | sociodemographics | ags | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | yearly | [2019-01-01, 2020-01-01) | yearly | False | eng | 2019 | Demographics | Applied Geographic Solutions | carto-do-public-data.carto.geography_usa_block... | carto-do.ags.demographics_sociodemographics_us... |
4 | ags_consumer_sp_dbabddfb | Consumer Spending - United States of America (... | The Consumer Expenditure database consists of ... | demographics | usa | consumer_spending | ags | Census Block Group - United States of America | None | yearly | None | yearly | False | eng | 2020 | Demographics | Applied Geographic Solutions | carto-do.ags.geography_usa_blockgroup_2015 | carto-do.ags.demographics_consumerspending_usa... |
5 | spa_geosocial_s_d5dc42ae | Geosocial Segments - United States of America ... | By analysing feeds from Twitter, Instagram, Me... | behavioral | usa | geosocial_segments | spatial_ai | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | quarterly | [2020-01-01, 2020-04-01) | quarterly | False | eng | v1 | Behavioral | Spatial.ai | carto-do-public-data.carto.geography_usa_block... | carto-do.spatial_ai.behavioral_geosocialsegmen... |
6 | mc_geographic__7980c5c3 | Geographic Insights - United States of America... | Geographic Insights validate, evaluate and ben... | financial | usa | geographic_insights | mastercard | Census Block Group - United States of America ... | Shoreline clipped TIGER/Line boundaries. More ... | monthly | [2019-01-01, 2020-01-01) | monthly | False | eng | v1 | Financial | Mastercard | carto-do-public-data.carto.geography_usa_block... | carto-do.mastercard.financial_geographicinsigh... |
7 | pb_points_of_i_94bda91b | Points Of Interest - Spain (Latitude/Longitude) | Point of interest database per categories | points_of_interest | esp | points_of_interest | pitney_bowes | Latitude/Longitude - Spain | Location of Points of Interest | monthly | None | monthly | False | eng | v1 | Points of Interest | Pitney Bowes | carto-do.pitney_bowes.geography_esp_latlon_v1 | carto-do.pitney_bowes.pointsofinterest_pointso... |
pois_ds = Dataset.get('pb_points_of_i_94bda91b')
pois_ds.head()
HTTP | ISO3 | NAME | SIC1 | SIC2 | SIC8 | CLASS | GROUP | PB_ID | STABB | geoid | FAXNUM | MICODE | TEL_NUM | do_date | LATITUDE | OPEN_24H | POSTCODE | do_label | AREANAME1 | AREANAME2 | AREANAME3 | AREANAME4 | BRANDNAME | GEORESULT | LONGITUDE | SUB_CLASS | AGENT_CODE | TRADE_NAME | YEAR_START | STATUS_CODE | BUSINESS_LINE | CURRENCY_CODE | EMPLOYEE_HERE | EXCHANGE_NAME | TICKER_SYMBOL | EMPLOYEE_COUNT | FAMILY_MEMBERS | FRANCHISE_NAME | HIERARCHY_CODE | PARENT_ADDRESS | PARENT_COUNTRY | TRADE_DIVISION | ADDRESSLASTLINE | CONFIDENCE_CODE | MAINADDRESSLINE | PARENT_POSTCODE | FORMATTEDADDRESS | PARENT_AREANAME1 | PARENT_AREANAME3 | SIC8_DESCRIPTION | ALT_INDUSTRY_CODE | LEGAL_STATUS_CODE | SALES_VOLUME_LOCAL | COUNTRY_ACCESS_CODE | PARENT_BUSINESS_NAME | SUBSIDIARY_INDICATOR | PARENT_STREET_ADDRESS | GLOBAL_ULTIMATE_ADDRESS | GLOBAL_ULTIMATE_COUNTRY | SALES_VOLUME_US_DOLLARS | GLOBAL_ULTIMATE_POSTCODE | DOMESTIC_ULTIMATE_ADDRESS | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_INDICATOR | DOMESTIC_ULTIMATE_POSTCODE | DOMESTIC_ULTIMATE_AREANAME1 | DOMESTIC_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_BUSINESS_NAME | GLOBAL_ULTIMATE_STREET_ADDRESS | DOMESTIC_ULTIMATE_BUSINESS_NAME | DOMESTIC_ULTIMATE_STREET_ADDRESS | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | ESP | EL SANTO | None | 6552 | 50440000 | DRINKING PLACES | None | SHOPPING | 1173764019 | EX | 2107476034#1.1064478#41.1567659 | None | 10010314 | 936 58 74 13 | 2020-11-01 | 43.267950 | None | AD700 | BBV PRIVANZA INTERNATIONAL | MADRID | VIZCAYA | VILADECANS | None | None | T20 | 2.164250 | DEPARTMENT STORES | None | BRICOR | 1994.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 10869, TORRECILLA DE LOS ANGELES, CÁCERES | LOW | None | None | None | MALAGA | None | FRUIT AND VEGETABLE MARKETS | None | None | NaN | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | CARRETERA DE CADIZ 206 |
1 | None | ESP | LA FUENTE | None | 6552 | 50440000 | DRINKING PLACES | None | SHOPPING | 1432760662 | MD | 2227609955#-6.378907#39.470747 | None | 10010314 | 935 48 09 27 | 2020-11-01 | 43.323898 | None | GX11 1AA | SPRINGFIELD BANK AND TRUST | MADRID | BIZKAIA | BARCELONA | None | None | S8HPNTSCZA | -1.639469 | DEPARTMENT STORES | None | None | 2018.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 28911, LEGANES, MADRID | LOW | None | None | None | GUIPUZCOA | None | FRUIT AND VEGETABLE MARKETS | None | None | NaN | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | PASEO DE MIRAMON 185 |
2 | None | ESP | SANT FRANCESC XAVIER | None | 6541 | 50440000 | DRINKING PLACES | None | SHOPPING | 1170864171 | CT | 2226191863#-8.5258399#42.8889816 | None | 10010314 | 938 14 89 43 | 2020-11-01 | 43.322003 | None | AD700 | I D T FINANCIAL SERVICES | BARCELONA | VIZCAYA | SANT PERE DE RIBES | None | None | T18 | -3.686817 | DEPARTMENT STORES | None | EL CORTE INGLÉS | 2008.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 08301, MATARO, BARCELONA | MEDIUM | None | None | None | MADRID | None | FRUIT AND VEGETABLE MARKETS | None | None | NaN | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | PASEO CASTELLANA, 120 - IZ BJ |
3 | None | ESP | RUFINO BLANCO | None | 6552 | 50440000 | DRINKING PLACES | None | SHOPPING | 1171185920 | MC | 2225636588#0.629133#41.6186896 | None | 10010314 | 946 53 35 12 | 2020-11-01 | 42.507879 | None | GX11 1AA | JYSKE BANK MANAGEMENT | MADRID | VIZCAYA | GETXO | None | None | T19 | -3.692750 | DEPARTMENT STORES | None | HIPERCOR | 2006.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 30009, MURCIA, MURCIA | HIGH | None | None | None | MALAGA | None | FRUIT AND VEGETABLE MARKETS | None | None | NaN | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | AVENIDA DEL CARMEN (ED EL FARO), BL 3 LOC |
4 | None | ESP | CASA CONVALESCÈNCIA | None | None | 50440000 | DRINKING PLACES | None | SHOPPING | 1173842007 | AR | 2219538595#-8.4077351#43.3661062 | None | 10010314 | 661 78 11 33 | 2020-11-01 | 43.318963 | None | AD400 | OFFSEC SERVICES | TENERIFE | GIPUZKOA | SAN BARTOLOMÉ | None | None | T20 | -3.656440 | DEPARTMENT STORES | None | EL CORTE INGLÉS | 1988.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 50180, UTEBO, ZARAGOZA | LOW | None | None | None | LERIDA | None | FRUIT AND VEGETABLE MARKETS | None | None | NaN | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | CARRETERA PALAU (KM 1) |
5 | None | ESP | ROSALÍA DE CASTRO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1505524737 | AN | 2226659530#-5.6530505#43.5359533 | None | 10010314 | 922 04 90 67 | 2020-11-01 | 43.323258 | None | AD500 | SOIAM | LERIDA | VIZCAYA | PEDREGUER | None | None | T17 | -3.806100 | DEPARTMENT STORES | None | EL CORTE INGLÉS | 2020.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 21006, HUELVA, HUELVA | HIGH | None | None | None | SEVILLA | None | FRUIT AND VEGETABLE MARKETS | None | None | 59118.0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | CALLE MIGUEL VAZQUEZ DELGADO 71 |
6 | None | ESP | CENTRO DE FORMACIÓN Y EMPLEO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1173898606 | CT | 2225995739#-0.883421#41.647102 | None | 10010314 | 968 70 11 50 | 2020-11-01 | 43.265980 | None | GX11 1AA | LAB | MADRID | BIZKAIA | CARAVACA DE LA CRUZ | None | None | T19 | -2.669666 | DEPARTMENT STORES | None | None | NaN | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 08026, BARCELONA, BARCELONA | MEDIUM | None | None | None | BARCELONA | None | FRUIT AND VEGETABLE MARKETS | None | None | 60384.0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | CALLE ANTIC CAMI DE XIMELIS 19 |
7 | None | ESP | EFA EL SOTO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1293842742 | CT | 2222921452#-2.79984#36.7689 | None | 10010314 | 914 04 99 28 | 2020-11-01 | 43.253290 | None | AD500 | F.I.T.A | MADRID | VIZCAYA | MADRID | None | None | T16 | 2.127009 | DEPARTMENT STORES | None | None | 2016.0 | None | None | None | None | None | None | None | 00000 | None | 00 | None | None | DIVISION I. - SERVICES | 08030, BARCELONA, BARCELONA | LOW | None | None | None | None | None | FRUIT AND VEGETABLE MARKETS | None | None | 61505.0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | AVENIDA GENERAL PERON (ED MASTER'S I), 38 - PI... |
8 | None | ESP | O CASTIÑEIRO | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1172241073 | AN | 2152151101#-8.644829#42.4307343 | 944 46 13 47 | 10010314 | 943 61 95 40 | 2020-11-01 | 42.851396 | None | AD500 | P & C APARTAMENTS | CASTILLA Y LEÓN | ANDORRA | IRUN | None | None | T16 | -3.689160 | DEPARTMENT STORES | None | EL CORTE INGLÉS | 2000.0 | None | None | None | None | None | None | None | None | None | 00 | None | None | DIVISION I. - SERVICES | 29014, MALAGA, MÁLAGA | HIGH | None | None | None | BARCELONA | None | FRUIT AND VEGETABLE MARKETS | None | None | 67464.0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | CALLE BRUC DEL MIG 8 |
9 | None | ESP | CPEB DE CABAÑAQUINTA | None | None | 50440000 | DRINKING PLACES | None | PERSONAL SERVICES | 1171203786 | AN | 2221998958#-7.3611215#43.6618237 | None | 10010314 | 953 58 03 67 | 2020-11-01 | 43.247379 | None | AD400 | ME VACANCES | PAÍS VASCO | BIZKAIA | ALCALÁ LA REAL | None | None | T20 | 2.127009 | DEPARTMENT STORES | None | None | 1999.0 | None | None | None | None | None | None | None | None | None | 00 | None | None | DIVISION I. - SERVICES | 41020, SEVILLA, SEVILLA | LOW | None | None | None | ALBACETE | None | FRUIT AND VEGETABLE MARKETS | None | None | 60587.0 | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | None | CALLE MAYOR, 32 - 1 A |
Let's make sure the dataset covers our area of interest.
pois_ds.geom_coverage()
We're only interested in tourism related POI's in Spain. Since we don't know exactly which variable to use in order to filter tourism POI's, we'll first download a small sample of the dataset to explore it. We'll filter by a bounding box covering Madrid downtown to make sure we have a good variety of POI's.
We can use SQL queries to specify the bounding box or polygon we are interested in.
ST_IntersectsBox
.ST_Intersects
.In order to get the bounding box of interest we'll use bboxfinder.
sql_query = "SELECT * except(do_label) FROM $dataset$ WHERE ST_IntersectsBox(geom, -3.707628,40.415947,-3.700891,40.421403)"
sample_df = pois_ds.to_dataframe(sql_query=sql_query)
#To keep only most updated POI's (based on the do_date)
sample_df = sample_df.sort_values(['NAME', 'do_date']).groupby('NAME').first().reset_index()
sample_df.head()
NAME | geoid | do_date | BRANDNAME | PB_ID | TRADE_NAME | FRANCHISE_NAME | ISO3 | AREANAME4 | AREANAME3 | AREANAME2 | AREANAME1 | STABB | POSTCODE | FORMATTEDADDRESS | MAINADDRESSLINE | ADDRESSLASTLINE | LONGITUDE | LATITUDE | GEORESULT | CONFIDENCE_CODE | COUNTRY_ACCESS_CODE | TEL_NUM | FAXNUM | HTTP | OPEN_24H | BUSINESS_LINE | SIC1 | SIC2 | SIC8 | SIC8_DESCRIPTION | ALT_INDUSTRY_CODE | MICODE | TRADE_DIVISION | GROUP | CLASS | SUB_CLASS | EMPLOYEE_HERE | EMPLOYEE_COUNT | YEAR_START | SALES_VOLUME_LOCAL | SALES_VOLUME_US_DOLLARS | CURRENCY_CODE | AGENT_CODE | LEGAL_STATUS_CODE | STATUS_CODE | SUBSIDIARY_INDICATOR | PARENT_BUSINESS_NAME | PARENT_ADDRESS | PARENT_STREET_ADDRESS | PARENT_AREANAME3 | PARENT_AREANAME1 | PARENT_COUNTRY | PARENT_POSTCODE | DOMESTIC_ULTIMATE_BUSINESS_NAME | DOMESTIC_ULTIMATE_ADDRESS | DOMESTIC_ULTIMATE_STREET_ADDRESS | DOMESTIC_ULTIMATE_AREANAME3 | DOMESTIC_ULTIMATE_AREANAME1 | DOMESTIC_ULTIMATE_POSTCODE | GLOBAL_ULTIMATE_INDICATOR | GLOBAL_ULTIMATE_BUSINESS_NAME | GLOBAL_ULTIMATE_ADDRESS | GLOBAL_ULTIMATE_STREET_ADDRESS | GLOBAL_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_COUNTRY | GLOBAL_ULTIMATE_POSTCODE | FAMILY_MEMBERS | HIERARCHY_CODE | TICKER_SYMBOL | EXCHANGE_NAME | geom | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100 MONTADITOS | 2173220473#-3.70582#40.416 | 2019-12-01 | NaN | 2173220473 | 100 MONTADITOS | NaN | ESP | NaN | MADRID | MADRID | COMUNIDAD DE MADRID | MD | 28012.0 | CALLE POSTAS 12, 28012, MADRID, MADRID | CALLE POSTAS 12 | 28012, MADRID, MADRID | -3.705820 | 40.416000 | S8HPNTSCZA | HIGH | NaN | 915 23 11 40 | 913 51 90 03 | ATTCLIENTE@GRUPORESTALIA.COM | SPAIN.100MONTADITOS.COM | NaN | NaN | NaN | NaN | NaN | TAPAS RESTAURANTS | NaN | 10021076 | DIVISION G. - RETAIL TRADE | EATING AND DRINKING PLACES | EATING PLACES/RESTAURANTS | EATING PLACES/RESTAURANTS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70582 40.41600) |
1 | 1005 DISTRIBUCION SL | 1277167953#-3.7055463461631111#40.420295192309013 | 2019-12-01 | NaN | 1277167953 | NaN | NaN | ESP | NaN | MADRID | MADRID | MADRID | MD | 28013.0 | CALLE GRAN VIA, 28013, MADRID, MADRID | CALLE GRAN VIA | 28013, MADRID, MADRID | -3.705546 | 40.420295 | S4-PNTSCZA | MEDIUM | 34.0 | 915 22 16 12 | NaN | NaN | NaN | NaN | METALS SERVICE CENTERS AND OFFICES | 5051.0 | NaN | 50510000.0 | METALS SERVICE CENTERS AND OFFICES | 350.0 | 10035051 | DIVISION F. - WHOLESALE TRADE | WHOLESALE TRADE - DURABLE GOODS | METALS AND MINERALS, EXCEPT PETROLEUM | METALS SERVICE CENTERS AND OFFICES | 1.0 | 1.0 | 2004.0 | 149127.0 | 170724.0 | 5080.0 | G | 3.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | POINT (-3.70555 40.42030) |
2 | 15K ANGELS AND INVESTORS SL. | 1369422585#-3.70587#40.42048 | 2019-12-01 | NaN | 1369422585 | NaN | NaN | ESP | MADRID | MADRID | MADRID | MADRID | MD | 28013.0 | CALLE GRAN VIA 46, 28013, MADRID, MADRID | CALLE GRAN VIA 46 | 28013, MADRID, MADRID | -3.705870 | 40.420480 | S8HPNTSCZA | HIGH | 34.0 | NaN | NaN | NaN | WWW.15KANGELS.COM | NaN | SECURITY AND COMMODITY SERVICES, NEC, NSK | 6289.0 | NaN | 62890000.0 | SECURITY AND COMMODITY SERVICE | 350.0 | 10010324 | DIVISION H. - FINANCE, INSURANCE, AND REAL ESTATE | SECURITY AND COMMODITY BROKERS, DEALERS, EXCHA... | SERVICES ALLIED WITH THE EXCHANGE OF SECURITIE... | SECURITY AND COMMODITY SERVICE | 2.0 | 2.0 | 2017.0 | 54756.0 | 61000.0 | 5080.0 | G | 3.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | N | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | NaN | NaN | POINT (-3.70587 40.42048) |
3 | 1610 PLAZA DEL CARMEN 5 SL. | 1289731188#-3.7028#40.41892 | 2019-12-01 | NaN | 1289731188 | NaN | NaN | ESP | MADRID | MADRID | MADRID | MADRID | MD | 28013.0 | PLAZA CARMEN 5, 28013, MADRID, MADRID | PLAZA CARMEN 5 | 28013, MADRID, MADRID | -3.702800 | 40.418920 | S8HPNTSCZA | HIGH | 34.0 | NaN | NaN | NaN | NaN | NaN | EATING PLACES | 5812.0 | NaN | 58120000.0 | EATING PLACES | 350.0 | 10020100 | DIVISION G. - RETAIL TRADE | EATING AND DRINKING PLACES | EATING PLACES/RESTAURANTS | EATING PLACES/RESTAURANTS - UNSPECIFIED | 35.0 | 35.0 | 2016.0 | 2766588.0 | 3167262.0 | 5080.0 | G | 3.0 | 0.0 | 0.0 | PUZZLE DE RESTAURANTES SL. | CALLE BALLESTA, 32 - LOC DR, 28004, MADRID, MA... | CALLE BALLESTA, 32 - LOC DR | MADRID | MADRID | SPAIN | 28004 | PUZZLE DE RESTAURANTES SL. | CALLE BALLESTA, 32 - LOC DR, 28004, MADRID, MA... | CALLE BALLESTA, 32 - LOC DR | MADRID | MADRID | 28004.0 | N | PUZZLE DE RESTAURANTES SL. | CALLE BALLESTA, 32 - LOC DR, 28004, MADRID, MA... | CALLE BALLESTA, 32 - LOC DR | MADRID | MADRID | SPAIN | 28004 | 0.0 | 0.0 | NaN | NaN | POINT (-3.70280 40.41892) |
4 | 1ST WANDA SERVICE SERVICIOS INTEGRADOS DE CONT... | 1277282874#-3.70641#40.41956 | 2019-12-01 | NaN | 1277282874 | NaN | NaN | ESP | MADRID | MADRID | MADRID | MADRID | MD | 28013.0 | CALLE PRECIADOS, 29 - 5 A, 28013, MADRID, MADRID | CALLE PRECIADOS, 29 - 5 A | 28013, MADRID, MADRID | -3.706410 | 40.419560 | S8HPNTSCZA | HIGH | 34.0 | NaN | NaN | NaN | NaN | NaN | BUSINESS SERVICES, NEC, NSK | 7389.0 | NaN | 73890900.0 | FINANCIAL SERVICES | 350.0 | 10905900 | DIVISION I. - SERVICES | BUSINESS SERVICES | MISCELLANEOUS BUSINESS SERVICES | BUSINESS SERVICES, NEC | 0.0 | 1.0 | 2003.0 | 58097.0 | 68508.0 | 5080.0 | G | 3.0 | 1.0 | 0.0 | 1ST WANDA SERVICE SERVICIOS INTEGRADOS DE CONT... | CALLE PRECIADOS, 29 - 5 A, 28013, MADRID, MADR... | CALLE PRECIADOS, 29 - 5 A | MADRID | MADRID | SPAIN | 28013 | 1ST WANDA SERVICE SERVICIOS INTEGRADOS DE CONT... | CALLE PRECIADOS, 29 - 5 A, 28013, MADRID, MADRID | CALLE PRECIADOS, 29 - 5 A | MADRID | MADRID | 28013.0 | Y | 1ST WANDA SERVICE SERVICIOS INTEGRADOS DE CONT... | CALLE PRECIADOS, 29 - 5 A, 28013, MADRID, MADR... | CALLE PRECIADOS, 29 - 5 A | MADRID | MADRID | SPAIN | 28013 | 2.0 | 1.0 | NaN | NaN | POINT (-3.70641 40.41956) |
After some exploration of the DataFrame, we find out TRADE_DIVISION
is our variable. There is a category called DIVISION L. - TOURISM
.
sample_df['TRADE_DIVISION'].value_counts()
DIVISION I. - SERVICES 1769 DIVISION G. - RETAIL TRADE 1370 DIVISION E. - TRANSPORTATION AND PUBLIC UTILITIES 724 DIVISION H. - FINANCE, INSURANCE, AND REAL ESTATE 613 DIVISION F. - WHOLESALE TRADE 205 DIVISION D. - MANUFACTURING 198 DIVISION C. - CONSTRUCTION 67 DIVISION J. - PUBLIC ADMINISTRATION 25 DIVISION L. - TOURISM 14 DIVISION A. - AGRICULTURE, FORESTRY, AND FISHING 10 DIVISION M. - SPORTS 8 Name: TRADE_DIVISION, dtype: int64
Map(Layer(sample_df,
geom_col='geom',
style=color_category_style('TRADE_DIVISION', size=3, stroke_width=0.2),
encode_data=False))
Once we know we can use column TRADE_DIVISION
to filter tourism POIs, we'll add it to our query with a larger bounding box covering our area of interest.
sql_query = """
SELECT * except(do_label) FROM $dataset$
WHERE TRADE_DIVISION = 'DIVISION L. - TOURISM'
AND ST_IntersectsBox(geom, -3.716398,40.407437,-3.690477,40.425277)
"""
tourism_pois = pois_ds.to_dataframe(sql_query=sql_query)
#To keep only most updated POIs (based on the do_date)
tourism_pois = tourism_pois.sort_values(['NAME', 'do_date']).groupby('NAME').first().reset_index()
tourism_pois.head()
NAME | geoid | do_date | BRANDNAME | PB_ID | TRADE_NAME | FRANCHISE_NAME | ISO3 | AREANAME4 | AREANAME3 | AREANAME2 | AREANAME1 | STABB | POSTCODE | FORMATTEDADDRESS | MAINADDRESSLINE | ADDRESSLASTLINE | LONGITUDE | LATITUDE | GEORESULT | CONFIDENCE_CODE | COUNTRY_ACCESS_CODE | TEL_NUM | FAXNUM | HTTP | OPEN_24H | BUSINESS_LINE | SIC1 | SIC2 | SIC8 | SIC8_DESCRIPTION | ALT_INDUSTRY_CODE | MICODE | TRADE_DIVISION | GROUP | CLASS | SUB_CLASS | EMPLOYEE_HERE | EMPLOYEE_COUNT | YEAR_START | SALES_VOLUME_LOCAL | SALES_VOLUME_US_DOLLARS | CURRENCY_CODE | AGENT_CODE | LEGAL_STATUS_CODE | STATUS_CODE | SUBSIDIARY_INDICATOR | PARENT_BUSINESS_NAME | PARENT_ADDRESS | PARENT_STREET_ADDRESS | PARENT_AREANAME3 | PARENT_AREANAME1 | PARENT_COUNTRY | PARENT_POSTCODE | DOMESTIC_ULTIMATE_BUSINESS_NAME | DOMESTIC_ULTIMATE_ADDRESS | DOMESTIC_ULTIMATE_STREET_ADDRESS | DOMESTIC_ULTIMATE_AREANAME3 | DOMESTIC_ULTIMATE_AREANAME1 | DOMESTIC_ULTIMATE_POSTCODE | GLOBAL_ULTIMATE_INDICATOR | GLOBAL_ULTIMATE_BUSINESS_NAME | GLOBAL_ULTIMATE_ADDRESS | GLOBAL_ULTIMATE_STREET_ADDRESS | GLOBAL_ULTIMATE_AREANAME3 | GLOBAL_ULTIMATE_AREANAME1 | GLOBAL_ULTIMATE_COUNTRY | GLOBAL_ULTIMATE_POSTCODE | FAMILY_MEMBERS | HIERARCHY_CODE | TICKER_SYMBOL | EXCHANGE_NAME | geom | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ARCO DE CUCHILLEROS | 2033946578#-3.708101#40.4147983 | 2019-12-01 | NaN | 2033946578 | NaN | NaN | ESP | NaN | MADRID | MADRID | COMUNIDAD DE MADRID | MD | 28012.0 | CALLE DE LOS CUCHILLEROS, 28012, MADRID, MADRID | CALLE DE LOS CUCHILLEROS | 28012, MADRID, MADRID | -3.708101 | 40.414798 | T20 | LOW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ARCH | NaN | 10110112 | DIVISION L. - TOURISM | TOURISM | IMPORTANT TOURIST ATTRACTION | ARCH | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.70810 40.41480) |
1 | ATENEO DE MADRID | 2033838561#-3.6982188#40.4150308 | 2019-12-01 | NaN | 2033838561 | NaN | NaN | ESP | NaN | MADRID | MADRID | COMUNIDAD DE MADRID | MD | 28014.0 | CALLE DEL PRADO, 28014, MADRID, MADRID | CALLE DEL PRADO | 28014, MADRID, MADRID | -3.698219 | 40.415031 | T20 | LOW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | TOURIST BUILDING | NaN | 10110200 | DIVISION L. - TOURISM | TOURISM | IMPORTANT TOURIST ATTRACTION | TOURIST BUILDING | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69822 40.41503) |
2 | BANCO DE ESPAÑA | 2022507776#-3.6939777#40.4189737 | 2019-12-01 | NaN | 2022507776 | NaN | NaN | ESP | NaN | MADRID | MADRID | COMUNIDAD DE MADRID | MD | NaN | MADRID, MADRID | NaN | MADRID, MADRID | -3.693978 | 40.418974 | T1 | HIGH | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | TOURIST BUILDING | NaN | 10110200 | DIVISION L. - TOURISM | TOURISM | IMPORTANT TOURIST ATTRACTION | TOURIST BUILDING | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69398 40.41897) |
3 | BASÍLICA DE NUESTRO PADRE JESÚS DE MEDINACELI | 2033893910#-3.6957089#40.4141956 | 2019-12-01 | NaN | 2033893910 | NaN | NaN | ESP | NaN | MADRID | MADRID | COMUNIDAD DE MADRID | MD | 28014.0 | PLAZA DE JESÚS, 28014, MADRID, MADRID | PLAZA DE JESÚS | 28014, MADRID, MADRID | -3.695709 | 40.414196 | T20 | LOW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | TOURIST BUILDING | NaN | 10110200 | DIVISION L. - TOURISM | TOURISM | IMPORTANT TOURIST ATTRACTION | TOURIST BUILDING | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69571 40.41420) |
4 | BIBLIOTECA NACIONAL DE ESPAÑA | 2033804926#-3.6906236#40.4238526 | 2019-12-01 | NaN | 2033804926 | NaN | NaN | ESP | NaN | MADRID | MADRID | COMUNIDAD DE MADRID | MD | 28001.0 | PASEO DE RECOLETOS, 28001, MADRID, MADRID | PASEO DE RECOLETOS | 28001, MADRID, MADRID | -3.690624 | 40.423853 | T20 | LOW | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | TOURIST BUILDING | NaN | 10110200 | DIVISION L. - TOURISM | TOURISM | IMPORTANT TOURIST ATTRACTION | TOURIST BUILDING | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-3.69062 40.42385) |
We can classify the tourism POI's using the variable SUB_CLASS
.
tourism_pois['SUB_CLASS'].value_counts()
TOURIST BUILDING 66 MONUMENT 20 IMPORTANT TOURIST ATTRACTION -UNSPECIFIED 16 SCENIC, PANORAMIC VIEW 4 STATUE 2 TOWER 2 IMPORTANT TOURIST ATTRACTION 2 MEMORIAL 1 ARCH 1 Name: SUB_CLASS, dtype: int64
Map(Layer(tourism_pois,
geom_col='geom',
style=color_category_style('SUB_CLASS', stroke_width=0.2)))
Finally, we will save this filtered dataset on our CARTO account so that we can use it in the future. We can do this using the CARTOframes function to_carto
.
to_carto(tourism_pois, 'demo_tourism_pois_madrid', geom_col='geom')
Success! Data uploaded to table "demo_tourism_pois_madrid" correctly
'demo_tourism_pois_madrid'