Obsei Tutorial 02

This example shows following Obsei workflow

  1. Observe: Play Store's app reviews
  2. Pre-process: Clean review text with properly
  3. Analyze: Classify review text within given category list
  4. Inform: Provide all data in Pandas DataFrame
  5. Store: Store data in Google Drive in CSV format

Install Obsei from latest code, perform these steps -

  • Select GPU RunType for faster computation
  • Restart Runtime after installation
In [ ]:
!pip install git+https://github.com/lalitpagaria/obsei.git
Collecting git+https://github.com/lalitpagaria/obsei.git
  Cloning https://github.com/lalitpagaria/obsei.git to /tmp/pip-req-build-9q4fz4j2
  Running command git clone -q https://github.com/lalitpagaria/obsei.git /tmp/pip-req-build-9q4fz4j2
Requirement already satisfied (use --upgrade to upgrade): obsei==0.0.9 from git+https://github.com/lalitpagaria/obsei.git in /usr/local/lib/python3.7/dist-packages
Requirement already satisfied: app-store-reviews-reader==1.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.2)
Requirement already satisfied: atlassian-python-api==3.10.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.10.0)
Requirement already satisfied: beautifulsoup4==4.9.3 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.9.3)
Requirement already satisfied: blis==0.7.4 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.7.4)
Requirement already satisfied: cachetools==4.2.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.2.2)
Requirement already satisfied: catalogue==2.0.4 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.0.4)
Requirement already satisfied: certifi==2021.5.30 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2021.5.30)
Requirement already satisfied: chardet==4.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.0.0)
Requirement already satisfied: click==7.1.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (7.1.2)
Requirement already satisfied: courlan==0.4.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.4.0)
Requirement already satisfied: cssselect==1.1.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.1.0)
Requirement already satisfied: cymem==2.0.5 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.0.5)
Requirement already satisfied: dateparser==1.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.0)
Requirement already satisfied: deprecated==1.2.12 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.2.12)
Requirement already satisfied: elasticsearch==7.13.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (7.13.1)
Requirement already satisfied: feedparser==6.0.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (6.0.2)
Requirement already satisfied: filelock==3.0.12 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.12)
Requirement already satisfied: gnews==0.1.3 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.1.3)
Requirement already satisfied: google-api-core==1.30.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.30.0)
Requirement already satisfied: google-api-python-client==2.8.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.8.0)
Requirement already satisfied: google-auth==1.30.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.30.2)
Requirement already satisfied: google-auth-httplib2==0.1.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.1.0)
Requirement already satisfied: google-play-scraper==1.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.0)
Requirement already satisfied: googleapis-common-protos==1.53.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.53.0)
Requirement already satisfied: greenlet==1.1.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.1.0)
Requirement already satisfied: htmldate==0.8.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.8.1)
Requirement already satisfied: httplib2==0.19.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.19.1)
Requirement already satisfied: huggingface-hub==0.0.8 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.0.8)
Requirement already satisfied: idna==2.10 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.10)
Requirement already satisfied: importlib-metadata==4.5.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.5.0)
Requirement already satisfied: jinja2==3.0.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.1)
Requirement already satisfied: joblib==1.0.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.1)
Requirement already satisfied: justext==2.2.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.2.0)
Requirement already satisfied: lxml==4.6.3 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.6.3)
Requirement already satisfied: markupsafe==2.0.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.0.1)
Requirement already satisfied: mmh3==3.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.0)
Requirement already satisfied: murmurhash==1.0.5 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.5)
Requirement already satisfied: nltk==3.6.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.6.2)
Requirement already satisfied: numpy==1.20.3 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.20.3)
Requirement already satisfied: oauthlib==3.1.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.1.1)
Requirement already satisfied: packaging==20.9 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (20.9)
Requirement already satisfied: pandas==1.2.4 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.2.4)
Requirement already satisfied: pathy==0.5.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.5.2)
Requirement already satisfied: praw==7.2.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (7.2.0)
Requirement already satisfied: prawcore==2.1.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.1.0)
Requirement already satisfied: preshed==3.0.5 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.5)
Requirement already satisfied: presidio-analyzer==2.2.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.2.1)
Requirement already satisfied: presidio-anonymizer==2.2.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.2.1)
Requirement already satisfied: protobuf==3.17.3 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.17.3)
Requirement already satisfied: pyasn1==0.4.8 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.4.8)
Requirement already satisfied: pyasn1-modules==0.2.8 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.2.8)
Requirement already satisfied: pycryptodome==3.10.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.10.1)
Requirement already satisfied: pydantic==1.7.4 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.7.4)
Requirement already satisfied: pyparsing==2.4.7 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.4.7)
Requirement already satisfied: python-dateutil==2.8.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.8.1)
Requirement already satisfied: python-facebook-api==0.9.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.9.2)
Requirement already satisfied: pytz==2021.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2021.1)
Requirement already satisfied: pyyaml==5.4.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (5.4.1)
Requirement already satisfied: readability-lxml==0.8.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.8.1)
Requirement already satisfied: reddit-rss-reader==1.3.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.3.2)
Requirement already satisfied: regex==2020.11.13 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2020.11.13)
Requirement already satisfied: requests==2.25.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.25.1)
Requirement already satisfied: requests-file==1.5.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.5.1)
Requirement already satisfied: requests-oauthlib==1.3.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.3.0)
Requirement already satisfied: rsa==4.7.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.7.2)
Requirement already satisfied: sacremoses==0.0.45 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.0.45)
Requirement already satisfied: searchtweets-v2==1.0.7 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.7)
Requirement already satisfied: sentencepiece==0.1.95 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.1.95)
Requirement already satisfied: sgmllib3k==1.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.0)
Requirement already satisfied: six==1.16.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.16.0)
Requirement already satisfied: slack-sdk==3.6.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.6.0)
Requirement already satisfied: smart-open==3.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.0)
Requirement already satisfied: soupsieve==2.2.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.2.1)
Requirement already satisfied: spacy==3.0.5 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.5)
Requirement already satisfied: spacy-legacy==3.0.5 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.5)
Requirement already satisfied: sqlalchemy==1.4.17 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.4.17)
Requirement already satisfied: srsly==2.4.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.4.1)
Requirement already satisfied: thinc==8.0.4 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (8.0.4)
Requirement already satisfied: tld==0.12.6 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.12.6)
Requirement already satisfied: tldextract==3.1.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.1.0)
Requirement already satisfied: tokenizers==0.10.3 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.10.3)
Requirement already satisfied: tqdm==4.61.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.61.0)
Requirement already satisfied: trafilatura==0.8.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.8.2)
Requirement already satisfied: transformers==4.6.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (4.6.1)
Requirement already satisfied: tweet-preprocessor==0.6.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.6.0)
Requirement already satisfied: typer==0.3.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.3.2)
Requirement already satisfied: typing-extensions==3.10.0.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.10.0.0)
Requirement already satisfied: tzlocal==2.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.1)
Requirement already satisfied: update-checker==0.18.0 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.18.0)
Requirement already satisfied: uritemplate==3.0.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.0.1)
Requirement already satisfied: urllib3==1.26.5 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.26.5)
Requirement already satisfied: vadersentiment==3.3.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.3.2)
Requirement already satisfied: wasabi==0.8.2 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (0.8.2)
Requirement already satisfied: websocket-client==1.0.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.0.1)
Requirement already satisfied: wrapt==1.12.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.12.1)
Requirement already satisfied: zenpy==2.0.24 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (2.0.24)
Requirement already satisfied: zipp==3.4.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (3.4.1)
Requirement already satisfied: torch==1.8.1 in /usr/local/lib/python3.7/dist-packages (from obsei==0.0.9) (1.8.1)
Requirement already satisfied: setuptools>=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core==1.30.0->obsei==0.0.9) (57.0.0)
Requirement already satisfied: responses>=0.11 in /usr/local/lib/python3.7/dist-packages (from python-facebook-api==0.9.2->obsei==0.0.9) (0.13.3)
Requirement already satisfied: cattrs<2.0,>=1.1; python_version >= "3.7" and python_version < "4.0" in /usr/local/lib/python3.7/dist-packages (from python-facebook-api==0.9.2->obsei==0.0.9) (1.7.1)
Requirement already satisfied: attrs<21.0.0,>=20.1.0 in /usr/local/lib/python3.7/dist-packages (from python-facebook-api==0.9.2->obsei==0.0.9) (20.3.0)
Building wheels for collected packages: obsei
  Building wheel for obsei (setup.py) ... done
  Created wheel for obsei: filename=obsei-0.0.9-cp37-none-any.whl size=65557 sha256=bc7c8c937eed4a7b325b3ef8e46de64e44778e40914d99267356cc4ce36c7c27
  Stored in directory: /tmp/pip-ephem-wheel-cache-qhkx9sy8/wheels/49/1a/6e/2fd83c9a275b7096fc615a0edef2d55b1fc33c3751ba45c1ad
Successfully built obsei

Mount your Google Drive to store CSV

In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Configure following input -

  • name: Brand name of App
  • category_list: List of categories to perform review text classification
  • identifier: Package name of the app, it can be found at the end of the url of app in play store
  • country: Country of reviews
  • lookup_period: How many old reviews to collect (Note: Google rate limit and provide max 200 reviews only)
  • extra_stop_words: Extra stop words top clean from review text
In [ ]:
name = "zomato"
category_list = ["easyOrder placement", "Realtime order tracking", "easy payment options","Rewards and discounts","user interface","social media Integration",]
identifier = "com.application.zomato"
country = "in"
lookup_period = "365d"
extra_stop_words = ["i", "-", "day", "will", ".", "use", "n", "without", "please", "app", "ha", "ho", "nt", "wa", 
                    "thi", "plz", "pleas", "ff", "ya", "thank", "you", "thanks", "mai"]

Configure columns of Pandas DataFrame

included_cols will only be returned by Pandas Sink and rename_cols_dict will rename selected included_cols columns to desired one

In [ ]:
included_cols = [f"segmented_data_classifier_data_{category}" for category in category_list]
included_cols.append("segmented_data_classifier_data_positive")
included_cols.append("segmented_data_classifier_data_negative")
included_cols.append("processed_text")
included_cols.append("meta_at")
included_cols.append("meta_date")
included_cols.append("meta_published date")
included_cols.append("meta_score")
# included_cols.append("meta_title")
included_cols.append("meta_publisher_title")

rename_cols_dict = {f"segmented_data_classifier_data_{category}": category for category in category_list}
rename_cols_dict["segmented_data_classifier_data_positive"] = "positive"
rename_cols_dict["segmented_data_classifier_data_negative"] = "negative"
rename_cols_dict["processed_text"] = "text"
rename_cols_dict["meta_at"] = "time"
rename_cols_dict["meta_date"] = "time"
rename_cols_dict["meta_published date"] = "time"
rename_cols_dict["meta_score"] = "ratings"
# rename_cols_dict["meta_title"] = "title"
rename_cols_dict["meta_publisher_title"] = "news publisher"
rename_cols_dict['Unnamed: 0'] = 'reviews'

Configure Play Store Review Observer

In [ ]:
from obsei.source.playstore_scrapper import (
    PlayStoreScrapperSource,
    PlayStoreScrapperConfig,
)

source_config = PlayStoreScrapperConfig(
    countries=[country],
    package_name=identifier,
    lookup_period=lookup_period
)

source = PlayStoreScrapperSource()

Configure TextCleaner as Pre-Processor to clean review text

These cleaning function will run serially

In [ ]:
from obsei.preprocessor.text_cleaner import TextCleaner, TextCleanerConfig
from obsei.preprocessor.text_cleaning_function import *

text_cleaner_config = TextCleanerConfig(
    stop_words=extra_stop_words,
    cleaning_functions = [
        ToLowerCase(),
        RemoveWhiteSpaceAndEmptyToken(),
        RemovePunctuation(),
        RemoveSpecialChars(),
        DecodeUnicode(),
        RemoveDateTime(),
        RemoveStopWords(),
        RemoveStopWords(stop_words=extra_stop_words),
        RemoveWhiteSpaceAndEmptyToken(),
   ]
)

text_cleaner = TextCleaner()
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

Configure Classification Analyzer

Note: Select model from https://huggingface.co/models?pipeline_tag=zero-shot-classification, if you want to try different one

In [ ]:
from obsei.analyzer.classification_analyzer import ClassificationAnalyzerConfig, ZeroShotClassificationAnalyzer

analyzer_config=ClassificationAnalyzerConfig(
   labels=category_list,
)

text_analyzer = ZeroShotClassificationAnalyzer(
   model_name_or_path="typeform/mobilebert-uncased-mnli",
   device="auto"
)

Configure Pandas DataFrame Informer

In [ ]:
from pandas import DataFrame
from obsei.sink.pandas_sink import PandasSink, PandasSinkConfig

sink_config = PandasSinkConfig(
   dataframe=DataFrame(),
   include_columns_list=included_cols
)
sink = PandasSink()

Fetch app reviews

In [ ]:
source_response_list = source.lookup(source_config)

PreProcess review text to clean it

In [ ]:
cleaner_response_list = text_cleaner.preprocess_input(
    input_list=source_response_list,
    config=text_cleaner_config
)
07/11/2021 17:09:25 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:25 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:25 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:25 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:25 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format
07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function -   Token contain invalid date time format

Analyze reviews to perform classification

Note: This is compute heavy step

In [ ]:
analyzer_response_list = text_analyzer.analyze_input(
    source_response_list=cleaner_response_list,
    analyzer_config=analyzer_config
)
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

Inform review data in form of Pandas DataFrame

In [ ]:
dataframe = sink.send_data(analyzer_response_list, sink_config)
dataframe.rename(rename_cols_dict,axis=1,inplace=True)


dataframe["brand"] = name
dataframe
Out[ ]:
text positive easy payment options easyOrder placement user interface Realtime order tracking Rewards and discounts social media Integration negative ratings time brand
0 good 1.00 0.67 0.65 0.60 0.43 0.35 0.06 0.00 5 2021-07-11 17:09:17 zomato
1 excellent loving 1.00 0.20 0.19 0.32 0.10 0.11 0.01 0.00 5 2021-07-11 17:08:09 zomato
2 delievered wrong house 0.00 0.00 0.00 0.26 0.00 0.02 0.03 0.99 1 2021-07-11 17:07:36 zomato
3 superb excellent 1.00 0.55 0.57 0.71 0.28 0.20 0.02 0.00 5 2021-07-11 17:07:17 zomato
4 good 1.00 0.67 0.65 0.60 0.43 0.35 0.06 0.00 4 2021-07-11 17:05:58 zomato
... ... ... ... ... ... ... ... ... ... ... ... ...
195 sellers cheat users selling less quantity cont... 0.18 0.00 0.04 0.07 0.04 0.08 0.03 0.68 1 2021-07-11 16:08:05 zomato
196 nice service 0.99 0.81 0.40 0.60 0.12 0.28 0.02 0.00 5 2021-07-11 16:07:52 zomato
197 amazing experience far 0.99 0.02 0.04 0.21 0.09 0.02 0.01 0.00 5 2021-07-11 16:07:53 zomato
198 delivery fast less offers cash delivery 0.94 0.94 0.17 0.62 0.13 0.06 0.03 0.42 2 2021-07-11 16:07:38 zomato
199 good food fast delivery 0.99 0.95 0.30 0.52 0.09 0.33 0.00 0.00 5 2021-07-11 16:07:23 zomato

200 rows × 12 columns

Store result in Google Drive as CSV

In [ ]:
dataframe.to_csv(f'/content/drive/My Drive/playstore_{name}.csv')