!pip install obsei[all]
Collecting git+https://github.com/lalitpagaria/obsei.git
Requirement already satisfied (use --upgrade to upgrade): obsei==0.0.9 from git+https://github.com/lalitpagaria/obsei.git in /usr/local/lib/python3.7/dist-packages
Successfully built obsei
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
name: Brand name of the app.
category_list: List of categories used to classify the review text.
identifier: Package name of the app; it can be found at the end of the app's Play Store URL.
country: Country of the reviews.
lookup_period: How far back to collect reviews (note: Google rate-limits requests and returns at most 200 reviews).
extra_stop_words: Extra stop words to remove from the review text.
name = "zomato"
category_list = [
    "easy order placement",
    "Realtime order tracking",
    "easy payment options",
    "Rewards and discounts",
    "user interface",
    "social media Integration",
]
identifier = "com.application.zomato"
country = "in"
lookup_period = "365d"
extra_stop_words = ["i", "-", "day", "will", ".", "use", "n", "without", "please", "app", "ha", "ho", "nt", "wa",
"thi", "plz", "pleas", "ff", "ya", "thank", "you", "thanks", "mai"]
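The identifier above is the package name that appears at the end of the app's Play Store URL. As a minimal sketch, assuming the usual `details?id=<package>` URL shape, the package name can be pulled out with the standard library. The helper name `package_name_from_url` and the example URL are illustrative and not part of obsei.

```python
from urllib.parse import urlparse, parse_qs


def package_name_from_url(play_store_url: str) -> str:
    """Return the value of the `id` query parameter of a Play Store app URL."""
    query = parse_qs(urlparse(play_store_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No 'id' parameter found in {play_store_url!r}")
    return ids[0]


# Assumed example URL, built from the identifier used in this notebook.
example_url = "https://play.google.com/store/apps/details?id=com.application.zomato"
print(package_name_from_url(example_url))
```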
Only the columns listed in included_cols will be returned by the Pandas sink, and rename_cols_dict renames the selected included_cols columns to the desired names.
included_cols = [f"segmented_data_classifier_data_{category}" for category in category_list]
included_cols.append("segmented_data_classifier_data_positive")
included_cols.append("segmented_data_classifier_data_negative")
included_cols.append("processed_text")
included_cols.append("meta_at")
included_cols.append("meta_date")
included_cols.append("meta_published date")
included_cols.append("meta_score")
# included_cols.append("meta_title")
included_cols.append("meta_publisher_title")
rename_cols_dict = {f"segmented_data_classifier_data_{category}": category for category in category_list}
rename_cols_dict["segmented_data_classifier_data_positive"] = "positive"
rename_cols_dict["segmented_data_classifier_data_negative"] = "negative"
rename_cols_dict["processed_text"] = "text"
rename_cols_dict["meta_at"] = "time"
rename_cols_dict["meta_date"] = "time"
rename_cols_dict["meta_published date"] = "time"
rename_cols_dict["meta_score"] = "ratings"
# rename_cols_dict["meta_title"] = "title"
rename_cols_dict["meta_publisher_title"] = "news publisher"
rename_cols_dict['Unnamed: 0'] = 'reviews'
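To make the effect of these two structures concrete, here is a small pandas-only sketch of selecting and renaming columns on a toy DataFrame. It is an illustration of the idea, not the Pandas sink's actual implementation; the helper `select_and_rename` and the sample row are made up for this example. Several source-specific keys (`meta_at`, `meta_date`, `meta_published date`) map to the same name `time`, presumably because a given source fills in only one of them.

```python
import pandas as pd


def select_and_rename(df, include_cols, rename_map):
    """Keep only the requested columns that exist in df, then rename them."""
    present = [col for col in include_cols if col in df.columns]
    return df[present].rename(columns=rename_map)


# Toy data standing in for one analyzed review (values are invented).
sample = pd.DataFrame(
    {
        "processed_text": ["good food fast delivery"],
        "meta_score": [5],
        "meta_at": ["2021-07-11"],
        "meta_user_name": ["someone"],  # not requested, so it is dropped
    }
)

result = select_and_rename(
    sample,
    include_cols=["processed_text", "meta_score", "meta_at", "meta_date"],
    rename_map={
        "processed_text": "text",
        "meta_score": "ratings",
        "meta_at": "time",
        "meta_date": "time",
    },
)
print(list(result.columns))
```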
from obsei.source.playstore_scrapper import (
    PlayStoreScrapperSource,
    PlayStoreScrapperConfig,
)
source_config = PlayStoreScrapperConfig(
    countries=[country],
    package_name=identifier,
    lookup_period=lookup_period
)
source = PlayStoreScrapperSource()
These cleaning functions run serially, in the order listed.
from obsei.preprocessor.text_cleaner import TextCleaner, TextCleanerConfig
from obsei.preprocessor.text_cleaning_function import *
text_cleaner_config = TextCleanerConfig(
    stop_words=extra_stop_words,
    cleaning_functions=[
        ToLowerCase(),
        RemoveWhiteSpaceAndEmptyToken(),
        RemovePunctuation(),
        RemoveSpecialChars(),
        DecodeUnicode(),
        RemoveDateTime(),
        RemoveStopWords(),
        RemoveStopWords(stop_words=extra_stop_words),
        RemoveWhiteSpaceAndEmptyToken(),
    ]
)
text_cleaner = TextCleaner()
[nltk_data] Downloading package stopwords to /root/nltk_data... [nltk_data] Package stopwords is already up-to-date!
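Because the cleaning functions run one after another, their order affects the result. The following plain-Python sketch mimics a serial token-cleaning pipeline to show why; the step functions and `run_serially` are hypothetical stand-ins written for this example and are not obsei's classes.

```python
import string
from typing import Callable, Iterable, List

Tokens = List[str]
Step = Callable[[Tokens], Tokens]


def to_lower(tokens: Tokens) -> Tokens:
    return [t.lower() for t in tokens]


def remove_punctuation(tokens: Tokens) -> Tokens:
    table = str.maketrans("", "", string.punctuation)
    return [t.translate(table) for t in tokens]


def remove_empty(tokens: Tokens) -> Tokens:
    return [t for t in tokens if t.strip()]


def make_stop_word_remover(stop_words: Iterable[str]) -> Step:
    blocked = set(stop_words)
    return lambda tokens: [t for t in tokens if t not in blocked]


def run_serially(tokens: Tokens, steps: List[Step]) -> Tokens:
    """Feed the output of each step into the next one."""
    for step in steps:
        tokens = step(tokens)
    return tokens


raw = "Please FIX the app, thanks!".split()
remove_extra = make_stop_word_remover(["please", "app", "thanks"])

# Punctuation is stripped before stop words are matched.
cleaned = run_serially(raw, [to_lower, remove_punctuation, remove_extra, remove_empty])

# Same steps, but stop words are matched while punctuation is still attached.
reordered = run_serially(raw, [to_lower, remove_extra, remove_punctuation, remove_empty])

print(cleaned)
print(reordered)
```

In the second ordering, tokens such as `app,` and `thanks!` do not match the stop-word list, so they survive the filter and are only stripped of punctuation afterwards.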
Note: To try a different model, select one from https://huggingface.co/models?pipeline_tag=zero-shot-classification.
from obsei.analyzer.classification_analyzer import ClassificationAnalyzerConfig, ZeroShotClassificationAnalyzer
analyzer_config = ClassificationAnalyzerConfig(
    labels=category_list,
)
text_analyzer = ZeroShotClassificationAnalyzer(
    model_name_or_path="typeform/mobilebert-uncased-mnli",
    device="auto"
)
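For readers who want to see what zero-shot classification returns, here is a rough sketch that calls the Hugging Face transformers pipeline directly with the same model name. It is not obsei's internal code. The function names `classify` and `labels_above_threshold`, the 0.5 threshold, and the sample scores are all assumptions made for illustration, and the `multi_label` argument is assumed to be available in the installed transformers version. The pipeline is imported lazily so the helper can be used without downloading a model.

```python
from typing import Dict, List


def classify(text: str, labels: List[str],
             model_name: str = "typeform/mobilebert-uncased-mnli") -> Dict[str, list]:
    """Score `text` against each label; downloads the model on first use."""
    from transformers import pipeline  # lazy import: only needed when called

    classifier = pipeline("zero-shot-classification", model=model_name)
    return classifier(text, candidate_labels=labels, multi_label=True)


def labels_above_threshold(result: Dict[str, list], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score is at least `threshold`."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


# Hand-written result in the pipeline's output shape; the scores are invented.
example_result = {
    "sequence": "order tracking never updates",
    "labels": ["Realtime order tracking", "user interface", "easy payment options"],
    "scores": [0.91, 0.40, 0.05],
}
print(labels_above_threshold(example_result))
```

Calling `classify("order tracking never updates", category_list)` would produce a dictionary of the same shape, with real scores from the model.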
from pandas import DataFrame
from obsei.sink.pandas_sink import PandasSink, PandasSinkConfig
sink_config = PandasSinkConfig(
    dataframe=DataFrame(),
    include_columns_list=included_cols
)
sink = PandasSink()
source_response_list = source.lookup(source_config)
cleaner_response_list = text_cleaner.preprocess_input(
    input_list=source_response_list,
    config=text_cleaner_config
)
07/11/2021 17:09:25 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format
obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format 07/11/2021 17:09:26 - WARNING - obsei.preprocessor.text_cleaning_function - Token contain invalid date time format
Note: This is a compute-heavy step.
analyzer_response_list = text_analyzer.analyze_input(
    source_response_list=cleaner_response_list,
    analyzer_config=analyzer_config,
)
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
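Because the analysis step is compute heavy, one option is to feed the analyzer a few reviews at a time so that memory use stays bounded and progress is easier to follow. The sketch below is only an illustration: `chunked`, `analyze_in_batches`, and `fake_analyze` are hypothetical helpers that are not part of obsei, and it assumes the analyzer call can be invoked repeatedly on sub-lists with the same result as one large call.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analyze_in_batches(
    items: Sequence[T],
    analyze: Callable[[List[T]], List[R]],
    batch_size: int = 32,
) -> List[R]:
    """Call `analyze` once per batch and concatenate the results in order."""
    results: List[R] = []
    for batch in chunked(items, batch_size):
        results.extend(analyze(list(batch)))
    return results


def fake_analyze(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for the real analyzer; it only upper-cases the text."""
    return [text.upper() for text in batch]


reviews = ["good", "excellent loving", "superb excellent", "nice service", "good"]
scored = analyze_in_batches(reviews, fake_analyze, batch_size=2)
print(scored)
```

In this notebook, the `analyze` argument could be a small wrapper around `text_analyzer.analyze_input(source_response_list=batch, analyzer_config=analyzer_config)`, assuming that call behaves the same on a sub-list as on the full list.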
dataframe = sink.send_data(analyzer_response_list, sink_config)
dataframe.rename(columns=rename_cols_dict, inplace=True)
dataframe["brand"] = name
dataframe
| | text | positive | easy payment options | easyOrder placement | user interface | Realtime order tracking | Rewards and discounts | social media Integration | negative | ratings | time | brand |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | good | 1.00 | 0.67 | 0.65 | 0.60 | 0.43 | 0.35 | 0.06 | 0.00 | 5 | 2021-07-11 17:09:17 | zomato |
1 | excellent loving | 1.00 | 0.20 | 0.19 | 0.32 | 0.10 | 0.11 | 0.01 | 0.00 | 5 | 2021-07-11 17:08:09 | zomato |
2 | delievered wrong house | 0.00 | 0.00 | 0.00 | 0.26 | 0.00 | 0.02 | 0.03 | 0.99 | 1 | 2021-07-11 17:07:36 | zomato |
3 | superb excellent | 1.00 | 0.55 | 0.57 | 0.71 | 0.28 | 0.20 | 0.02 | 0.00 | 5 | 2021-07-11 17:07:17 | zomato |
4 | good | 1.00 | 0.67 | 0.65 | 0.60 | 0.43 | 0.35 | 0.06 | 0.00 | 4 | 2021-07-11 17:05:58 | zomato |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
195 | sellers cheat users selling less quantity cont... | 0.18 | 0.00 | 0.04 | 0.07 | 0.04 | 0.08 | 0.03 | 0.68 | 1 | 2021-07-11 16:08:05 | zomato |
196 | nice service | 0.99 | 0.81 | 0.40 | 0.60 | 0.12 | 0.28 | 0.02 | 0.00 | 5 | 2021-07-11 16:07:52 | zomato |
197 | amazing experience far | 0.99 | 0.02 | 0.04 | 0.21 | 0.09 | 0.02 | 0.01 | 0.00 | 5 | 2021-07-11 16:07:53 | zomato |
198 | delivery fast less offers cash delivery | 0.94 | 0.94 | 0.17 | 0.62 | 0.13 | 0.06 | 0.03 | 0.42 | 2 | 2021-07-11 16:07:38 | zomato |
199 | good food fast delivery | 0.99 | 0.95 | 0.30 | 0.52 | 0.09 | 0.33 | 0.00 | 0.00 | 5 | 2021-07-11 16:07:23 | zomato |
200 rows × 12 columns
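The label scores above are easier to act on once they are reduced to a dominant sentiment and compared with the star rating. The following is a minimal sketch, not part of the original notebook: it rebuilds three rows of the table by hand (a subset of the columns), and the names `sample`, `sentiment`, `mismatch`, and `summary`, as well as the "rating of 2 or lower" threshold, are assumptions chosen for illustration.

```python
import pandas as pd

# Three rows copied from the table above, restricted to a few columns.
sample = pd.DataFrame(
    {
        "text": [
            "good",
            "delievered wrong house",
            "delivery fast less offers cash delivery",
        ],
        "positive": [1.00, 0.00, 0.94],
        "negative": [0.00, 0.99, 0.42],
        "ratings": [5, 1, 2],
        "brand": ["zomato", "zomato", "zomato"],
    }
)

# Dominant sentiment: the column name holding the larger of the two scores.
sample["sentiment"] = sample[["positive", "negative"]].idxmax(axis=1)

# Flag reviews whose text scores as positive although the star rating is low.
sample["mismatch"] = (sample["sentiment"] == "positive") & (sample["ratings"] <= 2)

# Average sentiment scores per brand, useful once several brands are combined.
summary = sample.groupby("brand")[["positive", "negative"]].mean().round(2)

print(sample[["text", "sentiment", "ratings", "mismatch"]])
print(summary)
```

Rows flagged as a mismatch are worth reading manually, since the text and the rating disagree about the user's experience.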
dataframe.to_csv(f'/content/drive/My Drive/playstore_{name}.csv')
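By default `to_csv` also writes the row index, which comes back as an extra `Unnamed: 0` column when the file is read again. The sketch below shows a round trip with `index=False`; it writes to a temporary directory rather than Google Drive, and the small `frame` is hand-built sample data, both of which are assumptions made to keep the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

name = "zomato"
frame = pd.DataFrame(
    {
        "text": ["good", "nice service"],
        "positive": [1.00, 0.99],
        "brand": [name, name],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / f"playstore_{name}.csv"
    # index=False keeps the row index out of the file.
    frame.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored.columns.tolist())
```

Whether to drop the index is a judgment call: keep it if the row numbers are meaningful later, drop it if the files for several brands will be concatenated.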