Sebastian Raschka, 2015
mlxtend, a library of extension and helper modules for Python's data analysis and machine learning libraries
%load_ext watermark
%watermark -a 'Sebastian Raschka' -u -d -v -p matplotlib,numpy,scipy,mlxtend
Sebastian Raschka
last updated: 2016-01-30

CPython 3.5.1
IPython 4.0.3

matplotlib 1.5.1
numpy 1.10.2
scipy 0.16.1
mlxtend 0.3.0
Different functions to tokenize text.
from mlxtend.text import tokenizer_[type]
Different functions to tokenize text for natural language processing tasks, such as building a bag-of-words model for text classification.
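To make the bag-of-words use case concrete, here is a minimal sketch of how token lists can be turned into count vectors. The names `simple_tokenizer` and `bag_of_words` are hypothetical helpers introduced only for this illustration and are not part of mlxtend; the whitespace tokenizer is a stand-in for which any tokenizer function that returns a list of strings could be substituted.

```python
from collections import Counter


def simple_tokenizer(text):
    # Hypothetical stand-in tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(text, vocab, tokenizer):
    # Count each token, then read off the counts in vocabulary order.
    counts = Counter(tokenizer(text))
    return [counts[token] for token in vocab]


docs = ['This is a test', 'This test is a good test']

# The vocabulary is the sorted set of all tokens seen in the documents.
vocab = sorted({token for doc in docs for token in simple_tokenizer(doc)})

vectors = [bag_of_words(doc, vocab, simple_tokenizer) for doc in docs]

print(vocab)    # ['a', 'good', 'is', 'test', 'this']
print(vectors)  # [[1, 0, 1, 1, 1], [1, 1, 1, 2, 1]]
```

Each document becomes a fixed-length vector of token counts, which is the kind of input a text classifier typically expects.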
from mlxtend.text import tokenizer_emoticons
tokenizer_emoticons('</a>This :) is :( a test :-)!')
[':)', ':(', ':-)']
from mlxtend.text import tokenizer_words_and_emoticons
tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
['this', 'is', 'a', 'test', ':)', ':(', ':-)']
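The behavior shown above can be approximated with a few regular expressions. The following is a rough sketch under stated assumptions, not mlxtend's actual implementation: the function names prefixed with `sketch_` and both patterns are hypothetical, and the emoticon pattern only covers simple forms such as `:)`, `:-(`, `;D`, or `=P`.

```python
import re

# Assumed patterns for this sketch (not taken from mlxtend):
TAG_RE = re.compile(r'<[^>]*>')            # HTML-like tags such as </a>
EMOTICON_RE = re.compile(r'[:;=]-?[()DP]')  # eyes, optional nose, mouth


def sketch_tokenizer_emoticons(text):
    # Strip tags first so markup is never mistaken for an emoticon.
    text = TAG_RE.sub('', text)
    return EMOTICON_RE.findall(text)


def sketch_tokenizer_words_and_emoticons(text):
    text = TAG_RE.sub('', text)
    emoticons = EMOTICON_RE.findall(text)
    # Remove emoticons before extracting words so ':D' does not yield 'd'.
    without_emoticons = EMOTICON_RE.sub(' ', text)
    words = re.findall(r'\w+', without_emoticons.lower())
    # Words come first, followed by the emoticons in order of appearance.
    return words + emoticons


sample = '</a>This :) is :( a test :-)!'
print(sketch_tokenizer_emoticons(sample))
print(sketch_tokenizer_words_and_emoticons(sample))
```

For the sample string, the sketch reproduces the two outputs shown above; inputs with other emoticon styles or punctuation may be handled differently by the library functions.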
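# Combine the API documentation of both tokenizers into one string
with open('../../api_modules/mlxtend.text/tokenizer_emoticons.md', 'r') as f:
    s = f.read() + '<br><br>'
with open('../../api_modules/mlxtend.text/tokenizer_words_and_emoticons.md', 'r') as f:
    s2 = f.readlines()
# Skip the first line (the heading) of the second file
s += ''.join(s2[1:])
print(s)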
## tokenizer_emoticons

*tokenizer_emoticons(text)*

Return emoticons from text

Example:

    >>> tokenizer_emoticons('</a>This :) is :( a test :-)!')
    [':)', ':(', ':-)']

<br><br>

*tokenizer_words_and_emoticons(text)*

Convert text to lowercase words and emoticons.

Example:

    >>> tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
    ['this', 'is', 'a', 'test', ':)', ':(', ':-)']