Sebastian Raschka, 2015
mlxtend, a library of extension and helper modules for Python's data analysis and machine learning libraries

GitHub repository: https://github.com/rasbt/mlxtend
Documentation: http://rasbt.github.io/mlxtend/


In [1]:

%load_ext watermark
%watermark -a 'Sebastian Raschka' -u -d -v -p matplotlib,numpy,scipy,mlxtend

Sebastian Raschka 
last updated: 2016-01-30 

CPython 3.5.1
IPython 4.0.3

matplotlib 1.5.1
numpy 1.10.2
scipy 0.16.1
mlxtend 0.3.0

Tokenizer

Different functions to tokenize text.

from mlxtend.text import tokenizer_[type]

Overview

These functions tokenize text for natural language processing tasks, for example, building a bag-of-words model for text classification.
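To make the bag-of-words use case concrete, here is a minimal sketch using only the standard library. The names `simple_tokenizer` and `bag_of_words` are hypothetical helpers introduced for illustration and are not part of mlxtend; `simple_tokenizer` is a whitespace-splitting stand-in for whichever tokenizer function you would actually pass in.

```python
from collections import Counter


def simple_tokenizer(text):
    # Stand-in tokenizer (an assumption for this sketch): lowercase and
    # split on whitespace. Any callable mapping str -> list of tokens works.
    return text.lower().split()


def bag_of_words(docs, tokenizer):
    """Return a sorted vocabulary and one count vector per document."""
    # Collect every distinct token across all documents.
    vocab = sorted({tok for doc in docs for tok in tokenizer(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenizer(doc))
        # Counter returns 0 for tokens that do not occur in this document.
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


docs = ['good movie :)', 'bad movie :(', 'good good :)']
vocab, vectors = bag_of_words(docs, simple_tokenizer)
print(vocab)
for vec in vectors:
    print(vec)
```

Each document becomes a vector of token counts over a shared vocabulary. Because emoticons are kept as tokens, they get their own columns and can serve as features for a classifier.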

References

Examples

Example 1 - Extract Emoticons

In [2]:

from mlxtend.text import tokenizer_emoticons

In [3]:

tokenizer_emoticons('</a>This :) is :( a test :-)!')

Out[3]:

[':)', ':(', ':-)']
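One way to get this kind of result is a single regular expression. The sketch below is an illustration under assumptions, not necessarily how mlxtend implements `tokenizer_emoticons`. The pattern and the name `extract_emoticons` are hypothetical, and the pattern only covers simple emoticons made of eyes (`:`, `;`, `=`), an optional nose (`-`), and a mouth (`)`, `(`, `D`, `P`).

```python
import re

# Assumed pattern for this sketch: eyes, optional nose, mouth.
EMOTICON_RE = re.compile(r'[:;=]-?[()DP]')


def extract_emoticons(text):
    """Return all emoticons matching EMOTICON_RE, in order of appearance."""
    return EMOTICON_RE.findall(text)


result = extract_emoticons('</a>This :) is :( a test :-)!')
print(result)
```

With this pattern, the example string yields the same three emoticons shown above. Emoticons outside the pattern (for example `:-/` or `^_^`) would not be matched.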

Example 2 - Extract Words and Emoticons

In [4]:

from mlxtend.text import tokenizer_words_and_emoticons

In [5]:

tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')

Out[5]:

['this', 'is', 'a', 'test', ':)', ':(', ':-)']
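The output above puts the lowercase words first and the emoticons last, with the HTML tag removed. A rough sketch that reproduces this behavior for the example string follows. The function name `words_and_emoticons` and both patterns are assumptions made for illustration, and the library's actual implementation may differ in its details.

```python
import re

# Assumed patterns for this sketch (not taken from the mlxtend source).
TAG_RE = re.compile(r'<[^>]*>')             # HTML-like tags such as </a>
EMOTICON_RE = re.compile(r'[:;=]-?[()DP]')  # eyes, optional nose, mouth


def words_and_emoticons(text):
    """Return lowercase word tokens followed by the emoticons found."""
    text = TAG_RE.sub('', text)                # strip markup first
    emoticons = EMOTICON_RE.findall(text)      # collect before lowercasing
    text = EMOTICON_RE.sub(' ', text)          # keep ':D' from leaving a stray 'd'
    words = re.sub(r'\W+', ' ', text.lower()).split()
    return words + emoticons


tokens = words_and_emoticons('</a>This :) is :( a test :-)!')
print(tokens)
```

Collecting the emoticons before lowercasing preserves their original form, such as the capital `D` in `:D`. Removing them before the word split is a design choice of this sketch, so that the letters inside an emoticon are not counted as words.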

API

In [6]:

with open('../../api_modules/mlxtend.text/tokenizer_emoticons.md', 'r') as f:
    s = f.read() + '<br><br>'

with open('../../api_modules/mlxtend.text/tokenizer_words_and_emoticons.md', 'r') as f:
    s2 = f.readlines()
    s += ''.join(s2[1:])
print(s)

## tokenizer_emoticons

*tokenizer_emoticons(text)*

Return emoticons from text

    Example:
    >>> tokenizer_emoticons('</a>This :) is :( a test :-)!')
    [':)', ':(', ':-)']

<br><br>
*tokenizer_words_and_emoticons(text)*

Convert text to lowercase words and emoticons.

    Example:
    >>> tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
    ['this', 'is', 'a', 'test', ':)', ':(', ':-)']