Knowsis is a social-intelligence company providing social media analytics for finance.
Topics of research:
A system which promptly alerts the user when there is a novel anomaly in social activity
from IPython.display import Image, display
display(Image('images/VW_chart.jpg', unconfined=True))
OTHER WAYS TO DO IT. This is important: the same approach applies to any problem which involves events happening in time (sensors, server requests, biochemical processes, ...)
Define features which make the measured variable unexpected
Consider a different quantity (or quality) and define what anomalous means with respect to this new target (inter-arrival times, another quantity, another approach such as a Bayesian one)
A natural way to model counts of events happening in time is to use a Poisson process $N(t)$ with parameter $\lambda$, the rate at which the events happen:
$$ \mathbb{E}[N(t)] = \text{expected number of events that occurred up to time } t = \lambda t $$
A Poisson regression models the logarithm of the expected number of events as a linear combination of predictor variables
$$ \log(\mathbb{E}[N]) = \log(\lambda) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k $$
with $\beta_i$ the coefficients of the model, estimated from the volume history, and $x_i$ the observable quantities of the datapoint.
Consider seasonality at different scales
Consider recent behaviour
Other variables?
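As an illustration of the Poisson regression described above, here is a minimal sketch in plain Python. It is not the implementation from the talk: the Newton-Raphson fitting routine, the helper names (`solve`, `fit_poisson`, `predict_rate`), the toy hourly counts and the single "market hours" seasonality indicator are all assumptions made for the example, and the first predictor is fixed to 1 so that $\beta_1$ acts as an intercept. In practice a statistics library would normally do the fitting.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_poisson(X, y, n_iter=25):
    """Fit log(lambda) = sum_i beta_i * x_i by Newton-Raphson on the Poisson log-likelihood.

    Assumes the first column of X is the constant 1 (intercept).
    """
    k = len(X[0])
    beta = [0.0] * k
    beta[0] = math.log(sum(y) / len(y))  # start from the overall mean rate
    for _ in range(n_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Gradient X^T (y - mu) and (negated) Hessian X^T diag(mu) X.
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                for j in range(k)]
        hess = [[sum(row[a] * row[c] * mi for row, mi in zip(X, mu))
                 for c in range(k)] for a in range(k)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def predict_rate(beta, x):
    """Expected count lambda_pred for a datapoint with predictors x."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

# Toy volume history (made up): predictors are [1, is_market_hours].
X = [[1, 0], [1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1], [1, 1]]
y = [3, 5, 4, 4, 18, 22, 20, 20]

beta = fit_poisson(X, y)
lam_off = predict_rate(beta, [1, 0])     # expected tweets per hour, off hours
lam_market = predict_rate(beta, [1, 1])  # expected tweets per hour, market hours
```

With a single binary predictor the fitted rates coincide with the average count in each group, which makes the sketch easy to check by hand. Recent behaviour could enter in the same way, for example as an extra column holding the logarithm of the previous hour's volume.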
With the expected rate $\lambda_{pred}$, how (un)likely is it that we see a number of tweets $N$ at least as high as the one observed so far, $n_{obs}$?
If this event has probability lower than $\alpha$, it is anomalous!
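The test can be sketched in a few lines of plain Python. The function names and the default threshold `alpha=0.01` are assumptions for illustration; the upper tail probability is computed directly from the Poisson probability mass function.

```python
import math

def poisson_tail(n_obs, lam):
    """P(N >= n_obs) for N ~ Poisson(lam), via 1 - P(N <= n_obs - 1)."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-lam)  # P(N = 0)
    cdf = term
    for k in range(1, n_obs):
        term *= lam / k    # P(N = k) from P(N = k - 1)
        cdf += term
    # Clamp at 0 to guard against floating-point round-off in the far tail.
    return max(0.0, 1.0 - cdf)

def is_anomalous(n_obs, lam_pred, alpha=0.01):
    """Flag the observed count if it is too unlikely under the predicted rate."""
    return poisson_tail(n_obs, lam_pred) < alpha

# With a predicted rate of 20 tweets, seeing 20 is unremarkable, seeing 40 is not.
typical = is_anomalous(20, 20.0)
spike = is_anomalous(40, 20.0)
```

This direct summation is only meant for moderate rates; for very large $\lambda_{pred}$ the term `math.exp(-lam)` underflows and a library routine for the Poisson survival function would be the safer choice.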
$$\mathbb{P}(N \geq n_{obs} | \lambda_{pred}) < \alpha$$
Given an anomaly period, identify which tweets are novel
Observation: tweets tend to cluster in stories
Given an anomaly period, identify the tweets which are novel, i.e. not part of old stories
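One simple way to make this concrete, offered only as a sketch and not as the method used in the talk, is to compare each tweet from the anomaly period with earlier tweets and keep those that resemble none of them. The Jaccard similarity on word sets, the `threshold=0.5` value, the helper names and the example tweets below are all assumptions chosen for illustration.

```python
import re

def tokens(text):
    """Lowercased set of word-like tokens, with URLs removed."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return set(re.findall(r"[a-z0-9#@']+", text))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def novel_tweets(anomaly_tweets, old_tweets, threshold=0.5):
    """Return the anomaly-period tweets that are not close to any old tweet."""
    old_sets = [tokens(t) for t in old_tweets]
    novel = []
    for tweet in anomaly_tweets:
        ts = tokens(tweet)
        best = max((jaccard(ts, o) for o in old_sets), default=0.0)
        if best < threshold:
            novel.append(tweet)
    return novel

# Made-up example tweets.
old = ["VW announces quarterly earnings"]
during_anomaly = [
    "VW announces quarterly earnings today",   # continues an old story
    "EPA says VW cheated on emissions tests",  # new story
]
result = novel_tweets(during_anomaly, old)
```

Comparing against individual old tweets is a stand-in for comparing against old stories; grouping the old tweets into clusters first and comparing against each cluster would follow the observation above more closely.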
Advantage:
Disadvantage:
general anomaly detection algorithm which is:
two-step novelty detection
Presentation & Notebooks:
https://github.com/knowsis/novel-twitter-anomalies-pydatalondon2016/