%load_ext itikz
(privacy)=
*DISCLAIMER:* This document is meant to be an introduction to questions of security and privacy in speech technology for engineering students, such that they understand the main problem areas. In particular, this is not a legal document. In real-life applications of technology and data collection, you must consult legal experts to determine whether you follow the law. You are responsible.
The right to privacy is a widely accepted concept, though its definition varies. It is however clear that people tend to think that some things are "theirs", that they have ownership of things, including information about themselves. A possible definition of privacy would then be "the absence of attention from others", and correspondingly, security could be defined as the protection of that which one owns, including material and immaterial things. It must however be emphasised that there are no widely shared and accepted definitions; in particular, the legal community uses a wide range of definitions depending on the context and field of application.
From the perspective of speech technology, security and privacy have two principal aspects;
The latter aspect is mainly related to speaker identity; fraudsters can, for example, synthesise speech which mimics (spoofs) a target person in order to gain access to restricted systems, such as the bank account of the target person. Such use cases fall mainly under the discussion of speaker recognition and verification, and are not discussed further here.
Observe that in the isolated category of telephony (classical telephone connections), privacy and security already have well-established ethical standards as well as legislation. In typical jurisdictions, telephone calls are private in the sense that only the "intended" participants can listen to them, and sometimes even recording them is restricted. Covert listening is usually allowed only for the police, and even for them only in specially regulated situations, such as with permission granted by a court or a judge. {cite}`edps2019techdispatch,nautsch2019gdpr`
Speech is a tool for communication, so it is generally sensible to discuss interactions between two agents, say, Alice and Bob. The interaction between them is the desired function, and the information exchanged within it is explicitly permitted. By choosing to talk with each other, they both reveal information to the extent that speech contains such information.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice};
\node[ellipse,draw,minimum width=40pt] at (2,0) (bob) {Bob};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\end{tikzpicture}
\end{document}
Though Alice and Bob knowingly and intentionally interact, they might still reveal private things unintentionally; this is the classic "slip of the tongue" {cite:p}`petronio2002boundaries`.
A second-order question concerns third parties, who are not part of the main speech interaction. The pertinent question is the degree to which a third party is allowed to partake in the interaction. As a practical example, suppose Alice and Bob have a romantic dinner at a restaurant. To what extent is the waitress, Eve, allowed to take part in Alice and Bob's discussion? Clearly Eve has some necessary tasks, such that interaction is unavoidable. Will Alice and Bob, for example, pause their discussion when Eve approaches?
%%itikz --temp-dir
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice};
\node[ellipse,draw,minimum width=40pt] at (2,0) (bob) {Bob};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[draw=gray,very thick,ellipse,minimum width=115pt, minimum height = 40pt] at (1,0) (private) {};
\node[ellipse,draw] at (1,2.5) (eve) {Eve};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (private);
\draw[color=purple,->, line width=1pt] (private) to [bend left] (eve);
\node at (-2,0) {\parbox{2cm}{\centering Private speech interaction}};
\node at (-2,2.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
Observe that we have here labelled Eve as a "restricted" and not as an "unauthorized" interactor. If access is unauthorized, then it is clear that Eve should not have any access to the speech interaction, which is generally straightforward to handle. The word restricted, on the other hand, implies that unimpeded access should not be granted, but that some access can be allowed. It is thus not a question of whether access should be granted, but of how much.
Privacy is closely connected to ownership of immaterial property, that is, information. Such ownership can also be translated into the question of who has control over some information. In terms of personal privacy, it clearly relates to information to which a single person can claim ownership.
Speech is more complicated. It is a form of communication and thus relates to an interaction between two parties. Dialogues can also commonly lead to co-creation of meaning, where new information is generated through the dialogue in a form which none of the involved parties could have produced alone {cite:p}`gasiorek2018message`. None of the parties can thus claim sole ownership of the information; the ownership is shared. Currently we do not have the tools for handling such shared ownership.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice};
\node[draw,fill=lightgray,minimum width=20pt,minimum height=50pt] at (2,0) (tx) {\parbox{1.35cm}{Telecom.\\ service}};
\node[ellipse,draw,minimum width=40pt] at (4,0) (bob) {Bob};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\end{tikzpicture}
\end{document}
Talking over the phone or in a video conference involves transmission of speech through a telecommunication service. Here we consider scenarios where the telecommunication device does not include any advanced functionalities or artificial intelligence. Most countries have clearly defined rules that specify the situations in which such communication can be eavesdropped. In most jurisdictions, only the police are allowed to intercept such traffic, and only in specific situations.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice};
\node[draw,fill=lightgray,minimum width=20pt,minimum height=50pt] at (2,0) (tx) {\parbox{1.35cm}{Telecom.\\ service}};
\node[ellipse,draw,minimum width=40pt] at (4,0) (bob) {Bob};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\node[draw=gray,very thick,ellipse,minimum width=170pt, minimum height = 60pt] at (2,0) (private) {};
\node[ellipse,draw] at (2,2.5) (eve) {Police};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (private);
\draw[color=purple,->, line width=1pt] (private) to [bend left] (eve);
\node at (-2,0) {\parbox{2cm}{\centering Private speech interaction}};
\node at (-2,2.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
Such interception of private communication is interesting primarily from ethical and legal perspectives, but it does not pose technological challenges related to the communication itself. The main technological challenges are related to forensics;
A commonly occurring scenario is one where two or more users engage in a discussion while one or more speech-operated devices are nearby. For example, a user could have their mobile phone with them, or there could be a smart speaker in the room.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice};
\node[ellipse,draw,minimum width=40pt] at (2,0) (bob) {Bob};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[draw=gray,very thick,ellipse,minimum width=115pt, minimum height = 40pt] at (1,0) (private) {};
\node[ellipse,draw] at (1,2.5) (eve) {Agent};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (private);
\draw[color=purple,->, line width=1pt] (private) to [bend left] (eve);
\node at (-2,0) {\parbox{2cm}{\centering Private speech interaction}};
\node at (-2,2.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
Often, one of the users is the primary user of the device (e.g. it is their phone), so the question is how the device should relate to the other users. For example, suppose Alice has a smart speaker at home and Bob comes to visit. What would then be the appropriate approach for both Alice and the agent? Should Alice or the agent notify Bob of the agent's presence? Or should the agent automatically detect the presence of Bob and change its behaviour (e.g. go to sleep)?
We seem to lack the cultural habits that dictate how to handle such situations, the legal tools that regulate them, and the technical tools to manage multi-user access.
An interaction with a speech interface or agent is surprisingly free of problems as long as the agent is not connected to any outside entity. We can think of the agent as a local device: if nobody else has access to that device, then all information remains in the user's direct control. Even if the agent exists in a remote cloud service, as long as the information remains strictly within the desired service, there are few problems to consider.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice};
\node[ellipse,draw,minimum width=40pt] at (2,0) (bob) {Agent};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\end{tikzpicture}
\end{document}
An exception is analysis which the agent can perform but which is unrelated to the desired service; such analysis can take abusive forms. For example, suppose the agent analyses the user's voice for health problems and identifies that the user has Alzheimer's disease. What should the agent then do with that information? Not doing anything seems unethical, since early access to medical services could greatly improve quality of life. Informing the user, on the other hand, involves risks. How will the user react to the information? Is the user sufficiently psychologically stable to handle it? What if the analysis is incorrect and the agent thus causes needless suffering? It is easy to think of further problematic scenarios. {cite:p}`konig2015automatic`
An agent can be involved in an interaction with multiple users at the same time.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice\strut };
\node[ellipse,draw,minimum width=40pt] at (1,1.5) (eve) {Agent\strut };
\node[ellipse,draw,minimum width=40pt] at (2,0) (bob) {Bob\strut };
\node at (-2.2,.9) {\parbox{2cm}{\centering Private speech interaction}};
\draw[->, line width=1pt] (alice) to (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\draw[->, line width=1pt] (alice) to [bend left] (eve);
\draw[->, line width=1pt] (eve) to (alice);
\draw[->, line width=1pt] (eve) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to (eve);
\end{tikzpicture}
\end{document}
This scenario differs from the single-user case in particular in the way the device can store information. To what extent should different combinations of users be permitted access to information from prior interactions? Quite obviously, Alice should not have unrestricted default access to Bob's prior interactions without Bob's permission. Where Alice and Bob have engaged in a joint discussion, the question of access becomes more complicated. It would seem natural that both can access information about their prior discussions. However, if Bob is in a discussion with Eve, then access to a prior discussion between Bob and Alice should again be restricted. The rules governing access will thus be complicated, often non-obvious, and riddled with exceptions; a toy example of the default rule follows below.
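As a concrete illustration, here is a minimal Python sketch of the default rule suggested above: a user may access a stored interaction only if they were a participant. All names and the data layout are hypothetical; a real system would need many more rules and exceptions.

```python
# Toy access rule for stored interactions: participants only.
conversations = [
    {"id": 1, "participants": {"alice", "bob"}, "transcript": "..."},
    {"id": 2, "participants": {"bob", "eve"}, "transcript": "..."},
]

def accessible(user: str, conversation: dict) -> bool:
    # Default rule: only participants of the original discussion may access it.
    return user in conversation["participants"]

print([c["id"] for c in conversations if accessible("alice", c)])  # -> [1]
print([c["id"] for c in conversations if accessible("eve", c)])    # -> [2]
```

Note that even this simple rule fails in the scenario above: when Bob talks with Eve, the device should avoid surfacing the Alice and Bob history even though Bob, a participant, is present.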
In the early days of mobile phones, a common faux pas was to speak loudly on the phone in public places such as on a bus or the subway. It seems that overhearing private discussions makes people uncomfortable, and it can be hard to ignore speech once you hear it. Obviously, the reverse is also true: participants in a private discussion often feel uncomfortable if they fear that outsiders can hear them.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=60pt] at (0,0) (alice) {Alice\strut };
\node[ellipse,draw,minimum width=60pt] at (2.6,0) (bob) {Agent\strut };
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\node[draw=gray,very thick,ellipse,minimum width=140pt, minimum height = 45pt] at (1.3,0) (private) {};
\node[ellipse,draw] at (1.3,2.5) (eve) {Eve};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (private);
\draw[color=purple,->, line width=1pt] (private) to [bend left] (eve);
\node at (-2.2,0) {\parbox{2cm}{\centering Private speech interaction}};
\node at (-2.2,2.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
The same applies to speech-operated devices. Such interactions can be private, but even when they are not, the fact that they can be overheard is often uncomfortable to all parties.
This is a problem when designing user interfaces for services. Speech interaction is often a natural way to use a service or device, but it is not practical in locations where other people can overhear private information, or where the sound is annoying to others present.
When interacting with a speech interface, users typically have a specific objective in mind. For example, suppose Alice wants to turn off the lights in the bedroom and says "Computer, lights off". To what extent is it permissible for that information to be relayed to a cloud service? If the local device is unable to decipher the command, it can transmit it to the cloud. The cloud service then obtains information from a very private part of Alice's life.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=60pt] at (0,0) (alice) {Alice\strut };
\node[ellipse,draw,minimum width=60pt] at (2.6,0) (bob) {Agent 1\strut };
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\node[draw=gray,very thick,ellipse,minimum width=140pt, minimum height = 45pt] at (1.3,0) (private) {};
\node[ellipse,draw] at (1.3,2.5) (eve) {Agent 2};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (private);
\draw[color=purple,->, line width=1pt] (private) to [bend left] (eve);
\node at (-2.2,0) {\parbox{2cm}{\centering Private speech interaction}};
\node at (-2.2,2.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
Information obtained this way can be very useful, for example, to advertisers. By analyzing users' habits, they can serve more meaningful advertisements. Arguably, better targeting makes advertising more effective, which could potentially reduce the overall amount of advertising. It is however questionable whether advertisers would ever have an incentive to reduce the amount of advertising. In any case, some people find "overly fitting" advertisements creepy.
There are however plenty of other scenarios which are more potent sources of danger. What if insurance companies analyze users' life patterns and raise premiums for at-risk users such as substance abusers? Some smart devices can already call the emergency services if they recognize cries for help or other obvious signs of distress. What are the moral dilemmas involved there?
Things get even more complicated when multiple users and/or devices co-exist in the same space. Consider, for example, an open office with two users simultaneously engaged in independent video conferences.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt] at (0,0) (alice) {Alice\strut };
\node[ellipse,draw,minimum width=40pt] at (2,0) (bob) {Bob\strut };
\node at (-2.2,0) {\parbox{2cm}{\centering Private speech interaction 2}};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\node[ellipse,draw,minimum width=40pt] at (0,2) (alice2) {Adam\strut };
\node[ellipse,draw,minimum width=40pt] at (2,2) (bob2) {Beata\strut };
\node at (-2.2,2) {\parbox{2cm}{\centering Private speech interaction 1}};
\draw[->, line width=1pt] (alice2) to [bend left] (bob2);
\draw[->, line width=1pt] (bob2) to [bend left] (alice2);
\draw[color=purple,<->, line width=1pt] (alice) to (alice2);
\draw[color=purple,<->, line width=1pt] (bob) to (alice2);
\draw[color=purple,<->, line width=1pt] (alice) to (bob2);
\draw[color=purple,<->, line width=1pt] (bob) to (bob2);
\node at (3.5,1) {\parbox{2cm}{\centering Leaked interactions}};
\end{tikzpicture}
\end{document}
Speech signals contain a wide variety of information, much of which is potentially private or even sensitive, and it is difficult or impossible to list all categories of potential information. However, the information that speech contains includes at least, for example;
In other words, speech contains, or can contain, just about every type of private and sensitive information imaginable. As speech is a tool for communication, this is not surprising: anything we can communicate about can be spoken. Conversely, if we find privacy important (and we do), then speech is among the most important signals to protect.
In a more general scope than just speech, privacy can be categorized into seven types: {cite:p}`finn2013seven`
Note that this list does not make any claims with respect to rights to these types of privacy, only that privacy issues can often be split into these sub-topics. Whether someone has a right to privacy is a society-level decision and a political choice, in which psychological and cultural aspects play a big role.
Threats to privacy in speech communication can almost always be defined as covert extraction of information, or as storage, processing and usage of that information in ways of which the speaker is not aware and/or with which the speaker does not agree. Variability across scenarios is then almost entirely due to the type of information involved and the stakeholders. In particular,
Most of the threats and attack scenarios are unfamiliar to the general public, and some of them might be too abstract to feel relevant to average users. We can hypothesize that scenarios which do not directly touch the life of an individual probably do not get much attention in the media. Topics in privacy and security which have received attention in the public media include:
Amazon workers are listening to what you tell Alexa (Bloomberg, 2019). Later it was revealed that Google, Apple and others are doing the same.
Amazon Sends 1,700 Alexa Voice Recordings to a Random Person (Threatpost, 2018).
Amazon's Alexa recorded private conversation and sent it to random contact (The Guardian, 2018).
Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case (The Wall Street Journal, 2019).
Note that the fact that many of the above examples relate to Amazon/Alexa is probably coincidence rather than an indication that Alexa treats privacy differently from its competitors.
At least from the European perspective, the following design concepts are seen as the basis of good design for privacy. In fact, they are mandated by the General Data Protection Regulation (GDPR) of the European Union.
Typically it is reasonable that service providers have access to aggregated data, such as ensemble averages, but not to information about individuals. For example, in a hypothetical case, a smart speaker operator could receive the information that 55% of its users are male, but would not get the gender of any individual user.
A central problem for privacy with speech signals is the concept of "uniquely identifiable". Legal frameworks such as the GDPR define private information as data in which individual users are "uniquely identifiable", but there is no precise definition of what that really means. If your partner recognizes your "Hello" on the phone, it means that for them, your "Hello" is uniquely identifiable. However, if you give your partner 10,000 speech samples, one of which is your "Hello", then there is a significant likelihood that they would not find your "Hello" in the pile. An unanswered question is thus: "What is the size of the group within which a user should be uniquely identifiable?"
A more detailed aspect is that of significance. Speaker recognition finds the most likely speaker out of a reference group of size N, whereas speaker verification tries to determine whether, within some confidence interval, we can be sure that you are who you claim to be. In engineering terms, this means that we want to find the speaker with the highest likelihood, but with a sufficient margin to all other speakers. In the opposite direction, we can also use a lower threshold: we could say that a statistically significant correlation already exposes the user's privacy. For example, if we find that the speaker is either you or your father/mother, then we have a significant statistical correlation, but you are not uniquely identified. The sketch below illustrates the margin idea.
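The following minimal sketch illustrates the margin criterion: we accept an identification out of N candidates only if the best score exceeds the runner-up by a sufficient margin. The scores and the threshold are made-up numbers, not from any actual system.

```python
# Hypothetical log-likelihood scores for each enrolled speaker.
log_likelihoods = {"alice": -12.1, "bob": -14.8, "carol": -13.0}
MARGIN = 2.0  # illustrative confidence margin

# Rank speakers by likelihood and compare the top two.
ranked = sorted(log_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
(best, best_ll), (_, second_ll) = ranked[0], ranked[1]

if best_ll - second_ll >= MARGIN:
    print(f"{best} identified with a sufficient margin")
else:
    # A 'most likely' speaker exists, but we cannot claim unique identity.
    print(f"best guess is {best}, but the margin is too small")
```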
A further consideration is that of adjoining data. Suppose there is a recording of a speaker A, and that you happen to know speaker A very well. Then it will be easy for you to recognize the voice of A in that recording. That is, you have a lot of experience (stored data) about how A sounds, so it is easy for you to identify A. Does that mean that A is uniquely identifiable in that recording? After all, A would not be identifiable if you did not know A (i.e. if you did not have prior, stored data about A).
A slight variation of the above case is a recording of a speaker B, where B is a relatively famous public person, such that sound samples of their voice are readily available on-line. Does that make the recording of B uniquely identifiable? Or suppose there is a recording of a currently non-famous person C, who later becomes famous. Does that change the status of the recording of C to uniquely identifiable?
Today, this question remains unanswered and we have no commonly agreed interpretation of what "uniquely identifiable" really means. What level of statistical confidence is assumed? What amount of adjoining data is assumed (in terms of the GDPR, probably: any and all data which exists)? Can the status change over time if new information becomes public (probably: yes)? Can it change over time if new technologies are developed (probably: yes)?
Privacy is an issue only if some other party has access to data about you. Data which resides on a device under your control is therefore relatively safe, assuming that no outsider has access to that device. If data is sent to a cloud server, then there are more entities which could potentially access your data. Therefore, all storage and processing which can be done on your local device is usually, by design, more private than any cloud server. Typically, the cloud server would then only provide software updates (downlink), while no data is sent in the other direction (uplink). {cite:p}`shi2016edge`
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt,minimum height=35pt] at (-.5,0) (alice) {Alice};
\node[ellipse,draw,minimum width=35pt] at (2,0) (bob) {\parbox{1cm}{\centering Local device}};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[draw=gray,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (.8,0) (private) {};
\node[ellipse,draw] at (5.5,0) (eve) {\parbox{1cm}{\centering Cloud service}};
%\draw[color=purple,->, dashed, line width=1pt] (eve) to node[above]{Model update} [left] (bob);
\path[color=purple,very thick,dashed,<-,every node/.style={font=\small}]
(bob) edge [] node[yshift=.5cm] {\parbox{1cm}{model update}} (eve);
%\draw[color=purple,->, line width=1pt] (bob) to [bend left] (eve);
\node at (.8,1.5) {\parbox{3cm}{\centering Private speech interaction}};
\node at (5.5,1.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
Observe that this does not protect you from other local users. For example, if multiple persons use one smart speaker at home, then the other users could gain access to information about you through that device and any other connected devices.
Central limitations of edge processing are
A central issue with voice communication is that, in addition to the intended message, it contains a great deal of other information. For example, if you want to order a pizza delivery, the service provider only needs to know the content of the order, the destination where it should be delivered, and how it is paid for. The provider does not need to know, for example, your state of health or your cultural affiliation. Anonymization refers to methods which try to strip away such private extra information, such that only the intended message remains. With pseudonymization we refer to similar methods where private information is replaced by some other information; for example, we could replace the personally identifying information of a user Alice with an avatar identity, Adam (a minimal sketch follows the figure below). The process and methodology of separating the different streams of information is known as disentanglement.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt,minimum height=35pt] at (-.5,0) (alice) {Alice};
\node[ellipse,draw,minimum width=35pt] at (2,0) (bob) {\parbox{1cm}{\centering Local device}};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[draw=gray,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (.8,0) (private) {};
\node[ellipse,draw] at (6,0) (eve) {\parbox{1cm}{\centering Cloud service}};
\draw[color=purple,dashed,->, line width=1pt] (eve) to [bend left] (bob);
\draw[color=purple,dashed,->, line width=1pt] (bob) to [bend left] (eve);
\node at (.8,1.5) {\parbox{3cm}{\centering Private speech interaction}};
\node at (4.1,0) {\parbox{2cm}{\centering Reduced data}};
\node at (6,1.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
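As a minimal sketch of pseudonymization, the snippet below replaces a user identifier with a stable avatar identity derived with a keyed hash. Note that this only pseudonymizes metadata attached to a recording; removing identity from the speech signal itself (disentanglement) is a much harder, open problem. The key and identifier formats are illustrative assumptions.

```python
import hashlib
import hmac

# Server-side secret; whoever holds it can link avatars back to users.
SECRET_KEY = b"illustrative secret key, never shared"

def pseudonym(user_id: str) -> str:
    # Keyed hash: deterministic (same user -> same avatar), but not
    # invertible without the secret key.
    digest = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256)
    return "avatar-" + digest.hexdigest()[:12]

print(pseudonym("alice"))  # stable avatar identity across sessions
print(pseudonym("bob"))
```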
An issue with current methodology is that there are no theoretical guarantees of anonymity. That is, we can try to deduce private information from the anonymized data stream, and if we fail, we can say that the anonymization has succeeded with respect to our attempts to break it. However, we have no guarantee that some other, more advanced method for breaking the anonymization would not succeed.
Even when operating with aggregate data, like the mean user age, it is still possible to extract private information in some scenarios. For example, if we know the mean user age and the number of users at a time t, and we also know that the age of user X was added to the mean at time t+1, as well as the mean user age at t+1, then we can deduce the age of user X with basic algebra. As a safeguard against such differential attacks, to provide differential privacy, it is possible to add noise to any data transfers. Individual data points are then obfuscated and cannot be exactly recovered, while the ensemble average can still be deduced if the distribution of the added noise is known. Both the attack and the countermeasure are sketched after the figure below.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=50pt] at (0,0) (alice) {Alice\strut };
\node[ellipse,draw,minimum width=50pt] at (2.6,0) (bob) {Agent 1\strut };
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
\node[draw=gray,very thick,ellipse,minimum width=140pt, minimum height = 45pt] at (1.35,0) (private) {};
\node[circle,draw,style={font=\small}] at (5,0) (diffplus) {+};
\node[ellipse,draw,minimum width=50pt] at (7,0) (eve) {Agent 2\strut};
\draw[color=purple,->, line width=1pt] (bob) to [left] (diffplus);
\draw[color=purple,->, dashed,line width=1pt] (diffplus) to [left] (eve);
%\path[color=purple,very thick,dashed,<-,every node/.style={font=\small}] (bob) edge [] node[yshift=.5cm] {\parbox{1cm}{model update}} (eve);
%\draw[color=purple,->, line width=1pt] (bob) to [bend left] (eve);
\node at (1.35,1.5) {\parbox{3cm}{\centering Private speech interaction}};
\node at (7,1.5) {\parbox{2cm}{\centering Restricted interactor}};
\node at (5,1.5) (noise) {\parbox{2cm}{\centering Noise}};
\draw[color=purple,->, line width=1pt] (noise) to [left] (diffplus);
\end{tikzpicture}
\end{document}
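The differential attack and the noise countermeasure can be made concrete in a few lines of Python. The ages, the value range and the privacy parameter below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- The differential attack on exact aggregates ---
ages = [34, 51, 27, 45]                  # users present at time t
mean_t, n_t = np.mean(ages), len(ages)
mean_t1 = (sum(ages) + 29) / (n_t + 1)   # user X (age 29) joins at t+1
# Basic algebra recovers X's age exactly from the two published means:
age_x = mean_t1 * (n_t + 1) - mean_t * n_t
assert round(age_x) == 29

# --- Countermeasure: publish a noisy mean (Laplace mechanism) ---
def dp_mean(values, epsilon, value_range=100.0):
    # One user changes the mean of bounded values by at most range/n,
    # so the noise is scaled to that sensitivity divided by epsilon.
    sensitivity = value_range / len(values)
    return np.mean(values) + rng.laplace(0.0, sensitivity / epsilon)

# The same attack now yields only a noisy, unreliable estimate of X's age.
print(dp_mean(ages, epsilon=0.5))
```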
The required compromise here is that the level of privacy corresponds to the amount of noise, which is inversely proportional to the accuracy of the ensemble mean. That is, if the amount of noise is large, then we need a huge number of users to determine an accurate ensemble average. On the other hand, if the amount of noise is small, then the ensemble average is accurate, but an eavesdropper can also make a fair guess at any individual data point. The simulation below illustrates this trade-off.
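A small simulation of the trade-off, with made-up noise scales: each user perturbs their own value before reporting it, and the accuracy of the recovered ensemble mean depends on both the noise scale and the number of users.

```python
import numpy as np

rng = np.random.default_rng(1)
true_ages = rng.integers(18, 80, size=100_000).astype(float)

for noise_scale in [1.0, 10.0, 100.0]:
    for n in [100, 10_000, 100_000]:
        # Zero-mean noise cancels out in the average as n grows.
        reports = true_ages[:n] + rng.laplace(0.0, noise_scale, size=n)
        print(f"noise={noise_scale:>5}, users={n:>6}: "
              f"estimate {reports.mean():6.2f} vs true {true_ages[:n].mean():6.2f}")
```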
To enable machine learning in the cloud without providing access to private data, we can use federated learning, where the private data remains on the local device and only model updates are sent to the cloud. Clearly this approach has better privacy than one where all private data is sent to the cloud. At the same time, by combining information from a large number of local devices in the cloud, the machine learning models can be optimized to be highly accurate. However, we do not yet have a clear understanding of the extent of privacy offered by this type of method; some data is sent to the cloud, but can private data still be traced back to the user? A minimal sketch is given after the figure below.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[draw=gray,dotted,fill=white,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (1.2,.8) (private5) {};
\node[draw=gray,dotted,fill=white,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (1.1,.6) (private4) {};
\node[draw=gray,fill=white,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (1,.4) (private3) {};
\node[draw=gray,fill=white,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (.9,.2) (private2) {};
\node[draw=gray,fill=white,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (.8,0) (private) {};
\node[ellipse,draw,minimum width=40pt,minimum height=35pt] at (-.5,0) (alice) {Alice};
\node[ellipse,draw,minimum width=35pt] at (2,0) (bob) {\parbox{1cm}{\centering Local device}};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[ellipse,draw] at (6,0) (eve) {\parbox{1cm}{\centering Cloud service}};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (bob);
%\path[color=purple,very thick,<-,every node/.style={font=\small}] (bob) edge [] node[yshift=.5cm] {\parbox{1cm}{model update}} (eve);
\draw[color=purple,->, line width=1pt] (bob) to [bend left] (eve);
\node at (.8,2.5) {\parbox{3cm}{\centering Private speech interactions}};
\node at (4.2,0) {\parbox{1.3cm}{\centering Model updates}};
\node at (6,1.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
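The following is a minimal sketch in the spirit of federated SGD: each device computes a model update (here, a gradient of a tiny linear model) on its own private data, and the cloud only ever sees the averaged updates. The model, data sizes and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_update(w, X, y):
    # Gradient of mean squared error for the linear model y ≈ X @ w;
    # this is the only thing a device uploads, never X or y.
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Private data, generated and kept on each of five devices.
true_w = np.array([1.0, -2.0, 0.5])
devices = []
for _ in range(5):
    X = rng.normal(size=(20, 3))
    y = X @ true_w + 0.1 * rng.normal(size=20)
    devices.append((X, y))

w_global = np.zeros(3)
for _ in range(200):
    # The cloud averages the per-device updates and refines the model.
    grads = [local_update(w_global, X, y) for X, y in devices]
    w_global -= 0.1 * np.mean(grads, axis=0)

print(w_global)  # close to true_w, without the cloud seeing raw data
```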
Suppose a service provider has a proprietary model, say, an analysis method for detecting Alzheimer's disease from the voice, and your doctor would like to analyse your voice with that method. Naturally, your voice is private, so you do not want to send it to the third-party service provider; but the service provider also does not want to send the model to you. Homomorphic encryption provides a method for applying the secret model on encrypted data, such that you only have to send your data in encrypted form to the service provider. Your doctor would then receive only the final diagnosis, but neither the model nor your speech data. The concept is beautiful in principle; it solves the problem of mutual distrust very nicely. However, the compromise is that currently available homomorphic encryption methods require that all processing functions be written as polynomial functions. In theory, we can transform any function into a corresponding polynomial, but the increase in complexity is often dramatic. Consequently, privacy-preserving methods based on homomorphic encryption typically have a prohibitively high computational complexity. {cite:p}`armknecht2015guide` A toy example follows the figure below.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt,minimum height=35pt] at (-.5,0) (alice) {Alice};
\node[ellipse,draw,minimum width=35pt] at (2,0) (bob) {\parbox{1cm}{\centering Local device}};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[draw=gray,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (.8,0) (private) {};
\node[ellipse,draw] at (6,0) (eve) {\parbox{1cm}{\centering Cloud service}};
\draw[color=purple,->, dotted,line width=1pt] (eve) to [bend left] (bob);
\draw[color=purple,->, dotted,line width=1pt] (bob) to [bend left] (eve);
\node at (.8,1.5) {\parbox{3cm}{\centering Private speech interaction}};
\node at (4.1,0) {\parbox{2cm}{\centering Encrypted data}};
\node at (6,1.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
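As a toy demonstration, the sketch below implements the Paillier cryptosystem, which is additively homomorphic: the server can evaluate a secret degree-1 polynomial (a linear model) on encrypted features without ever decrypting them. The tiny key size and the integer features are purely illustrative; real deployments use keys of 2048 bits or more, and fully homomorphic schemes are needed for higher-degree polynomials.

```python
import math
import random

# --- Toy Paillier keypair (tiny primes, for illustration only) ---
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python >= 3.8)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic properties: Enc(a)*Enc(b) = Enc(a+b), Enc(a)^k = Enc(k*a).
def add_enc(c1, c2):
    return (c1 * c2) % n2

def scalar_mul(c, k):
    return pow(c, k, n2)

# Client encrypts voice features; server applies its secret linear model.
features = [3, 7, 2]   # client-side (toy integer features)
weights = [5, 1, 4]    # server-side proprietary model
enc = [encrypt(x) for x in features]
acc = encrypt(0)
for c, w in zip(enc, weights):
    acc = add_enc(acc, scalar_mul(c, w))

# Only the key holder can read the result; the server never saw the features.
assert decrypt(acc) == sum(w * x for w, x in zip(weights, features))
```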
In addition to privacy-preserving algorithms, we can also design privacy-preserving architectures. The myData paradigm is based on a three-tier design, where users choose where their data is stored and can grant service providers access to that data when required. The idea is to separate service providers from data storage, such that users have better control over their data. Transforming existing services to adhere to the myData concept requires that new storage services for private data are created and that APIs between storage and processing services are specified. {cite:p}`poikola2016mydatb` A toy sketch of such permission-gated storage follows the figure below.
%%itikz --temp-dir --file-prefix dual-primary-
\documentclass{standalone}
\usepackage[utf8]{inputenc}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage{pgfplots}
\DeclareUnicodeCharacter{2212}{−}
\usepgfplotslibrary{groupplots,dateplot}
\usetikzlibrary{patterns,shapes.arrows}
\usetikzlibrary {fit}
\usetikzlibrary{shapes.geometric,positioning}
\usetikzlibrary{bending}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
\node[ellipse,draw,minimum width=40pt,minimum height=35pt] at (-.5,0) (alice) {Alice};
\node[ellipse,draw,minimum width=35pt] at (2,0) (bob) {\parbox{1cm}{\centering Local device}};
\draw[->, line width=1pt] (alice) to [bend left] (bob);
\draw[->, line width=1pt] (bob) to [bend left] (alice);
%\node[fit=(alice) (bob), inner sep=1pt] (p) {};% {\parbox{2.4cm}{\centering Desired speech interaction\vspace{-1cm}}};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob),label={[label distance=.01cm]180:{\parbox{1.7cm}{\centering Desired\\ interaction}}}] (private) {};
%\node[draw=gray,inner sep=6pt,very thick,ellipse,fit=(alice) (bob) (p)] (private) {};
\node[draw=gray,very thick,ellipse,minimum width=130pt, minimum height = 60pt] at (.8,0) (private) {};
\node[ellipse,draw] at (5.1,1) (mydata) {\parbox{1cm}{\centering Private cloud}};
\node[ellipse,draw] at (8,0) (eve) {\parbox{1cm}{\centering Cloud service}};
\draw[color=purple,->, line width=1pt] (eve) to [bend left] (bob);
\path[color=purple,very thick,->,every node/.style={font=\small}]
(bob) edge [] node[yshift=.5cm] {\parbox{1cm}{\centering raw data}} (mydata);
\path[color=purple,very thick,dashed,->,every node/.style={font=\small}]
(mydata) edge [] node[xshift=.25cm,yshift=.5cm] {\parbox{1cm}{\centering reduced data}} (eve);
\node at (.8,1.5) {\parbox{3cm}{\centering Private speech interaction}};
%\node at (6,1.5) {\parbox{2cm}{\centering Restricted interactor}};
\end{tikzpicture}
\end{document}
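A toy sketch of the separation between storage and service providers: the storage operator releases data only when the user has granted that specific provider access for that specific purpose. All class and method names are hypothetical, not from any myData specification.

```python
class PersonalDataStore:
    """Toy myData-style store: user-granted, purpose-bound access."""

    def __init__(self):
        self._data = {}       # user -> stored data
        self._grants = set()  # (user, provider, purpose) tuples

    def store(self, user, data):
        self._data[user] = data

    def grant(self, user, provider, purpose):
        # Only the user can create grants for their own data.
        self._grants.add((user, provider, purpose))

    def fetch(self, user, provider, purpose):
        if (user, provider, purpose) not in self._grants:
            raise PermissionError("no consent for this provider and purpose")
        return self._data[user]

store = PersonalDataStore()
store.store("alice", {"voice_profile": "..."})
store.grant("alice", "pizza-service", "order-fulfilment")
store.fetch("alice", "pizza-service", "order-fulfilment")  # allowed
# store.fetch("alice", "ad-network", "profiling")  -> PermissionError
```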
Note that if a user chooses to store private data on a cloud server, then it is still susceptible to abuse by the storage service provider, unless appropriate encryption methods are used. However, the user could in principle choose to store private data on an edge device, such that the storage service provider is cut out of the loop.
A further risk is that the myData concept usually assumes that data is stored at a single central location, which becomes a single point of weakness. Should someone gain illegitimate access to the storage, all your data would be compromised. Distributing data across several different storage locations might therefore be reasonable.
A common prejudice is that privacy and security requirements cause problems for developers and make systems more difficult for users to use. Such prejudices are unfortunate and patently misguided. The problem is that many privacy problems are invisible to the casual observer, and their effects become apparent only when it is already too late. Another argument is "privacy is not my concern because I haven't seen any privacy problems", which is like saying "rape is not my concern because I haven't seen any rapes"; the argument is absurd. Privacy safeguards are meant to protect users and developers from very bad consequences. These problems are real. You cannot ignore them.
However, designing for privacy is not only about protecting users. It is also very much about designing technology which is easy to use and where the user experience feels intuitive and natural. For example, speaker recognition can be used to grant access to voice technology such that the user does not have to be bothered with passwords, PIN codes or other cumbersome authentication methods. Overall, speech technology promises access to services without the need to scroll through menus on your washing machine to find that one mode which is optimized for white curtains made out of cotton.
The overall design goal could be that people should be able to trust the system. In particular, a trustworthy system will be {cite:p}`Chen2003,Xie2009`
These goals are best illustrated by examples;
The following is a list of hypothetical questions which can (and do) arise in the design of speech-operated systems:
Scientific research is based on arguments supported by evidence, and in the speech sciences, evidence means recordings of speech. Access to speech data is therefore a mandatory part of research in the speech sciences. To obtain trustworthy results, independent researchers have to be able to verify each other's results, which means they must have access to the same or practically identical data sources. Shared data is the gold standard for reproducible research. However, sharing speech data can be problematic with respect to speaker privacy.
The concept of "uniquely identifiable" is again the key. If an individual is not uniquely identifiable in a data set, then you are allowed to share that data. Conversely, if you remove all identifying data, then you can share the data relatively freely. However, in view of the discussions above, it is not settled what constitutes identifying data, nor what makes that data "uniquely" identifying.
A second important consideration is consent. The persons whose voices are recorded must be allowed to choose freely whether they want to participate, and that choice has to be explicit; you need to ask them clearly whether they want to participate in a recording. The researchers need to be able to prove that consent has been given, and therefore consent must be documented carefully. Consent must also be given freely, such that there are no explicit or hidden penalties for rejecting consent. Furthermore, if any uniquely identifying data of a participant is stored, then the participant must be allowed to withdraw consent afterwards. There are however some important exemptions to this rule; the right to withdraw consent can be rejected, for example, if withdrawal would corrupt the integrity of the data set, such as
To allow plausible grounds for denying the right to withdraw consent, datasets can be designed to be either balanced or relatively small. Collecting balanced datasets is good practice in any case, so this is not a limitation but can actually improve quality. Conversely, good data is balanced and that should be our goal; a consequence is that we might be forced to deny the right to withdraw consent. Avoiding the collection of excessively large data sets is also good from the perspective of data minimization and data ecology.
In a request for consent, the data collector should state the purpose of the dataset (i.e. purpose binding). For instance, a dataset could be collected for the development of wake-word detection methods, and consent is received for that purpose. Then it is not permissible to use the same data set for speaker detection experiments or medical analysis of the voice. Period. It is therefore good practice to ask for consent in a sufficiently wide way, such that researchers have some flexibility in using the data. Blanket consent to all research purposes is however not good practice. In particular, it is recommended that processing of sensitive information, such as health, ethnicity or political information, is excluded unless it is the express purpose of the dataset (cf. data minimization). A toy consent record implementing purpose binding is sketched below.
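A hypothetical consent record implementing purpose binding could look as follows: every use of the data must match an explicitly consented purpose, and withdrawal invalidates all of them. The field names are illustrative, not taken from any standard or regulation.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ConsentRecord:
    participant_id: str
    consented_purposes: frozenset
    given_on: date
    withdrawn: bool = False

    def permits(self, purpose: str) -> bool:
        # Purpose binding: only explicitly consented uses are allowed.
        return (not self.withdrawn) and purpose in self.consented_purposes

consent = ConsentRecord("P-017", frozenset({"wake-word detection"}),
                        date(2021, 5, 3))
assert consent.permits("wake-word detection")
assert not consent.permits("medical voice analysis")  # not consented
```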
If a dataset by its nature does include uniquely identifiable data, then the researchers need to apply stronger layers of safeguards. In particular, researchers typically have to keep track of who has access to the data, to ensure purpose binding, and to allow withdrawal of consent. This can also require that any researcher who downloads the data signs a contract with the data provider in which the terms of usage are defined. Such a contract can be required in any case, not only with uniquely identifiable data.
Data such as medical information, data about children or other vulnerable groups, and political, religious and gender-identity affiliations are particularly sensitive. If your dataset contains any such information, then you have to apply stronger safeguards. To begin with, access to such data has to be, in practice, always limited to persons bound by a legally binding contract specifying access rights and allowable uses, processing and storage.
As an overall principle, note that the principal investigator (research group leader) is legally responsible for the use of the data that is collected, stored and processed. In particular, if a third party downloads the data and misuses it, for example by analysing health information without consent for that purpose, then it is the principal investigator who is responsible. However, the principal investigator is only required to apply reasonable safeguards to ensure that data is not misused. What level of safeguards is sufficient has however not yet been agreed upon. It is likely that there will never be rules which specify a sufficient level of safeguards exactly.
In the discussion above, it has become clear that the nature of unique identifiability can change over time, as new information is published and new technologies emerge. This means that datasets which previously were adequately protected can become exposed to privacy problems over time. It is therefore important that researchers monitor their published datasets over time, such that if new threats emerge, they can take appropriate action. For example, they could withdraw a dataset entirely. Reasonable ways of implementing this could be:
As a last resort, when data is so sensitive and private that it cannot be publicly released, it is possible to require on-site processing of the data. For example, one can design a computing architecture where the data resides on a secure server to which researchers have access through a secure API. The data never leaves the server, so privacy is always preserved. For an even higher level of security, data can be stored on an air-gapped computer system, which means that access to the data requires that researchers physically come to the computer (no network access). This level of security is usually the domain of military-grade systems.
```{bibliography}
:filter: docname in docnames
```