\chapter{Simple hypothesis testing} \label{chap:DET_1} In the statistical hypothesis testing problem, a decision has to be made as to which of several hypotheses (or states of nature) is the correct one. The states of nature are encoded in a rv $H$ and a decision has to be made on the basis of an observation $\myvec{Y}$ which is statistically related to $H$. \section{Motivating examples} \label{sec:DET_1+MotivatingExamples} \paragraph{Control process} \paragraph{A simple communication example} \paragraph{Testing means} \section{The probabilistic model} \label{sec:DET_1+Model} The {\em binary} hypothesis testing problem is the simplest version of this problem in that nature can be in either of two states, say $H=0$ or $H=1$ for sake of concreteness. If $F_0 $ and $F_1 $ are probability distribution functions on $\mathbb{R}^k$, this situation is summarized by \begin{equation} \begin{array}{ll} H_1: & \ \myvec{Y} \sim F_1 \\ H_0: & \ \myvec{Y} \sim F_0 . \\ \end{array} \label{eq:DET_1+Model+Formulation1} \end{equation} The hypothesis $H_0$ is called the {\em null hypothesis} and hypothesis $H_1$ is referred to as the {\em non-null hypothesis} or the {\em alternative}. Probabilistically, the symbolic statement (\ref{eq:DET_1+Model+Formulation1}) is understood as follows: On some probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, consider the rvs $H$ and $\myvec{Y}$ defined on $\Omega$ which take values in $\{0,1\}$ and $\mathbb{R}^k$, respectively. The probability distribution functions $F_0$ and $F_1$ then have the interpretation as the conditional probability distributions of $\myvec{Y}$ given $H=0$ and $H=1$, respectively, namely \[ F_h(\myvec{y}) = \bP{\myvec{Y} \leq \myvec{y} | H=h}, \quad \begin{array}{c} \myvec{y} \in \mathbb{R}^k ,\\ h=0,1. \\ \end{array} \] The probability distribution of the rv $H$ is specified by $p$ in $[0,1]$ with \[ p = \bP{H=1} = 1- \bP{H=0}. \] We refer to the pmf $(1-p,p)$ on $\{0,1\}$, or just to $p$, as the {\em prior}. This construction is always possible. Since \begin{eqnarray} \bP{ \myvec{Y} \leq \myvec{y} , H=h} &=& \bP{\myvec{Y} \leq \myvec{y} |H=h} \bP{H=h} \nonumber \\ &=& \left \{ \begin{array}{ll} (1-p) F_0(\myvec{y}) & \mbox{if $h=0$, $\myvec{y} \in \mathbb{R}^k$} \\ & \\ p F_1(\myvec{y}) & \mbox{if $h=1$, $\myvec{y} \in \mathbb{R}^k$,} \\ \end{array} \right . \end{eqnarray} the law of total probability shows that \begin{eqnarray} \bP{\myvec{Y} \leq \myvec{y} } &=& \sum _{h=0}^{1} \bP{\myvec{Y} \leq \myvec{y} |H=h} \bP{H=h} \nonumber \\ &=& p F_1(\myvec{y}) +(1-p) F_0(\myvec{y}), \quad \myvec{y} \in \mathbb{R}^k . \end{eqnarray} Thus, the conditional probability distributions of the observations given the hypothesis {\em and} the probability distribution of $H$ completely specify the {\em joint} distribution of the rvs $H$ and $\myvec{Y}$. During the discussion, several assumptions will be enforced on the probability distributions $F_0 $ and $F_1 $. The assumptions that will be most often encountered are denoted by {\bf (A.1)} and {\bf (A.2)} for sake of convenience. They are stated and discussed in some detail below. Condition {\bf (A.1)} corresponds to the assumption of usual absolute continuity that was used in Part \ref{Part:ESTIMATION}. The probability distributions $F_0 $ and $F_1 $ on $\mathbb{R}^k$ satisfy Condition {\bf (A.1)} if they are both {\em absolutely continuous} with respect to some distribution $F $ on $\mathbb{R}^k$ -- in general $F$ need not be a probability distribution.
Condition {\bf (A.1)} is equivalent to saying that there exist Borel mappings $f_0, f_1 :\mathbb{R}^k \rightarrow \mathbb{R}_+$ such that \begin{equation} F_h (\myvec{y}) = \int _{ -\myvec{\infty }} ^{\myvec{y}} f_h (\myvec{\eta}) dF(\myvec{\eta}), \quad \begin{array}{c} \myvec{y} \in \mathbb{R}^k, \\ h=0,1. \\ \end{array} \label{eq:DET_1+Model+A1} \end{equation} In some basic sense, this condition is hardly constraining since we can always take $F$ to be the average of the two probability distributions $F_0$ and $F_1$, i.e., \begin{equation} F (\myvec{y}) \equiv {1\over 2} F_0 (\myvec{y}) + {1\over 2} F_1 (\myvec{y}), \quad \myvec{y} \in \mathbb{R}^k , \end{equation} in which case $F$ is also a probability distribution. This choice for $F$ is usually not operationally convenient and is therefore discarded. Instead, the most often encountered situations arise when $F$ is either Lebesgue measure on $\mathbb{R}^k$ or a counting measure on some countable subset of $\mathbb{R}^k$, in which case $F$ is not a probability distribution. When $F$ is Lebesgue measure on $\mathbb{R}^k$, the Borel mappings $f_0, f_1 :\mathbb{R}^k \rightarrow \mathbb{R}_+$ are just the probability density functions induced by $F_0$ and $F_1$ in the usual sense. When $F$ is counting measure on a countable subset $S \subseteq \mathbb{R}^k$, then the Borel mappings $f_0, f_1 :\mathbb{R}^k \rightarrow \mathbb{R}_+$ are best thought of as {\em probability mass functions} (pmfs) $\myvec{f}_0 = \{ f_0(\myvec{y}), \ \myvec{y} \in S \}$ and $\myvec{f}_1 = \{ f_1(\myvec{y}), \ \myvec{y} \in S \}$, i.e., \[ 0 \leq f_h(\myvec{y}) \leq 1 , \quad \begin{array}{c} \myvec{y} \in S, \\ h=0,1. \\ \end{array} \] and \[ \sum_{ \myvec{y} \in S } f_h(\myvec{y}) = 1, \quad h=0,1. \] The condition (\ref{eq:DET_1+Model+A1}) now takes the form \[ \bP{ \myvec{Y} \in B | H=h} = \sum_{ \myvec{\eta} \in S \cap B } f_h(\myvec{\eta}), \quad \begin{array}{c} B \in \mathcal{B}(\mathbb{R}^k)\\ h=0,1. \\ \end{array} \] The second assumption of interest here is Condition {\bf (A.2)} which asserts that the probability distribution $F_1 $ is {\em absolutely continuous} with respect to the probability distribution $F_0 $. Under Condition {\bf (A.1)}, with the notation introduced earlier, this is equivalent to requiring \begin{equation} f_0 (\myvec{y})=0 \quad \mbox{implies} \quad f_1(\myvec{y})=0. \end{equation} \section{Admissible tests} \label{sec:DET_1+AdmissibleRules} An {\em admissible} decision rule (or test) is any {\em Borel} mapping $d:{\mathbb{R}}^k\rightarrow\{0,1\}$. The collection of all admissible rules is denoted by ${\cal D}$. The measurability requirement entering the definition of admissibility is imposed to guarantee that the mapping $d(\myvec{Y}):\Omega\rightarrow\{0,1 \}:\omega\rightarrow d(\myvec{Y}(\omega ))$ is indeed a rv, i.e., $\left [\omega \in \Omega : d(\myvec{Y}(\omega)) = h \right ]$ is an event in ${\cal F}$ for all $h=0,1$. It is plain that every test $d$ in ${\cal D}$ is completely specified by the {\em Borel} subset $C(d)$ defined by \begin{equation} C(d) \equiv \{ \myvec{y} \in \mathbb{R}^k : d(\myvec{y}) =0 \}. \label{eq:DET_1+AdmissibleRules+C(d)} \end{equation} Conversely, any Borel measurable subset $C$ of $\mathbb{R}^k$ uniquely determines an admissible rule $d_{C}$ in ${\cal D}$ through \[ d_{C} (\myvec{y}) = \left \{ \begin{array}{ll} 1 & \mbox{if $\myvec{y} \notin C$} \\ & \\ 0 & \mbox{if $\myvec{y} \in C$.} \\ \end{array} \right . \] We note that $C(d_C) = C$ as expected.
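For readers who like to experiment numerically, here is a minimal Python sketch of this correspondence (the acceptance region $C$ used below is an illustrative assumption, not part of the model data): a test is stored as the indicator of its acceptance region, exactly as in (\ref{eq:DET_1+AdmissibleRules+C(d)}).
\begin{verbatim}
def make_test(C):
    """Pure test d_C: decide 0 when y lies in the region C, decide 1 otherwise."""
    return lambda y: 0 if C(y) else 1

# Illustrative acceptance region C = [-1, 1] on the real line (k = 1).
C = lambda y: -1.0 <= y <= 1.0
d_C = make_test(C)

print([d_C(y) for y in (-2.0, 0.0, 0.5, 3.0)])   # [1, 0, 0, 1]
\end{verbatim}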
Any admissible rule $d$ in ${\cal D}$ induces {\em two} types of error: Upon observing $\myvec{Y}$, either $H=0$ is true and $d(\myvec{Y}) = 1$ {\em or} $H=1$ is true and $d(\myvec{Y})=0$. These two possibilities are the so--called errors of the {\em first} and {\em second} type associated with the decision rule $d$; they are quantified by \begin{equation} \alpha (d) \equiv \bP{d(\myvec{Y}) = 1| H=0 } \end{equation} and \begin{equation} \beta (d) \equiv \bP{d(\myvec{Y}) = 0 | H=1 }, \end{equation} respectively. The quantity $\alpha (d)$ is sometimes called the {\em size} of the test $d$. In radar parlance, these probabilities are referred to as probabilities of {\em false alarm} and {\em miss}, respectively, with alternate notation \begin{equation} P_F (d) \equiv \bP{d(\myvec{Y}) = 1 | H=0 } \label{eq:DET_1+AdmissibleRules+P_F} \end{equation} and \begin{equation} P_M(d) \equiv \bP{d(\myvec{Y}) = 0 | H=1 }. \label{eq:DET_1+AdmissibleRules+P_M} \end{equation} Throughout we shall use this terminology. Sometimes, it is convenient to consider the so--called probability of {\em detection} given by \begin{equation} P_D(d) \equiv \bP{d(\myvec{Y}) = 1 | H=1 } = 1 - P_M(d) . \label{eq:DET_1+AdmissibleRules+P_D} \end{equation} \section{The Bayesian formulation} \label{sec:DET_1+BayesianFormulation} The Bayesian formulation assumes {\em knowledge} of the conditional distributions $F_1$ and $F_0$, and of the prior distribution $p$ of the rv $H$. \paragraph{The Bayesian optimization problem} The cost incurred for making decisions is quantified by the mapping $C:\{0,1\} \times \{0,1\} \rightarrow \mathbb{R}$ with the interpretation that \[ C(h,d) = \left ( \begin{array}{c} \mbox{Cost incurred for deciding $d$} \\ \mbox{when $H=h$} \\ \end{array} \right ), \quad d,h=0,1. \] As the sample $\omega$ in $\Omega$ is realized, the observation $\myvec{Y}( \omega )$ is recorded and the use of the admissible rule $d$ in ${\cal D}$ incurs a cost $C(H (\omega ),d(\myvec{Y} (\omega )))$. Although it is tempting to seek to minimize this quantity, this is not possible. Indeed, the rv $\myvec{Y}$ is observed, whence $d(\myvec{Y})$ is known once the test $d$ has been specified, but the state of nature $H$ is {\em not} directly observable. Consequently, the value of the cost $C(H,d(\myvec{Y}))$ is not available. To remedy this difficulty, we introduce the {\em expected cost function} $J:{\cal D}\rightarrow \mathbb{R} $ given by \[ J(d) \equiv \bE{ C(H,d(\myvec{Y}))}, \quad d \in {\cal D}. \] The {\em Bayesian} Problem $\mathcal{P}_B$ is the minimization problem \[ \mathcal{P}_B: \quad \mbox{Minimize $J(d)$ over $d$ in $\mathcal{D}$.} \] This amounts to finding an admissible test $d^\star$ in $\cal D$ such that \begin{equation} J(d^\star) \leq J(d), \quad d \in \mathcal{D}. \label{eq:DET_1+BayesianFormulation+Bayesian1} \end{equation} Any admissible test $d^\star$ which satisfies (\ref{eq:DET_1+BayesianFormulation+Bayesian1}) is called a Bayesian test, and the value \begin{equation} J(d^\star) = \inf_{d \in \mathcal{D}} J(d) = \min_{d \in \mathcal{D}} J(d) \label{eq:DET_1+BayesianFormulation+Bayesian2} \end{equation} is known as the {\em Bayesian} cost. \paragraph{Representation results for the Bayesian cost $J$} The solution to the Bayesian problem ${\cal P}_B$ is developed with the help of an auxiliary result concerning the form of the Bayesian cost. This representation result will be useful in several places and is given here for sake of easy reference. Fix $d$ in $\mathcal{D}$.
Recall that the rvs $H$ and $d(\myvec{Y})$ are $\{0,1\}$-valued rvs, and that the events $[d(\myvec{Y})=H]$ and $[d(\myvec{Y}) \neq H]$ form a partition of $\Omega$, i.e., \[ \1{ d(\myvec{Y}) = H } + \1{ d(\myvec{Y}) \neq H } = \1{ \Omega } =1. \] It readily follows that \begin{eqnarray} C(H,d(\myvec{Y}) ) &=& \1{ d(\myvec{Y}) = H } C(H,H) + \1{ d(\myvec{Y}) \neq H } C(H,1-H) \nonumber \\ &=& \left ( 1 - \1{ d(\myvec{Y}) \neq H } \right ) C(H,H) + \1{ d(\myvec{Y}) \neq H } C(H,1-H) \nonumber \\ &=& C(H,H) + \left ( C(H,1-H) - C(H,H) \right ) \1{ d(\myvec{Y}) \neq H } \nonumber \\ &=& C(H,H) + \1{ d(\myvec{Y}) \neq H } \Gamma_H \label{eq:DET_1+BayesianFormulation+TowardsAuxiliaryCost} \end{eqnarray} as we introduce the relative costs $\Gamma _0$ and $\Gamma _1$ given by \[ \Gamma _h \equiv C(h,1-h) - C(h,h), \quad h=0,1. \] Taking expectations on both sides of (\ref{eq:DET_1+BayesianFormulation+TowardsAuxiliaryCost}) we find \begin{eqnarray} J(d) = \bE{C(H,H) } + \bE{\1{ d(\myvec{Y}) \neq H } \Gamma_H }. \nonumber \end{eqnarray} This last relation points to the auxiliary expected cost function $\widehat {J}:{\cal D}\rightarrow \mathbb{R}$ defined by \begin{equation} \widehat {J}(d) = \bE{ \1{ d(\myvec{Y}) \neq H} \Gamma _H }, \quad d \in {\cal D} \label{eq:DET_1+BayesianFormulation+TowardsAuxiliaryCost2} \end{equation} so that \begin{equation} J(d) = \bE{C(H,H) } + \widehat {J}(d), \quad d \in {\cal D}. \label{eq:DET_1+BayesianFormulation+TowardsAuxiliaryCost3} \end{equation} The law of total probability gives \begin{eqnarray} \lefteqn{ \widehat {J}(d) } & & \nonumber \\ &=& \bE{ \Gamma _0 \1{ d(\myvec{Y})\neq 0 } \1{ H=0 } + \Gamma _1 \1{ d(\myvec{Y})\neq 1 } \1{ H=1 } } \nonumber \\ &=& \Gamma _0 (1-p) \cdot \bP{ d(\myvec{Y}) \neq 0 | H=0 } + \Gamma _1 p \cdot \bP{ d(\myvec{Y}) \neq 1 | H=1 } \nonumber \\ &=& \Gamma _0 (1-p) \cdot \bP{ d(\myvec{Y}) = 1 | H=0 } + \Gamma _1p \cdot \bP{ d(\myvec{Y}) = 0 | H=1}, \label{eq:DET_1+BayesianFormulation+BasicExpression0} \end{eqnarray} and direct substitution easily yields the following expressions. \begin{lemma} {\sl For any admissible rule $d$ in $\mathcal{D}$, the relations \begin{eqnarray} \widehat J (d) = \Gamma _0(1-p) \cdot P_F (d) + \Gamma _1 p \cdot P_M(d) \end{eqnarray} and \begin{equation} J(d) = \bE{C(H,H)} + \Gamma _0(1-p) \cdot P_F (d) + \Gamma _1p \cdot P_M(d) \label{eq:DET_1+BayesianFormulation+BasicExpression} \end{equation} hold.} \label{lem:DET_1+BayesianFormulation+BasicExpression} \end{lemma} It is plain from these expressions that the Bayesian cost under a decision rule is completely determined by its probabilities of false alarm and of miss. We have \begin{eqnarray} \widehat {J}(d) &=& \Gamma _0(1-p) + \Gamma _1p \cdot \bP{d(\myvec{Y}) = 0 | H=1} \nonumber \\ & & - \Gamma _0 (1-p) \cdot \bP{d(\myvec{Y}) = 0 | H=0}, \quad d \in \cal D. \label{eq:DET_1+BayesianFormulation+BasicExpression2} \end{eqnarray} as an immediate consequence of (\ref{eq:DET_1+BayesianFormulation+BasicExpression0}).
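As a quick numerical illustration of Lemma \ref{lem:DET_1+BayesianFormulation+BasicExpression}, the short Python sketch below evaluates $\widehat J(d)$ and $J(d)$ from the error probabilities of a test; the cost matrix, the prior $p$ and the values of $P_F(d)$ and $P_M(d)$ are illustrative assumptions.
\begin{verbatim}
def bayes_cost(C, p, P_F, P_M):
    """Return (J(d), Jhat(d)) from the representation
       J(d) = E[C(H,H)] + Gamma_0 (1-p) P_F(d) + Gamma_1 p P_M(d),
       where C[h][a] is the cost of deciding a when H = h."""
    Gamma0 = C[0][1] - C[0][0]
    Gamma1 = C[1][0] - C[1][1]
    E_CHH = (1 - p) * C[0][0] + p * C[1][1]
    J_hat = Gamma0 * (1 - p) * P_F + Gamma1 * p * P_M
    return E_CHH + J_hat, J_hat

# Illustrative data: 0-1 costs, prior p = 0.3, P_F = 0.10, P_M = 0.25.
C = [[0.0, 1.0], [1.0, 0.0]]
print(bayes_cost(C, p=0.3, P_F=0.10, P_M=0.25))   # (0.145, 0.145)
\end{verbatim}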
\section{Solving the Bayesian problem $\mathcal{P}_B$} \label{sec:DET_1+SolutionBayesProblem} It follows from (\ref{eq:DET_1+BayesianFormulation+TowardsAuxiliaryCost3}) that solving $\mathcal{P}_B$ is {\em equivalent} to solving the auxiliary problem ${\widehat {\mathcal{P}}}_B$ where \[ {\widehat {\mathcal{P}}}_B: \quad \mbox{Minimize $\widehat J(d)$ over $d$ in $\mathcal{D}$.} \] To solve this auxiliary problem ${\widehat {\mathcal{P}}}_B$, it will be necessary to assume that the probability distributions $F_0$ and $F_1$ satisfy the absolute continuity condition {\bf (A.1)} given earlier, namely that there exists a single distribution $F$ on $\mathbb{R} ^k$ with respect to which both $F_0 $ and $F_1 $ are absolutely continuous. For any test $d$ in $\cal D$, we get \begin{eqnarray} \bP{d(\myvec{Y}) = 0 | H=h } & = & \int _{C(d)} dF_h(\myvec{y}) \nonumber \\ & = & \int _{C(d)} f_h (\myvec{y}) dF(\myvec{y}), \quad h=0,1 \end{eqnarray} with $C(d)$ defined at (\ref{eq:DET_1+AdmissibleRules+C(d)}). It is now easy to see from (\ref{eq:DET_1+BayesianFormulation+BasicExpression2}) that \begin{equation} \widehat {J}(d) = \Gamma _0(1-p) + \int _{C(d)} h(\myvec{y}) dF(\myvec{y}) \label{eq:DET_1+BayesianFormulation+AnotherExpression4} \end{equation} where the mapping $h:\mathbb{R}^k \rightarrow \mathbb{R}$ is given by \begin{equation} h(\myvec{y}):= \Gamma _1p \cdot f_1 (\myvec{y}) - \Gamma _0(1-p) \cdot f_0 (\myvec{y}), \quad \myvec{y} \in \mathbb{R}^k . \label{eq:DET_1+SolutionBayesProblem+h(y)} \end{equation} \begin{theorem} {\sl Assume the absolute continuity condition {\bf (A.1)} to hold. Define the Borel set $C^\star$ by \[ C^\star \equiv \{ \myvec{y} \in \mathbb{R}^k : h(\myvec{y})<0 \} \] with $h: \mathbb{R}^k \rightarrow \mathbb{R}$ given by (\ref{eq:DET_1+SolutionBayesProblem+h(y)}). The decision rule $d^\star:\mathbb{R}^k \rightarrow\{0,1\}$ given by \begin{equation} d^\star(\myvec{y}) = \left \{ \begin{array}{ll} 1 & \mbox{if $\myvec{y} \notin C^\star $} \\ ~ & ~ \\ 0 & \mbox{if $\myvec{y} \in C^\star $} \\ \end{array} \right . \label{eq:DET_1+SolutionBayesProblem+OptimalTest} \end{equation} is admissible and solves the Problem $\widehat{\mathcal{P}}_B$, hence solves the Bayesian Problem $\mathcal{P}_B$. } \label{thm:DET_1+SolutionBayesProblem+h(y)} \end{theorem} \myproof The set $C^\star$ is a Borel subset of $\mathbb{R}^k$ due to the fact that the functions $f_0,f_1: \mathbb{R}^k \rightarrow \mathbb{R}_+$ are themselves Borel measurable. The test $d^\star$ is therefore an admissible decision rule in $\mathcal{D}$ since $C(d^\star) = C^\star$. We now show that $d^\star$ satisfies \begin{equation} \widehat J( d^\star) \leq \widehat J(d), \quad d \in \mathcal{D}. \end{equation} Indeed, for every test $d$ in $\mathcal{D}$, we see from (\ref{eq:DET_1+BayesianFormulation+AnotherExpression4}) that \[ \widehat J(d) = \Gamma_0 (1-p ) + \int_{C(d)\backslash C^\star} h(\myvec{y}) dF(\myvec{y}) + \int_{C(d) \cap C^\star} h(\myvec{y}) dF(\myvec{y}) \] and \[ \widehat J(d^\star) = \Gamma_0 (1-p ) + \int_{C^\star\backslash C(d)} h(\myvec{y}) dF(\myvec{y}) + \int_{C(d) \cap C^\star} h(\myvec{y}) dF(\myvec{y}). \] Therefore, \[ \widehat J(d) - \widehat J(d^\star) = \int _{C(d) \backslash C^\star} h(\myvec{y}) dF(\myvec{y}) + \int _{C^\star\backslash C(d)} \left ( - h(\myvec{y}) \right )dF(\myvec{y})\geq 0 \] since \[ \int _{C(d)\backslash C^\star} h(\myvec{y})dF(\myvec{y})\geq 0 \quad \mbox{and} \quad \int _{C^\star\backslash C(d) } h(\myvec{y})dF(\myvec{y})\leq 0 \] by the very definition of $C^\star$.
The problem $\widehat{\mathcal{P}}_B$ is therefore solved by the test $d^\star$ defined at (\ref{eq:DET_1+SolutionBayesProblem+OptimalTest}). \myendpf The solution to the Bayesian problem is {\em not} unique: It should be plain that $C^\star$ could be replaced by \[ C^{\star\star} \equiv \{ \myvec{y} \in \mathbb{R}^k : h(\myvec{y}) \leq 0 \} \] (with corresponding test $d^{\star\star}$) without affecting the conclusion of optimality since \[ \int_{ \{ \myvec{y} \in \mathbb{R}^k: \ h(\myvec{y}) = 0 \} } h(\myvec{y}) dF(\myvec{y}) = 0. \] While it is true that $J(d^\star) = J(d^{\star\star})$, it is not necessarily the case that either of the equalities $P_F( d^\star ) = P_F( d^{\star\star}) $ or $P_M( d^\star ) = P_M( d^{\star\star}) $ holds. \section{Likelihood ratio tests} \label{sec:DET_1+LRTest} Assume that $0 < p < 1$, and that the relative costs $\Gamma_0$ and $\Gamma_1$ are positive, namely \[ \Gamma _h >
0, \quad h=0,1, \] i.e., the cost of making an incorrect decision is greater than the cost of making a correct decision. This is of course a most reasonable assumption which always holds in applications. Under this condition, the Bayesian decision rule $d^\star$ given in Theorem \ref{thm:DET_1+SolutionBayesProblem+h(y)} takes the equivalent form \begin{equation} d ^\star (\myvec{y})=0 \quad \mbox{iff} \quad f_1 (\myvec{y})<{{\Gamma _0(1-p)}\over {\Gamma _1p}} f_0 (\myvec{y}). \label{eq:DET_1+LRTest+BayesianTest} \end{equation} This suggests introducing the following class of admissible tests $\{ d_\eta , \eta \geq 0 \}$ where for each $\eta \geq 0$, the mapping $d_\eta : \mathbb{R}^k \rightarrow \{0,1\}$ is defined by \begin{equation} d_\eta (\myvec{y}) =0 \quad \mbox{iff} \quad f_1 (\myvec{y}) < \eta f_0 (\myvec{y}). \label{eq:DET_1+LRTest+d_eta} \end{equation} The Bayesian test $d^\star$ described in Section \ref{sec:DET_1+BayesianFormulation} is such a test $d_\eta $ with \[ \eta \equiv {{\Gamma _0(1-p)}\over {\Gamma _1p}}. \] By convention we interpret $d_0$ (resp. $d_\infty$) as the test that always selects the non-null hypothesis $H=1$ (resp. the null hypothesis $H=0$). Such tests take an even simpler form under the additional Condition {\bf (A.2)} as will be seen shortly. First, we note that (\ref{eq:DET_1+LRTest+d_eta}) can be rewritten as \[ d_\eta (\myvec{y}) = 0 \quad \mbox{if} \quad \frac{f_1(\myvec{y})}{f_0(\myvec{y})} < \eta \quad \mbox{whenever $f_0(\myvec{y})>0$} . \] Taking our cue from this last statement, we define the {\em likelihood ratio} as any Borel mapping $L:\mathbb{R}^k\rightarrow \mathbb{R}$ of the form \begin{equation} L(\myvec{y}) \equiv \left \{ \begin{array}{ll} \frac{f_1(\myvec{y})}{f_0(\myvec{y})} & \mbox{if $f_0(\myvec{y})>0$} \\ & \\ \Lambda (\myvec{y}) & \mbox{otherwise} \\ \end{array} \right . \label{eq:DET_1+LRTest+LR_function} \end{equation} for some arbitrary Borel mapping $\Lambda : \mathbb{R}^k \rightarrow \mathbb{R}_+$. Different choices of this arbitrary non-negative function produce different versions of the likelihood ratio function. Given a version of the likelihood ratio function in (\ref{eq:DET_1+LRTest+LR_function}), we define the {\em likelihood ratio} test with {\em threshold} $\eta \geq 0$ to be the admissible decision rule $Lrt_{\eta}:\mathbb{R}^k\rightarrow\{0,1\}$ given by \begin{equation} Lrt_{\eta}(\myvec{y}) \equiv \left \{ \begin{array}{ll} 1 & \mbox{if $L(\myvec{y}) \geq \eta $} \\ & \\ 0 & \mbox{if $L(\myvec{y}) < \eta $.} \\ \end{array} \right . \label{eq:DET_1+LRTest+LR_test} \end{equation} With \[ B_h = \left \{ \myvec{y} \in \mathbb{R}^k: \ f_h(\myvec{y}) = 0 \right \}, \quad h=0,1, \] we note that \begin{eqnarray} \bP{ f_0 (\myvec{Y}) = 0 | H=h } = \int_{B_0} f_h (\myvec{y}) dF(\myvec{y}), \quad h=0,1. \end{eqnarray} Under {\bf (A.2)}, as the inclusion $B_0 \subseteq B_1$ holds, we conclude that \[ \bP{ f_0(\myvec{Y}) = 0 | H=h } = 0, \quad h=0,1. \] For any value $\eta$ of the threshold it is plain that the tests $d_{\eta}$ and $Lrt_{\eta}$ coincide on the set $\{ \myvec{y} \in \mathbb{R}^k : f_0 (\myvec{y}) >0 \}$ (while possibly disagreeing on the complement $B_0$). 
Thus, for each $h=0,1$, we find that \begin{eqnarray} \lefteqn{ \bP{ d_{\eta} (\myvec{Y}) = 0 | H=h } } & & \nonumber \\ &=& \bP{ d_{\eta} (\myvec{Y}) = 0 , f_0 (\myvec{Y})> 0 |H=h } + \bP{ d_{\eta} (\myvec{Y}) = 0 , f_0 (\myvec{Y})= 0 |H=h } \nonumber \\ &=& \bP{ Lrt_{\eta} (\myvec{Y}) = 0 , f_0 (\myvec{Y})> 0 |H=h } \nonumber \\ &=& \bP{ Lrt_{\eta} (\myvec{Y}) = 0 , f_0 (\myvec{Y})> 0 |H=h } + \bP{ Lrt_{\eta} (\myvec{Y}) = 0 , f_0 (\myvec{Y}) = 0 |H=h } \nonumber \\ &=& \bP{ Lrt_{\eta} (\myvec{Y}) = 0 |H=h } . \nonumber \end{eqnarray} This discussion leads to the following fact. \begin{lemma} {\sl Assume the absolute continuity conditions {\bf (A.1)}--{\bf (A.2)} to hold. For each $\eta \geq 0$, the tests $d_\eta $ and $Lrt_{\eta}$ are {\em equivalent} in that they have identical performance with \[ P_M(d_\eta) = P_M( Lrt_\eta) \quad \mbox{and} \quad P_F(d_\eta) = P_F( Lrt_\eta). \] \label{lem:DET_1+LRTest+Equivalence} } \end{lemma} It follows from (\ref{eq:DET_1+BayesianFormulation+BasicExpression}) that $J( d_{\eta}) = J( Lrt_{\eta})$ regardless of the cost function $C: \{0,1\} \times \{0,1\} \rightarrow \mathbb{R}$. The same argument also shows that any two versions of the likelihood ratio function will generate likelihood ratio tests which are equivalent. Equipped with Lemma \ref{lem:DET_1+LRTest+Equivalence} we can now restate Theorem \ref{thm:DET_1+SolutionBayesProblem+h(y)}. \begin{theorem} {\sl Assume the absolute continuity conditions {\bf (A.1)}--{\bf (A.2)} to hold. Whenever $\Gamma _h >0$ for $h=0,1$, the Bayesian decision rule $d^\star$ identified in Theorem 2.1 is equivalent to the likelihood ratio test $Lrt_{\eta ^\star}$ where \[ \eta ^\star \equiv {{\Gamma _0(1-p)}\over {\Gamma _1p}} = {{C(0,1) - C(0,0)}\over {C(1,0) - C(1,1)}}\cdot {1-p\over p}. \] } \label{the:DET_1+LRTest+SolutionBayesProblem} \end{theorem} \section{The probability of error criterion} \label{sec:DET_1+ProbError} A special case of great interest is obtained when the cost function $C$ takes the form \[ C(h,d) = \1{ h \neq d }, \quad h,d =0,1. \] The corresponding expected cost then reduces to the probability of making an incorrect decision, namely the {\em probability of error}, and is given by \[ P_E (d) \equiv \bP{d(\myvec{Y})\neq H }, \quad d \in \mathcal{D}. \] We check that \[ \Gamma_h = C(h,1-h) - C(h,h) = 1, \quad h=0,1, \] and the relation (\ref{eq:DET_1+BayesianFormulation+BasicExpression}) becomes \begin{eqnarray} P_E (d) & = & (1-p)\cdot P_F(d) + p\cdot P_M(d) \nonumber \\ & = & p + (1-p)\cdot P_F (d) - p\cdot P_D(d), \quad d \in {\cal D}. \end{eqnarray} For the probability of error criterion, the threshold $\eta^\star$ appearing in Theorem \ref{the:DET_1+LRTest+SolutionBayesProblem} has the simpler form \[ \eta ^\star = {1-p\over p}. \] The optimal decision rule $d^\star$, as described at (\ref{eq:DET_1+LRTest+BayesianTest}), can now be rewritten as \begin{equation} d ^\star (\myvec{y})=0 \quad \mbox{iff} \quad f_1 (\myvec{y})< \frac{1-p}{p}f_0 (\myvec{y}). \label{eq:DET_1+ProbError+BayesianTest} \end{equation} \paragraph{The ML test} In the uniform prior case, i.e., $p=\frac{1}{2}$, the Bayesian test (\ref{eq:DET_1+ProbError+BayesianTest}) becomes \begin{equation} d ^\star (\myvec{y})=0 \quad \mbox{iff} \quad f_1 (\myvec{y})< f_0 (\myvec{y}). \label{eq:DET_1+ProbError+BayesianTest2} \end{equation} In other words, the optimal decision is to select that hypothesis whose likelihood is largest given the observation $\myvec{y}$. 
We refer to this strategy as the {\em Maximum Likelihood} (ML) test. \paragraph{The MAP computer} Finally, (\ref{eq:DET_1+ProbError+BayesianTest}) can also be rewritten as \begin{equation} d ^\star (\myvec{y})=0 \quad \mbox{iff} \quad \bP{ H=1 | \myvec{Y} = \myvec{y} } < \bP{ H=0 | \myvec{Y} = \myvec{y} } \label{eq:DET_1+ProbError+BayesianTest3} \end{equation} since for each $\myvec{y}$ in $\mathbb{R}^k$, we have \[ \bP{ H=1 | \myvec{Y} = \myvec{y} } = \frac{ p f_1 (\myvec{y}) } { p f_1 (\myvec{y}) + (1-p)f_0 (\myvec{y}) } \] and \[ \bP{ H=0 | \myvec{Y} = \myvec{y} } = \frac{ (1-p)f_0 (\myvec{y}) } { p f_1 (\myvec{y}) + (1-p)f_0 (\myvec{y}) } \] by Bayes' Theorem. For each $h=0,1$, the conditional probability $\bP{ H=h | \myvec{Y} = \myvec{y} }$ is known as the {\em posterior} probability that $H=h$ occurs given the observation $\myvec{y}$. Put differently, the optimal test (\ref{eq:DET_1+ProbError+BayesianTest3}) compares these posterior probabilities given the observation $\myvec{y}$, and selects the hypothesis with the largest posterior probability, hence the terminology {\em Maximum A Posteriori} (MAP) computer. \section{The Gaussian case} \label{sec:DET_1+GaussianCase} Assume that the observation rv $\myvec{Y}$ is conditionally Gaussian given $H$, i.e., \[ \begin{array}{ll} H_1: & \ \myvec{Y} \sim {\rm N}( \myvec{m}_1, \myvec{R}_1) \\ H_0: & \ \myvec{Y} \sim {\rm N}( \myvec{m}_0, \myvec{R}_0) \\ \end{array} \] where $\myvec{m}_1$ and $\myvec{m}_0$ are elements in $\mathbb{R}^k$, and the $k \times k$ {\em symmetric} matrices $\myvec{R}_1$ and $\myvec{R}_0$ are {\em positive definite} (thus {\em invertible}). Throughout the pairs $(\myvec{m}_0, \myvec{R}_0)$ and $(\myvec{m}_1, \myvec{R}_1)$ are distinct so that the probability density functions $f_0,f_1: \mathbb{R}^k \rightarrow \mathbb{R}_+$ are distinct since \[ f_h (\myvec{y}) = \frac{1}{ \sqrt{ (2\pi)^k \det \myvec{R}_h } } e^{ - \frac{1}{2} (\myvec{y} - \myvec{m}_h )^\prime \myvec{R}_h^{-1} (\myvec{y} - \myvec{m}_h ) }, \quad \begin{array}{c} \myvec{y} \in \mathbb{R}^k \\ h=0,1. \\ \end{array} \] Both conditions {\bf (A.1)} and {\bf (A.2)} obviously hold, and for each $\eta >0$, the test $d_\eta$ and $Lrt_\eta$ coincide. \paragraph{The likelihood ratio and the likelihood ratio tests} For this example, the likelihood ratio function is given by \[ L(\myvec{y}) = \sqrt{ \frac{ \det(\myvec{R}_0) }{ \det(\myvec{R}_1) } } \cdot e^{ \frac{1}{2} Q( \myvec{y}) }, \quad \myvec{y} \in \mathbb{R}^k \] where we have used the notation \[ Q(\myvec{y}) = (\myvec{y} -\myvec{m}_0)^{\prime}\myvec{R}_0^{-1}(\myvec{y}-\myvec{m}_0) - (\myvec{y}-\myvec{m}_1)^{\prime}\myvec{R}_1^{-1}(\myvec{y}-\myvec{m}_1). \] Fix $\eta > 0$. By direct substitution, we conclude that \[ Lrt_{\eta } (\myvec{y}) = 0 \quad \mbox{iff} \quad \quad e^{ \frac{1}{2} Q( \myvec{y}) } < \sqrt{ \eta ^2 \cdot {{\det\myvec{R}_1}\over {\det\myvec{R}_0}} }, \] and a simple logarithmic transformation yields \[ Lrt_{\eta } (\myvec{y}) = 0 \quad \mbox{iff} \quad Q(\myvec{y}) < \log \left ( \eta ^2{{\det\myvec{R}_1}\over {\det\myvec{R}_0}} \right ). 
\] \paragraph{The equal covariance case} If the covariances are identical under both hypotheses, i.e., \[ \myvec{R}_0 = \myvec{R}_1 \equiv \myvec{R}, \] with $\myvec{m}_1 \neq \myvec{m}_0$, then \begin{eqnarray} Q(\myvec{y}) &=& (\myvec{y} -\myvec{m}_0)^{\prime}\myvec{R}^{-1}(\myvec{y}-\myvec{m}_0) - (\myvec{y}-\myvec{m}_1)^{\prime}\myvec{R}^{-1}(\myvec{y}-\myvec{m}_1) \nonumber \\ &=& 2 \myvec{y}^{\prime}\myvec{R}^{-1}(\myvec{m}_1-\myvec{m}_0) - \left ( \myvec{m}_1^{\prime}\myvec{R}^{-1}\myvec{m}_1 - \myvec{m}_0^{\prime}\myvec{R}^{-1}\myvec{m}_0 \right ). \end{eqnarray} The form of $Lrt_\eta$ simplifies even further to read \[ Lrt_{\eta } (\myvec{y}) = 0 \quad \mbox{iff} \quad \myvec{y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} < \tau (\eta ) \] where we have set \begin{eqnarray} \Delta \myvec{m} \equiv \myvec{m}_1 - \myvec{m}_0 \label{eq:DET_1+Examples+Gaussian1} \end{eqnarray} and \begin{eqnarray} \tau (\eta ) \equiv {1 \over 2} \left ( \myvec{m}_1^{\prime}\myvec{R}^{-1}\myvec{m}_1 - \myvec{m}_0^{\prime}\myvec{R}^{-1}\myvec{m}_0 \right ) + \log \eta . \label{eq:DET_1+Examples+Gaussian2} \end{eqnarray} \paragraph{Evaluating probabilities} We will now evaluate the probabilities of false alarm and miss under $Lrt_\eta$. It is plain that \begin{eqnarray} P_F(Lrt_{\eta}) & = & \bP{ Lrt_{\eta}(\myvec{Y}) = 1 | H=0 } \nonumber \\ & = & \bP{ L(\myvec{Y}) \geq \eta | H=0 } \nonumber \\ & = & \bP{ \myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} \geq \tau(\eta )\mid H=0 } \end{eqnarray} and \begin{eqnarray} P_M(Lrt_{\eta}) & = & \bP{ Lrt_{\eta}(\myvec{Y}) = 0 | H=1 } \nonumber \\ & = & \bP{ L(\myvec{Y}) < \eta | H=1 } \nonumber \\ & = & \bP{ \myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} < \tau (\eta )\mid H=1 } \nonumber \\ & = & 1 - \bP{ \myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} \geq \tau (\eta )\mid H=1 }. \end{eqnarray} To carry out the calculations further, recall that for each $h=0,1$, given $H=h$, the rv $\myvec{Y}$ is conditionally Gaussian with mean vector $\myvec{m}_h$ and covariance matrix $\myvec{R}$. Therefore, the scalar rv $\myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m}$ is also conditionally Gaussian with mean and variance given by \[ \bE{ \myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} | H=h} = \myvec{m}_h^{\prime}\myvec{R}^{-1}\Delta \myvec{m} \] and \begin{eqnarray} {\rm Var} \left [ \myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} | H=h \right ] & = & \left ( \myvec{R}^{-1}\Delta \myvec{m} \right )^\prime {\rm Cov} \left [ \myvec{Y} | H=h \right ] \left ( \myvec{R}^{-1}\Delta \myvec{m} \right ) \nonumber \\ & = & \left ( \myvec{R}^{-1}\Delta \myvec{m} \right )^\prime \myvec{R} \left ( \myvec{R}^{-1}\Delta \myvec{m} \right ) \nonumber \\ & = & \Delta \myvec{m}^\prime \myvec{R}^{-1} \Delta \myvec{m}, \end{eqnarray} respectively. In obtaining this last relation we have used the fact that \[ \myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m} = (\myvec{R}^{-1} \Delta \myvec{m})^\prime \myvec{Y}. \] Consequently, for all $h=0,1$, \begin{eqnarray} \lefteqn{ \bP{ \myvec{Y}^{\prime}R^{-1}\Delta \myvec{m} \geq \tau (\eta )| H=h } } & & \nonumber \\ &=& \bP{ \myvec{m}_h^{\prime}\myvec{R}^{-1}\Delta \myvec{m} + \sqrt{ \Delta \myvec{m}^\prime \myvec{R}^{-1} \Delta \myvec{m} } \cdot Z \geq \tau (\eta) } \nonumber \\ &=& \bP{ Z \geq \frac{ \tau (\eta) - \myvec{m}_h^{\prime}\myvec{R}^{-1}\Delta \myvec{m} } { \sqrt{ \Delta \myvec{m}^\prime \myvec{R}^{-1} \Delta \myvec{m} } } } \end{eqnarray} where $Z \sim {\rm N}(0,1)$. 
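As an aside, the conditional mean and variance just obtained are easily checked by simulation. The following Python sketch (using NumPy; the mean vectors and the covariance matrix below are illustrative assumptions) samples $\myvec{Y}$ given each hypothesis and compares the empirical moments of $\myvec{Y}^{\prime}\myvec{R}^{-1}\Delta \myvec{m}$ with the formulas above.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model parameters (assumptions), k = 2.
m0 = np.array([0.0, 0.0])
m1 = np.array([1.0, 2.0])
R  = np.array([[2.0, 0.5],
               [0.5, 1.0]])
Rinv = np.linalg.inv(R)
dm = m1 - m0                                          # Delta m

for h, m in enumerate((m0, m1)):
    Y = rng.multivariate_normal(m, R, size=200_000)   # samples of Y given H = h
    T = Y @ Rinv @ dm                                 # statistic Y' R^{-1} Delta m
    print(h, T.mean(), m @ Rinv @ dm, T.var(), dm @ Rinv @ dm)
\end{verbatim}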
For the sake of convenience, set \begin{equation} d^2 \equiv \Delta \myvec{m}^\prime \myvec{R}^{-1} \Delta \myvec{m}, \label{eq:DET_1+Examples+Gaussian3} \end{equation} and note that \[ \tau (\eta ) - \myvec{m}_h^{\prime}\myvec{R}^{-1}\Delta \myvec{m} = \left \{ \begin{array}{ll} \log\eta - {1\over 2} d^2 & \mbox{if $h=1$} \\ & \\ \log\eta + {1\over 2} d^2 & \mbox{if $h=0$.} \\ \end{array} \right . \] It is now clear that \[ P_F(Lrt_{\eta}) = 1 - \Phi \left ( { { \log \eta + {1\over 2}d^2 } \over {d} } \right ) \] and \[ P_M(Lrt_{\eta}) = \Phi \left ( { { \log \eta - {1\over 2}d^2 } \over {d} } \right ). \] We finally obtain \[ P_D(Lrt_{\eta}) = 1 - \Phi \left ( { { \log \eta - {1\over 2}d^2 } \over {d} } \right ). \] \paragraph{The ML test} The ML test corresponds to $\eta = 1$, in which case these expressions become \[ P_F(d_{\rm ML}) = 1 - \Phi \left ( {d\over 2} \right ) = Q \left ( \frac{d}{2} \right ) \] and \[ P_M(d_{\rm ML}) = \Phi \left ( - {d\over 2} \right ) = Q \left ( \frac{d}{2} \right ). \] Therefore, \[ P_E(d_{\rm ML}) = (1-p) P_F(d_{\rm ML}) + p P_M(d_{\rm ML}) = Q \left ( \frac{d}{2} \right ) \] {\em regardless} of the prior $p$. \section{The Bernoulli case} \label{sec:DET_1+BernoulliCase} Consider now the binary hypothesis testing problem \[ \begin{array}{ll} H_1: & \ Y \sim {\rm Ber}(a_1) \\ H_0: & \ Y \sim {\rm Ber}(a_0) \\ \end{array} \] with $a_1< a_0$ in $(0,1)$. The case $a_0 < a_1$ is left as an exercise. Thus, \[ \bP{ Y = 1 | H = h} = a_h = 1 - \bP{ Y = 0 | H = h}, \quad h=0,1 \] and Conditions {\bf (A.1)} and {\bf (A.2)} obviously hold with respect to counting measure $F$ on $\{0,1\}$. The likelihood ratio function is given by \[ L(y) = \left ( \frac{1-a_1}{1-a_0} \right )^{1-y} \left ( \frac{a_1}{a_0} \right )^y, \quad y \in \mathbb{R}. \] For each $\eta > 0$, the test $d_\eta$ takes the following form \begin{eqnarray} d_\eta (y) = 0 \quad &\mbox{iff}& \quad \left ( \frac{ 1-a_1}{1-a_0} \right )^{1-y} \cdot \left ( \frac{a_1}{a_0} \right )^y < \eta \nonumber \\ \quad &\mbox{iff}& \quad \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^y < \eta \frac{1-a_0}{1-a_1} \end{eqnarray} with $y=0,1$.
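Before computing the error probabilities of $d_\eta$, here is a small Python sanity check of this form (the values of $a_0$, $a_1$ and of the threshold $\eta$ are illustrative assumptions): it tabulates $d_\eta(y)$ for $y=0,1$ directly from the likelihood ratio.
\begin{verbatim}
def d_eta(y, a0, a1, eta):
    """Bernoulli test d_eta: decide 0 iff L(y) < eta."""
    L = ((1 - a1) / (1 - a0)) ** (1 - y) * (a1 / a0) ** y
    return 0 if L < eta else 1

a0, a1 = 0.6, 0.2                        # illustrative values with a1 < a0
for eta in (0.25, 0.5, 1.0, 2.5):
    print(eta, [d_eta(y, a0, a1, eta) for y in (0, 1)])
\end{verbatim}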
Therefore, \begin{eqnarray} P_F(d_\eta) &=& \bP{ d_\eta(Y) = 1 | H=0 } \nonumber \\ &=& \bP{ \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^Y \geq \eta \frac{1-a_0}{1-a_1} \Big | H = 0 } \nonumber \\ &=& \bP{ Y=1, \ \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^Y \geq \eta \frac{1-a_0}{1-a_1} \Big | H = 0 } \nonumber \\ & & ~+ \bP{ Y=0, \ \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^Y \geq \eta \frac{1-a_0}{1-a_1} \Big | H = 0 } \nonumber \\ &=& a_0 \1{ \eta \frac{1-a_0}{1-a_1} \leq \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} } + (1-a_0) \1{ \eta \frac{1-a_0}{1-a_1} \leq 1 } \nonumber \\ &=& a_0 \1{ \eta \leq \frac{a_1}{a_0} } + (1-a_0) \1{ \eta \frac{1-a_0}{1-a_1} \leq 1 } \nonumber \\ &=& a_0 \1{ \eta \leq \frac{a_1}{a_0} } + (1-a_0) \1{ \eta \leq \frac{1-a_1}{1-a_0} } \label{eq:DET_1+BernoulliCase+P_F} \end{eqnarray} Similarly, we get \begin{eqnarray} P_M(d_\eta) &=& \bP{ d_\eta(Y) = 0 | H=1 } \nonumber \\ &=& \bP{ \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^Y < \eta \frac{1-a_0}{1-a_1} \Big | H = 1 } \nonumber \\ &=& \bP{ Y=1, \ \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^Y < \eta \frac{1-a_0}{1-a_1} \Big | H = 1 } \nonumber \\ & & ~+ \bP{ Y=0, \ \left ( \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} \right )^Y < \eta \frac{1-a_0}{1-a_1} \Big | H = 1 } \nonumber \\ &=& a_1 \1{ \frac{ 1-a_0}{1-a_1} \cdot \frac{a_1}{a_0} < \eta \frac{1-a_0}{1-a_1} } + (1-a_1) \1{ 1 < \eta \frac{1-a_0}{1-a_1} } \nonumber \\ &=& a_1 \1{ \frac{a_1}{a_0} < \eta } + (1-a_1) \1{ 1 < \eta \frac{1-a_0}{1-a_1} } \nonumber \\ &=& a_1 \1{ \frac{a_1}{a_0} < \eta } + (1-a_1) \1{ \frac{1-a_1}{1-a_0} < \eta }. \label{eq:DET_1+BernoulliCase+P_M} \end{eqnarray} \section{Additional examples} \label{sec:DET_1+Examples} We now present several examples where Conditions {\bf (A.1)} or {\bf (A.2)} fail. In all cases we assume $\Gamma_0 > 0$ and $\Gamma_1 > 0$. \paragraph{An example where absolute continuity {\bf (A.2)} fails} Here, the observation is the scalar rv $Y$ with \[ f_0(y) = \left \{ \begin{array}{ll} 1 - \mid y \mid & \mbox{if $|y| \leq 1$} \\ & \\ 0 & \mbox{otherwise} \\ \end{array} \right . \quad \mbox{and} \quad f_1(y) = \left \{ \begin{array}{ll} \frac{1}{3} & \mbox{if $-1 \leq y \leq 2$} \\ & \\ 0 & \mbox{otherwise.} \\ \end{array} \right . \] Condition {\bf (A.1)} holds (with Lebesgue measure) but the absolute continuity condition {\bf (A.2)} is clearly not satisfied. However, simple substitution reveals that \begin{eqnarray} h(y) &=& \Gamma _1 p \cdot f_1(y) - \Gamma _0(1-p) \cdot f_0(y) \nonumber \\ &=& \left \{ \begin{array}{ll} 0 & \mbox{if $y < -1$} \\ & \\ \frac{1}{3} \Gamma _1 p - \Gamma _0(1-p) (1-|y|) & \mbox{if $|y| \leq 1 $} \\ & \\ \frac{1}{3} \Gamma _1 p & \mbox{if $1 < y \leq 2 $} \\ & \\ 0 & \mbox{if $ 2 < y$.} \\ \end{array} \right . \end{eqnarray} The Bayesian test $d^\star$ is simply \[ d^\star (y)=0 \quad \mbox{iff} \quad |y| < 1 - \frac{ \frac{1}{3} \Gamma_1 p } { \Gamma_0 (1-p) }. \] \paragraph{Another example where absolute continuity {\bf (A.2)} fails} Here, the observation is again the scalar rv $Y$ with \[ f_0(y) = \left \{ \begin{array}{ll} 1 - \mid y \mid & \mbox{if $|y| \leq 1$} \\ & \\ 0 & \mbox{otherwise} \\ \end{array} \right . \quad \mbox{and} \quad f_1(y) = \left \{ \begin{array}{ll} \frac{1}{3} & \mbox{if $0 \leq y \leq 3$} \\ & \\ 0 & \mbox{otherwise.} \\ \end{array} \right . \] Condition {\bf (A.1)} holds (with Lebesgue measure) but {\bf (A.2)} fails. 
Simple substitution reveals that \begin{eqnarray} h(y) &=& \Gamma _1 p \cdot f_1(y) - \Gamma _0(1-p) \cdot f_0(y) \nonumber \\ &=& \left \{ \begin{array}{ll} 0 & \mbox{if $y < -1$} \\ & \\ - \Gamma _0(1-p) (1+y) & \mbox{if $-1 \leq y \leq 0 $} \\ & \\ \frac{1}{3} \Gamma _1 p - \Gamma _0(1-p) (1-y) & \mbox{if $0 < y \leq 1$} \\ & \\ \frac{1}{3} \Gamma _1 p & \mbox{if $1 < y \leq 3$} \\ & \\ 0 & \mbox{if $ 3 < y$,} \\ \end{array} \right . \end{eqnarray} and it is straightforward to check that the Bayesian test $d^\star$ is simply \[ d^\star (y)=0 \quad \mbox{iff} \quad \begin{array}{c} -1 < y \leq 0 \\ \mbox{or} \\ 0 < y \leq 1, \ y < 1 - \frac{ \frac{1}{3} \Gamma_1 p } { \Gamma_0 (1-p) } .\\ \end{array} \] Equivalently, $d^\star$ can be described as \[ d^\star (y)=0 \quad \mbox{iff} \quad y \in \left ( -1 , \left ( 1- \frac{ \Gamma_1 p }{ 3 \Gamma_0 (1-p) } \right )^+ \right ). \] \paragraph{An example where both {\bf (A.1)} and {\bf (A.2)} fail} Consider the binary hypothesis testing problem \[ \begin{array}{ll} H_1: & \ \myvec{Y} \sim F_1 \\ H_0: & \ \myvec{Y} \sim F_0 . \\ \end{array} \] where $F_0$ is a discrete distribution uniform on $\{0,1\}$, and $F_1$ is uniform on the interval $(0,1)$. Thus, $F_1$ admits a probability density function $f_1: \mathbb{R} \rightarrow \mathbb{R}_+$ with respect to Lebesgue measure given by \[ f_1(y) = \left \{ \begin{array}{ll} 1 & \mbox{if $y \in (0,1)$} \\ & \\ 0 & \mbox{otherwise} \\ \end{array} \right . \] and \[ \bP{ Y = 0 | H=0 } = \bP{ Y = 1 | H= 0 } = \frac{1}{2}. \] For each test $d$ in $\mathcal{D}$ we recall that we have \begin{eqnarray} & & \widehat J(d) \nonumber \\ &=& \Gamma_0 (1-p) + \Gamma_1 p \cdot \bP{ d(Y) = 0 | H=1 } - \Gamma_0(1- p) \cdot \bP{ d(Y) = 0 | H=0 } \nonumber \end{eqnarray} with \[ \bP{ d(Y) = 0 | H=0 } = \left \{ \begin{array}{ll} \frac{1}{2} & \mbox{if $0 \in C(d), 1 \notin C(d)$} \\ \frac{1}{2} & \mbox{if $1 \in C(d), 0 \notin C(d)$} \\ 1 & \mbox{if $0 \in C(d), 1 \in C(d)$} \\ 0 & \mbox{if $0 \notin C(d), 1 \notin C(d)$} \\ \end{array} \right . \] and \[ \bP{ d(Y) = 0 | H=1 } = \int_{C(d)} f_1(y) dy = |C(d) \cap [0,1]| . \] Adding or deleting a finite number of points from $C(d)$ will {\em not} affect the value of $\bP{ d(Y) = 0 | H=1 }$, but it may change the value of $\bP{ d(Y) = 0 | H= 0 }$. Therefore, with $C(d)$ given, modify it, if needed, by adding both points $0$ and $1$. If $C^\prime$ denotes this Borel subset of $\mathbb{R}$, then $C^\prime = C(d) \cup \{ 0,1\}$; if $d^\prime$ denotes the corresponding test, then $C(d^\prime) = C^\prime$. Obviously \[ \bP{ d(Y) = 0 | H=1 } = \bP{ d^\prime(Y) = 0 | H=1 } = |C(d^\prime) \cap [0,1]| \] since $|C(d^\prime) \cap [0,1]| = |C(d) \cap [0,1]|$, while \[ \bP{ d(Y) = 0 | H=0 } \leq \bP{ d^\prime (Y) = 0 | H=0 } = 1. \] We conclude that \begin{eqnarray} & & \widehat J(d) \nonumber \\ &=& \Gamma_0 (1-p) + \Gamma_1 p \cdot \bP{ d(Y) = 0 | H=1 } - \Gamma_0(1- p) \cdot \bP{ d(Y) = 0 | H=0 } \nonumber \\ &\geq& \Gamma_0 (1-p) + \Gamma_1 p \cdot \bP{ d^\prime(Y) = 0 | H=1 } - \Gamma_0(1- p) \cdot \bP{ d^\prime(Y) = 0 | H=0 } \nonumber \\ &=& \Gamma_0 (1-p) + \Gamma_1 p \cdot |C(d^\prime) \cap [0,1]| - \Gamma_0(1- p) \nonumber \\ &=& \Gamma_1 p \cdot |C(d^\prime) \cap [0,1]| \geq 0. \end{eqnarray} Consider the test $d^\star : \mathbb{R} \rightarrow \{0,1\}$ given by \[ d^\star (y) = 0 \quad \mbox{iff} \quad y \in \{0,1\}. \] It has cost $\widehat J(d^\star) = 0$, whence it is a Bayesian decision rule.
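The following Python sketch (a Monte Carlo illustration; the prior $p$ is an illustrative assumption) mimics this last example: it simulates the mixed discrete/continuous model and confirms that the rule deciding $0$ exactly on $\{0,1\}$ makes essentially no errors of either type, consistent with $\widehat J(d^\star) = 0$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, p = 100_000, 0.4                      # prior p is an illustrative assumption

H = rng.random(n) < p                    # H = 1 with probability p
# Y | H=0 uniform on {0,1};  Y | H=1 uniform on (0,1).
Y = np.where(H, rng.random(n), rng.integers(0, 2, size=n).astype(float))

decide0 = (Y == 0.0) | (Y == 1.0)        # d*(y) = 0 iff y is exactly 0 or 1
P_F = (~decide0)[~H].mean()              # estimate of P[d*(Y) = 1 | H = 0]
P_M = decide0[H].mean()                  # estimate of P[d*(Y) = 0 | H = 1]
print(P_F, P_M)                          # both are (numerically) zero
\end{verbatim}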
\section{Randomized tests} \label{sec:DET_1+RandomizedTests} As we shall see shortly, a solution cannot always be found to the minimax and Neyman--Pearson formulations of the hypothesis testing problem if the search is restricted to the class of decision rules ${\cal D}$ as was done for the Bayesian set--up. In some very real sense this class of tests ${\cal D}$ is not large enough to yield a solution, and we enlarge it by considering the class of {\em randomized} tests or decision rules. A randomized test $\delta$ is a Borel mapping $\delta : \mathbb{R}^k \rightarrow [0,1]$ with the following interpretation as conditional probability: Having observed $\myvec{Y}=\myvec{y}$, it is decided that the state of nature is $1$ (resp. $0$) with probability $ \delta (\myvec{y})$ (resp. $ 1- \delta (\myvec{y})$). The collection of all randomized tests will be denoted by $\mathcal{D}^\star$. Obviously, any test $d$ in ${\cal D}$ can be mechanized as a randomized test, say $\delta _d: \mathbb{R}^k \rightarrow [0,1]$, given by \[ \delta _d (\myvec{y}) \equiv d(\myvec{y}), \quad \myvec{y} \in \mathbb{R}^k . \] A test in $\mathcal{D}$ is often referred to as a {\em pure} strategy. \paragraph{Probabilistic construction} A natural question then arises as to how such randomization mechanisms can be incorporated into the probabilistic framework introduced earlier in Section \ref{sec:DET_1+Model}: The model data is unchanged as we are given two probability distributions $F_0$ and $F_1$ on $\mathbb{R}^k$ and a prior $p$ in $[0,1]$. We still consider a sample space $\Omega$ equipped with a $\sigma$-field of events $\cal F$, and on it we now define the three rvs $H$, $\myvec{Y}$ and $D$ which take values in $\{0,1\}$, $\mathbb{R}^k$ and $\{0,1\}$, respectively. The rvs $H$ and $\myvec{Y}$ have the same interpretation as before, as state of nature and observation, respectively, while the rv $D$ now encodes the decision to be taken on the basis of the observation $\myvec{Y}$. With each decision rule $\delta$ in $\mathcal{D}^\star$ we associate a probability measure $\mathbb{P}_\delta$ on $\cal F$ such that the following constraints are satisfied: As before, this time under $\mathbb{P}_\delta$, we still have \[ \mathbb{P}_\delta \left [ \myvec{Y} \leq \myvec{y} | H=h \right ] = F_h (\myvec{y}), \quad \begin{array}{c} \myvec{y} \in \mathbb{R}^k ,\\ h=0,1 \\ \end{array} \] and \[ p = \mathbb{P}_\delta \left [ H=1\right ] = 1- \mathbb{P}_\delta \left [ H=0 \right ] . \] Moreover, for $h=0,1$ and $\myvec{y}$ in $\mathbb{R}^k$, we require that \begin{eqnarray} \mathbb{P}_\delta \left [ D = d | H=h, \myvec{Y} = \myvec{y} \right ] &=& \left \{ \begin{array}{ll} 1 - \delta (\myvec{y}) & \mbox{if $d=0$} \\ & \\ \delta (\myvec{y}) & \mbox{if $d=1$.} \\ \end{array} \right . 
\end{eqnarray} The joint probability distribution of the rvs $H$, $D$ and $\myvec{Y}$ (under $\mathbb{P}_\delta$) can now be completely specified: With $h,d=0,1$ and a Borel subset $B$ of $\mathbb{R}^k$, a preconditioning argument gives \begin{eqnarray} \lefteqn{ \mathbb{P}_\delta \left [ H=h, D=d, \myvec{Y} \in B \right ] } & & \nonumber \\ &=& \mathbb{E}_\delta \left [ \1{ H=h, \myvec{Y} \in B } \mathbb{P}_\delta \left [ D = d | H, \myvec{Y} \right ] \right ] \nonumber \\ &=& \mathbb{E}_\delta \left [ \1{ H=h, \myvec{Y} \in B } \left ( d \delta (\myvec{Y}) + (1-d) \left ( 1 - \delta (\myvec{Y}) \right ) \right ) \right ] \nonumber \\ &=& \mathbb{P}_\delta \left [ H=h \right ] \cdot \int_{B} \left ( d \delta (\myvec{y}) + (1-d) \left ( 1 - \delta (\myvec{y}) \right ) \right ) dF_h (\myvec{y}) \nonumber \\ &=& \left \{ \begin{array}{ll} \mathbb{P}_\delta \left [ H=h \right ] \cdot \int_{B} (1- \delta (\myvec{y}) ) dF_h (\myvec{y}) & \mbox{if $d=0$}\\ & \\ \mathbb{P}_\delta \left [ H=h \right ] \cdot \int_{B} \delta (\myvec{y}) dF_h (\myvec{y}) & \mbox{if $d=1$.}\\ \end{array} \right . \end{eqnarray} \paragraph{An alternate framework} The class $\mathcal{D}^\star$ of randomized strategies gives rise to a {\em collection} of probability triples, namely \[ \left \{ \left ( \Omega , \mathcal{F}, \mathbb{P}_\delta \right ) , \ \delta \in \mathcal{D}^\star \right \}. \] It is however possible to provide an equivalent probabilistic framework using a {\em single} probability triple $(\Omega , \mathcal{F}, \mathbb{P} )$. To see how this can be done, imagine that the original probability triple $(\Omega , \mathcal{F}, \mathbb{P} )$ is sufficiently rich that there exists a rv $U: \Omega \rightarrow [0,1]$ which is uniformly distributed and independent of the pair of rvs $H$ and $\myvec{Y}$. This amounts to \[ \bP{ U \leq t, H=h, \myvec{Y} \leq \myvec{y} } = \bP{ U \leq t } \bP{ H=h, \myvec{Y} \leq \myvec{y} }, \quad \begin{array}{c} t \in \mathbb{R} \\ h=0,1, \\ \myvec{y} \in \mathbb{R}^k \end{array} \] with \[ \bP{ U \leq t } = \left \{ \begin{array}{ll} 0 & \mbox{if $t < 0$} \\ & \\ \min (t,1) & \mbox{if $t \geq 0$.} \\ \end{array} \right . \] Now, for each decision rule $\delta$ in $\mathcal{D}^\star$, define the $\{0,1\}$-valued rv $D_\delta$ given by \[ D_\delta = \1{ U \leq \delta (\myvec{Y}) }. \] Note that \begin{eqnarray} \bP{ D_\delta = 1 | H=h, \myvec{Y} = \myvec{y} } &=& \bE{ \1{ U \leq \delta (\myvec{Y}) } | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bE{ \1{ U \leq \delta (\myvec{y}) } | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bP{ U \leq \delta (\myvec{y})} \nonumber \\ &=& \delta ( \myvec{y} ) \end{eqnarray} while \begin{eqnarray} \bP{ D_\delta = 0 | H=h, \myvec{Y} = \myvec{y} } &=& \bE{ \1{ U > \delta (\myvec{Y}) } | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bE{ \1{ U > \delta (\myvec{y}) } | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bP{ U > \delta (\myvec{y}) } \nonumber \\ &=& 1- \delta ( \myvec{y} ) \end{eqnarray} under the enforced independence assumption. Therefore, the conditional distribution of $D_\delta$ (under $\mathbb{P}$) given $H$ and $\myvec{Y}$ coincides with the conditional distribution of $D$ (under $\mathbb{P}_\delta$) given $H$ and $\myvec{Y}$, and the two formalisms are probabilistically equivalent. \paragraph{Evaluating error probabilities} Consider a randomized test $\delta$ in $\mathcal{D}^\star$.
In analogy with (\ref{eq:DET_1+AdmissibleRules+P_F}) and (\ref{eq:DET_1+AdmissibleRules+P_M}), we evaluate the probabilities of false alarm and miss under $\delta$ as \begin{equation} P_F (\delta) \equiv \mathbb{P}_\delta \left [ D = 1 | H=0 \right ] \label{eq:DET_1+RandomizedTests+P_F} \end{equation} and \begin{equation} P_M(\delta) \equiv \mathbb{P}_\delta \left [ D = 0 | H=1 \right ] . \label{eq:DET_1+RandomizedTests+P_M} \end{equation} It is also convenient to consider the so--called probability of {\em detection} given by \begin{equation} P_D(\delta) \equiv \mathbb{P}_\delta \left [ D = 1 | H=1 \right ] = 1 - P_M(\delta) . \label{eq:DET_1+RandomizedTests+P_D} \end{equation} Because \[ \mathbb{P}_\delta \left [ D = h | H \right ] = \mathbb{E}_\delta \left [ \mathbb{P}_\delta \left [ D = h | H, \myvec{Y} \right ] | H \right ], \quad h=0,1 \] we readily get that \begin{equation} P_F (\delta) = \int_{ \mathbb{R}^k } \delta (\myvec{y}) dF_0 ( \myvec{y}) \label{eq:DET_1+RandomizedTests+P_F+B} \end{equation} and \begin{equation} P_M (\delta) = \int_{ \mathbb{R}^k } \left ( 1 - \delta (\myvec{y}) \right ) dF_1 ( \myvec{y}). \label{eq:DET_1+RandomizedTests+P_M+B} \end{equation} \section{The Bayesian problem revisited} \label{sec:DET_1+BayesianProblemUnderRandomizedTests} With the cost function $C: \{0,1\} \times \{0,1\} \rightarrow \mathbb{R}$ introduced in Section \ref{sec:DET_1+BayesianFormulation}, we define the expected cost function $J^\star: \mathcal{D}^\star \rightarrow \mathbb{R}$ given by \[ J^\star(\delta) = \mathbb{E}_\delta \left [ C(H,D) \right ], \quad \delta \in \mathcal{D}^\star. \] When considering randomized decision rules, the original Bayesian Problem $\mathcal{P}_B$ is now reformulated as the minimization problem \[ \mathcal{P}^\star_B: \quad \mbox{Minimize $J^\star(\delta)$ over $\delta$ in $\mathcal{D}^\star$.} \] This amounts to finding a randomized test $\delta^\star$ in $\mathcal{D}^\star$ such that \begin{equation} J^\star (\delta^\star) \leq J^\star(\delta), \quad \delta \in \mathcal{D}^\star. \label{eq:DET_1+RandomizedTests+Bayesian1} \end{equation} Any randomized test $\delta^\star$ which satisfies (\ref{eq:DET_1+RandomizedTests+Bayesian1}) is called a randomized Bayesian test, and the value \begin{equation} J^\star(\delta^\star) = \inf_{\delta \in \mathcal{D}^\star} J^\star(\delta) = \min_{\delta \in \mathcal{D}^\star} J^\star (\delta) \label{eq:DET_1+RandomizedTests+Bayesian2} \end{equation} is sometimes referred to as the randomized {\em Bayesian} cost. Obviously, since $\mathcal{D} \subset \mathcal{D}^\star$ with \[ J^\star(\delta_d) = J(d), \quad d \in \mathcal{D} , \] it is plain that \[ \inf_{\delta \in \mathcal{D}^\star} J^\star(\delta) \leq \inf_{d \in \mathcal{D}} J(d) . \] While in principle this last inequality could be strict, we now show that it is not so and that the Bayesian problem is not affected by considering the larger set of randomized decision rules. \begin{theorem} {\sl Under the absolute continuity condition {\bf (A.1)}, it holds that \begin{equation} \inf_{\delta \in \mathcal{D}^\star} J^\star(\delta) = \inf_{d \in \mathcal{D}} J(d) .
\label{eq:DET_1+RandomizedTests+BayesianEquality} \end{equation} } \label{thm:DET_1+RandomizedTests+BayesianEquality} \end{theorem} \myproof Pick an arbitrary test $\delta $ in $\mathcal{D}^\star$. A simple preconditioning argument shows that \begin{eqnarray} J^\star(\delta ) &=& \mathbb{E}_\delta \left [ C(H,D) \right ] \nonumber \\ &=& \mathbb{E}_\delta \left [ \mathbb{E}_\delta \left [ C(H,D) | H, \myvec{Y} \right ] \right ] \nonumber \\ &=& \mathbb{E}_\delta \left [ C(H,1) \mathbb{P}_\delta \left [ D = 1 | H, \myvec{Y} \right ] + C(H,0) \mathbb{P}_\delta \left [ D = 0 | H, \myvec{Y} \right ] \right ] \nonumber \\ &=& \mathbb{E}_\delta \left [ C(H,1) \cdot \delta (\myvec{Y}) + C(H,0)\cdot \left ( 1 - \delta (\myvec{Y}) \right ) \right ] \nonumber \\ &=& \mathbb{E}_\delta \left [ C(H,0) \right ] + \mathbb{E}_\delta \left [ \left ( C(H,1) - C(H,0) \right ) \cdot \delta (\myvec{Y}) \right ] \end{eqnarray} with \begin{eqnarray} \lefteqn{ \mathbb{E}_\delta \left [ \left ( C(H,1) - C(H,0) \right ) \cdot \delta (\myvec{Y}) \right ] } & & \nonumber \\ &=& \mathbb{E}_\delta \left [ \left ( C(H,1) - C(H,0) \right ) \cdot \mathbb{E}_\delta \left [ \delta (\myvec{Y}) | H \right ] \right ] \nonumber \\ &=& \left ( C(1,1) - C(1,0) \right ) \mathbb{E}_\delta \left [ \delta (\myvec{Y}) | H=1 \right ] \mathbb{P}_\delta \left [ H=1 \right ] \nonumber \\ & & + \left ( C(0,1) - C(0,0) \right ) \mathbb{E}_\delta \left [ \delta (\myvec{Y}) | H=0 \right ] \mathbb{P}_\delta \left [ H=0 \right ] \nonumber \\ &=& - \Gamma_1 p \cdot \mathbb{E}_\delta \left [ \delta (\myvec{Y}) | H=1 \right ] + \Gamma_0 (1-p) \cdot \mathbb{E}_\delta \left [ \delta (\myvec{Y}) | H=0 \right ] . \end{eqnarray} Using the absolute continuity condition {\bf (A.1)} we can now write \[ \mathbb{E}_\delta \left [ \delta (\myvec{Y}) | H=h \right ] = \int_{ \mathbb{R}^k } \delta(\myvec{y}) dF_h (\myvec{y}) = \int_{ \mathbb{R}^k } \delta(\myvec{y}) f_h (\myvec{y}) dF (\myvec{y}), \quad h=0,1 \] so that \begin{eqnarray} J^\star(\delta ) &=& \mathbb{E}_\delta \left [ C(H,0) \right ] - \Gamma_1 p \cdot \int_{ \mathbb{R}^k } \delta(\myvec{y}) f_1 (\myvec{y}) dF (\myvec{y}) + \Gamma_0 (1-p) \cdot \int_{ \mathbb{R}^k } \delta(\myvec{y}) f_0 (\myvec{y}) dF (\myvec{y}) \nonumber \\ &=& \mathbb{E}_\delta \left [ C(H,0) \right ] + \int_{ \mathbb{R}^k } \left ( - \Gamma_1 p f_1(\myvec{y}) + \Gamma_0 (1- p ) f_0(\myvec{y}) \right ) \delta (\myvec{y}) dF (\myvec{y}) \nonumber \\ &=& \mathbb{E}_\delta \left [ C(H,0) \right ] - \int_{ \mathbb{R}^k } h(\myvec{y}) \delta (\myvec{y}) dF (\myvec{y}) \end{eqnarray} where the mapping $h:\mathbb{R}^k \rightarrow \mathbb{R}$ is given by (\ref{eq:DET_1+SolutionBayesProblem+h(y)}) and where the term $\mathbb{E}_\delta \left [ C(H,0) \right ] = p C(1,0) + (1-p) C(0,0)$ does not depend on the test $\delta$. From Theorem \ref{thm:DET_1+SolutionBayesProblem+h(y)} recall that the Bayesian rule which solves Problem $\mathcal{P}_B$ is the test $d^\star : \mathbb{R}^k \rightarrow \{0,1\}$ in $\mathcal{D}$ given by (\ref{eq:DET_1+SolutionBayesProblem+OptimalTest}). Note that $d^\star$ can also be interpreted as the randomized rule $\delta^\star: \mathbb{R}^k \rightarrow [0,1]$ given by \[ \delta^\star(\myvec{y}) = \left \{ \begin{array}{ll} 0 & \mbox{if $ h(\myvec{y}) < 0 $ }\\ & \\ 1 & \mbox{if $ h(\myvec{y}) \geq 0 $. }\\ \end{array} \right . \] Equivalently, this can be written as \[ \delta^\star(\myvec{y}) = \left \{ \begin{array}{ll} 0 & \mbox{if $\myvec{y} \in C^\star$ }\\ & \\ 1 & \mbox{if $ \myvec{y} \not\in C^\star $. }\\ \end{array} \right . \] The desired result will be established if we show that \[ J^\star (d^\star) \leq J^\star(\delta), \quad \delta \in \mathcal{D}^\star.
\] The approach is reminiscent of the one used in the proof of Theorem \ref{thm:DET_1+SolutionBayesProblem+h(y)}: For an arbitrary $\delta$ in $\mathcal{D}^\star$, earlier calculations show that \begin{eqnarray} J^\star(\delta) - J^\star(d^\star) &=& - \int_{ \mathbb{R}^k } h(\myvec{y}) \delta (\myvec{y}) dF (\myvec{y}) + \int_{ \mathbb{R}^k } h(\myvec{y}) \delta^\star (\myvec{y}) dF (\myvec{y}) \nonumber \\ &=& \int_{ \mathbb{R}^k } h(\myvec{y}) \left ( \delta^\star (\myvec{y}) - \delta (\myvec{y}) \right ) dF (\myvec{y}) \nonumber \\ &=& \int_{ C^\star } ( - h(\myvec{y}) ) \delta (\myvec{y}) dF (\myvec{y}) + \int_{ \mathbb{R}^k \backslash C^\star } \left ( 1 - \delta (\myvec{y}) \right ) h(\myvec{y}) dF (\myvec{y}) \nonumber \\ &\geq& 0 \end{eqnarray} since \[ \int_{ C^\star }( - h(\myvec{y}) ) \delta (\myvec{y}) dF (\myvec{y}) \geq 0 \quad \mbox{and} \quad \int_{ \mathbb{R}^k \backslash C^\star } \left ( 1 - \delta (\myvec{y}) \right ) h(\myvec{y}) dF (\myvec{y}) \geq 0 \] by the very definition of the set $C^\star$ and of the mapping $h: \mathbb{R}^k \rightarrow \mathbb{R}$. \myendpf \section{Randomizing between two pure decision rules} \label{sec:DET_1+RandomizingBetweenTwoPureRules} Consider two pure strategies $d_1$ and $d_2$ in $\mathcal{D}$. With $a$ in $(0,1)$, we introduce a randomized policy $\delta_a$ in $\mathcal{D}^\star$ which first selects the pure strategy $d_1$ (resp. $d_2$) with probability $a$ (resp. $1-a$), and then uses the pure policy that was selected. Formally, $\delta_a : \mathbb{R}^k \rightarrow [0,1]$ is given by \[ \delta_a (\myvec{y}) = a d_1(\myvec{y}) + (1-a) d_2(\myvec{y}) , \quad \myvec{y} \in \mathbb{R}^k. \] One very concrete way to implement such a randomized policy on the original triple $(\Omega, \mathcal{F}, \mathbb{P})$ is as follows: Consider the original probabilistic framework introduced in Section \ref{sec:DET_1+Model} and assume it to be sufficiently rich to carry an extra $\mathbb{R}$-valued rv $V$ independent of the rvs $H$ and $\myvec{Y}$ (under $\mathbb{P}$), and uniformly distributed on the interval $[0,1]$. Define the $\{0,1\}$-valued rv $B_a$ given by \[ B_a = \1{ V \leq a }. \] It is plain that the rv $B_a$ is independent of the rvs $H$ and $\myvec{Y}$ (under $\mathbb{P}$), with \[ \bP{ B_a = 1 } = a = 1 - \bP{ B_a = 0 }. \] Define the decision rv $D_a$ given by \[ D_a = B_a d_1(\myvec{Y}) + (1-B_a) d_2(\myvec{Y}) . \] It is easy to check that \begin{eqnarray} & & \bP{ D_a = 1 | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bP{ B_a d_1(\myvec{Y}) + (1-B_a) d_2(\myvec{Y}) = 1 | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bP{ B_a d_1(\myvec{y}) + (1-B_a) d_2(\myvec{y}) = 1 | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& \bP{ B_a = 1 , d_1(\myvec{y}) = 1 | H=h, \myvec{Y} = \myvec{y} } + \bP{ B_a = 0, d_2(\myvec{y}) = 1 | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& d_1(\myvec{y}) \bP{ B_a = 1 | H=h, \myvec{Y} = \myvec{y} } + d_2(\myvec{y}) \bP{ B_a = 0 | H=h, \myvec{Y} = \myvec{y} } \nonumber \\ &=& d_1(\myvec{y}) \bP{ B_a = 1 } + d_2(\myvec{y}) \bP{ B_a = 0 } \nonumber \\ &=& a d_1(\myvec{y}) + (1-a) d_2(\myvec{y}) , \quad \begin{array}{c} \myvec{y} \in \mathbb{R}^k , \\ h=0,1 \\ \end{array} \end{eqnarray} as desired.
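The construction above is straightforward to simulate. The Python sketch below (the two pure rules, the mixing probability $a$ and the scalar Gaussian observation model are illustrative assumptions) draws $B_a = \1{ V \leq a }$ independently of $(H,\myvec{Y})$, forms $D_a = B_a d_1(\myvec{Y}) + (1-B_a) d_2(\myvec{Y})$, and estimates the resulting false alarm probability; it can be compared with the convex combination computed next.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n, a, p = 200_000, 0.3, 0.5

# Illustrative scalar model (assumption): Y ~ N(h, 1) given H = h.
H = rng.random(n) < p
Y = rng.normal(loc=H.astype(float), scale=1.0)

d1 = (Y >= 0.5).astype(int)              # pure rule d_1: decide 1 iff y >= 0.5
d2 = (Y >= 1.5).astype(int)              # pure rule d_2: decide 1 iff y >= 1.5
B  = (rng.random(n) <= a).astype(int)    # B_a = 1{V <= a}, independent of (H, Y)
D  = B * d1 + (1 - B) * d2               # randomized decision D_a

def P_F(dec):                            # empirical P[decide 1 | H = 0]
    return dec[~H].mean()

print(P_F(D), a * P_F(d1) + (1 - a) * P_F(d2))   # agree up to Monte Carlo noise
\end{verbatim}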
Applying the expressions (\ref{eq:DET_1+RandomizedTests+P_F+B}) and (\ref{eq:DET_1+RandomizedTests+P_M+B}) with the randomized test $\delta_a$ we get \begin{eqnarray} P_F(\delta_a) &=& \int_{ \mathbb{R}^k } \delta_a (\myvec{y}) dF_0 (\myvec{y}) \nonumber \\ &=& \int_{ \mathbb{R}^k } \left ( a d_1 (\myvec{y}) + (1-a) d_2 (\myvec{y}) \right ) dF_0 (\myvec{y}) \nonumber \\ &=& a \int_{ \mathbb{R}^k } d_1 (\myvec{y}) dF_0 (\myvec{y}) + (1-a) \int_{ \mathbb{R}^k } d_2 (\myvec{y}) dF_0 (\myvec{y}) \nonumber \\ &=& a P_F (d_1) + (1-a) P_F(d_2). \end{eqnarray} Similarly we find that \begin{eqnarray} P_M(\delta_a) &=& \int_{ \mathbb{R}^k } \left ( 1- \delta_a (\myvec{y}) \right ) dF_1 (\myvec{y}) \nonumber \\ &=& \int_{ \mathbb{R}^k } \left ( 1 - a d_1(\myvec{y}) - (1-a) d_2(\myvec{y}) \right ) dF_1 (\myvec{y}) \nonumber \\ &=& a \int_{ \mathbb{R}^k } (1-d_1 (\myvec{y}) ) dF_1 (\myvec{y}) + (1-a) \int_{ \mathbb{R}^k } (1-d_2 (\myvec{y})) dF_1 (\myvec{y}) \nonumber \\ &=& a P_M (d_1) + (1-a) P_M(d_2). \end{eqnarray} \section{The minimax formulation} \label{sec:DET_1+MinimaxFormulation} The Bayesian formulation {\em implicitly} assumes knowledge of the prior distribution of the hypothesis rv $H$. In many situations, this assumption cannot be adequately justified, and the Bayesian formulation has to be abandoned for the so--called {\em minimax criterion}. \paragraph{The basic idea} For each $p$ in $[0,1]$, let $J_p(d)$ denote the expected cost associated with the admissible decision rule $d$ in ${\cal D}$ when the prior on $H$ is $p$, i.e., \[ J_p(d) \equiv \mathbb{E}_p \left [ C(H,d(\myvec{Y})) \right ] \] where $\mathbb{E}_p \left [ \cdot \right ]$ denotes expectation with prior $p$. The Bayesian problem now reads \[ \mathcal{P}_{p,B}: \quad \mbox{Minimize $J_p(d)$ over $d$ in $\mathcal{D}$.} \] As shown earlier, under the mild Conditions {\bf (A.1)} and {\bf (A.2)} (assumed from now on), $\mathcal{P}_{p,B}$ has a solution which is now denoted by $d^\star (p)$ to indicate its dependence on the prior $p$. Clearly, any such solution satisfies \begin{equation} J_p(d^\star(p)) \leq J_p(d), \quad d \in \mathcal{D}. \label{eq:DET_1+MinimaxFormulation+BayesianCost0} \end{equation} Let the corresponding Bayesian cost be denoted by \begin{equation} V (p) \equiv \min_{d\in{\cal D}} J_p(d) . \label{eq:DET_1+MinimaxFormulation+BayesianCost} \end{equation} It is plain that \begin{equation} V (p) = J_p(d^\star(p)). \label{eq:DET_1+MinimaxFormulation+BayesianCost1} \end{equation} Since the exact value of the prior $p$ is not available, a reasonable way to proceed consists in using the Bayesian test for that value of $p$ which yields the {\em largest} Bayesian cost (\ref{eq:DET_1+MinimaxFormulation+BayesianCost}): Thus, with the notation introduced, let $p_{\rm m}$ in $[0,1]$ be such that \begin{equation} V (p_{\rm m}) = \max _{p\in [0,1]} V (p), \label{eq:DET_1+MinimaxFormulation+BayesianCost2} \end{equation} and use the Bayesian rule $d^\star (p_{\rm m})$ -- The existence of $p_{\rm m}$ is guaranteed by the fact that the mapping $V: [0,1] \rightarrow \mathbb{R}$ is continuous on the closed bounded interval $[0,1]$ by Lemma \ref{lem:DET_1+MinimaxFormulation+Concavity1}, hence achieves its maximum value on $[0,1]$. The test $d^\star (p_{\rm m})$, hereafter denoted $d_{\rm m}^\star $, is known as the {\em minimax} decision rule; it tries to {\em compensate} for the uncertainty in the modeling assumptions, namely, that the exact value of $p$ is not known.
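To make this max--min recipe concrete, here is a small Python sketch (using SciPy and the closed-form error probabilities of the equal covariance Gaussian case of Section \ref{sec:DET_1+GaussianCase}, under the probability of error cost; the value of $d$ is an illustrative assumption): it evaluates $V(p)$ on a grid, via the Bayesian threshold $\eta^\star = (1-p)/p$, and locates a maximizing prior $p_{\rm m}$.
\begin{verbatim}
import numpy as np
from scipy.stats import norm

d = 1.5   # illustrative value of sqrt(Delta m' R^{-1} Delta m)

def V(p):
    """Bayesian cost V(p) for the probability of error criterion,
       equal covariance Gaussian case."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                       # V(0) = C(0,0) = 0 and V(1) = C(1,1) = 0
    eta = (1 - p) / p                    # Bayesian threshold for the 0-1 cost
    P_F = 1 - norm.cdf((np.log(eta) + d**2 / 2) / d)
    P_M = norm.cdf((np.log(eta) - d**2 / 2) / d)
    return (1 - p) * P_F + p * P_M

grid = np.linspace(0.0, 1.0, 2001)
values = np.array([V(p) for p in grid])
p_m = grid[values.argmax()]
print(p_m, values.max())   # here p_m = 1/2 by symmetry, and V(p_m) = Q(d/2)
\end{verbatim}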
We refer to the cost value \[ V(p_{\rm m}) = J_{p_{\rm m}} (d^\star (p_{\rm m})) \] as the {\em minimax} cost; it will be denoted by $V_{\rm m}$. That there is a performance cost to pay for this uncertainty is not surprising in the least: Indeed, let $p_{\rm True}$ be the true (but {\em unknown}) value of the prior $p$. If $p_{\rm m} = p_{\rm True} $, then $d_{\rm m}^\star$ coincides with the Bayesian test $d^\star (p_{\rm True})$ and this choice is optimal. On the other hand, if $p_{\rm m} \neq p_{\rm True}$, then \[ J_{p_{\rm True}}(d^\star (p_{\rm True})) \leq J_{p_{\rm True}} ( d^\star (p_{\rm m})) = J_{p_{\rm True}} ( d_{\rm m}^\star ) \] with a {\em strict} inequality in most cases, i.e., as expected, a better performance level could be reached if the value $p_{\rm True}$ were known. \paragraph{The Minimax Theorem} \begin{theorem} {\sl Under the absolute continuity condition {\bf (A.1)}, the minimax cost $V_{\rm m}$ can be characterized by \begin{equation} V_{\rm m} = \min_{ d \in \mathcal{D} } \left ( \max_{p \in [0,1]} J_p(d) \right ) = \max_{p \in [0,1]} \left ( \min_{ d \in \mathcal{D} } J_p(d) \right ). \label{eq:DET_1+MinimaxFormulation+MinimaxTheorem} \end{equation} } \label{thm:DET_1+MinimaxFormulation+MinimaxTheorem} \end{theorem} \myproof Combining (\ref{eq:DET_1+MinimaxFormulation+BayesianCost0}) and (\ref{eq:DET_1+MinimaxFormulation+BayesianCost1}) we get \[ V(p) \leq J_p(d), \quad \begin{array}{c} d \in \mathcal{D} \\ p \in [0,1]. \\ \end{array} \] Therefore, for each $d$ in $\mathcal{D}$ it holds that \[ \max_{p \in [0,1]} V(p) \leq \max_{p \in [0,1]} J_p(d), \] whence \[ \min_{ d \in \mathcal{D} } \left ( \max_{p \in [0,1]} V(p) \right ) \leq \min_{ d \in \mathcal{D} } \left ( \max_{p \in [0,1]} J_p(d) \right ). \] This is equivalent to \[ \max_{p \in [0,1]} V(p) \leq \min_{ d \in \mathcal{D} } \left ( \max_{p \in [0,1]} J_p(d) \right ), \] an inequality that also reads as \[ \max_{p \in [0,1]} \left ( \min_{ d \in \mathcal{D} } J_p(d) \right ) \leq \min_{ d \in \mathcal{D} } \left ( \max_{p \in [0,1]} J_p(d) \right ). \] The equality (\ref{eq:DET_1+MinimaxFormulation+MinimaxTheorem}) will be established if we show that \begin{equation} \min_{ d \in \mathcal{D} } \left ( \max_{p \in [0,1]} J_p(d) \right ) \leq \max_{p \in [0,1]} \left ( \min_{ d \in \mathcal{D} } J_p(d) \right ). \label{eq:DET_1+MinimaxFormulation+MinimaxTheorem2} \end{equation} \myendpf \section{The minimax equation} \label{sec:DET_1+MinimaxEquation} The main issue in constructing the minimax rule $d^\star_{\rm m}$ consists in determining the value of $p_{\rm m}$ such that (\ref{eq:DET_1+MinimaxFormulation+BayesianCost2}) holds; its characterization is achieved through the {\em Minimax Equation} discussed below. In view of (\ref{eq:DET_1+MinimaxFormulation+BayesianCost1}) this requirement also reads \begin{equation} J_{p_{\rm m}} (d^\star (p_{\rm m})) = \max_{ p \in [0,1] } V(p). \label{eq:DET_1+MinimaxEquation+Characterization1} \end{equation} \paragraph{An auxiliary concavity result} The following technical fact will be useful in the derivation of the minimax equation. \begin{lemma} {\sl The mapping $V: [0,1] \rightarrow \mathbb{R}$ is concave and continuous on the closed interval $[0,1]$ with boundary values $V(0) = C(0,0)$ and $V(1) = C(1,1)$. 
} \label{lem:DET_1+MinimaxFormulation+Concavity1} \end{lemma} For easy reference, recall that for each test $d$ in $\mathcal{D}$, the expression \begin{eqnarray} J_p(d) &=& p C(1,1) + (1-p) C(0,0) \nonumber \\ & & +~ \Gamma_0 (1-p) \cdot P_F(d) + \Gamma_1 p \cdot P_M(d) \label{eq:DET_1+MinimaxEquation+Expression} \end{eqnarray} holds with $p$ in $[0,1]$. \myproof With $p=0$ in (\ref{eq:DET_1+MinimaxEquation+Expression}) we get \[ J_0(d) = C(0,0) + \Gamma_0 P_F(d), \quad d \in \mathcal{D}. \] Therefore, \[ V(0) = C(0,0) + \Gamma_0 \cdot \inf_{d \in \mathcal{D}} P_F(d) \] upon using $\Gamma_0 > 0$. However, $P_F(d_\infty) = 0$, so that $ \inf_{d \in \mathcal{D}} P_F(d) = 0$, whence $V(0) = C(0,0)$. A similar argument shows that $V(1) = C(1,1)$. The probabilities $P_F(d)$ and $P_M(d)$ appearing in (\ref{eq:DET_1+MinimaxEquation+Expression}) do {\em not} depend on $p$, but rather on $F_0$, $F_1$ and $d$. Thus, the mapping $p \rightarrow J_p(d) $ is affine, hence concave in $p$. As a result, the mapping $V: [0,1] \rightarrow \mathbb{R}$ is concave on the closed interval $[0,1]$, being the infimum of the family $\left \{ J_p(d), \ d \in \mathcal{D} \right \}$ of concave functions. Because a concave function defined on an open interval is necessarily continuous on that open interval, the mapping $V: [0,1] \rightarrow \mathbb{R}$ is continuous on $(0,1)$. Therefore, it remains only to show that this mapping is also continuous at the boundary points $p=0$ and $p=1$. We discuss only the case $p=0$; the case $p=1$ can be handled {\em mutatis mutandis} and is left to the interested reader as an exercise. In view of (\ref{eq:DET_1+MinimaxEquation+Expression}) continuity of the mapping $V: [0,1] \rightarrow \mathbb{R}$ at $p=0$ is equivalent to \begin{equation} \lim_{p \rightarrow 0 } \left ( \inf_{ d \in \mathcal{D}} \left ( \Gamma_0 (1-p) \cdot P_F(d) + \Gamma_1 p \cdot P_M(d) \right ) \right ) = 0 \label{eq:DET_1+MinimaxEquation+Limit1} \end{equation} since $V(0) = C(0,0)$ by the first part of the proof. To do so, write \[ \Delta (p) = \inf_{ d \in \mathcal{D}} \left ( \Gamma_0 (1-p) \cdot P_F(d) + \Gamma_1 p \cdot P_M(d) \right ), \quad p \in [0,1]. \] Thus, for any fixed $p$ in $(0,1)$, we get \begin{eqnarray} \Delta (p) &=& \inf _{d\in{\cal D}} \left ( \Gamma _0 \cdot P_F(d) + p \left ( \Gamma _1 \cdot P_M(d) - \Gamma _0 \cdot P_F(d) \right ) \right ) \nonumber \\ &=& \inf _{d\in{\cal D}} \left ( \Gamma _0 \cdot P_F(d) + p \cdot A(d) \right ) \end{eqnarray} where we have set \[ A(d) = \Gamma _1 \cdot P_M(d) - \Gamma _0 \cdot P_F(d) , \quad d \in \mathcal{D}. \] Next, note the elementary bounds \begin{equation} \inf _{d\in{\cal D}} \left ( p \cdot A(d) \right ) \leq \Delta (p) \leq \Gamma _0 \cdot P_F(d) + p \cdot A(d), \quad d \in \mathcal{D}. \label{eq:DET_1+MinimaxFormulation+Bounds} \end{equation} It is clear that $ \lim_{p \rightarrow 0} \inf _{d\in{\cal D}} \left ( p \cdot A(d) \right ) = 0$ since \[ \left | \inf _{d\in{\cal D}} \left ( p \cdot A(d) \right ) \right | \leq p \cdot \sup_{d \in \mathcal{D}} |A(d)| \leq p \left ( \Gamma_0 + \Gamma_1 \right ). \] It then follows from the first inequality in (\ref{eq:DET_1+MinimaxFormulation+Bounds}) that $\liminf_{p \rightarrow 0} \Delta (p) \geq 0 $. On the other hand, the second inequality in (\ref{eq:DET_1+MinimaxFormulation+Bounds}) yields \[ \limsup_{p \rightarrow 0} \Delta (p) \leq \Gamma _0 \cdot P_F(d) , \quad d \in \mathcal{D}. 
\] Taking $d=d_\infty$ in this last inequality gives $\limsup_{p \rightarrow 0} \Delta (p) \leq 0$, and combining the last two limiting statements we get the desired conclusion $\lim_{ p \rightarrow 0} \Delta (p) = 0$. \myendpf \paragraph{The minimax equation} \vskip 2truein \centerline {Figure} Fix $p$ in $[0,1]$. The mapping $\alpha \rightarrow J_\alpha (d^\star(p)) $ is affine in the variable $\alpha$ on the interval $[0,1]$ as we recall that \begin{eqnarray} J_\alpha (d^\star(p)) &=& \alpha C(1,1) + (1-\alpha) C(0,0) \nonumber \\ & & + \Gamma_0 (1-\alpha) \cdot P_F( d^\star(p) ) + \Gamma_1 \alpha \cdot P_M( d^\star(p) ) \label{lem:DET_1+MinimaxFormulation+Affine} \end{eqnarray} with $\alpha$ in $[0,1]$ upon specializing (\ref{eq:DET_1+MinimaxEquation+Expression}) to the test $d^\star(p)$. Therefore, the graph of the mapping $\alpha \rightarrow J_\alpha (d^\star(p)) $ is a straight line, whose slope is constant over $[0,1]$ and given by \begin{equation} \frac{d}{d\alpha} J_\alpha (d^\star(p)) = C(1,1) - C(0,0) + \Gamma_1 \cdot P_M( d^\star(p) ) - \Gamma_0 \cdot P_F( d^\star(p) ). \label{lem:DET_1+MinimaxFormulation+Slope1} \end{equation} By its definition, the Bayesian cost satisfies \[ V(\alpha) \leq J_\alpha (d), \quad \begin{array}{c} d \in \mathcal{D} \\ \alpha \in [0,1] \\ \end{array} \] with strict inequality for most tests. With $d=d^\star (p)$ this inequality becomes an equality when $\alpha = p$, namely \[ V(p) = J_p (d^\star(p)) \] while \[ V(\alpha) \leq J_\alpha (d^\star(p)), \quad \alpha \in [0,1]. \] With $p$ in $(0,1)$, if the concave mapping $\alpha \rightarrow V(\alpha)$ is {\em differentiable} at $\alpha = p$, then the straight line $\alpha \rightarrow J_\alpha (d^\star(p)) $ will be tangential to the mapping $\alpha \rightarrow V(\alpha)$ at $\alpha = p$ -- This is a consequence of the concavity established in Lemma \ref{lem:DET_1+MinimaxFormulation+Concavity1}. Thus, \begin{equation} \frac{d}{d\alpha} V(\alpha) \Bigl |_{\alpha = p} = \frac{d}{d\alpha} J_\alpha (d^\star(p)) \Bigl |_{\alpha = p}. \label{lem:DET_1+MinimaxFormulation+Slope2} \end{equation} In particular, if $p_{\rm m}$ is an element of $(0,1)$ and the mapping $\alpha \rightarrow V(\alpha)$ is differentiable at $\alpha=p_{\rm m}$, then \begin{equation} \frac{d}{d\alpha} V(\alpha) \Bigl |_{\alpha = p_{\rm m}} = \frac{d}{d\alpha} J_\alpha (d^\star(p_{\rm m})) \Bigl |_{\alpha = p_{\rm m}}. \label{lem:DET_1+MinimaxFormulation+Slope3} \end{equation} But $p_{\rm m}$ being a maximum for the function $\alpha \rightarrow V(\alpha)$, we must have \[ \frac{d}{d\alpha} V(\alpha) \Bigl |_{\alpha = p_{\rm m}} = 0, \] whence \[ \frac{d}{d\alpha} J_\alpha (d^\star(p_{\rm m})) \Bigl |_{\alpha = p_{\rm m}} = 0. \] Using (\ref{lem:DET_1+MinimaxFormulation+Slope1}) we conclude that \begin{equation} C(1,1) - C(0,0) + \Gamma_1 P_M (d^\star(p_{\rm m})) - \Gamma_0 P_F(d^\star(p_{\rm m})) = 0. \label{eq:DET_1+MinimaxFormulation+MinimaxEqn} \end{equation} This equation characterizing $p_{\rm m}$ is called the Minimax Equation. For the probability of error criterion, the Minimax Equation takes the simpler form \[ P_F(d^\star(p)) = P_M (d^\star(p)) \] at $p= p_{\rm m}$.
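When closed-form or numerically computable expressions for $P_F(d^\star(p))$ and $P_M(d^\star(p))$ are available, the Minimax Equation can be solved numerically. The following Python sketch is one minimal way to do so by bisection; the callables {\tt PF\_of\_p} and {\tt PM\_of\_p} are hypothetical placeholders for the model at hand, and an interior root with a sign change is assumed to exist.
\begin{verbatim}
def solve_minimax_equation(PF_of_p, PM_of_p, C11, C00, Gamma0, Gamma1,
                           lo=1e-6, hi=1.0 - 1e-6, tol=1e-10):
    # Bisection on the prior p for the Minimax Equation
    #   C(1,1) - C(0,0) + Gamma1 * P_M(d*(p)) - Gamma0 * P_F(d*(p)) = 0.
    # PF_of_p(p) and PM_of_p(p) are assumed to return P_F(d*(p)) and P_M(d*(p)).
    def g(p):
        return C11 - C00 + Gamma1 * PM_of_p(p) - Gamma0 * PF_of_p(p)
    a, b = lo, hi
    if g(a) * g(b) > 0:
        raise ValueError("no sign change on (lo, hi): no interior solution found")
    while b - a > tol:
        m = 0.5 * (a + b)
        if g(a) * g(m) <= 0:
            b = m
        else:
            a = m
    return 0.5 * (a + b)   # approximate value of p_m
\end{verbatim}
For the Gaussian example treated next, {\tt PF\_of\_p} and {\tt PM\_of\_p} may be taken to be the closed forms in terms of $\Phi$ recalled below.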
This analysis does not cover the cases when (i) $p_{\rm m} = 0$, (ii) $p_{\rm m} = 1$ and (iii) $p_{\rm m}$ is an element of $(0,1)$ but the mapping $\alpha \rightarrow V(\alpha)$ is not differentiable at $\alpha=p_{\rm m}$. \section{The minimax formulation -- Two examples} \label{sec:DET_1+NeymanPearsonFormulationTwoExamples} \paragraph{The Gaussian case} The setting is that of Section \ref{sec:DET_1+GaussianCase} to which we refer the reader for the notation. As shown there, for any $\eta >0$ we have \[ P_F(Lrt_\eta) = 1 - \Phi \left ( \frac{\log \eta +\frac{1}{2} d^2 }{d} \right ) \] and \[ P_M(Lrt_\eta) = \Phi \left ( \frac{\log \eta - \frac{1}{2} d^2 }{d} \right ) . \] For each $p$ in $[0,1]$, with \[ \eta (p) = \frac{1-p}{p} \cdot \frac{\Gamma_0}{\Gamma_1} \] we have $d^\star(p) = Lrt_{\eta(p)}$ and the expression (\ref{eq:DET_1+MinimaxEquation+Expression}) becomes \begin{eqnarray} J_p(d^\star(p)) &=& p C(1,1) + (1-p) C(0,0) \nonumber \\ & & +~ \Gamma_0 (1-p) \cdot P_F(d^\star(p)) + \Gamma_1 p \cdot P_M(d^\star(p)) \nonumber \\ &=& p C(1,1) + (1-p) C(0,0) \nonumber \\ & & +~ \Gamma_0 (1-p) \cdot \left ( 1 - \Phi \left ( \frac{\log \eta (p) + \frac{1}{2} d^2 }{d} \right ) \right ) \nonumber \\ & & +~ \Gamma_1 p \cdot \Phi \left ( \frac{\log \eta (p) - \frac{1}{2} d^2 }{d} \right ) . \end{eqnarray} Therefore, \begin{eqnarray} V(p) &=& p C(1,1) + (1-p) C(0,0) \nonumber \\ & & +~ \Gamma_0 (1-p) \cdot \left ( 1 - \Phi \left ( \frac{\log \eta (p) + \frac{1}{2} d^2 }{d} \right ) \right ) \nonumber \\ & & +~ \Gamma_1 p \cdot \Phi \left ( \frac{\log \eta (p) - \frac{1}{2} d^2 }{d} \right ) . \end{eqnarray} The boundary cases $p=0$ and $p=1$ are easily recovered upon formally substituting these values in the expression above. The Minimax Equation takes the form \begin{eqnarray} \lefteqn{ C(1,1) - C(0,0) } & & \nonumber \\ &=& \Gamma_0 \left ( 1 - \Phi \left ( \frac{\log \eta (p_{\rm m}) +\frac{1}{2} d^2 }{d} \right ) \right ) - \Gamma_1 \Phi \left ( \frac{\log \eta(p_{\rm m}) - \frac{1}{2} d^2 }{d} \right ) . \nonumber \end{eqnarray} For the probability of error case, simplifications occur. The last expression becomes \begin{eqnarray} V(p) &=& (1-p) \cdot \left ( 1 - \Phi \left ( \frac{ \log \frac{1-p}{p} + \frac{1}{2} d^2 }{d} \right ) \right ) \nonumber \\ & & +~ p \cdot \Phi \left ( \frac{\log \frac{1-p}{p} - \frac{1}{2} d^2 }{d} \right ) \end{eqnarray} and the Minimax Equation reduces to \begin{eqnarray} \Phi \left ( \frac{\log \eta(p_{\rm m}) - \frac{1}{2} d^2 }{d} \right ) + \Phi \left ( \frac{\log \eta (p_{\rm m}) +\frac{1}{2} d^2 }{d} \right ) = 1. \nonumber \end{eqnarray} It is easy to see that this requires $\log \eta (p_{\rm m}) = 0$ so that $p_{\rm m} = \frac{1}{2}$ (indeed in $(0,1)$), an intuitively satisfying conclusion! \paragraph{The Bernoulli case} The setting is that of Section \ref{sec:DET_1+BernoulliCase} to which we refer the reader for the notation. We discuss only the case $a_1 < a_0$, and leave the case $a_0 < a_1$ as an exercise for the interested reader.
Note that the condition $a_1< a_0$ is equivalent to $1 < \frac{1-a_1}{1-a_0}$, so that the expressions (\ref{eq:DET_1+BernoulliCase+P_F}) and (\ref{eq:DET_1+BernoulliCase+P_M}) for the probabilities $P_F(d_\eta) $ and $P_M(d_\eta) $, respectively, are {\em piecewise} constant functions of $\eta$ with different constant values on the intervals $(0, \frac{a_1}{a_0}]$, $(\frac{a_1}{a_0}, \frac{1-a_1}{1-a_0} ]$ and $( \frac{1-a_1}{1-a_0} , \infty)$: Direct inspection of the expression (\ref{eq:DET_1+BernoulliCase+P_F}) yields \begin{eqnarray} P_F(d_\eta) &=& \left \{ \begin{array}{ll} 1 & \mbox{if $ 0 < \eta \leq \frac{a_1}{a_0} $} \\ & \\ 1-a_0 & \mbox{if $ \frac{a_1}{a_0} < \eta \leq \frac{1-a_1}{1-a_0} $} \\ & \\ 0 & \mbox{if $ \frac{1-a_1}{1-a_0} < \eta$.} \\ \end{array} \right . \end{eqnarray} Similarly, using (\ref{eq:DET_1+BernoulliCase+P_M}) we find \begin{eqnarray} P_M(d_\eta) &=& \left \{ \begin{array}{ll} 0 & \mbox{if $ 0 < \eta \leq \frac{a_1}{a_0} $} \\ & \\ a_1 & \mbox{if $ \frac{a_1}{a_0} < \eta \leq \frac{1-a_1}{1-a_0} $} \\ & \\ 1 & \mbox{if $ \frac{1-a_1}{1-a_0} < \eta$.} \\ \end{array} \right . \end{eqnarray} Thus, for each $p$ in $[0,1]$, we see from (\ref{eq:DET_1+MinimaxEquation+Expression}) that the cost $J_p(d_\eta)$ takes a different value on each of the intervals $(0, \frac{a_1}{a_0}]$, $(\frac{a_1}{a_0}, \frac{1-a_1}{1-a_0} ]$ and $( \frac{1-a_1}{1-a_0} , \infty)$: Specifically, we have: On $(0, \frac{a_1}{a_0}]$, \begin{eqnarray} J_p(d_\eta) &=& p C(1,1) + (1-p) C(0,0) + \Gamma_0 (1-p) \nonumber \\ &=& p C(1,1) + (1-p) C(0,1); \end{eqnarray} On $(\frac{a_1}{a_0}, \frac{1-a_1}{1-a_0} ]$, \begin{eqnarray} \lefteqn{ J_p(d_\eta) } & & \nonumber \\ &=& p C(1,1) + (1-p) C(0,0) + \Gamma_0 (1-p) \cdot (1-a_0) + \Gamma_1 p \cdot a_1 \nonumber \\ &=& p C(1,1) + (1-p) C(0,1) + \Gamma_1 p \cdot a_1 - \Gamma_0 (1-p) \cdot a_0 \nonumber \\ &=& p \left ( C(1,1) + \Gamma_1 a_1 \right ) + (1-p) \left ( C(0,1) - \Gamma_0 a_0 \right ); \end{eqnarray} On $( \frac{1-a_1}{1-a_0} , \infty)$, \begin{eqnarray} J_p(d_\eta) &=& p C(1,1) + (1-p) C(0,0) + \Gamma_1 p \nonumber \\ &=& p C(1,0) + (1-p) C(0,0). \end{eqnarray} Recall that \[ V(p) = J_p (d_{\eta(p)} ) \mbox{~with $\eta(p) = \frac{ \Gamma_0 (1-p)}{\Gamma_1 p}$}, \quad 0 < p \leq 1. \] As the mapping $(0,1] \rightarrow \mathbb{R}_+: p \rightarrow \eta(p)$ is strictly decreasing, the equations \[ \eta(p) = \frac{1-a_1}{1-a_0}, \quad 0 < p \leq 1 \] and \[ \eta(p) = \frac{a_1}{a_0}, \quad 0 < p \leq 1 \] have each a unique solution in $(0,1)$. Their solutions, denoted $p_{-}$ and $p_{+}$, respectively, are given by \[ p_{-} = \frac{ \Gamma_0 (1-a_0)}{ \Gamma_1 (1-a_1) + \Gamma_0 (1-a_0) } \] and \[ p_{+} = \frac{ \Gamma_0 a_0 }{ \Gamma_1 a_1 + \Gamma_0 a_0}. \] As expected, $p_{-} < p_{+}$ (with $p_{-} < \frac{1}{2} < p_{+}$ in the probability of error case). Earlier expressions can now be used to conclude that \begin{eqnarray} \lefteqn{ V(p) } & & \nonumber \\ &=& \left \{ \begin{array}{ll} p C(1,0) + (1-p) C(0,0) & \mbox{if $p \in (0, p_{-}]$} \\ & \\ p \left ( C(1,1) + \Gamma_1 a_1 \right ) + (1-p) \left ( C(0,1) - \Gamma_0 a_0 \right ) & \mbox{if $p \in (p_{-}, p_{+} ]$} \\ & \\ p C(1,1) + (1-p) C(0,1) & \mbox{if $p \in (p_{+}, 1]$.} \\ \end{array} \right . \nonumber \end{eqnarray} It is plain that the function $V: [0,1] \rightarrow \mathbb{R}$ is piecewise linear with three distinct segments, namely $(0, p_{-} ]$, $(p_{-}, p_{+}]$ and $(p_{+} , 1 ]$. There are two kinks at $p=p_{-}$ and $p=p_{+}$, respectively.
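The piecewise-linear structure can be checked numerically. The following minimal Python sketch (for the probability of error criterion, with illustrative values $a_0 = 0.7$ and $a_1 = 0.4$) evaluates $V(p)$ by minimizing the Bayesian cost over the three achievable operating points, and reports the kink locations together with the location of the maximum.
\begin{verbatim}
import numpy as np

a0, a1 = 0.7, 0.4   # illustrative values with a1 < a0

# The three operating points (P_F, P_M) achievable by the threshold tests d_eta.
operating_points = [(1.0, 0.0), (1.0 - a0, a1), (0.0, 1.0)]

def V(p):
    # Probability-of-error Bayesian cost: minimize (1-p) P_F + p P_M
    # over the achievable operating points.
    return min((1.0 - p) * PF + p * PM for PF, PM in operating_points)

p_minus = (1.0 - a0) / ((1.0 - a1) + (1.0 - a0))
p_plus = a0 / (a1 + a0)
print(p_minus, p_plus)              # kink locations p_- and p_+

grid = np.linspace(0.0, 1.0, 2001)
values = [V(p) for p in grid]
p_m = grid[int(np.argmax(values))]
print(p_m, max(values))             # the maximum sits at one of the kinks
\end{verbatim}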
That the function is concave can be seen by comparing the left-derivatives and right-derivatives at these points. The function $V: [0,1] \rightarrow \mathbb{R}$ is differentiable everywhere except at these kinks, and its maximum is achieved at one of them, so that $p_{\rm m} \in \{ p_{-},p_{+} \}$. For the probability of error criterion, we find that \begin{eqnarray} V(p) &=& \left \{ \begin{array}{ll} p & \mbox{if $p \in (0, p_{-}]$} \\ & \\ p a_1 + (1-p) \left ( 1 - a_0 \right ) & \mbox{if $p \in (p_{-}, p_{+} ]$} \\ & \\ 1-p & \mbox{if $p \in (p_{+}, 1]$} \\ \end{array} \right . \end{eqnarray} with \[ p_{-} = \frac{ 1- a_0 }{ (1- a_1) + (1 - a_0 ) } \] and \[ p_{+} = \frac{ a_0 }{ a_1 + a_0 }. \] It is trivial to check that $V(p_{\pm}-) = V(p_{\pm}+)$, establishing continuity at the kink points. Note that the middle segment has slope $a_0 + a_1 - 1$, so that $p_{\rm m} = p_{+}$ whenever $a_0 + a_1 \geq 1$, in which case the minimax cost is given by \[ V_{\rm m} = 1-p_{+} = \frac{ a_1 }{ a_1 + a_0}; \] when $a_0 + a_1 < 1$, the maximum is achieved at $p_{-}$ instead, with $V_{\rm m} = p_{-}$. \section{The Neyman-Pearson formulation} \label{sec:DET_1+NeymanPearsonFormulation} In many situations, not only is the prior probability $p$ not available but it is quite difficult to make meaningful cost assignments. This is typically the case in radar applications -- After all, what is the real cost of failing to detect an incoming missile? While it is tempting to seek to minimize {\em both} the probabilities of miss and false alarm, these are (usually) conflicting objectives and a {\em constrained} optimization problem is considered instead. For $0\leq\alpha\leq 1$, consider the constrained optimization problem ${\rm NP}_{\alpha}$ where \[ {\rm NP}_\alpha: \quad \mbox{Maximize $P_D(d)$ over $d$ in $\mathcal{D}_\alpha$} \] where ${\cal D}_{\alpha}$ is the collection of admissible tests in $\cal D$ of {\em size} at most $\alpha$, i. e., \[ {\cal D}_{\alpha} = \{ d\in {\cal D} : \ P_F(d) \leq \alpha \} . \] Solving ${\rm NP}_{\alpha}$ amounts to finding a test $d_{\rm NP}(\alpha )$ in ${\cal D}_{\alpha}$ with the property that \[ P_D(d) \leq P_D(d_{\rm NP}(\alpha )), \quad d \in {\cal D}_{\alpha}. \] Such a test $d_{\rm NP}(\alpha )$, when it exists, is called a {\em Neyman--Pearson} test of {\em size} $\alpha$. Following the accepted terminology, its {\em power} $\beta (\alpha )$ is given by \[ \beta (\alpha ):=\sup_{d\in {\cal D}_{\alpha}} P_D(d) . \] \paragraph{The Lagrangian argument} When reformulated as \[ {\rm NP}_\alpha: \quad \mbox{Minimize $P_M(d)$ over $d$ in $\mathcal{D}_\alpha$}, \] the constrained optimization problem ${\rm NP}_{\alpha}$ can be solved by the following Lagrangian arguments: First, for each $\lambda \geq 0$ consider the Lagrangian functional $J_{\lambda}: {\cal D}\rightarrow\mathbb{R}$ given by \[ J_{\lambda}(d) = P_M (d)+\lambda \left ( P_F(d)-\alpha \right ), \quad d \in \mathcal{D}.
\] The {\em Lagrangian} problem ${\rm LP}_{\lambda}$ is then defined as the {\em unconstrained} minimization problem \[ {\rm LP}_{\lambda}: \quad \mbox{Minimize $J_{\lambda}(d)$ over $d$ in $\mathcal{D}$.} \] Its solution is readily obtained as follows: For any test $d$ in $\cal D$, we observe that \begin{eqnarray} J_{\lambda}(d) &=& \bP{ d(\myvec{Y}) = 0 | H=1 } + \lambda \left ( \bP{ d(\myvec{Y}) = 1 | H=0 } - \alpha \right ) \nonumber \\ &=& \bP{ d(\myvec{Y}) = 0 | H=1 } + \lambda \left ( 1 - \bP{ d(\myvec{Y}) = 0 | H=0 } - \alpha \right ) \nonumber \\ &=& \lambda ( 1 - \alpha ) + \bP{ d(\myvec{Y}) = 0 | H=1 } - \lambda \bP{ d(\myvec{Y}) = 0 | H=0 } \nonumber \\ &=& \lambda ( 1 - \alpha ) + \int_{C(d)} h_{\lambda}(\myvec{y})dF(\myvec{y}) \end{eqnarray} with $h_{\lambda}: ~\mathbb{R}^k\rightarrow\mathbb{R}$ given by \[ h_{\lambda} (\myvec{y}) = f_1 (\myvec{y}) - \lambda f_0 (\myvec{y}), \quad \myvec{y} \in \mathbb{R}^k. \] By the arguments leading to Theorem \ref{thm:DET_1+SolutionBayesProblem+h(y)}, the Lagrangian problem ${\rm LP}_{\lambda}$ is seen to be solved by the test $d_{\lambda}$ given by \begin{eqnarray} d_{\lambda}(\myvec{y}) = 0 \quad \mbox{iff} \quad h_{\lambda}(\myvec{y}) < 0, \end{eqnarray} or equivalently, \begin{eqnarray} d_{\lambda}(\myvec{y}) = 0 \quad \mbox{iff} \quad f_1 (\myvec{y}) < \lambda f_0(\myvec{y}). \end{eqnarray} The next step consists in finding some value $\lambda (\alpha )>0$ of the Lagrangian multiplier such that the test $d_{\lambda (\alpha )}$ meets the constraint, i.e., \begin{equation} P_F(d_{\lambda (\alpha )}) = \alpha . \label{eq:DET_1+NeymanPearsonFormulation+MeetConstraint} \end{equation} If such a value $\lambda(\alpha)$ exists, then we get \[ J_{\lambda (\alpha )}(d_{\lambda (\alpha )}) \leq J_{\lambda (\alpha )}(d), \quad d \in \mathcal{D}, \] or equivalently, \[ P_M(d_{\lambda (\alpha )}) \leq P_M(d)+ \lambda (\alpha ) \left ( P_F(d)-\alpha \right ), \quad d \in \mathcal{D}. \] Consequently, for every test $d$ in ${\cal D}_{\alpha}$ (and not merely in $\cal D$), we conclude that \[ P_M(d_{\lambda (\alpha )}) \leq P _M(d) \] since then $P_F(d) \leq \alpha $. The test $d_{\lambda (\alpha )}$ being in ${\cal D}_{\alpha}$, it is clear that $d_{\lambda (\alpha )}$ solves ${\rm NP}_{\alpha }$. In other words, $d_{\rm NP}(\alpha )$ can be taken to be $d_{\lambda (\alpha ) }$. \paragraph{Meeting the constraint (\ref{eq:DET_1+NeymanPearsonFormulation+MeetConstraint})} The Lagrangian argument hinges upon the possibility of finding a value $\lambda (\alpha )$ of the Lagrange multiplier such that $P_F(d_{\lambda (\alpha )}) = \alpha $. Unfortunately, this may not always be possible, unless additional assumptions are imposed. To see this, note that for every $\lambda \geq 0$, it holds that \begin{eqnarray} P_F(d_{\lambda }) &=& \bP{ d_{\lambda}(\myvec{Y}) = 1 | H = 0 } \nonumber \\ &=& \bP{ f_1(\myvec{Y}) \geq \lambda f_0 (\myvec{Y}) | H = 0 }. \label{eq:DET_1+NeymanPearsonFormulation+MeetConstraint1} \end{eqnarray} The mapping $\mathbb{R}_+ \rightarrow [0,1]: \lambda \rightarrow P_F(d_{\lambda })$ is clearly {\em monotone non-increasing} with boundary values $P_F(d_0) = 1$ and $\lim_{\lambda \uparrow \infty} P_F(d_{\lambda }) = P_F(d_\infty ) = 0$. However, the constraint $P_F(d_\lambda ) = \alpha $ may fail to hold for some $\alpha$ in $[0,1]$ because the set of values $\{ P_F(d_{\lambda}), \ \lambda\geq 0 \}$ need not contain $\alpha$.
This will occur if the mapping $\lambda \rightarrow P_F(d_{\lambda})$ is not continuous at some point, say $\lambda^\star > 0$, with \[ \lim_{\lambda \downarrow \lambda^\star} P_F(d_{\lambda }) < \alpha < \lim_{\lambda \uparrow \lambda^\star} P_F(d_{\lambda }) \] and $\alpha \neq P_F(d_{\lambda^\star})$. Later in this chapter we illustrate this situation on the simple example of deciding between the two hypotheses on the basis of a Poisson rv. Although randomized policies are introduced to resolve this difficulty, there are situations where this can be avoided because each one of the problems ${\rm NP} _{\alpha}$ has a solution within the set of non-randomized policies $\cal D$. One such situation occurs when $F$ is Lebesgue measure on $\mathbb{R}^k$ and the absolute continuity condition {\bf (A.2)} holds. \begin{lemma} {\sl Assume $F$ to be Lebesgue measure on $\mathbb{R}^k$ and that the absolute continuity condition {\bf (A.2)} holds. Then the mapping $\mathbb{R}_+ \rightarrow [0,1]: \lambda \rightarrow P_F(d_\lambda) $ is continuous. } \label{lem:DET_1+NeymanPearsonFormulation+MeetingConstraint} \end{lemma} The proof of Lemma \ref{lem:DET_1+NeymanPearsonFormulation+MeetingConstraint} is given in Section \ref{sec:DET_1+Proofs}. Continuity, together with the monotonicity property discussed earlier, implies that the set of values $\{ P_F(d_\lambda) , \ \lambda \geq 0 \}$ contains the interval $(0,1]$, so that the requirement (\ref{eq:DET_1+NeymanPearsonFormulation+MeetConstraint}) can always be met for any $\alpha$ in $(0,1]$. \section{The Neyman-Pearson Lemma} \label{sec:DET_1+NP_Lemma} The discussion of Section \ref{sec:DET_1+NeymanPearsonFormulation} suggests the need to consider an extended version of the Neyman-Pearson formulation. First, for each $\alpha$ in $[0,1]$, let ${\cal D}^\star_{\alpha}$ be the collection of all randomized tests in $\mathcal{D}^\star$ of {\em size} at most $\alpha$, i. e., \[ \mathcal{D}^\star_{\alpha} = \{ \delta \in \mathcal{D}^\star : \ P_F(\delta) \leq \alpha \} . \] Now consider the following constrained optimization problem ${\rm NP}^\star_{\alpha}$ where \[ {\rm NP}^\star_\alpha: \quad \mbox{Maximize $P_D(\delta)$ over $\delta$ in $\mathcal{D}^\star_\alpha$.} \] Solving ${\rm NP}^\star_{\alpha}$ amounts to finding a test $\delta_{\rm NP}(\alpha )$ in $\mathcal{D}^\star_{\alpha}$ with the property that \[ P_D(\delta) \leq P_D(\delta_{\rm NP}(\alpha )), \quad \delta \in \mathcal{D}^\star_{\alpha}. \] Such a test $\delta_{\rm NP}(\alpha )$, when it exists, is also called a {\em Neyman--Pearson} test of {\em size} $\alpha$. Its existence and characterization are consequences of two Lemmas, namely Lemma \ref{lem:DET_1+NP_LemmaOptimality} and Lemma \ref{lem:DET_1+NP_LemmaAdmissibility} below, which are collectively referred to as the Neyman-Pearson Lemma. Optimality is handled first. \paragraph{Optimality} With $\eta > 0$ and a Borel mapping $\gamma: \mathbb{R}^k \rightarrow [0,1]$ (to be selected shortly), define the randomized test $\delta^\star: \mathbb{R}^k \rightarrow [0,1]$ in $\mathcal{D}^\star$ given by \[ \delta^\star (\myvec{y}) = \left \{ \begin{array}{ll} 1 & \mbox{if $ \eta f_0(\myvec{y}) < f_1(\myvec{y}) $ } \\ & \\ \gamma(\myvec{y}) & \mbox{if $f_1(\myvec{y}) = \eta f_0(\myvec{y}) $ } \\ & \\ 0 & \mbox{ if $ f_1(\myvec{y}) < \eta f_0(\myvec{y}) $} \\ \end{array} \right . \] \begin{lemma} {\sl For any test $\delta: \mathbb{R}^k \rightarrow [0,1]$ in $\mathcal{D}^\star$, the inequality \begin{equation} P_D(\delta^\star) - P_D(\delta) \geq \eta \left ( P_F(\delta^\star) - P_F(\delta) \right ) \label{eq:DET_1+NP_LemmaBasicInequality} \end{equation} holds.
} \label{lem:DET_1+NP_LemmaOptimality} \end{lemma} \myproof Let $\delta: \mathbb{R}^k \rightarrow [0,1]$ be an arbitrary test in $\mathcal{D}^\star$. Recall that \[ P_F(\delta) = \int_{ \mathbb{R}^k} \delta (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}) \quad \mbox{and} \quad P_F(\delta^\star) = \int_{ \mathbb{R}^k} \delta^\star (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}), \] while \[ P_D(\delta) = \int_{ \mathbb{R}^k} \delta (\myvec{y}) f_1(\myvec{y}) dF(\myvec{y}) \quad \mbox{and} \quad P_D(\delta^\star) = \int_{ \mathbb{R}^k} \delta^\star (\myvec{y}) f_1(\myvec{y}) dF(\myvec{y}). \] Now, \begin{eqnarray} \lefteqn{ \int_{ \mathbb{R}^k } \left ( \delta^\star (\myvec{y}) - \delta (\myvec{y}) \right ) \left ( f_1(\myvec{y}) - \eta f_0(\myvec{y}) \right ) dF(\myvec{y}) } & & \nonumber \\ &=& \int_{ \mathbb{R}^k } \left ( \delta^\star (\myvec{y}) - \delta (\myvec{y}) \right ) f_1(\myvec{y}) dF(\myvec{y}) - \eta \int_{ \mathbb{R}^k } \left ( \delta^\star (\myvec{y}) - \delta (\myvec{y}) \right ) f_0(\myvec{y}) dF(\myvec{y}) \nonumber \\ &=& P_D(\delta^\star) - P_D(\delta) - \eta \left ( P_F(\delta^\star) - P_F(\delta) \right ). \end{eqnarray} But direct inspection shows that it is always the case that \begin{equation} \left ( \delta^\star (\myvec{y}) - \delta (\myvec{y}) \right ) \left ( f_1(\myvec{y}) - \eta f_0(\myvec{y}) \right ) \geq 0, \quad \myvec{y} \in \mathbb{R}^k . \end{equation} Therefore, \begin{equation} P_D(\delta^\star) - P_D(\delta) - \eta \left ( P_F(\delta^\star) - P_F(\delta) \right ) \geq 0 \end{equation} and the desired conclusion (\ref{eq:DET_1+NP_LemmaBasicInequality}) follows. \myendpf Fix $ \alpha $ in $(0,1)$. If we select $\eta > 0$ and $\gamma: \mathbb{R}^k \rightarrow [0,1]$ so that \begin{equation} P_F(\delta^\star) = \alpha, \label{eq:DET_1+NP_LemmaMeetingConstraint} \end{equation} then the inequality (\ref{eq:DET_1+NP_LemmaBasicInequality}) implies \begin{equation} P_D(\delta^\star) - P_D(\delta) \geq \eta \left ( \alpha - P_F(\delta) \right ), \quad \delta \in \mathcal{D}^\star. \label{eq:DET_1+NP_LemmaBasicInequality2} \end{equation} Therefore, for any test $\delta: \mathbb{R}^k \rightarrow [0,1]$ in $\mathcal{D}^\star_\alpha$, we get \begin{equation} P_D(\delta^\star) - P_D(\delta) \geq \eta \left ( \alpha - P_F(\delta) \right ) \geq 0 \label{eq:DET_1+NP_LemmaBasicInequality3} \end{equation} since $P_F(\delta) \leq \alpha$. In other words, \[ P_D(\delta) \leq P_D(\delta^\star) , \quad \delta \in \mathcal{D}_\alpha ^\star \] and the test $\delta^\star$ solves the constrained problem ${\rm NP}^\star_\alpha$. \paragraph{Meeting the constraint (\ref{eq:DET_1+NP_LemmaMeetingConstraint})} We now show that the parameter $\eta > 0$ and the Borel mapping $\gamma: \mathbb{R}^k \rightarrow [0,1]$ can be selected so that the test $\delta^\star$ indeed satisfies (\ref{eq:DET_1+NP_LemmaMeetingConstraint}). \begin{lemma} {\sl For every $\alpha$ in $(0,1]$ it is always possible to select $\eta > 0$ and a Borel mapping $\gamma: \mathbb{R}^k \rightarrow [0,1]$ so that (\ref{eq:DET_1+NP_LemmaMeetingConstraint}) holds.
} \label{lem:DET_1+NP_LemmaAdmissibility} \end{lemma} \myproof Note that \begin{eqnarray} & & P_F(\delta^\star) \nonumber \\ &=& \int_{ \mathbb{R}^k} \delta^\star (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}) \nonumber \\ &=& \int_{ \left \{ \myvec{y} \in \mathbb{R}^k: f_1(\myvec{y}) = \eta f_0(\myvec{y}) \right \} } \gamma (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}) + \int_{ \left \{ \myvec{y} \in \mathbb{R}^k: f_1(\myvec{y}) > \eta f_0(\myvec{y}) \right \} } f_0(\myvec{y}) dF(\myvec{y}) \nonumber \\ &=& \int_{ \left \{ \myvec{y} \in \mathbb{R}^k: f_1(\myvec{y}) = \eta f_0(\myvec{y}) \right \} } \gamma (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}) + \bP{ f_1(\myvec{Y}) > \eta f_0(\myvec{Y}) | H=0 }. \nonumber \end{eqnarray} Therefore, as we seek to satisfy (\ref{eq:DET_1+NP_LemmaMeetingConstraint}) we need to select $\eta > 0$ and a Borel mapping $\gamma: \mathbb{R}^k \rightarrow [0,1]$ such that \[ \alpha - \bP{ f_1(\myvec{Y}) > \eta f_0(\myvec{Y}) | H=0 } = \int_{ \left \{ \myvec{y} \in \mathbb{R}^k: f_1(\myvec{y}) = \eta f_0(\myvec{y}) \right \} } \gamma (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}). \] The form of this last relation suggests introducing the quantity $\eta(\alpha)$ defined by \[ \eta(\alpha) = \inf \left \{ \eta > 0: \ \bP{ f_1(\myvec{Y}) > \eta f_0(\myvec{Y}) | H=0 } < \alpha \right \}. \] The definition of $\eta(\alpha)$ is well posed since $\eta \rightarrow \bP{ f_1(\myvec{Y}) > \eta f_0(\myvec{Y}) | H=0 } $ is non-increasing on $(0,\infty)$. Two cases are possible: If \[ \bP{ f_1(\myvec{Y}) > \eta (\alpha) f_0(\myvec{Y}) | H=0 } < \alpha , \] then take $\gamma: \mathbb{R}^k \rightarrow [0,1]$ to be constant, say \[ \gamma(\myvec{y}) = \gamma(\alpha) , \quad \myvec{y} \in \mathbb{R}^k. \] This requires that $\gamma(\alpha)$ be selected so that \begin{eqnarray} \lefteqn{ \alpha - \bP{ f_1(\myvec{Y}) > \eta (\alpha) f_0(\myvec{Y}) | H=0 } } & & \nonumber \\ &=& \gamma (\alpha) \int_{ \left \{ \myvec{y} \in \mathbb{R}^k: f_1(\myvec{y}) = \eta(\alpha) f_0(\myvec{y}) \right \} } f_0(\myvec{y}) dF(\myvec{y}). \end{eqnarray} More compactly, we find that $\gamma(\alpha)$ is given by \begin{equation} \gamma (\alpha) = \frac{ \alpha - \bP{ f_1(\myvec{Y}) > \eta (\alpha) f_0(\myvec{Y}) | H=0 } } { \bP{ f_1(\myvec{Y}) = \eta (\alpha) f_0(\myvec{Y}) | H=0 } }. \label{eq:DET_1+NP_LemmaSaturation2} \end{equation} If \[ \bP{ f_1(\myvec{Y}) > \eta (\alpha) f_0(\myvec{Y}) | H=0 } = \alpha , \] then we must select the mapping $\gamma: \mathbb{R}^k \rightarrow [0,1]$ so that \[ \int_{ \left \{ \myvec{y} \in \mathbb{R}^k: f_1(\myvec{y}) = \eta(\alpha) f_0(\myvec{y}) \right \} } \gamma (\myvec{y}) f_0(\myvec{y}) dF(\myvec{y}) = 0. \] Just take the constant mapping given by \[ \gamma(\myvec{y}) = 0 , \quad \myvec{y} \in \mathbb{R}^k. \] \myendpf \section{Examples} \label{sec:DET_1+ExamplesII} \paragraph{The Gaussian case} Consider again the situation discussed in Section \ref{sec:DET_1+Examples} where the observation rv $\myvec{Y}$ is conditionally Gaussian given $H$, i.e., \[ \begin{array}{ll} H_1: & \ \myvec{Y} \sim {\rm N}( \myvec{m}_1, \myvec{R}) \\ H_0: & \ \myvec{Y} \sim {\rm N}( \myvec{m}_0, \myvec{R}) \\ \end{array} \] where $\myvec{m}_1$ and $\myvec{m}_0$ are distinct elements in $\mathbb{R}^k$, and the $k \times k$ {\em symmetric} matrix $\myvec{R}$ is positive definite (thus invertible).
From the discussion given in Section \ref{sec:DET_1+Examples}, it follows that for each $\lambda > 0$ the test $d_{\lambda}$ takes the form \[ d_\lambda(\myvec{y}) = 0 \quad \mbox{iff} \quad \myvec{y}^\prime \myvec{R}^{-1} \Delta \myvec{m} > \phi(\lambda) \] with $\Delta \myvec{m}$ and $\phi (\lambda)$ given by (\ref{eq:DET_1+Examples+Gaussian1}) and (\ref{eq:DET_1+Examples+Gaussian2}), respectively. We also have \[ P_F(d_{\lambda}) = 1 - \Phi \left ( { { \log \lambda +{1\over 2 } d^2 } \over d } \right ), \] where $d^2$ is given by (\ref{eq:DET_1+Examples+Gaussian3}) -- It is plain that the function $\lambda \rightarrow P_F(d_{\lambda})$ is continuous on $\mathbb{R}_+$. Given $\alpha$ in the unit interval $(0,1)$, the value $\lambda (\alpha )$ is {\em uniquely} determined through the relation \[ 1 - \alpha = \Phi \left ( { { \log \lambda +{1\over 2 } d^2 } \over d } \right ). \] This is equivalent to \[ \lambda ( \alpha ) = e^{ d \cdot x_{1-\alpha } -{1\over 2 }d^2 }, \] where for $t$ in $(0,1)$, $x_t$ denotes the unique solution to the equation \[ \Phi (x) = t, \quad x \in \mathbb{R}. \] \paragraph{Discontinuity with Bernoulli rvs} The setting is that of Section \ref{sec:DET_1+BernoulliCase} to which we refer the reader for the notation. We discuss only the case $a_1 < a_0$, and leave the case $a_0 < a_1$ as an exercise for the interested reader. In Section \ref{sec:DET_1+NeymanPearsonFormulationTwoExamples} we have shown that \begin{eqnarray} P_F(d_\lambda) &=& \left \{ \begin{array}{ll} 1 & \mbox{if $ 0 < \lambda \leq \frac{a_1}{a_0} $} \\ & \\ 1-a_0 & \mbox{if $ \frac{a_1}{a_0} < \lambda \leq \frac{1-a_1}{1-a_0} $} \\ & \\ 0 & \mbox{if $ \frac{1-a_1}{1-a_0} < \lambda$.} \\ \end{array} \right . \end{eqnarray} as $\lambda$ ranges over $(0,\infty)$. Thus, the set of values $\{ P_F(d_\lambda), \ \lambda > 0 \}$ reduces to $\{ 0, 1-a_0, 1 \}$, and the constraint $P_F(d_{\lambda (\alpha)}) = \alpha$ cannot be met unless $\alpha$ is one of these three values. Here the threshold $\eta (\alpha)$ entering the randomized test of Section \ref{sec:DET_1+NP_Lemma} is given by \[ \eta(\alpha) = \left \{ \begin{array}{ll} \frac{1-a_1}{1-a_0} & \mbox{if $ 0 < \alpha \leq 1-a_0 $} \\ & \\ \frac{a_1}{a_0} & \mbox{if $ 1-a_0 < \alpha \leq 1$.} \\ \end{array} \right . \] \paragraph{Discontinuity with Poisson rvs} With $\mathcal{P}(m)$ denoting the Poisson pmf on $\mathbb{N}$ with parameter $m > 0$, consider the following simple binary hypothesis testing problem \[ \begin{array}{ll} H_1: & \ Y \sim \mathcal{P} (m_1) \\ H_0: & \ Y \sim \mathcal{P} (m_0) \\ \end{array} \] where $m_1 \neq m_0$ in $(0,\infty)$. Thus, \[ \bP{ Y = k | H=h } = \frac{(m_h)^k}{k!} e^{-m_h}, \quad \begin{array}{c} h=0,1 \\ k=0,1, \ldots \\ \end{array} \] In this example, we take $F$ to be the counting measure on $\mathbb{N}$, and for every $\lambda \geq 0$, the definition of $d_{\lambda }$ reduces to \begin{eqnarray} d_{\lambda }(k)=0 &\mbox{iff}& \frac{(m_1)^k}{k!}e^{-m_1}<\lambda \frac{(m_0)^k}{k!}e^{-m_0} \nonumber \\ &\mbox{iff}& \left ( \frac{m_1}{m_0} \right )^k < \lambda e^{-( m_0 - m_1)} \end{eqnarray} with $k=0,1, \ldots $. If $m_0 < m_1$, then \begin{eqnarray} d_{\lambda }(k)=0 &\mbox{iff}& \frac{(m_1)^k}{k!}e^{-m_1}<\lambda \frac{(m_0)^k}{k!}e^{-m_0} \nonumber \\ &\mbox{iff}& \left ( \frac{m_1}{m_0} \right )^k < \lambda e^{-( m_0 - m_1)} \nonumber \\ &\mbox{iff}& k < \eta(\lambda) \end{eqnarray} with $k=0,1, \ldots $, where \[ \eta (\lambda ) = \frac{ \log \left ( \lambda e^{-(m_0 - m_1)} \right ) } { \log \left ( \frac{m_1}{m_0} \right ) }. \] It follows that \begin{eqnarray} P_F(d_\lambda) &=& \bP{ d_\lambda (Y) = 1 | H=0 } \nonumber \\ &=& \bP{ Y \geq \eta(\lambda) | H=0 } \nonumber \\ &=& \sum_{k= 0: \ \eta(\lambda) \leq k}^\infty \frac{(m_0)^k}{k!}e^{-m_0} .
\end{eqnarray} In this last expression only the integer ceiling $\lceil \eta (\lambda )\rceil$ of $\eta (\lambda )$ matters, where $\lceil \eta (\lambda )\rceil = \inf \left \{ k \in \mathbb{N}: \eta (\lambda )\leq k \right \} $, whence \[ P_F(d_\lambda) = \sum_{ k= \lceil \eta (\lambda )\rceil }^\infty \frac{(m_0)^k}{k!}e^{-m_0} . \] As a result, the mapping $\lambda\rightarrow P_F(d_{\lambda })$ is easily seen to be a {\em left-continuous piecewise constant} mapping with \[ P_F(d_\lambda) = P_F(d_{\lambda_{n+1}}), \quad \begin{array}{l} \lambda_n < \lambda \leq \lambda_{n+1} \\ n=0,1, \ldots \\ \end{array} \] where $\{ \lambda_n, \ n=0,1, \ldots \}$ is a strictly monotone increasing sequence determined by the relation \[ n = \frac{ \log \left ( \lambda_n e^{-(m_0 - m_1)} \right ) } { \log \left ( \frac{m_1}{m_0} \right ) }, \quad n=0,1, \ldots \] or equivalently, \[ \lambda_n = \left ( \frac{m_1}{m_0} \right )^n e^{-(m_1 - m_0)}, \quad n=0,1,\ldots \] It is now plain that whenever $\alpha$ is chosen in $[0,1]$ such that \[ \sum_{k= n+1 }^\infty \frac{(m_0)^k}{k!}e^{-m_0} < \alpha < \sum_{k= n }^\infty \frac{(m_0)^k}{k!}e^{-m_0} \] for some integer $n = 0, 1, \ldots$ then the requirement that $P_F (d _{\lambda (\alpha )} ) = \alpha $ cannot be met. This difficulty is circumvented by enlarging $\cal D$ with {\em randomized} policies; see Section \ref{sec:DET_1+RandomizedTests}. \section{The receiver operating characteristic (ROC)} \label{sec:DET_1+ROC} A careful inspection of the solutions to the three formulations discussed so far shows that under mild assumptions, the tests of interest all take the form \begin{equation} d_\eta (\myvec{y})=0 \quad {\rm iff} \quad f_1(\myvec{y}) < \eta f_0(\myvec{y}) \label{eq:DET_1+ROC+d_eta} \end{equation} for some $\eta > 0$. It is only the value of the threshold $\eta$ that varies with the problem formulation. With the notation used earlier, we have \begin{enumerate} \item In the Bayesian formulation, \[ \eta_B = {{\Gamma_0(1-p)}\over {\Gamma_1p}} \] \item In the minimax formulation, \[ \eta_{m} = {{\Gamma_0(1-p_{\rm m})}\over {\Gamma_1p_{\rm m}}} \] with $p_{\rm m}$ such that \[ V(p_{\rm m}) = \max \left ( V(p): \ p \in [0,1] \right ) \] \item In the Neyman--Pearson formulation, \[ \eta_{\rm NP}(\alpha ) = \lambda (\alpha ), \] with $\lambda(\alpha)$ satisfying the constraint (\ref{eq:DET_1+NeymanPearsonFormulation+MeetConstraint}). \end{enumerate} In view of this, it seems natural to analyze in some detail the performance of the tests (\ref{eq:DET_1+ROC+d_eta}). This is done by considering how their probabilities of false alarm and of detection, namely $P_F (d_\eta )$ and $P_D (d_\eta )$, vary in relation to each other as $\eta$ ranges from $\eta =0$ to $\eta =+\infty$. This is best understood by plotting the graph $(\Gamma)$ of the detection probability against the probability of false alarm. Such a graph is analogous to a {\em phase portrait} for two-dimensional non-linear ODEs, and is called a {\em receiver operating characteristic} (ROC) curve. Its {\em parametric} representation is given by \[ \mathbb{R} _+ \rightarrow [0,1]\times [0,1] : \eta \rightarrow (P_F (d_\eta ) , P_D (d_\eta ) ), \] whence \[ (\Gamma ): \quad \left \{ (P_F (d_\eta ) , P_D (d_\eta ) ) , \ \eta \geq 0 \right \}. \] This graph is {\em completely} determined by the probability distributions $F_0$ and $F_1$ of the observation rv $\myvec{Y}$ under the two hypotheses (through the densities $f_0$ and $f_1$ with respect to the underlying distribution $F$) and not by cost assignments or the prior probabilities. A typical ROC curve is drawn below.
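As an illustration, the following minimal Python sketch (assuming {\tt numpy} and {\tt scipy} are available, with an arbitrary value of the distance parameter $d$) traces the parametric ROC curve for the Gaussian example of Section \ref{sec:DET_1+GaussianCase}, using the closed-form expressions for $P_F(d_\eta)$ and $P_D(d_\eta)$ recalled in Section \ref{sec:DET_1+ROCExamples} below.
\begin{verbatim}
import numpy as np
from scipy.stats import norm

d = 1.0   # illustrative "distance" between the two Gaussian hypotheses

# Parametric ROC: sweep the threshold eta over several decades.
etas = np.logspace(-3, 3, 400)
P_F = 1.0 - norm.cdf((np.log(etas) + 0.5 * d**2) / d)
P_D = 1.0 - norm.cdf((np.log(etas) - 0.5 * d**2) / d)

# The curve runs from (1, 1) (eta -> 0) down to (0, 0) (eta -> infinity)
# and lies above the diagonal P_D = P_F.
for pf, pd in zip(P_F[::80], P_D[::80]):
    print(f"P_F = {pf:.3f}   P_D = {pd:.3f}")
\end{verbatim}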
\paragraph{Geometric properties} The following geometric properties of the ROC curve are useful when working with it. \begin{theorem} {\sl Assume that the absolute continuity conditions {\bf (A.1)} and {\bf (A.2)} hold. (i): Both mappings $\mathbb{R}_+\rightarrow [0,1]: \eta\rightarrow P_F (d_\eta )$ and $\mathbb{R}_+\rightarrow [0,1]:\eta\rightarrow P_D(d_\eta )$ are monotone non-increasing, with \[ P_F (d_0) = 1 \quad \mbox{and} \quad P_D (d_0) =1, \] and \[ \lim_{\eta \uparrow \infty} P_F(d_\eta ) = \lim_{\eta \uparrow \infty} P_D(d_\eta ) = 0. \] (ii): If the right-derivative of $\eta \rightarrow P_F (d_\eta ) $ exists at $\eta=\lambda$ for some $\lambda \geq 0$, then the right-derivative of $\eta \rightarrow P_D (d_\eta ) $ exists at $\eta=\lambda$, and the relation \begin{equation} {d^+ \over d\eta } P_D (d_\eta ) \Big |_{\eta = \lambda} = \lambda \cdot {d^+ \over d\eta } P_F (d_\eta ) \Big |_{\eta = \lambda} \end{equation} holds. (iii): If the left-derivative of $\eta \rightarrow P_F (d_\eta ) $ exists at $\eta=\lambda$ for some $\lambda > 0$, then the left-derivative of $\eta \rightarrow P_D (d_\eta ) $ exists at $\eta=\lambda$, and the relation \begin{equation} {d^- \over d\eta } P_D (d_\eta )\Big |_{\eta = \lambda} = \lambda \cdot {d^- \over d\eta } P_F (d_\eta ) \Big |_{\eta = \lambda} \end{equation} holds. } \label{thm:DET_1+ROC+BasicProperties} \end{theorem} The proof of Theorem \ref{thm:DET_1+ROC+BasicProperties} is available in Section \ref{sec:DET_1+Proofs}. It follows from this last result that whenever the mapping $\eta\rightarrow P_F (d_\eta )$ is {\em strictly} decreasing, then the mapping $\eta\rightarrow P_D (d_\eta )$ is also {\em strictly} decreasing, whence the curve $(\Gamma )$ can be represented as the graph of a function $\Gamma : [0,1] \rightarrow [0,1]: P_F \rightarrow P_D = \Gamma (P_F )$, i. e., for every $\eta \geq 0$, $$ P_D (d_\eta ) = \Gamma ( P_F (d_\eta ) ).$$ In this case, Theorem \ref{thm:DET_1+ROC+BasicProperties} yields the following information \begin{corollary} {\sl Assume the absolute continuity conditions {\bf (A.1)}-{\bf (A.2)} to hold. Whenever the mapping $\mathbb{R}_+\rightarrow [0,1]: \eta\rightarrow P_F (d_\eta )$ is differentiable and strictly decreasing, so is the mapping $\mathbb{R}_+\rightarrow [0,1]: \eta\rightarrow P_D(d_ \eta )$. In that case, the mapping $\Gamma : [0,1] \rightarrow [0,1]$ is differentiable, strictly increasing and concave with \[ \frac{ d \Gamma }{ d P_F } \left ( P_F (d_\eta ) \right ) = \eta, \quad \eta \geq 0. \] \label{cor:DET_1+ROC+BasicProperties} } \end{corollary} \myproof By the very definition of the function $\Gamma$, the identity \[ P_D (d_\eta ) =\Gamma (P_F (d_\eta )), \quad \eta \geq 0 \] must hold. Now Theorem \ref{thm:DET_1+ROC+BasicProperties} and the Implicit Function Theorem yield the differentiability of the mapping $\Gamma$ while the formula for its derivative follows from the chain rule applied to the identity given above. The other properties follow readily. \myendpf \paragraph{Operating the ROC} These results are most useful when using the ROC curve operationally: \begin{enumerate} \item For the Neyman--Pearson test of size $\alpha$, consider the point on the ROC curve with abscissa $\alpha$. It is determined by the threshold value $\eta (\alpha )$ with the property that $$P_F(d_{\eta (\alpha )})~=~\alpha ,$$ and $d_{\rm NP}(\alpha )$ is simply $d_{\eta (\alpha )}$. Note that $\eta (\alpha )$ is the {\em slope} of the tangent to the ROC curve at the point with abscissa $\alpha$ and the power $\beta (\alpha )$ of the test is simply the ordinate of that point.
\item For the Bayesian problem, $\eta$ is determined by the cost assignment and the prior distribution of the rv $H$. The values of $P_D(d_\eta )$ and $P_F (d_\eta )$ can be easily determined by finding the point on the ROC where the tangent has slope $\eta$. \item The Minimax Equation takes the form \[ C(1,1) - C(0,0) + \Gamma_1 P_M(d_\eta) - \Gamma_0 P_F(d_\eta) = 0, \] or equivalently \[ C(1,1) - C(0,0) + \Gamma_1 = \Gamma_1 P_D(d_\eta) + \Gamma_0 P_F(d_\eta). \] This shows that the minimax rule $d^\star_{\rm m}$ is obtained as follows. Consider the straight line (L) in the $(P_F,P_D)$-plane with equation \[ C(1,1) - C(0,0) + \Gamma_1 = \Gamma_1 P_D + \Gamma_0 P_F. \] Let $(P_F^\star,P_D^\star)$ be the point of intersection of the straight line (L) with the ROC curve $(\Gamma )$, and let $\eta^\star$ be the corresponding threshold value, i.e., $P_F^\star = P_F(d_{\eta^\star}) $ and $P_D^\star = P_D(d_{\eta^\star})$. It is now clear that $d^\star_{\rm m} = d_{\eta^\star}$. \end{enumerate} \section{ROC curves for some examples} \label{sec:DET_1+ROCExamples} \paragraph{The Gaussian case} The setting is that of Section \ref{sec:DET_1+GaussianCase} to which we refer the reader for the notation. As shown there, for any $\eta >0$ we have \[ P_F(d_\eta) = 1 - \Phi \left ( \frac{\log \eta +\frac{1}{2} d^2 }{d} \right ) \] and \[ P_M(d_\eta) = \Phi \left ( \frac{\log \eta - \frac{1}{2} d^2 }{d} \right ) \] so that \[ P_D(d_\eta) = 1 - \Phi \left ( \frac{\log \eta - \frac{1}{2} d^2 }{d} \right ). \] To find the ROC curve, note that \[ d \Phi^{-1} \left ( 1 - P_F (d_\eta) \right ) - \frac{d^2}{2} = \log \eta , \] while \[ d \Phi^{-1} \left ( 1 - P_D (d_\eta) \right ) + \frac{d^2}{2} = \log \eta , \] whence \[ d \Phi^{-1} \left ( 1 - P_F (d_\eta) \right ) - \frac{d^2}{2} = d \Phi^{-1} \left ( 1 - P_D (d_\eta) \right ) + \frac{d^2}{2} . \] It follows that \[ \Phi^{-1} \left ( 1 - P_D (d_\eta) \right ) = \Phi^{-1} \left ( 1 - P_F (d_\eta) \right ) - d \] so that \[ 1 - P_D (d_\eta) = \Phi \left ( \Phi^{-1} \left ( 1 - P_F (d_\eta) \right ) - d \right ) . \] This shows that here the mapping $\Gamma: [0,1] \rightarrow [0,1]$ is well defined and given by \[ P_D = 1 - \Phi \left ( \Phi^{-1} \left ( 1 - P_F \right ) - d \right ), \quad P_F \in [0,1]. \] \paragraph{The Bernoulli case} \section{Proofs} \label{sec:DET_1+Proofs} \paragraph{Preliminaries} We start with some facts that prove useful in discussing both Lemma \ref{lem:DET_1+NeymanPearsonFormulation+MeetingConstraint} and Theorem \ref{thm:DET_1+ROC+BasicProperties}. Fix $\lambda > 0$, and define the set \[ R(\lambda) \equiv \{ \myvec{y} \in \mathbb{R}^k: \ f_1(\myvec{y}) \geq \lambda f_0 (\myvec{y}) \}. \] Noting that \[ d_\lambda (\myvec{y}) = 1 \quad \mbox{iff} \quad \myvec{y} \in R(\lambda) , \] it is plain that \begin{eqnarray} P_F(d_\lambda ) &=& \bP{ d_\lambda (\myvec{Y}) = 1 | H=0 } \nonumber \\ &=& \bP{ \myvec{Y} \in R(\lambda ) | H=0 } \nonumber \\ &=& \int_{ R(\lambda) } f_0(\myvec{y}) dF(\myvec{y}) \end{eqnarray} and \begin{eqnarray} P_D(d_\lambda ) &=& \bP{ d_\lambda (\myvec{Y}) = 1 | H=1 } \nonumber \\ &=& \int_{ R(\lambda) } f_1(\myvec{y}) dF(\myvec{y}).
\end{eqnarray} For each $\Delta \lambda > 0$, easy algebra now leads to \begin{eqnarray} P_F(d_{\lambda + \Delta \lambda} ) - P_F(d_\lambda ) &=& \int_{R(\lambda+\Delta\lambda)} f_0(\myvec{y}) dF(\myvec{y}) - \int_{R(\lambda)} f_0(\myvec{y}) dF(\myvec{y}) \nonumber \\ &=& - \int_{R_+( \lambda;\Delta \lambda )} f_0(\myvec{y}) dF(\myvec{y}) \end{eqnarray} where \[ R_+( \lambda;\Delta \lambda ) \equiv \{ \myvec{y} \in \mathbb{R}^k: \ \lambda f_0 (\myvec{y}) \leq f_1(\myvec{y}) < (\lambda + \Delta \lambda ) f_0 (\myvec{y}) \} . \] Similarly, we have \begin{eqnarray} P_F(d_{\lambda - \Delta \lambda} ) - P_F(d_\lambda ) &=& \int_{R(\lambda-\Delta\lambda)} f_0(\myvec{y}) dF(\myvec{y}) - \int_{R(\lambda)} f_0(\myvec{y}) dF(\myvec{y}) \nonumber \\ &=& \int_{R_-( \lambda;\Delta \lambda )} f_0(\myvec{y}) dF(\myvec{y}) \end{eqnarray} where \[ R_-( \lambda;\Delta \lambda ) \equiv \{ \myvec{y} \in \mathbb{R}^k: \ (\lambda - \Delta \lambda ) f_0 (\myvec{y}) \leq f_1(\myvec{y}) < \lambda f_0 (\myvec{y}) \}. \] \paragraph{A proof of Lemma \ref{lem:DET_1+NeymanPearsonFormulation+MeetingConstraint}} Fix $\lambda > 0$ and $\Delta \lambda > 0$. When $F$ is Lebesgue measure on $\mathbb{R}^k$, these relations become \[ P_F(d_{\lambda \pm \Delta \lambda} ) - P_F(d_\lambda ) = \mp \int_{ R_\pm ( \lambda;\Delta \lambda )} f_0(\myvec{y}) d\myvec{y} , \] i.e., \begin{eqnarray} P_F(d_{\lambda + \Delta \lambda} ) - P_F(d_\lambda ) &=& - \int_{R_+( \lambda;\Delta \lambda )} f_0(\myvec{y}) d\myvec{y} \nonumber \\ &=& -\int_{ \mathbb{R}^k} \1{ \myvec{y} \in R_+( \lambda;\Delta \lambda ) } f_0(\myvec{y}) d\myvec{y} \end{eqnarray} and \begin{eqnarray} P_F(d_{\lambda - \Delta \lambda} ) - P_F(d_\lambda ) &=& \int_{R_-( \lambda;\Delta \lambda )} f_0(\myvec{y}) d\myvec{y} \nonumber \\ &=& \int_{ \mathbb{R}^k} \1{ \myvec{y} \in R_-( \lambda;\Delta \lambda ) } f_0(\myvec{y}) d\myvec{y}. \end{eqnarray} Because \[ \cap_{ \Delta \lambda > 0 } R_+ ( \lambda;\Delta \lambda ) = R^\star( \lambda ) \equiv \{ \myvec{y} \in \mathbb{R}^k: \ f_1(\myvec{y}) = \lambda f_0(\myvec{y}) , \ f_0(\myvec{y}) > 0 \} \] and \[ \cap_{ \Delta \lambda > 0 } R_-( \lambda;\Delta \lambda ) = \emptyset , \] we note that \[ \lim_{ \Delta \lambda \downarrow 0} \1{ \myvec{y} \in R_+( \lambda;\Delta \lambda ) } = \1{ \myvec{y} \in R^\star( \lambda ) }, \quad \myvec{y} \in \mathbb{R}^k \] and \[ \lim_{ \Delta \lambda \downarrow 0} \1{ \myvec{y} \in R_-( \lambda;\Delta \lambda ) } = 0, \quad \myvec{y} \in \mathbb{R}^k . \] Since $f_0$ is integrable, the Dominated Convergence Theorem then yields \[ \lim_{ \Delta \lambda \downarrow 0} \left ( P_F(d_{\lambda + \Delta \lambda} ) - P_F(d_\lambda ) \right ) = - \int_{ R^\star( \lambda ) } f_0(\myvec{y}) d\myvec{y} \quad \mbox{and} \quad \lim_{ \Delta \lambda \downarrow 0} \left ( P_F(d_{\lambda - \Delta \lambda} ) - P_F(d_\lambda ) \right ) = 0 , \] so that the mapping $\lambda \rightarrow P_F(d_\lambda)$ is always left-continuous, and is continuous at $\lambda$ whenever $\bP{ f_1(\myvec{Y}) = \lambda f_0(\myvec{Y}) | H=0 } = 0$. \paragraph{A proof of Theorem \ref{thm:DET_1+ROC+BasicProperties}} The first part of the theorem readily follows from the monotonicity of the sets $\{ R (\eta ), \eta \geq 0 \}$, namely \[ R (\eta_2 ) \subseteq R (\eta_1 ), \quad 0 \leq \eta_1 < \eta_2. \] Next, fix $\eta \geq 0$ and $\Delta\eta > 0$. The very definition of $R_+(\eta ; \Delta\eta )$ implies the inequalities \[ \eta \int_{R_+(\eta; \Delta \eta )} f_0(\myvec{y}) dF(\myvec{y}) \leq \int_{R_+(\eta; \Delta \eta )} f_1(\myvec{y}) dF(\myvec{y}) \] and \[ \int_{R_+(\eta; \Delta \eta )} f_1(\myvec{y}) dF(\myvec{y}) \leq (\eta + \Delta \eta ) \int_{R_+(\eta; \Delta \eta )} f_0 (\myvec{y}) dF(\myvec{y}). \] It then follows that \[ (\eta + \Delta \eta ) \cdot \frac{ P_F(d_{\eta+ \Delta \eta} ) - P_F(d_{\eta})}{\Delta \eta } \leq \frac{ P_D(d_{\eta+ \Delta \eta} ) - P_D(d_{\eta})}{\Delta \eta } \] and \[ \frac{ P_D(d_{\eta+ \Delta \eta} ) - P_D(d_{\eta})}{\Delta \eta } \leq \eta \cdot \frac{ P_F(d_{\eta+ \Delta \eta} ) - P_F(d_{\eta})}{\Delta \eta }.
\] If the right-derivative of $\eta \rightarrow P_F(d_\eta )$ exists, then \[ \frac{d^+}{d\eta} P_F(d_\eta ) = \lim_{ \Delta \eta \downarrow 0 } \frac{ P_F(d_{\eta+ \Delta \eta} ) - P_F(d_{\eta})}{\Delta \eta } \] and an easy sandwich argument shows that the limit \[ \lim_{ \Delta \eta \downarrow 0 } \frac{ P_D(d_{\eta+ \Delta \eta} ) - P_D(d_{\eta})}{\Delta \eta } \] also exists. Therefore, the right-derivative of $\eta \rightarrow P_D(d_\eta )$ also exists and is given by \[ \frac{d^+}{d\eta} P_D(d_\eta ) = \eta \cdot \frac{d^+}{d\eta} P_F(d_\eta ). \] The statement (iii) concerning left-derivatives is obtained in exactly the same way upon replacing the sets $R_+(\eta; \Delta \eta)$ by the sets $R_-(\eta; \Delta \eta)$. \myendpf \section{Exercises} \section{References}