## Itakura.kes.tul.cz

ALGORITHMS FOR NONORTHOGONAL APPROXIMATE JOINT
Institute of Information Theory and Automation
Pod vod´arenskou vˇeˇz´ı 4, 18208 Prague 8, Czech Republic
Approximate joint block diagonalization (AJBD) of a set of
matrices has applications in blind source separation, e.g.,

*n***W***T ,*
when the signal mixtures contain mutually independent sub-
are all approximately block diagonal, having the blocks on the
spaces of dimension higher than one. In this paper we present
main diagonal of the same size as the original matrices

**M**
three novel AJBD algorithms which differ in the final target
Ideally, one may wish to estimate

**W **=

**A***−*1 and get

**RW ***≈*
criterion to be minimized in the optimization process. Two
Bdiag(

**RW***, . . . , ***RW **), where

**RW ***≈ ***M**
of the algorithms extend the principle of Jacobi elementary
task is reversed and consists in fitting the given data matrices
rotations to the more general concept of matrix elementary
rotations. The proposed algorithms are compared to existing

*n *by the model (1) with parameters

**A***, {***M***n}*.

In general, however, it is impossible to recover the orig-
inal blocks

**M***nk *(even in the “noiseless” case), because of

*Index Terms*— Source separation, independent sub-
inherent ambiguities of the problem (e.g., [6]), but it is possi-
ble to recover “independent subspaces”, as explained below.

Let

**W**0 =

**A***−*1, also called demixing matrix, be parti-

tioned in

*K *blocks

**W***k *of size

*Ik×d*,

**W**0 = [

**W***T , . . . , ***W***T *]

*T*
Each block

**W***k *represents a linear space of all linear com-

Consider a set of square symmetric matrices

**M***n*,

*n *=

1

*, . . . , N *, that are all block diagonal, with

*K *blocks along its
uniquely identifiable [6, 2]. Let

**W **be an estimated demixing

main diagonal,

**M***n *= Bdiag(

**M***n*1

*, . . . , ***M***nK *), where

**M***nk*
matrix. We say that

**W **is “essentially equivalent” to

**W**0 (and

is the

*k−*th block of

**M***n *and the Bdiag(

*·*) operator constructs

therefore represents an ideal joint block diagonalization), if
a block-diagonal matrix from its argument matrices. The size
there exists a suitable

*d × d *permutation matrix

**Π **such that

of the blocks might not be identical. We assume that the size
for each

*k *= 1

*, . . . , K *the subspaces spanned by

**W***k *and by

of the block

**M***nk *is

*Ik × Ik*. The dimension of the matrices

the respective

*k*-th block of

**ΠW **coincide, and their mutual

**M***n *is

*d *=

*I*1 +

*. . . IK *.

Next, assume that (possibly perturbed) congruence trans-
Some existing AJBD algorithms are restricted to the case
formations of these matrices are given as
where

**A **(and therefore also

**W**) are orthogonal [3], some

other algorithms consider a general matrix

**A **[4, 6]. In this

**R***n *=

**AM***n***A***T *+

**N***n,*
where the superscript

*T *denotes a matrix transposition,

**A **is

It was shown in [8] that reasonable solutions to AJBD can
an unknown square “mixing matrix”, and

**N**
be obtained using a two step approach, by first applying an
tion (or “noise”) matrix. We shall refer to the case where all
ordinary approximate joint diagonalization (AJD) algorithm,
and then clustering the separated components (rows of the

*n *=

**0**,

*n *= 1

*, . . . , N *as the “unperturbed” (or “noiseless”)

case. The choice of symbol

**R **reflects the fact that the matri-

demixing matrix). More specifically, it was shown that un-
ces in the set often play a role of (sample-) covariance matri-
like several popular AJD approaches, one recently proposed
ces of a partitioned data, or time-lagged (sample-) covariance
AJD method (U-WEDGE, Uniformly Weighted Exhaustive
Diagonalization with Gauss itErations [9]) features a unique
The goal in Approximate Joint Block Diagonalization
ability to attain ideal separation in the unperturbed (“noise-
(AJBD) is to find a “demixing” matrix

**W**, such that the

less”) case, for general (not necessarily orthogonal) matrices

*∗*Supported by Grant Agency of the Czech Republic through the project
1The mutual angle between two subspaces can be obtained in Matlab us-

**A**. The above mentioned methods are computationally ap-

Success of the joint block diagonalization can be measured by
pealing, but it appeared that in the presence of the noise, ded-
these three criteria or, if the ideal mixing/demixing matrix is
icated block algorithms may perform better. This paper is
known, by the angle between the true and estimated subspaces
therefore devoted to finding AJBD solutions that minimize
mentioned in the previous subsection.

some

*a priori *chosen criteria.

The paper is organized as follows. Section 2 presents a
survey of three main cost functions used for AJBD in the lit-
erature. In Section 3, a general AJBD algorithm based ongeneralized Jacobi rotations is proposed. Specific details of
Each of the criterion mentioned in the former section (and
the general procedure for the two criteria are derived in Sec-
namely the first two criteria) can be optimized by applying
tion 4. Section 5 presents an algorithm for minimizing the
set of elementary rotations that follow the idea of well known
third criterion. In Section 6, a novel initialization method is
Jacobi rotations. The elementary rotations follow the block
proposed. A simulation study in Section 7 verifies effective-
structure of the assumed block matrices

**M***n*. The rotations

*ij *(

**B***, ***C**) =

In the literature, we can see three main criteria used for joint
where the diagonal is as in the identity matrix and the only

*C*LS(

**W**) =

*∥*Boff(

**RW**)

*∥*2

nonzero off-diagonal blocks

**B **and

**C **of the sizes

*Ii × Ij*
and

*Ij × Ii *lie at positions (

*i, j*) and (

*j, i*), respectively, for
where the operator “Boff” nullifies the elements of a matrix
that lie in the diagonal blocks. This is used in [3] to build a
The blocks

**B **and

**C **are selected so that the updated

unitary block diagonalization algorithm. In the case of non-
demixing matrix

**W***′ *=

**E***ij*(

**B***, ***C**)

**W **approximates the low-

orthogonal diagonalization, the criterion has to be completed
est attainable criterion function

*C*(

**W***′*), where

*C*(

**W***′*) is one

by adding a constraint that prevents the algorithm from con-
of the criteria (3), (4), (5). The idea of the minimization is
verging to the trivial solution

**W **=

**0**. The constraint is that

that the elementary rotations

**E***ij*(

**B***, ***C**) should be small, i.e.

all rows of

**W **have unit Euclidean norm. An example is [4].

the matrices

**B **and

**C **should be small. The true criterion

A different criterion was proposed by Lahat

*et al *[5].

function

*C*(

**W***′*) is replaced by the second order approxi-

mation in terms of

**B **and

**C**, and is minimized in a closed

det Bdiag(

**RW**)

We will show in the next section that the approximation
of

*C*(

**W***′*) might have the form

*C*(

**W***′*) =

*φ*(

**B***, ***C**),

where Bdiag is the opposite of the operator “Boff”, nullifies
the elements of a matrix that lie out of the diagonal blocks.

*φ*(

**B***, ***C**) =

*φ*
The criterion in motivated by maximum likelihood estima-

*n*1

**B**) + tr(

**J***n*2

**C**)

tion. It is a generalization of the criterion proposed for ap-proximate joint diagonalization in [7]. Minimization of this
+ tr(

**J***n*3

**BJ***n*4

**B**) + tr(

**J***n*5

**BJ***n*6

**B***T *)

criterion leads to the maximum likelihood estimator, if

**R***n*
+ tr(

**J***n*7

**CJ***n*8

**C**) + tr(

**J***n*9

**CJ***n*10

**C***T *)

are sample covariance matrices from Gaussian distributed iid
+tr(

**J***n*11

**BJ***n*12

**C**) + tr(

**J***n*13

**BJ***n*14

**C***T *)

random vectors that have covariance matrices admitting theexactly block diagonal structure.

where the matrices

**J***nm*,

*m *= 1

*, . . . , *14, depend on

**RW**
The third possible criterion was suggested by Nion [6].

*φ*0 =

*φ*(

**0***, ***0**), and “tr” denotes the trace of a matrix (sum

The latest method is based on fitting the block diagonal model
of diagonal elements). Explicit expressions for the matrices
to the given set of matrices, seeking the mixing matrix

**A **=

**J***nm*,

*m *= 1

*, . . . , *14, for the criteria, (3) and (4) will be de-

**W***−*1,

*∥***R***n − ***AMAA***T ∥*2

*φ*(

**B***, ***C**) =

*φ*0 +

**g***T ***v**(

**B***, ***C**) +

**v**(

**B***, ***C**)

*T ***Hv**(

**B***, ***C**) (9)

*∥***R**
*n − ***AM***n***A***T ∥*2

**v**(

**B***, ***C**) =

*n *=Bdiag(

**M***n *)

**g **is a suitable vector of the same length, and

**H **is a symmetric

Let

**RW **and ˜

**R **=

**ERWE***T *be partitioned to blocks

**R**11

*, . . . , ***R**33 and ˜

**R**11

*, . . . , *˜

**R**33, respectively. Since

vec(

**J***T *)

vec(

**J***T *)

**C***T*
**B***T*
**H***BCn*
**H***T*
**H***CCn*
**R**12 =

**R**12 +

**R**11

**C***T *+

**BR**22, ˜

**R**13 =

**R**13 +

**BR**23

**R**23 =

**R**23 +

**CR**13. Then

**H***BBn*
**J***n*6

*⊗ ***J***T *+

**J***T ⊗ ***J**
*LS *(

**B***, ***C**) =

*∥ *˜

**R**12

*∥*2

*F*
+(

**J***n*3

*⊗ ***J***T *+

**J**
*n*4

*⊗ ***J***T*
A straightforward computation gives

**J**1 = 2

**R**22

**R***T*12,

**J**2 =

**H***BCn*
**J***n*14

*⊗ ***J***T*
*n*12

*⊗ ***J***T*
2

**R**11

**R**12,

**J**5 =

**I**,

**J**6 =

**R**222,

**J**9 =

**I**,

**J**10 =

**R**211,

**J**11 =

**H***CCn*
**J***n*10

*⊗ ***J***T *+

**J***T*
*⊗ ***J**
**R**11,

**J**12 = 2

**R**22,

**J**3 =

**J**4 =

**J**7 =

**J**8 =

**J**13 =

**J**14 =

**0**.

+(

**J***n*7

*⊗ ***J***T *+

**J**
)

**P***,*
*n*8

*⊗ ***J***T*
*⊗ *is the Kronecker product, and where

**P **stands for a permu-

Consider the following Taylor series expansion of the matrix
tation matrix such that

**P**vec(

**M**) = vec(

**M***T *) for any matrix

function log det

**R **in a neighborhood of a positive definite

**M **of a suitable dimension (

*Ii × Ij*).

Once

**g **and

**H **are computed, the optimum

**v**(

**B***, ***C**) that

0. Let ∆

**R **=

**R ***− ***R**0. It holds

minimizes

*φ*(

**B***, ***C**) can be found as

*≈ *tr(

**R***−*1∆

**R**)

*− *1tr(

**R***−*1∆

**RR***−*1∆

**R**)

*.*
**v**(

**B***, ***C**) =

*−***H***−*1

**g ***.*
We apply this approximation to the last line of the expression
After each elementary transformation, i.e. update

**W***′ *=

**E**
*ij *(

**B***, ***C**)

**W**, it is recommended to check if the original op-

*φ*LL(

**B***, ***C**) =

timization criterion has really decreased its value. If this is not
the case, it means that the quadratic approximation was not

**R**)

*− *1

det Bdiag(

**R***W *)

accurate enough. One can proceed by replacing the step (16)
det

**R***W*
by another one as it is done in the so called damped Gauss-
Newton algorithm, also known as Levenberg-Marquardt,
det Bdiag(

**R***W *)

det

**R***W*
**v**(

**B***, ***C**) =

*−*(

**H **+

*µ***I**)

*−*1

**g**
After some computation we get (8) with

**J**1 =

**J**3 =

**R**21

**R***−*1,

**J**
**R***−*1,

2 =

**J**7 =

**R**12

**R***−*1

where

*µ *is a suitable positive parameter. Discussion of this

**J**6 =

**R**22

*− ***R**21

**R***−*1

**R**
**R***−*1,

=

*− *1 7,

**J**9 = 1 22

modification, however, exceeds the scope of this paper.

**J**10 =

**R**11

*− ***R**12

**R***−*1

**R**21,

**J**11 =

**J**12 =

**I**,

**J**13 =

**J**14 =

**0**.

The main algorithm consists after a suitable initialization
The resultant algorithm is similar to the one proposed in [5].

in cyclic minimization of the elementary rotations with re-

spect to all pairs (

*i, j*), 1

*≤ i < j ≤ K*, until norms of the

matrices

**B **and

**C **are smaller than a threshold for all the pairs.

The generalized Jacobi rotations presented in Section 3, ap-
pear not to be so effective in the case of the FIT criterion (5).

The reason is that while in the case of the former two crite-
In this section we analyze the criteria (3), (4) in order to de-
ria, the Hessian matrix

**H **can be shown to be always positive

rive matrices

**J**
(semi-)definite, it is no longer true in the last case. There-

*nm *that are needed for computation of the el-
ementary rotations considered in the previous section. For
fore, we propose another simple iterative procedure for this
simplicity of presentation we assume here that there are only
cost function. Starting from an initial estimate of the mixing
three blocks in the matrices

**M**
matrix

**A**, its update

**A***′ *is obtained as

is sought for

*i *= 1 and

*j *= 2. We can assume without any
loss in generality that the third block includes all remaining

**A***′ *= arg min

*∥***R***n − *˜

**AMAA***T ∥*2

*.*
blocks. We skip the running index

*n*.

global minimum, it is convenient to find a suitable initializa-tion. We propose to use the method advocated in [8], apply

*∥***R***n − *˜

**AMAA***T ∥*2 =

*∥*
*⊗ ***I**)

vec(

**R***n*)

*− *(

**AMA**
**A**)

*∥*2

the AJD algorithm U-WEDGE, followed by a suitable clus-tering of rows of the mixing matrix, which would reveal the

**A**) can be found by solving the overdeter-

block structure of the demixed matrices.

First, we propose a modified method of clustering the
rows, compared to the one proposed in [8]. It is a greedy al-

**AMA ***⊗ ***I**
gorithm again. Given the AJD demixing matrix

**W**, compute

**A**)

*≈ *

an auxiliary matrix

**D **as

**AMA ***⊗ ***I**
*|***WR***n***W***T |*
vec(

**A***′*) =

(

**AMA ***⊗ ***I**)

*T *(

**AMA ***⊗ ***I**)

where the absolute value is taken elementwise. If the demix-
ing is perfect,

**D **should have, after arranging columns and

rows, the same block structure as the original matrices

**M***n*.

(

**AMA ***⊗ ***I**)

*T*
Let the block sizes be ordered increasingly,

*I*1

*≤ . . . ≤ IK*.

vec(

**R***n*)

Elements of all columns of

**D **are sorted decreasingly to form

a matrix

**D***′*. Then, sort the (

*I*1 +1)-th row of

**D***′ *and denote it

**r**1 for easy reference. The first block of the demixing matrix

will be composed of the indices that correspond to the small-

**A***′ *=

1 elements in

**r**1. It means that

**W**1 is built of the rows

*n***AMA**
**MAA***T ***AMA**
of

**W **with these indices. The rows and columns of

**D **at the

positions

*i*1

*, . . . , iI *are set to zero, and the procedure iterated
It remains to explain computation of

**MA**
until

*K − *1 subspaces (blocks)

**W**
*k *,

*k *= 1

*, . . . , K − *1, have
ing matrix

**A **be vertically partitioned as

**A **= [

**A**
been determined. The

*K*-th block

**W**
1

*, . . . , ***A***K *],

where the size of blocks

**A**
the rows that have not been selected before.

*k *is

*d × Ik*, and let

**M***nk *be the

*k*th

diagonal block of

**MA**
*∥***R***n − ***AMAA***T ∥*2 =

*∥***R**
*k ***M***nk ***A***T*
We have compared performance of five AJBD algorithms:
(1) U-WEDGE completed by clustering of rows of a demix-
ing matrix, this algorithm is used to initialize all subsequent
=

*∥*vec(

**R***n*)

*−*
(

**A***k ⊗ ***A***k*)vec(

**M***nk*)

*∥*2

ones, (2) algorithm JBD NCG [6], (3) the ad hoc algorithm
of Ghennioui et al [4], and two algorithms proposed in this pa-
The desired elements of

**MA**
per: (4) LSJBD and (5) FITJBD, minimizing the criteria (3)
and (5), respectively. The LLAJD algorithm is not included,

**m***n *= [vec(

**M***n*1)

*T , . . . , *vec(

**M***nK *)

*T *]

*T .*
because it would need a different setup with all target matricespositive (semi-)definite.

The optimum

**m***n *is given as

We consider 10 target matrices, each having four diagonal
blocks of the size 5

*× *5. The blocks were taken at random,

**m***n *= (

**G***T ***G**)

*−*1

**G***T *vec(

**R***n*)

different at each simulation trial, normally distributed with a
zero mean, and were symmetrized by adding their transposi-
tion. The mixing matrix

**A **was taken at random, also new

1

*⊗ ***A**1

*, . . . , ***A***K ⊗ ***A***K *]

*.*
in each simulation trial. Finally, an additive noise was added
The convergence of the iterative process (19) appears to be
as in (1); the noise matrices were symmetrized as well, and
very good, as it is shown in the simulation section.

scaled to achieve desired signal-to-noise ratio (SNR), whichis defined as 10 log10 of Frobenius norm of the mixtures di-
vided by Frobenius norm of the noise.

The noisy mixtures were processed by each of the five
The algorithms proposed in the previous session are iterative
algorithms in 100 independent trials. Each algorithm has its
and convergence to the global minimum of the respective cri-
own cost function, which is optimized, therefore we have cho-
teria (3), (4), and (5) is not guaranteed. In order to minimize
sen to measure the performance by the angular difference of
probability of getting the algorithm stuck in a local but not
the subspaces spanned by corresponding groups of rows of the
true and estimated demixing matrix. The resultant mean and
median average angular differences, as a function of SNR are
plotted in Figure 1. We note namely excellent performance of

**GHENNIO**

JBD_NCG
**FITJBD**

LSJBD

GHENNIO
Fig. 2. Average and median number of iterations versus SNR.

Fig. 1. Performance of the five AJBD techniques.

the FITJBD algorithm, which appears to work at lower SNR’s
than its competitors. Convergence of the algorithm is nearly
linear, as in gradient methods or as in JBD NCG. In this par-

**GHENNIO**

JBD_NCG
ticular example, the latter algorithm does not work well, but
it is not bad in general, namely in lower dimensions. Both
algorithms need thousands of iterations to converge.

An advantage of the methods based on generalized Jacobi
Fig. 3. Average and median CPU time versus SNR.

rotations (here LSJBD) is that they have nearly quadratic con-vergence. In general they require only 10-100 iterations toconverge. The algorithm of Ghennioui is something between.

[3] C. F´evotte and F. J. Theis, “Pivot Selection Strategies in
Sometimes it converges quickly, as it had a quadratic conver-
Jacobi Joint Block-Diagonalization”,

*Proc. ICA’2007*,
gence, namely at high SNR scenarios, in other cases it is slow.

Numbers of iterations and the CPU time of the algorithms are
[4] H. Ghennioui, F. El Mostafa, N. Thirion-Moreau, A.

shown in Figures 2 and 3, respectively.

Adib, and E. Moreau, “A Nonunitary Joint Block Diag-onalization algorithm for Blind Separation of Convolu-tive Mixtures of Sources”,

*IEEE Signal Processing Let-*
*ters*, vol. 14, pp. 860–863, 2007.

We have presented novel algorithms for approximate joint
[5] D. Lahat, J.-F. Cardoso, and H. Messer, “Joint Block
block–diagonalization. They differ in the cost function that
Diagonalization Algorithms for Optimal Separation of
is optimized. Matlab codes of the proposed algorithms are
Multidimensional Components”, F. Theis et al. (Eds.):
posted on the Internet at http://si.utia.cas.cz/downloadPT.htm .

LVA/ICA 2012, LNCS 7191, pp. 155-162, 2012.

[6] D. Nion, “A Tensor Framework for Nonunitary Joint
Block Diagonalization”,

*IEEE Trans. Signal Process-*
*ing*, vol. 59, no. 10, pp. 4585-4594, Oct. 2011.

The authors wish to give thanks to Dr. Dimitri Nion for send-
[7] D.-T. Pham , “Joint Approximate Diagonalization of
ing them a matlab code of his algorithm JBD NCG.

Positive Definite Hermitian Matrices”,

*SIAM J. MatrixAnal. and Appl. *vol. 22, pp. 1136-1152, 2001.

[8] P. Tichavsk´y, A. Yeredor, and Z. Koldovsk´y, “On Com-
putation of Approximate Joint Block-Diagonalization
[1] K. Abed-Meraim and A. Belouchrani, “Algorithms for
using Ordinary AJD”, F. Theis et al. (Eds.):

*LVA/ICA*
Joint Block Diagonalization”,

*Proc. of EUSIPCO *2004,

*2012*, LNCS 7191, pp. 163-171, 2012.

Vienna, Austria, pp. 209 - 212, 2004.

[9] P. Tichavsk´y and A. Yeredor , “Fast Approximate Joint
Diagonalization Incorporating Weight Matrices”,

*IEEE*
[2] L. de Lathauwer, “Decomposition of Higher-Order Ten-

*Tr. Signal Processing*, vol. 57, pp. 878-891, March 2009.

sor in Block Terms-Part II: Definitions and Unique-ness”,

*SIAM J. Matrix Anal. and Appl. *vol. 30, pp. 1033-1066, 2008.

Source: http://itakura.kes.tul.cz/zbynek/pubs/bloky.pdf

Pursuant to due call and notice thereof, the regularly scheduled meeting of the Spring Lake Park City Council was held on September 16, 2013 at the Spring Lake Park Community Center, 1301 81st Avenue N. E., at 7:00 P.M. 1. Call to Order Mayor Hansen called the meeting to order at 7:00 P.M. 2. Roll Call Members Present: Councilmembers Mason, Nash, Nelson, Raymond and Mayor Hansen Building Offici

The first publications about biphosphonates , which were initially named diphosphonates, were available in 1969, now more than 40 years ago. Biphosphonates are internalized by osteoclasts where they interfere with specific biochemical pathways (Russell, 2011). They enhance osteoclasts apoptosis and they inhibit osteoclasts attachment to the bone matrix (Dominguez et al., 2011). They are theref