232x Filetype PDF File size 0.21 MB Source: paulklein.ca
MatrixCalculus-NotesontheDerivativeofaTrace
JohannesTraa
This write-up elucidates the rules of matrix calculus for expressions involving the trace
of a function of a matrix X:
f =tr£g X ¤ . (1)
( )
Wewouldliketotakethederivativeof f withrespecttoX:
∂f =? . (2)
∂X
One strategy is to write the trace expression as a scalar using index notation, take the
derivative, and re-write in matrix form. An easier way is to reduce the problem to one or
moresmallerproblemswheretheresultsforsimplerderivativescanbeapplied. It’sbrute-
forcevsbottom-up.
MATRIX-VALUEDDERIVATIVE
Thederivativeofascalar f withrespecttoamatrixX∈RM×N canbewrittenas:
1
∂f ∂f ··· ∂f
∂X11 ∂X12 ∂X1N
∂f ∂f ∂f
∂f ∂X ∂X ··· ∂X
= 21 22 2N (3)
∂X . . . .
. . .. .
. . .
∂f ∂f ··· ∂f
∂XM1 ∂XM2 ∂XMN
So, the result is the same size as X.
MATRIXANDINDEXNOTATION
It is useful to be able to convert between matrix notation and index notation. For example,
theproductABhaselements:
[AB]ik =XAijBjk , (4)
j
andthematrixproductABCT haselements:
£ T¤ X £ T¤ X X XX
ABC il = j Aij BC jl = j Aij k BjkClk = j k AijBjkClk . (5)
FIRST-ORDER DERIVATIVES
EXAMPLE1
Considerthisexample:
f =tr AXB . (6)
[ ]
Wecanwritethisusingindexnotationas:
f =X AXB =XXA XB =XXA XX B =XXXA X B . (7)
[ ]ii i j [ ]ji i j jk ki i j jk ki
i i j i j k i j k
Takingthederivativewithrespectto Xjk,weget:
∂f =XA B = BA . (8)
∂X i j ki [ ]kj
jk i
The result has to be the same size as X, so we know that the indices of the rows and
columnsmustbe j andk,respectively. This means we have to transpose the result above
towritethederivativeinmatrixformas:
∂tr AXB
[ ] T T
∂X =A B . (9)
2
EXAMPLE2
Similarly, we have:
£ T ¤ XXX
f =tr AX B = i j k AijXkjBki , (10)
sothatthederivativeis:
∂f =XA B = BA , (11)
∂X i j ki [ ]kj
kj i
TheXtermappearsin(10)withindiceskj,soweneedtowritethederivativeinmatrix
formsuchthatk istherowindexand j isthecolumnindex. Thus,wehave:
£ T ¤
∂tr AX B =BA . (12)
∂X
MULTIPLE-ORDER
Nowconsideramorecomplicatedexample:
£ T¤
f =tr AXBXC (13)
=XXXXXAijXjkBklXlmCim . (14)
i j k l m
ThederivativehascontributionsfrombothappearancesofX.
TAKE1
Inindexnotation:
∂f XXX £ T ¤
∂Xjk = i l m AijBklXlmCim = BXC A kj , (15)
∂f XXX £ T ¤
∂Xlm = i j k AijXjkBklCim = C AXB ml . (16)
Transposingappropriatelyandsummingthetermstogether,wehave:
£ T¤
∂tr AXBXC T T T T T T
∂X =A CX B +B X A C . (17)
TAKE2
Wecanskipthistediousprocessbyapplying(9)foreachappearanceofX:
£ T¤ £ T¤
∂tr AXBXC ∂tr[AXD] ∂tr EXC T T T
∂X = ∂X + ∂X =A D +E C . (18)
3
where D=BXCT andE=AXB. Sowejustevaluatethematrixderivativeforeachappear-
anceofXassumingthateverythingelseisaconstant(includingotherX’s). Toseewhythis
rule is useful, consider the following beast:
£ T T ¤
f =tr AXX BCX XC . (19)
Wecanimmediatelywritedownthederivativeusing(9)and(12):
£ T T ¤ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢
∂tr AXX BCX XC T T T T T T T T T T
= A X BCX XC + BCX XC AX + XC AXX BC + AXX BCX C
∂X ( ) ( ) ( ) ( )
(20)
T T T T T T T T T T T
=AC X XC B X+BCX XCAX+XCAXX BC+XC B XX A C . (21)
FROBENIUSNORM
TheFrobeniusnormshowsupwhenwehaveanoptimizationprobleminvolvingamatrix
factorization andwewanttominimizeasum-of-squareserrorcriterion:
à !
XX X 2 £ ¤
2 T
f = X − W H =kX−WHk =tr X−WH X−WH . (22)
ik i j jk F ( )( )
i k j
Wecanworkwiththeexpressioninindexnotation, but it’s easier to work directly with
matricesandapplytheresultsderivedearlier. Supposewewanttofindthederivativewith
respecttoW. Expandingthematrixouterproduct,wehave:
£ T¤ £ T T¤ £ T¤ £ T T¤
f =tr XX −tr XH W −tr WHX +tr WHH W . (23)
Applying(9)and(12),weeasilydeducethat:
∂tr£ X−WH X−WHT¤
( )( ) T T
∂W =−2XH +2WHH . (24)
LOG
Considerthistraceexpression:
£ T ¤ XX µXX ¶
f =tr V log AXB = V log A X B . (25)
( ) i j im mn nj
i j m n
Takingthederivativewithrespectto Xmn,weget:
∂f XX Vij XX µ V ¶
∂Xmn = i j PPAimXmnBnjAimBnj = i j Aim AXB ijBnj . (26)
m n
Thus:
4
no reviews yet
Please Login to review.