Difference between revisions of "The Chain Rule for Functions of more than One Variable"

Latest revision as of 16:21, 20 January 2022

The generalization of the chain rule to multi-variable functions is rather technical. However, it is simpler to write in the case of functions of the form

f(g_{1}(x),\dots ,g_{k}(x)).

As this case occurs often in the study of functions of a single variable, it is worth describing it separately.
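
For reference, the rule in this case is usually stated as follows. This is a standard restatement rather than text from this page, and it assumes $f$ is differentiable at the point in question and each $g_i$ is differentiable at $x$; the names $u_1, \dots, u_k$ for the arguments of $f$ are introduced here only for illustration:

{\frac {d}{dx}}f(g_{1}(x),\dots ,g_{k}(x))=\sum _{i=1}^{k}{\frac {\partial f}{\partial u_{i}}}(g_{1}(x),\dots ,g_{k}(x))\,g_{i}'(x).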

General rule

The simplest way to write the chain rule in the general case is to use the total derivative, which is a linear transformation that captures all directional derivatives in a single formula. Consider differentiable functions $f : \mathbf{R}^m \to \mathbf{R}^k$ and $g : \mathbf{R}^n \to \mathbf{R}^m$, and a point $\mathbf{a}$ in $\mathbf{R}^n$. Let $D_{\mathbf{a}} g$ denote the total derivative of $g$ at $\mathbf{a}$ and $D_{g(\mathbf{a})} f$ denote the total derivative of $f$ at $g(\mathbf{a})$. These two derivatives are linear transformations $\mathbf{R}^n \to \mathbf{R}^m$ and $\mathbf{R}^m \to \mathbf{R}^k$, respectively, so they can be composed. The chain rule for total derivatives is that their composite is the total derivative of $f \circ g$ at $\mathbf{a}$:

D_{\mathbf {a} }(f\circ g)=D_{g(\mathbf {a} )}f\circ D_{\mathbf {a} }g,

or for short,

D(f\circ g)=Df\circ Dg.

The higher-dimensional chain rule can be proved using a technique similar to the second proof given above.

Because the total derivative is a linear transformation, the functions appearing in the formula can be rewritten as matrices. The matrix corresponding to a total derivative is called a Jacobian matrix, and the composite of two derivatives corresponds to the product of their Jacobian matrices. From this perspective the chain rule therefore says:

J_{f\circ g}(\mathbf {a} )=J_{f}(g(\mathbf {a} ))J_{g}(\mathbf {a} ),

or for short,

J_{f\circ g}=(J_{f}\circ g)J_{g}.

That is, the Jacobian of a composite function is the product of the Jacobians of the composed functions (evaluated at the appropriate points).
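
The matrix form of the rule can be checked numerically. The sketch below is only an illustration: the maps `f` and `g`, the point `a`, and the helper names `jacobian` and `matmul` are all hypothetical choices, and the Jacobians are approximated by central differences rather than computed exactly.

```python
import math

def jacobian(func, point, h=1e-6):
    """Approximate the Jacobian of func at point by central differences."""
    rows, cols = len(func(point)), len(point)
    jac = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        forward, backward = list(point), list(point)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = func(forward), func(backward)
        for i in range(rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical example maps: g : R^2 -> R^3 and f : R^3 -> R^2.
def g(x):
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def f(u):
    return [u[0] + u[1] * u[2], math.exp(u[0]) * u[2]]

def compose(x):
    return f(g(x))

a = [0.5, -1.2]
lhs = jacobian(compose, a)                        # Jacobian of f o g at a (2 x 2)
rhs = matmul(jacobian(f, g(a)), jacobian(g, a))   # (2 x 3) times (3 x 2)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two sides agree only up to finite-difference error, so `max_gap` should be small but not exactly zero. Note that the Jacobian of `f` is evaluated at `g(a)`, not at `a`.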

The higher-dimensional chain rule is a generalization of the one-dimensional chain rule. If $k$, $m$, and $n$ are all 1, so that $f : \mathbf{R} \to \mathbf{R}$ and $g : \mathbf{R} \to \mathbf{R}$, then the Jacobian matrices of $f$ and $g$ are $1 \times 1$. Specifically, they are:

{\begin{aligned}J_{g}(a)&={\begin{pmatrix}g'(a)\end{pmatrix}},\\J_{f}(g(a))&={\begin{pmatrix}f'(g(a))\end{pmatrix}}.\end{aligned}}

The Jacobian of $f \circ g$ is the product of these $1 \times 1$ matrices, so it is $f'(g(a)) \cdot g'(a)$, as expected from the one-dimensional chain rule. In the language of linear transformations, $D_a g$ is the function that scales a vector by a factor of $g'(a)$ and $D_{g(a)} f$ is the function that scales a vector by a factor of $f'(g(a))$. The chain rule says that the composite of these two linear transformations is the linear transformation $D_a(f \circ g)$, and therefore it is the function that scales a vector by $f'(g(a)) \cdot g'(a)$.
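
The scaling picture can be written out directly. In this minimal sketch the functions $f(u) = \sin(u)$ and $g(x) = x^2$, the point `a`, and the names `D_g`, `D_f`, and `D_composite` are assumptions chosen for illustration; each total derivative is modelled as a function that multiplies its input by a constant.

```python
import math

# Hypothetical example functions (not from the text): f(u) = sin(u), g(x) = x^2.
def f_prime(u):
    return math.cos(u)

def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

a = 0.7

def D_g(v):
    """Total derivative of g at a: scales v by g'(a)."""
    return g_prime(a) * v

def D_f(v):
    """Total derivative of f at g(a): scales v by f'(g(a))."""
    return f_prime(g(a)) * v

def D_composite(v):
    """Composite of the two linear maps, applied right to left."""
    return D_f(D_g(v))

v = 3.0
scaled = D_composite(v)
expected = f_prime(g(a)) * g_prime(a) * v
print(scaled, expected)
```

Composing the two scalings multiplies their factors, which is the one-dimensional chain rule.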

Another way of writing the chain rule is used when $f$ and $g$ are expressed in terms of their components as $\mathbf{y} = f(\mathbf{u}) = (f_1(\mathbf{u}), \dots, f_k(\mathbf{u}))$ and $\mathbf{u} = g(\mathbf{x}) = (g_1(\mathbf{x}), \dots, g_m(\mathbf{x}))$. In this case, the above rule for Jacobian matrices is usually written as:

{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (x_{1},\ldots ,x_{n})}}={\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (u_{1},\ldots ,u_{m})}}{\frac {\partial (u_{1},\ldots ,u_{m})}{\partial (x_{1},\ldots ,x_{n})}}.

The chain rule for total derivatives implies a chain rule for partial derivatives. Recall that when the total derivative exists, the partial derivative in the ith coordinate direction is found by multiplying the Jacobian matrix by the ith basis vector. By doing this to the formula above, we find:

{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial x_{i}}}={\frac {\partial (y_{1},\ldots ,y_{k})}{\partial (u_{1},\ldots ,u_{m})}}{\frac {\partial (u_{1},\ldots ,u_{m})}{\partial x_{i}}}.

Since the entries of the Jacobian matrix are partial derivatives, we may simplify the above formula to get:

{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial (y_{1},\ldots ,y_{k})}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}}.

More conceptually, this rule expresses the fact that a change in the $x_i$ direction may change all of $g_1$ through $g_m$, and any of these changes may affect $f$.
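
Read one output component at a time, the vector formula above is the following scalar identity. The index $j$ is introduced here only for illustration, and the second line spells out the sum for the assumed case $m = 2$:

{\frac {\partial y_{j}}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial y_{j}}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}},\qquad j=1,\ldots ,k.

{\frac {\partial y_{j}}{\partial x_{i}}}={\frac {\partial y_{j}}{\partial u_{1}}}{\frac {\partial u_{1}}{\partial x_{i}}}+{\frac {\partial y_{j}}{\partial u_{2}}}{\frac {\partial u_{2}}{\partial x_{i}}}\qquad (m=2).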

In the special case where $k = 1$, so that $f$ is a real-valued function, this formula simplifies even further:

{\frac {\partial y}{\partial x_{i}}}=\sum _{\ell =1}^{m}{\frac {\partial y}{\partial u_{\ell }}}{\frac {\partial u_{\ell }}{\partial x_{i}}}.

This can be rewritten as a dot product. Recalling that $\mathbf{u} = (g_1, \dots, g_m)$, the partial derivative $\partial \mathbf{u} / \partial x_i$ is also a vector, and the chain rule says that:

{\frac {\partial y}{\partial x_{i}}}=\nabla y\cdot {\frac {\partial \mathbf {u} }{\partial x_{i}}}.
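
As a small sketch of the dot-product form, take the hypothetical functions $y = f(u_1, u_2) = u_1 u_2$ and $\mathbf{u} = g(x_1, x_2) = (x_1 + x_2,\ x_1 x_2)$. These functions, the evaluation point, and the helper names are assumptions made for illustration; the gradient and the partial derivative vector are written out by hand.

```python
# Hypothetical example: y = u1 * u2 with u = (x1 + x2, x1 * x2).
def g(x1, x2):
    return (x1 + x2, x1 * x2)

def grad_f(u1, u2):
    """Gradient of f(u1, u2) = u1 * u2 with respect to (u1, u2)."""
    return (u2, u1)

def du_dx1(x1, x2):
    """Partial derivative of the vector u with respect to x1."""
    return (1.0, x2)

x1, x2 = 1.5, -0.5
u = g(x1, x2)

# Chain rule as a dot product: grad y . (du/dx1).
dot = sum(p * q for p, q in zip(grad_f(*u), du_dx1(x1, x2)))

# Direct differentiation of y = x1^2 * x2 + x1 * x2^2 with respect to x1.
direct = 2 * x1 * x2 + x2 ** 2
print(dot, direct)
```

Both routes give the same value, because expanding $y$ in terms of $x_1$ and $x_2$ and differentiating directly must agree with the chain rule.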

Example

Given $u(x, y) = x^2 + 2y$ where $x(r, t) = r \sin(t)$ and $y(r, t) = \sin^2(t)$, determine the value of $\partial u / \partial r$ and $\partial u / \partial t$ using the chain rule.

{\frac {\partial u}{\partial r}}={\frac {\partial u}{\partial x}}{\frac {\partial x}{\partial r}}+{\frac {\partial u}{\partial y}}{\frac {\partial y}{\partial r}}=(2x)(\sin(t))+(2)(0)=2r\sin ^{2}(t),

and

{\begin{aligned}{\frac {\partial u}{\partial t}}&={\frac {\partial u}{\partial x}}{\frac {\partial x}{\partial t}}+{\frac {\partial u}{\partial y}}{\frac {\partial y}{\partial t}}\\&=(2x)(r\cos(t))+(2)(2\sin(t)\cos(t))\\&=(2r\sin(t))(r\cos(t))+4\sin(t)\cos(t)\\&=2(r^{2}+2)\sin(t)\cos(t)\\&=(r^{2}+2)\sin(2t).\end{aligned}}
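
The two results above can be sanity-checked numerically. The sketch below compares the closed forms with central-difference approximations at one arbitrarily chosen point; the point `(r, t)`, the step `h`, and the function names are assumptions for illustration only.

```python
import math

def u_of(r, t):
    """u(x, y) = x^2 + 2y with x = r sin(t) and y = sin^2(t)."""
    x = r * math.sin(t)
    y = math.sin(t) ** 2
    return x ** 2 + 2 * y

def du_dr(r, t):
    """Closed form from the example: 2 r sin^2(t)."""
    return 2 * r * math.sin(t) ** 2

def du_dt(r, t):
    """Closed form from the example: (r^2 + 2) sin(2t)."""
    return (r ** 2 + 2) * math.sin(2 * t)

r, t, h = 1.3, 0.4, 1e-6
num_dr = (u_of(r + h, t) - u_of(r - h, t)) / (2 * h)
num_dt = (u_of(r, t + h) - u_of(r, t - h)) / (2 * h)
print(abs(num_dr - du_dr(r, t)), abs(num_dt - du_dt(r, t)))
```

Agreement at a single point does not prove the formulas, but a mismatch would flag an algebra slip.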

Higher derivatives of multivariable functions

Faà di Bruno's formula for higher-order derivatives of single-variable functions generalizes to the multivariable case. If $y = f(\mathbf{u})$ is a function of $\mathbf{u} = g(\mathbf{x})$ as above, then the second derivative of $f \circ g$ is:

{\frac {\partial ^{2}y}{\partial x_{i}\partial x_{j}}}=\sum _{k}\left({\frac {\partial y}{\partial u_{k}}}{\frac {\partial ^{2}u_{k}}{\partial x_{i}\partial x_{j}}}\right)+\sum _{k,\ell }\left({\frac {\partial ^{2}y}{\partial u_{k}\partial u_{\ell }}}{\frac {\partial u_{k}}{\partial x_{i}}}{\frac {\partial u_{\ell }}{\partial x_{j}}}\right).
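
A minimal check of the second-derivative formula, assuming a single input variable $x$ (so $i = j$) and two intermediate variables. The functions $f(u_1, u_2) = u_1^2 u_2$, $u_1 = \sin(x)$, $u_2 = x^2$ and the evaluation point are hypothetical choices; all derivatives are written out by hand and the result is compared with the second derivative of $y = x^2 \sin^2(x)$ computed directly.

```python
import math

x = 0.8

# Intermediate variables and their first and second derivatives in x.
u = (math.sin(x), x ** 2)
du = (math.cos(x), 2 * x)
d2u = (-math.sin(x), 2.0)

# First and second partial derivatives of f(u1, u2) = u1^2 * u2.
grad = (2 * u[0] * u[1], u[0] ** 2)
hess = ((2 * u[1], 2 * u[0]),
        (2 * u[0], 0.0))

# First sum: (dy/du_k) * (d^2 u_k / dx^2), summed over k.
first_sum = sum(grad[k] * d2u[k] for k in range(2))

# Second sum: (d^2 y / du_k du_l) * (du_k/dx) * (du_l/dx), summed over k and l.
second_sum = sum(hess[k][l] * du[k] * du[l]
                 for k in range(2) for l in range(2))

formula = first_sum + second_sum

# Direct second derivative of y = x^2 * sin^2(x).
direct = 4 * x * math.sin(2 * x) + 2 * x ** 2 * math.cos(2 * x) + 2 * math.sin(x) ** 2
print(formula, direct)
```

The double sum runs over all ordered pairs $(k, \ell)$, so each mixed partial derivative is counted twice, as the formula requires.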

Resources

Chain Rule, WikiBooks: Multivariable Calculus
Multivariable Chain Rule, Harvey Mudd College
Chain Rule With Partial Derivatives - Multivariable Calculus Video by The Organic Chemistry Tutor 2019

Licensing

Content obtained and/or adapted from:

Chain rule, Wikipedia under a CC BY-SA license
