Mollification and the Product Rule


In the abstract study of partial differential equations, we oftentimes throw the concept of weak derivatives around as if they acted just like classical derivatives. One such example is the product rule. For example, when studying evolution equations (time-dependent partial differential equations such as the Navier-Stokes equations), books blow past the following equality

\displaystyle (u_t,u)=\frac{1}{2}\frac{d}{dt}|u|^2

where (\cdot,\cdot) is some sort of inner product or integration and |\cdot| is some sort of associated norm. What really happened is a coupling of a weak product rule and differentiating under the integral sign. But, to make these assumptions, we are often thinking of a “smoothed” version of the equation, where we have replaced all the nasty functions (often lying in some Hilbert space or Banach space) with appropriate “smoothified” cousins.

What does it mean for the weak derivative to respect the product rule? That is, for weakly differentiable functions u and v with weak derivatives u' and v', respectively, is it the case that the weak derivative of uv, their product, exists? If so, is it given by

\displaystyle (uv)'=u'v+uv'?

In fact, if only one of them is smooth, we can easily show that, yes, this is the case. What do we mean by smooth?

Definition: Let \Omega\subseteq\mathbb{R}^n be open. We say that v:\Omega\rightarrow\mathbb{R} is smooth if it is infinitely differentiable (in a classical sense). In this case, we will write that v\in C^\infty(\Omega).

Theorem 1: Let \Omega\subseteq\mathbb{R}^n be open. Let u:\Omega\rightarrow\mathbb{R} be weakly differentiable, and let v\in C^\infty(\Omega). Then, their product uv is weakly differentiable. Moreover, the weak derivative of uv is given by

\displaystyle (uv)'=u'v+uv'.

Proof: Let \phi\in C^\infty_0(\Omega) be a test function. Then, we have that v\phi\in C^\infty_0(\Omega). Moreover, by the classical product rule,

\displaystyle (v\phi)'=v'\phi+v\phi'.

Thus, multiplying this by u, integrating over \Omega, and rearranging, we have that

\displaystyle \int_\Omega uv\phi' dx = \int_\Omega u(v\phi)'dx-\int_\Omega uv'\phi dx.

Using the weak differentiability of u on the first term on the right-hand side, we see that

\displaystyle\int_\Omega uv\phi' dx=-\int_\Omega u'v\phi dx -\int_\Omega uv'\phi dx = -\int_\Omega(u'v+uv')\phi dx

and the theorem is proved.


However, we don’t need to assume that v is smooth. We just have to replace it with something smooth and take limits. That is the magic of mollification.


Mollification is the process of using convolution to replace a function with a smooth version of it which has nice limiting properties. The function we are convolving against is known as a mollifier.

Definition: A function m:\mathbb{R}^n\rightarrow\mathbb{R} is called a mollifier if

  1. m\in C^\infty_0(\mathbb{R}^n).
  2. \displaystyle\int_{\mathbb{R}^n}m(x)dx = 1.
  3. For each integrable u,

\displaystyle\int_{\mathbb{R}^n}u(y)\frac{1}{h^n}m\left(\frac{x-y}{h}\right)dy\rightarrow u(x) as h\rightarrow 0.

We will use a function called the standard mollifier. That is, m:\mathbb{R}^n\rightarrow\mathbb{R} given by

m(x):=\left\{\begin{array}{ll} c\exp\left(\frac{1}{|x|^2-1}\right) & |x|< 1 \\ 0 & |x|\ge 1\end{array}\right.

where c is chosen so that \int m(x) dx=1 (in one dimension, c\approx 2.25). We have not established property 3 in the definition of a mollifier. That, we will leave until later. In one dimension, the graph of m is a smooth bump supported on [-1,1] with a single maximum of c/e at the origin.


Now, consider the rescaled version of m given by m_h(x):=\frac{1}{h^n}m(x/h). In one dimension (say h=0.1), its graph is the same bump compressed to the interval [-h,h] and stretched vertically by the factor 1/h.


Using this rescaling, we have that m_h has support in \{|x|\le h\}, and using a simple change of variables (x\mapsto hx),

\displaystyle\int_{\mathbb{R}^n}m_h(x) dx = \int_{\mathbb{R}^n}m(x) dx =1.
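
As a quick numerical sanity check (this sketch is my own addition, not part of the original argument, and assumes NumPy), we can approximate the normalizing constant c by a Riemann sum and confirm that the rescaling m_h preserves total mass:

```python
import numpy as np

def bump(x):
    """Unnormalized standard mollifier in one dimension, supported on [-1, 1]."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 / (x[inside] ** 2 - 1.0))
    return out

# Choose c so that the integral of m = c * bump is 1 (Riemann sum on a fine grid).
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]
c = 1.0 / (np.sum(bump(x)) * dx)
print(round(c, 2))  # ≈ 2.25, the one-dimensional constant quoted above

def m_h(x, h):
    """Rescaled mollifier m_h(x) = (1/h) m(x/h); its support shrinks to [-h, h]."""
    return c * bump(np.asarray(x) / h) / h

# The change of variables x -> hx preserves total mass: each m_h integrates to 1.
for h in (1.0, 0.5, 0.1):
    grid = np.linspace(-h, h, 200001)
    mass = np.sum(m_h(grid, h)) * (grid[1] - grid[0])
    assert abs(mass - 1.0) < 1e-3
```

Note that c is computed rather than hard-coded, so the snippet doubles as a check on the constant quoted in the text.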

Definition: Let \Omega\subseteq\mathbb{R}^n be open. Let u:\Omega\rightarrow\mathbb{R} be integrable. The mollification of u, written u_h, is given by

u_h(x):=\int_\Omega m_h(x-y)u(y) dy for x\in\Omega, h<dist(x,\partial\Omega).

Note that using differentiation under the integral sign, u_h is smooth. We do this by passing the derivatives under the integral sign and onto the mollifier (which is smooth). That is, for any multi-index \alpha,

\displaystyle D^\alpha u_h(x)=D^\alpha\frac{1}{h^n}\int_\Omega m\left(\frac{x-y}{h}\right)u(y) dy = \frac{1}{h^n}\int_\Omega D^\alpha_x m\left(\frac{x-y}{h}\right)u(y) dy.

Convergence Results

Next, we investigate the convergence of u_h to u as h\rightarrow 0. As usual, we start with the continuous case.

Theorem 2: Let u\in C(\Omega). Then, u_h\rightarrow u as h\rightarrow 0 uniformly on compact subsets of \Omega.

Proof: Let K\subset\Omega be compact. Let h<dist(K,\partial\Omega)/2 (to make sense of the following integrals). Note that using the fact that supp m_h=\{|x|\le h\}, we have that

\displaystyle u_h(x)=\int_{|x-y|\le h}m_h(x-y)u(y) dy.

Since \int_{|x-y|\le h} m_h(x-y) dy=1 (seen via the change of variables y\mapsto x-hy), we have that u(x)=\int_{|x-y|\le h}m_h(x-y)u(x) dy, and so

\displaystyle u(x)-u_h(x)=\int_{|x-y|\le h} m_h(x-y)[u(x)-u(y)]dy.

Thus, taking absolute values, we have that

\displaystyle |u(x)-u_h(x)|\le \sup_{|x-y|\le h}|u(x)-u(y)|.

Since K is compact, u is uniformly continuous on K. Thus, for any \epsilon>0, there is a sufficiently small h so that

\displaystyle\sup_{x\in K}|u(x)-u_h(x)|<\epsilon.
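
To see Theorem 2 in action, here is a small NumPy experiment (my own illustration, not from the original text): mollify u(x)=\sin(x) and watch the uniform error on a compact set shrink as h\rightarrow 0. The substitution y\mapsto x-ht rewrites u_h(x) as \int_{-1}^1 m(t)u(x-ht) dt, which is what the code discretizes:

```python
import numpy as np

def mollify(u, xs, h, n=4001):
    """Approximate u_h(x) = ∫_{-1}^{1} m(t) u(x - h t) dt at each x in xs."""
    t = np.linspace(-1 + 1e-9, 1 - 1e-9, n)   # stay inside the support of m
    w = np.exp(1.0 / (t ** 2 - 1.0))          # unnormalized bump at the nodes
    w /= w.sum()                              # discrete normalization: sum(w) = 1
    return np.array([np.dot(w, u(x - h * t)) for x in xs])

u = np.sin
K = np.linspace(0.0, np.pi, 200)              # a compact subset of the domain
errs = [float(np.max(np.abs(mollify(u, K, h) - u(K)))) for h in (0.5, 0.1, 0.02)]
print(errs)
assert errs[0] > errs[1] > errs[2]            # uniform error on K decreases with h
assert errs[2] < 1e-3
```

The discrete normalization of the weights sidesteps the quadrature constant entirely; only the shape of the bump matters here.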


Now, we extend to more general functions. See this article for a refresher on L^p spaces.

Theorem 3: Let u\in L^p(\Omega) for some open set \Omega\subseteq \mathbb{R}^n and 1\le p<\infty. Then, u_h\in L^p(\Omega), and u_h\rightarrow u in L^p(\Omega).

Proof: If we extend u to zero outside of \Omega, we may assume without loss of generality that \Omega=\mathbb{R}^n. We will start by showing that

\displaystyle |u_h|_{L^p}\le|u|_{L^p}.

First, using the change of variables y\mapsto x-hy and the splitting m(y)=m(y)^{1/q}m(y)^{1/p}, with q the conjugate exponent of p (1/p+1/q=1),

\displaystyle |u_h(x)|^p\le\left(\int_{|y|\le 1} m(y)|u(x-hy)| dy\right)^p=\left(\int_{|y|\le 1}m(y)^{1/q}m(y)^{1/p}|u(x-hy)|dy\right)^p.

Using an application of Hölder’s inequality, this becomes

\displaystyle |u_h(x)|^p\le\left(\int_{|y|\le 1}m(y) dy\right)^{p/q}\left(\int_{|y|\le 1}m(y)|u(x-hy)|^p dy\right).

Therefore, by Fubini’s theorem,

\displaystyle \int_{\mathbb{R}^n}|u_h(x)|^p dx \le \int_{|y|\le 1} m(y)\int_{\mathbb{R}^n}|u(x-hy)|^p dx dy=|u|^p_{L^p}\int_{|y|\le 1} m(y)dy

by the shift-invariance of the integral. Thus, we use that \int m(y) dy=1 to finish the proof that u_h\in L^p with |u_h|_{L^p}\le|u|_{L^p}.

Let \epsilon>0. Using the density of C_0(\mathbb{R}^n) in L^p(\mathbb{R}^n), we can find a \phi\in C_0(\mathbb{R}^n) so that |\phi-u|_{L^p}\le \epsilon/3. This result can be found in any measure theory book. For instance, Donald Cohn’s Measure Theory. So, by the results above, we get that

|u_h-u|_{L^p}\le|u_h-\phi_h|_{L^p}+|\phi_h-\phi|_{L^p}+|\phi-u|_{L^p}\le 2|u-\phi|_{L^p}+|\phi_h-\phi|_{L^p}

since u_h-\phi_h=(u-\phi)_h, as the reader can easily check, so that |u_h-\phi_h|_{L^p}\le|u-\phi|_{L^p} by the first part of the proof. Letting h\rightarrow 0 gives us that |\phi_h-\phi|_{L^p}\rightarrow0 using Theorem 2 (note that \phi is uniformly continuous and the supports of \phi_h stay in a common compact set). So, for small enough h, the right-hand side is less than \epsilon.
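
Theorem 3 can be illustrated the same way (again a sketch of my own, assuming NumPy): mollify a step function, which lies in L^1 but is not continuous, and watch the L^1 error shrink:

```python
import numpy as np

def mollify(u, xs, h, n=2001):
    """u_h via u_h(x) = ∫_{-1}^{1} m(t) u(x - h t) dt, discretely normalized."""
    t = np.linspace(-1 + 1e-9, 1 - 1e-9, n)
    w = np.exp(1.0 / (t ** 2 - 1.0))
    w /= w.sum()
    return np.array([np.dot(w, u(x - h * t)) for x in xs])

u = lambda x: ((0.0 <= x) & (x <= 1.0)).astype(float)   # indicator of [0, 1]
xs = np.linspace(-1.0, 2.0, 6001)
dx = xs[1] - xs[0]
l1 = [float(np.sum(np.abs(mollify(u, xs, h) - u(xs))) * dx)
      for h in (0.4, 0.1, 0.025)]
print(l1)
assert l1[0] > l1[1] > l1[2]    # ||u_h - u||_{L^1} -> 0 even though u has jumps
```

The error concentrates in bands of width 2h around the two jumps, so it decays linearly in h; no uniform convergence can hold near the jumps, which is exactly why the L^p statement is the right one.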


The Product Rule

We are almost ready to prove the product rule for weak derivatives. We just need to know how mollification and weak derivatives interact.

Theorem 4: Let u:\Omega\rightarrow\mathbb{R} have weak derivative D^\alpha u for some multi-index \alpha. Then,

D^\alpha u_h(x)=(D^\alpha u)_h(x).

Note that the left-hand side of the preceding equation is the derivative of the smooth function u_h in the classical sense, and the right-hand side is the mollification of the weak derivative D^\alpha u. In the below proof, we will specify that D^\alpha_x is the (weak) derivative in the x-variable, and D^\alpha_y is the (weak) derivative in the y-variable when such ambiguities must be dealt with.

Proof: Using differentiation under the integral sign, we have that

\displaystyle D^\alpha u_h(x) =\int_\Omega \left(D^\alpha_x m_h(x-y)\right)u(y) dy=(-1)^{|\alpha|}\int_\Omega \left( D^\alpha_y m_h(x-y)\right) u(y) dy

by the chain rule in the classical sense. Note that D^\alpha_y m_h(x-y)\in C^\infty_0(\mathbb{R}^n). So, using the weak derivative property of u, we get that

\displaystyle D^\alpha u_h(x)=\int_\Omega m_h(x-y)D^\alpha_y u(y) dy=(D^\alpha u)_h(x).
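
Theorem 4 is easy to test numerically (a sketch of mine, assuming NumPy): take u(x)=|x|, whose weak derivative is \mathrm{sign}(x), differentiate the mollification u_h by finite differences, and compare against the mollification of the weak derivative:

```python
import numpy as np

t = np.linspace(-1 + 1e-9, 1 - 1e-9, 4001)    # quadrature nodes inside supp(m)
w = np.exp(1.0 / (t ** 2 - 1.0))
w /= w.sum()                                   # discretely normalized mollifier weights

def mollify(u, xs, h):
    return np.array([np.dot(w, u(x - h * t)) for x in xs])

h = 0.1
xs = np.linspace(-0.5, 0.5, 1001)
u_h = mollify(np.abs, xs, h)                   # smooth mollification of u(x) = |x|
D_u_h = np.gradient(u_h, xs[1] - xs[0])        # classical derivative of u_h
Du_h = mollify(np.sign, xs, h)                 # mollification of the weak derivative
err = float(np.max(np.abs(D_u_h[1:-1] - Du_h[1:-1])))  # drop one-sided endpoints
print(err)
assert err < 1e-3                              # D(u_h) = (Du)_h, as Theorem 4 predicts
```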


Now, we have all the machinery necessary to prove our initial theorem for the product rule. Note that we have to make the assumption that the integral of the product makes sense.

Theorem 5: Let u,v:\Omega\rightarrow\mathbb{R} be weakly differentiable. Then, if the product uv is integrable, it is weakly differentiable with

\displaystyle (uv)'=u'v+uv'.

Proof: Replacing v with v_h, the mollified version of v, we can use Theorem 1 to get that

\displaystyle (uv_h)'=u'v_h+u(v_h)'.

By Theorem 4, (v_h)'=(v')_h. The right-hand side then converges to u'v+uv' using Theorem 3. Also, we can show that uv_h\rightarrow uv in L^1 using the fact that v_h\rightarrow v in L^1 implies that some subsequence v_{h_n}\rightarrow v almost everywhere (see any introductory book on measure theory). This, together with the dominated convergence theorem, gives the result.
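
As a closing check on the product rule (my own numerical sketch, assuming NumPy), take u=v=|x| on (-1,1); the theorem predicts the weak derivative (uv)'=u'v+uv'=2x, and we can test the defining identity \int uv\,\phi' dx=-\int(u'v+uv')\phi dx against a concrete test function:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 400001)
dx = xs[1] - xs[0]

u = np.abs(xs)                                  # u = v = |x|, weakly differentiable
du = np.sign(xs)                                # weak derivative of |x|

# A smooth test function supported in (-0.9, 0.9): a rescaled bump.
s2 = np.minimum(xs ** 2 / 0.81, 1 - 1e-12)      # clip so the exponent stays finite
phi = np.where(np.abs(xs) < 0.9, np.exp(1.0 / (s2 - 1.0)), 0.0)
dphi = np.gradient(phi, dx)                     # phi is smooth, so this is accurate

lhs = float(np.sum(u * u * dphi) * dx)          # ∫ (uv) φ' dx
rhs = float(-np.sum((du * u + u * du) * phi) * dx)  # -∫ (u'v + uv') φ dx
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-4
```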


Posted in PDE Theory

Differentiating Under the Integral Sign


A few natural questions arise when we first encounter the weak derivative. Does the product rule hold? How about the chain rule? In order to answer these questions, we will need some more analytical machinery. The first topic is the concept of “differentiating under the integral sign.” The question is this: suppose f:\mathbb{R}^2\rightarrow\mathbb{R} is given. When does

\displaystyle\frac{d}{dx}\int_a^b f(x,t)dt=\int_a^b\frac{\partial}{\partial x}f(x,t) dt?

When can we interchange the operations of integrating and differentiating? Let’s start, as we usually do, with the easy case (just like in elementary calculus, where we seemed to interchange these operations like “magic”).

The Continuously Differentiable Version

Theorem 1: Let f\in C^1(\mathbb{R}^{n+1}). That is, f has continuous partial derivatives. Let a<b\in\mathbb{R}. Then, for any i\in\{1,\dots,n\},

\displaystyle\frac{\partial}{\partial x_i}\int_a^b f(x,t)dt = \int_a^b\frac{\partial}{\partial x_i} f(x,t) dt.

Proof: Without loss of generality (WLOG), let i=1. The other possibilities for i are similar. Fix \hat{x}:=(x_2,\dots,x_n) and define the function F:\mathbb{R}\rightarrow \mathbb{R} by F(\xi):=\int_a^b f(\xi,\hat{x},t) dt. Then, by the fundamental theorem of calculus, since the partial derivatives are continuous,

\displaystyle\int_a^b\int_0^{x_1}\frac{\partial}{\partial y} f(y,\hat{x},t)dydt=\int_a^b f(x_1,\hat{x},t)-f(0,\hat{x},t) dt=F(x_1)-F(0).

Switching the order of integration on the first integral, and differentiating, we again use the fundamental theorem of calculus (Lebesgue Differentiation!) to say that the left-hand side of the above becomes

\displaystyle\frac{\partial}{\partial x_1}\int_0^{x_1} \int_a^b \frac{\partial}{\partial y} f(y,\hat{x},t)dtdy=\int_a^b\frac{\partial}{\partial x_1}f(x_1,\hat{x},t)dt.

The right-hand side gives us simply

\displaystyle\frac{\partial}{\partial x_1} F(x_1)=\frac{\partial}{\partial x_1}\int_a^b f(x_1,\hat{x},t)dt,

and we are done.


The Measure Theoretic Version

Next, we take the usual trip into measure theory. The following theorem is taken mostly from Folland’s Real Analysis: Modern Techniques and their Applications.

Theorem 2: Let (X,\mu) be a measure space, and let \Omega\subseteq\mathbb{R}^n be open. Let f:X\times\Omega\rightarrow\mathbb{R} be integrable in x\in X for each t\in\Omega. Moreover, suppose that for some i\in\{1,\dots,n\}, the partial derivative \frac{\partial}{\partial t_i} f(x,t) exists for each x\in X and t\in\Omega, and that there is some integrable g with \left|\frac{\partial}{\partial t_i}f(x,t)\right|\le g(x) for each t\in\Omega. Then, for each t\in\Omega,

\displaystyle\frac{\partial}{\partial t_i}\int_X f(x,t_1,\dots,t_n)d\mu=\int_X \frac{\partial}{\partial t_i}f(x,t_1,\dots,t_n)d\mu.

Proof: For notational simplicity, we will assume that \Omega\subseteq\mathbb{R}. We will leave it to the reader to extrapolate to the case above. This theorem is a classic exercise in the use of dominated convergence. We just need to set everything up the right way. We want that

\displaystyle\frac{d}{ dt}\int_X f(x,t)d\mu=\lim_{h\rightarrow0}\frac{1}{h}\int_X f(x,t+h)-f(x,t)d\mu

and then to pass the limit underneath the integral. This only requires the existence of a dominating integrable function. By the mean value theorem (or multi-dimensional mean-value theorem if we are using t\in\Omega\subseteq\mathbb{R}^n), there is some c\in(t,t+h) so that

\displaystyle f'(x,c)=\frac{f(x,t+h)-f(x,t)}{h}.

Thus, we have that

\displaystyle\left|\frac{f(x,t+h)-f(x,t)}{h}\right|\le\sup_{c\in[t,t+h]}|f'(x,c)|\le g(x),

an integrable function. Thus, we can pass the limit under the integral sign using Lebesgue’s dominated convergence theorem, and we are done.
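
Both versions can be sanity-checked numerically (a sketch of my own, assuming NumPy). For f(x,t)=\sin(xt) with t\in[0,1], the dominating function can be taken to be the constant 1, since |t\cos(xt)|\le 1. The code compares a central difference quotient of F(x)=\int_0^1\sin(xt) dt with the integral of the partial derivative:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100001)
dt = t[1] - t[0]

def integrate(y):
    """Trapezoid rule on the fixed t-grid over [0, 1]."""
    return float((np.sum(y) - 0.5 * (y[0] + y[-1])) * dt)

def F(x):                                    # F(x) = ∫_0^1 sin(x t) dt
    return integrate(np.sin(x * t))

x, d = 0.7, 1e-5
lhs = (F(x + d) - F(x - d)) / (2 * d)        # d/dx of the integral, central difference
rhs = integrate(t * np.cos(x * t))           # ∫_0^1 ∂/∂x sin(x t) dt
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-6
```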


Further Questions

Though I do not delve into the subject any further at this time, one could ask how far exactly these assumptions can be weakened. What if we wanted to talk about interchanging the weak derivative and the integral? When is that possible? This post gives a good flavor of how far this subject goes. Basically, to weaken our assumptions any more, we would need to introduce distribution theory, and I have not yet decided whether or not I’ll get into it. Another time, perhaps.

Posted in Measure Theory

The Weak Derivative


The basis of modern PDE theory is the idea of the weak or distributional derivative. Since measure theory ignores a function’s values on a set of measure zero, why can’t we ignore some of the more “problematic points” from classical differentiability theory? For instance, take the function f:[0,2]\rightarrow\mathbb{R} defined by

\displaystyle f(x):=\left\{\begin{array}{ll} x & x\in[0,1) \\ 1 & x\in[1,2].\end{array}\right.

This has the problem of a “cusp” or “elbow” at x=1, where we are unable to define a tangent line.

If we were able to “ignore” the point at 1, then we would like to say that in some sense

\displaystyle f'(x)=\left\{\begin{array}{ll}1&x\in[0,1) \\ 0&x\in[1,2].\end{array}\right.

This is where the weak derivative (or distributional derivative) comes into play.

The Weak Derivative

The One-Dimensional Case

Let’s keep building on our intuition into this subject. Let f,g:[0,1] \rightarrow \mathbb{R} be two differentiable functions. Let g(0)=g(1)=0. Then, elementary calculus (integration by parts) tells us that

\displaystyle\int_0^1 f(x)g'(x)dx=-\int_0^1f'(x)g(x)dx

since the boundary term of f(x)g(x) disappears. Even if f is not differentiable we might be able to make sense of the above formula. We are now ready to state our first preliminary definition of the weak derivative.

Definition: Let f:[0,1]\rightarrow\mathbb{R} be any real-valued function. We will say that h:[0,1]\rightarrow\mathbb{R} is the weak derivative of f if for every differentiable function g:[0,1]\rightarrow\mathbb{R} with g(0)=g(1)=0, we have that

\displaystyle\int_0^1 f(x)g'(x)dx=-\int_0^1 h(x)g(x)dx.

The n-Dimensional Case

The higher-dimensional version requires a little more work. First, we let \Omega be some domain in \mathbb{R}^n, that is, an open and connected region. For simplicity, you can think of \Omega=\mathbb{R}^n or B_n, the open unit ball. Let C^\infty_0(\Omega) be the space of infinitely differentiable functions on \Omega with compact support. That is, for \phi\in C^\infty_0(\Omega), the support of \phi, defined by \mathrm{supp}(\phi):=\overline{\{x\in\Omega:\phi(x)\ne0\}}, is compact in \Omega. We will refer to this space as our space of “test functions.” Let \alpha:=(\alpha_1, \dots, \alpha_n) \in \mathbb{Z}^n_{\ge0} be a multi-index. Then, for any \phi\in C^\infty(\mathbb{R}^n), define the differential operator D^\alpha by

\displaystyle D^\alpha:=\frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}}\cdots\frac{\partial^{\alpha_n}}{\partial x_n^{\alpha_n}}.

We are now ready to define the n-dimensional weak derivative.

Definition: Let f:\Omega\rightarrow\mathbb{R} be given. Then, we say that g:\Omega\rightarrow\mathbb{R} is the \alpha-weak derivative of f for some multi-index \alpha, if for each \phi\in C^\infty_0(\Omega), the following integration by parts formula holds:

\displaystyle\int_\Omega f(x)D^\alpha\phi(x)dx=(-1)^{|\alpha|}\int_\Omega g(x)\phi(x)dx

where |\alpha|=\alpha_1+\cdots+\alpha_n.

Example 1

As an example, consider the above function

\displaystyle f(x):=\left\{\begin{array}{ll}x&x\in[0,1) \\ 1&x\in[1,2].\end{array}\right.

Then, for any function \phi:[0,2]\rightarrow\mathbb{R} differentiable with \phi(0)=\phi(2)=0, we have that

\displaystyle -\int_0^2 f(x)\phi'(x)dx=-\int_0^1 x\phi'(x)dx-\int_1^2\phi'(x)dx.

Working with the first term on the right-hand side, we use integration by parts to get

\displaystyle -\int_0^1 x\phi'(x)dx = -x\phi(x)|_0^1+\int_0^1 \phi(x)dx = -\phi(1)+\int_0^1\phi(x)dx.

The fundamental theorem of calculus plus the assumption that \phi(2)=0 on the second term on the right-hand side gives

\displaystyle -\int_1^2\phi'(x)dx=-\phi(2)+\phi(1)=\phi(1).

Putting this all together, we have that

\displaystyle -\int_0^2 f(x)\phi'(x)dx=\int_0^1\phi(x)dx=\int_0^2 g(x)\phi(x)dx

where g is given by

\displaystyle g(x):=\left\{\begin{array}{ll}1&x\in[0,1) \\ 0&x\in[1,2].\end{array}\right.

Note first of all that g is only defined up to a set of measure zero (if we are thinking of our integrals as Lebesgue integrals) or up to a discrete set of points (if we are thinking of our integrals as Riemann integrals). Also, notice that g is not even continuous, which would seem to contradict the standard real analysis fact that differentiable functions are necessarily continuous. There is no contradiction: g is only a weak derivative, not a classical one.

Example 2

Corners don’t seem to be a problem, but what about jumps? So, let’s consider the function

\displaystyle f(x):=\left\{\begin{array}{ll}0&x\in[0,1) \\ 1&x\in[1,2].\end{array}\right.

This seems like it should have a weak derivative of zero. So, let \phi be a test function. Then, by the fundamental theorem of calculus and the fact that \phi(2)=0,

\displaystyle -\int_0^2 f(x)\phi'(x) dx= -\int_1^2 \phi'(x)dx=-\phi(2)+\phi(1)=\phi(1).

On the other hand, we want this to be equal to

\displaystyle\int_0^2 g(x)\phi(x) dx

for some g and any test function \phi. If such a g existed, then

\displaystyle\int_0^2 g(x)\phi(x) dx=\phi(1)

for any test function \phi. Picking test functions with \phi(1)=0 would give us that g=0 almost everywhere on the interval [0,2] (this is a non-trivial result most readily seen through the lens of functional analysis). On the other hand, we can certainly find test functions with \phi(1)=k for any value k\in\mathbb{R} (also non-trivial). So, no such g exists and f is not weakly differentiable. 
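
Both examples can be checked numerically (my own sketch, assuming NumPy). For the corner function, \int f\phi' dx should equal -\int g\phi dx; for the step function it collapses to the point evaluation -\phi(1), which no integrable g can reproduce:

```python
import numpy as np

xs = np.linspace(0.0, 2.0, 400001)
dx = xs[1] - xs[0]

# A smooth test function on [0, 2] vanishing at the endpoints: a bump centered at 1.
s2 = np.minimum((xs - 1.0) ** 2, 1 - 1e-12)          # clip to keep the exponent finite
phi = np.where(np.abs(xs - 1.0) < 1, np.exp(1.0 / (s2 - 1.0)), 0.0)
dphi = np.gradient(phi, dx)

corner = np.minimum(xs, 1.0)                          # Example 1: f(x) = min(x, 1)
g = (xs < 1.0).astype(float)                          # its claimed weak derivative
res1 = float(np.sum(corner * dphi) * dx + np.sum(g * phi) * dx)
assert abs(res1) < 1e-4                               # ∫ f φ' = -∫ g φ holds

step = (xs >= 1.0).astype(float)                      # Example 2: the jump function
phi_at_1 = float(phi[200000])                         # xs[200000] == 1.0 on this grid
res2 = float(np.sum(step * dphi) * dx + phi_at_1)
assert abs(res2) < 1e-4                               # ∫ f φ' = -φ(1): a point mass
```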

Note: Note that the above f does have a derivative if we expand our notion of derivative even further. This is the notion of the distributional derivative, in which case f' is given as a delta function. However, these subjects will have to wait for now…

Posted in PDE Theory

Lebesgue Differentiation

Introduction: The Continuous Case

As briefly stated in my previous post, there is a particularly powerful measure-theoretic tool called Lebesgue differentiation. It is a generalization of part one of the fundamental theorem of calculus. Here it is, stated in the most general form in which we will prove it.

Theorem 1: Let f:\mathbb{R}^n\rightarrow\mathbb{R} be integrable (denoted by f\in L^1). Then, for almost every x\in\mathbb{R}^n,

\displaystyle\lim_{r\rightarrow0}\frac{1}{|B_r(x)|}\int_{B_r(x)} f(t) dt = f(x)

where B_r(x):=\{y\in\mathbb{R}^n:|x-y|<r\} is the ball centered at x of radius r>0, and |B_r(x)| is the measure (or volume) of the ball.

For the rest of this post, we will use the notation f\in L^1 for f being integrable, and |A| to denote the Lebesgue measure or volume of the set A.

One might wonder: what does this have to do with differentiation? One particular corollary of the proof (left as an exercise for the reader) is the following:

Corollary 1: Let f:\mathbb{R}\rightarrow\mathbb{R} be in L^1, and let F be given by

F(x):=\int_a^x f(t) dt.

Then, for almost every x\in\mathbb{R}, we have that the derivative of F at x exists, and

\displaystyle F'(x)=f(x).

We start with the continuous case which has a fairly elementary proof.

Theorem 2: Let f:\mathbb{R}^n\rightarrow\mathbb{R} be continuous. Then, for every x\in\mathbb{R}^n,

\displaystyle\lim_{r\rightarrow0}\frac{1}{|B_r(x)|}\int_{B_r(x)} f(t) dt = f(x).

Proof: Let \epsilon>0. Since f is continuous, there exists an r>0 so that for t\in B_r(x), |f(t)-f(x)|<\epsilon. Then, since

\displaystyle\frac{1}{|B_r(x)|}\int_{B_r(x)}f(x) dt = f(x)

(noting the inside of the integral is the fixed value f(x)), we have that

\displaystyle\begin{array}{ll}\left|\frac{1}{|B_r(x)|}\int_{B_r(x)} f(t) dt - f(x)\right| &= \left|\frac{1}{|B_r(x)|}\int_{B_r(x)} f(t)-f(x) dt\right| \\ &\le \frac{1}{|B_r(x)|}\int_{B_r(x)}|f(t)-f(x)| dt \\ &<\frac{1}{|B_r(x)|}\int_{B_r(x)}\epsilon dt \\ &= \epsilon\end{array}.
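
For a concrete one-dimensional picture (my own sketch, assuming NumPy), the code below approximates the ball averages of f(t)=t^2 at x=0.5 and watches them converge to f(x) as r\rightarrow0; for this f, the error is exactly r^2/3:

```python
import numpy as np

def ball_avg(f, x, r, n=200001):
    """(1/|B_r(x)|) ∫_{B_r(x)} f(t) dt in one dimension, via an equispaced mean."""
    return float(np.mean(f(np.linspace(x - r, x + r, n))))

f = lambda t: t * t
x = 0.5                                          # a point of continuity; f(x) = 0.25
errs = [abs(ball_avg(f, x, r) - f(x)) for r in (0.4, 0.1, 0.01)]
print(errs)                                      # ≈ r²/3, shrinking to 0 with r
assert errs[0] > errs[1] > errs[2] and errs[2] < 1e-3
```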


Maximal Functions

Though many proofs of this result exist in great generality, here we will present one based on some elementary harmonic analysis. The credit for this proof goes to Stein’s book Singular Integrals and Differentiability Properties of Functions. We start by introducing the concept of the maximal function.

Definition: Let f:\mathbb{R}^n\rightarrow\mathbb{R} be a function. Then, the maximal function of f is

\displaystyle Mf(x):=\sup_{r>0}\frac{1}{|B_r(x)|}\int_{B_r(x)}|f(t)| dt.

Note the similarities to the statement of the Lebesgue differentiation theorem. This definition more or less defines a “worst-case scenario” for the local integrability of the function f. A more rigorous treatment of the concept reveals that it is actually a fairly nice operator acting on L^p spaces, but for our purposes, we only need the following property.

Lemma 1: Let f\in L^1. Then, for every \epsilon>0,

\displaystyle |\{x\in\mathbb{R}^n:Mf(x)>\epsilon\}|\le\frac{C}{\epsilon}||f||_1,

where

\displaystyle||f||_1:=\int_{\mathbb{R}^n}|f(x)| dx

denotes the L^1 norm of f (or just skip the notation and think of it as the integral of the absolute value of f) and C is some positive constant depending only on n, the dimension of the space (C=5^n will work).
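
Before the proof, here is a small one-dimensional illustration of the lemma (my own sketch, assuming NumPy): for f=\mathbf{1}_{[-1,1]}, so that ||f||_1=2, we approximate the maximal function on a grid and check the measure of each level set \{Mf>\epsilon\} against C||f||_1/\epsilon:

```python
import numpy as np

xs = np.linspace(-6.0, 6.0, 2401)                 # f vanishes outside [-1, 1] ⊂ [-6, 6]
dx = xs[1] - xs[0]
f = ((xs >= -1.0) & (xs <= 1.0)).astype(float)    # indicator of [-1, 1], ||f||_1 = 2
F = np.concatenate([[0.0], np.cumsum(f) * dx])    # cumulative integral of f

def maximal(i):
    """Approximate Mf(xs[i]): sup over radii r = k*dx of the ball average of |f|."""
    k = np.arange(1, len(xs))
    lo = np.clip(i - k, 0, len(xs) - 1)           # clipping is harmless: f = 0 out there
    hi = np.clip(i + k, 0, len(xs) - 1)
    return float(np.max((F[hi + 1] - F[lo]) / (2 * k * dx)))

M = np.array([maximal(i) for i in range(len(xs))])
norm_f = 2.0                                      # ||f||_1
for eps in (0.5, 0.25):
    level = float(np.sum(M > eps) * dx)           # |{x : Mf(x) > eps}|
    print(eps, level)
    assert level <= 5.0 * norm_f / eps            # the weak-type bound with C = 5
```

For this particular f one can compute Mf(x)=1/(|x|+1) for |x|>1 by hand, so the level sets are intervals and the bound holds with plenty of room to spare.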

Although you are encouraged to read the proof to follow, it is highly technical and requires some fairly complicated measure-theoretic techniques (and a fairly technical lemma which is left unproven). The casual reader is encouraged to take the above lemma to heart and skip to the final section. In the language of L^p spaces, what this lemma is saying is that the maximal function is of weak type (1,1). Perhaps, in a later post, we will revisit these concepts…

To prove this, we will need the following covering lemma of Vitali, specifically, a version from Stein’s book Singular Integrals and Differentiability Properties of Functions. Unfortunately, we will not give a proof of this lemma, though the interested reader will find it in the book just cited, and related proofs in many introductory measure theory texts, for example Cohn’s Measure Theory.

Lemma 2: Let E be a measurable subset of \mathbb{R}^n which is covered by the union of a family of balls \{B_i\}_{i\in I} of bounded diameter for some indexing set I. Then, from this family, we can select a disjoint subfamily B_1,B_2,\dots, either finite or countably infinite, so that

\displaystyle\sum_k |B_k|\ge C^{-1} |E|

where C is some positive constant (5^n will work).

Proof of Lemma 1: Taking for granted that Mf is measurable (the supremum of measurable functions is measurable), we have that the set

E_\epsilon :=\{x:Mf(x)>\epsilon\}

is Lebesgue measurable. So, by the definition of the maximal function, for each x\in E_\epsilon, there is a ball centered at x, denoted by B_x so that

\displaystyle||f||_1\ge\int_{B_x}|f(t)| dt \ge\epsilon|B_x|.

Thus, we have that |B_x|<(1/\epsilon)||f||_1 giving us that the balls \{B_x\}_{x\in E_\epsilon} have bounded diameters. Thus, by the lemma, we extract a sequence of balls \{B_n\} which are mutually disjoint and satisfy

\displaystyle\sum_{n=0}^\infty |B_n|\ge C^{-1}|E_\epsilon|.

Putting the above concepts together, we have that

\displaystyle||f||_1\ge\int_{\cup B_n} |f(t)| dt \ge\epsilon\sum_{n=0}^\infty |B_n|\ge\epsilon C^{-1}|E_\epsilon|.

Rearranging the above inequality gives the desired result

\displaystyle |E_\epsilon|\le\frac{C}{\epsilon}||f||_1.


Proof of Lebesgue Differentiation

Now, we are prepared to prove the powerful Lebesgue differentiation theorem.

Proof of Lebesgue differentiation: Denote by f_r for some r>0 the function

\displaystyle f_r(x):=\frac{1}{|B_r(x)|}\int_{B_r(x)} f(t) dt.

Then, we can restate the theorem as f_r\rightarrow f almost everywhere as r\rightarrow 0. To this end, we introduce the following error function Ef.

\displaystyle Ef(x):=|\limsup_{r\rightarrow0} f_r(x)-\liminf_{r\rightarrow0} f_r(x)|.

Then, Ef(x)>0 precisely at the points where \lim_{r\rightarrow0}f_r(x) fails to exist; an essentially identical argument, with Ef(x) replaced by \limsup_{r\rightarrow0}|f_r(x)-f(x)|, handles the points where the limit exists but differs from f(x). To this end, let \epsilon>0. We need to show that

\displaystyle |\{x:Ef(x)>\epsilon\}|=0.

Note that by theorem 2, for g continuous, Eg\equiv 0. By the density of the continuous functions in L^1, we have that f=h+g where g is continuous and ||h||_1 is as small as we like. A quick inspection of the operator E reveals that

Ef(x)=E(g+h)(x)\le Eg(x)+Eh(x)=Eh(x)

for each x. Moreover, Ef(x)\le Eh(x)\le 2Mh(x). So,

\displaystyle \{x:Ef(x)>\epsilon\}\subseteq\{x:Mh(x)>\epsilon/2\}.

Therefore, we have by the measure-theoretic property that |A|\le |B| if A\subseteq B and Lemma 1,

\displaystyle |\{x:Ef(x)>\epsilon\}|\le|\{x:Mh(x)>\epsilon/2\}|\le\frac{2C}{\epsilon}||h||_1.

Letting ||h||_1\rightarrow 0 gives the required result.


Posted in Harmonic Analysis

Gronwall’s Inequality

Introduction and Notation

In analysis, we rarely deal with equalities. Oftentimes you have to “settle for” an inequality. Take, for example, the study of the Navier-Stokes equations

\displaystyle\left\{\begin{array}{l}u_t-\nu\Delta u+(u\cdot\nabla)u+\nabla p=f(x,t) \\ \nabla \cdot u=0 \end{array}\right.

In certain cases, with some smoothing operations applied (I will give the details at a later time), we can obtain a differential inequality of the form

\displaystyle \frac{d}{dt}|u(t)|^2\le g(t)|u(t)|^2+h(t),

which we will make more rigorous sense of at a later time. To analyze the above system, we will use the famous Gronwall inequality, to appear later. Using this, we are able to bound the growth of the norms of the vector-valued function u to obtain the existence (in some sense) and structure of the solutions to the famous equation.

Now, we define the objects in question for today. Suppose x:[0,T] \rightarrow \mathbb{R} is differentiable. Denote by

\displaystyle\frac{d}{dt} x=\lim_{h\rightarrow0}\frac{x(t+h)-x(t)}{h},

the classical derivative of x at the point t.

Gronwall’s Inequality: First Version

The classical Gronwall inequality is the following theorem.

Theorem 1: Let x be as above. Suppose x satisfies the following differential inequality

\displaystyle\frac{d}{dt}x(t)\le g(t)x(t)+h(t)

for g continuous and h locally integrable. Then, we have that

\displaystyle x(t)\le x(0)e^{G(t)}+\int_0^te^{G(t)-G(s)}h(s) ds,

where

\displaystyle G(t):=\int_0^t g(r) dr.

Proof: This is an exercise in ordinary differential equations. We introduce the integrating factor e^{-G(t)} and consider the following derivative:

\displaystyle \frac{d}{dt}\left(e^{-G(t)}x(t)\right)=e^{-G(t)}\frac{d}{dt}x(t)+x(t)\frac{d}{dt}e^{-G(t)}

by the product rule. Then, we can simplify the second term on the right-hand side of the equation using the chain rule and the fundamental theorem of calculus as

\displaystyle x(t)\frac{d}{dt}e^{-G(t)}=-x(t)e^{-G(t)}\frac{d}{dt}G(t)=-x(t)e^{-G(t)}g(t).

Using this, and the assumed differential inequality on x, we have that

\displaystyle\frac{d}{dt}e^{-G(t)}x(t)\le e^{-G(t)}[g(t)x(t)+h(t)]-g(t)x(t)e^{-G(t)}=e^{-G(t)}h(t)

after simplification. Now, we again use the fundamental theorem of calculus, and the fact that integrals respect inequalities, to obtain that

\displaystyle\int_0^t\frac{d}{ds}e^{-G(s)}x(s) ds=e^{-G(t)}x(t)-e^{-G(0)}x(0)\le\int_0^te^{-G(s)}h(s)ds.

Finally, we note that G(0)=0 and do some simplification to get that

\displaystyle x(t) \le x(0)e^{G(t)}+\int_0^t e^{G(t)-G(s)}h(s) ds.


This is the most commonly seen version of Gronwall’s inequality. However, an integral form does exist, as Wikipedia is quick to point out.
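
A quick numerical check of Theorem 1 (my own sketch, assuming NumPy): solve y'=g(t)y+h(t)-p(t) with a nonnegative slack p by forward Euler, so that y'\le g(t)y+h(t), and compare with the Gronwall bound. With g\equiv1, h\equiv1, y(0)=1, the bound y(0)e^{G(t)}+\int_0^t e^{G(t)-G(s)}h(s) ds works out to 2e^t-1:

```python
import numpy as np

g = lambda t: 1.0
h = lambda t: 1.0
p = lambda t: np.sin(t) ** 2          # nonnegative slack: y' = g y + h - p ≤ g y + h

dt, T = 1e-4, 2.0
ts = np.arange(0.0, T + dt, dt)
y = np.empty_like(ts)
y[0] = 1.0                            # y(0) = 1
for i in range(len(ts) - 1):          # forward Euler in time
    y[i + 1] = y[i] + dt * (g(ts[i]) * y[i] + h(ts[i]) - p(ts[i]))

bound = 2.0 * np.exp(ts) - 1.0        # the Gronwall bound in closed form here
print(float(y[-1]), float(bound[-1]))
assert np.all(y <= bound + 1e-6)      # the bound holds along the whole trajectory
```

With p\equiv0 the differential inequality becomes an equality and the bound is attained exactly; the slack is what makes it strict for t>0.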

Gronwall’s Inequality: Second Version

Theorem 2: Assume that g,h,x:[0,T]\rightarrow \mathbb{R} with h and x continuous, g locally integrable, and h non-negative. Suppose that x satisfies the integral inequality

\displaystyle x(t)\le g(t)+\int_0^t h(s)x(s) ds,

for t\in[0,T]. Then,

\displaystyle x(t)\le g(t)+\int_0^t g(s)h(s)e^{H(t)-H(s)} ds

for H(s):=\int_0^s h(r) dr.

Proof: Define the function y as

\displaystyle y(s):=e^{-H(s)}\int_0^s h(r)x(r) dr.

Then, differentiating, and using the product rule, chain rule, and fundamental theorem of calculus, we have, as before, that

\displaystyle\frac{d}{ds}y(s)=e^{-H(s)}h(s)x(s)-h(s)e^{-H(s)}\int_0^s h(r)x(r) dr=h(s)e^{-H(s)}(x(s)-\int_0^s h(r)x(r) dr).

Thus, using our assumed inequality, we have that

\displaystyle\frac{d}{ds}y(s)\le g(s)h(s)e^{-H(s)}.

Note that this last step is where we need to assume that h(s)\ge 0. Then, integrating both sides from 0 to t, and noting that y(0)=0, we have that

\displaystyle y(t)=e^{-H(t)}\int_0^t h(r)x(r) dr\le\int_0^t g(s)h(s)e^{-H(s)} ds.

By the original integral inequality, and after rearranging the terms, we get that

\displaystyle x(t)\le g(t)+\int_0^t g(s)h(s)e^{H(t)-H(s)} ds.


Gronwall’s Inequality: Third Version

Both of the above theorems required the fundamental theorem of calculus, and the continuity of the functions involved to invoke it. However, if we want to weaken the requirements on the functions involved, we simply need to invoke Lebesgue differentiation, a generalized version of the first part of the fundamental theorem of calculus. Here, we state the one-dimensional version, though higher-dimensional versions exist in most measure theory and harmonic analysis texts.

Lemma 1: (Lebesgue Differentiation Theorem) Let f: \mathbb{R} \rightarrow \mathbb{R} be Lebesgue integrable. Then, for almost every x\in\mathbb{R},

\displaystyle F(x):=\int_a^x f(t) dt

is differentiable. Moreover, for such an x, F'(x)=f(x).

Similarly, we have the following generalization of the second fundamental theorem of calculus:

Lemma 2: If f: \mathbb{R}\rightarrow \mathbb{R} is integrable, and there exists a function F so that F'(x)=f(x) for every x\in[a,b], then

\displaystyle\int_a^b f(t) dt = F(b)-F(a).

In fact, we could even weaken our sense of what “derivative” might mean in this measure-theoretic setting. Even so, we have the following generalized version of Theorem 1, proven using the same methods and the above two lemmas.

Theorem 3: Let x:[0,T]\rightarrow\mathbb{R} be integrable and differentiable on [0,T]. Suppose g,h:[0,T]\rightarrow\mathbb{R} are integrable so that

\displaystyle\frac{d}{dt}x(t)\le g(t)x(t)+h(t)

for almost every t\in[0,T]. Then,

\displaystyle x(t)\le x(0)e^{G(t)}+\int_0^t e^{G(t)-G(s)}h(s) ds

for almost every t\in[0,T] and

\displaystyle G(t):=\int_0^t g(r) dr.

Posted in ODE Theory