\documentclass[a4paper]{article}
\usepackage{assumptionsofphysics}
\usepackage{graphicx}
\graphicspath{{images/}}
\usepackage{tikz}
\usetikzlibrary{shapes,backgrounds}
\usepackage{pgfplots}
\usepgfplotslibrary{fillbetween}
\pgfplotsset{compat=1.18}
%\title{From Two Assumptions to Hamilton's Mechanics}
%\author{Ryan Cosper}
\usepackage[margin = 1in]{geometry} %1 in Margins
\newcommand{\doublespace} {
\renewcommand{\baselinestretch}{1.66}\small\normalsize
}
\newcommand{\exactdoublespace} {
\renewcommand{\baselinestretch}{1.8}\small\normalsize
}
\newcommand{\oneandhalfspace} {
\renewcommand{\baselinestretch}{1.24}\small\normalsize
}
\newcommand{\singlespace} {
\renewcommand{\baselinestretch}{0.9}\small\normalsize
}
\newlength{\fiveblanklines}\setlength{\fiveblanklines}{0.7 in}
\newlength{\tenblanklines}\setlength{\tenblanklines}{1.5 in}
\singlespace
\begin{document}
\thispagestyle{empty}
\exactdoublespace
\begin{center}
\begin{LARGE} From Two Assumptions to Hamilton's Mechanics\end{LARGE}\\
by\\
Ryan Cosper
\end{center}
\singlespace
\vspace{\fiveblanklines}
\begin{center}
\singlespace
% If you are using this for prospectus, simply add prospectus after \@paperType
A Thesis Presented in Partial Fulfillment\\
of the Requirements for the Degree\\
Bachelor of Science in Honors Physics
\end{center}
\vspace{\tenblanklines}
\vfill
\begin{center}
\doublespace
University of Michigan, Ann Arbor\\
April 2022
\end{center}
\newpage
\begin{abstract}
This work is a re-derivation of Hamiltonian Mechanics from two assumptions about a classical system. Through this derivation we work to better understand the nature of classical systems and eliminate some of the misconceptions surrounding the relationship between classical theories.
\end{abstract}
\section*{Acknowledgments}
I would like to thank Gabriele Carcassi for his guidance and seemingly endless patience and Christine Aidala for her support and encouragement. This work is a part of the larger Assumptions of Physics project at the University of Michigan. This work was funded in part by the University of Michigan Department of Physics.
\newpage
\tableofcontents
\newpage
\section{Introduction}
The main purpose of this work is to re-derive Hamiltonian Mechanics from base assumptions at an approachable level. Through this derivation we also seek to eliminate the misconceptions that plague current instructional literature and to motivate a more thoughtful understanding of the subject.
Historically, the laws of physics were generalizations of experience; they were rooted in the physical world rather than a mathematical one. As it is treated now, Hamiltonian Mechanics lacks this physicality. It is taught as a mathematical reformulation of Newtonian Mechanics rather than as its own physical theory. Many popular mechanics textbooks contain passages like this one from Thornton and Marion's \textit{Classical Mechanics of Particles and Systems}: ``... we need not formulate a \textit{new} theory of mechanics--the Newtonian theory is quite correct--but only devise an alternate method of dealing with complicated problems.''\cite{thornton_marion_2014} This assumption of an equivalence between Newtonian and Hamiltonian mechanics is a misconception.
Another, more egregious, example of this kind of hand-waving can be found in Douglas A. Davis's \textit{Classical Mechanics}: ``Therefore, it is quite worthwhile to spend some considerable effort in reformulating the ideas held in Newtonian mechanics so we can solve otherwise intractable problems. Remember, this is only a reformulation so, as we have done before, we shall check the results by applying them to already familiar examples. There is no new information or new areas of validity. We will simply restate Newtonian mechanics in another form.''\cite{davis_2012} This not only equates the two theories, but also asserts that Hamiltonian mechanics is only valid for systems in which Newtonian Mechanics is valid. This is not the case.
Understanding when a theory applies is a critical part of physics. Hamiltonian Mechanics describes the dynamics of a system by assuming that energy is conserved; we will call this assumption Deterministic and Reversible Evolution later on. Newtonian Mechanics relates the dynamics of a system to its kinematics; this assumption of a bijection between the kinematics and the dynamics is called Kinematic Equivalence.\footnote{For clarity, a system's dynamics describe how it moves in phase space, while its kinematics describe how it moves in physical spacetime.} These two different assumptions mean different physical theories are applicable to different systems.
For example, consider two systems suspended in a viscous fluid. In the first system the observer is at rest and a particle decelerates as it moves toward them. In the second the particle is at rest while the observer decelerates as they move toward it. Kinematically these systems are identical to the observer, but their dynamics are fundamentally different.
Hamiltonian Mechanics cannot accurately treat the first system as energy is not conserved. Take this as a fact for now; we will see why Hamiltonian Mechanics cannot describe dissipative systems later. Meanwhile, Newtonian Mechanics cannot describe our second system as we are working in a non-inertial frame and as such will observe fictitious forces; there is not an invertible map between the dynamics and the kinematics. Evidently these two formalisms are not equivalent; in fact, the relationship between our three favorite classical theories is best described by Figure 1.
\begin{figure}[!ht]
\begin{center}
\begin{tikzpicture}
\node [label={\textbf{The Classical Picture}}] (C) at (1.2,3){};
% Set A
\node [draw,
circle,
minimum size =5cm,
label={90:Hamiltonian}] (A) at (0,0){};
% Set B
\node [draw,
circle,
minimum size =5cm,
label={90:Newtonian}] (B) at (2.4,0){};
% Set intersection label
\node at (1.2,0) {Lagrangian};
\end{tikzpicture}
\end{center}
\caption{Venn diagram showing the relationship between our three favorite classical theories.}
\end{figure}
Now that we have motivated our approach, we will begin the derivation with basic physical assumptions about a system in order to show it obeys Hamiltonian Mechanics with a more physical justification. We will start with these physical assumptions, then translate those assumptions into precise mathematical definitions. This will lead us to our results.
\section{Infinitesimal Reducibility}
\begin{assump}[Infinitesimal reducibility]
The state of the system is reducible to the state of its infinitesimal parts. That is, giving the state of the whole system is equivalent to giving the state of its parts, which in turn is equivalent to giving the state of its subparts, and so on. This relationship holds the other way as well: giving the states of the smallest subdivisions of the system is equivalent to giving the state of the whole system.
\end{assump}
%\emph{Explain in practical terms what reducibility is}
What does it mean physically for a system to be reducible? This first assumption that underlies classical mechanics prescribes how a system can be broken down into its component parts. This assumption will provide us with a mathematical framework to describe the state of the system we are investigating.
Let's consider a ball that we throw through the air. This ball will follow some path. We assume we can fully describe the state of the ball by its motion through the air. Now, suppose we draw a red dot on this ball. Then if we fully describe the motion of the ball, we have also equivalently described the motion of the red dot. The ball is a reducible system, meaning knowledge of the state of the whole is equivalent to knowledge of the states of the parts. The internal dynamics of the system are accessible, meaning we can know the states of the parts of the system given knowledge of the state of the whole. Additionally, if we give the motion of all possible red dots on the surface of the ball, we know the motion of the whole ball. The state of the whole system can be known from the states of its parts.
\iffalse
\begin{figure}[!ht]
\centerline{\includegraphics[width=\linewidth,angle=90,scale=.25]{reddotdiagram.jpg}}
\caption{Motion of our ball and red dot example. Because this system is reducible we can break the whole system into its parts.}
\end{figure}
\fi
%\emph{motivate the uniqueness of the distribution}
Each state of the whole system is uniquely determined by the states of its parts, meaning that given the states of all parts, we can determine exactly the state of the whole system. Two identical systems are in the same state if and only if all of their parts are in the same states. The same is true the other way around: the states of the parts of the system are uniquely determined by the state of the whole. Thus we have a bijective map between the state of our whole system and the states of its parts. In our example, this means that there is exactly one path traveled by the red dot per distinct state of the whole ball. So if we throw the ball with no angular velocity, the red dot will travel a parabola through the air. We could then throw the ball with the same linear velocity so that its center of mass travels the same path as before, but if the ball is spinning we have a different path traveled by the red dot. So from two distinct states of the overall system, one with angular velocity and one without, we find two distinct sets of states of the parts of the system: one spiraling trajectory and one parabola.
%If we consider a ball of irreducible material i.e. an electron, we cannot describe the state of a dot drawn on its surface. We could not say for instance that a photon scattering with that electron interacts specifically with the part of the electron with the dot. That is because the electron is not reducible; we cannot describe the state of the electron by breaking it into smaller parts and describing those parts. We cannot describe a process in which something interacts with only part of the electron.
%\emph{Describe what the process of subdivision is, and define the particle as the limit of that process}
We have described reducibility in general, but what does it mean to be infinitesimally reducible? If a system is infinitesimally reducible, we can subdivide it as many times as we want and the smallest subdivisions will still themselves be reducible. If we imagine dividing our system until we approach the limit of this subdivision, we arrive at a collection of infinitesimally small parts of the system; we call these smallest subdivisions \textit{classical particles}.
What does this mean physically? Continuing our previous example, imagine we cover the surface of the ball in red dots. We then remove these dots and replace each of them with a number of smaller dots. We repeat this process until the radius of the dots approaches zero. Because the ball is assumed to be a reducible system, given the state of the whole ball, we know the states of all of the red dots drawn on it. It is important to note that the dots never become points. They always have radii that are greater than zero. No dot is a point particle; rather, they are infinitesimally small subdivisions. \textit{We can always divide them again}. Assuming a system to be infinitesimally reducible means that given the state of the whole system, we can describe the states of all the infinitesimally small parts of the system. Now we must codify this idea formally.
Let's start with a discrete system, as it is conceptually simpler. Consider a box with a fixed number of balls of varying colors as our system.\footnote{This is not an example of an infinitesimally reducible system, as the balls are the limit of its reducibility, but it serves as an illustrative example.} Suppose the state of the whole system is determined by the number of balls of each color. We will define the \textit{state space} of the balls as $\mathcal{S}$, which spans all possible colors a ball can be. Each ball is at a point $s \in \mathcal{S}$ corresponding to its color. We call the state space of our whole system $\mathcal{C}$, which spans all possible color combinations of the balls. One state of the whole system is a point $c \in \mathcal{C}$. For a state $c \in \mathcal{C}$ we can define the number of balls in the box of each color as a \textit{distribution}, $\rho$, over the state space $\mathcal{S}$ of the balls. This distribution is unique to each state of the whole system. Two identical systems are in the same state if and only if their distributions are equivalent. For each distinct state $c \in \mathcal{C}$ of the whole system, we have exactly one $\rho$.
%\begin{figure}[!ht]
%\centerline{\includegraphics[width=\textwidth,angle=-90,scale=.35]{diagram2.jpg}}
%\caption{Two different states of the system map to two different distributions of the parts of the system over $\mathcal{S}$.}
%\end{figure}
\begin{figure}
\begin{center}
\begin{tikzpicture}
\begin{axis}[
axis lines = left,
xlabel = \(s\),
ylabel = {\( \rho (s) \)},
yticklabels={,,},
xticklabels={,,},
ymin = 0,
]
%Below the red parabola is defined
\addplot [
domain=0:4.25,
samples=100,
color=red,
]
{-1*x^6+12*x^5-55*x^4+120*x^3-124*x^2+48*x+2};
\addlegendentry{\( \rho_{c_1} (s) \)}
\addplot [
domain=0:3.4,
samples=100,
color=blue,
]
{-.5*x^4+x+x^3+2*x^2};
\addlegendentry{\( \rho_{c_2} (s) \)}
\end{axis}
\end{tikzpicture}
\end{center}
\caption{Two different states of the whole system map to two different distributions of the parts of the system over $\mathcal{S}$. Functions are not normalized.}
\end{figure}
For each subset $U$ of $\mathcal{S}$, we can count the fraction of balls that are those colors; call this counting function $\mu : \mathbb{P}(\mathcal{S}) \to \mathbb{R}$, where $\mathbb{P}(\mathcal{S})$ denotes the set of subsets of $\mathcal{S}$. That is, for each $U \subseteq \mathcal{S}$ we have $$\mu(U) = \frac{\sum_{s \in U} \rho(s)}{\sum_{s \in \mathcal{S}} \rho(s)}.$$ This function is normalized by definition because $\mu(\mathcal{S}) = 1$.
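As a concrete check (a hypothetical example; the numbers are chosen purely for illustration), suppose the box contains four balls: three red and one blue. Then $$\mu(\{\text{red}\}) = \frac{\rho(\text{red})}{\rho(\text{red}) + \rho(\text{blue})} = \frac{3}{3+1} = \frac{3}{4},$$ and indeed $\mu(\mathcal{S}) = \frac{3+1}{3+1} = 1$, as the normalization requires.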
%\begin{figure}[!ht]
%\centerline{\includegraphics[width=\textwidth,angle=-90,scale=.35]{diagram3.jpg}}
%\caption{Discrete example of our counting function $\mu(U)$. (not normalized)}
%\end{figure}
\begin{figure}
\begin{center}
\begin{tikzpicture}
\begin{axis}[
axis lines = left,
xlabel = \(s\),
ylabel = {\( \rho (s) \)},
yticklabels={,,},
xticklabels={,,},
ymin = 0,
xmax = 4,
]
\addplot[name path=f,domain=0:6,blue,samples = 50]{-.5*x^4+x+x^3+2*x^2};
\path[name path=axis] (axis cs:1.5,0) -- (axis cs:12,2);
\addplot [
thick,
color=blue,
fill=blue,
fill opacity=0.1,
]
fill between[
of=f and axis,
soft clip={domain=1.5:2},
];
\node [rotate=0] at (axis cs: 1.75, 4) {$\mu(U)$};
%\node [rotate=45] at (axis cs: 1.75, 9) {$x=1$};
\draw [decorate, decoration={brace,amplitude=15pt,raise=1pt}] (1.5,0) -- (2,0) node [midway, anchor = south, yshift = 1.75mm, outer sep=10pt,]{U};
\end{axis}
\end{tikzpicture}
\end{center}
\caption{One dimensional continuous example of our counting function $\mu(U)$. $U$ is a region of phase space and $\mu(U)$ is the fraction of the system that is found in this region.}
\end{figure}
%\emph{formalize continuous case, improve example}
If our system is continuous the main ideas above hold, but we must make two changes to the formalization. First, our distribution $\rho : \mathcal{S} \to \mathbb{R}$ becomes a continuous function that describes the density of states over the space, and second, $\mu$ becomes an integral: $$\mu(U) = \frac{\int_{U} \rho(s) d\mathcal{S}}{\int_{\mathcal{S}} \rho(s) d\mathcal{S}}$$ where $d\mathcal{S}$ gives the number of states in an infinitesimal area. We will drop the denominator from now on unless it is necessary for notation's sake, but keep in mind this normalization is implicit. See Figures 2 and 3 for a visual representation of this phase space and counting function.
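As a minimal continuous sketch (again hypothetical, in one dimension), let $\mathcal{S} = [0, L]$ and let the system be uniformly distributed, $\rho(s) = \rho_0$. For an interval $U = [a,b] \subseteq [0,L]$ we find $$\mu(U) = \frac{\int_a^b \rho_0 \, ds}{\int_0^L \rho_0 \, ds} = \frac{b-a}{L},$$ the fraction of the system found in $U$, independent of the overall scale $\rho_0$; this is exactly what the implicit normalization guarantees.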
For a physical example let's consider a classical gas distributed in a volume. The state of the whole is the distribution. Now we can see that our counting function will count the amount of gas in a region of this volume by integrating the density over the region in question.
%\begin{figure}[!ht]
%\centerline{\includegraphics[width=\textwidth,angle=-90,scale=.45]{diagram4.jpg}}
%\caption{One dimensional continuous example of our counting function $\mu(U)$.}
%\end{figure}
\begin{defn}
Let $\mathcal{C}$ be the state space of a system. The system is \textbf{infinitesimally reducible} to the infinitesimal parts (i.e. particles) identified by the state space $\mathcal{S}$ if there exists a measure $d\mathcal{S}$ such that for every state of the whole system $c \in \mathcal{C}$ there exists $\rho : \mathcal{S} \to \mathbb{R}$ such that for every $U \subseteq \mathcal{S}$ the counting function $\mu(U)$ corresponds to the fraction of the system found within those states; as usual this definition means $\mu(\mathcal{S}) = 1$.
\end{defn}
\subsection{Constraint on Coordinate Transformations}
\iffalse
\begin{itemize}
\item We can now express density in terms of state variables. These state variables must be differentiable.
\item Prop - the state space must be a differentiable manifold
\item How should the change of state variable affect the density (units/value)?
\item On one side, it should be invariant. rho(s), like a temperate
\item On the other hand, units of the density are affected by the units of the state variables (mass example, include cart -> polar)
\item How do we solve this? Restrict state variable to "canonical", those that allow us to express the density in the correct units.
\item Jacobian must be unitary under canonical transformation
\item Prop - the state space must allow state variables that allow the expression of the density in the correct units (canoncial). Canonical transformations must have unitary and dimensionless Jacobian.
\end{itemize}
\fi
%\emph{motivate necessity of labeling state space with numbers}
%\emph{example and expand explanation. Discuss correct variables and set of quantities}
If we want to further describe the state of the system we need to label our states in $\mathcal{S}$ with numbers so that all states are uniquely identified. This system of labeling will give us our \textit{state variables}. For example, imagine an ideal gas in a box. We know that we can describe the state of this gas by giving its pressure, temperature, or volume. By giving two of these quantities we have completely described the state of the gas, as the third can be found from the other two through the ideal gas law. The state of the gas is uniquely determined by the values of two quantities. Thus we have a one-to-one map from the state space to the values of a set of numerical quantities. There is a precise number of state variables for every system. In our ideal gas example, if we were to also include the third quantity we would have introduced a redundant label: for each pressure-volume pair there is exactly one possible temperature, so if we include temperature in our labeling we no longer have a one-to-one map between the state space and sets of values, as some triplets do not correspond to possible states. For each state there is one well-defined pressure-volume pair.
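To make the redundancy concrete, recall the ideal gas law with the usual mole number $n$ and gas constant $R$: $$PV = nRT \quad \Rightarrow \quad T = \frac{PV}{nR}.$$ The pair $(P,V)$ already identifies the state, so the triplet $(P,V,T)$ over-labels it: any triplet with $T \neq PV/nR$ corresponds to no state at all, and the map from triplets to states fails to be one-to-one.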
\begin{defn}
A state variable assigns one numerical quantity to each state. Formally, it is a map $\xi : S \to \mathbb{R}$.\footnote{Note this $S$ indicates a subspace of the particle state space $\mathcal{S}$. Each $\xi$ maps a subspace $S \subseteq \mathcal{S}$ to $\mathbb{R}$.} A complete set of state variables fully identifies a state. Formally, it is an invertible map $\xi^a : \mathcal{S} \rightarrow \mathbb{R}^n$. This gives us a state space that is locally isomorphic to $\mathbb{R}^n$; thus our state space is an $n$-dimensional \textbf{manifold}.
\end{defn}
%\emph{Write $\rho$ as a density. Clarify density vs distribution}
Expressing $\rho$ in terms of our new state variables is the next natural step. Consider a cannonball that is shot through the air. If the ball is spinning, the position and momentum of each infinitesimal region of the ball will vary. Let the position and momentum be our state variables. The parts of the system will be distributed across these positions and momenta. Writing our distribution in terms of these variables will give us a density. We want $\rho$ to be a map $\rho(\xi^a): \mathbb{R}^n \to \mathbb{R}$. When we defined our state variables we noted they created an invertible map between states and the numbers labeling those states. Using this we see $\rho(\xi^a)$ is simply our distribution function $\rho(s)$ composed with the inverse of $\xi^a: \mathcal{S}\to \mathbb{R}^n$. That is: $$\rho(\xi^a) = \rho(s(\xi^a)).$$ This density is a map from our state variables to a numerical value that describes how much of the system occupies an infinitesimal region of this manifold.
Note that the density and the distribution are distinct functions that describe the same thing; specifically, the distribution is $\rho(s) : \mathcal{S} \to \mathbb{R}$ and the density is a composition of the distribution function with the state variables, i.e.~$\rho(\xi^a) : \mathbb{R}^n \to \mathbb{R}$. The density requires labeling the space with state variables to be defined, while the distribution does not.\footnote{Note that we will use $\xi^a, \xi^b, \xi^c$ for state variables; $q^i, q^j, q^k$ for unit variables; $\xi^\alpha, \xi^\beta, \xi^\gamma$ for state variables that include time (e.g.~four momentum). For distributions/densities we use $\rho$; where we need to distinguish between the two we will use $\rho(s)$ for the distribution and $\rho(\xi^a)$ for the density. States are indicated with $s$ and state variables with $\xi$. Note the overloading of $\rho$, $s$, and $\xi$. We use $\xi^a(s)$ as $\mathcal{S} \to \mathbb{R}^n$; $s(\xi^a)$ as $\mathbb{R}^n \to \mathcal{S}$.
Thus we have the following identities: $\rho(s(\xi^a)) = \rho(\xi^a)$; $\rho(s) = \rho(s (\xi^a(s)))$.}
%\emph{jacobian of density under coord change -> differentiability (value of density at origin)}
Under a change of state variables $\rho(\xi^a)$ must have a well defined transformation rule. For example say we have the density as a function of $x$ and $y$. Then if we change to polar coordinates we now want $\rho(r,\theta)$. How do we find $\rho(r,\theta)$ in terms of $\rho(x,y)$? With a little thought we can write $$\rho(x,y) = \rho(r,\theta)\begin{vmatrix}
\frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\
\frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y}
\end{vmatrix}.$$ Evidently these partial derivatives need to be defined to have a well defined transformation of $\rho$. More generally our transformation rule is of the form $$\rho(\xi^a) = \rho(\hat{\xi}^b)\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right|.$$
In our example above we do have one point at which the transformation is not well defined: the origin. We do not have well defined partial derivatives at the origin, as $\theta$ is not well defined at that point. Thus our density is not well defined at the origin in polar coordinates.
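We can verify this explicitly. With $r = \sqrt{x^2 + y^2}$ and $\theta = \arctan(y/x)$, the partial derivatives are $$\frac{\partial r}{\partial x} = \frac{x}{r}, \quad \frac{\partial r}{\partial y} = \frac{y}{r}, \quad \frac{\partial \theta}{\partial x} = -\frac{y}{r^2}, \quad \frac{\partial \theta}{\partial y} = \frac{x}{r^2},$$ so the Jacobian determinant is $$\begin{vmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\ \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} \end{vmatrix} = \frac{x}{r}\cdot\frac{x}{r^2} - \frac{y}{r}\cdot\left(-\frac{y}{r^2}\right) = \frac{x^2 + y^2}{r^3} = \frac{1}{r},$$ which diverges as $r \to 0$, confirming that the transformation breaks down at the origin.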
We will require that any state variables labeling our state space be \textit{differentiable} over all of $\mathcal{S}$. This ensures we can define a change of variables without introducing points in $\mathcal{S}$ where our density is not well defined. So our state space will be a manifold labeled by differentiable state variables.
\begin{prop}
The state space of our classical particles $\mathcal{S}$ is an n-dimensional \textbf{differentiable manifold}.
\end{prop}
%\emph{give example of coordinate independent quantity to get reader to think about what that means; clarify independent/dependent vs invariant rho(x,y,z) vs T(A) T(x,y) = T(r,theta) vs rho with Jacobian}
As we saw above, the density $\rho(\xi^a)$ will be affected by the labeling of our state space. We also know that our distribution, $\rho(s)$ should be independent of our choice of state variables; the way the system is distributed over the state space does not change even if we choose new state variables to label the space. This will give us an additional consideration when thinking about how $\rho$ transforms.
Consider an example using temperature. We can give the temperature at a point in space without a coordinate system. Take a point $A$ in three-dimensional space. There is no need for any coordinate labels to define the temperature there. Formally this means we can write $T(A): \mathcal{M} \to \mathbb{R}$. This is a direct map from spacetime to a number; we do not need to label spacetime using a coordinate system. This independence also means that changing our coordinate system changes neither the numerical value of the temperature at a point nor its units. We can write $T(A) = T(x,y) = T(r,\theta)$, meaning that as long as the point we are discussing remains the same, the value of $T$ is the same. So temperature is \textit{coordinate independent} and invariant under a change of coordinates.\footnote{We say a quantity is \textit{coordinate independent} if it can be defined without a coordinate system. To be \textit{coordinate invariant} means it will not change under a change of coordinates.}
%\emph{$\rho$ must be invariant}
We know that the amount of a system occupying a region of state space does not change with the variables we use to label that space. Our distribution $\rho(s)$ is coordinate independent thus it is unaffected by the choice of state variables $\xi^a$. We want this to translate to our density $\rho(\xi^a)$ because the distribution and the density represent the same thing: how the system is distributed over state space. This means that under a change of state variables density will be invariant i.e.~$\rho(s(\xi^a)) = \rho(s(\hat{\xi}^b))$.
%\emph{explain solving these apparent contradictions}
We have found two seemingly contradictory transformation rules for our density $\rho(\xi^a)$. On one hand, $\rho$ must transform as a density i.e.~its value will vary under a change of variables; on the other hand $\rho$ must be invariant under a change of variables because it should match the distribution. We solve this issue by restricting which state variables we use to label our state space. We will throw out any choice of variables that do not produce a $\rho$ that obeys both of these restrictions.
%Diagram 4 here showing transform as density and invariance
\begin{prop}
Because it is a density and is coordinate invariant, $\rho$ will transform as a scalar and a density under a change of state variables $\hat{\xi}^b = \hat{\xi}^b(\xi^a)$. Thus $\rho(\xi^a) = \rho(\hat{\xi}^b)\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right|$ and $\rho(s(\xi^a)) = \rho(s(\hat{\xi}^b))$. The state variables that allow $\rho$ to be expressed in a way that satisfies these conditions are called \textbf{canonical variables}.
\end{prop}
%\emph{the consequences from the above conclusions, simplify}
Using these two conditions together, we have that $\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right| = 1$. Using our counting function, let us see if we can make a little more sense of this physically.
We can write our counting function $\mu(U)$ in terms of our state variables as $\mu(U) = \int_{\xi(U)} \rho(\xi^a) \prod_a d\xi^a$. If we change our state variables, we know that the fraction of the system in a region $U \subseteq \mathcal{S}$ will not change. Thus $\mu(U)$ must be invariant under such a change. Let $\hat{\xi}^b = \hat{\xi}^b(\xi^a)$ be our change of variables. This gives us $$\mu(U) = \int_{\xi(U)} \rho(\xi^a) \prod_a d\xi^a = \int_{\hat{\xi}(U)} \rho(\hat{\xi}^b) \prod_b d\hat{\xi}^b.$$
In more concrete terms, we know that to find the total mass in a region $U$ from a density we write the integral $$\int_{U} dm = \int_{U} \rho_m dV.$$ If we want to preserve the value of this integral under a change of variables, then if $\rho_m$ changes, the value of $dV$ must change accordingly. Proposition 4, however, tells us that the value of $\rho$ cannot change; thus the value of our ``volume'' element $\prod_a d\xi^a$ cannot change either. This invariance of the volume element is exactly the unit-Jacobian condition above.
Using our transformation rule we can write $\int_{\hat{\xi}(U)} \rho(\hat{\xi}^b)d\hat{\xi}^b = \int_{\xi(U)} \rho(\hat{\xi}^b)\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right|d\xi^a = \int_{\xi(U)} \rho(\xi^a)d\xi^a$. We already know $\rho(\xi^a) = \rho(\hat{\xi}^b)$ by Proposition 4; since $U$ is arbitrary we can drop the integrals to get $\rho(\xi^a) = \rho(\hat{\xi}^b)\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right|$, which implies that $\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right| = 1$. This means that the value of the volume element\footnote{This volume element describes volume on our manifold $\mathcal{S}$.}, as expected, does not change.
\begin{prop}
The state space $\mathcal{S}$ must admit canonical state variables. A change of state variables must be differentiable and must have a unitary Jacobian i.e. $\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right| = 1$. Such a transformation is a \textbf{canonical transformation}.
\end{prop}
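The unitary Jacobian condition is easy to check numerically. The sketch below is my own illustration, not part of the text: the transformation $(q,p) \mapsto (\lambda q, p/\lambda)$, which rescales one variable and compensates with its partner, is a hypothetical example of a canonical transformation.

```python
# Sketch (not from the text): numerically verify that a hypothetical change of
# variables (q, p) -> (lam*q, p/lam), which rescales one variable and
# compensates with its partner, has Jacobian determinant 1.

def jacobian_det(transform, q, p, h=1e-6):
    """2x2 Jacobian determinant of transform(q, p) via central differences."""
    dA_dq = (transform(q + h, p)[0] - transform(q - h, p)[0]) / (2 * h)
    dA_dp = (transform(q, p + h)[0] - transform(q, p - h)[0]) / (2 * h)
    dB_dq = (transform(q + h, p)[1] - transform(q - h, p)[1]) / (2 * h)
    dB_dp = (transform(q, p + h)[1] - transform(q, p - h)[1]) / (2 * h)
    return dA_dq * dB_dp - dA_dp * dB_dq

lam = 100.0  # e.g. relabeling meters as centimeters
canonical = lambda q, p: (lam * q, p / lam)

print(abs(jacobian_det(canonical, 0.3, -1.2) - 1.0) < 1e-6)  # True
```

A transformation that rescaled $q$ without the compensating rescaling of its partner would fail this check, which is the content of the proposition.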
\begin{center}
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{p{0.1\textwidth} p{0.25\textwidth} p{0.55\textwidth}}
\hline
Symbol & Name & Description \\ [0.5ex]
\hline\hline
$\mathcal{C}$ & State Space of Whole System & Contains all possible states of the whole system. Each state $c$ maps to a unique distribution $\rho(s)$\\ [2ex]
\hline
$\mathcal{S}$ & State Space of Classical Particles & Contains all possible states of the parts of the system. Particles are distributed according to $\rho(s)$. \\ [2ex]
\hline
$\mathcal{M}$ & Space-Time & Physical spacetime. \\ [2ex]
\hline
$\rho(s)$ & Distribution of States & $\mathcal{S} \to \mathbb{R}$ \\
\hline
$\rho(\xi^a)$ & Density of States & $\mathbb{R}^n \to \mathbb{R}$ \\
\hline
$\mu(U)$ & Counting function & $\mathbb{P}(\mathcal{S}) \to \mathbb{R}$ \\
\hline
$\xi^a$ & State variables & $\mathcal{S} \to \mathbb{R}^n$ \\
\hline
$\hat{\xi}^b(\xi^a)$ & Change of Variables & $\mathbb{R}^n \to \mathbb{R}^n$ \\ [1ex]
\hline
\end{tabular}
\end{center}
\subsection{Unit Variables, Conjugate Pairs, and 2n Dimensional State Space}
\iffalse
\begin{itemize}
\item Show that units are linked by a unit system. Not all unit for all quantities can be chosen independently. distance, time $\to$ velocity, acceleration; energy, entropy $\to$ temperature $\frac{\partial{S}}{\partial U} = \frac{1}{T}$. Some units are fundamental, some are derived.
\item Therefore, there exists a subsets of state variables, that define the unit system. Call them q
\item a change of units is a change only on the q. The other variable will have a change of units that is derived from the fundamental unit. Independent units can be changed independently (they don't induce a change on the other fundamental units)
\item Definition
\item What are the relationships between units within state variable such that we can have canonical transformations. Start with set of variable with just one fundamental unit. Find that have only one state variable with derived units. That's the conjugate.
\item General case of multiple independent units. Change one by one, find conjugates. State space is even dimensional.
\item Proposition
\end{itemize}
\fi
%\emph{Talk about unit system in general: some choices are free, some are dependent. Give example of dependent/independent unit variables (like velocity down). Dimensionless 1 math throws away units}
We know that to define a density we need some kind of unit of area or volume. We must remember that we are working with physical systems here, which means that the units of our state variables are important. Some of our state variables will define a unit and others will have their units derived from the units of other state variables. For example, if distance in a direction is defined in terms of meters and time is defined in terms of seconds, velocity in that direction must be defined in meters per second and acceleration must be in units of meters per second squared. The units of velocity and acceleration are derived from the definition of the units of distance and time. Changing the definition of a fundamental unit will also change the definitions of all the units derived from it.
%\emph{example no. 2}
Another example of this relationship between variables and their units is the set energy, entropy, and temperature. If we define the units of energy and entropy, we have no choice in the units of temperature because the relation $T dS = dU \to T = \frac{dU}{dS}$ ensures that the units of temperature are equivalent to the units of energy divided by the units of entropy. We could just as easily define the units of energy and temperature which would give us the units of entropy through the same relation.
We will define a \textit{unit system} using a subset of our state variables which we will call our \textit{unit variables}. These unit variables define the fundamental units underlying all other state variables. Changing a unit variable induces a change in every other state variable that relies on the definition of that unit. Take our distance, velocity, and acceleration example: if we change the units of distance from meters to light years, our definitions of velocity and acceleration must change as well, but our units of time remain unchanged. Conversely, if we change our units of time, the units of velocity and acceleration will change, but our unit of distance is unaffected. Unit variables are independent of one another; the definitions of temperature and time, for instance, have nothing to do with each other. Generalizing this fact, we see that changing one unit variable does not induce a change in any other unit variable. This induced change of the remaining state variables must be unique because state variables are invertible.
\begin{defn}
We define a unit variable $q \in \xi^a$ as a state variable that defines a unit. The subset of unit variables $q^i \subset \xi^a$ defines a unit system upon which the other state variables depend, meaning that a transformation $\hat{q}^j = \hat{q}^j(q^i)$ induces a unique change of state variables $\hat{\xi}^b = \hat{\xi}^b(\xi^a)$. Unit variables are independent of one another.
\end{defn}
%\emph{what actually happens under change of unit variables?}
Now we must ask how a change of unit variables should affect the units of our density $\rho(\xi^a)$. Let's consider a typical example of a density: mass distributed over three dimensional space $\mathcal{M}$. If I have a mass distribution and I ask you how much mass there is in a region $U \subseteq \mathcal{M}$, your answer would be a number followed by a unit. This response is coordinate independent: the definition of the region $U$ does not require a coordinate system, so your answer depends only on the region of interest and the units of mass. We can talk about points in this space without a defined coordinate system. Take a point $A \in \mathcal{M}$; this is the same point for all observers, and we do not need a coordinate system to talk about it. If I ask for the mass density at that point, however, I need to specify a unit like $m^3$ to describe volume. Formally, I have to specify the unit system of the space. In mathematical terms this means that the density function is defined as $\rho_m : \mathbb{R}^3 \to \mathbb{R}$, not $\rho_m : \mathcal{M} \to \mathbb{R}$, just as we saw in the previous section. We cannot define a map directly from $\mathcal{M}$ to $\mathbb{R}$ because without a well defined notion of distance, density makes no sense. If we were to change our unit system to $cm^3$, the numerical value of the density at each point would change. A change in the units labeling the space thus changes the units of the density as well: the unit system of the state space determines the units used to describe the density. See Figure 4 for an explicit example.
As we saw in section $2.1$, the Jacobian of a change of variables gives the transformation rule of the density; the transformation rule for the units of the density is similarly given by the units of the Jacobian. But we also know that, like our counting function $\mu(U)$, we want our density to be unaffected by a change of state variables in both its value and its units. Because we will only use canonical state variables, we have ensured that the Jacobian of a change of state variables is \textit{unitary}: not only is its value equal to one, but it is also unitless. This ensures that, like our counting function, $\rho$ has the same units after a change of variables.
%Continuing with our hypothetical change in variables we know that a change of unit variables will induce a unique change in the other state variables. Because this change is unique we need only specify a change in unit variables.
%So we will ourselves change only the unit variables; the other state variables must change in accordance with the new definitions of units such that the units of density do not change.
%\begin{figure}[!ht]
%\centerline{\includegraphics[width=\textwidth,angle=90,scale=.45]{diagram5.jpg}}
%
%\end{figure}
\begin{center}
\begin{figure}[h!]
\hspace*{.12\linewidth}
%\begin{minipage}{.2\textwidth}
\begin{tikzpicture}[scale=0.65]
\begin{axis}[
axis lines = left,
xlabel = \( cm \),
ylabel = {\( cm \)},
ymin = 0,
ymax = 110,
xmin = 0,
xmax = 110,
]
\draw[line width=2pt] (50,50) -- (100,50);
\draw[line width=2pt] (100,50) -- (100,100);
\draw[line width=2pt] (100,100) -- (50,100);
\draw[line width=2pt] (50,100) -- (50,50);
\node [rotate=0] at (axis cs: 75, 75) {\Large $.0001\frac{kg}{cm^2}$};
\end{axis}
\end{tikzpicture}
%\end{minipage}
%\begin{minipage}{.2\textwidth}
\hspace{1cm}
\begin{tikzpicture}[scale=0.65]
\begin{axis}[
axis lines = left,
xlabel = \( m \),
ylabel = {\( m \)},
xmin = 0,
xmax = 1.1,
ymin = 0,
ymax = 1.1,
]
\draw[line width=2pt] (.5,.5) -- (1,.5);
\draw[line width=2pt] (1,.5) -- (1,1);
\draw[line width=2pt] (1,1) -- (.5,1);
\draw[line width=2pt] (.5,1) -- (.5,.5);
\node [rotate=0] at (axis cs: .75, .75) {\Large $1\frac{kg}{m^2}$};
\end{axis}
\end{tikzpicture}
%\end{minipage}
\caption{Visualization of how the units with which we label our space affect the density. Both areas contain exactly $.25kg$ uniformly distributed, but their density functions are different because their unit systems are different.}
\end{figure}
\end{center}
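The numbers in Figure 4 can be reproduced directly. The following sketch is my own numeric restatement of the figure, not part of the text:

```python
# Illustration of Figure 4 (my own numeric restatement): the same 0.25 kg
# spread uniformly over a 0.5 x 0.5 square has different density *values*
# in different unit systems, yet each density integrates back to the same mass.

mass_kg = 0.25
side_m = 0.5
side_cm = side_m * 100.0  # the same side length, relabeled in centimeters

rho_per_m2 = mass_kg / side_m**2    # 1.0 kg/m^2
rho_per_cm2 = mass_kg / side_cm**2  # 0.0001 kg/cm^2

# Integrate each (uniform) density over the square in its own labeling.
mass_from_m = rho_per_m2 * side_m**2
mass_from_cm = rho_per_cm2 * side_cm**2

print(rho_per_m2, rho_per_cm2)                  # 1.0 0.0001
print(abs(mass_from_m - mass_from_cm) < 1e-15)  # True
```

The coordinate-independent quantity is the integrated mass; the density value depends on the unit system labeling the space.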
Now we will examine this change of units formally. Let's start with the simplest case: one unit variable suffices to describe our unit system. We can then write our set of state variables as $\xi^a = \{q,k^\mu\}$ where $k^\mu$ could represent any number of other state variables. Let's suppose we perform an arbitrary change of units. We then have $\hat{q} = \hat{q}(q)$. How must our set $\{k^\mu\}$ change while upholding our restriction to a unitary Jacobian? Our Jacobian block matrix will be of the form $$1 = \left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right| = \begin{vmatrix}
\frac{\partial \hat{q}}{\partial q} & \frac{\partial \hat{q}}{\partial k^\mu} \\
\frac{\partial \hat{k}^\nu}{\partial q} & \frac{\partial \hat{k}^\nu}{\partial k^\mu}
\end{vmatrix}.$$
If there are no $k^\mu$ variables then our constraint equation becomes $\left|\frac{\partial\hat{\xi}^b}{\partial\xi^a}\right| = \left|\frac{\partial \hat{q}}{\partial q}\right| = 1$. This does not allow us to make an arbitrary change of units, so this case is invalid: we must have one or more $k^\mu$ variables.
Because $\hat{q}$ has no dependence on $k^\mu$ we know that $\frac{\partial \hat{q}}{\partial k^\mu} = 0$. This means that our constraint becomes $\left|\frac{\partial \hat{q}}{\partial q}\right|\left|\frac{\partial \hat{k}^\nu}{\partial k^\mu}\right| = 1$. Since $\hat{q}$ depends only on $q$ we can write $\left|\frac{d \hat{q}}{d q}\right|^{-1} = \left|\frac{d q}{d \hat{q}}\right|$, which gives $\left|\frac{\partial \hat{k}^\nu}{\partial k^\mu}\right| = \left|\frac{d q}{d \hat{q}}\right|.$ This provides only one constraint on our transformation, yet we know the transformation must be unique. So if we have two or more $k^\mu$ variables we cannot say that $q$ fully defines the unit system for all other state variables, as a change in $q$ would not induce a unique change in the other state variables as it should. Thus we must have exactly one $k$ variable, and our manifold will be two dimensional. This $k$ variable is called a \textit{conjugate variable}.
What are the units of the conjugate variable $k$? Recall the earlier equality $\left|\frac{d\hat{q}}{dq}\right|\left|\frac{\partial \hat{k}}{\partial k}\right| = 1.$ Note that this equality is a unitless quantity, which lets us relate the units of $k$ to the units of $q$. We determined earlier that the change in $k$ is induced by the change in $q$ alone, so the units of $k$, written $[k]$, can only be a function of the units of $q$. Rearranging the earlier equality gives us $$\left|\frac{d\hat{q}}{dq}\right| = \left|\frac{\partial k}{\partial \hat{k}}\right|.$$ This means we must write the units of $k$ as $[k] = \frac{1}{[q]}$.
%Recall the earlier equality $\left|\frac{d\hat{q}}{dq}\right|\left|\frac{\partial \hat{k}}{\partial k}\right| = 1$. Now suppose $q$ is defining a distance in meters and $\hat{q}$ defines distance in centimeters. This means the units of $\left|\frac{d\hat{q}}{dq}\right|$ would be $\frac{cm}{m}$. In order for the Jacobian to be unitary the units of $\left|\frac{\partial \hat{k}}{\partial k}\right|$ must then be $\frac{m}{cm}$. Now this allows us a choice of the units of $k$. The only restriction we have is that the units of $\frac{\hat{k}}{k}$ written as $\frac{[\hat{k}]}{[k]}$ must be $\frac{m}{cm}$. If the units of k were given as a density in kilograms per meter then the units of $\left|\frac{\partial \hat{k}}{\partial k}\right|$ would be $\frac{\frac{kg}{cm}}{\frac{kg}{m}}$ which would still result in a unitary Jacobian. By definition the units of $k$ only depend on the units of $q$. This means the units of $k$ are restricted to be precisely the inverse units of the corresponding $q$ i.e. $[k] = \frac{1}{[q]}$. Ostensibly we have a choice of the units of $k$ because we have a choice
How does this generalize to higher dimensions? Let's consider a state space whose unit system is described by $\{ q^i \}^n_{i=1}$ where each $q^i$ is an independent unit variable. This means that a change in one of the unit variables will not affect the other unit variables and that this change will be a function only of the original unit variable. Now we will perform individual changes of unit variables. We will write the first of these changes as $\hat{q}^1 = \hat{q}^1(q^1)$. Writing out the resulting change of units explicitly: $$\{ \hat{q}^j \}^n_{j=1} = \{ \hat{q}^1(q^1), q^2, ..., q^n \}.$$ Because we are fixing all of the $\{ q^i \}^n_{i=2}$ variables, this case is the same as the one-unit-variable system. This means that we will find precisely one conjugate variable $k_1$ that corresponds to $q^1$ and whose units are the inverse of those of $q^1$. Continuing this process we see that the set of state variables must be given by $\{ q^i, k_i \}^n_{i=1}$ and that we must have a $2n$-dimensional manifold.
\begin{prop}
The state space of the particles is $2n$ dimensional. The state variables are organized in pairs $\{q^i, k_i\}$ where the set $\{q^i\}$ defines the unit system. These state variables have a well defined transformation rule with a unitary Jacobian.
\end{prop}
%Now how do we ensure that our counting function $\mu(U)$ remains a pure number? We must constrain the units of our conjugate variables in order to do this. As we saw in figure [placeholder] a typical density changes as the variables used to label the space are changed. In our case however, we know that under a canonical change of variables our Jacobian is unitary. How can we understand this in terms of our change of units $\hat{q}^j = \hat{q}^j(q^i)$? We know we will have a corresponding change of conjugate variables $\hat{k}_j = \hat{k}_j(q^i,\hat{q}^j)$. We also know that the value of our volume element $d\xi^a = dq^i \land dk_i$ will not change in value or units. Thus we know that each $k_i$ must be of inverse units of its corresponding $q^i$. This means that any change in the unit variables $\{q^i\}$ will have a corresponding inverse change in conjugate variables $\{k_i\}$.\emph{change to differential justification}
%\emph{needs work}
%Generalizing an equality we used earlier to higher dimensions gives $$\left|\frac{\partial \hat{k}_j}{\partial k_i}\right| = \left|\frac{d q^i}{d \hat{q}^j}\right|.$$ Evidently the change of units in $q^i$ induces an inverse change in the units of $k_i$. We know that the units of $k_i$ are only defined by the units of its corresponding $q^i$. We we know that the units of $k_i$ must be exactly the inverse units of $q^i$ i.e. $[k_i] = \frac{1}{[q^i]}$.
\begin{prop}
The units of the conjugate variables $\{k_i\}$ are the inverse of the units of the corresponding unit variables $\{q^i\}$. The volume element $\prod_a d\xi^a = \prod_{i=1}^n dq^i \land dk_i$ is invariant in value and units under a canonical change of variables.
\end{prop}
We have now successfully described the structure of phase space by making one assumption about our system. The next section will elaborate on this structure and discuss the units we use to count states before we move on to our second assumption.
\subsection{Degrees of Freedom, Areas in State Space, and Poisson Brackets}
\iffalse
\begin{itemize}
\item (assuming we established k is inverse units of q) We now fix the units to count states. In statistical mechanics, we use "hbar" to do that. An independent degree of freedom is a pair of variables as seen before. Therefore hbar dq dk measures states, therefore it is useful to define p = hbar k. P now depends on both the unit of q and the unit we used for the states.
\item definition a degree of freedom is a pair of variable (q, p) such that dq dp = dS
\item We want to characterize whether any pair of state variable can form an independent degree of freedom. We define an operator that takes two variables and returns the following. If the two variables cannot form a degree of freedom, for example, they are both defining an independent unit, then the operator returns 0. If they can define an independent degree of freedom, then they return the unit change of the area $df dg/dS$.
\item Define $\{ f , g\}$ is an operator....
\item Now, suppose we have canonical variable for all degrees of freedom $\{q^i, p_i\}$ such that $q^1$ and $p_1$ form the same degree of freedom identified by $f$ and $g$. Then there exists an invertible transformation that goes from one to the other. Therefore the poisson bracket is the Jacobian. If the d.o.f. is spread over multiple canonical variables, one finds that (sum over all d.o.f. of the canonical variables).
\item Prop The Poisson bracket is sum of partials... (formula)
\end{itemize}
\fi
We showed in section $2.1$ that the volume element expressed in canonical variables $\prod_a d \xi^{a} = d\xi^1 d\xi^2 \dots d \xi^{2n}$ is invariant under a canonical transformation. We want to relate this differential to our measure $dS$: how do the variables with which we label our state space relate to the number of possible states in that space? First, what units will we use to count these states? The convention in statistical and quantum mechanics is to use $\hbar$ as the unit with which states are counted; we will follow this convention and define the units of our measure $dS$ to be $\hbar$. We write $p = \hbar k$ in order to define a new conjugate variable $p$ that depends not only on the unit of $q$, but also on the unit used to count states. Here we are exercising our freedom to choose the units of the conjugate variables: we are giving $p$ the units of number of states over the units of $q$. This choice allows us to relate the state variables labeling the space to the number of states contained in that space in a precise way.
In defining $p$ this way we arrive at the relation $dS = \prod \limits_{i = 1}^{n} dq^{i}dp_i$. From section $2.2$ we know that each conjugate variable is a function of one and only one unit variable. This means that each individual pair of $\{q^i,p_i\}$ variables will chart a 2D surface in phase space that can be described independently of the other state variables. We will call these pairings and the surfaces associated with them \textit{degrees of freedom} (d.o.f.).
If we want to count the number of possible states over one d.o.f. we can integrate using the measure $dS^i$. The integral that gives the number of possible states in a region $U_i$ of our surface is $\int_{U_i} dq^i dp_i = \int_{U_i} dS^i$.
\begin{defn}
A degree of freedom (d.o.f.) is a pair $\{q^i,p_i \}$ such that $dS^i = dq^i dp_i$. The measure $dS^i$ quantifies the number of possible configurations within an infinitesimal surface identified by that degree of freedom.
\end{defn}
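The bookkeeping $p = \hbar k$ can be checked numerically. This is a sketch of the stated convention; the cell sizes are hypothetical and $\hbar$ is the CODATA value.

```python
# Sketch of the state-counting convention: with p = hbar*k, the number of
# states dq*dp/hbar in a cell equals the pure number dq*dk. The cell sizes
# below are hypothetical, chosen only for illustration.

hbar = 1.054571817e-34  # J*s (CODATA value of the reduced Planck constant)

dq = 1e-9   # a cell of 1 nm in the unit variable q
dk = 1e9    # a cell of 1/nm in the conjugate variable k
dp = hbar * dk            # the conjugate rescaled to carry units of states

n_states = dq * dp / hbar  # states counted in units of hbar
print(abs(n_states - dq * dk) < 1e-12)  # True: dq*dp/hbar = dq*dk
```

The count is a pure number precisely because $k$ carries the inverse units of $q$; the factor of $\hbar$ only fixes the size of the cell that counts as one state.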
How do we count states over multiple degrees of freedom? For example imagine we have a state space of four dimensions where $q^1,q^2$ are positions along the $x$ and $y$ axes and $p_1,p_2$ are momenta along those axes. The two pairings $\{q^1,p_1 \}$ and $\{q^2,p_2 \}$ are our degrees of freedom. We can count the total number of possible states by first counting the possible configurations of position and momentum in each dimension then multiplying these two counts together because the degrees of freedom are independent: the choice of a configuration in one has no effect on the choice of a configuration in the other. Thus the surfaces are orthogonal and the total number of possible states is given by the product of the possibilities in each degree of freedom.
\begin{defn}
Two degrees of freedom are \textbf{independent} if the number of configurations identified by them is the product of the configurations identified by the individual degrees of freedom. That is, $dS = dS^1 dS^2$. Since the volume is the product of the areas, independent degrees of freedom are orthogonal.
\end{defn}
How do we determine whether a pair of variables forms a degree of freedom? Not all variables are compatible with each other. Consider our earlier example of a 4D state space. We could not define the $x$-position and $y$-momentum as a degree of freedom because a change of units of the $x$-position will not induce a change in the units of the $y$-momentum. To formalize this we will define an operator that tells us if two state variables are compatible.\footnote{This idea may feel familiar; the operator we will define is the classical equivalent of the quantum commutator.} This operator takes two state variables $f,g$ and returns $0$ if the variables are not compatible and do not form a d.o.f., or the unit change of the area $dfdg/dS$ if they do. This operator is called a \textit{Poisson bracket}.
\begin{defn}
The Poisson bracket is defined as $\{f,g\} = \sum_i \frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_i} - \frac{\partial g}{\partial q^i}\frac{\partial f}{\partial p_i}$.
\end{defn}
Let's suppose we have canonical variables such that each pair $\{q^i,p_i\}$ forms a degree of freedom. If $f$ and $g$ are canonical variables that form the same d.o.f. as one of these pairs, then the Poisson bracket of $f$ and $g$ is $\{f,g \} = dfdg/dS^i$; if not, $\{f,g \} = 0$. Because our degrees of freedom are independent and because we have limited ourselves to differentiable state variables, there exists an invertible transformation between $f,g$ and $q^i,p_i$. Recalling that the Poisson bracket returns the unit change of area $dfdg/dS^i$, we see that the Jacobian of this transformation is precisely the Poisson bracket of $f$ and $g$.
\begin{prop}
The Poisson bracket $\{f, g\}$ translates the densities of states into densities per unit area of $f, g$. It is the Jacobian of the transformation $dfdg \rightarrow dq^idp_i$ i.e. $dfdg = \{f,g\}dq^idp_i$.
\end{prop}
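The bracket for a single degree of freedom is easy to evaluate numerically. The sample functions below are hypothetical, chosen only to exercise the two cases discussed above.

```python
# Numerical Poisson bracket for one degree of freedom (a sketch; the sample
# functions are hypothetical): {f, g} = df/dq dg/dp - dg/dq df/dp,
# approximated with central differences.

def poisson_bracket(f, g, q, p, h=1e-5):
    df_dq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2 * h)
    return df_dq * dg_dp - dg_dq * df_dp

# {q, p} = 1: the pair forms a degree of freedom.
print(round(poisson_bracket(lambda q, p: q, lambda q, p: p, 0.7, 0.2), 9))  # 1.0

# For a linear change f = a*q + b*p, g = c*q + d*p the bracket is a*d - b*c,
# which is exactly the Jacobian of (q, p) -> (f, g), as the proposition states.
a, b, c, d = 2.0, 1.0, 0.5, 3.0
pb = poisson_bracket(lambda q, p: a*q + b*p, lambda q, p: c*q + d*p, 0.7, 0.2)
print(abs(pb - (a*d - b*c)) < 1e-8)  # True
```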
\section{Deterministic and Reversible Evolution}
\iffalse
\begin{itemize}
\item Describe what deterministic and reversible mean. Bijection from past states to future states. Not enough.
\item Example: damped harmonic oscillator. While the map is invertible, the density associated to the initial state is smaller than the final state.
\item introduce the math (evolution, density over time)
\item definition: define evolution, define density over time, define det/rev evolution as properties of evolution and density (i.e. density conservation)
\item Show single degree of freedom. q and p variables. The evolution is both invertible (det/rev) and differential (because we need a Jaobian well defined). Because densities are conserved, divergence is zero. Displacement is the curl of a potential H. Get the equations.
\item Generalizes by noting each infinitesimal surface maps to infinitesimal surfaces of the same area (same number of states). Independent d.o.f. remain independent (i.e. remain orthogonal).
\item prop with equations
\end{itemize}
\fi
\begin{assump}[Deterministic and Reversible Evolution]
The system undergoes deterministic and reversible evolution meaning given the state of the system at any time, its state at all past and future times is known.
\end{assump}
We will now make an assumption about the time evolution of the system. The resulting mathematical structure will describe this evolution in terms of our state variables.
For a system to undergo deterministic and reversible evolution there must be a bijection between all past and future states of the classical particles that make up the composite system. We must also have that the density of states, $\rho$, is conserved over time. Now this does not mean that our classical particles cannot change state. It means that if we have a certain number of classical particles in an initial state, all of those particles and no others will evolve together and end up in the same final state. For example, if half of the system begins in state A and one of these particles is later found in state B, then we know that exactly half of the system is found in state B at that time.
Why must we have the additional requirement of the conservation of the density when it seems the existence of the bijection would be sufficient for the evolution of our system to be deterministic and reversible? Here is an example of a system for which we can write such a bijection, but for which the density of states is not conserved. We will see that the condition of the bijection alone is not strong enough to provide us with the desired mathematics.
Imagine we have a one dimensional damped harmonic oscillator. We have a one to one map from past states to future states. But we also know that no matter the initial state, as the system evolves it will approach its rest state because the oscillator is damped. Our state variables will be position and momentum. We know that in the real world all measurements have finite precision. This informational granularity tells us that at some point in the above example we will not be able to tell the difference between the state in which the oscillator is barely moving and its rest state; a measurement will not be able to distinguish between the two states (see Figure 5 for a visualization). Thinking of this in terms of our knowledge of the system, we see that as time progresses and the states collapse towards the rest state, we lose information about the system. This means that in reality our system is not reversible: once it passes a certain point we cannot deduce the past state of the system from its current state, and when the system is at rest we have no idea of its initial condition. Thus the additional requirement that density be conserved over time is necessary to ensure we can always deduce the past states of the system from its current state in a physically meaningful way.
%\begin{figure}[!ht]
%\centerline{\includegraphics[width=\textwidth,angle=90,scale=.6]{streamfunctiondiagram.jpg}}
%
%\end{figure}
\pgfplotsset{ticks=none}
\begin{figure}
\hspace*{.22\linewidth}
\begin{tikzpicture}[scale = 1.7]
\tikzset{shift={(current page.center)},xshift=3cm}
\begin{axis}[view={120}{30},
axis lines=center,
xlabel=$q$,ylabel=$t$,zlabel=$p$,
xmin = -1,
xmax = 1,
zmin = -1,
zmax = 1,
no marks,
xticklabels={,,},yticklabels={,,}]
\addplot3+[line width=.5pt,no markers,samples=70,color=blue, samples y=0,domain=0:5*pi,variable=\t]
({exp(-.3*\t)*cos(\t r)},.5*\t,{exp(-.3*\t)*sin(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=violet, samples y=0,domain=0:5*pi,variable=\t]
(-{exp(-.3*\t)*cos(\t r)},.5*\t,-{exp(-.3*\t)*sin(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=cyan, samples y=0,domain=0:5*pi,variable=\t]
({-exp(-.3*\t)*sin(\t r)},.5*\t,{exp(-.3*\t)*cos(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=magenta, samples y=0,domain=0:5*pi,variable=\t]
({exp(-.3*\t)*sin(\t r)},.5*\t,-{exp(-.3*\t)*cos(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=blue, samples y=0,domain=0:5*pi,variable=\t]
({.3*exp(-.5*\t)*cos(\t r)},.5*\t,{.3*exp(-.5*\t)*sin(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=violet, samples y=0,domain=0:5*pi,variable=\t]
(-{.3*exp(-.5*\t)*cos(\t r)},.5*\t,-{.3*exp(-.5*\t)*sin(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=cyan, samples y=0,domain=0:5*pi,variable=\t]
({-.3*exp(-.5*\t)*sin(\t r)},.5*\t,{.3*exp(-.5*\t)*cos(\t r)});
\addplot3+[line width=.5pt,no markers,samples=70,color=magenta, samples y=0,domain=0:5*pi,variable=\t]
({.3*exp(-.5*\t)*sin(\t r)},.5*\t,-{.3*exp(-.5*\t)*cos(\t r)});
\node [rotate=0] at (axis cs: 0,6,.28) {\Large $t_c$};
\draw[line width=1pt] (0,6,.1) -- (0,6,-.1);
\end{axis}
\end{tikzpicture}
\caption{The evolution of a damped harmonic oscillator in phase space from $8$ different initial conditions. At some critical time $t_c$ we can no longer distinguish this system from the oscillator at rest given finite precision of measurement.}
\end{figure}
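The loss of information illustrated in Figure 5 can be made quantitative with a small simulation. This is my own sketch with an assumed damping coefficient and a simple Euler discretization, not a computation from the text.

```python
# Sketch of the damped-oscillator argument: the flow dq/dt = p,
# dp/dt = -q - 2*gamma*p has divergence -2*gamma != 0, so a small patch of
# initial states contracts and distinct pasts eventually become
# indistinguishable at any finite measurement precision. Parameters are
# assumed purely for illustration.

def step(q, p, gamma, dt):
    """One explicit Euler step of the damped oscillator."""
    return q + p * dt, p + (-q - 2.0 * gamma * p) * dt

def patch_area(states):
    """Area of the parallelogram spanned by three corner states."""
    (q0, p0), (q1, p1), (q2, p2) = states
    return abs((q1 - q0) * (p2 - p0) - (q2 - q0) * (p1 - p0))

gamma, dt = 0.3, 0.001
corners = [(1.0, 0.0), (1.01, 0.0), (1.0, 0.01)]  # a small patch of states
area_start = patch_area(corners)
for _ in range(5000):  # evolve to t = 5
    corners = [step(q, p, gamma, dt) for q, p in corners]

print(patch_area(corners) < area_start)  # True: the patch has contracted
```

The shrinking area is exactly the non-conservation of $\rho$: the fraction of the system per unit phase-space area grows without bound near the rest state, so the map of past to future states, though one to one, is not physically reversible.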
\begin{defn}
A deterministic and reversible system is one for which all particles in a state are mapped to one and only one past or future state. That is, given a set of initial values $\xi^\alpha_0$ at a time $t_0$, there is only one possible evolution $\xi^\alpha(t)$, such that $\xi^\alpha(t_0) = \xi^\alpha_0$. Moreover, $\rho(\xi^\alpha(t))$ is a constant.
\end{defn}
We will continue to consider a system with a single degree of freedom and state variables $\{q,p \}$. If we transport our 2D state space in time $t$, we can map the evolution of each classical particle in the extended space $\mathbb{R}^3$. The evolution of the state of each particle can be written as $\xi^\alpha(t)$ and the fraction of the system found in that state is $\rho(\xi^\alpha(t))$. Note we are using $\xi^\alpha$ to denote the state variables in the extended phase space that includes time, i.e. $\xi^\alpha = \{ \xi^a, t\}$.
If we move forward infinitesimally in time we can write $\xi^\alpha(t + dt) = \xi^\alpha + \frac{d\xi^\alpha}{dt}dt$. We know that $\xi^\alpha(t)$ is differentiable because $\rho$ is always defined. We will then write a vector field $S^\alpha =\frac{d\xi^\alpha}{dt}$, i.e. $S = (\frac{dq}{dt},\frac{dp}{dt},\frac{dt}{dt} )$. This vector field describes the time evolution of our state variables.
Thinking about our damped oscillator example, we see that the vector field $S$ for this system will have a nonzero divergence because the states of the classical particles starting from different initial conditions get closer together as the system evolves towards its rest state as illustrated in Figure 5.
Applying instead the requirement that $\rho(\xi^\alpha(t))$ be conserved in time, we see that the divergence of the vector field must be zero: $$\mathbf{\nabla} \cdot S = \partial_\alpha S^\alpha = \frac{\partial S^q}{\partial q} + \frac{\partial S^p}{\partial p} + \frac{\partial S^t}{\partial t} = 0.$$ Because $S$ is divergence free, we can write it as the curl of a vector potential: $$S = \mathbf{\nabla} \times (-\theta + \mathbf{\nabla} f)$$ where $\theta$ is our vector potential and $\nabla f(q,p,t)$ is a gauge term. In component notation this equation is $$S^\alpha = \epsilon^{\alpha \beta \gamma}\partial_\beta(-\theta_\gamma + \partial_\gamma f).$$ We know that $\theta$ will not be unique because we can choose what gauge we work in.\footnote{Remember the curl of the gradient of a scalar field is always zero, so our choice of $f$ has no effect on $S$. That is to say, $S$ is gauge invariant.} We will write theta as $\theta = (\theta_q, \theta_p, \theta_t)$ and choose our gauge such that $\theta_p = 0$ by setting $\frac{\partial f}{\partial p} = \theta_p$. This means we can write $$\theta = (\theta_q, 0, \theta_t)$$ without loss of generality. Plugging this back into $S = \mathbf{\nabla} \times (-\theta) = (\frac{dq}{dt},\frac{dp}{dt}, \frac{dt}{dt} )$ and looking at the third component we get $$\frac{dt}{dt} = \frac{\partial \theta_q}{\partial p}.$$ Here we see that $\frac{dt}{dt} = 1 = \frac{\partial \theta_q}{\partial p}$.\footnote{Note here that the $t$ in the numerator of this differential is technically different from the $t$ in the denominator. In the numerator $t$ is an affine parameter while in the denominator it is a state variable.} Integrating with respect to $p$ gives us $$\theta_q = p + g(q,t)$$ where $g$ is an arbitrary function and thus can be set to zero. We have arrived then at $\theta = (p,0,\theta_t)$. Let $\theta_t \equiv -H$.
Plugging back in again to the cross product equation we arrive at a familiar system of equations $$S^q = d_t q = \partial _{p} H$$ $$S^p = d_t p = -\partial _q H$$ $$S^t = d_t t = 1.$$ This is the standard form of Hamilton's equations for a single degree of freedom where $H$ is the Hamiltonian.
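This last step can be verified symbolically: taking the gauge-fixed potential $\theta = (p, 0, -H)$ and computing the curl over the coordinates ordered $(q,p,t)$, with the sign convention $S = -\mathbf{\nabla}\times\theta$, should reproduce exactly these three equations and give vanishing divergence for any $H$. A minimal sketch using sympy (the helper `curl` is written out by hand for this coordinate ordering):

```python
import sympy as sp

q, p, t = sp.symbols('q p t')
H = sp.Function('H')(q, p, t)  # generic Hamiltonian

def curl(F, coords):
    """Curl of a 3-component field F over coordinates ordered (q, p, t)."""
    a, b, c = coords
    return sp.Matrix([
        sp.diff(F[2], b) - sp.diff(F[1], c),
        sp.diff(F[0], c) - sp.diff(F[2], a),
        sp.diff(F[1], a) - sp.diff(F[0], b),
    ])

theta = sp.Matrix([p, 0, -H])   # the gauge-fixed potential derived above
S = -curl(theta, (q, p, t))     # sign convention S = -curl(theta)

# Components reproduce Hamilton's equations: (dH/dp, -dH/dq, 1)
print(S.T)

# Conservation of density: the divergence vanishes identically, for any H
div_S = sp.diff(S[0], q) + sp.diff(S[1], p) + sp.diff(S[2], t)
print(sp.simplify(div_S))  # 0
```

The vanishing divergence follows from the equality of mixed partials, $\partial_q\partial_p H = \partial_p\partial_q H$, which is why the construction works for an arbitrary Hamiltonian.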
Generalizing this to a system with $n$ degrees of freedom, we recall that each infinitesimal surface in phase space is mapped in time to a surface of the same area; this means that the number of possible states remains the same and that independent degrees of freedom remain independent, and thus their surfaces remain orthogonal. We can break a higher dimensional system down into its individual degrees of freedom to arrive at our final equations: $$d_tq^i = \partial_{p_i}H$$ $$d_tp_i = -\partial_{q^i}H.$$ This is the generalized form of Hamilton's equations.\footnote{For a much more in depth treatment of this generalization see \cite{Carcassi_2018}.} A system that evolves according to these equations is a Hamiltonian system.
\begin{prop}
The deterministic and reversible evolution of an infinitesimally reducible system follows Hamilton's equations. So the evolution of the system can be written in the familiar way:
$$d_tq^i = \partial_{p_i}H$$
$$d_tp_i = -\partial_{q^i}H$$
\end{prop}
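To make the proposition concrete, here is a small numerical sketch (not taken from the text) that integrates Hamilton's equations for an assumed harmonic oscillator $H = p^2/2m + kq^2/2$ with the symplectic leapfrog scheme. The deterministic and reversible character of the evolution shows up numerically as a bounded, near-zero drift of $H$ over many periods:

```python
def leapfrog(q, p, dt, steps, m=1.0, k=1.0):
    """Integrate d_t q = dH/dp = p/m and d_t p = -dH/dq = -k*q
    with the symplectic leapfrog (kick-drift-kick) scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * k * q   # half kick
        q += dt * p / m         # drift
        p -= 0.5 * dt * k * q   # half kick
    return q, p

def energy(q, p, m=1.0, k=1.0):
    """Assumed Hamiltonian H = p^2/(2m) + k*q^2/2."""
    return p * p / (2 * m) + 0.5 * k * q * q

q0, p0 = 1.0, 0.0
E0 = energy(q0, p0)
q1, p1 = leapfrog(q0, p0, dt=0.01, steps=10_000)  # roughly 16 periods
drift = abs(energy(q1, p1) - E0)
print(drift)  # small and bounded
```

The leapfrog scheme is chosen here because it preserves phase-space area exactly at each step, mirroring the area-preservation argument above; a generic non-symplectic integrator would instead exhibit systematic energy drift.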
\iffalse
\section{Kinematic Equivalence}
\begin{assump}[Kinematic Equivalence]
(More physicsy) Trajectories in spacetime suffice to recover trajectories in phase space. We know the precise trajectory in phase space given a trajectory in physical spacetime; there is a bijection between the two. (diagram showing bijection)
\end{assump}
\begin{itemize}
\item Describe what kinematic equivalence is. Use the photon as a counter-example.
\end{itemize}
\subsection{Weak Equivalence}
\begin{itemize}
\item By weak equivalence we mean that the trajectories of a single particle are enough to identify the state of that particle. We are focusing only on a single particle.
\item Introduce notation. We use q, p and x, v even though q and x both identify position. We do that because $\partial/\partial q$ is different from $\partial / \partial x$.
\item Velocity is always a function of position and momentum (because of determinism). But it must now be invertible.
\item This means the Hessian of the Hamiltonian is defined and non-zero everywhere. We can define a Lagrangian.
\end{itemize}
\begin{defn}
We take $x^i = q^i$. We then define $v_i = d_tx^i$. Weak equivalence means $v_i(q^i, p_i)$ is invertible.
\end{defn}
(photon counter-example $v^i$ function only of $x^i$)
\begin{prop}
A classical system that satisfies weak equivalence is a Lagrangian system. Formally, we can define a Lagrangian $\mathcal{L}$ that is convex and has unique solutions.
\end{prop}
\subsection{Full Equivalence}
\begin{itemize}
\item We now extend the requirement on the composite state. We look at the transformation rule of the density from q, p to x, v. We want the change of unit to be dependent on q only.
\item This means the Hessian is dependent on q only and can be defined as $m g^{ij}$.
\item Find that the Hamiltonian is the one of a massive particle under scalar and vector potential forces.
\end{itemize}
The units that are used to express the density on position and velocity must be dependent only on $q^i$ if they are to fully define the unit system.
\begin{defn}
Full equivalence means that Kinematic Equivalence extends to the composite system. So we have $\rho(q^i,p_j) = \left|J\right|\rho(x^i,v^j) = \left|\frac{\partial v^i}{\partial p_j}\right|\rho(x^i,v^j)$. The Jacobian is only a function of $q^i$.
\end{defn}
\begin{prop}
A system that satisfies full kinematic equivalence obeys the laws of massive particles under potential forces.
\end{prop}
\section{Advanced topics}
\subsection{Action Principle}
The action principle that many courses in mechanics take \emph{a priori} is simply a consequence of our assumptions. We can find a physical justification for our mathematical structures.
\subsection{Kinematic vs dynamics}
Differences between Newtonian/Lagrangian/Hamiltonian mechanics.
Show that they are inequivalent. Newtonian Mechanics needs n functions to define the evolution while Lagrangian/Hamiltonian requires 1. All Lagrangians that have unique solution have a Hamiltonian. All Hamiltonians have unique solutions, but not necessarily a Lagrangian. (must follow discussion on the action principle) Mix with examples.
Kinematics and dynamics. Newtonian and Hamiltonian Mechanics necessarily define both. Technically, Lagrangian Mechanics is purely kinematic (equations are only in terms of position and velocity - you can't distinguish apparent forces from real forces). The only way to recover the dynamics is to assume the system is also Hamiltonian.
Kinematic aliasing (systems with different dynamics may have the same kinematics). Example of 3 different dynamics (friction, boosted, losing mass). This is confusing because, in some cases, you can treat non-conservative systems in Lagrangian mechanics as they look like conservative systems in a non-inertial frame (naturally, the Hamiltonian and the conjugate momentum will not correspond to what you expect).
How is kinematic aliasing resolved. Newtonian mechanics resolves kinematic aliasing by invoking inertial frames and by writing equations that are valid \emph{only} in inertial frames (you can distinguish apparent forces from real forces in an inertial frame). In Lagrangian/Hamiltonian mechanics, the equations are invariant under coordinate transformation: we can't do that anymore. You fix the system to be conservative. If you had a system of equations that were invariant under all coordinate transformation for all frames, you would not be able to tell which forces are apparent and which forces are real.
\subsection{Entropy in classical mechanics}
Entropy is invariant under coordinate transformation - the same for all observers (invariance under time is less important).
Classical uncertainty principle: fixing the entropy of the distribution gives us an inequality that is saturated by Gaussian distributions. (discuss parallel with quantum mechanics - classical mechanics allow for minus infinite entropy)
Time evolution of entropy. It is conserved only under Hamiltonian evolution. If the evolution is not Hamiltonian, the entropy does the opposite of what one expected. (dissipative systems, like damped harmonic oscillator, concentrate the distributions around the attractor, and therefore) The problem is that the same statement at different time provides different amount of information/granularity of the description.
\fi
\section{Conclusion}
We have now successfully derived all of Hamiltonian Mechanics from two basic assumptions about a system. By assuming that our system is Infinitesimally Reducible we derived the structure of phase space. With the assumption that the system undergoes Deterministic and Reversible Evolution we found a set of equations that describe the dynamics of the system. This is fundamentally different from Newtonian Mechanics, which assumes a connection between a system's kinematics and its dynamics.
This derivation of Hamiltonian Mechanics, while seemingly miraculous, is simply a product of creating a one-to-one mapping between physical principles and mathematical structures. This precise way of understanding the physical world provides a much clearer picture of the underlying physics within the abstract mathematics.
\bibliographystyle{ieeetr}
\stepcounter{section} % Increase counter (section) by one step
\addcontentsline{toc}{section}{\thesection \quad References} % Add to ToC (at the section level)
\bibliography{Bibliography}
\end{document}