\def\>{\rangle}
\def\<{\langle}
%\DeclareMathOperator{\mix}{mix}
%\DeclareMathOperator{\component}{comp}
%\DeclareMathOperator{\cospan}{cospan}
\newcommand\mix{\mathrm{mix}}
\newcommand\component{\mathrm{comp}}
\newcommand\cospan{\mathrm{cospan}}
\newcommand\dist{\mathrm{dist}}
\newcommand\hull{\mathrm{hull}}
\newcommand\shull{\mathrm{shull}}
\newcommand\chull{\mathrm{chull}}
\newcommand\support{\mathrm{supp}}
\newcommand\scap{\mathrm{scap}}
\newcommand\fraction{\mathrm{frac}}
\newcommand\fcap{\mathrm{fcap}}
\newcommand\vspan{\mathrm{span}}
\newcommand\cl{\mathrm{cl}}
\def\separate{\downmodels}
\def\nseparate{\ndownmodels}
\def\ortho{\perp}
\def\northo{\nperp}
%\NewDocumentCommand{\separate}{o}{
% \downmodels
% \IfNoValueTF{#1}{}{_{#1}}%
%}
%
%\NewDocumentCommand{\nseparate}{o}{
% \ndownmodels
% \IfNoValueTF{#1}{}{_{#1}}%
%}
\newcommand{\ens}[1][e] {\mathsf{#1}} % Ensemble
\newcommand{\Ens}[1][E] {\mathcal{#1}} % Ensemble space
\newcommand{\titledbreak}[1]{
\newpage
\vspace{3em} % Add vertical space before the title
\noindent
\centerline{\huge\centering\textbf{#1}} % Print the titled break title in bold
\par\nobreak
% \vspace{0.5em} % Add some vertical space after the title
\noindent\rule{\textwidth}{0.5pt} % Add a horizontal rule
% \vspace{1.5em} % Add vertical space after the rule
}
\colorlet{ensembleFill}{black!10!white}
\newenvironment{openproblem}[3]{\section{Open problem: #1}\label{#2} \emph{Tags: #3} \newline \newline}{}
\chapter{Ensemble spaces}
In this chapter we aim to develop a general theory of states and processes that is applicable to any physical system. The core concept is that of an ensemble: we have found that ensembles in both classical and quantum mechanics have a very similar structure that can be abstracted and generalized. The goal is to find necessary requirements for ensembles that can serve as basic axioms, and then further suitable assumptions to recover the different theories (e.g. classical, quantum or thermodynamics).
The basic premise is that physical theories are primarily about ensembles. At a practical level, most of the time we can only prepare and measure statistical properties, as we do not have perfect control over any system (i.e. all measurements are really statistical). The cases where properties can be prepared with one hundred percent reliability can still be understood as ensembles of identical preparations. At a conceptual level, the goal of physics is to write laws that can be repeatedly tested: every time one prepares a system according to a particular procedure and lets it evolve in particular conditions, one will obtain a particular result. That is, the idea of repeatability of experimental results implicitly assumes that the objects of scientific inquiry are not single instances, but the infinite collections of all reproducible instances. This means that any physical theory, at the very least, will have to provide a mathematical representation for its ensembles.
By ensemble we mean what is usually meant in statistical mechanics: we have a preparation device that follows some known recipe; its output is varied but it is consistently varied (i.e. its statistical properties are well defined); the collection of all possible outputs taken as one object is an ensemble. In classical physics, ensembles are probability distributions over the full description of the system, over classical phase space. In quantum mechanics, ensembles are represented by density matrices and density operators. In the standard approach statistical ensembles are defined on top of the space of ``true'' physical states (e.g. microstates, pure states, ...). We will proceed in the opposite way: we will start from the ensembles and recover the states as the ``most pure'' ensembles. There are two main advantages: the first is that we can create a theory that is agnostic about what the fundamental states are, and is therefore general. The second is that this approach is more in line with experimental practice: the experimental data is about statistical ensembles only and the pure states are idealizations that are useful as a mental model or for calculation.
At this point, we have identified three main requirements for ensembles. First, they must be experimentally well defined. This means that there need to be enough experimentally verifiable statements to fully characterize them. This will impose a topology on the ensemble space. Second, we can always perform statistical mixtures: given two ensembles, we can create a third one by selecting the first or the second according to a certain probability. This will impose a convex structure on the ensemble space. Third, ensembles will need a well-defined entropy which quantifies the variability of the elements within the ensemble. Since the variability cannot decrease when performing a statistical mixture and can only increase up to the variability introduced by the selection, the entropy will have to satisfy certain bounds. From these axioms, many general results can be proven. An ensemble space will be a convex subset of a vector space that will extend over a bounded interval in each direction. The concavity of the entropy will impose a metric over the space, turning it into a geometric space. Each ensemble can be characterized by a subadditive measure, which becomes a probability measure in the classical case. This makes the space of physical theories a lot more constrained than one may imagine at first.
Many problems are still open, as they touch unsettled mathematical questions, particularly in the infinite dimensional case. Therefore this chapter will include conjectures that may or may not be true, which the reader is encouraged to try to prove (or disprove).
Note: all $\log$s are assumed to be in base 2.
\section{Review of standard cases}
We will start this chapter by reviewing three cases: discrete classical ensembles, continuous classical ensembles and quantum ensembles. These will be useful both to form an intuition for ensembles and to serve as targets that the whole theory needs to reproduce. We will also go through a series of problematic details and exceptions that we will need to address in the development of the general theory.
\subsection{Discrete classical ensemble spaces}
\begin{defn}
A \textbf{discrete classical ensemble space} is the space of probability distributions over a countable sample space equipped with the Shannon entropy. That is, a discrete classical ensemble space $\Ens$ is the space of probability distributions over a discrete set $X$ of countably many elements. Each element can thus be identified by a sequence $p_i$ such that $\sum_i p_i = 1$, where $p_i$ is the probability associated to each element $x_i \in X$. The entropy is given by $S(\ens) = - \sum_i p_i \log p_i$.
\end{defn}
\subsubsection{Finite case}
The space of classical distributions over a discrete space corresponds to a \href{https://en.wikipedia.org/wiki/Simplex}{simplex}. In the finite case, the pure states $X = \{x_i\}_{i=1}^{n} \subset \Ens$ are finitely many and each ensemble $\ens = \sum_i p_i x_i$ is uniquely identified by a decomposition of pure states. Effectively, each ensemble is a probability distribution over the pure states. Mathematically, each point of the space is a convex combination of the vertices. The simplex has a center point, which corresponds to the maximally mixed state, a uniform distribution over all pure states.
The entropy is given by the Shannon entropy $-\sum_i p_i \log p_i$. This means that the entropy of each pure state is zero and the entropy of the maximally mixed state is $\log n$ where $n$ is the number of pure states. The entropy increases as we go from the pure states to the maximally mixed state. The level sets (i.e. the fibers) of the entropy form a series of concentric ``shells'' that foliate the space.
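As a quick worked example (numbers ours, for illustration), consider the simplest case $n = 2$. For the uniform distribution $p = (1/2, 1/2)$,
\[
S = -\tfrac{1}{2}\log\tfrac{1}{2} - \tfrac{1}{2}\log\tfrac{1}{2} = \log 2 = 1,
\]
which is the maximally mixed value $\log n$. For a biased distribution $p = (1/4, 3/4)$,
\[
S = \tfrac{1}{4}\log 4 + \tfrac{3}{4}\log\tfrac{4}{3} \approx 0.81 < 1,
\]
so the entropy decreases as we move from the center of the simplex toward a vertex, where it vanishes.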
Note that imposing zero entropy on all pure states is a restrictive condition that does not apply in general. To see this, consider the case where the state is defined by the number of molecules for two substances. This space is the product of two independent variables $n_a$ and $n_b$. If we have a uniform distribution over $N_a$ cases of $n_a$ and $N_b$ cases of $n_b$, the total number of cases is $N_a N_b$. Therefore the entropy of the joint state is the sum of the entropy of the marginals. However, if we pair $n_a$ with the total number of molecules $n_{(a+b)}$ we have a problem. The issue is that the variable $n_{(a+b)}$ corresponds to a variable number of joint cases. Therefore the case where the ensemble space is a simplex but the entropy is not the Shannon entropy (i.e. it is the Shannon entropy plus the contributions of entropy from each vertex) is a physically meaningful case that should be possible in the general theory.
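For the independent case, the additivity can be checked directly. A uniform distribution over $N_a$ values of $n_a$ and, independently, over $N_b$ values of $n_b$ is a uniform distribution over $N_a N_b$ joint cases, so
\[
S = \log(N_a N_b) = \log N_a + \log N_b,
\]
which is exactly the sum of the entropies of the marginals. It is this factorization of the count of joint cases that fails when $n_a$ is paired with $n_{(a+b)}$.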
\subsubsection{Countable case}
The countable case is, in some respects, not well defined.
The obvious extension is to include all sequences $\{p_i\} \in [0,1]$ whose sum converges to one (i.e. the space of all probability measures over a countable discrete space). Since we cannot create a uniform distribution over infinitely many cases, there is no center point, there is no barycenter. Effectively, there is a ``hole'' in the middle.\footnote{It may be useful to characterize this ``hole'' and the limit points. There should be at least one limit point for each sequence $\{p_i\}$ whose sum converges to a finite $p < 1$. Intuitively, we can keep that part of the distribution constant while we spread the rest uniformly to all other cases. Each should reach a different limit point.}
However, the space of all probability measures is too large. Note that the entropy is not finite for all sequences with $\sum_i p_i = 1$ (i.e. for all infinite convex combinations).\footnote{For details, see \href{https://arxiv.org/pdf/1212.5630.pdf}{J. Stat. Mech. (2013) P04010}.} Given that we want the entropy to exist and be finite for all ensembles, this generalization does not seem physically warranted.\footnote{This generalization is called a \href{https://ncatlab.org/nlab/show/superconvex+space}{superconvex space} in some literature.}
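To see this concretely, consider the standard example (ours, chosen for illustration)
\[
p_i = \frac{C}{i \log^2 i}, \qquad i \geq 2,
\]
where $C$ normalizes the sum to one. The series $\sum_i 1/(i \log^2 i)$ converges, so this is a valid probability distribution, but $-\log p_i$ grows like $\log i$ up to lower-order terms, so
\[
-\sum_i p_i \log p_i \sim \sum_i \frac{C}{i \log i} = \infty,
\]
and the Shannon entropy diverges.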
Also note that expectation values are not guaranteed to be finite either, and requiring a particular observable to be finite further restricts the space. This restriction may be desirable for another reason: a discrete ensemble space has no notion of the ordering of the pure states. Physically, this would mean that the states with 1, 100, or 1 trillion particles are ``equally distant'' (i.e. all infinite permutations are allowed). Requiring the expectation of the number of particles to be finite (e.g. $\sum_i N(i) p_i < \infty$) should effectively encode the infinite ordering in the rate of convergence of the probability distributions (i.e. not all infinite permutations would be allowed).
Another problem is identifying the correct topology for the space. The entropy defines a notion of orthogonality, as we will see later, based on the disjoint support of distributions. This suggests that we need a space with an inner product, and therefore that we should require the probability distributions to be square summable. However, the inner product could instead be defined on the square roots of the probabilities, more in line with the quantum case; this is always well defined since probabilities are never negative. It is not yet clear which is the correct choice.
\subsubsection{No uncountable case}
The uncountably infinite case is not physically relevant, as the space cannot be given a second countable discrete topology. Also note that any set of real numbers whose sum is finite can have only countably many non-zero elements. To understand why, note that there can only be finitely many terms above any particular positive value if their sum is to remain finite. Effectively, the uncountable case would be stitching together infinitely many countable cases.
\subsection{Continuous classical ensemble spaces}
\begin{defn}
A \textbf{continuous classical ensemble space} is the space of probability distributions over classical phase space equipped with the Shannon/Gibbs entropy. That is, it is the space $\Ens$ of probability measures over a symplectic manifold $X$ that are absolutely continuous with respect to the Liouville measure $\mu$. The entropy is given by the Shannon/Gibbs entropy calculated using the probability density (i.e. the Radon-Nikodym derivative between the probability measure $p$ and the Liouville measure $\mu$). That is, $S(\rho) = - \int_X \rho \log \rho \, d\mu$ where $\rho = \frac{dp}{d\mu}$.
\end{defn}
In the continuous case, the space of ensembles can be understood as the space of non-negative integrable functions over a symplectic manifold (e.g. over phase space) that integrate to one. That is, if $X$ is a symplectic manifold, then $\Ens = \{ \rho \in L^1(X) \, | \, \rho(x) \geq 0, \, \int_X \rho(x) d\mu = 1 \} $ where $\mu(U)=\int_U \omega^n$ is the Liouville measure. This is a convex set, whose extreme points would be, in the limit, the Dirac measures (i.e. probability measures concentrated at a single phase-space point). As we cannot reliably prepare a system at an infinitely precise position and momentum, these distributions are not physical. Also, they would correspond to minus infinite entropy. The Dirac measures, and in general all distributions over a set of measure zero, are excluded because they are not absolutely continuous. The absence of the extreme points is important: when developing standard constructions in the ensemble space, one cannot rely on the existence of extreme points. In general, this points to a difference between the spectra (i.e. the possible values of a random variable) and the pure states (i.e. the extreme points), which will be even more pronounced in the quantum case.
The symplectic nature of the manifold is required to assign a frame-invariant density to states and a frame-invariant notion of independence between DOFs, as we saw in the classical mechanics section of reverse physics. The entropy is given by $- \int \rho \log \rho d\mu$ where $\mu$ is the Liouville measure and $\rho$ is the probability density over canonical coordinates. If a different measure is used, or if the coordinates are not canonical, the formula gives the wrong result.\footnote{It may be interesting to study the shell of zero entropy states. For example, it should not be path connected. All uniform distributions with support of the same finite size (in terms of the Liouville measure) will have the same entropy. The region, however, need not be contiguous. Since we cannot continuously transform a single region into two disjoint regions, there will be different distributions at zero entropy that cannot be transformed continuously.}
Similarly to the countable discrete case, the entropy can be infinite and expectation values can be infinite. The added complication is the frame invariance: it would not make sense to have finite expectation for position in one frame but infinite in another. Requiring all functions of position and momentum to have finite expectation restricts the distributions to those with finite support. Requiring all polynomial functions of position and momentum to have finite expectation restricts the distributions to those that decay faster than any polynomial. Note that the expectations of all polynomials of position and momentum are not enough to reconstruct the distribution. Furthermore, derivatives of the distribution with respect to position and momentum are needed to determine how the distribution evolves over time given a time evolution map. This may suggest that the proper space of probability measures does not include all the absolutely continuous ones, but only those for which the probability density is a Schwartz function. These details are still to be understood.
The above consideration would seem to rule out probability measures with a discontinuous probability density. This is somewhat of an unclear point. Originally, we thought the probability density should be continuous as only continuous functions can physically represent experimental relationships. However, the relationship given by the measure is not between points and probability but rather between sets and probability. The probability density is, in a sense, not the prime physical object. The measure is. The probability density, in fact, is not uniquely defined, as countably many discontinuities can be added without changing the measure. This seems to suggest that what happens at a single point, or over a set of measure zero, is not critical. However, it is unclear how to recover continuity of the space without continuity of the distributions. It could be that it is the space of deterministic and reversible transformations that defines continuity. One added benefit of allowing discontinuous probability densities is that the uniform distribution attains the maximum entropy within the set of probability distributions with support over a finite measure set. This makes entropy maximization considerations a lot easier.
Unlike the discrete classical case, subspaces and dimensionality of subspaces cannot be defined without the entropy. The issue is that we need a measure on the set of pure states, and the convex structure cannot provide it. The entropy does, however, as the supremum of the entropy for all distributions with support $U$ is $\log \mu(U)$. As we will see, the entropy can be used to both identify subspaces and recover the Liouville measure.
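The bound $\log \mu(U)$ can be verified directly for the uniform density. If $\rho = 1/\mu(U)$ on a set $U$ of finite Liouville measure, and zero elsewhere, then
\[
S(\rho) = -\int_U \frac{1}{\mu(U)} \log\frac{1}{\mu(U)}\, d\mu = \log \mu(U).
\]
By the concavity of $-\rho \log \rho$, this is the supremum over all densities supported in $U$, so the maximum entropy recovers the Liouville size of the support.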
\subsection{Quantum ensemble spaces}
\begin{defn}
A \textbf{quantum ensemble space} is the space given by the density matrices/operators of a Hilbert space equipped with the von Neumann entropy. That is, given a separable Hilbert space $\mathcal{H}$ for a quantum system, the ensemble space $\Ens$ is the space of positive semi-definite self-adjoint operators with trace one $M(\mathcal{H})$. The space of pure states $X$ is given by the projective space $P(\mathcal{H})$. The entropy of an ensemble $\rho \in \Ens$ is given by the von Neumann entropy $S(\rho) = -\tr(\rho \log \rho)$.
\end{defn}
\subsubsection{Finite dimensional case}
The simplest non-trivial case is the qubit, for which the Bloch ball is the space of ensembles $\Ens = M(\mathcal{H})$. The interior of the Bloch ball corresponds to mixtures while the surface corresponds to the pure states $X = P(\mathcal{H}) = \{ |\psi\> \<\psi| \}_{\psi \in \mathcal{H}}$. In quantum ensemble spaces there is no unique decomposition in terms of pure states. Note that the space is exactly characterized by knowing which different mixtures provide the same ensemble.
Multiple decompositions make the ensemble space behave in a way that is a hybrid between the classical discrete and continuous cases. Pure states are properly a part of the ensemble space, as in the discrete case, and we can describe each mixture in terms of finitely many pure states. However, the pure states form a continuum, therefore we can also define probability densities over the space of pure states (i.e. convex integrals). For example, for a single qubit, the maximally mixed state (the center of the ball) can be equally described as the equal mixture of any two opposite states (e.g. spin up and spin down, or spin left and spin right). However, it can also be described as the equal mixture of the whole sphere.
Note that complex projective spaces are symplectic, which is what allows one to define frame-invariant densities. The goal is to have a single argument, applicable to the generic definition, as to why the space of pure states must be symplectic. Also note that the two-dimensional sphere is the only sphere that admits a symplectic structure. By homogeneity, we should be able to argue that the space is symmetric around the maximally mixed state, and is therefore a sphere. The symplectic requirement would then select dimension two. Note that real and quaternionic spaces would be excluded by this argument.
The von Neumann entropy of the maximally mixed state is $\log n$ where $n$ is the dimensionality of the Hilbert space. Again we see that the maximum entropy gives us a measure of the size of the space. Note that, to calculate the von Neumann entropy, we diagonalize the density matrix $\rho$. This means finding a set of orthogonal pure states $x_i$ such that $\rho = \sum_i p_i x_i$ is a convex combination. Note that the convex hull of a set of $n$ orthogonal pure states is a simplex with $n$ vertices whose center is the maximally mixed state. Therefore, we are looking for a simplex that contains $\rho$ and the maximally mixed state. In the qubit case, $\rho$ is an interior point of the Bloch ball. Take the line that connects $\rho$ to the center and extend it to the surface. The two antipodal points of the sphere are the extreme points of the decomposition, and the probability of each is proportional to the distance from $\rho$ to the opposite point (the lever rule). Because of this property, the von Neumann entropy is the smallest Shannon entropy among all possible decompositions.
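For the qubit this construction can be made fully explicit (a standard computation, included here for illustration). Writing $\rho = \frac{1}{2}(I + \vec{r}\cdot\vec{\sigma})$ with Bloch vector $\vec{r}$ and $r = |\vec{r}| \leq 1$, the eigenvalues of $\rho$ are $(1 \pm r)/2$, so
\[
S(\rho) = -\frac{1+r}{2}\log\frac{1+r}{2} - \frac{1-r}{2}\log\frac{1-r}{2}.
\]
At the center $r = 0$ this gives $S = \log 2 = 1$, on the surface $r = 1$ it gives $S = 0$, and the level sets of the entropy are the concentric spheres $r = \mathrm{const}$, in line with the ``shells'' picture of the classical simplex.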
\subsubsection{Countably infinite dimensional case}
The countably infinite dimensional case presents problems similar to those of the classical cases, and adds others. As in the classical infinite cases, the maximally mixed state (i.e. the uniform distribution) is not in the convex space and the entropy is not finite for all infinite convex combinations. As in the classical continuous case, there is the issue of finite expectation of position/momentum in all frames. The problem is compounded by the fact that one cannot require finite expectation for all functions of position and momentum: finite support in position automatically implies infinite support in momentum, since the distribution in momentum is the Fourier transform of that in position.
The Hilbert space for a discrete variable with infinite range (e.g. number of particles) and for a continuous variable (e.g. position/momentum) is the same. The first is defined as the space of square-summable complex sequences $l^2$ while the second is the space of square-integrable complex functions $L^2$. Given that $L^2$ admits a countable basis, the two are isomorphic. This also means that the Hilbert spaces of all systems with finitely many degrees of freedom are isomorphic to one another. This makes the problem of infinite expectations even more pressing.
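The isomorphism can be written explicitly (a standard construction, not spelled out in the text): the Hermite functions $h_n(x)$ form an orthonormal basis of $L^2(\mathbb{R})$, so the map
\[
\{c_n\}_{n=0}^{\infty} \mapsto \sum_{n=0}^{\infty} c_n h_n(x)
\]
is a unitary map from $l^2$ onto $L^2(\mathbb{R})$, identifying the state space of a discrete variable with that of a continuous one. The $h_n$ are also the energy eigenfunctions of the harmonic oscillator, which is why the number basis and the position representation describe the same Hilbert space.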
Note that Schwartz-space states have finite expectation for all polynomial functions of position and momentum, given that the momentum operator is (up to a constant) the derivative with respect to position. Given that infinite permutations can change the rate of convergence, the Schwartz space retains a notion of which states are further from the origin, unlike Hilbert spaces. We will likely want to use Schwartz spaces instead of Hilbert spaces to make the physics and the mathematics more consistent.
\section{Axiom of ensemble and topology}
Statistical ensembles will be the cornerstone of our general theory of states and processes. In this section we will see how any physical theory must, at least, define a space of ensembles, which describes the possible outputs of all processes considered by the theory. Since a physical theory must allow for experimental verifiability, the ensemble space must be endowed with a $\mathsf{T}_0$ second countable topology.
We saw how the principle of scientific objectivity required science to be universal, non-contradictory and evidence based. If our goal is to find laws that govern the evolution of physical systems, however, this is not sufficient. Scientific laws will be statements of the type \statement{every time we prepare this type of system according to this procedure and let it evolve under these conditions, we will find the system in this configuration after some time.} ``Every time'' implies the principle of scientific reproducibility.
\begin{mathSection}
\textbf{Principle of scientific reproducibility}.
Scientific laws describe relationships that can always be experimentally reproduced.
\end{mathSection}
Consider the hypothesis that all life on earth descends from a single common ancestor. This is a scientific hypothesis that may be experimentally falsified, but it is about a single event. As such, it is not a scientific law. The theory of evolution through natural selection, instead, describes what always happens to a population given a set of circumstances, and is therefore a law. As such, it does not describe a particular set of living organisms or traits, it applies to all of them but, as a consequence, to none in particular.
The same applies to the laws of physics. Classical Hamiltonian mechanics or quantum mechanics will apply to certain classes of physical systems, describing the common behavior within each class over all possible instantiations, but none in particular. That is, the law is not describing a particular behavior of a particular system, but the common behavior of the aggregate of all similarly prepared systems at all possible times.
The subject of a physical law, then, is not a single system in a single state, but an ensemble: all possible preparations of equivalent systems prepared according to the same procedure. A general theory of states and processes, then, will be a theory about ensembles as this is the least restrictive requirement needed. Any physical theory will \emph{at least} provide us with a set of ensembles, and the physical laws must be able to describe the evolution of those ensembles.
Given that we are still talking about scientific investigation, ensembles must be experimentally well defined and the principle of scientific objectivity applies. This means that ensembles are the possibilities of an experimental domain, which means points of a $\textsf{T}_0$ second countable topological space where the topology corresponds to the natural one defined by the verifiable statements.
The verifiable statements for the ensemble space are statements about the ensembles themselves, either in terms of statistical quantities (e.g. \statement{the average energy of the particle is $3 \pm 0.5 \, eV$}) or in terms of preparation settings (e.g. \statement{the beam goes through a polarizer oriented vertically within 1 degree}). Probability ranges are also typical verifiable statements on ensembles (e.g. \statement{the coin toss will result in heads between 49 and 51 percent of the cases}). Note that the verifiable statements at the level of the ensemble are different from the verifiable statements at the level of each instance. Saying that a coin is fair, for example, is a statement on the ensemble while saying that the outcome at a particular time was heads is a statement on the instance. The two are unrelated: whether the coin is fair is an independent statement with respect to a particular instance. Therefore the topology of the ensemble space and the topology of the random variables are distinct conceptual and mathematical objects (e.g. the topology of the Bloch ball is the standard topology of $\mathbb{R}^3$ but the topology on the values of spin along a given direction is the discrete topology), and it will be much later that the two will be reconciled.\footnote{In fact, many details are still open.}
It should also be clear that ensembles are theoretical objects, idealizations: an infinite collection of instances cannot be realized. What is realized in a laboratory will be a finite version, with all limitations in terms of precision that go with it. One may ask: how can something so idealized represent physical objects? But this is exactly what we do in all other areas of physics: we talk about spheres, perfect fluids, isolated systems or immovable objects. All of these are idealized objects, and we model the world with such abstractions. Ensembles are useful idealizations precisely because they ignore details that are not relevant for the problem at hand. If we want to write physical laws, in fact, we can only write them on those features that are common to all instances. The ensemble represents exactly those and only those features.
Setting ensembles as a primary notion also solves another conceptual problem. The ensemble is not constructed as a limit of infinite instances, which would pose a number of problems. The ensemble describes the preparation procedure, and therefore the collection of instances is potential and comes before the instances. For example, a fair coin can be understood as the collection of instructions for producing and throwing a fair coin. The fact that a fair coin will produce, in a large trial, about half heads and half tails is a consequence of the type of preparation. This gives automatically an interpretation of probability that is more along the lines of propensity, which is more appropriate to express objective causal relations.
\begin{mathSection}\label{pm-es-axiomEnsemble}
\begin{axiom}[Axiom of ensemble]
The state of a system is given by an \textbf{ensemble}, which represents the collection of all possible outputs of a preparation procedure for a physical system. The set of all possible ensembles for a physical system is its \textbf{ensemble space}. Formally, an ensemble space is a $\mathsf{T}_0$ second countable topological space where each element is called an ensemble.
\end{axiom}
\begin{justification}
In experimental settings, preparation procedures never prepare a system exactly in the same configuration. Experimental results, then, are always in terms of statistical preparations and statistical measurements. A physical theory must be able to talk about the possible statistical descriptions within the theory. States, then, can be understood as ensembles, idealized statistical descriptions, as those are what is connected to experimental practice.
Equivalently, reproducibility is a basic requirement of a physical theory. A physical law, then, must be understood as describing a relationship that always exists whenever the same set of circumstances is replicated. Given that we need to always be able to replicate those circumstances ``one more time'', the relationship is about countably infinite preparations and results: ensembles. Therefore, to the extent that physics is about reproducible experimental results, the basic theoretical description of a system is in terms of ensembles. This justifies the use of ensembles as the fundamental object to describe the state of a system.\footnote{Note that reproducibility also already implies that all properties that characterize an ensemble must be relative to the procedure. If the properties depended, for example, on absolute space or absolute time, then different practitioners would not be able to prepare the same ensemble.}
Ensembles are experimentally defined objects, and therefore they are possibilities of an experimental domain. This means that an ensemble space is a $\mathsf{T}_0$ second countable topological space where each element is an ensemble and the topology is induced by the verifiable statements.
\end{justification}
\end{mathSection}
We should now verify that our three standard cases satisfy the axiom of ensemble.
\begin{mathSection}
\begin{prop}
Classical discrete, classical continuous and quantum ensemble spaces satisfy the axiom of ensemble.
\end{prop}
\begin{proof}
For the classical discrete case, including all infinite convex combinations, we have the subset of $\ell^1$ of all non-negative sequences that sum to one. Similarly, for the classical continuous case we have the subset of $L^1$ that corresponds to non-negative distributions that integrate to one. Both of these spaces, with the subspace topology, are separable metric spaces: they are therefore second countable and, being metric, Hausdorff and hence $\textsf{T}_0$. Note that, in case the correct formulation is in terms of square integrable functions instead of simply integrable functions, the space is still $\textsf{T}_0$ and second countable.
For the quantum case, the Hilbert space with its standard topology is separable and admits a countable orthonormal basis. A density operator is fully determined by its matrix elements on such a basis, which means the space of ensembles embeds in a separable metric space of operators (e.g. the trace-class operators with the trace norm) and can be given a topology that is $\textsf{T}_0$ and second countable.
\end{proof}
\end{mathSection}
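As a concrete illustration of the quantum case, the following minimal sketch (not part of the formal development; names such as \texttt{is\_density\_matrix} and \texttt{mix} are ours) checks the defining properties of a qubit density operator and shows that mixing two of them yields another density operator:

```python
# Minimal sketch (illustration only): qubit ensembles as 2x2 density
# matrices given as nested lists of numbers.

def is_density_matrix(rho, tol=1e-12):
    """Trace one, Hermitian, positive semi-definite (for 2x2:
    non-negative diagonal and non-negative determinant)."""
    (a, b), (c, d) = rho
    trace_one = abs((a + d) - 1) < tol
    hermitian = (abs(b - c.conjugate()) < tol
                 and abs(a.imag) < tol and abs(d.imag) < tol)
    det = (a * d - b * c).real
    psd = a.real >= -tol and d.real >= -tol and det >= -tol
    return trace_one and hermitian and psd

def mix(p, rho1, rho2):
    """Statistical mixture p*rho1 + (1-p)*rho2, entrywise."""
    return [[p * x + (1 - p) * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(rho1, rho2)]

up = [[1, 0], [0, 0]]            # pure state |0><0|
plus = [[0.5, 0.5], [0.5, 0.5]]  # pure state |+><+|
mixed = mix(0.3, up, plus)       # convex combination: still a density matrix
print(is_density_matrix(up), is_density_matrix(plus), is_density_matrix(mixed))
```

The same closure under mixing is what the proof above invokes for the classical cases as well.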
\section{Axiom of mixture and convex structure}
In this section we are going to see how the ability to perform statistical mixtures leads to a convex structure for ensemble spaces. Only mixtures of finitely many elements are guaranteed to exist, and the topology will tell us which infinite mixtures are possible. The convex structure also gives us a basic notion to compare ensembles: one ensemble can be a component of another if the second can be seen as a mixture of the first with something else. Two ensembles are separate if they have no common component.
As we saw before, an ensemble is the collection of all outputs of a preparation procedure. The idea is that we can always combine preparation procedures selecting the output among them with a given probability distribution. This statistical mixture is another preparation procedure which will correspond to an ensemble. The ensemble space of any physical theory, then, must allow statistical mixtures, which leads to a convex structure.
\begin{mathSection}
\begin{defn}
Given a real number $p \in [0,1]$, its complement is defined as $\bar{p} = 1-p$.
\end{defn}
\begin{axiom}[Axiom of mixture]\label{pm-es-axiomMixture}
The statistical mixture of two ensembles is an ensemble. Formally, an ensemble space $\Ens$ is equipped with an operation $+ : [0, 1] \times \Ens \times \Ens \to \Ens$ called \textbf{mixing}, noted with the infix notation $p \ens[a] + \bar{p} \ens[b]$, with the following properties:
\begin{itemize}
\item \textbf{Continuity}: the map $(p, \ens[a], \ens[b]) \mapsto p \ens[a] + \bar{p} \ens[b]$ is continuous (with respect to the product topology of $[0, 1] \times \Ens \times \Ens$)
\item \textbf{Identity}: $1 \ens[a] + 0 \ens[b] = \ens[a]$
\item \textbf{Idempotence}: $p \ens[a] + \bar{p} \ens[a] = \ens[a]$ for all $p \in [0,1]$
\item \textbf{Commutativity}: $p \ens[a] + \bar{p} \ens[b] = \bar{p} \ens[b] + p \ens[a]$ for all $p \in [0,1]$
\item \textbf{Associativity}: $p_1 \ens_1 + \bar{p}_1\left(\frac{p_2}{\bar{p}_1}\ens_2 + \frac{p_3}{\bar{p}_1}\ens_3\right) = \bar{p}_3\left(\frac{p_1}{\bar{p}_3} \ens_1 + \frac{p_2}{\bar{p}_3}\ens_2\right) + p_3 \ens_3$ where $p_1, p_3 \in [0,1)$ and $p_1 + p_3 \leq 1$ and $p_2 = 1 - p_1 - p_3$
\end{itemize}
\end{axiom}
\begin{justification}
This axiom captures the ability to create a mixture merely by selecting between the output of different processes. Let $\ens_1$ and $\ens_2$ be two ensembles that represent the output of two different processes $P_1$ and $P_2$. Let a selector $S_p$ be a process that outputs two symbols, the first with probability $p$ and the second with probability $\bar{p}$. Then we can create another process $P$ that, depending on the selector, outputs either the output of $P_1$ or $P_2$. All possible preparations of such a procedure will form an ensemble. Therefore we are justified in equipping an ensemble space with a mixing operation that takes a real number from zero to one, and two ensembles.
Given that mixing represents an experimental relationship, and all experimental relationships must be continuous in the natural topology, mixing must be a continuous function. In general, the mixing coefficient $p$ corresponds to a value from a continuously ordered quantity between zero and one, as defined in the previous chapter, and therefore the natural topology is the one of the reals.\footnote{It may be argued that rational numbers could be prepared exactly, as one may design a procedure that, for example, alternates the selection deterministically. Therefore one could have a countable subset of topologically isolated ensembles. It is not clear whether this would create problems or not. Since the topology of the reals would still be required as a subspace topology anyway, we leave investigating this case to future work.} This justifies continuity.
If $p=1$, the output of $P$ will always be the output of $P_1$. This justifies the identity property. If $P_1$ and $P_2$ are the same process, then the output of $P$ will always be the output of $P_1$. This justifies the idempotence property. The order in which the processes are given does not matter as long as the same probability is matched to the same process. The process $P$ is identical under permutation of $P_1$ and $P_2$. This justifies commutativity. If we are mixing three processes $P_1$, $P_2$ and $P_3$, as long as the final probabilities are the same, it does not matter if we mix $P_1$ and $P_2$ first or $P_2$ and $P_3$. This justifies associativity.
\end{justification}
\begin{coro}
An ensemble space is a convex space.
\end{coro}
\begin{proof}
The properties of the axiom of mixture match the basic definition of convex spaces. For example, see \href{https://ncatlab.org/nlab/show/convex+space}{https://ncatlab.org/nlab/show/convex+space} or \href{https://arxiv.org/abs/0903.5522}{arXiv:0903.5522}. The notation and terminology will be slightly different to better map to physics ideas.
\end{proof}
\end{mathSection}
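The properties required by the axiom can be checked numerically on a toy model. The following sketch (illustration only; discrete classical ensembles represented as probability vectors, \texttt{mix} is our name for the binary operation) verifies identity, idempotence, commutativity and associativity exactly as stated:

```python
# A minimal numerical sketch (illustration only, not part of the
# formal development): binary mixing p*a + (1-p)*b on probability
# vectors.

def mix(p, a, b):
    """Binary mixing p*a + (1-p)*b of two probability vectors."""
    return tuple(p * x + (1 - p) * y for x, y in zip(a, b))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(a, b))

a = (0.2, 0.5, 0.3)
b = (0.6, 0.1, 0.3)
c = (0.1, 0.1, 0.8)

# Identity, idempotence, commutativity
assert close(mix(1, a, b), a)
assert close(mix(0.4, a, a), a)
assert close(mix(0.4, a, b), mix(0.6, b, a))

# Associativity, as stated in the axiom:
# p1 e1 + (1-p1)(p2/(1-p1) e2 + p3/(1-p1) e3)
#   = (1-p3)(p1/(1-p3) e1 + p2/(1-p3) e2) + p3 e3
p1, p3 = 0.5, 0.2
p2 = 1 - p1 - p3
lhs = mix(p1, a, mix(p2 / (1 - p1), b, c))
rhs = mix(1 - p3, mix(p1 / (1 - p3), a, b), c)
assert close(lhs, rhs)
print("all mixing properties hold numerically")
```

Both sides of the associativity check equal the flat convex combination $p_1 a + p_2 b + p_3 c$, which is the content of the finite mixture construction below.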
As we progress through the details of the theory, we will see that all linear structures in physics are, in one way or another, manifestations of this basic structure. For example, the linearity of the Hilbert space in quantum mechanics is connected to the linearity of density operators and expectations.
Before proceeding, we should now check that the axiom of mixture is satisfied by the standard cases.
\begin{mathSection}
\begin{prop}
Discrete classical ensemble spaces, continuous classical ensemble spaces and quantum ensemble spaces satisfy the axiom of mixture.
\end{prop}
\begin{proof}
The space $\Ens$ of probability measures, discrete or continuous, is a convex subset of the topological vector space of signed finite measures. It is therefore closed under convex combinations: if $\ens[a], \ens[b] \in \Ens$ are probability measures, then $p \ens[a] + \bar{p} \ens[b]$ is a probability measure. The properties of mixing are inherited from the properties of linear combinations, which include continuity. Therefore the discrete and continuous classical ensemble spaces satisfy the axiom of mixture.
Similarly, the space of positive semi-definite self-adjoint operators with trace one is a convex subset of the topological vector space of self-adjoint operators. Therefore it is closed under convex combinations, which are continuous in the given topology, and it will satisfy the axiom of mixture.
\end{proof}
\end{mathSection}
\subsection{Finite and infinite mixtures}
The axiom of mixture only guarantees the existence of a mixture between two ensembles. We can mix elements recursively and extend the operation to finitely many ensembles. Commutativity and associativity make these mixtures independent of the mixing order, such that they depend only on the ensembles chosen and the mixing coefficients associated to each ensemble.
\begin{mathSection}
\begin{defn}[Finite mixture]
Let $\{e_i\}_{i=1}^{n} \subseteq \Ens$ be a finite subset of ensembles and $\{p_i\}_{i=1}^{n} \subseteq [0,1]$ be a finite set of coefficients such that $\sum_{i=1}^{n} p_i = 1$, then the \textbf{finite mixture}, noted $\ens[a] = \sum_{i=1}^{n} p_i \ens_i$, is defined to be
\begin{equation}
\begin{split}
p_1 \ens_1 + (1-p_1)\left( \tfrac{p_2}{1-p_1} \ens_2 + \tfrac{1-p_1-p_2}{1-p_1} \left( \tfrac{p_3}{1-p_1-p_2} \ens_3 + \tfrac{1-\sum_{i=1}^{3}p_i}{1-p_1-p_2}\Bigl(\ldots \right.\right. \\
\left.\left.\left.+\tfrac{1-\sum_{i=1}^{n-2}p_i}{1-\sum_{i=1}^{n-3}p_i}\left(\tfrac{p_{n-1}}{1- \sum_{i=1}^{n-2}p_i} \ens_{n-1} + \tfrac{p_n}{1- \sum_{i=1}^{n-2}p_i}\ens_n\right)\right) \right) \right).
\end{split}
\end{equation}
\end{defn}
\begin{check}
For the definition to work, we need to make sure that the final ensemble does not depend on the order of mixing. We first check with three elements. Let $p_1, p_2, p_3 \in [0,1]$ with $p_1+p_2+p_3=1$. By commutativity, we can switch the second with the third element:
\begin{equation}
\begin{aligned}
p_1 \ens_1 + p_2 \ens_2 + p_3 \ens_3 &= p_1 \ens_1 + \bar{p}_1\left(\frac{p_2}{\bar{p}_1}\ens_2 + \frac{p_3}{\bar{p}_1}\ens_3\right) = p_1 \ens_1 + \bar{p}_1\left(\frac{p_3}{\bar{p}_1} \ens_3 + \frac{p_2}{\bar{p}_1}\ens_2\right) \\
&= p_1 \ens_1 + p_3 \ens_3 + p_2 \ens_2.
\end{aligned}
\end{equation}
Then, by commutativity and associativity, we can switch the first with the third element:
\begin{equation}
\begin{aligned}
p_1 \ens_1 + p_2 \ens_2 + p_3 \ens_3 &= p_1 \ens_1 + \bar{p}_1\left(\frac{p_2}{\bar{p}_1}\ens_2 + \frac{p_3}{\bar{p}_1}\ens_3\right) = p_1 \ens_1 + \bar{p}_1\left(\frac{1-p_1-p_3}{\bar{p}_1}\ens_2 + \frac{p_3}{\bar{p}_1}\ens_3\right) \\
&= \bar{p}_3\left(\frac{p_1}{\bar{p}_3} \ens_1 + \frac{1-p_1-p_3}{\bar{p}_3}\ens_2\right) + p_3 \ens_3 = p_3 \ens_3 + \bar{p}_3\left(\frac{p_1}{\bar{p}_3} \ens_1 + \frac{p_2}{\bar{p}_3}\ens_2\right) \\
&= p_3 \ens_3 + p_2 \ens_2 + p_1 \ens_1.
\end{aligned}
\end{equation}
Since switching the first with the second is equivalent to switching the second with the third and the third with the first, we can reach all permutations.
Note that the definition is recursive, so we can proceed by induction. The base case is a sequence of two elements, for which the order does not matter by commutativity. Given a sequence of $n$ elements, the inductive hypothesis is that the order does not matter for the last $n-1$ elements. It then suffices to show that we can switch the first and the second element. To do so, we can collect the last $n-2$ elements into a single mixture, reducing the problem to one of three elements. We showed above that we can switch the first with the second element in that case, and we can then re-expand the third element into the full sequence. Therefore, the order of mixing does not matter for any finite mixture.
\end{check}
\begin{remark}
Note that we can collect and expand convex combinations into other convex combinations. Because coefficients always sum to one, when breaking the expression in two, we can always calculate the new coefficients from one part. For example, $p_a \ens[a] + p_b \ens[b] + p_c \ens[c] + p_d \ens[d] = p(\frac{p_a}{p} \ens[a] + \frac{p_b}{p} \ens[b]) + \bar{p}(\frac{p_c}{\bar{p}} \ens[c] + \frac{p_d}{\bar{p}} \ens[d])$ where $p = p_a + p_b$ is defined only on the left part. Since $1 = p + \bar{p} = p_a +p_b + p_c + p_d$, we automatically have that $\bar{p} = p_c + p_d$.
\end{remark}
\end{mathSection}
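The order-independence established in the check can also be illustrated numerically. The following sketch (illustration only; \texttt{finite\_mixture} is our name for the nested construction in the definition, applied to probability vectors) builds the recursive mixture and confirms it is invariant under a random permutation of the terms:

```python
import random

# Sketch of the recursive finite mixture (illustration only):
# p1 e1 + (1-p1) * (mixture of the rest, coefficients renormalized).

def mix(p, a, b):
    return tuple(p * x + (1 - p) * y for x, y in zip(a, b))

def finite_mixture(coeffs, ensembles):
    """Recursive pairwise mixing following the definition."""
    if len(ensembles) == 1:
        return ensembles[0]
    p1 = coeffs[0]
    rest = [p / (1 - p1) for p in coeffs[1:]]
    return mix(p1, ensembles[0], finite_mixture(rest, ensembles[1:]))

ensembles = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.5, 0.5, 0)]
coeffs = [0.1, 0.2, 0.3, 0.4]
base = finite_mixture(coeffs, ensembles)

# Shuffling the mixing order leaves the result unchanged
order = list(range(len(coeffs)))
random.shuffle(order)
shuffled = finite_mixture([coeffs[i] for i in order],
                          [ensembles[i] for i in order])
assert all(abs(x - y) < 1e-12 for x, y in zip(base, shuffled))
print(base)
```

The result is simply the flat convex combination $\sum_i p_i \ens_i$, as the nested definition intends.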
While mixtures of finitely many elements are always guaranteed to exist, the extension to mixtures of infinitely many elements is not. First of all, an infinite mixture of ensembles with finite entropy does not necessarily have finite entropy.\footnote{For details, see \href{https://arxiv.org/pdf/1212.5630.pdf}{J. Stat. Mech. (2013) P04010}.} Secondly, it may lead to infinite expectation values which make ensembles not physically meaningful. For example, suppose we put $i$ particles in a box according to the distribution $p_i = \frac{6}{\pi^2 i^2}$, which is normalized since $\sum_{i=1}^{\infty} \frac{6}{\pi^2 i^2}=1$. The expectation $\sum_{i=1}^{\infty} \frac{6}{\pi^2 i^2} i$ diverges. Given that every finite preparation will have a finite expectation value, the actual implementation of that ensemble will necessarily give us a stream of preparations whose number of particles must keep increasing. No finite statistics is representative of the ensemble and, worse still, the finite statistics will necessarily produce averages that differ by arbitrarily large amounts. We therefore cannot simply look at the coefficients to understand whether the infinite mixture gives us a valid ensemble or not.
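A short computation makes the divergence concrete (illustrative sketch; \texttt{partial\_expectation} is our name): the distribution is normalized, yet the partial expectation values grow without bound, roughly like the logarithm of the cutoff.

```python
import math

# p_i = 6 / (pi^2 i^2) is normalized, but the expected number of
# particles diverges: E_N = sum_{i<=N} i * p_i grows like log N.

def partial_expectation(n):
    return sum(6 / (math.pi ** 2 * i) for i in range(1, n + 1))

total_prob = sum(6 / (math.pi ** 2 * i ** 2) for i in range(1, 200001))
print("total probability ~", round(total_prob, 5))
for n in (10, 1000, 100000):
    print(n, round(partial_expectation(n), 3))
```

Any finite run of preparations samples only a truncation of the distribution, so its average keeps drifting upward as larger particle numbers appear.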
Whether an infinite mixture $\sum_{i=1}^\infty p_i \ens_i$ converges or not in the ensemble space is therefore determined by the topology (i.e. experimental verifiability). The axiom guarantees only finite mixtures and we let the closure of the topology handle the limits.
\begin{mathSection}
\begin{defn}[Infinite mixture]
Let $\{e_i\}_{i=1}^{\infty} \subseteq \Ens$ be a sequence of ensembles and $\{p_i\}_{i=1}^{\infty} \subseteq [0,1]$ be a sequence of coefficients such that $\sum_{i=1}^{\infty} p_i = 1$. Then the ensemble $\ens[a]$ is an \textbf{infinite mixture} of those ensembles if it is a topological limit of the sequence of finite mixtures $\sum_{i=1}^{n} \frac{p_i}{P_n} \ens_i$, where $P_n = \sum_{i=1}^{n} p_i$. If the infinite mixture is unique, we write $\ens[a] = \sum_{i=1}^{\infty} p_i \ens_i$.
\end{defn}
\begin{remark}
We will see later that the entropy constrains the topology to be Hausdorff, therefore all infinite mixtures, if they exist, are unique.
\end{remark}
\end{mathSection}
It is unclear whether commutativity and associativity extend, or should extend, to infinite mixtures. For series, \href{https://en.wikipedia.org/wiki/Unconditional_convergence}{unconditional convergence} defines convergence that does not depend on infinite reordering, but it is unclear even whether this is a desirable property. Additionally, we would expect that if an infinite mixture is possible, any submixture should converge as well. That is, if $\sum_{i=1}^{\infty} p_i \ens_i$ converges, then $\sum_{i \in I} \frac{p_i}{p_I} \ens_i$ converges for all $I \subseteq \mathbb{N}$ where $p_I = \sum_{i \in I} p_i$.
We do know that the convex structure is not enough to guarantee the above property, as shown by the following counterexample provided \href{https://math.stackexchange.com/questions/5085283/does-a-convex-subcombination-of-a-convergent-convex-combination-converge}{on stack exchange}. Consider $\mathbb{R}$. It is a metrizable second-countable topological vector space, which means it is a convex space with a topology such that the mixing operation is continuous. It satisfies the axiom of ensemble and the axiom of mixture. Let $p_i = \frac{6}{\pi^2 i^2}$, which means $\sum_i p_i = 1$ and $\ens_i = (-1)^i i$. We have
\begin{equation*}
\sum_{i=1}^{\infty} p_i \ens_i = \frac{6}{\pi^2} \sum_{i=1}^{\infty} \frac{(-1)^i}{i} = - \frac{6}{\pi^2} \ln 2
\end{equation*}
which means it converges. However, let $I = \{2,4,6,\dots\}$. We have
\begin{equation*}
\begin{aligned}
p_I &= \sum_{k=1}^{\infty} p_{2k} = \sum_{k=1}^{\infty} \frac{6}{\pi^2 4 k^2} = \frac{1}{4} \\
\sum_{i \in I} \frac{p_i}{p_I} \ens_i &= 4 \sum_{i \in I} p_i \ens_i = 4 \sum_{k=1}^{\infty} \frac{6}{\pi^2 (4k^2)} 2k = \frac{12}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k} \to \infty.
\end{aligned}
\end{equation*}
Therefore the convex combination of all even elements of the series diverges.
Note that the above counterexample works because the series is effectively the sum of two divergent series. The axiom of entropy will force the ensemble space to have a bounded intersection with every affine line. Therefore the above problem would not exist, because one cannot produce a divergent series from a bounded interval of the real line. It is not clear, though, whether this is enough to show that submixtures of convergent infinite mixtures converge. We leave this as conjecture \ref{pm-es-conjSubmixturesConverge}.
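The numbers in the counterexample can be checked directly (illustrative sketch): the full series approaches $-\frac{6}{\pi^2}\ln 2$, while the normalized sub-mixture over the even indices grows without bound.

```python
import math

# p_i = 6/(pi^2 i^2), e_i = (-1)^i i: the full mixture converges to
# -(6/pi^2) ln 2, but the sub-mixture over even indices, normalized
# by p_I = 1/4, diverges like a harmonic series.

def p(i):
    return 6 / (math.pi ** 2 * i ** 2)

N = 200000
full = sum(p(i) * (-1) ** i * i for i in range(1, N + 1))
target = -6 / math.pi ** 2 * math.log(2)
even_sub = 4 * sum(p(2 * k) * 2 * k for k in range(1, N + 1))
print(full, target, even_sub)
```

The partial sums of the alternating series are within $\frac{6}{\pi^2(N+1)}$ of the limit, while the even sub-series keeps growing with the cutoff.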
TODO: add conjecture for definition of convex integrals
\subsection{Common components and separateness}
The convex structure allows us to characterize ensembles based on whether they can be mixed into one another. For example, we can ask whether an ensemble is or is not the mixture of some other ensembles; or whether two ensembles can be expressed as a mixture of a common component.
\begin{mathSection}
\begin{defn}
Let $\Ens$ be an ensemble space. Let $\ens[a] = \sum_i p_i \ens_i$ where $\ens[a], \ens_i \in \Ens$ and $p_i \in (0,1]$ such that $\sum_i p_i = 1$. We say that $\ens[a]$ is a \textbf{mixture} of $\{\ens_i\}$, each $\ens_i$ is a \textbf{component} of $\ens[a]$ and each $p_i$ is a \textbf{mixture coefficient}.
\end{defn}
\begin{figure}[H]
\centering
\def\angle{10}
\def\radius{1.2}
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (10,3);
\coordinate (a) at (0,0);
\coordinate (aab) at (0.5,0.75);
\coordinate (ab) at (1,1.5);
\coordinate (b) at (2,3);
\draw[-, thick] (a) -- (b);
\filldraw [black] (a) circle (1pt) node[left] {$\ens[a]$};
\filldraw [black] (ab) circle (1pt) node[left] {$\frac{1}{2}\ens[a] + \frac{1}{2}\ens[b]$};
\filldraw [black] (aab) circle (1pt) node[right] {$\frac{3}{4}\ens[a] + \frac{1}{4}\ens[b]$};
\filldraw [black] (b) circle (1pt) node[right] {$\ens[b]$};
\coordinate (center) at (4,1.5);
\draw[fill=ensembleFill, thick] ($(center)+\radius*({cos(0+\angle)},{sin(0+\angle)})$) -- ($(center)+\radius*({cos(120+\angle)},{sin(120+\angle)})$) -- ($(center)+\radius*({cos(240+\angle)},{sin(240+\angle)})$) -- cycle;
\coordinate (c) at (6.5,0.5);
\coordinate (d) at (7.5,1.5);
\coordinate (e) at (8.5,2.5);
\draw[-, thick] (c) -- (d);
\draw[dashed, thick] (d) -- (e);
\filldraw [black] (c) circle (1pt) node[left] {$\ens[c]$};
\filldraw [black] (d) circle (1pt) node[right] {$\ens[d]$};
\filldraw [black] (e) circle (1pt) node[right] {$\ens[e]$};
\end{tikzpicture}
%\includegraphics[width=0.5\textwidth]{tempimages/ConvexExamples.jpg}
\end{figure}
\begin{remark}
In terms of the convex space, all the mixtures of two ensembles correspond to the segment between them; all the mixtures of three ensembles correspond to the triangle formed by the three elements; and so on. An ensemble $\ens[c]$ is a component of a different ensemble $\ens[d]$ if the segment connecting $\ens[c]$ and $\ens[d]$ can be extended past $\ens[d]$. If neither of two ensembles is a component of the other, then they are the extreme points of the segment that joins them: the segment cannot be extended in either direction.
\end{remark}
\begin{figure}[H]
\centering
\begin{tikzpicture}
\coordinate (a) at (0,0);
\coordinate (2ab) at (1,0);
\coordinate (a2b) at (2,0);
\coordinate (b) at (3,0);
\draw[-, thick] (a) -- (b);
\filldraw [black] (a) circle (1pt) node[above left] {$\ens[a]$};
\filldraw [black] (2ab) circle (1pt) node[above] {$\frac{2}{3}\ens[a] + \frac{1}{3}\ens[b]$};
\filldraw [black] (a2b) circle (1pt) node[below] {$\frac{1}{3}\ens[a] + \frac{2}{3}\ens[b]$};
\filldraw [black] (b) circle (1pt) node[above right] {$\ens[b]$};
\end{tikzpicture}
% \includegraphics[width=0.4\textwidth]{tempimages/ComponentNotOrder.jpg}
\end{figure}
\begin{remark}
Note that two ensembles can be components of each other. Consider $\frac{2}{3} \ens[a] + \frac{1}{3} \ens[b]$ and $\frac{1}{3} \ens[a] + \frac{2}{3} \ens[b]$. We can write $\frac{2}{3} \ens[a] + \frac{1}{3} \ens[b] = \frac{1}{2}\left(\frac{1}{3} \ens[a] + \frac{2}{3} \ens[b]\right) + \frac{1}{2} \ens[a]$ and $\frac{1}{3} \ens[a] + \frac{2}{3} \ens[b] = \frac{1}{2}\left(\frac{2}{3} \ens[a] + \frac{1}{3} \ens[b]\right) + \frac{1}{2} \ens[b]$ (each is the midpoint between the other and one of the extremes). Therefore a component is not necessarily ``smaller'' or ``better defined'' than the mixture. Mathematically, ``being a component of'' is not a partial order: it is reflexive and transitive, but it is not antisymmetric. In practical terms, we need something else to tell us whether we are, for example, taking a limit with components that become ``smaller and smaller.''
\end{remark}
\end{mathSection}
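The mutual-component relation in the remark can be checked with a short computation on two-outcome distributions (illustrative sketch; \texttt{mix} is our name):

```python
# Check of the remark (illustration only): with a = (1,0) and b = (0,1),
# x = 2/3 a + 1/3 b and y = 1/3 a + 2/3 b are components of each other.

def mix(p, u, v):
    return tuple(p * s + (1 - p) * t for s, t in zip(u, v))

a, b = (1.0, 0.0), (0.0, 1.0)
x = mix(2 / 3, a, b)
y = mix(1 / 3, a, b)

# x = 1/2 y + 1/2 a and y = 1/2 x + 1/2 b
assert all(abs(s - t) < 1e-12 for s, t in zip(x, mix(0.5, y, a)))
assert all(abs(s - t) < 1e-12 for s, t in zip(y, mix(0.5, x, b)))
print("x and y are components of each other")
```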
We can also characterize some ensembles based on what other ensembles they can admit as components. An extreme point is an ensemble that has only itself as a component. For example, a pure state will be an extreme point as it cannot be expressed as a mixture of any other states. Conversely, an internal point is an ensemble that admits any other ensemble as a component. For example, in a finite discrete classical space, the uniform distribution over all cases can be seen as the mixture of any other distribution with something else. A boundary point is an ensemble that is not an internal point. The figure helps visualize the properties.
\begin{mathSection}
\begin{defn}
Let $\Ens$ be an ensemble space. An \textbf{extreme point} $\ens \in \Ens$ is an ensemble that has no component distinct from itself. That is, there is no $\ens[a] \in \Ens\setminus\{ \ens \}$ such that $\ens = p \ens[a] + \bar{p} \ens[b]$ for some $p \in (0,1]$ and $\ens[b] \in \Ens$. An \textbf{internal point} $\ens \in \Ens$ is an ensemble for which every ensemble is a component. That is, for every $\ens[a] \in \Ens$ there is always $\ens[b] \in \Ens$ and $p \in (0,1]$ such that $\ens = p \ens[a] + \bar{p} \ens[b]$. A \textbf{boundary point} is any ensemble that is not an internal point.
\end{defn}
\begin{remark}
The notion of internal point parallels \href{https://en.wikipedia.org/wiki/Algebraic_interior}{the similar notion} in topological vector spaces.
\end{remark}
\begin{figure}[H]
\centering
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (10,3);
\coordinate (center) at (1.5,1.2);
\def\angle{90}
\def\radius{1.5}
\draw[fill=ensembleFill, thick] ($(center)+\radius*({cos(0+\angle)},{sin(0+\angle)})$) node[right] {$\ens[a]$} -- ($(center)+\radius*({cos(120+\angle)},{sin(120+\angle)})$) node[midway, left] {$\ens[b]$} -- ($(center)+\radius*({cos(240+\angle)},{sin(240+\angle)})$) -- cycle ;
\filldraw [black] ($(center)+\radius/2*({cos(60+\angle)},{sin(60+\angle)})$) circle (1pt);
\filldraw [black] ($(center)+\radius*({cos(0+\angle)},{sin(0+\angle)})$) circle (1pt);
\filldraw [black] ($(center)-\radius*(0,1/4)$) circle (1pt) node[above] {$\ens[c]$};
\coordinate (center) at (5,1.5);
\def\radius{1.25}
\draw [fill=ensembleFill, thick] (center) circle (\radius);
\filldraw [black] ($(center)+\radius*({cos(45)},{sin(45)})$) circle (1pt) node[above right] {$\ens[a]$};
\filldraw [black] ($(center)-\radius*(0,1/4)$) circle (1pt) node[above] {$\ens[c]$};
\end{tikzpicture}
%\includegraphics[width=0.8\textwidth]{tempimages/InteriorExteriorPoints.jpg}
\caption{In both spaces, $\ens[a]$ is an extreme point while $\ens[c]$ is an internal point; $\ens[b]$ is a boundary point, but is not an extreme point (it is the mixture of the vertices on the same side). On the circle, all boundary points are extreme points.}
\end{figure}
\begin{prop}
The set of all internal points $I_{\Ens}$ is a convex set.
\end{prop}
\begin{proof}
Let $\ens_1, \ens_2 \in I_{\Ens}$ be two internal points of $\Ens$. Since $\ens_1$ and $\ens_2$ are internal points, given any $\ens[a] \in \Ens$ we can find $\ens[b]_1, \ens[b]_2 \in \Ens$ such that
\begin{equation}
\begin{aligned}
\ens_1 & =p_1 \ens[a]+\bar{p}_1 \ens[b]_1 \\
\ens_2 & =p_2 \ens[a]+\bar{p}_2 \ens[b]_2
\end{aligned}
\end{equation}
for some $p_1, p_2 \in (0,1]$. Now let $p \in [0,1]$ and $\ens = p \ens_1+\bar{p} \ens_2$. We have:
\begin{equation}
\begin{aligned}
\ens = p \ens_1+\bar{p} \ens_2 &= p(p_1 \ens[a]+\bar{p}_1 \ens[b]_1) + \bar{p} (p_2 \ens[a]+\bar{p}_2 \ens[b]_2)\\
&=\left(p p_1+\bar{p} p_2\right) \ens[a]+\left(p \bar{p}_{1} \ens[b]_1+\bar{p} \bar{p}_{2} \ens[b]_2\right) \\
&=\lambda \ens[a] + \bar{\lambda} \left(\frac{p \bar{p}_{1}}{\bar{\lambda}} \ens[b]_1+\frac{\bar{p} \bar{p}_{2}}{\bar{\lambda}} \ens[b]_2\right) = \lambda \ens[a] + \bar{\lambda} \ens[b] \\
\end{aligned}
\end{equation}
where $\lambda = p p_1+\bar{p} p_2$ and $\ens[b] = \frac{p \bar{p}_{1}}{\bar{\lambda}} \ens[b]_1+\frac{\bar{p} \bar{p}_{2}}{\bar{\lambda}} \ens[b]_2$. Therefore, given any $\ens[a] \in \Ens$, we can find a $\ens[b] \in \Ens$ such that $\ens = \lambda \ens[a] + \bar{\lambda} \ens[b]$ for some $\lambda \in (0,1]$. This means the convex combination of internal points is an internal point, and $I_{\Ens}$ is a convex set.
\end{proof}
\end{mathSection}
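The construction in the proof can be made concrete on the three-outcome simplex, where the internal points are exactly the strictly positive distributions. The following sketch (illustration only; \texttt{decompose} is a hypothetical helper) exhibits any ensemble $a$ as a component of a mixture of internal points:

```python
# Sketch of the proof's construction (illustration only): given a
# mixture e of two internal points and any distribution a, find
# lambda in (0,1] and a distribution b with e = lambda a + (1-lambda) b.

def mix(p, u, v):
    return tuple(p * s + (1 - p) * t for s, t in zip(u, v))

def decompose(e, a):
    """Largest lambda keeping the remainder b non-negative."""
    lam = min(ei / ai for ei, ai in zip(e, a) if ai > 0)
    lam = min(lam, 1.0)
    if lam == 1.0:
        return 1.0, e
    b = tuple((ei - lam * ai) / (1 - lam) for ei, ai in zip(e, a))
    return lam, b

e1 = (0.2, 0.3, 0.5)   # strictly positive: internal
e2 = (0.4, 0.4, 0.2)   # strictly positive: internal
a = (1.0, 0.0, 0.0)    # an extreme point
e = mix(0.5, e1, e2)   # mixture of internal points
lam, b = decompose(e, a)
assert lam > 0 and min(b) >= 0
assert all(abs(x - y) < 1e-12 for x, y in zip(e, mix(lam, a, b)))
print(lam, b)
```

Since the coefficient $\lambda$ found this way is strictly positive for every $a$, the mixture $e$ is itself an internal point, matching the proposition.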
It is an open question whether this result generalizes to infinite mixtures. That is, whether the infinite mixture of internal points is still an internal point. To generalize the previous proof to the infinite case, one would need to show that the infinite mixture of $\ens[b]_i$ converges. If conjecture \ref{pm-es-conjSubmixturesConverge} is true, that is if submixtures of convergent infinite mixtures converge, then the infinite mixture of $\ens[b]_i$ would converge as it is a submixture of $\ens$. We leave this as conjecture \ref{pm-es-conjAlgebraicInteriorSigmaConvex}.
For topological vector spaces, the algebraic interior and the topological interior are related. For example, if $A$ is a convex subset of a TVS with non-empty topological interior, then the algebraic and topological interior coincide. It is an open question how many of these results hold for topological convex spaces as well. As an example, we would like to prove the following.
TODO: reorganize
\begin{conj}
Let $\Ens$ be an ensemble space. The set of boundary points $B_{\Ens}$ is not necessarily a closed set and therefore $I_{\Ens}$ is not necessarily an open set.
\end{conj}
\begin{proof}
Let $\Ens = \{ p_i \in \ell^1 \, | \, \sum p_i =1, \; p_i \in [0,1]\}$ be the set of probability distributions over countably infinitely many elements with the topology of $\ell^1$. Let $\ens \in I_{\Ens}$ be an internal point of $\Ens$. Since $\ens$ is an internal point, it corresponds to a probability distribution $p_i$ such that $p_i \in (0,1)$ for all $i$.
Let $B_{r}(\ens) \subseteq \ell^1$ be an open ball of radius $r$ centered around $\ens$. Since $\sum p_i = 1$ converges, there must be a $j$ such that $p_j < \frac{r}{2}$. Suppose, without loss of generality, that $j \neq 1$. Now consider the probability distribution given by $\lambda_i = (p_1 + p_j, p_2, \dots, p_{j-1}, 0, p_{j+1}, \dots)$. Since $\lambda_j = 0$, $\lambda_i$ is a boundary point of $\Ens$. The distance between the two distributions will be $\sum_i|p_i - \lambda_i| = |p_j| + |-p_j| = 2 p_j < r$, which means $\lambda_i \in B_{r}(\ens)$. Therefore, every open ball around any internal point contains a boundary point. The topological interior of $I_{\Ens}$ is then empty and, since $I_{\Ens}$ is not empty, $I_{\Ens}$ is not an open set.
\end{proof}
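The argument can be checked numerically in $\ell^1$: around a strictly positive distribution one can always find a distribution with a zero entry at arbitrarily small distance. An illustrative sketch (a truncated geometric distribution stands in for a strictly positive $p_i$):

```python
# Numeric sketch (illustration only): move the mass of a sufficiently
# small entry p_j onto the first entry; the result has a zero entry
# (a boundary point) at l^1 distance 2*p_j < r.

def geometric(q, n):
    """Truncated geometric distribution, renormalized to sum to one."""
    w = [(1 - q) * q ** (i - 1) for i in range(1, n + 1)]
    s = sum(w)
    return [x / s for x in w]

p = geometric(0.5, 40)   # strictly positive on all 40 entries
r = 1e-6
j = next(i for i, x in enumerate(p) if 0 < x < r / 2)
lam = list(p)
lam[0] += lam[j]         # move the mass of entry j onto entry 0
lam[j] = 0.0             # lam now has a zero entry: a boundary point
dist = sum(abs(x - y) for x, y in zip(p, lam))
print(j, dist)
```

Since $r$ can be taken arbitrarily small, no $\ell^1$ ball around $p$ stays inside the strictly positive distributions.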
One may be able to show that the limit of a sequence of boundary points cannot be an interior point in the topological sense.
We now define the notion of separateness: two ensembles are separate, noted $\ens[a] \separate \ens[b]$, if they do not have a common component. That is, they are not mixtures of a common ensemble. In classical ensemble spaces, this is equivalent to the probability measures having disjoint support, of which separateness is the generalization. Separateness is a useful way to characterize the relationship between ensembles as it enjoys several convenient properties. Most of all, it is an irreflexive symmetric relation: nothing is separate from itself and if $\ens[a] \separate \ens[b]$, then $\ens[b] \separate \ens[a]$.\footnote{Note that orthogonality in inner product vector spaces is also an irreflexive symmetric relationship.}
\begin{mathSection}
\begin{defn}
Let $\Ens$ be an ensemble space and $\ens[a], \ens[b] \in \Ens$. We say that they \textbf{have a common component} if we can find $\ens[c] \in \Ens$, the common component, such that $\ens[a] = p_1 \ens[c] + \bar{p}_1 \ens_1$ and $\ens[b] = p_2 \ens[c] + \bar{p}_2 \ens_2$ for some $\ens_1, \ens_2 \in \Ens$ and $p_1, p_2 \in (0,1]$. Otherwise, we say they \textbf{have no common component}, or are \textbf{separate}, noted $\ens[a] \separate \ens[b]$. Two ensembles have a common component in $A \subseteq \Ens$ if the common component can be found in $A$, and are separate in $A$ if there is none. Two sets of ensembles $A, B \subseteq \Ens$ are separate if all the elements of one are separate from all the elements of the other. That is, $A \separate B$ if $\ens[a] \separate \ens[b]$ for all $\ens[a] \in A$ and $\ens[b] \in B$.
\end{defn}
\begin{figure}[H]
\centering
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (10,3);
\coordinate (a) at (1.0,1.0);
\coordinate (c) at (3.0,0.7);
\coordinate (b) at (2.4,1.9);
\def\aover{0.3}
\def\bover{0.6}
\filldraw [black] (a) circle (1pt) node[above right] {$\ens[a]$};
\filldraw [black] (b) circle (1pt) node[above right] {$\ens[b]$};
\filldraw [black] (c) circle (1pt) node[above right] {$\ens[c]$};
\filldraw [black] ($(a)!-\aover!(c)$) circle (1pt) node[above right] {$\ens_1$};
\filldraw [black] ($(b)!-\bover!(c)$) circle (1pt) node[above right] {$\ens_2$};
\draw[thick] ($(a)!-\aover!(c)$) -- (c) -- ($(b)!-\bover!(c)$) ;
\end{tikzpicture}
%\includegraphics[width=0.3\textwidth]{tempimages/CommonComponent.jpg}
\end{figure}
\begin{remark}
If two ensembles $\ens[a]$ and $\ens[b]$ have a common component $\ens[c]$, then the ensemble space contains a triangle where $\ens[c]$ is a vertex and $\ens[a]$ and $\ens[b]$ are points on the sides that connect to $\ens[c]$.
\end{remark}
\begin{coro}
The previous definitions obey the following:
\begin{enumerate}
\item every ensemble is a component of itself
\item if $\ens[a]$ is a component of $\ens[b]$, then $\ens[a]$ and $\ens[b]$ have a common component and therefore they are not separate
\item separateness is an irreflexive symmetric relation
\item an ensemble is an extreme point if and only if it is separate from all other ensembles
\item an ensemble is an internal point if and only if it is not separate from any ensemble
\item if two ensembles are separate, then they are boundary points
\end{enumerate}
\end{coro}
\begin{proof}
1. Since by idempotence $\ens = p \ens + \bar{p} \ens$ for any $p$, then every ensemble is a mixture of itself, and therefore it is a component of itself.
2. By idempotence, we can write $\ens[a] = p_1 \ens[a] + \bar{p}_1 \ens[a]$ for some $p_1 \in (0,1]$. Since $\ens[a]$ is a component of $\ens[b]$, we can write $\ens[b] = p_2 \ens[a] + \bar{p}_2 \ens[e]_2$ for some $p_2 \in (0, 1]$ and $\ens[e]_2 \in \Ens$. Therefore $\ens[a]$ and $\ens[b]$ have $\ens[a]$ as a common component.
3. Since every ensemble is a component of itself, every ensemble has a common component with itself and therefore is not separate from itself. This proves that separateness is irreflexive. The definition of common component is symmetric and therefore so is separateness.
4. An extreme point has only itself as a component, therefore it can have a common component only with itself.
5. Every ensemble is a component of an internal point, therefore no ensemble is separate from an internal point.
6. Since an internal point cannot be separate from any ensemble, then two ensembles that are separate are not internal points, and therefore are boundary points.
\end{proof}
\begin{prop}
Let $\Ens$ be a discrete or continuous classical ensemble space and let $\ens[a], \ens[b] \in \Ens$ be two probability measures over the corresponding sample space $X$. Then $\ens[a] \separate \ens[b]$ if and only if they have disjoint support.
\end{prop}
\begin{proof}
Note that the support of a convex combination of probability measures is the union of the support of the measures. Since we are restricting ourselves to probability measures that are absolutely continuous with respect to a measure $\mu$, if a probability measure $\ens[a]$ has support $U$, any probability measure $\ens[b]$ with support on a compact subset $V \subseteq U$ such that $\mu(V) \neq 0$ is a component of $\ens[a]$.
Let $\ens[a], \ens[b] \in \Ens$ be two probability measures over the sample space $X$ that are absolutely continuous with respect to the corresponding $\mu$. Suppose $\ens[a]$ and $\ens[b]$ have overlapping support. Then we can find a compact subset $U$ of the intersection of their supports such that $\mu(U) \neq 0$. Therefore, we can find a probability measure supported on $U$ that is a component of both. Conversely, suppose they have disjoint support. Then they cannot have a common component, as the support of the common component would have to be a non-empty subset of both supports.
\end{proof}
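In the discrete case, the disjoint-support criterion can be checked explicitly. The following is a minimal Python sketch (all names and the example distributions are illustrative; distributions are tuples of probabilities over a finite sample space). When the supports overlap, the renormalized pointwise minimum is an explicit common component of both measures.

```python
def support(dist):
    """Indices of the outcomes that carry nonzero probability."""
    return {i for i, p in enumerate(dist) if p > 0}

def separate(a, b):
    """Discrete classical ensembles are separate iff their supports are disjoint."""
    return support(a).isdisjoint(support(b))

def common_component(a, b):
    """Return a common component when supports overlap, else None.

    For the pointwise minimum m with z = sum(m) > 0 and c = m/z, we have
    z*c <= a and z*c <= b pointwise, so c is a component of both with weight z.
    """
    m = [min(x, y) for x, y in zip(a, b)]
    z = sum(m)
    return None if z == 0 else [v / z for v in m]

a = (0.5, 0.5, 0.0)
b = (0.0, 0.4, 0.6)
d = (0.0, 0.0, 1.0)
assert not separate(a, b) and separate(a, d)
print(common_component(a, b))   # [0.0, 1.0, 0.0]
```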
\end{mathSection}
A key property of separateness is its relationship with mixtures. If an ensemble is separate from a mixture of two elements, it is separate from both elements and all their mixtures.
\begin{mathSection}
\begin{prop}[Separateness extends to all mixtures]\label{pm-es-separateExtendsMixtures}
Let $\ens,\ens_1,\ens_2 \in \Ens$. If $\ens$ has no common component with a mixture of $\ens_1$ and $\ens_2$, then it has no common component with any mixture of $\ens_1$ and $\ens_2$, nor with $\ens_1$ or $\ens_2$ themselves. That is, if $\ens \separate p \ens_1 + \bar{p} \ens_2$ for some $p \in (0, 1)$ then $\ens \separate p \ens_1 + \bar{p} \ens_2$ for all $p \in [0, 1]$.
\end{prop}
\begin{figure}[H]
\centering
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (4,4);
\coordinate (e) at (1.2,0.8);
\coordinate (e1) at (0.2,2.5);
\coordinate (e2) at (3.0,2.5);
\coordinate (a) at ($(e1)!0.4!(e2)$);
\coordinate (b) at ($(a)!0.5!(e2)$);
\coordinate (c) at (3.5,0.3);
\coordinate (d) at ($(b)!-0.4!(c)$);
\coordinate (f) at ($(e)!-0.25!(c)$);
\coordinate (g) at ($(a)!-0.195!(c)$);
\filldraw [black] (e) circle (1pt) node[below] {$\ens$};
\filldraw [black] (e1) circle (1pt) node[left] {$\ens_1$};
\filldraw [black] (e2) circle (1pt) node[right] {$\ens_2$};
\filldraw [black] (a) circle (1pt) node[above right] {$\ens[a]$};
\filldraw [black] (b) circle (1pt) node[above right] {$\ens[b]$};
\filldraw [black] (c) circle (1pt) node[above right] {$\ens[c]$};
\filldraw [black] (d) circle (1pt) node[above right] {$\ens[d]$};
\filldraw [black] (f) circle (1pt) node[left] {$\ens[f]$};
\filldraw [black] (g) circle (1pt) node[above left] {$\ens[g]$};
\draw[thick] (d) -- (e1) -- (e2);
\draw[thick] (d) -- (c) -- (f);
\draw[thick] (g) -- (c);
\end{tikzpicture}
%\includegraphics[width=0.3\textwidth]{tempimages/DistinctAndMixture.jpg}
\end{figure}
\begin{proof}
Let $\ens \separate \ens[a] = p \ens_1 + \bar{p} \ens_2$ for some $p \in (0, 1)$. Let $\ens[b] = \alpha \ens_1 + \bar{\alpha} \ens_2$ with $0 \leq \alpha < p$. Suppose, as shown in the figure, that $\ens[b]$ is not separate from $\ens$. Then we can find $\ens[c] \in \Ens$ such that $\ens[b] = \beta \ens[c] + \bar{\beta} \ens[d]$ and $\ens = \gamma \ens[c] + \bar{\gamma} \ens[f]$ for some $\ens[d], \ens[f] \in \Ens$ and $\beta, \gamma \in (0, 1]$.
Setting $\epsilon = \frac{p - \alpha}{\bar{\alpha}}$ and $\lambda = \bar{\epsilon} \beta$ we have:
\begin{align*}
\ens[a] &= p \ens_1 + \bar{p} \ens_2 = \left(p - \frac{\bar{p}}{\bar{\alpha}} \alpha \right) \ens_1 + \frac{\bar{p}}{\bar{\alpha}} \alpha \ens_1 + \frac{\bar{p}}{\bar{\alpha}} \bar{\alpha}\ens_2 \\
&= \left(\frac{p\bar{\alpha} - \bar{p}\alpha}{\bar{\alpha}} \right) \ens_1 + \frac{\bar{p}}{\bar{\alpha}} (\alpha \ens_1 + \bar{\alpha} \ens_2) = \left(\frac{p - p\alpha - \alpha + p \alpha}{\bar{\alpha}} \right) \ens_1 + \frac{1 - p + \alpha - \alpha}{\bar{\alpha}} (\alpha \ens_1 + \bar{\alpha} \ens_2) \\
&= \frac{p - \alpha}{\bar{\alpha}} \ens_1 + \left( 1 - \frac{p - \alpha}{\bar{\alpha}}\right) (\alpha \ens_1 + \bar{\alpha} \ens_2) = \epsilon \ens_1 + \bar{\epsilon} (\alpha \ens_1 + \bar{\alpha} \ens_2) = \epsilon \ens_1 + \bar{\epsilon} \ens[b] \\
&= \epsilon \ens_1 + \bar{\epsilon} ( \beta \ens[c] + \bar{\beta} \ens[d] ) = \bar{\epsilon} \beta \ens[c] + \epsilon \ens_1 + \bar{\epsilon} \bar{\beta} \ens[d] = \lambda \ens[c] + \bar{\lambda} \ens[g]
\end{align*}
where $\ens[g] = \frac{1}{\bar{\lambda}}\left( \epsilon \ens_1 + \bar{\epsilon} \bar{\beta} \ens[d] \right)$. This means $\ens[a]$ and $\ens$ have a common component, which is a contradiction. Therefore $\ens \separate \alpha \ens_1 + \bar{\alpha} \ens_2$ for all $\alpha \in [0, p]$.
We can repeat the argument switching $\ens_1$ with $\ens_2$ and find $\ens \separate \alpha \ens_1 + \bar{\alpha} \ens_2$ for all $\alpha \in [0, 1]$.
\end{proof}
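The mixing algebra in the proof above can be checked numerically. The following Python sketch (with illustrative probability vectors over three outcomes) verifies the key identity $\ens[a] = \epsilon \ens_1 + \bar{\epsilon} \ens[b]$ with $\epsilon = \frac{p - \alpha}{\bar{\alpha}}$, which is what lets a common component of $\ens[b]$ be promoted to a common component of $\ens[a]$.

```python
def mix(p, x, y):
    """Convex combination p*x + (1-p)*y of two probability vectors."""
    return [p * xi + (1 - p) * yi for xi, yi in zip(x, y)]

e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 0.3, 0.7]
p, alpha = 0.8, 0.5            # alpha < p, as in the proof

a = mix(p, e1, e2)             # a = p e1 + (1-p) e2
b = mix(alpha, e1, e2)         # b = alpha e1 + (1-alpha) e2
eps = (p - alpha) / (1 - alpha)

# Key identity of the proof: a = eps * e1 + (1 - eps) * b
ok = all(abs(u - v) < 1e-12 for u, v in zip(a, mix(eps, e1, b)))
assert ok

# Consequently, if b = beta*c + (1-beta)*d, then a = lam*c + (1-lam)*g
# with lam = (1-eps)*beta, so any component c of b is a component of a.
beta = 0.4
lam = (1 - eps) * beta
assert 0 < lam < 1
```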
\end{mathSection}
While separateness extends to mixtures, the converse property (i.e. that mixtures preserve separateness) does not necessarily hold: it is true for classical spaces but not for quantum spaces. This converse property is, in effect, a telltale of classicality.
\begin{mathSection}
\begin{defn}
Let $\Ens$ be an ensemble space. We say that \textbf{mixtures preserve separateness in $\Ens$} if $\ens \separate \ens[a]$ and $\ens \separate \ens[b]$ implies $\ens \separate p \ens[a] + \bar{p} \ens[b]$ for all $p \in [0,1]$ and $\ens, \ens[a], \ens[b] \in \Ens$.
\end{defn}
\begin{prop}
Mixtures preserve separateness in discrete/continuous classical ensemble spaces.
\end{prop}
\begin{proof}
Let $\Ens$ be a discrete or continuous classical ensemble space. Let $\ens, \ens[a], \ens[b] \in \Ens$ such that $\ens \separate \ens[a]$ and $\ens \separate \ens[b]$. Then the support of $\ens$ is disjoint from the support of both $\ens[a]$ and $\ens[b]$. Since the support of a mixture of $\ens[a]$ and $\ens[b]$ is the union of the supports, $\ens$ has disjoint support from every mixture of $\ens[a]$ and $\ens[b]$. This means that mixtures preserve separateness in discrete/continuous classical ensemble spaces.
\end{proof}
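The two facts used in the proof (the support of a nondegenerate mixture is the union of the supports, and disjointness from both supports gives disjointness from the union) can be checked directly in the finite case. A small Python sketch with illustrative distributions:

```python
def support(dist):
    """Outcomes carrying nonzero probability."""
    return {i for i, p in enumerate(dist) if p > 0}

def mix(p, x, y):
    """Convex combination p*x + (1-p)*y of two probability vectors."""
    return [p * xi + (1 - p) * yi for xi, yi in zip(x, y)]

e = (0.0, 0.0, 1.0, 0.0)   # separate from both a and b below
a = (0.5, 0.5, 0.0, 0.0)
b = (0.0, 0.5, 0.0, 0.5)

m = mix(0.3, a, b)
assert support(m) == support(a) | support(b)   # support of a mixture = union
assert support(e).isdisjoint(support(m))       # so e stays separate from m
```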
\begin{prop}
Mixtures do not preserve separateness in quantum ensemble spaces.
\end{prop}
\begin{figure}[H]
\centering
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (4,4);
\coordinate (center) at (2,2);
\def\radius{1.5}
\coordinate (b) at ($(center)+\radius*({cos(95)},{sin(95)})$);
\coordinate (a) at ($(center)+\radius*({cos(-85)},{sin(-85)})$);
\coordinate (e) at ($(center)+\radius*({cos(155)},{sin(155)})$);
\coordinate (f) at ($(center)+\radius*({cos(-25)},{sin(-25)})$);
\coordinate (o) at ($(a)!0.5!(b)$);
\coordinate (oa) at ($(a)!-0.2!(e)$);
\coordinate (ob) at ($(b)!-0.5!(e)$);
\draw [fill=ensembleFill, thick] (center) circle (\radius);
\filldraw [black] (a) circle (1pt) node[below left] {$\ens[a]$};
\filldraw [black] (b) circle (1pt) node[above left] {$\ens[b]$};
\filldraw [black] (e) circle (1pt) node[left] {$\ens$};
\filldraw [black] (o) circle (1pt) node[above right] {$\ens[o]$};
\filldraw [black] (f) circle (1pt);
\draw[] (e) -- (a) -- (b) -- (e) -- (f);
\draw[dashed] (b) -- (ob);
\draw[dashed] (a) -- (oa);
\end{tikzpicture}
%\includegraphics[width=0.3\textwidth]{tempimages/MixturesDoNotPreserveSeparateness.jpg}
\end{figure}
\begin{proof}
Let $\Ens$ be a quantum ensemble space. As shown in the figure, let $\ens[a], \ens[b] \in \Ens$ be two orthogonal pure states and let $\ens \in \Ens$ be another pure state that is a nontrivial superposition of the two. These three states are extreme points of a Bloch ball, and are therefore pairwise separate. Consider $\ens[o] = \frac{1}{2} \ens[a] + \frac{1}{2} \ens[b]$. This is the center of the Bloch ball and is not separate from $\ens$: writing $\ens[f]$ for the pure state antipodal to $\ens$ on the same Bloch sphere, we have $\ens[o] = \frac{1}{2} \ens + \frac{1}{2} \ens[f]$, so $\ens$ is a common component of $\ens$ and $\ens[o]$. Therefore mixtures do not preserve separateness in $\Ens$.
\end{proof}
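The qubit counterexample can be verified with explicit $2 \times 2$ density matrices. Below is a minimal Python sketch (real matrices as nested lists; the particular states are illustrative choices): the projectors onto $|0\rangle$, $|1\rangle$ and $|+\rangle$ play the roles of $\ens[a]$, $\ens[b]$ and $\ens$, and the check that $\ens$ is a component of $\ens[o]$ amounts to $\ens[o] - \frac{1}{2}\ens$ being positive semidefinite.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(x):
    return x[0][0] + x[1][1]

def lincomb(p, x, q, y):
    return [[p * x[i][j] + q * y[i][j] for j in range(2)] for i in range(2)]

def is_density(x, tol=1e-12):
    """Unit trace and positive semidefinite (2x2 real symmetric case)."""
    det = x[0][0] * x[1][1] - x[0][1] * x[1][0]
    return abs(trace(x) - 1) < tol and x[0][0] >= -tol and det >= -tol

a = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
b = [[0.0, 0.0], [0.0, 1.0]]   # |1><1|
e = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|, a superposition of the two

# a, b, e are pure states (tr rho^2 = 1): extreme points, pairwise separate
for s in (a, b, e):
    assert abs(trace(matmul(s, s)) - 1.0) < 1e-12

# o = (a + b)/2 is the center of the Bloch ball ...
o = lincomb(0.5, a, 0.5, b)

# ... and is NOT separate from e: o = 1/2 e + 1/2 f with f = 2o - e = |-><-|
f = lincomb(2.0, o, -1.0, e)
assert is_density(f)
assert all(abs(o[i][j] - lincomb(0.5, e, 0.5, f)[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```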
\end{mathSection}
\subsection{Decomposability}
Since mixtures preserving separateness is a telltale of classicality, it is insightful to find an alternative characterization. One of the differences between classical and quantum ensemble spaces is that quantum ensembles allow multiple decompositions in terms of pure states. We see here that, in fact, the lack of multiple decompositions is equivalent to mixtures preserving separateness.
Note that, since a continuous classical ensemble space has no extreme points, we have to find a definition of multiple decomposability that makes no reference to the extreme points. As for the definition of the entropy, we need a definition on mixtures of two elements that, when applied recursively, gives us the desired effect. The idea here is that we will always have multiple decompositions in terms of other ensembles, but in classical spaces we cannot have multiple decompositions where one element of the first decomposition has no common component with either element of the second. So, if we start breaking an ensemble into separate components, while we may take different paths, we will reach the same final decomposition in terms of extreme points, if they exist.\footnote{There is an open issue as it is not clear whether we want to require actual separate decomposition or separate decomposition in the limit. In the continuous classical case, for example, it is not clear whether we should require probability densities to be continuous or not. If continuity is required, a probability density cannot be split into two probability densities with disjoint support without creating a discontinuity. However, we may think of a sequence of decompositions into three parts, one with support $A$, one with support $B$ and one with support on both. In the limit, the mixing coefficient for the third becomes smaller and smaller, meaning that the first two approach a discontinuous distribution.}
\begin{mathSection}
\begin{defn}
An ensemble is \textbf{decomposable} if it can be expressed as a mixture of two distinct ensembles. An ensemble is \textbf{separately decomposable} if it can be expressed as a mixture of two separate ensembles. An ensemble is \textbf{multidecomposable} if it admits two decompositions where a component of one is separate from both components of the other. That is, $\ens = p \ens[a]_1 + \bar{p} \ens[a]_2 = \lambda \ens[b]_1 +\bar{\lambda} \ens[b]_2$ where either $\ens[a]_1 \separate \ens[b]_j$ for both $j \in \{1,2\}$ or $\ens[a]_2 \separate \ens[b]_j$ for both $j \in \{1,2\}$. An ensemble is \textbf{monodecomposable} if it is not multidecomposable. An ensemble is \textbf{separately monodecomposable} if it is both separately decomposable and monodecomposable.
\end{defn}
\begin{figure}[H]
\centering
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (10,3);
% Equilater triangle
\coordinate (center) at (1.5,1.2);
\def\angle{90}
\def\radius{1.5}
\coordinate (e1) at ($(center)+\radius*({cos(0+\angle)},{sin(0+\angle)})$);
\coordinate (e2) at ($(center)+\radius*({cos(120+\angle)},{sin(120+\angle)})$);
\coordinate (e3) at ($(center)+\radius*({cos(240+\angle)},{sin(240+\angle)})$);
\coordinate (a) at ($(e2)!0.7!(e3)$);
\coordinate (b) at ($(e1)!0.7!(e3)$);
\coordinate (e) at ($(e1)!0.768!(a)$);
\draw[fill=ensembleFill, thick] (e1) -- (e2) -- (e3) -- cycle;
\filldraw [black] (e1) circle (1pt) node[above] {$\ens_1$};
\filldraw [black] (e2) circle (1pt) node[below left] {$\ens_2$};
\filldraw [black] (e3) circle (1pt) node[below right] {$\ens_3$};
\filldraw [black] (a) circle (1pt);
\filldraw [black] (b) circle (1pt);
\filldraw [black] (e) circle (1pt);
\draw (e1) -- (a);
\draw (e2) -- (b);
% Cut triangle
\coordinate (center) at (6,1.2);
\def\angle{90}
\def\radius{1.5}
\coordinate (t) at ($(center)+\radius*({cos(0+\angle)},{sin(0+\angle)})$);
\coordinate (e2) at ($(center)+\radius*({cos(120+\angle)},{sin(120+\angle)})$);
\coordinate (e3) at ($(center)+\radius*({cos(240+\angle)},{sin(240+\angle)})$);
\coordinate (e1) at ($(e2)!0.5!(t)$);
\coordinate (e4) at ($(e3)!0.5!(t)$);
\coordinate (a) at ($(e2)!0.7!(e3)$);
\coordinate (e) at ($(e1)!0.412!(a)$);
\draw[fill=ensembleFill, thick] (e1) -- (e2) -- (e3) -- (e4) -- cycle;
\filldraw [black] (e1) circle (1pt) node[above left] {$\ens_1$};
\filldraw [black] (e2) circle (1pt) node[below left] {$\ens_2$};
\filldraw [black] (e3) circle (1pt) node[below right] {$\ens_3$};
\filldraw [black] (e4) circle (1pt) node[above right] {$\ens_4$};
\filldraw [black] (a) circle (1pt);
\filldraw [black] (e) circle (1pt);
\draw (e1) -- (a);
\draw (e2) -- (e4);
\coordinate (center) at (10,1.5);
\def\radius{1.25}
\coordinate (b) at ($(center)+\radius*({cos(95)},{sin(95)})$);
\coordinate (a) at ($(center)+\radius*({cos(-85)},{sin(-85)})$);
\coordinate (e) at ($(center)+\radius*({cos(155)},{sin(155)})$);
\coordinate (f) at ($(center)+\radius*({cos(-25)},{sin(-25)})$);
\coordinate (o) at ($(a)!0.5!(b)$);
\draw [fill=ensembleFill, thick] (center) circle (\radius);
\filldraw [black] (a) circle (1pt);
\filldraw [black] (b) circle (1pt);
\filldraw [black] (e) circle (1pt);
\filldraw [black] (o) circle (1pt);
\filldraw [black] (f) circle (1pt);
\draw[] (e) -- (f);
\draw[] (a) -- (b);
\end{tikzpicture}
\caption{Examples for mono- and multidecomposability.}\label{pm-es-fig-monodecomposability}
%\includegraphics[width=0.4\textwidth]{tempimages/MultipleDecomposition.jpg}
\end{figure}
\begin{remark}
As shown in Fig. \ref{pm-es-fig-monodecomposability}, take a classical discrete space for three points, which is a triangle (simplex). The only three elements that are not decomposable are the extreme points. Mixtures of two points are decomposable and are separately decomposable in only one way. Mixtures of three points are also separately decomposable, but in multiple ways: as a mixture of $\ens_1$ and a mixture of $\ens_2$ and $\ens_3$, or as a mixture of $\ens_2$ and a mixture of $\ens_1$ and $\ens_3$. Note, however, that they are not multidecomposable, as no component of one decomposition is separate from both components of the other.
Take a Bloch ball for quantum mechanics. No element of the surface is decomposable, and the surface elements are pairwise separate. The middle point can be seen as the equal mixture of any pair of opposite points. Therefore the middle point, as well as any other point not on the surface, is not only separately decomposable but also multidecomposable.
To see why we require only one component to be separate from the other two, consider the cut triangle. Here we can have multiple decompositions where not all elements are separate.
\end{remark}
\begin{coro}
An ensemble $\ens \in \Ens$ is an extreme point if and only if it is not decomposable.
\end{coro}
\begin{proof}
If $\ens \in \Ens$ is decomposable, then it has at least two distinct components, one of which must not be $\ens$. Since an extreme point has only itself as a component, then $\ens$ is not an extreme point. Conversely, if $\ens$ is an extreme point, it cannot be the mixture of two distinct components as $\ens$ has only itself as a component.
\end{proof}
\begin{defn}
An ensemble space is \textbf{separately decomposable} if every decomposable ensemble is separately decomposable, \textbf{multidecomposable} if every decomposable ensemble is multidecomposable, \textbf{monodecomposable} if every decomposable ensemble is monodecomposable and \textbf{separately monodecomposable} if it is both separately decomposable and monodecomposable.
\end{defn}
\begin{prop}
Discrete/continuous classical and quantum ensemble spaces are separately decomposable.
\end{prop}
\begin{proof}
For a classical ensemble space, only Dirac measures are not decomposable. For a discrete classical space, every measure that is not the Dirac measure is the mixture of two measures with disjoint support. For a continuous classical space, there are no extreme points, and any absolutely continuous measure is the mixture of two absolutely continuous measures with disjoint support. Since measures with disjoint support are separate, every decomposable ensemble is separately decomposable.
In quantum mechanics, every density operator is the mixture of its eigenstates. If the density operator does not correspond to a pure state, it will have at least two eigenstates with nonzero eigenvalue. Every mixed state, then, is a mixture of two orthogonal ensembles: an eigenstate and the renormalized mixture of the remaining eigenstates. Since orthogonal ensembles cannot have a common component, every decomposable ensemble is separately decomposable.
\end{proof}
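For the classical half of the proof, the separate decomposition can be made fully explicit. A short Python sketch (the distribution and the split point are illustrative choices): any non-Dirac distribution splits into two disjoint-support distributions that mix back to the original.

```python
def split(dist, cut):
    """Write dist as p*left + (1-p)*right, with disjoint supports."""
    p = sum(dist[:cut])
    left = [x / p if i < cut else 0.0 for i, x in enumerate(dist)]
    right = [0.0 if i < cut else x / (1 - p) for i, x in enumerate(dist)]
    return p, left, right

dist = [0.1, 0.2, 0.3, 0.4]
p, left, right = split(dist, 2)

assert abs(sum(left) - 1) < 1e-12 and abs(sum(right) - 1) < 1e-12
assert all(l == 0.0 or r == 0.0 for l, r in zip(left, right))   # disjoint supports
assert all(abs(p * l + (1 - p) * r - x) < 1e-12                 # mixture recovers dist
           for l, r, x in zip(left, right, dist))
```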
\begin{prop}
Given an ensemble space $\Ens$, $\Ens$ is monodecomposable if and only if mixtures preserve separateness in $\Ens$.
\end{prop}
\begin{figure}[H]
\centering
\begin{tikzpicture}
%\draw [help lines] (0,0) grid (4,4);
\coordinate (e) at (1.6,0.5);
\coordinate (a) at (0.2,2.5);
\coordinate (b) at (3.0,2.5);
\coordinate (c) at ($(a)!0.4!(b)$);
\coordinate (d) at (0.3,0.4);
\coordinate (g) at ($(e)!-0.8!(d)$);
\coordinate (f) at ($(c)!-0.4!(d)$);
\filldraw [black] (e) circle (1pt) node[below] {$\ens$};
\filldraw [black] (a) circle (1pt) node[left] {$\ens[a]$};
\filldraw [black] (b) circle (1pt) node[right] {$\ens[b]$};
\filldraw [black] (c) circle (1pt) node[above left] {$\ens[c]$};
\filldraw [black] (d) circle (1pt) node[left] {$\ens[d]$};
\filldraw [black] (f) circle (1pt) node[right] {$\ens[f]$};
\filldraw [black] (g) circle (1pt) node[right] {$\ens[g]$};
\draw[thick] (a) -- (b) -- (e) -- cycle;
\draw[thick] (g) -- (d) -- (f);
\end{tikzpicture}
%\includegraphics[width=0.3\textwidth]{tempimages/MonodecomposabilityIsMixturePreserveSeparateness.jpg}
\end{figure}
\begin{proof}
Suppose mixtures do not preserve separateness in $\Ens$. Then, as shown in the figure, we can find $\ens,\ens[a],\ens[b], \ens[c] \in \Ens$ such that $\ens \separate \ens[a]$, $\ens \separate \ens[b]$ and $\ens \nseparate \ens[c] = p \ens[a] + \bar{p} \ens[b]$ for some $p \in (0,1)$. Since $\ens \nseparate \ens[c]$, we can find $\ens[d], \ens[f], \ens[g] \in \Ens$ such that $\ens[c] = \lambda \ens[d] + \bar{\lambda} \ens[f]$ and $\ens = \mu \ens[d] + \bar{\mu} \ens[g]$ for some $\lambda, \mu \in (0,1]$. Since separateness extends to all mixtures (\ref{pm-es-separateExtendsMixtures}) and $\ens[a] \separate \ens = \mu \ens[d] + \bar{\mu} \ens[g]$, then $\ens[a] \separate \ens[d]$. Similarly, $\ens[b] \separate \ens[d]$, which means that $\ens[c]$ is multidecomposable. Therefore, if mixtures do not preserve separateness in $\Ens$, not all decomposable ensembles are monodecomposable, which means $\Ens$ is not monodecomposable.
Now suppose mixtures do preserve separateness, and let $\ens = p \ens[a]_1 + \bar{p} \ens[a]_2 = \lambda \ens[b]_1 +\bar{\lambda} \ens[b]_2$ be a decomposable ensemble. Since $\ens[a]_1$ is a component of $\ens$, then $\ens \nseparate \ens[a]_1$. Since $\ens$ is a mixture of $\ens[b]_1$ and $\ens[b]_2$ and mixtures preserve separateness, then either $\ens[a]_1 \nseparate \ens[b]_1$ or $\ens[a]_1 \nseparate \ens[b]_2$. Similarly, $\ens[a]_2 \nseparate \ens[b]_1$ or $\ens[a]_2 \nseparate \ens[b]_2$. Therefore $\ens$ is not multidecomposable. Since this applies to all decomposable ensembles $\ens$, $\Ens$ is monodecomposable.
\end{proof}
\end{mathSection}
These definitions may be enough to prove that every finite-dimensional separately monodecomposable convex space is a simplex. For the infinite-dimensional case, it would be interesting to compare this characterization to that of a Choquet simplex.
\begin{conj}
A finite-dimensional (i.e. there is a set of finitely many elements whose hull has non-empty interior) separately monodecomposable convex space is a simplex.
\end{conj}
\subsection{Convex subsets and convex hull}
In many cases, we will need to discuss the sets that contain all their possible mixtures. One typically distinguishes two cases. A set is convex if it allows all possible finite mixtures. This may be too restrictive as it may not include all possible infinite mixtures. A set is closed and convex if it includes all finite mixtures and their topological limits. Given that infinite mixtures are topological limits of finite mixtures, a closed convex set contains all infinite mixtures. However, not all topological limits can be expressed as infinite mixtures. For example, on the real line $1$ can be seen as a limit of points within the open interval $(0,1)$, but not as an infinite convex combination of them. Therefore we add the notion of $\sigma$-convex set, a set that is closed under infinite mixtures.\footnote{The mathematical properties of $\sigma$-convex sets are yet to be explored.}
\begin{mathSection}
\begin{defn}
Let $\Ens$ be an ensemble space. We say $A \subseteq \Ens$ is \textbf{convex} if it is closed under finite mixtures (i.e. $\ens[a],\ens[b] \in A$ implies $p\ens[a] + \bar{p} \ens[b] \in A$ for all $p \in [0,1]$), \textbf{$\sigma$-convex} if it is closed under infinite mixtures (i.e. $\ens[a]_i \in A$ implies $\sum_i p_i \ens[a]_i \in A$ for all possible infinite mixtures) and \textbf{closed and convex} if it is both convex and topologically closed.
\end{defn}
\begin{coro}
A closed and convex set is $\sigma$-convex. A $\sigma$-convex set is convex.
\end{coro}
\end{mathSection}
Given a set of ensembles $A$, we can ask for all ensembles that can be constructed from $A$. The hull of $A$ is the set of all finite mixtures of $A$, the $\sigma$-hull of $A$ is the set of all infinite mixtures of $A$ and the closed hull of $A$ is the set of all the topological limits of finite mixtures of $A$. Notably, the closed hull of $A$ is equivalent to the topological closure of the hull of $A$.
\begin{mathSection}
\begin{defn}
Let $A \subseteq \Ens$ be a subset of an ensemble space. The \textbf{convex hull} of $A$, noted $\hull(A)$, is the set of all finite mixtures of elements contained in $A$ (i.e. it is the smallest convex set that contains $A$). The \textbf{$\sigma$-hull} of $A$, noted $\shull(A)$, is the set of all infinite mixtures of elements contained in $A$ (i.e. it is the smallest $\sigma$-convex set that contains $A$). The \textbf{closed hull} of $A$, noted $\chull(A)$, is the smallest closed convex set that contains $A$.
\end{defn}
\begin{remark}
Note that, given a set $A$, not all elements of $\chull(A)$ can be understood as infinite mixtures. That is, we can have $\shull(A) \subset \chull(A)$. For example, let $\Ens$ be the line segment $[0,1]$ and consider the set $A=\left\{\frac{1}{2^i}\right\}_{i=0}^{\infty}$. Every point in $(0,1]$ can be expressed as a finite mixture of two elements of $A$, for example of $1$ and any element of $A$ smaller than the target point. However, zero cannot be expressed as a convex combination of positive numbers, and therefore it is not an infinite mixture of elements of $A$. Since zero is the limit of the sequence, it is nevertheless in the topological closure of $A$. This shows that the difference between the $\sigma$-hull and the closed hull exists already in finite dimensions. The convex hull and the $\sigma$-hull, instead, are the same in finite dimensions because \href{https://en.wikipedia.org/wiki/Carath%C3%A9odory%27s_theorem_%28convex_hull%29}{Carathéodory's theorem} allows us to rewrite any infinite convex combination into a finite one.
For an example in which all three hulls are different, consider the space of probability distributions $\Ens$ over countably many elements $X = \{x_i\}_{i=1}^{\infty}$, identifying each $x_i$ with its Dirac measure. Let $\ens[a]_{ip} = p x_i + \bar{p} x_{i+1}$ and let $A = \left\{ \ens[a]_{ip} \, | \, i \geq 1, p \in (0,1) \right\} \subset \Ens$ be the set of all non-trivial mixtures of pairs of consecutive elements. A probability distribution with support over the full $X$ cannot be expressed as a finite convex combination of elements of $A$, and will therefore not be in the convex hull. However, it can be expressed as an infinite convex combination, and therefore it will be in the $\sigma$-hull. An element $x_i \in X$ is not in the $\sigma$-hull, but it will be in the closed hull, as $x_i = \lim_{p \to 1} p x_i + \bar{p} x_{i+1} = \lim_{p \to 1} \ens[a]_{ip} \in \chull(A)$.
\end{remark}
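The one-dimensional example above can be made concrete. A small Python sketch (the construction is an illustrative choice): every $x \in (0,1]$ is a finite mixture of $1$ and a small enough power of two, while $0$, being only the limit of the sequence, lies in the closed hull but not in the $\sigma$-hull.

```python
def as_mixture(x):
    """Write x in (0, 1] as p*1 + (1-p)*s with s = 2**-k an element of A, s <= x."""
    k = 0
    while 2.0 ** (-k) > x:
        k += 1
    s = 2.0 ** (-k)
    p = (x - s) / (1.0 - s) if s < 1.0 else 1.0
    return p, s

x = 0.3
p, s = as_mixture(x)
assert 0.0 <= p <= 1.0
assert abs(p * 1.0 + (1 - p) * s - x) < 1e-12

# Zero admits no such representation: any convex combination (finite or
# infinite) of positive numbers is positive. It is only a topological limit.
```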
\begin{coro}
Given $A \subseteq \Ens$, $\hull(A) \subseteq \shull(A) \subseteq \chull(A)$.
\end{coro}
\begin{proof}
All finite mixtures are also infinite mixtures with $p_i = 0$ for all $i > n$ for some $n$. Therefore $\hull(A) \subseteq \shull(A)$. All infinite mixtures are topological limits of finite mixtures. Therefore $\shull(A) \subseteq \chull(A)$.
\end{proof}
\begin{prop}\label{pm-es-hullProp}
All three hull operators are closures. That is, $\hull$ satisfies the following three properties:
\begin{enumerate}
\item \textbf{extensive}: $A \subseteq \hull(A)$
\item \textbf{increasing}: $A \subseteq B \implies \hull(A) \subseteq \hull(B)$
\item \textbf{idempotent}: $\hull(\hull(A)) = \hull(A)$
\end{enumerate}
as do $\shull$ and $\chull$.
\end{prop}
\begin{proof}
1. Every element of $A$ is trivially a mixture of elements of $A$. Therefore $A \subseteq \hull(A)$. Since $\hull(A) \subseteq \shull(A) \subseteq \chull(A)$, $A \subseteq \shull(A)$ and $A \subseteq \chull(A)$ as well.
2. Let $\ens \in \hull(A)$. Then it is a finite mixture of some elements of $A$. Since $A \subseteq B$, then $\ens$ is also the finite mixture of some elements of $B$ and therefore $\ens \in \hull(B)$. The same logic applies to the $\sigma$-hull and closed hull replacing finite mixture with the appropriate operation.
3. Since $\hull(\hull(A))$ is the smallest convex subset that contains $\hull(A)$, and since $\hull(A)$ is a convex subset, then $\hull(\hull(A))$ must be $\hull(A)$ since no smaller set can contain all elements of $\hull(A)$. The same logic applies to the $\sigma$-hull and closed hull.
\end{proof}
\begin{coro}
A subset $A \subseteq \Ens$ is respectively convex/$\sigma$-convex/closed convex if and only if it is its own convex hull/$\sigma$-hull/closed hull.
\end{coro}
\begin{proof}
Let $A \subseteq \Ens$ be a convex subset. By \ref{pm-es-hullProp} we have $A \subseteq \hull(A)$. By definition of convex set, we have $\hull(A) \subseteq A$. Therefore $A = \hull(A)$. Conversely, let $A \subseteq \Ens$ be a set of ensembles, not necessarily convex, such that $A=\hull(A)$. By definition, $\hull(A)$ is closed under finite mixtures and is therefore a convex subset. The same logic applies to $\sigma$-convex and closed convex sets with the respective hulls.
\end{proof}
\begin{defn}
We note $\mathfrak{co}_{\Ens}$ the set of all convex subsets of $\Ens$, $\mathfrak{sco}_{\Ens}$ the set of all $\sigma$-convex subsets of $\Ens$ and $\mathfrak{cco}_{\Ens}$ the set of all closed convex subsets of $\Ens$.
\end{defn}
\begin{prop}
The sets $\mathfrak{co}_{\Ens}$, $\mathfrak{sco}_{\Ens}$ and $\mathfrak{cco}_{\Ens}$, as posets ordered by inclusion, are topped $\bigcap$-structures and therefore complete lattices.
\end{prop}
\begin{proof}
Theorem 7.3 in Davey and Priestley's ``Introduction to Lattices and Order'' states that the family of closed sets of a closure operator, ordered by inclusion, is a topped $\bigcap$-structure and, therefore, a complete lattice. Since $\mathfrak{co}_{\Ens}$, $\mathfrak{sco}_{\Ens}$ and $\mathfrak{cco}_{\Ens}$ are the families of closed sets of the closure operators $\hull$, $\shull$ and $\chull$ respectively, the theorem applies.
\end{proof}
\begin{prop}
The functions $\hull$, $\shull$ and $\chull$ are continuous from above. That is, given a decreasing sequence $A_i \subseteq \Ens$, $\hull(\lim\limits_{i \to \infty} A_i) = \lim\limits_{i \to \infty} \hull(A_i)$. Similarly for $\shull$ and $\chull$.
\end{prop}
\begin{proof}
The above proposition is a consequence of the fact that the hulls are closure operations and they generate an intersection structure. This means that the intersection of hulls is the hull of the intersections.
Let $A_i \subseteq \Ens$ be a decreasing sequence. That is, $A_{i+1} \subseteq A_i$. Then $A = \lim\limits_{i \to \infty} A_i = \bigcap A_i$. Since $\hull$ is order preserving, $\hull(A_i)$ is a decreasing sequence and $\lim\limits_{i \to \infty} \hull(A_i) = \bigcap \hull(A_i)$. Moreover, $\hull(A) \subseteq \hull(A_i)$ for all $i$ and therefore $\hull(A) \subseteq \bigcap \hull(A_i)$. Now let $\ens \in \hull(A)$. Then $\ens$ is a convex combination of elements of $A$. Since every element of $A$ is also an element of any $A_i$, then $\ens$ is also a convex combination of elements of $A_i$ for any $i$. Therefore $\ens \in \hull(A_i)$ for all $i$ which means $\ens \in \bigcap \hull(A_i)$ and therefore $\hull(A) \supseteq \bigcap \hull(A_i)$. Thus we have that $\hull(\lim\limits_{i \to \infty} A_i) = \lim\limits_{i \to \infty} \hull(A_i)$.
Since we have only used closure properties of $\hull$, the same reasoning applies to $\shull$ and $\chull$ since they are closures.
\end{proof}
\begin{prop}
The $\hull$ is continuous from below. That is, let $A_i \subseteq \Ens$ be an increasing sequence. Then $\hull(\lim\limits_{i \to \infty} A_i) = \lim\limits_{i \to \infty} \hull(A_i)$.
\end{prop}
\begin{proof}
Let $A_i \subseteq \Ens$ be an increasing sequence. That is, $A_{i+1} \supseteq A_i$. Then $A = \lim\limits_{i \to \infty} A_i = \bigcup A_i$. Since $\hull$ is an increasing function, $\hull(A) \supseteq \hull(A_i)$ for all $i$ and therefore $\hull(A) \supseteq \bigcup \hull(A_i)$. Now let $\ens \in \hull(A)$. Then $\ens$ is a convex combination of finitely many elements $\ens[a]_j$ of $A$. Since $A$ is the union of all $A_i$, each $\ens[a]_j$ will be in some $A_i$. Since the sequence of $A_i$ is increasing, and there are only finitely many $\ens[a]_j$, we can find an $i$ such that $\ens[a]_j \in A_i$ for all $j$. This means that $\ens$ is a convex combination of elements of $A_i$ and therefore $\ens \in \hull(A_i) \subseteq \bigcup \hull(A_i)$, which gives $\hull(A) \subseteq \bigcup \hull(A_i)$. Therefore $\hull(A) = \bigcup \hull(A_i)$, which means $\hull(\lim\limits_{i \to \infty} A_i) = \lim\limits_{i \to \infty} \hull(A_i)$.
\end{proof}
\begin{remark}
Note that $\shull$ and $\chull$ are not, in general, continuous from below. This is because, in general, the union of closures is not the closure of the union. This is true, in particular, for topological closures, which enter the definition of $\chull$.
For example, consider the sequence $A_i = \left[0, 1-\frac{1}{i}\right) \subseteq \mathbb{R}$. These are convex sets, therefore their closed hull is simply their topological closure. That is, $\chull(A_i) = \left[0, 1-\frac{1}{i}\right]$. We have $A = \lim\limits_{i \to \infty} A_i = \bigcup A_i = \left[0, 1\right)$ and $\lim\limits_{i \to \infty} \chull(A_i) = \bigcup \chull(A_i) = \left[0, 1\right)$, which is not a closed set and therefore different from $\chull(A) = [0,1]$. The closed hull of the limit is not the limit of the closed hulls, even in a finite-dimensional space.
For the $\sigma$-hull, we need an infinite-dimensional example. Conceptually, we use the fact that the uniform distribution over the whole of $[0,1]$ is an infinite convex combination of uniform distributions over countably many sets that cover $[0,1]$. Let $\Ens$ be the space of probability measures defined over $[0,1] \subseteq \mathbb{R}$. Let $\{\ens[a]_i\}_{i=1}^{\infty}$ be the sequence of uniform distributions over $\left[\frac{1}{i+1}, \frac{1}{i}\right]$. Let $\ens$ be the uniform distribution over $[0,1]$. We have $\ens = \sum_{i=1}^{\infty} \frac{1}{i(i+1)} \ens[a]_i$ where $\sum_{i=1}^{\infty} \frac{1}{i(i+1)} = 1$ since the sum telescopes. Therefore $\ens$ is a countable convex combination of the $\ens[a]_i$. Let $A_j = \{ \ens[a]_i \, | \, i \leq j\}$. Note that $\ens[a]_i \separate \ens[a]_j$ for all $i \neq j$. Therefore, the $\{\ens[a]_i\}$ are exactly the extreme points of $\shull(\bigcup A_j)$. Moreover, any countable convex combination of the finitely many elements of $A_j$ assigns zero probability to $\left[0, \frac{1}{j+1}\right)$, while $\ens$ does not. This means that $\ens \notin \shull(A_j)$ for all $j$ while $\ens \in \shull(\bigcup A_j)$. Therefore the $\sigma$-hull of the limit is not the limit of the $\sigma$-hulls.
\end{remark}
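The telescoping decomposition used in the remark can be verified numerically. A minimal sketch (the truncation level and sample points are our own illustrative choices): the weights $\frac{1}{i(i+1)}$ sum to $1$, and the mixture density equals the uniform density at interior points of the intervals.

```python
# Numerical sketch of the σ-hull counterexample: the uniform distribution on
# [0,1] as a countable mixture of uniforms on [1/(i+1), 1/i] with telescoping
# weights 1/(i(i+1)) = 1/i - 1/(i+1).

N = 10_000  # truncation of the infinite sum (illustrative choice)

# The weights telescope, so the partial sum is 1 - 1/(N+1) -> 1.
partial_sum = sum(1.0 / (i * (i + 1)) for i in range(1, N + 1))
assert abs(partial_sum - (1 - 1 / (N + 1))) < 1e-9

def mixture_density(x, n_terms=N):
    """Density of sum_i 1/(i(i+1)) * Uniform([1/(i+1), 1/i]) at x in (0, 1]."""
    total = 0.0
    for i in range(1, n_terms + 1):
        lo, hi = 1 / (i + 1), 1 / i
        if lo <= x <= hi:
            # Uniform density on [lo, hi] is 1/(hi - lo) = i(i+1).
            total += (1 / (i * (i + 1))) * (i * (i + 1))
    return total

# At points interior to the intervals (endpoints 1/i are shared by two
# intervals, a measure-zero set) the mixture density is exactly 1, matching
# the uniform distribution on [0, 1].
for x in (0.7, 0.4, 0.123, 0.011):
    assert abs(mixture_density(x) - 1.0) < 1e-9
print("mixture matches the uniform density at the sampled points")
```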
\begin{prop}
The topological closure of the hull is a convex set and is, therefore, the closed hull.
\end{prop}
\begin{proof}
Let $A$ be a convex set and $\bar{A}$ its topological closure. Let $\ens[a]_i, \ens[b]_i \in A$ be two sequences that converge to $\ens[a], \ens[b] \in \Ens$ respectively. We have $\ens[a], \ens[b] \in \bar{A}$ since they are limits of sequences in $A$. For $p \in [0,1]$, consider $\ens_i = p \ens[a]_i + \bar{p} \ens[b]_i$. Since mixing is continuous, we have:
\begin{equation}
\begin{aligned}
\ens &= p \ens[a] + \bar{p} \ens[b] = p \lim\limits_{i \to \infty} \ens[a]_i + \bar{p} \lim\limits_{i \to \infty} \ens[b]_i = \lim\limits_{i \to \infty} (p \ens[a]_i + \bar{p} \ens[b]_i) = \lim\limits_{i \to \infty} \ens_i.
\end{aligned}
\end{equation}
But each $\ens_i$ is a finite mixture of elements of the convex set $A$, and therefore $\ens_i \in A$. Thus $\ens$ is the limit of a sequence of elements of $A$, and therefore $\ens \in \bar{A}$. That is, the topological closure of a convex set is itself convex, which means $\bar{A}$ is a closed convex set. Since any closed convex set that contains $A$ must also contain $\bar{A}$, $\bar{A}$ is the closed hull of $A$.
Now let $A \subseteq \Ens$ be an arbitrary subset, not necessarily convex. Then $\hull(A)$ is convex, and therefore its topological closure is the closed hull of $A$.
\end{proof}
\end{mathSection}
There may be a relationship between the algebraic notion of $\sigma$-convexity and the topological notion of interior. For example, let $U$ be a convex open set. While it clearly cannot be a closed convex set, is it $\sigma$-convex? The idea is that convexity can only return points that are ``inside'' the set, while $\sigma$-convexity is required to fill in all the limits. If that is true, it would be natural to look for some sort of converse. Clearly, not all $\sigma$-convex sets are open, since all closed convex sets are also $\sigma$-convex. The question becomes whether, for $\sigma$-convex sets, the notions of algebraic boundary and topological boundary coincide. These questions lead to the following conjectures.
\begin{conj}
Let $U \subseteq \Ens$ be a convex open set. Then $U$ is $\sigma$-convex.
\end{conj}
\begin{defn}
Given a set $A \subset \Ens$, $\ens[a] \in A$ is an internal point of $A$ if for any $\ens \in \Ens$ we can find $\ens[b] \in A$ such that $\ens[a] = p \ens + \bar{p} \ens[b]$ for some $p \in (0,1]$.
\end{defn}
\begin{conj}
Let $A \subseteq \Ens$ and let $\ens[a] \in \chull(A)$ be an internal point of $\chull(A)$. Then $\ens[a] \in \shull(A)$.
\end{conj}
\begin{conj}
Let $A \subseteq \Ens$ be a $\sigma$-convex set. Then $\ens[a] \in A$ is an internal point of $A$ if and only if it is an interior point of $A$.
\end{conj}
\begin{remark}
Note that if $A$ is not convex, this is clearly not true. For example, let $A \subseteq \mathbb{R}^2$ be an annulus (i.e., the region between two concentric circles). The points on the inner circle are internal points of $A$ according to the definition, but they are not interior points. For a $\sigma$-convex set, an internal point is guaranteed to sit inside an open segment contained in the set along any direction. The question, as usual, is whether this is enough to fit an open set.
\end{remark}
\subsection{Convex supremum}
Later in the chapter, we will need to create the non-additive generalization of probability measures and state counting measures. It turns out that both constructions can be understood as instances of a more general construction that starts with a function $f(\ens)$ on elements of the ensemble space and generates a function $cs_f(A)$ on sets of ensembles by asking for the highest value of $f$ reachable by a mixture of elements of $A$. We are going to show that this construction by itself already has many nice mathematical properties.
\begin{mathSection}
\begin{defn}
Given a function $f : \Ens \to \mathbb{R}$ and a constant $f_{\emptyset} \leq \inf(f(\Ens))$, a \textbf{convex supremum} of $f$ is the set function $cs_f : 2^{\Ens} \to [- \infty, + \infty]$ defined by $cs_f(A) = \sup(f(\hull(A)) \cup \{f_{\emptyset}\})$. It returns the highest value of $f$ reachable by convex combinations of elements of $A$.
\end{defn}
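One immediate feature of the definition is that $cs_f(A)$ can strictly exceed $\sup f(A)$, because the supremum ranges over the whole hull. A minimal numerical sketch (the tent function and grid search are our own illustrative assumptions), with $A = \{-1, 1\} \subseteq \mathbb{R}$ and the concave $f(x) = 1 - |x|$:

```python
# Sketch of the convex supremum cs_f(A) = sup f(hull(A)) for a two-point set
# A = {-1, 1} in R, where hull(A) = [-1, 1]. For the concave tent function
# f(x) = 1 - |x| the supremum over the hull (attained at the interior point 0)
# strictly exceeds the maximum of f over A itself.

def f(x):
    return 1 - abs(x)

A = [-1.0, 1.0]

# Approximate sup over hull(A) = [-1, 1] with a grid of convex combinations
# p*(-1) + (1-p)*1 for p in [0, 1].
grid = [p / 1000 for p in range(1001)]
cs = max(f(p * A[0] + (1 - p) * A[1]) for p in grid)

print(max(f(a) for a in A), cs)  # 0.0 vs 1.0
```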
\begin{prop}\label{pm-es-convexSupremumProps}
For any $f$, the convex supremum $cs_f$ has the following properties
\begin{enumerate}
\item range of $f$: $cs_f(A) \in [f_{\emptyset}, \sup(f(\Ens))]$
\item increasing: $A \subseteq B \implies cs_f(A) \leq cs_f(B)$
\item continuous from below: for any increasing sequence $A_i \subseteq \Ens$, $cs_f(\lim\limits_{i \to \infty} A_i) = \lim\limits_{i \to \infty} cs_f(A_i)$.
\end{enumerate}
\end{prop}
\begin{proof}
1. If $A$ is empty, $cs_f(A) = f_{\emptyset}$. If $A$ is non-empty, $cs_f(A)$ is the supremum of a non-empty subset of the values of $f$, so $f_{\emptyset} \leq \inf(f(\Ens)) \leq cs_f(A) \leq \sup(f(\Ens))$. Therefore $cs_f(A) \in [f_{\emptyset}, \sup(f(\Ens))]$ for all $A$.
2. If $A = B = \emptyset$, then $cs_f(A) = cs_f(B) = f_{\emptyset}$. If $A = \emptyset$ and $B \neq \emptyset$, then $cs_f(A) = f_{\emptyset} \leq \inf(f(\Ens)) \leq cs_f(B)$. If both are non-empty, then since $\hull$ is an increasing function, the image of a set through a map is increasing with respect to inclusion, and the supremum is increasing, the convex supremum is an increasing function.
3. Let $A_i \subseteq \Ens$ be an increasing sequence and $A = \bigcup A_i$. Since $cs_f$ is increasing, $cs_f(A_i)$ is also an increasing sequence. Since $\hull$ is continuous from below, we have:
\begin{equation}
\begin{aligned}
cs_f(A) &= \sup(f(\hull(A))\cup\{f_{\emptyset}\}) = \sup\Big(f\Big(\bigcup_i \hull(A_i)\Big)\cup\{f_{\emptyset}\}\Big)\\
&= \sup\Big(\bigcup_i f(\hull(A_i))\cup\{f_{\emptyset}\}\Big) = \sup_i \, \sup\big(f(\hull(A_i))\cup\{f_{\emptyset}\}\big) \\
&= \sup_i \, cs_f(A_i).
\end{aligned}
\end{equation}
But since $cs_f(A_i)$ is an increasing sequence, its supremum is exactly its limit. Therefore the convex supremum is continuous from below.
\end{proof}
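The continuity from below shown above can also be observed numerically. A minimal sketch, assuming $\Ens = \mathbb{R}$ and the convex function $f(x) = x^2$, for which the supremum over the hull of a finite set is attained at one of the points themselves:

```python
# Sketch of continuity from below: for an increasing sequence of sets A_i,
# the values cs_f(A_i) increase to cs_f(∪ A_i). Here Ens = R and f(x) = x^2,
# which is convex, so sup f over the hull of a finite set is the max over
# the points; the empty set gets the default value f_empty.

def cs_f(points, f_empty=0.0):
    return max((p * p for p in points), default=f_empty)

A = [list(range(-i, i + 1)) for i in range(1, 8)]   # increasing finite sets
values = [cs_f(a) for a in A]                       # increasing sequence

union = sorted(set().union(*A))                     # the limit ∪ A_i
assert cs_f(union) == values[-1]  # cs_f of the union is the limit of cs_f(A_i)
print(values)
```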
%TODO: turn it into a theorem for a non-trivial ensemble space (i.e. assume existence of two distinct).
\begin{remark}
Note that the convex supremum is not, in general, continuous from above. Let $\Ens$ be a disc embedded in $\mathbb{R}^2$. Let $f : \Ens \to \mathbb{R}$ be a non-trivial linear function. Then $f$ attains its maximum and minimum at two antipodal points $\ens[a]$ and $\ens[b]$ of the circle that bounds the disc. Take a chord that divides the disc into two halves, leaving the minimum and the maximum in different halves. Over that chord, $f$ attains a minimum value $m > f(\ens[b])$ at one of the chord's endpoints. Consider a countable collection $\{\ens_i\}$ of points on that chord and let $A_j = \{ \ens_i \, | \, i \geq j \}$. This is a decreasing sequence of infinite sets, obtained by removing one point at a time. We have $f(\ens_i) \geq m$ for all $i$ and therefore $cs_f(A_j) \geq m$ for all $j$, which means $\lim\limits_{j \to \infty} cs_f(A_j) \geq m$. However, $\bigcap A_j = \emptyset$ and $cs_f(\emptyset) = f_{\emptyset} \leq \inf(f(\Ens)) = f(\ens[b]) < m$. This means that, in general, $cs_f(\lim\limits_{j \to \infty} A_j) \neq \lim\limits_{j \to \infty} cs_f(A_j)$ for a decreasing sequence.
\end{remark}
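The counterexample above can be reproduced, in simplified form, in one dimension. A minimal sketch, assuming $\Ens = [0,1]$, the linear $f(x) = x$, $f_{\emptyset} = 0$, points $\ens_i = \frac{1}{2} + \frac{1}{2i}$ on a segment, and a finite truncation standing in for the infinite index set:

```python
# Sketch of the failure of continuity from above for the convex supremum.
# Ens = [0, 1], f(x) = x, f_empty = 0.
# A_j = { 1/2 + 1/(2i) : i >= j } is a decreasing sequence of sets whose
# intersection is empty, yet cs_f(A_j) >= 1/2 for every j.

f_empty = 0.0

def cs_f(points):
    """Convex supremum of f(x) = x: since f is linear, the sup over the hull
    equals the max over the points (or f_empty for the empty set)."""
    return max(points) if points else f_empty

M = 1000  # truncation of the infinite index set (illustrative choice)

def A(j):
    return [0.5 + 1 / (2 * i) for i in range(j, M + 1)]

values = [cs_f(A(j)) for j in range(1, 6)]
print(values)                          # all values are >= 0.5

# The decreasing limit of cs_f(A_j) is >= 1/2, but the intersection of the
# A_j (as j -> infinity) is empty, so cs_f of the limit is f_empty = 0.
assert all(v >= 0.5 for v in values)
assert cs_f([]) == f_empty
```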
\begin{prop}\label{pm-es-convexSupremumContinuous}
Let $A \subseteq \Ens$ and $f: \Ens \to \mathbb{R}$ be a continuous function, then $cs_f(A) = cs_f(\shull(A)) = cs_f(\chull(A))$.
\end{prop}
\begin{proof}
The proposition is true if $A = \emptyset$, since $\emptyset = \hull(\emptyset) = \shull(\emptyset) = \chull(\emptyset)$.
Now let $A \neq \emptyset$. Note that $\chull(A)$ is a convex set, meaning $\hull(\chull(A)) = \chull(A)$. This means we only need to compare the supremum of $f$ over the hull with the supremum over the closed hull. Suppose $\ens \in \chull(A)$ but $\ens \notin \hull(A)$. Still, $\ens$ is the limit of a sequence $\ens_i \in \hull(A)$. Since $f$ is continuous, $f(\ens)$ is the limit of the sequence $f(\ens_i)$, where $f(\ens_i) \leq \sup(f(\hull(A)))$ for all $i$. Therefore $f(\ens) \leq \sup(f(\hull(A))) = cs_f(A)$. Since $f(\ens) \leq cs_f(A)$ for all $\ens \in \chull(A)$, we have $\sup(f(\chull(A))) = cs_f(\chull(A)) \leq cs_f(A)$. Conversely, since $cs_f$ is increasing and $\hull(A) \subseteq \chull(A)$, $cs_f(A) \leq cs_f(\chull(A))$. Therefore $cs_f(A) = cs_f(\chull(A))$.
Note that $A \subseteq \shull(A) \subseteq \chull(A)$ and $cs_f$ is an increasing function. Therefore $cs_f(A) \leq cs_f(\shull(A)) \leq cs_f(\chull(A)) = cs_f(A)$ which means $cs_f(A) = cs_f(\shull(A))$.
\end{proof}
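The proposition can be illustrated numerically. A minimal sketch, assuming $\Ens = \mathbb{R}$, $A = [0,1)$ (already convex, so $\hull(A) = A$ while $\chull(A) = [0,1]$), and the continuous $f(x) = x$: the supremum over $A$ already equals the supremum over the closure, even though it is not attained in $A$.

```python
# Sketch: for continuous f, the convex supremum is unchanged by taking
# closures. With f(x) = x and A = [0, 1), the supremum of f over A equals
# the supremum over the closure [0, 1], namely 1, without being attained.

def f(x):
    return x

# Points of A = [0, 1) approaching the boundary point 1 of the closure.
approx = [f(1 - 1 / n) for n in range(1, 10001)]
sup_over_A = max(approx)        # tends to 1 as the grid refines

assert sup_over_A < 1.0                  # f never attains 1 on A
assert 1.0 - sup_over_A < 1e-3           # but the supremum is 1 = f(1)
print(sup_over_A)
```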
\end{mathSection}
Note that a measure is a monotonic set function continuous from below that is also non-negative and additive. It is easy to show that the convex supremum is non-negative if and only if $f$ and $f_{\emptyset}$ are non-negative. Additivity, on the other hand, is more delicate, as it needs to be recovered on the lattice of subspaces.
\begin{mathSection}
\begin{coro}
A convex supremum of $f$ is non-negative if and only if $f$ is non-negative and $f_{\emptyset} \geq 0$.
\end{coro}
\begin{proof}