From 0f9184742b2a90ee10431228ae226424873f2e46 Mon Sep 17 00:00:00 2001
From: Rick Moynihan <rick.m@swirrl.com>
Date: Mon, 19 Dec 2016 15:58:23 +0000
Subject: [PATCH 1/3] Initial n-dim draft

---
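Note: the sketch below is one way a client might read the nested `data`
layout proposed in spec/n-dimension-data-spec.md. It assumes that the four
nested entries in the example form the single S12000005 row shown in the
ASCII table, nested in `column_hierarchy` order. The helper names
`leaf_columns` and `cell_at` are hypothetical and not part of the spec.

```python
from itertools import product


def leaf_columns(columns, column_hierarchy):
    """Return every leaf column path, outermost dimension first."""
    return list(product(*(columns[dim] for dim in column_hierarchy)))


def cell_at(nested_row, columns, column_hierarchy, path):
    """Walk one nested row down to the cell addressed by `path`."""
    cell = nested_row
    for dim, value in zip(column_hierarchy, path):
        # Each level of nesting is indexed by the position of the
        # header value within that dimension's ordered array.
        cell = cell[columns[dim].index(value)]
    return cell


columns = {
    "year": ["1999", "2000", "2001", "2002"],
    "gender": ["Male", "Female"],
    "measure": ["Count", "Ratio"],
}
hierarchy = ["year", "gender", "measure"]

# Assumption: these four entries are the S12000005 row from the table.
row = [
    [[101, 20.3], [104, 21.2]],
    [[102, 20.6], [203, 31.3]],
    [[90, 19.4], [98, 19.6]],
    [[223, 30.3], [10, 1.3]],
]

paths = leaf_columns(columns, hierarchy)
ratio_2000_male = cell_at(row, columns, hierarchy, ("2000", "Male", "Ratio"))
```

Here `paths` holds the 16 leaf columns in the same left-to-right order as
the table, and `ratio_2000_male` is the cell under 2000 / Male / Ratio.
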
 spec/multi-header.json        |  18 +++++
 spec/n-dimension-data-spec.md | 119 ++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)
 create mode 100644 spec/multi-header.json
 create mode 100644 spec/n-dimension-data-spec.md

diff --git a/spec/multi-header.json b/spec/multi-header.json
new file mode 100644
index 0000000..cb1cf60
--- /dev/null
+++ b/spec/multi-header.json
@@ -0,0 +1,18 @@
+{
+        "headers" : {"columns": {"year" : ["1999", "2000", "2001", "2002"],
+                                 "gender" : ["Male", "Female"],
+                                 "measure" : ["Count", "Ratio"]},
+
+                     "rows" : {"refarea" : ["S12000005", "S12000042", "S12000034", "S12000035", "S12000041", "S12000013", "S12000006", "S12000036", "S12000008", "S12000045",
+                                            "S12000033"]},
+
+                     "column_hierarchy" : ["year", "gender", "measure"]},
+
+
+        "data": [[[101, 20.3], [104, 21.2]],
+                 [[102, 20.6], [203, 31.3]],
+                 [[90, 19.4], [98, 19.6]],
+                 [[223, 30.3], [10, 1.3]],
+                 "..."
+                ]
+}
diff --git a/spec/n-dimension-data-spec.md b/spec/n-dimension-data-spec.md
new file mode 100644
index 0000000..568ea8b
--- /dev/null
+++ b/spec/n-dimension-data-spec.md
@@ -0,0 +1,119 @@
+# Supporting N-dimensional data & Multiple Measures
+
+This is a proposed data representation to generalise the `data`
+described in the core table spec.
+
+The idea is to provide one uniform representation for N-dimensional
+data, instead of the 4 different approaches previously proposed for a
+single observation, an array of observations and a table of
+observations, which offered no representation for higher dimensions.
+
+This approach is intended to work for tables with an arbitrary number
+of dimensions, by providing multiple headers along one axis.  We
+acknowledge that we could provide multiple headers along both axes,
+allowing roll-ups and aggregations for example; however, we have chosen
+not to support this at this time.  We may support this in a future
+extension.
+
+## Multiple Measures
+
+We believe the best way to support measures is to abstract over the
+two different kinds of cubes by treating measures uniformly, as if
+they were another kind of dimension, even when a single observation
+carries multiple measures.
+
+From an API user's perspective both styles of multi-measure cube should
+be made to look the same by listing the measures as headers, as with
+other dimensions.  We should therefore adopt a "cell as value"
+approach rather than a "cell as observation" approach, meaning that
+when a single observation has multiple measures we should expand them
+out so that each measure is listed as a new cell.
+
+Value objects which occur within spreadsheet cells will in both cases
+still link to the underlying observation's URI, so the main difference
+an API user would notice between the two styles of dataset is merely
+that an observation's `@id` would appear duplicated, with different
+measure values in different cells.
+
+## Multiple Headers on one axis
+
+The JSON snippet below illustrates how we describe a multi-column
+dataset, where free dimensions are mapped into multiple columns in a
+hierarchy specified by the `column_hierarchy` key.  This defines that
+the outermost column header must be `year`, followed by `gender` and
+`measure` e.g.
+
+```
++------------+-------------------------------------------+-------------------------------------------+-------------------------------------------+-------------------------------------------+
+|            |                      1999                 |                   2000                    |                   2001                    |                    2002                   |
+|            |---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
+|            |        Male         |        Female       |        Male         |        Female       |        Male         |        Female       |        Male         |        Female       |
++------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+---------------------+
+|  Ref Area  |  Count   |   Ratio  |   Count  |  Ratio   |   Count  |  Ratio   |  Count   |   Ratio  |   Count  |  Ratio   |   Count  |  Ratio   |  Count   |   Ratio  |   Count  |  Ratio   |
++------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
+| S12000005  | 101      | 20.3     | 104      | 21.2     | 102      | 20.6     | 203      | 31.3     | 90       | 19.4     | 98       | 19.6     | 223      | 30.3     | 10       | 1.3      |
++------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
+```
+
+The values for the headers are then defined inside the `headers` map
+under the keys `columns` and `rows`.
+
+Each header value is an ordered array of identifiers referencing each
+`dimension_value`.
+
+
+```json
+{
+        "headers" : {"columns": {"year" : ["1999", "2000", "2001", "2002"],
+                                 "gender" : ["Male", "Female"],
+                                 "measure" : ["Count", "Ratio"]},
+
+                     "rows" : {"refarea" : ["S12000005", "S12000042", "S12000034", "S12000035", "S12000041", "S12000013", "S12000006", "S12000036", "S12000008", "S12000045",
+                                            "S12000033"]},
+
+                     "column_hierarchy" : ["year", "gender", "measure"]}
+
+
+        "data": [[[101, 20.3], [104, 21.2]],
+                 [[102, 20.6], [203, 31.3]],
+                 [[90, 19.4], [98, 19.6]],
+                 [[223, 30.3], [10, 1.3]],
+                 "..."
+                ]
+}
+```
+
+We may in the future add support for multiple row headers, but for now
+assume that "rows" will be paged and that "columns" will be
+materialised.  Where a client asks the server for "too much", the
+server may respond with no key/value pair for `data`.
+
+There are several approaches for representing `data`.  We could adopt a
+flat, row-major order approach, as
+[json-stat](http://json-stat.org/) does, but have here proposed using
+nested arrays corresponding to the nesting of column headers.  This
+approach has some clarity benefits.  Above we represent each cell as
+the associated measure literal for illustration, but we plan to store an
+object in each cell position that links back to the `@id` of the
+appropriate observation and store its measured value under the key
+`value`.
+
+## Sorting
+
+The keys `by_column` and `direction` are used to show which column
+dimension the data was sorted by.  For example the JSON snippet below
+shows that the data is sorted by the column 2000 / Male / Ratio
+column.  Sorting can only be done by the leaf columns.
+
+```json
+        "sorted" : {"by_column" : ["2000" "Male" "Ratio"]
+                    "direction": "asc"}
+```
+
+Alternatively applications can sort by the order of the row dimension
+values:
+
+```json
+        "sorted" : {"by_row" : "refarea"
+                    "direction" : "desc"}
+```

From 8bd7075bcc4ba1f2a6913c6aa555b8aee4e4be57 Mon Sep 17 00:00:00 2001
From: Rick Moynihan <rick.m@swirrl.com>
Date: Mon, 19 Dec 2016 15:59:38 +0000
Subject: [PATCH 2/3] Fix broken json

---
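Note: a minimal sketch of how the fenced `json` snippets in the spec could
be checked mechanically. Wrapping bare fragments such as `"sorted" : {...}`
in braces before parsing is an assumption of this sketch, and the name
`check_json_blocks` is hypothetical. The sample input is the `by_column`
snippet as it stands after this patch, which is still reported because the
array elements are not separated by commas.

```python
import json
import re

FENCE_MARK = "`" * 3  # built at runtime to avoid a literal fence here
JSON_BLOCK = re.compile(FENCE_MARK + r"json\n(.*?)" + FENCE_MARK, re.DOTALL)


def check_json_blocks(markdown):
    """Return (block_index, error_message) for each block that fails to parse."""
    failures = []
    for index, match in enumerate(JSON_BLOCK.finditer(markdown)):
        text = match.group(1).strip()
        # Assumption: fragments that are not a full object or array are
        # key/value pairs meant to live inside an enclosing object.
        if not text.startswith(("{", "[")):
            text = "{" + text + "}"
        try:
            json.loads(text)
        except json.JSONDecodeError as err:
            failures.append((index, err.msg))
    return failures


broken = (
    FENCE_MARK + "json\n"
    '        "sorted" : {"by_column" : ["2000" "Male" "Ratio"],\n'
    '                    "direction": "asc"}\n'
    + FENCE_MARK + "\n"
)
fixed = broken.replace('"2000" "Male" "Ratio"', '"2000", "Male", "Ratio"')

broken_failures = check_json_blocks(broken)
fixed_failures = check_json_blocks(fixed)
```
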
 spec/n-dimension-data-spec.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/spec/n-dimension-data-spec.md b/spec/n-dimension-data-spec.md
index 568ea8b..1848fa9 100644
--- a/spec/n-dimension-data-spec.md
+++ b/spec/n-dimension-data-spec.md
@@ -71,7 +71,7 @@ Each header value is an ordered array of identifiers referencing each
                      "rows" : {"refarea" : ["S12000005", "S12000042", "S12000034", "S12000035", "S12000041", "S12000013", "S12000006", "S12000036", "S12000008", "S12000045",
                                             "S12000033"]},
 
-                     "column_hierarchy" : ["year", "gender", "measure"]}
+                     "column_hierarchy" : ["year", "gender", "measure"]},
 
 
         "data": [[[101, 20.3], [104, 21.2]],
@@ -106,7 +106,7 @@ shows that the data is sorted by the column 2000 / Male / Ratio
 column.  Sorting can only be done by the leaf columns.
 
 ```json
-        "sorted" : {"by_column" : ["2000" "Male" "Ratio"]
+        "sorted" : {"by_column" : ["2000" "Male" "Ratio"],
                     "direction": "asc"}
 ```
 
@@ -114,6 +114,6 @@ Alternatively applications can sort by the order of the row dimension
 values:
 
 ```json
-        "sorted" : {"by_row" : "refarea"
+        "sorted" : {"by_row" : "refarea",
                     "direction" : "desc"}
 ```

From 24501f8dc0bbbc9a3599356a5690a99307021878 Mon Sep 17 00:00:00 2001
From: Rick Moynihan <rick.m@swirrl.com>
Date: Tue, 20 Dec 2016 12:44:41 +0000
Subject: [PATCH 3/3] Improve sorting description text

---
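Note: a minimal sketch of how an implementation might apply the two sort
modes described in this patch. It assumes rows are held as
`(row_header, nested_row)` pairs with cells nested in `column_hierarchy`
order, and it falls back to sorting on the row identifier for
`by_row_order`, since no code-list properties are modelled here. The
function names are hypothetical, and the values for the second and third
rows are made up for illustration.

```python
def cell_at(nested_row, columns, hierarchy, path):
    """Walk one nested row down to the cell addressed by `path`."""
    cell = nested_row
    for dim, value in zip(hierarchy, path):
        cell = cell[columns[dim].index(value)]
    return cell


def sort_rows(rows, columns, hierarchy, sorted_spec):
    """Return a new list of (row_header, nested_row) pairs in sorted order."""
    has_column = "by_column_value" in sorted_spec
    has_row = "by_row_order" in sorted_spec
    if has_column == has_row:
        raise ValueError("exactly one of by_column_value / by_row_order is required")
    reverse = sorted_spec["direction"] == "desc"
    if has_column:
        path = sorted_spec["by_column_value"]
        if len(path) != len(hierarchy):
            raise ValueError("only leaf columns can be sorted on")
        return sorted(rows, reverse=reverse,
                      key=lambda pair: cell_at(pair[1], columns, hierarchy, path))
    # Assumption: with no code-list ordering available, fall back to the
    # row identifier, one of the fallbacks the spec text allows.
    return sorted(rows, reverse=reverse, key=lambda pair: pair[0])


columns = {"year": ["1999", "2000"],
           "gender": ["Male", "Female"],
           "measure": ["Count", "Ratio"]}
hierarchy = ["year", "gender", "measure"]
rows = [
    ("S12000005", [[[101, 20.3], [104, 21.2]], [[102, 20.6], [203, 31.3]]]),
    ("S12000042", [[[80, 18.0], [85, 18.5]], [[82, 25.1], [88, 19.0]]]),  # made up
    ("S12000034", [[[60, 15.0], [65, 15.5]], [[61, 12.4], [70, 16.0]]]),  # made up
]

by_value = sort_rows(rows, columns, hierarchy,
                     {"by_column_value": ["2000", "Male", "Ratio"],
                      "direction": "asc"})
by_row = sort_rows(rows, columns, hierarchy,
                   {"by_row_order": "refarea", "direction": "desc"})
```

In both cases the row header travels with its row, so headers and data stay
aligned after sorting.
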
 spec/n-dimension-data-spec.md | 41 ++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/spec/n-dimension-data-spec.md b/spec/n-dimension-data-spec.md
index 1848fa9..d7ba6d0 100644
--- a/spec/n-dimension-data-spec.md
+++ b/spec/n-dimension-data-spec.md
@@ -100,20 +100,45 @@ appropriate observation and store its measured value under the key
 
 ## Sorting
 
-The keys `by_column` and `direction` are used to show which column
-dimension the data was sorted by.  For example the JSON snippet below
-shows that the data is sorted by the column 2000 / Male / Ratio
-column.  Sorting can only be done by the leaf columns.
+We provide two mutually exclusive methods for sorting data,
+`by_column_value` and `by_row_order`.  Both options also support a
+`direction` property which lets you specify either `asc` or `desc` for
+an ascending or descending order respectively.
+
+### Sorting by_column_value
+
+The key `by_column_value` is used to indicate which column dimension
+the data was sorted by.  Setting this means that all of the rows
+(including the row headers) will be sorted by either the `asc`ending
+or `desc`ending order of values in the specified column.  The value
+for the `by_column_value` key identifies the column to sort on by
+specifying a path to the column dimension.  For example the JSON
+snippet below shows that the data is sorted by the 2000 / Male /
+Ratio column.  It is only valid to sort on leaf columns, not parent
+ones.
 
 ```json
-        "sorted" : {"by_column" : ["2000" "Male" "Ratio"],
+        "sorted" : {"by_column_value" : ["2000", "Male", "Ratio"],
                     "direction": "asc"}
 ```
 
-Alternatively applications can sort by the order of the row dimension
-values:
+### Sorting by_row_order
+
+The other way to sort is by the order of the values in the row
+dimension.  This is an orthogonal way to sort, as you are sorting not
+by values in the data, but by the order of the free dimension that is
+mapped to the row axis.
+
+The actual algorithm used to sort `by_row_order` is implementation
+specific, but should, where supplied, use the appropriate properties
+defined in the code-list.  If no such properties are supplied,
+implementations should choose to sort on another parameter, such as an
+associated label or identifier.
+
+As with `by_column_value` the `direction` can be set as either `asc`
+or `desc`.
 
 ```json
-        "sorted" : {"by_row" : "refarea",
+        "sorted" : {"by_row_order" : "refarea",
                     "direction" : "desc"}
 ```