Labelled graph formats

@katjabercic, I see that you have added a wiki page for the [**lsparse64**](https://github.com/DiscreteZOO/DiscreteZOO-core/wiki/lsparse64-format-for-undirected-labelled-multigraphs) format. I have some comments:
* Rather than base64, I'd prefer McKay's encoding - it is simpler and we want to be "compatible" (then I'd just name the format **lsparse6**)
* To represent the number of vertices and the size of the label, we could use McKay's representation with 1, 4 or 8 bytes - this way, we save space with small graphs while still keeping the possibility of storing larger graphs. The number of bits needed to represent a vertex can be deduced from the number of vertices.
* Do we really need the minimal and maximal degrees?

Now that I think of it, **sparse6** already supports multiple edges - all we're adding is labels. I think it might be wiser to simply extend **sparse6** to add labels to the edges in the order they're specified (this way, we can store the labels separately - in the database, we could refer to the basic graph, and additionally store the labels). So we may have something like this:
```
: num-of-vertices list-of-edges # size-of-label list-of-labels
```
An **lsparse6** parser can then behave similarly as a **sparse6** parser, except that it can find the `#` character in advance and assign labels as it reads the edges.

By the way, I was also thinking about a format for sparse directed graphs (so we could store Ferrán's automata). It would have the same basic structure as **sparse6**, so I'd call it **disparse6**. However, the edge list format should be changed to allow for directed edges. I suggest that each entry is like this (compare to **sparse6**):
  - if the first bit is 0, an out-neighbour of the current vertex follows
  - if the first bit is 1, then
    + if the next bit is 0, then the index of the current vertex is increased by one, and its out-neighbour follows
    + if the next bit is 1, then the current vertex is set to the given index

In other words, the edge list is a sequence `b[0] x[0] b[1] x[1] b[2] x[2] ... b[m] x[m]`, where each `b[i]` is either `0`, `10` or `11`, and each `x[i]` occupies `k` bits. Decoding the edges would go like this:
```
v = 0
for i from 0 to m:
    if b[i] = 11:
        v = x[i]
        continue
    else if b[i] = 10:
        v += 1
    add edge (v, x[i])
```
Special rules for padding are then not needed, as an extra string of 1 bits could only be interpreted as changing `v`, and no edge would be added.

We could also extend this format to add edge labels in the same way as above, thus obtaining **ldisparse6**.

What do you think?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Labelled graph formats #3

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Labelled graph formats #3

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions