Skip to content

Labelled graph formats #3

@jaanos

Description

@jaanos

@katjabercic, I see that you have added a wiki page for the lsparse64 format. I have some comments:

  • Rather than base64, I'd prefer McKay's encoding - it is simpler and we want to be "compatible" (then I'd just name the format lsparse6)
  • To represent the number of vertices and the size of the label, we could use McKay's representation with 1, 4 or 8 bytes - this way, we save space with small graphs while still keeping the possibility of storing larger graphs. The number of bits needed to represent a vertex can be deduced from the number of vertices.
  • Do we really need the minimal and maximal degrees?

Now that I think of it, sparse6 already supports multiple edges - all we're adding is labels. I think it might be wiser to simply extend sparse6 to add labels to the edges in the order they're specified (this way, we can store the labels separately - in the database, we could refer to the basic graph, and additionally store the labels). So we may have something like this:

: num-of-vertices list-of-edges # size-of-label list-of-labels

An lsparse6 parser can then behave similarly as a sparse6 parser, except that it can find the # character in advance and assign labels as it reads the edges.

By the way, I was also thinking about a format for sparse directed graphs (so we could store Ferrán's automata). It would have the same basic structure as sparse6, so I'd call it disparse6. However, the edge list format should be changed to allow for directed edges. I suggest that each entry is like this (compare to sparse6):

  • if the first bit is 0, an out-neighbour of the current vertex follows
  • if the first bit is 1, then
    • if the next bit is 0, then the index of the current vertex is increased by one, and its out-neighbour follows
    • if the next bit is 1, then the current vertex is set to the given index

In other words, the edge list is a sequence b[0] x[0] b[1] x[1] b[2] x[2] ... b[m] x[m], where each b[i] is either 0, 10 or 11, and each x[i] occupies k bits. Decoding the edges would go like this:

v = 0
for i from 0 to m:
    if b[i] = 11:
        v = x[i]
        continue
    else if b[i] = 10:
        v += 1
    add edge (v, x[i])

Special rules for padding are then not needed, as an extra string of 1 bits could only be interpreted as changing v, and no edge would be added.

We could also extend this format to add edge labels in the same way as above, thus obtaining ldisparse6.

What do you think?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions