Structured information
Information in this post is superseded by a material page
Several systems have been developed for taking and organizing smart notes and keeping one's second brain, for example Roam ↗, Obsidian ↗, Notion ↗, DokuWiki ↗, Org mode ↗, Foam ↗, and TiddlyWiki ↗. There seems to be great potential in what these systems could eventually achieve. In this post, we delve into a general model for storing and managing data in information systems, how it may help with improving and interconnecting them, and what these systems may introduce in the future.
Introduction
I have made a few posts and drafts on graph databases, extending graphs to hold data, and designing knowledge graphs for specific cases. They all tie to the same idea: structuring information in a general way. Such systems, and interactions with them, also appear in fiction ↗, which has explored how they could be used.
Structure
Information is stored in vertices. Relations are stored as edges between vertices.
And even that is not set in stone. The structure is simple because the main part of the system is built upon it. Each of its properties will be described inside the system entries.
Single-file JSON datapoint
Store everything as a single object with two attributes, vertices and edges. Every entry in vertices represents a vertex, and every entry in edges represents an edge; each entry is identified by a number in its id attribute. Ids are unique across vertices and edges taken together, so no vertex shares an id with an edge. Every edge also contains the mandatory parameters from and to, which give its direction.
The main object is stored in JSON format in a file. Both the format and the storage may change in the future, so this description serves as a general and simplest one; its purpose is to be implementable by any programmer in a reasonable time span.
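As a minimal sketch of the datapoint described above, the following Python snippet builds such an object, checks the stated constraints, and round-trips it through JSON. The `data` attribute and the `validate` helper are assumptions made for illustration; the description only fixes `vertices`, `edges`, `id`, `from`, and `to`.

```python
import json


def validate(datapoint):
    """Check the constraints described above; raise ValueError on violation."""
    vertices = datapoint["vertices"]
    edges = datapoint["edges"]
    # Ids must be unique across vertices and edges taken together.
    ids = [entry["id"] for entry in vertices + edges]
    if len(ids) != len(set(ids)):
        raise ValueError("ids must be unique across vertices and edges")
    # Every edge must carry the mandatory 'from' and 'to' parameters.
    for edge in edges:
        if "from" not in edge or "to" not in edge:
            raise ValueError(f"edge {edge['id']} lacks 'from' or 'to'")


datapoint = {
    "vertices": [
        {"id": 1, "data": "Structured information"},  # 'data' is hypothetical
        {"id": 2, "data": "Graph databases"},
    ],
    "edges": [
        {"id": 3, "from": 1, "to": 2},
    ],
}

validate(datapoint)
serialized = json.dumps(datapoint, indent=2)  # what would be written to the file
restored = json.loads(serialized)
```

Writing `serialized` to a file and reading it back is all the storage this simplest variant needs.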
Sub-systems
Each of these entries extends the abilities of the system. Together they should give a reasonable starting point for a system, compared to a bare-bones vertex-edge description.
Typing
Provide primitive types, each with a description, to be used as axioms of the system. Define a way to build more complex types by listing which parameters are expected and what they mean. Build the structure of types by composition (and possibly inheritance).
There is a rich theory built around typing ↗, which should be utilized when designing the building blocks of the system’s type system.
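One possible way to picture this, as a rough sketch only: primitive types are plain named entries with a description, and complex types list their parameters together with each parameter's type and meaning. All names below (`PrimitiveType`, `CompositeType`, `PERSON`, `BOOK`, `primitives_used`) are made up for the example and are not part of the model.

```python
from dataclasses import dataclass


@dataclass
class PrimitiveType:
    """An axiom of the type system: a name plus a description."""
    name: str
    description: str


@dataclass
class CompositeType:
    """A type built by composition from other types."""
    name: str
    description: str
    parameters: dict  # parameter name -> (type, meaning of the parameter)


INTEGER = PrimitiveType("integer", "a whole number")
STRING = PrimitiveType("string", "a sequence of characters")

PERSON = CompositeType("person", "a human being", {
    "name": (STRING, "full name of the person"),
    "birth_year": (INTEGER, "year the person was born"),
})
BOOK = CompositeType("book", "a written work", {
    "title": (STRING, "title of the book"),
    "author": (PERSON, "who wrote the book"),
})


def primitives_used(type_entry):
    """Return the names of primitive types reachable by composition."""
    if isinstance(type_entry, PrimitiveType):
        return {type_entry.name}
    names = set()
    for parameter_type, _meaning in type_entry.parameters.values():
        names |= primitives_used(parameter_type)
    return names
```

Here `BOOK` is composed of a string and a `PERSON`, which in turn bottoms out in the two primitives. Inheritance is left out of the sketch.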
Algorithm
Describes how a procedure should be performed and what its inputs and outputs are, potentially with additional information such as its time and space complexity.
Implementation
For a given algorithm, parameter restrictions, and language, describes source code that provides the procedure described by the algorithm. Ideally, it should be possible to compile and run the source code to perform the task.
Mapping
It is expected that implementations of various systems will be created independently at the same time. Though such implementations may be incomparable as each may have features the other is missing, we can map equivalent features to enable quick transition from one to the other. This should also cover various versions of the same system. If two features are the same then transitioning from one to the other should yield no change.
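To illustrate, here is a small sketch of a feature mapping between two hypothetical systems, A and B. The feature names and the helpers `invert` and `translate` are invented for the example. Features without an equivalent are reported separately instead of being silently dropped.

```python
# Hypothetical feature names for two independently built systems, A and B.
A_TO_B = {"note": "page", "link": "reference", "tag": "label"}


def invert(mapping):
    """Build the reverse mapping; requires the mapping to be one-to-one."""
    inverse = {target: source for source, target in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return inverse


def translate(features, mapping):
    """Split features into mapped equivalents and those with no equivalent."""
    mapped = [mapping[f] for f in features if f in mapping]
    unmapped = [f for f in features if f not in mapping]
    return mapped, unmapped


B_TO_A = invert(A_TO_B)

# 'canvas' has no equivalent in B, so it ends up in missing_in_b.
in_b, missing_in_b = translate(["note", "tag", "canvas"], A_TO_B)
# Equivalent features survive the trip to B and back unchanged.
back_in_a, _ = translate(in_b, B_TO_A)
```

The round trip through `B_TO_A` reflects the requirement that transitioning between equivalent features yields no change.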
Data format
For a given format name (as a string) this entry describes what data is stored and how its storage is structured.
Note that this includes both inner formats (e.g. a string in C++) and outer formats (e.g. a txt file).
Transformation
Changes one data format to another. For a pair of formats, we store an algorithm to transform the first into the second. The meaning of the data should be identical before and after, so ideally, transforming data there and back again should yield the initial data. However, some transformations are lossy (e.g. PNG to JPG), and that fact should be stored as data about the transformation.
Note that this also covers parsing a language's source file into an initial AST representation.
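A sketch of how transformations might be stored, assuming a simple in-memory registry keyed by a pair of format names. The format names (`"json-text"`, `"python-dict"`, `"float"`, `"int"`) and the `register`, `transform`, and `is_lossy` helpers are illustrative assumptions. The lossy flag is kept next to each transformation, as suggested above.

```python
import json

# (source format, target format) -> (function, lossy flag)
TRANSFORMATIONS = {}


def register(source, target, lossy=False):
    """Decorator that stores a transformation for a pair of format names."""
    def decorator(func):
        TRANSFORMATIONS[(source, target)] = (func, lossy)
        return func
    return decorator


@register("json-text", "python-dict")
def parse_json(text):
    return json.loads(text)


@register("python-dict", "json-text")
def dump_json(value):
    return json.dumps(value, sort_keys=True)


@register("float", "int", lossy=True)
def truncate(value):
    # The fractional part is discarded, so this cannot be undone.
    return int(value)


def transform(data, source, target):
    """Apply the stored transformation from source format to target format."""
    func, _lossy = TRANSFORMATIONS[(source, target)]
    return func(data)


def is_lossy(source, target):
    return TRANSFORMATIONS[(source, target)][1]


original = {"id": 1, "data": "note"}
as_text = transform(original, "python-dict", "json-text")
round_trip = transform(as_text, "json-text", "python-dict")
```

For the lossless pair, `round_trip` equals `original`. For the lossy pair, the access point can consult `is_lossy` before transforming.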
Placeholder
As the system grows, some information may be incomplete. A placeholder takes the place of the incomplete information.
Versions
The data is kept in its latest state; however, there are entries that represent the history of its previous versions.
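One way this could look in the vertex-edge datapoint, purely as an assumption: each update archives the old state as a new vertex and links it with an edge of a hypothetical type `previous_version`. The `data` and `type` attributes and the helper names are not prescribed by the model.

```python
def next_id(datapoint):
    """Return an id not yet used by any vertex or edge."""
    used = [entry["id"] for entry in datapoint["vertices"] + datapoint["edges"]]
    return max(used, default=0) + 1


def update_vertex(datapoint, vertex_id, new_data):
    """Replace a vertex's data, keeping the old state as a version entry."""
    current = next(v for v in datapoint["vertices"] if v["id"] == vertex_id)
    archived = {"id": next_id(datapoint), "data": current["data"]}
    datapoint["vertices"].append(archived)
    # The older history now hangs off the freshly archived vertex.
    for edge in datapoint["edges"]:
        if edge["from"] == vertex_id and edge.get("type") == "previous_version":
            edge["from"] = archived["id"]
    datapoint["edges"].append({
        "id": next_id(datapoint),
        "from": vertex_id,
        "to": archived["id"],
        "type": "previous_version",  # hypothetical edge type
    })
    current["data"] = new_data


def history(datapoint, vertex_id):
    """Return the vertex's data from the newest state to the oldest."""
    by_id = {v["id"]: v for v in datapoint["vertices"]}
    states = [by_id[vertex_id]["data"]]
    current = vertex_id
    while True:
        older = [e["to"] for e in datapoint["edges"]
                 if e["from"] == current and e.get("type") == "previous_version"]
        if not older:
            return states
        current = older[0]
        states.append(by_id[current]["data"])


datapoint = {"vertices": [{"id": 1, "data": "first draft"}], "edges": []}
update_vertex(datapoint, 1, "second draft")
update_vertex(datapoint, 1, "final text")
```

The vertex with id 1 always holds the latest state, and `history(datapoint, 1)` walks the chain of archived entries.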
Community interaction
Data may store opinions rather than applying changes immediately. If an opinion has good support, it may be accepted. This would be done either automatically or by a moderator, which ties into privileges.
System
Of course, there is a meta-layer to all of this. Each of the mentioned systems can be described by an entry with a common type stored as data.
Possible implementation features
Information
Some public access points might be available to query common data. We may imagine services such as weather, news, or video as access points. Meta information would also be available, such as which public access points provide which data and whether there is an equality mapping between them.
Search and Indexing
The main way for a user to manage such data is to query it effectively. Search is therefore the main feature, and an implementation must offer some kind of querying. The indexed data might be stored either aside in a different datapoint or in the same datapoint as nodes of a special type.
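As a rough sketch of the "stored aside" variant, the snippet below builds a simple inverted index over vertex text and answers word queries from it. The `data` attribute and the functions `build_index` and `search` are assumptions for the example. A real implementation would likely need tokenization, ranking, and incremental updates.

```python
from collections import defaultdict


def build_index(datapoint):
    """Map each lowercase word to the ids of vertices whose data contains it."""
    index = defaultdict(set)
    for vertex in datapoint["vertices"]:
        for word in str(vertex.get("data", "")).lower().split():
            index[word].add(vertex["id"])
    return index


def search(index, query):
    """Return ids of vertices that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


datapoint = {
    "vertices": [
        {"id": 1, "data": "Graph databases"},
        {"id": 2, "data": "Knowledge graph design"},
        {"id": 3, "data": "Weather service"},
    ],
    "edges": [],
}
index = build_index(datapoint)
matches = search(index, "knowledge graph")
```

The index lives outside the datapoint here. Storing it inside would mean adding vertices of a special index type instead.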
Collaboration
The access point may be synchronized and allow the collaboration of users so that they may change the same data at the same time. The more structured the data is, the more sense it makes.
Users and Privileges
A datapoint may be used to store information about privileges, users, and users’ privileges. These privileges may be used to restrict access to another datapoint, but this would have to be managed by the access point, not the data itself.
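The split between stored privileges and their enforcement might be sketched as follows. One datapoint holds users, privileges, and edges from a user to each privilege they hold, while a hypothetical `AccessPoint` class consults it before touching another datapoint. The `type` and `name` attributes and the privilege names `read` and `write` are assumptions.

```python
privileges = {
    "vertices": [
        {"id": 1, "type": "user", "name": "alice"},
        {"id": 2, "type": "user", "name": "bob"},
        {"id": 3, "type": "privilege", "name": "read"},
        {"id": 4, "type": "privilege", "name": "write"},
    ],
    "edges": [  # an edge from a user to a privilege grants that privilege
        {"id": 5, "from": 1, "to": 3},
        {"id": 6, "from": 1, "to": 4},
        {"id": 7, "from": 2, "to": 3},
    ],
}


class AccessPoint:
    """Guards one datapoint using privileges stored in another datapoint."""

    def __init__(self, data, privileges):
        self._data = data
        self._privileges = privileges

    def _allowed(self, user_name, privilege_name):
        by_id = {v["id"]: v for v in self._privileges["vertices"]}
        for edge in self._privileges["edges"]:
            user, privilege = by_id[edge["from"]], by_id[edge["to"]]
            if ((user["type"], user["name"]) == ("user", user_name)
                    and (privilege["type"], privilege["name"])
                    == ("privilege", privilege_name)):
                return True
        return False

    def read(self, user_name):
        if not self._allowed(user_name, "read"):
            raise PermissionError(f"{user_name} may not read")
        return self._data

    def add_vertex(self, user_name, vertex):
        if not self._allowed(user_name, "write"):
            raise PermissionError(f"{user_name} may not write")
        self._data["vertices"].append(vertex)


notes = {"vertices": [{"id": 1, "data": "shared note"}], "edges": []}
access = AccessPoint(notes, privileges)
access.add_vertex("alice", {"id": 2, "data": "added by alice"})
```

The `notes` datapoint itself knows nothing about users; only the access point does, which matches the separation described above.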