Data integration and analytics are a bottleneck for solving our greatest challenges, from doing science to creating general artificial intelligence and everything in between. The demand for integrated data is indicated by the number of startups that focus on nothing more than collecting well-aligned datasets of interest and monetizing specialized queries. Well-aligned, high-quality datasets are a gold mine for endeavors involving inherently heterogeneous data, such as drug discovery, complex design, and sociological research. The multitude of data formats and standards turns even a simple question, such as "get me a list of all the world's dogs", into an insurmountable quest that ends up as yet another startup focusing on that specific domain. Existing solutions, such as linked, ontology-aware data formats, are not flexible or rich enough to conveniently define records whose fields draw on multiple arbitrary ad-hoc vocabularies. They also lack support for definitions of value types, callable object interfaces, and modification permissions, which would let objects retain their properties even after being decoupled from the data management systems that originated them.
Widely known current solutions, such as Linked Data, are not entirely well suited to the problem. They require large amounts of data to be serialized in the same format, which is never the case in an ever-diversifying world, and there is no standard way to embed schemas, permissions, and other context data into data items, which is necessary to make them reusable in queries.
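To make the missing properties concrete, here is a minimal Python sketch of a data item that carries its own schema hints, origin and modification permissions. The class names (`Field`, `Item`), the method names, the example URL and the choice of vocabulary terms are all assumptions made for illustration; this is not an existing standard or library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Field:
    """One field of a record, carrying its own value type and vocabulary terms."""
    value: Any
    value_type: type
    # Terms naming the same concept in several vocabularies (illustrative choice).
    terms: Set[str] = field(default_factory=set)


@dataclass
class Item:
    """Hypothetical data item that keeps schema, origin and permissions with it."""
    fields: Dict[str, Field]
    origin: str            # where the data came from (made-up URL below)
    schema_version: str    # version of the source schema
    writers: Set[str] = field(default_factory=set)  # identities allowed to modify

    def lookup(self, term: str) -> Any:
        """Find a value by its local name or by any of its vocabulary terms."""
        for name, f in self.fields.items():
            if term == name or term in f.terms:
                return f.value
        raise KeyError(term)

    def update(self, name: str, value: Any, who: str) -> None:
        """Modify a field, enforcing the embedded permission and value type."""
        if who not in self.writers:
            raise PermissionError(f"{who} may not modify this item")
        f = self.fields[name]
        if not isinstance(value, f.value_type):
            raise TypeError(f"{name} expects {f.value_type.__name__}")
        f.value = value


dog = Item(
    fields={
        "name": Field("Rex", str, {"schema:name", "foaf:name"}),
        "weight_kg": Field(31.5, float, {"schema:weight"}),
    },
    origin="https://example.org/shelter/dogs/1",
    schema_version="1.0",
    writers={"alice"},
)
```

Because the permissions and types travel with the item, `dog.lookup("foaf:name")` and `dog.update("weight_kg", 32.0, who="alice")` behave the same wherever the object ends up, which is the kind of decoupling the post asks for.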
By combining RDF-based SPARQL (for alignment) with OAuth 2.0 (for permissioning) and a standard for securely encrypting data about the query origin context (such as query origin identity keys, cookies, IP addresses, and the schema versions of the resources the data came from), it may be possible to approach the desired property: data items that remain reusable as objects in arbitrary programming languages, without the need to write custom integrations. However, this does not seem to have been done, and there may be better solutions to the problem.
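As a rough illustration of the first two pieces only, the sketch below builds (without sending) a SPARQL query request that carries an OAuth 2.0 bearer token. The endpoint URL, the token and the function name are placeholders, and the encrypted origin-context part of the idea is not covered, since no standard for it is named here.

```python
import urllib.request

# Placeholder endpoint and token; neither refers to a real service.
ENDPOINT = "https://data.example.org/sparql"
ACCESS_TOKEN = "example-token"

QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?name WHERE { ?animal schema:name ?name . } LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str, token: str) -> urllib.request.Request:
    """Build a SPARQL query request authorized with an OAuth 2.0 bearer token."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),  # the query text is sent as the POST body
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_sparql_request(ENDPOINT, QUERY, ACCESS_TOKEN)
# urllib.request.urlopen(request) would send it; omitted because the endpoint is made up.
```

Even in this small form, the caller still has to know the endpoint, the vocabulary and the token flow in advance, which is exactly the custom integration work the post wants to avoid.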
For example, given the diversity and complexity of systems on the web (protocols and formats), there may be other (better?) ways to approach the problem, based on the plug-and-play philosophy that devices follow with drivers. Drivers would abstract away web resource APIs and make fully featured, polymorphic, interactive data a shared feature of all programming languages, treating websites and web systems (including decentralized ones) as operating system devices that are directly available as variables.
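One way to picture the driver idea is a small registry that maps a web host to a driver class, so calling code only ever sees plain records. Everything in this sketch is hypothetical: the `Driver` interface, the `register` and `mount` helpers, the host name, and the canned records that stand in for a real API call.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, Type
from urllib.parse import urlparse


class Driver(ABC):
    """Hypothetical driver interface that hides one web system's API."""

    @abstractmethod
    def read(self, path: str) -> Iterator[Dict[str, Any]]:
        """Yield the records found at the given path as plain dictionaries."""


# Registry of installed drivers, keyed by host name.
DRIVERS: Dict[str, Type[Driver]] = {}


def register(host: str) -> Callable[[Type[Driver]], Type[Driver]]:
    """Class decorator that installs a driver for one host."""
    def wrap(cls: Type[Driver]) -> Type[Driver]:
        DRIVERS[host] = cls
        return cls
    return wrap


@register("shelter.example.org")
class ShelterDriver(Driver):
    """Stand-in driver that returns canned records instead of calling a real API."""

    def read(self, path: str) -> Iterator[Dict[str, Any]]:
        yield {"name": "Rex", "species": "dog"}
        yield {"name": "Tom", "species": "cat"}


def mount(url: str) -> Iterator[Dict[str, Any]]:
    """Treat a URL like a device: pick the driver for its host and read from it."""
    parts = urlparse(url)
    driver = DRIVERS[parts.hostname]()
    return driver.read(parts.path)


dogs = [r for r in mount("https://shelter.example.org/animals") if r["species"] == "dog"]
```

The calling line never touches the site's protocol or format. Supporting another site would mean installing another driver, much as an operating system gains a device, rather than writing a new integration in every program.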
Regardless of the choice or way of implementation, data liquidity and systems interoperability seem to remain an important unsolved problem and a bottleneck for faster progress in a large number of domains of digital activity.
I'm upvoting this as it's something I want to happen too. I see two ways of it happening:
* People cooperate and create integrations.
* People don't cooperate and don't create integrations, in which case we have to take matters into our own hands. I believe keylogging at the operating system or browser level is the only way to keep ownership of our own data.