First Monday

Tackling complexity in an interdisciplinary scholarly network: Requirements for semantic publishing by Nora Schmidt

Scholarly communication is complex. The clarification of concepts like “academic publication”, “document”, “semantics” and “ontology” facilitates tracking the limitations and benefits of the media of the current publishing system, as well as of a possible alternative medium. In this paper, requirements for such a new medium of scholarly communication, labeled Scholarly Network, have been collected and a basic model has been developed. An interdisciplinary network of concepts and assertions, created with the help of Semantic Web technologies by scholars and reviewed by peers and information professionals, can provide a quick overview of the state of research. The model picks up the concept of Nanopublications, but maps information in a more granular way. For a better understanding of which problems have to be solved by developing such a publication medium, e.g., inconsistency, theories of Radical Constructivism are of great help.


1. Introduction
2. Defining “academic publication”
3. The form of academic publication
4. Need for a new medium of scholarly communication
5. Semantic academic publishing and nanopublications: The state of the art
6. Semantics, meaning and ontology
7. Conclusion: Requirements for the Scholarly Network
8. Discussion: Ad hoc scholarly network instead of ex ante ontologies



1. Introduction

For scholarly communication to succeed, the vitality of the Internet can be exploited only if researchers are provided with media specifically designed to support communication. Any new element added to the discourse must be made available to all researchers in the course of publishing, without delay. Any subsequent, non-authorized knowledge representation disguises research results. “Adding new elements of scholarly knowledge is achieved by adding nodes and relationships to this network” (Bourne, et al., 2011). Scholarly communication produces “reusable scholarly artifacts” (Bourne, et al., 2011), which I will label “semantics”.

The re-usability of scholarly artifacts can be improved if each artifact is constructed in a less complex way, e.g., compared to an academic article, and if all artifacts together are less complex than the current scholarly literature. Being complex means being made of many elements, each of which can connect to only very few other elements [1].

Disciplines are a good example of complexity: most researchers exclusively read and cite publications from their own fields. Topics which are currently not part of the respective field are selected by more progressive researchers: while discussing positions from neighboring fields, they establish relations between elements that did not exist before. This social selection process adds to the complexity of a field, because the set of relations has broadened. However, selection can also reduce complexity.

The purpose of this paper is to create an outline for a scholarly communication medium which reduces complexity and increases the likelihood of successful scholarly communication.

To deal with complexity, controlled vocabularies, i.e., classifications and ontologies, are built to assist knowledge retrieval [2]. These knowledge organization systems fix the meaning of entities once and for all, but meaning is never certain and always subject to change. Generally, users are not allowed to update these systems. The complexity of scholarly communication should not be reduced at the expense of a plurality of meaning. I therefore argue that such systems may be useful for applications in economics or education, but they are too rigid for a real-time documentation of scholarly communication, whose specialty is that scholars are still negotiating whether they share the understanding of a certain term or statement. Knowledge organization systems should therefore be adaptable on the fly.

Until today, the Semantic Web [3] has been used solely to model “facts” — with a few exceptions that I will discuss later. The Semantic Web is indebted to a philosophical concept that was introduced to computer science as a metaphor: ontology, understood as shared knowledge about everything there is. Research results are never shared knowledge; they are contested when they are published. Even encyclopaedic knowledge is in flux.

Even though this asymmetric relation between knowledge and knowledge representation is well known, it has never been extensively discussed, even as new technological opportunities might encourage solutions. Proposals for applying Semantic Web technologies to academic publishing are rare and mainly take the form of specific applications, principally built to be implemented in the natural sciences. The humanities use Semantic Web technologies for editions and databases of previous knowledge, but not for research results. An interdisciplinary approach, theoretical foundations and reasonable definitions of the vocabulary used to formulate instances are missing.

Supported by some patterns from the social systems theory founded by Niklas Luhmann, I will outline a proposal for bridging this gap [4]. I have chosen this theory not only because it provides an in-depth analysis of the scholarly system (Luhmann, 1990), but also because it offers an elaborated concept of media and form. The main objective of my work is, for now, to develop requirements for networked semantic publishing.

Since I am no technical expert, rather than providing a detailed model that is difficult to discuss with potential users, I intend to start a discussion with those users about a suitable medium: how could the Semantic Web be employed to facilitate academic publishing? At this stage, an idea should be developed. To facilitate comprehension, I closely relate the model of a Semantic Scholarly Network to the classical scholarly publication.



2. Defining “academic publication”

How does an academic publication differ from other forms of publication or other forms of scholarly communication? The Berne Convention for the Protection of Literary and Artistic Works helps with a definition: first of all, publishing always happens “with the consent of their authors [...] provided that the availability of such copies has been such as to satisfy the reasonable requirements of the public” [5]. The presentation of original work such as architecture is excluded, hence publishing, in a certain sense, means reproducing.

In the case of an electronic publication, very common today, the original and the copy are exactly the same, although they might be saved to different storage media. Accessing can replace copying. Digital rights management seems to be a by-product of the transition from print to electronic. It thwarts another aspect of the definition: worldwide dissemination to the satisfaction of reasonable requirements of the public. Consequently, the feature of being a reproduction loses its defining character.

What makes a publication academic? First of all, it distributes research results. A research result depends on values developed in subject-specific discourses. An interdisciplinary common sense may define research results as insights for a critical mass of peers. Research results, then, are new knowledge achieved according to standards of theory and method.

References to previous research are essential for academic publications [6]. As connective actions, references increase the chances for the publication to contribute to successful communication, that is, of keeping the communication going. Niklas Luhmann [7] identified three reasons why successful communication is unlikely:

  1. Reaching the addressee is limited by media of dissemination, language skills and attention capacities which must be directed to a given communication attempt. Academic knowledge relies on specialized, worldwide accessible media of dissemination, because if it is not published, it is not academic knowledge, even if it is true [8].

  2. Information transfer in a literal sense is impossible in social systems and depends on the socialization of the addressee. Misunderstanding can lead to breaking communication off even before a single communication sequence has been completed.

  3. Successful communication changes the premises of the addressee’s behavior. Most addressees will not accept this easily, but it occurs when the addressee models a reply according to the communication attempt. To make this happen more frequently in specialized contexts, “specific symbolically generalized communication media” or, in short, “success media” have developed throughout human history. In the academic context, the likelihood of being cited rises if truth is not merely stated, but based on robust methods and well thought out theories, supported by the reputations of the authors. Truth is the success medium of scholarly communication.

Truth-stating publications without references are like money in an unknown currency, and uncited academic publications are like buried treasures: both are irrelevant in their particular contexts. Any academic publication can receive its first citation at any time, like treasure seized and spent in an economic system. Without references, no contemporary scholar would study a publication as a contribution to a given academic discourse. Self-references reduce complexity in scholarly discourse: productive discussions rely on presuming the knowledge of certain statements among scholars in the same field. This canon needs to be reproduced and developed through self-references [9].

References belong to the “paratext” (Genette, 1997) of a text. In the prefix “para-” lies the oscillation between two sides of a distinction which has a preferred side: the text. Paratext points from the text to something external, but also — especially in terms of citation analysis — from the external to the text. Both references and citations have the special feature of being paratext to several texts, connecting them. Paratext can be literal or subliminal, can itself be a text with its own paratext, such as a review. A single sentence can be text and paratext at the same time, depending on its function.

Genette’s literary concepts have not been used seriously in the analysis of academic publications. But as I will try to illustrate, their adoption is exceptionally useful. The paratext of academic publications is constituted by footnotes, title, authors, their affiliations and reputation, outline, metadata and other elements. “For us, accordingly, the paratext is what enables a text to become a book and to be offered as such to its readers and, more generally, to the public” [10]. Every paratext which connects the text to communication and therefore to society, whether in an affirmative or a critical way, adds to the likelihood of communicating successfully. The “paratext provides a kind of canal lock between the ideal and relatively immutable identity of the text and the empirical (sociohistorical) reality of the text’s public [...], the lock permitting the two to remain ‘level’” [11].

Metadata relies on a system which relates one metadata set to others; otherwise, it bears no more functionality than being a collection of paratexts of a single text. Current publication formats prevent machines not only from collecting paratexts which already exist, but also from extracting the main statements of a publication in order to connect them to statements from other publications. This process would create new paratexts that are extremely useful for accelerating academic discourse while simultaneously sparing researchers time-consuming detailed reconstructions of the state of research.

After having clarified the significance of references for academic publications, I will very briefly discuss originality. Originality or novelty does not mean that a given publication represents a paradigm change; it suffices to present new knowledge in support of existing paradigms [12]. The resulting definition reads: an academic publication may be defined as the distribution of an academic document which refers to the authors’ original research results and to previous research results of others. The distribution must be induced and approved by its authors and addressed to a global audience.



3. The form of academic publication

If a publication mainly facilitates the dissemination of research results in order to connect those results to academic discourse, the research needs to be provided in a suitable format. A research result can be any result produced according to academic standards in method and theory; it can therefore be data of any kind. Research results which are provided in forms that do not meet the requirements of academic publications stated above may be called “academic documents”.

A digital document should not be confused with an electronic file, because a digital document may consist of numerous files. A document is limited in meaning. A collection of files may serve as a contemporary document, e.g., the files from the Nuremberg trials; at the same time, every single one of those files is a document. The conditions for calling something a document are, firstly, that it witnesses something [13] — in a very broad sense — and secondly, that it constitutes a certain nexus of meaning to a human. The notion of a digital document both broadens and limits this definition with respect to machine processability: for digital documents, the aspects of “witnessing” and “meaning” have to be reformulated to allow for machine processing. Ergo, a document is an artefact that witnesses something and that must at least offer options of handling for humans or machines. In that very moment it is called a document [14].

To visualize this distinction between research result, academic document and publication, I use a notation inspired by George Spencer-Brown’s (2008) “token” [15]:

[Figure: notation after Spencer-Brown’s “token”, cascading the distinctions Academic Publication / Academic Document / Research Result.]

Any publication is, at the same time, both a research result and a document, while a research result is not a publication per se. Only an observer can draw a distinction and create a form with the help of a medium. Only forms can be observed, not the medium. With every new form, a new, additional medium with fewer elements and therefore fewer possibilities to create new forms comes into being. In this sense, the cascading distinctions above are hierarchical. However, depending on what I aim to convey, I could have drawn that hierarchy differently. To transfer this idea to knowledge organization in general, the stability of hierarchies has to endure the discourse: taxonomies in biology, for instance, might remain stable, whereas methods of child education would not.

For academic publications, it follows from the chosen concept of documents that numerous files or documents can be compiled into one, as long as at least one of the documents or the aggregation itself meets the definition of an academic publication.



4. Need for a new medium of scholarly communication

Academic publishing is the most important structure of the academic system. All other structures rely on publishing: reputation, positions in research organizations, funding, for example. To provide new knowledge to society, previous knowledge must be retrievable for scholars, increasingly so for the public as well. Closed access loses acceptance, especially for publicly funded research, as many funders’ policies indicate [16].

In scholarly communication, any contribution that follows the standards of method and theory is potentially relevant. Variations of previous knowledge can only be selected to become novel knowledge if they are regarded as possible truth in scholarly communication. This is achieved by the discussion of respective research results. The process is circular, because the selection, in turn, affects possible variations [17].

To relieve the system of selection processes which would be too burdensome, most of the potentially novel knowledge is (de-)selected implicitly: it is never discussed. Here, the properties of publishing media play an important role, most notably peer review and impact measurement. In effect, the chances for being selected explicitly are allotted non-transparently before publication.

While reducing complexity on the one hand, implicit selection, as mostly practiced today, also raises complexity because it fosters duplicate research: knowledge options that could have been deselected explicitly, never to come up again, actually can come up again. This paradox of reducing and potentially raising complexity at the same time can only be prevented if the standards in method and theory (the latter meaning mostly logic and clarity of argumentation) serve as the only criteria for review.

Complexity is also raised by the use of different languages. Although the proportion of English publications is rising, there are very good reasons to use other languages as well, especially in the humanities: different languages are different media. Therefore, the collections of forms that can be built with the help of these media differ and are not fully interchangeable. Although not all researchers may be able to read a given language chosen by an author, they should not only be able to grasp the main results formulated in an abstract, but also get the chance to integrate the publication into the discourse more specifically. This may only be possible on a more abstract level beyond the formulated text, with the help of metadata.

Publishing media with a thematic scope should be questioned in general: research has come a long way to justify any topic as researchable. Establishing new journals or even learned societies for new fields increases the complexity of academia even more because new elements are introduced. The success of “mega journals” that dropped the relevance criteria and a narrow scope confirms the hypothesis that contemporary retrieval technologies can fulfill the function of filtering and can replace subject separation.

The needs of all research communities overlap to a large extent: the enormous and ever-growing publication output must be monitored, which is especially hard in interdisciplinary research. Additionally, a medium of scholarly communication should respond, as Clark, et al. (2014) noted, “to the identified problems of mishandled, degraded or fictitious citations and to scientific claims not properly grounded in evidence”. Any argumentation, including all evidence, has to be fully revisable and reproducible. While in the sciences this leads to publishing research data and open methodology, in the humanities and social sciences theory research is often about the development of analytical concepts; evidence can then be found, for instance, by comparing the coverage of different concepts.

Furthermore, electronic publications — now established at least as an option in all disciplines — still need to be double-checked for integrity, because professional formatting can easily be imitated with an ordinary computer. Current electronic scholarly publishing relies on a format that has not changed much since the seventeenth century. Integrity would be no problem if the publication were instead created by secure Web-based software from user-entered data.

Knowledge retrieval systems based on keywords or full-text indexing cannot satisfy every information need. Sometimes the information needed is not central in the context in which it appears, so it is ranked lower in search results and never examined. In some cases, the information need can only be expressed by relating search terms logically (Ribaupierre and Falquet, 2013).

One of the problems of current publishing media is the separation of publication from knowledge organization. Knowledge organization mostly depends on pre-defined vocabularies, so it is representative of “normal science” in Thomas Kuhn’s sense. What is missing is a system that can be used to visualize discourses that are taking place now, especially those outside accepted paradigms.

“Ontology learning”, the automatic development of ontologies from large text corpora, is evolving (see, e.g., Wong, et al., 2012). I argue that, for academic literature, it would be even more useful to hand the power of naming keywords, fields, theories, methods and so on over to researchers themselves, not only in the text itself but also in the metadata used for information retrieval. Of course, not only peers but also information professionals should review every new entry and try to find redundancies or additional obvious connections to existing elements.

Researchers know best where to place their own work in the context of that of others. This would not overcome the unlikelihood of successful communication in principle, but an additional level on which misunderstanding may be produced could be skipped. A system which maps the essence of the scholarly discourse could also serve as a controlled vocabulary, without the inflexibility by which such vocabularies are usually characterized.

Another downside of knowledge organization systems is their dependency on natural language. Because of different levels of language richness, the exact mapping of terms is impossible. The subject indexing of a publication in one language cannot be used in a different language environment without the risk of errors, not to mention cultural biases.

A bottom-up approach is needed to create a universal vocabulary and thereby foster interdisciplinary understanding — as long as there is an interdisciplinary connection in research at all. Over time, the history of concepts could be read from a visualization, which would be especially helpful for the humanities and social sciences. Additionally, concepts and assertions that are reproduced often should appear stronger in the visualization. A visualization of this kind could therefore provide an overview of a scholarly field.
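To illustrate the idea in miniature, the following Python sketch weights concepts by how often they recur across assertions, which a visualization layer could translate into node size. All concept names and the tuple-based data structure are invented for illustration and do not follow any existing system.

```python
from collections import Counter

# Hypothetical assertions, each reusing community-named concepts
# (all identifiers here are illustrative, not from any real dataset).
assertions = [
    ("globalization", "relates_to", "world_society"),
    ("globalization", "contested_by", "postcolonial_critique"),
    ("world_society", "defined_in", "systems_theory"),
    ("globalization", "relates_to", "world_society"),  # reproduced assertion
]

# Count how often each concept is reproduced across assertions.
concept_weight = Counter()
for subj, _, obj in assertions:
    concept_weight[subj] += 1
    concept_weight[obj] += 1

# A visualization could scale node size by these weights, so that
# frequently reproduced concepts appear stronger.
print(concept_weight["globalization"])  # 3
```

Because the weights grow with every reproduced assertion, the resulting picture changes as the discourse develops, in contrast to a fixed, pre-defined vocabulary.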



5. Semantic academic publishing and nanopublications: The state of the art

Shotton (2012; 2009) was the first to propose an extensional definition of semantic publishing, summed up as “anything that enhances the meaning” (Shotton, 2009), achieved by a list of enhancement techniques.

Although all of these aspects might improve the functionality of an article, not all of them “enhance the meaning” (understood as adding to the content); e.g., not all broaden the context that is provided with the publication to improve verifiability. Additionally, this type of definition is very inclusive and depends on the technology available; it is not very durable. Only two of the techniques that Shotton (2009) presented use Semantic Web technologies: firstly, the semantic markup of the text, which connects singular terms to Internet databases in order to retrieve context information, and secondly, the use of a citation ontology to define in which way a given paper refers to previous work. Neither aspect accommodates contingency.
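The second technique can be pictured as a single typed triple. The sketch below borrows the namespace and a property name from Shotton’s CiTO citation ontology, but the two DOIs and the plain-tuple representation are placeholders of my own, not real identifiers.

```python
# A typed citation as an RDF-style triple, following the idea of a
# citation ontology (the namespace and property name are from CiTO;
# the two DOIs are made up for illustration).
CITO = "http://purl.org/spar/cito/"

citation = (
    "https://doi.org/10.9999/example.2023",  # citing paper (hypothetical)
    CITO + "disagreesWith",                  # how it cites
    "https://doi.org/10.9999/example.2019",  # cited paper (hypothetical)
)

subject, predicate, obj = citation
print(predicate)  # http://purl.org/spar/cito/disagreesWith
```

A reference typed in this way records not merely that a paper cites another, but the rhetorical function of the citation, which plain reference lists cannot express.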

Enhancing an article with “meaning” also expands its complexity. “The fact that digital media make even more information available will only increase the problem. Digital texts, if we merely conceive of them as delimited containers that carry a certain amount of information, will not help us to solve this problem either” (Gradmann and Meister, 2008). I am looking for a way to reduce complexity while improving the connectivity of assertions. Skimming the information disclosed in publications should allow the reader to get an overview. This would increase the likelihood of successful communication.

While the markup approach to otherwise conventional publications (see Shotton, et al., 2009; Groza, et al., 2009; de Waard, et al., 2009; Shum, et al., 2010, as well as Ribaupierre and Falquet, 2013) is not convincing for this purpose, the statement-based approach, e.g., nanopublications (see Concept Web Alliance, 2013), is more promising [18]: “The basic principle is: natural guidance of human authors to structure their data in such a way that computers understand them” (Mons and Velterop, 2009). As Clark, et al. (2014) already pointed out, “none of these models provide a means to build claim networks of arbitrary depth.”

Minimally, nanopublications consist of an assertion, composed of concepts from ontologies or URIs; provenance, a link to the source of the research results, e.g., data; and pubinfo, with attribution, timestamp and the URI of the nanopublication itself (which is why pubinfo can only be created automatically). The three main components are bundles of triples [19]; pairing each triple with the name of its bundle turns it into a quad, so the bundles become named graphs.
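As a rough illustration of this structure, the following Python sketch models a nanopublication as three named graphs of subject-predicate-object triples. All URIs are invented placeholders, and real nanopublications use RDF serializations such as TriG rather than tuples; only the three-part anatomy is taken from the description above.

```python
# Minimal sketch of a nanopublication: assertion, provenance and
# publication info, each a named graph of triples. Every URI below
# is an invented placeholder, not a resolvable identifier.
nanopub_uri = "http://example.org/np/1"

nanopublication = {
    "assertion": [
        ("http://example.org/concept/gene-X",
         "http://example.org/rel/associated-with",
         "http://example.org/concept/disease-Y"),
    ],
    "provenance": [
        (nanopub_uri + "#assertion",
         "http://example.org/rel/derived-from",
         "http://example.org/dataset/42"),
    ],
    "pubinfo": [  # created automatically on publication
        (nanopub_uri, "http://example.org/rel/attributed-to",
         "http://example.org/person/a-researcher"),
        (nanopub_uri, "http://example.org/rel/created", "2016-01-01"),
    ],
}

# Pairing each triple with its graph name yields quads (named graphs).
quads = [(graph, *triple)
         for graph, triples in nanopublication.items()
         for triple in triples]
print(len(quads))  # 4
```

The graph name is what allows provenance and attribution statements to talk *about* the assertion rather than being mixed into it.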

The micropublication approach (Clark, et al., 2014) modifies and extends nanopublications to provide citable claims and supportive evidence (data, images, etc.). Micropublications should additionally:

  1. use abstracts for the (semi-)automatic construction of a claims network;
  2. enable the analysis of the claims network, e.g., by identifying contrasting claims;
  3. group similarities and translations (similog-holotype model);
  4. formalize all contributions for machine processability;
  5. include discussion;
  6. unveil discrepancies with other positions to hint to areas for further research; and,
  7. integrate classical articles by semantic annotation.

Many of the needs formulated earlier are addressed, but some discrepancies persist. One of my main intentions is to find a way of following up the development of discourses. In contrast to nanopublications, micropublications include claims in natural language. In my opinion, claims need to be formulated in a machine-processable form too — not only the connections between claim, attribution and evidence. Clark, et al. do not run into the ontology challenge, because they see no need for a granularity that would allow the discourse to be followed in detail, and in time with publishing.

Clark, et al. developed their model exclusively for biomedical scientists, although I cannot see why it should not be possible to micropublish in other disciplines. Kuhn, et al. (2013) present a similar extension of nanopublications, but explicitly address all disciplines. They integrated so-called AIDA sentences, written in natural English to represent scientific claims. The acronym stands for the requirements such sentences have to meet: they must be Atomic, Independent, Declarative, and Absolute. While “atomic” and “independent” are, in my opinion, self-explanatory, “declarative” means that the sentence is falsifiable, and “absolute” means that any uncertainty, which is recorded in the provenance section of the nanopublication, is ignored.
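The “absolute” requirement can at best be approximated by surface checks, since whether a sentence is truly atomic, independent, declarative and absolute cannot be decided mechanically. The sketch below stores an AIDA-style claim next to its provenance and applies a deliberately naive hedging-word test; the hedge list and the certainty field are my own illustrative inventions, not part of the AIDA model.

```python
# Illustrative representation of an AIDA-style sentence attached to a
# nanopublication-like record. The check below is a crude proxy only.
HEDGES = ("might", "may", "probably", "possibly")

def looks_absolute(sentence: str) -> bool:
    """Crude proxy: an 'absolute' sentence omits uncertainty markers,
    since uncertainty is recorded in the provenance part instead."""
    words = sentence.lower().split()
    return not any(hedge in words for hedge in HEDGES)

claim = {
    "aida": "Malaria is transmitted by mosquitoes.",  # example AIDA sentence
    "provenance": {"certainty": 0.95},  # uncertainty lives here, not in text
}

print(looks_absolute(claim["aida"]))  # True
print(looks_absolute("Malaria might be transmitted by mosquitoes."))  # False
```

Separating the absolute claim from its recorded uncertainty is what makes the same sentence reusable across publications that assign it different degrees of confidence.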

The AIDA model is not designed for automatic interpretation and does not represent a discourse that takes place in the form of academic publications; it assumes that sentences exist independently of authors. With the help of a user interface, researchers are invited to link them to sentences with the same meaning or to press an “agree” button. It remains unclear how this user-generated information is integrated into scholarly communication.

Another relevant project modeled the philosophy of Ludwig Wittgenstein with the help of Semantic Web technologies (Pichler and Zöllner-Weber, 2013): the upper classes are “source” (primary and secondary), “person” and “subject”. The primary sources were split into smaller units of thought, such as “issue” or “perspective”, down to the smallest, “Bemerkung”. This project proved that it is possible to formulate complex thoughts for machine processing, but, of course, in a different way than a medium of semantic publishing would: the Wittgenstein project is about annotating sources created by a single author.

The approach that comes closest to satisfying the need for a medium of scholarly communication described earlier is the PeriodO model (Golden and Shaw, 2016), again an application and extension of nanopublications. The PeriodO period gazetteer does not address the representation of broader interdisciplinary discourse, but “collects definitions of time periods made by archaeologists and other historical scholars [in the form of nanopublications ...]. The core of a period definition consists of text quoted from the original source indicating the name of the period, its temporal range, and the geographic region to which it applies” (Golden and Shaw, 2016), plus a label with a period term. The discourse is only partly represented, because it is not recorded whether and for what reason authors cited and criticised other period definitions, e.g., adapted period names but modified the time span. The gazetteer can therefore help to identify relevant literature, but is not intended to replace a literature review.
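A period definition of this kind can be pictured as a small record: a quoted definition from a source, a label, a temporal range and a region. The field names below are simplified for illustration and do not reproduce the actual PeriodO JSON-LD schema; the source URI is a placeholder.

```python
# Hedged sketch of a PeriodO-style period definition. Field names are
# simplified; negative years stand for BC dates.
period_definition = {
    "label": "Early Bronze Age",
    "quoted_definition": "ca. 3300-2000 BC in the southern Levant",
    "temporal_range": {"start": -3300, "stop": -2000},
    "region": "southern Levant",
    "source": "http://example.org/publication/123",  # placeholder URI
}

# Two authors may define the same period name with different spans;
# the gazetteer keeps both definitions side by side instead of
# merging them into one authoritative entry.
variant = dict(period_definition,
               temporal_range={"start": -3500, "stop": -2300},
               source="http://example.org/publication/456")

print(period_definition["label"] == variant["label"])            # True
print(period_definition["temporal_range"] == variant["temporal_range"])  # False
```

Keeping divergent definitions side by side is precisely what preserves the plurality of meaning that a single controlled vocabulary would flatten, even though, as noted above, the reasons for the divergence are not recorded.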



6. Semantics, meaning and ontology

What is semantics and why should publishing become semantic? In social systems theory, semantics is “bewahrenswerte Sinnvorgaben” [20], which translates to “specifications of meaning worth preserving”. The authority of semantics is created communicatively and contested in the actual context (which could be any social context). Scholarly semantics make it possible to discuss a research problem without needing to define every single word: they set the expectations of shared knowledge in a certain field. Publications provide semantics to researchers.

Like anyone, researchers are accustomed to semantics: mostly unnoticed as such, semantics become visible only when contested. Recent examples are “developing countries” or “globalization” — concepts for which an observer may not be exactly sure whether they are used to describe an actuality or a historical situation, or whether they convey a critical view. This experience of contingency simply has to be transferred to a new medium: “bringing humanity fully into the information loop requires data structures and computational techniques that enable us to treat social expectations and legal rules as first-class objects in the new Web architecture” (Hendler and Berners-Lee, 2010).

I argue that the concept of semantics as used in social systems theory is more useful in this context than translating it as “meaning”, as Shotton does.

Meaning does not simply exist. It has to be created by meaning-processing systems, i.e., psychic and social systems [21]. These systems operate with broad contingencies: it is always a risk to expect something particular from a person or from communication. Machines do not process meaning; they operate in causal structures, which are predictable provided that their functionality is known, environmental influences aside. Talking about the Semantic Web as an Internet enhanced with meaning cannot be appropriate if meaning cannot be transferred by machines.

Veltman (2004) argues that the limitation of the Semantic Web to the “semantic primitives” — existence, coreference, relation, conjunction and negation (Sowa, 2000) — was a mistake. On this basis, the Semantic Web merely reproduces Aristotle’s world view of substance and relation: “Everything is presented as if this is the way ‘it is’ ontologically, rather than providing frameworks whereby what a thing ‘is’, what it means, and how it relates to other things, change as the framework changes” (Veltman, 2004).

Veltman therefore suggests differentiating between meanings, names and the connections of both, which are also called definitions: concepts. “Needed is an approach to semantics that places it in a larger context of semiotics, lexicology, lexicography, semasiology and onomasiology. [...] We need databases to reflect that meaning changes both temporally (whence etymology) and spatially, even within a culture (e.g., national, regional and local differences) and especially between cultures.” The W3C Web Annotation Working Group bases its model on an analogous construction when it differentiates between the target of an annotation, e.g., a word in a text, and its body, which could point to an encyclopedia explaining this word. A Web annotation always comprises both target and body (Verspoor, et al., 2015).
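This target/body pairing can be written down directly. The sketch below follows the general shape of a W3C Web Annotation in JSON-LD (the @context URI and the Annotation type are taken from the specification), while the target and body URLs are placeholders of my own.

```python
import json

# A minimal Web Annotation: the target points at the annotated
# resource, the body at what is said about it. The two example.org
# URLs are placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "target": "http://example.org/article#term-globalization",
    "body": "http://example.org/encyclopedia/globalization",
}

serialized = json.dumps(annotation)
# An annotation always comprises both target and body.
print("target" in annotation and "body" in annotation)  # True
```

Because name (target) and meaning (body) are kept apart, the same word can be annotated with several, even conflicting, bodies over time, which is exactly the kind of plurality a fixed ontology cannot express.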

Knowledge bases on the semantic Web which reflect cultural, historical and spatial differences cannot be achieved with ontologies and Uniform Resource Identifiers (URIs) alone. Although the concepts of ontology in philosophy and computer science differ in application, they suffer from the same weakness. While ontology in philosophy aims to find the order of things, insofar as it can be decided whether they exist or not, ontology in computer science is a “formal, explicit specification of a shared conceptualization” (Gruber, 1993). Neither approach can deal with contingencies and negation. Although the scholarly system introduced this idea very early in its evolution, the basic concept of ontology persists: negative research results still do not have much value in the scholarly system and often remain unpublished (see Brembs, et al., 2013).

As Günther (1980) pointed out, ontological positions sometimes admit that a subject may observe an object erroneously and therefore include contingency. However, this contingency only refers to the object, never to the subject. What is left when the subject is gone? Only one thing is certain: nothing remains that can be observed. The object depends on the subject. This may be the reason why ontology takes the observing subject for granted, without any reflection. In research, in particular, it is highly important who observes something, because only then can it be discussed and thus placed at the disposal of communication. For something to become an actual standard, it must be reproduced very often. A new publishing medium that is simultaneously an information retrieval system needs to visualize such “facts” appropriately.

To sum up what has been said before, Table 1 collects some principles which persist in knowledge organization and should be replaced.


Table 1: New paradigms about knowledge and knowledge representation.




7. Conclusion: Requirements for the Scholarly Network

After these considerations, I can grasp more precisely how a semantic publication can be defined: an academic publication which makes use of semantic Web technologies to build a network through its concepts, assertions and references to other semantic publications, academic documents and other resources. Henceforth, this network will be referred to as Scholarly Network.

The Scholarly Network should be developed according to users’ needs and the possibilities of available media, and should not simply resemble its predecessor, the print scholarly publication. The users’ needs in focus here are to retrieve relevant information with less effort and to have one’s own research results retrieved by those to whom they are relevant.

I will sum up the features the Scholarly Network needs, as a new medium of scholarly communication, to address all of the problems mentioned earlier. It must:

  1. be open access;
  2. be interdisciplinary;
  3. be interlingual;
  4. credit the author for every variation or addition of concepts or other assertions;
  5. be persistent, keeping all versions of assertions;
  6. be maintained by non-profit institutional cooperation for reliability;
  7. be peer-reviewed according to soundness, not relevance;
  8. consist of machine-readable elements;
  9. differentiate between meaning, name and concept;
  10. be highly connective while preventing redundancies;
  11. emphasize highly reproduced elements (“facts”); and,
  12. include negative results.

Now, all prerequisites have been collected to formulate the requirements for a Scholarly Network. To fit all disciplinary needs and make maintenance affordable, the system architecture should be as simple as possible. A contribution to the Network may be made by anyone with reasonable Internet skills (which researchers usually have). This satisfies the concept of the semantic Web as its inventors suggested:

“rather than focusing on the challenges of creating large and expressive ontologies by specialized knowledge experts, the large scale social mechanisms we envision require that we must instead figure out how we can maximally break down the task of turning messy human knowledge into a shared information space that is useful to everyone. The smaller we can make the individual steps of this transformation, the easier it will be to find humans who can be incentivized to perform those steps” (Hendler and Berners-Lee, 2010).

How a Scholarly Network can be created from the contributions of authors can be described briefly: in a first step, complexity should be reduced by distilling the crucial assertions and definitions from research results; this is done by the author. In a second step, assertions and definitions can be split into further items such as terms and meanings. These should be dynamically bundled with the items the author refers to and with those items referring to them. There would be no need to repeat any item, only to link to it or to create a new version of it. The following points describe the Scholarly Network in more detail (see Figure 1):

  1. The Scholarly Network consists of the following elements: terms, meanings, concepts, resources and assertions. Concepts (or: definitions) are built from terms and meanings. Assertions are built from concepts and can be enriched by resources.
  2. Meanings are themselves expressed with the help of concepts (then they are assertions as well) and/or through links to external resources, e.g., to an author identification system or a repository. The meaning of abstract objects cannot be expressed through links to external resources (at least for now).
  3. A meaning can be connected to an unlimited number of terms and vice versa. Most of these multiple names may be translations into different languages, for example, a French term can have a meaning formulated in English. Language is an attribute that can be used for filtering.
  4. Connectors, needed for the expression of assertions, are concepts as well. Being mostly less ambiguous than concrete or abstract objects, they are less likely to create misunderstanding; their definitions are used as if they were commonsensical. Although connectors are comparatively stable, the Scholarly Network allows the redefinition of connectors that are provided through a vocabulary. Using a pre-defined vocabulary of connectors will enable researchers to compare similar assertions easily. Connectors should be capable of creating hierarchies, which are themselves a contestable type of assertion.
  5. All elements can evolve and therefore must be versioned according to standards. The usage of a certain version of a certain element at a specific moment has to be reconstructable. Therefore, each element (except external resources) is provided with a URI which is created dynamically from a base Uniform Resource Locator (URL), together with an identifier and a time stamp.
  6. The persistence of external resources has to be ensured as well. When an external resource is used in the Scholarly Network, it has to be checked for persistence on a regular basis. If a URL becomes unreachable, error notifications are sent to the operators of the Scholarly Network.
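Point 5 above, the dynamic minting of versioned URIs, can be sketched minimally as follows; the base URL, the URI layout and all names are illustrative assumptions, not part of the proposal’s specification:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

BASE_URL = "https://scholarly-network.example.org"  # illustrative base URL

@dataclass(frozen=True)
class Element:
    """Any Scholarly Network element: term, meaning, concept or assertion.
    Each version gets its own URI built from identifier and time stamp."""
    kind: str        # "term", "meaning", "concept" or "assertion"
    identifier: str  # stable across all versions of the element
    created: datetime

    @property
    def uri(self) -> str:
        # URI created dynamically from base URL, identifier and time stamp
        stamp = self.created.strftime("%Y%m%dT%H%M%SZ")
        return f"{BASE_URL}/{self.kind}/{self.identifier}/{stamp}"

# Two versions of the same concept share an identifier but not a URI:
v1 = Element("concept", "open-access", datetime(2016, 4, 3, tzinfo=timezone.utc))
v2 = Element("concept", "open-access", datetime(2016, 5, 2, tzinfo=timezone.utc))
```

Keeping the identifier stable while the time stamp varies is what makes the usage of a certain version at a specific moment reconstructable.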


Figure 1: Model of the most important aspects of the Scholarly Network.
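The persistence check for external resources (point 6 above) could run as a simple scheduled job. The sketch below keeps the logic testable by taking the fetch function as a parameter; in production it might wrap `urllib.request.urlopen`, and the notification step is only indicated in a comment:

```python
from typing import Callable, Iterable

def check_resources(urls: Iterable[str],
                    fetch: Callable[[str], None]) -> list[str]:
    """Return the URLs that could not be reached; fetch() must raise on failure."""
    broken = []
    for url in urls:
        try:
            fetch(url)
        except OSError:
            broken.append(url)  # would trigger an error notification to operators
    return broken

# Illustrative check with a fake fetcher that only knows one resource:
def fake_fetch(url: str) -> None:
    if url != "https://example.org/alive":
        raise OSError(f"unreachable: {url}")

broken = check_resources(
    ["https://example.org/alive", "https://example.org/gone"], fake_fetch
)
```

Injecting the fetcher also makes it easy to swap in a cache-aware or rate-limited implementation later without touching the checking logic.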


To create an assertion, an existing concept can be “copied” for use in a different context, or a concept can be created from scratch. After the formulation of the assertion, the copy of the concept will be connected to all other assertions which make use of the concept, but the relation to the “original” will always be the strongest, from the point of view of that single assertion. Viewed without any special focus, e.g., when the term of the concept has been queried, the concept copied most often will stand out. Depending on the point of view, different relations are emphasized. The network has to be imagined as three-dimensional.
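The copy mechanism can be sketched as below; the counting rule, ranking concepts by how often they were copied, is my illustrative reading of “the concept copied most often will stand out”, and all names are invented:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Concept:
    term: str
    original: Optional["Concept"] = None  # the strongest relation points here

    def copy(self) -> "Concept":
        """Reuse this concept in a new assertion's context."""
        return Concept(term=self.term, original=self)

# Hypothetical usage: one concept gets copied twice, another once.
semantics = Concept("semantics")
ontology = Concept("ontology")
copies = [semantics.copy(), semantics.copy(), ontology.copy()]

# Without any special focus, the most-copied concept stands out:
counts = Counter(c.original.term for c in copies)
top_term, _ = counts.most_common(1)[0]
```

Because every copy keeps a pointer to its original, the “strongest relation” is preserved per assertion, while the global view can still be computed by aggregation.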

This proposal does not exclude the possibility of reusing existing ontologies. As mentioned previously, although connectors might be comparatively stable, there has to be a possibility to declare them as concepts if needed. One source that could be used is the Citation Typing Ontology [22], which includes more than 40 properties that help characterize the type of a reference.

ScholOnto (see Groza, et al., 2009) would also be a good source for connectors. Its properties help describe causalities, similarities and hierarchical relations. The ontology’s specialty is polarization, which facilitates expressing the more or less positive or negative implications of relations.
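A connector vocabulary with polarization, loosely inspired by ScholOnto’s polarized relations, might look like the following sketch; the connector names and polarity values are invented for illustration and do not reproduce ScholOnto’s actual properties:

```python
from dataclasses import dataclass
from enum import Enum

class Connector(Enum):
    """Pre-defined connectors; the float encodes polarity (illustrative values)."""
    SUPPORTS = 1.0
    REFINES = 0.5
    IS_UNRELATED_TO = 0.0
    QUESTIONS = -0.5
    CONTRADICTS = -1.0

@dataclass(frozen=True)
class Assertion:
    subject: str          # concept used as subject
    connector: Connector  # taken from the shared vocabulary
    obj: str              # concept used as object

    @property
    def positive(self) -> bool:
        return self.connector.value > 0

a = Assertion("open peer review", Connector.SUPPORTS, "soundness-based review")
```

Drawing connectors from one enumerated vocabulary is what would let researchers compare similar assertions easily, while the polarity value carries the more-or-less positive implication of each relation.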

Why not start with an existing knowledge base like DBpedia [23], the semantic Web twin of Wikipedia, Wikidata [24], or even the enormous knowledge base Cyc [25], which allows automatic reasoning to a certain extent? Besides the fact that Cyc is a commercial project and not fully openly accessible, none of the three knowledge bases supports either the attribution of concepts and assertions or accurate versioning. Furthermore, DBpedia is a data dump, and Wikidata uses its own data model: its Linked Data is only available as an RDF export (see Erxleben, et al., 2014). The aims of all three knowledge bases diverge from being a medium of scholarly publishing. They collect common knowledge that is rarely relevant in an academic context, and where it is, meanings would be put differently in most cases. Although their URIs might be persistently accessible, it is not guaranteed that the content a URI refers to today will have much to do with the content it refers to in the near future. Using DBpedia’s URIs as resources should at least be possible in a Scholarly Network.

The formulation of examples that demonstrate how assertions for the Scholarly Network can be built could be subject to experiments in the context of future research.



8. Discussion: Ad hoc scholarly network instead of ex ante ontologies

This proposal for a Scholarly Network, to make this very clear, does not attack current notions of the academic article. Especially in the humanities, meaningful nuances of argumentation can be expressed only in lengthy natural-language text, in narration. Although I believe that in many cases, especially in the sciences, technology and medicine, a set of contributions to the Scholarly Network should be sufficient to disseminate research results, the academic system, especially its reputation mechanisms, is not quite ready for this step. However, I claim it is possible and valuable to distill the essence of a text or a research result. Abstracting does that already; a semantic publication adds important features.

One of the consequences of the Scholarly Network is that metadata in classical scholarly publishing, like author names, becomes data without the need for metadata. Of course, these data can still be understood as metadata of resources which are linked to the Network, but this view misses the central elements of the network, the assertions. Therefore, assertions should be established as the central viewpoint, as “text”, and all other elements as paratext.

Because of their inflexibility, ontologies are of limited use for a Scholarly Network. If publishing is accompanied by knowledge organization, assertions are enriched with transformed versions which are created automatically when authors add different meanings of concepts. In this way, scholarly communication could be enriched with the help of automation. Whether these enrichments provide insight, and whether they will eventually be included in the discourse, will be decided by scholars.

Inevitably, redundancies or unconnected synonyms will be created in the Scholarly Network and remain unnoticed. Such inconsistencies are normal in any communication. Of course, anyone editing or reviewing the Network will check for inconsistencies and correct them in collaboration with authors. Inconsistencies irritate and therefore sometimes also point to necessary further research and lead to new ideas. Ultimately, the network is created and used by humans. It is not software but a data collection; applications are built only on top of the data structure.

A critical mass of content is needed for the broad adoption of a new medium. Compared to research literature repositories, the network approach has the advantage of letting users add not only their own statements, but also those they refer to. In this way, the Network could scale up in a short time, even with a small group of early adopters.

The complexity of scholarly publishing will not decrease until not only every active researcher uses the network, but the history of research is also included in a sufficient way. Here it will be sufficient to include all historical concepts, assertions and authors still in use; the good news is that every researcher will be encouraged to include historical elements. The problem is not only the complexity created in the transition period, but the media discontinuity, or “Medienbruch”, which forces researchers to search for literature in physical as well as digital libraries. Library catalogues could easily be connected to the Scholarly Network if they provide their data as linked data [26]. Unfortunately, even the first media discontinuity, from print to digital, has not yet been overcome.

The additional effort needed to build the Scholarly Network will also benefit the writing process of the narrative article or book: creating the contribution to the Network will improve the clarity of the argument or even the analysis of the research results. The author is forced to identify the elements of the argumentation and to arrange them in a thoughtful fashion. This would indeed be a well-invested effort! End of article


About the author

After her studies in library and information science, sociology and art history, Nora Schmidt is currently a Ph.D. student in information studies at the University of Lund, continuing her work on scholarly communication. Between 2011 and 2015, she gained professional experience at the Vienna University Library’s Open Access Office.
E-mail: nora [dot] schmidt [at] kultur [dot] lu [dot] se



This article was funded by a VFI (Verein zur Förderung der Informationswissenschaft) recognition grant. The author wishes to express her gratitude to the supervisors of her Master’s thesis, Peter Schirmbacher and Martin Gasteiner, for their advice on the project described in the paper. Otto Oberhauser reviewed a previous version of this text. Thanks go to First Monday’s anonymous reviewers as well.



1. I refer to the concept of complexity formulated by Niklas Luhmann; see Luhmann, 1995, p. 23, and references there.

2. See e.g., Stock and Stock, 2008, p. xi.

3. I presume the reader is aware of the basic concept of the semantic Web as introduced by Tim Berners-Lee and James Hendler (2001). For a graphic model see World Wide Web Consortium, “Semantic Web levels,” at, accessed 3 April 2016. For an introduction I also recommend Shadbolt, et al., 2006.

4. This paper is based on my Master’s thesis (Schmidt, 2014). Here, I have tightened, updated and extended my arguments.

5. World Intellectual Property Organization, “Berne Convention for the Protection of Literary and Artistic Works” Article 3.3, at, accessed 3 April 2016.

6. See Latour, 1987, p. 33.

7. Luhmann, 1981, as well as 1995, chapter 4, p. vii.

8. Stichweh, 1994, p. 164.

9. The need of self-referencing may also be the reason why research data that do not include references will, in my opinion, not be accepted as full academic publication in the near future or carry the same reputational weight. Some argue for this acceptance to encourage researchers to make their data accessible, e.g., Kueffer, et al. (2011) or Krumholz (2012). Only if data is cited, will it truly “count”. It is more likely that research data and methodological details will become an expected or even obligatory component of publication. Otherwise, research results cannot be verified.

10. Genette, 1997, p. 1.

11. Genette, 1997, p. 407.

12. See Bachmann-Medick (2009) as well as Luhmann (1990), pp. 216 and 576.

13. This statement derives from the etymology of the word. See e.g., Buckland, 1998.

14. This definition in its simplicity and inclusiveness is useful in the present and in similar contexts, like for OAI-ORE; see e.g., Haslhofer (2012). For a much more elaborated concept see Pédauque (2003).

15. For Luhmann’s adoption of form theory and the references to Gotthard Günther and Heinz von Foerster, see Luhmann (1993). Interestingly, this distinction corresponds analogously to the main distinction of the Functional Requirements for Bibliographic Records (FRBR), omitting the exemplar level here (International Federation of Library Associations and Institutions (IFLA), 1998):

Manifestation / Expression / Work

16. See the Registry of Open Access Repositories Mandatory Archiving Policies (ROARMAP), at, accessed 3 April 2016. For a bibliometric analysis on access to openly available research literature see Davis (2011).

17. Luhmann, 1990, pp. 216 and 576.

18. Connecting both is also conceivable; see Clare, et al., 2011.

19. It is suggested to use the TriG extension of Turtle, at, accessed 3 April 2016.

20. Luhmann, 1997, p. 887.

21. See Luhmann, 1995, chapter 2.

22. CiTO, at, and LODStats, at, both accessed 3 April 2016. According to LODStats, CiTO is almost unused, although it was published in 2010.

23. See, accessed 3 April 2016.

24. See, accessed 3 April 2016.

25. See, accessed 3 April 2016.

26. For an exemplary contribution to the debate about linked library data, see Byrne and Goddard, 2010.



Doris Bachmann-Medick, 2009. Cultural turns: Neuorientierungen in den Kulturwissenschaften. Reinbek bei Hamburg: Rowohlt.

Tim Berners-Lee and James Hendler, 2001. “Publishing on the semantic Web,” Nature, volume 410, number 6832 (26 April), pp. 1,023–1,024.
doi:, accessed 3 April 2016.

Philip Bourne, Tim Clark, Robert Dale, Anita de Waard, Ivan Herman, Eduard Hovy and David Shotton (editors), 2011. “FORCE11 Manifesto: Improving the future of research communications and e-scholarship,” (28 October), at, accessed 3 April 2016.

Björn Brembs, Katherine Button and Marcus Munafò, 2013. “Deep impact: Unintended consequences of journal rank,” Frontiers in Human Neuroscience (24 June).
doi:, accessed 3 April 2016.

Michael Buckland, 1998. “What is a ‘digital document’?” at, accessed 3 April 2016.

Gillian Byrne and Lisa Goddard, 2010. “The strongest link: Libraries and linked data,” D-Lib Magazine, volume 16, numbers 11–12.
doi:, accessed 3 April 2016.

Amanda Clare, Samuel Croset, Christoph Grabmueller, Senay Kafkas, Maria Liakata, Anika Oellrich and Dietrich Rebholz-Schuhmann, 2011. “Exploring the generation and integration of publishable scientific facts using the concept of nano-publications,” In: Alexander García Castro, Christoph Lange, Evan Sandhaus and Anita de Waard (editors). Semantic Publishing 2011: Proceedings of the First Workshop on Semantic Publishing 2011, pp. 13–17, and at, accessed 3 April 2016.

Tim Clark, Paolo Ciccarese and Carole Goble, 2014. “Micropublications: A semantic model for claims, evidence, arguments and annotations in biomedical communications,” Journal of Biomedical Semantics (4 July).
doi:, accessed 3 April 2016.

Concept Web Alliance, 2013. “Nanopublication guidelines” (15 December), at, accessed 3 April 2016.

Philip Davis, 2011. “Open access, readership, citations: A randomized controlled trial of scientific journal publishing,” FASEB Journal, volume 25, number 7, pp. 2,129–2,134.
doi:, accessed 3 April 2016.

Anita de Waard, Simon Buckingham Shum, Annamaria Carusi, Jack Park, Matthias Samwald and Ágnes Sándor, 2009. “Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims,” In: Tim Clark, Joanne Luciano, M. Scott Marshall, Eric Prud’hommeaux and Susie Stephens (editors). Semantic Web Applications in Scientific Discourse 2009: Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse, at, accessed 3 April 2016.

Fredo Erxleben, Michael Günther, Markus Krötzsch, Julian Mendez and Denny Vrandečić, 2014. “Introducing wikidata to the linked data Web,” In: Peter Mika, Tania Tudorache, Abraham Bernstein, Chris Welty, Craig Knoblock, Denny Vrandečić, Paul Groth, Natasha Noy, Krzysztof Janowicz and Carole Goble (editors). The Semantic Web — ISWC 2014. Lecture Notes in Computer Science, volume 8796. Berlin: Springer, pp. 50–65.
doi:, accessed 3 April 2016.

Gérard Genette, 1997. Paratexts: Thresholds of interpretation. Translated by Jane Lewin. Cambridge: Cambridge University Press.

Patrick Golden and Ryan Shaw, 2016. “Nanopublication beyond the sciences: The PeriodO period gazetteer,” PeerJ Computer Science, volume 2, p. e44.
doi:, accessed 3 April 2016.

Stefan Gradmann and Jan Christoph Meister, 2008. “Digital document and interpretation: Re-thinking ‘text’ and scholarship in electronic settings,” Poiesis & Praxis, number 5, pp. 139–153.
doi:, accessed 3 April 2016.

Tudor Groza, Siegfried Handschuh, Tim Clark, Simon Buckingham Shum and Anita de Waard, 2009. “A short survey of discourse representation models,” In: Tim Clark, Joanne Luciano, M. Scott Marshall, Eric Prud’hommeaux and Susie Stephens (editors). Semantic Web Applications in Scientific Discourse 2009: Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse, at, accessed 3 April 2016.

Thomas Gruber, 1993. “A translation approach to portable ontologies,” Knowledge Acquisition, volume 5, number 2, pp. 199–220.
doi:, accessed 3 April 2016.

Gotthard Günther, 1980. “Das problem einer transklassischen Logik,” In: Gotthard Günther. Beiträge zu einer operationsfähigen Dialektik, Band 3. Hamburg: Felix Meiner Verlag, pp. 73–94.

Bernhard Haslhofer, 2012. “The SciLink Project: From document-centric to resource-oriented publications,” iConference’12, at, accessed 3 April 2016.

James Hendler and Tim Berners-Lee, 2010. “From the semantic Web to social machines: A research challenge for AI on the World Wide Web,” Artificial Intelligence, volume 174, number 2, pp. 156–161.
doi:, accessed 3 April 2016.

International Federation of Library Associations and Institutions (IFLA) Study Group on the Functional Requirements for Bibliographic Records, 1998. “3. Entities,” In: IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records: Final report. The Hague: IFLA, at, accessed 3 April 2016.

Harlan Krumholz, 2012. “Open science and data sharing in clinical research: Basing informed decisions on the totality of the evidence,” Circulation: Cardiovascular Quality and Outcomes, volume 5, number 2, pp. 141–142.
doi:, accessed 3 April 2016.

Christoph Kueffer, Ülo Niinemets, Rebecca Drenovsky, Jens Kattge, Per Milberg, Hendrik Poorter, Peter Reich, Christiane Werner, Mark Westoby and Ian Wright, 2011. “Fame, glory and neglect in meta-analyses,” Trends in Ecology & Evolution, volume 26, number 10, pp. 493–494.
doi:, accessed 3 April 2016.

Tobias Kuhn, Paolo Emilio Barbano, Mate Levente Nagy and Michael Krauthammer, 2013. “Broadening the scope of nanopublications,” In: Philipp Cimiano, Oscar Corcho, Valentina Presutti, Laura Hollink and Sebastian Rudolph (editors). The semantic Web: Semantics and big data. Lecture Notes in Computer Science, volume 7882. Berlin: Springer, pp. 487–501.
doi:, accessed 3 April 2016.

Bruno Latour, 1987. Science in action: How to follow scientists and engineers through society. Milton Keynes: Open University Press.

Niklas Luhmann, 1997. Die Gesellschaft der Gesellschaft. Frankfurt am Main: Suhrkamp.

Niklas Luhmann, 1995. Social systems. Translated by John Bednarz, Jr., with Dirk Baecker. Stanford, Calif.: Stanford University Press.

Niklas Luhmann, 1993. “‘Was ist der Fall?’ und ‘Was steckt dahinter?’ Die zwei Soziologien und die Gesellschaftstheorie,” Zeitschrift für Soziologie, volume 22, number 4, pp. 245–260, and at, accessed 3 April 2016.

Niklas Luhmann, 1990. Die Wissenschaft der Gesellschaft. Frankfurt am Main: Suhrkamp.

Niklas Luhmann, 1981. “Die Unwahrscheinlichkeit der Kommunikation,” In: Niklas Luhmann. Soziologische Aufklärung 3. Soziales System, Gesellschaft, Organisation. Wiesbaden: VS Verlag für Sozialwissenschaften, pp. 25–34.
doi:, accessed 3 April 2016.

Barend Mons and Jan Velterop, 2009. “Nano-publication in the e-science era,” In: Tim Clark, Joanne Luciano, M. Scott Marshall, Eric Prud’hommeaux and Susie Stephens (editors). Semantic Web Applications in Scientific Discourse 2009: Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse, at, accessed 3 April 2016.

Roger Pédauque, 2003. “Document: Form, sign, and medium, as reformulated for electronic documents,” at, accessed 3 April 2016.

Alois Pichler and Amélie Zöllner-Weber, 2013. “Sharing and debating Wittgenstein by using an ontology,” Literary & Linguistic Computing, volume 28, number 4, pp. 700–707.
doi:, accessed 3 April 2016.

Hélène de Ribaupierre and Gilles Falquet, 2013. “A user-centric model to semantically annotate and retrieve scientific documents,” ESAIR ’13: Proceedings of the Sixth International Workshop on Exploiting Semantic Annotations in Information Retrieval, pp. 21–24.
doi:, accessed 3 April 2016.

Nora Schmidt, 2014. “Semantisches Publizieren im interdisziplinären Wissenschaftsnetzwerk. Theoretische Grundlagen und Anforderungen,” Konrad Umlauf (editor). Berlin: Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin, at, accessed 3 April 2016.

Nigel Shadbolt, Tim Berners-Lee and Wendy Hall, 2006. “The semantic Web revisited,” IEEE Intelligent Systems, volume 21, number 3, pp. 96–101.
doi:, accessed 3 April 2016.

David Shotton, 2012. “The five stars of online journal articles — A framework for article evaluation,” D-Lib Magazine, volume 18, numbers 1–2.
doi:, accessed 3 April 2016.

David Shotton, 2009. “Semantic publishing: The coming revolution in scientific journal publishing,” Learned Publishing, volume 22, number 2, pp. 85–94.
doi:, accessed 3 April 2016.

David Shotton, Katie Portwin, Graham Klyne and Alistair Miles, 2009. “Adventures in semantic publishing: Exemplar semantic enhancements of a research article,” PLOS Computational Biology, volume 5, number 4 (17 April), e1000361.
doi:, accessed 3 April 2016.

Simon Buckingham Shum, Tim Clark, Anita de Waard, Tudor Groza, Siegfried Handschuh and Agnes Sandor, 2010. “Scientific discourse on the semantic Web: A survey of models and enabling technologies,” Semantic Web Journal, at, accessed 3 April 2016.

John Sowa, 2000. Knowledge representation: Logical, philosophical, and computational foundations. Pacific Grove, Calif.: Brooks/Cole.

George Spencer-Brown, 2008. Laws of form. Leipzig: Bohmeier.

Rudolf Stichweh, 1994. Wissenschaft, Universität, und Professionen: Soziologische Analysen. Frankfurt am Main: Suhrkamp.

Wolfgang Stock and Mechtild Stock, 2008. Wissensrepräsentation: Informationen auswerten und bereitstellen. München: Oldenbourg.

Kim Veltman, 2004. “Towards a semantic Web for culture,” Journal of Digital Information, volume 4, number 4, at, accessed 3 April 2016.

Karin Verspoor, Jin-Dong Kim and Michel Dumontier, 2015. “Interoperability of text corpus annotations with the semantic Web,” BMC Proceedings, volume 9, supplement 5, p. A2.
doi:, accessed 3 April 2016.

Wilson Wong, Wei Liu and Mohammed Bennamoun, 2012. “Ontology learning from text: A look back and into the future,” ACM Computing Surveys, volume 44, number 4, article number 20.
doi:, accessed 3 April 2016.


Editorial history

Received 7 July 2015; revised 2 April 2016; accepted 24 April 2016.

Creative Commons Licence
This paper is licensed under a Creative Commons Attribution 4.0 International License.

Tackling complexity in an interdisciplinary scholarly network: Requirements for semantic publishing
by Nora Schmidt.
First Monday, Volume 21, Number 5 - 2 May 2016