ABSTAT is a framework that provides a better understanding of big and complex linked data sets.
A user who wants to know which data set best fits their needs should be able to answer questions such as: What types of resources are described in each data set? What properties are used to describe the resources? What types of resources are linked, and by means of what properties? How many resources have a certain type, and how frequently is a given property used?
Given the size and number of available linked data sets, and given that they rely on ontologies (which may themselves be large) to describe the semantics of their data, answering these questions by inspecting the ontologies alone or by running exploratory queries can be impractical. Exploring a data set with ABSTAT is therefore more efficient.
ABSTAT produces a summary of a linked data set that is correct and complete with respect to the assertions in the data set, and whose size scales well with the size of both the ontology and the data set.
The key feature of a summary is its use of minimal type patterns to represent an abstraction of the data set. A minimal type pattern is a triple (C, P, D) that represents the occurrence of assertions <a,P,b> in the RDF data, such that C is a minimal type of the subject a and D is a minimal type of the object b according to a terminology graph, which is introduced to represent the data ontology. Because patterns are based on minimal types, several redundant patterns can be excluded from the summary. As a consequence, summaries based on our model are rich enough to represent the whole data set adequately, yet small enough to avoid redundant information.
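The definition above can be illustrated with a minimal sketch. It assumes that a minimal type of an instance is an asserted type with no strict subtype among the instance's other asserted types, that the terminology graph is an acyclic map from each type to its direct supertypes, and that untyped instances fall back to a generic "Thing" type. The data, the type names, and the function names are all hypothetical and are not taken from the ABSTAT code base.

```python
# Hypothetical terminology graph: each type maps to its direct supertypes.
SUBTYPE_OF = {
    "SoccerPlayer": {"Athlete"},
    "Athlete": {"Person"},
    "Person": {"Thing"},
    "City": {"Place"},
    "Place": {"Thing"},
}


def ancestors(t, graph):
    """Return every strict supertype of t reachable in the terminology graph."""
    seen = set()
    stack = list(graph.get(t, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(graph.get(s, ()))
    return seen


def minimal_types(types, graph):
    """Drop every asserted type that is a supertype of another asserted type."""
    redundant = set()
    for t in types:
        redundant |= ancestors(t, graph)
    return set(types) - redundant


def minimal_type_patterns(type_assertions, relational_assertions, graph):
    """Map each assertion <a, P, b> to the patterns (C, P, D) built from minimal types."""
    patterns = set()
    for a, p, b in relational_assertions:
        # Assumption of this sketch: untyped instances are treated as "Thing".
        subject_types = minimal_types(type_assertions.get(a, {"Thing"}), graph)
        object_types = minimal_types(type_assertions.get(b, {"Thing"}), graph)
        for c in subject_types:
            for d in object_types:
                patterns.add((c, p, d))
    return patterns


# Hypothetical data: asserted types per instance and one relational assertion.
TYPES = {
    "messi": {"Person", "Athlete", "SoccerPlayer"},
    "rosario": {"Place", "City"},
}
TRIPLES = [("messi", "birthPlace", "rosario")]

patterns = minimal_type_patterns(TYPES, TRIPLES, SUBTYPE_OF)
print(patterns)
```

In this toy example the single assertion yields only the pattern (SoccerPlayer, birthPlace, City). Patterns such as (Person, birthPlace, Place) are left out because they can be inferred from the terminology graph, which is the sense in which minimal types remove redundancy.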
Finally, a summary also includes statistics about the occurrence of minimal type patterns and types in the data.
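As a rough illustration of the statistics a summary might carry, the sketch below counts how often each pattern, type, and property occurs. The inputs are hypothetical: a list with one (C, P, D) tuple per relational assertion, and a map from each instance to its minimal types. How ABSTAT itself computes and stores these counts is not described here, so this should be read as one plausible approach under those assumptions.

```python
from collections import Counter

# Hypothetical input: one (C, P, D) tuple per relational assertion in the data,
# already expressed in terms of minimal types.
observed_patterns = [
    ("SoccerPlayer", "birthPlace", "City"),
    ("SoccerPlayer", "birthPlace", "City"),
    ("Politician", "birthPlace", "City"),
]

# Hypothetical input: the minimal types of each typed instance.
instance_min_types = {
    "messi": {"SoccerPlayer"},
    "ronaldo": {"SoccerPlayer"},
    "merkel": {"Politician"},
    "rosario": {"City"},
    "hamburg": {"City"},
}

# Number of assertions represented by each minimal type pattern.
pattern_frequency = Counter(observed_patterns)

# Number of instances that have each type as a minimal type.
type_frequency = Counter(
    t for types in instance_min_types.values() for t in types
)

# Number of assertions that use each property, regardless of the types involved.
property_frequency = Counter(p for _, p, _ in observed_patterns)

for pattern, count in pattern_frequency.most_common():
    print(pattern, count)
```

Counts of this kind address the questions about how many resources have a certain type and how frequently a given property is used.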