Query processing in RDF stores builds on research and development in the field of RDBMS. All non-native RDF engines directly benefit from RDBMS query processing facilities. Due to the absence of standards for NoSQL systems, non-native RDF systems using one of these approaches face an up-front entry effort. The lack of stability and maturity of some NoSQL stores may slow the adoption of these technologies for RDF stores.

With the emergence of very large RDF data sets (several billions of triples), distribution over a cluster of commodity hardware will become more and more frequent. Given a partitioning method, this calls for novel query processing methods that perform some of the associated tasks in parallel without requiring too much data exchange over the network.

This is mainly motivated by the need to maintain indexes when data is updated. In the case of multiple indexes, this can have a dramatic impact on the overall store performance. In Chapter 8 we will see that the popular materialization of inferences has an important impact on update operations. Intuitively, inferred data may need to be removed when data is deleted from the database, and many new triples may be generated from the insertion of a single triple.

It seems that in the presence of ontologies, many optimizations can be considered for query processing. Research should be conducted by a collaboration of teams originating from the database and Artificial Intelligence/Semantic Web domains.

AnHai Doan, Zachary Ives, in Principles of Data Integration, 2012
Cost reestimation

The corrective query processing (CQP) approach is based on performing frequent reestimations of query execution cost. The plan status is generally polled every few seconds, as new aggregate performance trends become evident. During the polling step, the system must determine (1) how the remainder of query processing will proceed (especially if the data sources' cardinalities are unknown) and (2) the costs and selectivities of alternative query plans.

Given cardinality and selectivity results from the currently executing plan, the query processor must account for the fact that the current plan explores only a very small piece of the (exponential) search space and yields only a small amount of real information. The CQP work exploits the fact that equivalent subexpressions (regardless of query plan) will always have the same cardinality and selectivity. Even with this observation, the search space remains large, and the CQP cost estimator must use a variety of heuristics to estimate the selectivities of subexpressions that it has not yet attempted to execute.

Typically, if the reestimated cost is within a small threshold of the original cost, then no reoptimization needs to be performed, i.e., the space of potential query plans does not need to be searched. However, if the cost exceeds the threshold, then reoptimization is indeed triggered.

Atanas Hristov PhD, in Knowledge Discovery in Big Data from Astronomy and Earth Observation, 2020
8.2 Query Processing Steps
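
The threshold rule from the cost reestimation excerpt above (reoptimize only when the reestimated cost drifts too far from the original estimate) can be sketched in a few lines of Python. This is an illustration only: the function names, the 10% default, and the use of a relative rather than absolute deviation are assumptions made for the example, not details given in the CQP work.

```python
from typing import Iterable, Optional


def needs_reoptimization(original_cost: float,
                         reestimated_cost: float,
                         threshold: float = 0.10) -> bool:
    """Return True when the reestimated cost leaves the tolerated band.

    Assumption: "within a small threshold" is read as a relative
    deviation from the original cost estimate.
    """
    if original_cost <= 0:
        raise ValueError("original_cost must be positive")
    relative_change = abs(reestimated_cost - original_cost) / original_cost
    return relative_change > threshold


def first_trigger(original_cost: float,
                  polled_costs: Iterable[float],
                  threshold: float = 0.10) -> Optional[int]:
    """Scan a sequence of periodic reestimates (one per polling step).

    Returns the index of the first poll that would trigger
    reoptimization, or None if every reestimate stays within the band.
    """
    for poll_index, cost in enumerate(polled_costs):
        if needs_reoptimization(original_cost, cost, threshold):
            return poll_index
    return None
```

In a real system each polled cost would come from the cost estimator, and a trigger would start a new search over candidate plans. Here the reestimates are simply passed in as numbers so that the decision rule can be read in isolation.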