SPATIAL DATA QUALITY AND FIT FOR PURPOSE
Introduction
Every member state of the United Nations Framework Convention on Climate Change typically maintains a national spatial data system whose primary role is to observe land use and the development of forest cover in selected areas. Maps drafted at different times can display variations attributable both to genuine changes in land occupation and to improper classification. Understanding these aspects therefore requires identifying the differences and subsequently eliminating the spurious ones.
Spatial data quality is a central concern because it underpins most decision-making processes and analyses undertaken for legitimate purposes. There is therefore a significant need to identify the factors that bear on the quality of spatial data. Because these records are used for decision-making, users must come to terms with the quality that the spatial data actually offers (Wu & Buttenfield, 1994).
Reasons for Defects in Spatial Data Quality
The advent of Geographical Information Systems (GIS) contributed greatly to the increase in spatial data available from satellites. According to Larrivée et al. (2011), users from disciplines unrelated to GIS have increased tremendously, driving its growth and advancement. Spatial data is now readily used for many purposes, even in daily life, to meet the ever-changing needs and requirements of different groups of people. Several reasons lie behind defects in spatial data quality: many people are unaware of data quality standards; data are produced at high rates and, being compatible with many sorts of applications, can be used freely at every stage; GIS software typically does not display spatial data quality; and an enormous gap often exists between the producers and the final users of GIS data.
The Spatial Dataset
One of the most critical areas of spatial data quality research is the description of error and uncertainty in spatial data. To further our understanding of these concepts, we first need to familiarize ourselves with how spatial data are derived and simplified into data that can be easily understood. The process runs as follows: conceptualization specifies what should be considered while studying the selected objects, while measurement specifies the measuring methods and, finally, the requirements for capturing the final data.
Three categories generally explain the uncertainties that arise when deriving data from a definite situation: error, vagueness, and ambiguity. Error is the difference between the measured value of a particular property and its true value. Under normal circumstances, an error can be measured successfully if a clear definition of the property is available.
Vagueness is an aspect that arises from inadequate definitions. Its significant causes in a spatial dataset include poorly documented definitions and objects of concern that are inherently borderline cases, the classic example being how much hair a man may keep and still be called bald. Ambiguity is a general sort of misunderstanding that results from weak and indefinite definitions; differing opinions among observers can make the dataset more ambiguous still.
There are, however, remedies for each of the uncertainties discussed above. Vagueness can be controlled by conforming to norms that are set and followed strictly by all parties; these rules need to be carried through to the testing and data-referencing stages, thereby making the data intelligible to everyone. Resolving ambiguity involves reaching a consensus on what constitutes the desired truth, and responsible producers should collect reference data suited to the specific application of the data under test. The best remedy for error is equipment that introduces as little of it as possible: reference data become more accurate when imprecise testing instruments are abandoned in favour of more accurate ones.
In conclusion, errors cannot be measured easily, nor spatial data interpreted reliably, where ambiguity and vagueness have been ignored in either the test or the reference data. Suitable models should also be adopted to describe the uncertainties and to propagate them through to the results, so that the changes responsible for variances in the results can be traced.
Fitness for Use of Spatial Data
There has been a significant rise in informal methods of describing uncertainty and of using those descriptions to determine the fitness for use of spatial data. The assessment process can be grouped into three major stages. First, an in-depth exploration of the data establishes whether the valuable information is present, and thus whether the data is potentially suitable. Second, an investigation establishes whether any difficulties are likely to arise in putting the data to its intended use. The final stage consists of checking whether the value of the information conforms to the final requirements.
Elements of Spatial Data Quality
The table below shows the elements of spatial data quality.
Element | Runoff | USA-SDTS | ISO TC211 | CEN TC287 | ICA |
Positional accuracy | X | N | X | X | X |
Lineage | X | N | X | X | X |
Completeness | X | N | X | X | X |
Logical consistency | X | N | X | X | X |
Attribute accuracy | X | N | I | X | X |
Semantic accuracy | X | N | I | X | X |
Usage, purpose, constraints | X | N | N | X | X |
Resolution | X | I | N | I | N |
Variation in quality | X | I | N | I | N |
Metaquality | X | I | N | I | I |
Temporal quality | X | I | N | I | X |
X = explicitly recognized as an element of data quality
N = explicitly recognized only as an element in a metadata section
I = implicitly recognized element
Positional accuracy
Positional accuracy is a measure of how closely the coordinate values contained in spatial data match true positions. A separation is usually made between relative and absolute positional accuracy, and another between vertical and horizontal positional accuracy. Relative positional accuracy is the accuracy of a feature's position with respect to other features in the same dataset, provided they are referenced in the same way. Absolute positional accuracy, on the other hand, is the accuracy of tested coordinate values with respect to matching reference values in the same reference system. Absolute positional accuracy is the more important of the two where one intends to combine datasets.
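As a minimal sketch of how absolute horizontal positional accuracy can be summarized, the Python snippet below computes the root-mean-square error (RMSE) of tested coordinates against reference coordinates; the point pairs are invented for the illustration.

```python
import math

# Hypothetical paired points: tested coordinates from the dataset under
# evaluation, reference coordinates from an independent, more accurate survey.
test_pts = [(1001.2, 503.8), (998.7, 510.1), (1004.9, 497.3)]
ref_pts  = [(1000.0, 504.0), (999.0, 510.0), (1005.0, 498.0)]

# Horizontal RMSE: a common summary of absolute positional accuracy
# against the reference coordinate frame.
sq_errors = [(tx - rx) ** 2 + (ty - ry) ** 2
             for (tx, ty), (rx, ry) in zip(test_pts, ref_pts)]
rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
print(f"Horizontal RMSE: {rmse:.2f} map units")
```

The same comparison made against other features within one dataset, rather than against an external reference frame, would yield a relative rather than an absolute accuracy figure.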
Lineage
Lineage is an account of the history of a geographic data set. It records the sources from which the data were obtained and the studies conducted on them, together with all the transformations applied during the production process.
Completeness
Completeness concerns both the absence of required data (errors of omission) and the presence of excess data (errors of commission). There are two significant types: data completeness and model completeness. Data completeness is compliance with the guidelines in the data set's own specification. Model completeness, in contrast, measures how well the contents of a spatial data set match the information needed for a particular application. A producer will mostly provide detailed descriptions of the information in a product; these allow other users to judge, for their own applications, whether the contents of the spatial data set conform to the desired guidelines.
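A minimal sketch of a data completeness check, assuming both the data set and its specification can be reduced to sets of feature identifiers (the identifiers below are invented):

```python
# Hypothetical feature identifiers: what the data set contains versus what
# the specification (or a reference survey) says should be present.
dataset_ids   = {"B001", "B002", "B004", "B007"}
reference_ids = {"B001", "B002", "B003", "B004"}

omissions   = reference_ids - dataset_ids   # required but missing
commissions = dataset_ids - reference_ids   # present but not required

print(f"Errors of omission:   {sorted(omissions)}")
print(f"Errors of commission: {sorted(commissions)}")
print(f"Data completeness:    {1 - len(omissions) / len(reference_ids):.0%}")
```

Model completeness would instead compare the data set against the features a particular application requires, which may be narrower or broader than the specification.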
Logical consistency
Logical consistency is the degree to which the relationships encoded in the structure of a data set hold together, that is, conform to the rules of that structure. Logical consistency helps establish valid data that can be successfully tested and presented for final analysis. The presence of logical inconsistency, by contrast, compromises the structure of the data and casts doubt on anything derived from it.
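The sketch below shows one simple family of logical consistency tests, attribute domain and range rules; the records and rules are illustrative assumptions, not the full rule set of any standard.

```python
# Hypothetical records from a land-cover layer, checked against two
# illustrative consistency rules: a valid attribute domain and a value range.
records = [
    {"id": 1, "cover": "forest", "area_ha": 12.5},
    {"id": 2, "cover": "wtaer",  "area_ha": 3.1},   # domain violation (typo)
    {"id": 3, "cover": "urban",  "area_ha": -0.4},  # range violation
]
valid_covers = {"forest", "water", "urban", "cropland"}

for rec in records:
    if rec["cover"] not in valid_covers:
        print(f"Record {rec['id']}: attribute 'cover' outside the valid domain")
    if rec["area_ha"] <= 0:
        print(f"Record {rec['id']}: 'area_ha' must be positive")
```

Topological rules, such as polygon rings closing on themselves or parcels not overlapping, are checked in the same spirit but against geometric rather than attribute constraints.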
Attribute Accuracy
Attribute accuracy is the accuracy of all attributes of a spatial data set other than its positional and temporal characteristics. The four well-known measurement scales for attributes are ratio, interval, ordinal, and nominal. A nominal scale represents unordered categories, such as land cover classes. The accuracy of nominal attributes is usually described with the help of an error matrix, whereas the accuracy of ratio attributes is generally described with the same measures as those used for positional accuracy.
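A minimal sketch of an error matrix for a nominal attribute such as land cover, built from invented sample labels; the overall accuracy is simply the share of samples on the diagonal.

```python
from collections import Counter

# Hypothetical class labels at sample sites: the tested data set versus
# independent reference observations at the same locations.
tested    = ["forest", "forest", "water", "urban", "forest", "urban"]
reference = ["forest", "water",  "water", "urban", "forest", "forest"]

# Error (confusion) matrix: count each (tested, reference) label pair.
matrix  = Counter(zip(tested, reference))
classes = sorted(set(tested) | set(reference))

label = "tested \\ ref"
print(f"{label:>12}  " + "  ".join(f"{c:>6}" for c in classes))
for t in classes:
    print(f"{t:>12}  " + "  ".join(f"{matrix[(t, r)]:>6}" for r in classes))

# Overall accuracy: the proportion of sites where the labels agree.
agree = sum(t == r for t, r in zip(tested, reference))
print(f"Overall accuracy: {agree / len(tested):.0%}")
```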
Semantic Accuracy
Semantic accuracy refers to the correspondence in meaning between objects as recorded in the spatial data and the real-world entities they represent. It captures, for instance, whether objects that satisfied a given definition when first captured still conform to the conditions of that definition.
Usage, Purpose, Constraints
This is a vital element of spatial data that assists potential users in assessing a collected data set. It records the producer's intended usage and purpose for the data and any constraints on it, giving users the more detailed understanding they need to interpret the data and to analyze and explain its various components.
Resolution
Resolution is another element of spatial data that requires awareness of the confusion that can arise in interpreting the terms resolution and scale, so that information can be relayed quickly and correctly to the necessary stakeholders. An analysis often requires data at a specific resolution, and this information is therefore essential in the first step of a fitness-for-use assessment.
Variation in Quality
This element refers to the tendency of spatial data quality to differ from one part of a data set to another, each part exhibiting aspects unique to its own sources and capture conditions. Homogeneous quality is sometimes encountered, but a quality description should state how quality varies across the final elements of the data in question. Variation in quality is a typical feature of spatial data rather than an exception.
Metaquality
Metaquality is quality information about quality information: it describes how the quality elements themselves were established for the various kinds of data. It is therefore the basis for judging the value of the quality statements attached to a specific data set after testing.
Temporal quality
This element assesses the accuracy of spatial data with respect to time. It has several sub-elements that aid in estimating that accuracy: temporal validity, the validity of the data with respect to time; rate of change, which estimates how quickly the depicted phenomenon varies, and hence the currency of the data; temporal consistency, the ability to remain in a particular state for a relatively short period; and the date of the last update (Sanderson, Stickler & Ramage, 2007).
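One of these sub-elements, currency, can be sketched numerically: given an assumed rate of change for the depicted phenomenon, the age of the data since the last update implies an expected fraction of outdated features. All of the values below are illustrative assumptions.

```python
from datetime import date

# Assumed dates and rate of change for the illustration.
last_update   = date(2019, 6, 1)   # date of the last update of the data set
today         = date(2020, 6, 1)   # date at which fitness for use is assessed
rate_per_year = 0.04               # assumed fraction of features changing per year

age_years = (today - last_update).days / 365.25
print(f"Expected fraction of outdated features: {rate_per_year * age_years:.1%}")
```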
Sources of Errors in Spatial Data Quality Fit for Purpose
In England and Wales, close observation of maps of the same land made at different dates shows variances, even though the place itself has not changed. Estimates of change drawn from such maps are therefore overstated as a result of several factors. Vullings et al. (2015) stated that the three primary sources of these uncertainties are semantic differences, classification errors, and positional errors.
The Leading Causes of Error in Change Detection
These sources of risk are root causes of failure which propagate into later stages and hamper the accuracy of spatial data. The three significant points of origin are semantic differences in how land cover is defined between the datasets; misregistration of land parcels, or boundary effects around objects; and classification errors, which corrupt the capture of the elements required for accuracy.
Whatever the weight of the other issues, semantic uncertainties need to be dealt with before any other solutions are attempted. Consistency in classification schemes, formalized as a shared ontology, is one of the significant remedies for semantic differences. It is, however, expensive to enact, since it requires extensive restructuring of the system, and the attendant loss of detail can be a further obstacle to future correction efforts.
Reducing misregistration is perceived to provide a lasting fix for these adverse effects. Because misregistration is one of the primary sources of classification errors, this measure also reduces the classification error rate. Expert judgment alone cannot solve the challenge, however: expert knowledge is hard to elicit, prone to problems of its own, and generally commands less confidence as an approach.
Classification errors may also arise where registration is correct, for example when classification tasks are allocated to unqualified and incompetent staff. The extent of such an error depends highly on the total coverage of the data concerned, so its form differs from that of the other known sources of error.
Measures of Change in Land Cover
There are four significant aspects of concern when measuring change in land cover for accuracy and thus for minimizing errors in the spatial data. According to Li, Zhang & Wu (2012), these elements mostly revolve around detecting that change has occurred, identifying the qualitative nature of the change, and assessing the extent to which patterns in the data keep changing. With the aid of a transition matrix, one can quantify these aspects and analyze them numerically, as sketched in the code after the illustration below.
An Illustration of Point-wise and Region-wise Calculations Involving Crop Covers
Crop cover A
Maize | Maize | Maize |
Yams | Oranges | Oranges |
Maize | Yams | Maize |
Crop cover B
Maize | Maize | Maize |
Maize | Maize | Maize |
Maize | Maize | Maize |
Transition Matrix (rows: classes in cover A; columns: classes in cover B)
A \ B | Maize | Yams | Oranges | SUM |
Maize | 0 | 0 | 0 | 0 |
Yams | 4 | 2 | 3 | 9 |
Oranges | 2 | 5 | 6 | 13 |
SUM | 6 | 7 | 9 | 22 |
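The point-wise calculation can be reproduced in code. The sketch below builds the transition matrix for the two 3 x 3 crop-cover grids in the illustration above (a smaller example than the generic matrix just shown), counting each (cover A, cover B) class pair cell by cell.

```python
from collections import Counter

# The 3 x 3 crop-cover grids from the illustration, listed cell by cell.
cover_a = ["Maize", "Maize",   "Maize",
           "Yams",  "Oranges", "Oranges",
           "Maize", "Yams",    "Maize"]
cover_b = ["Maize"] * 9   # crop cover B is maize in every cell

# Point-wise transition matrix: count each (class in A, class in B) pair.
transitions = Counter(zip(cover_a, cover_b))
for (a, b), n in sorted(transitions.items()):
    print(f"{a:>7} -> {b}: {n} cells")
# Output: Maize -> Maize: 5, Oranges -> Maize: 2, Yams -> Maize: 2
```

A region-wise calculation would first aggregate cells into regions, for example contiguous patches of one crop, and then count transitions between region labels rather than individual cells.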
Critical Evaluation of Spatial Data
The need to embrace new uses of spatial data quality for legitimate purposes also calls for improvements in land cover change estimates. This goes beyond the long-standing effort to improve data quality itself, which has so far achieved little, and it calls for the employment of an error matrix as a safer and more accurate technique for addressing the occurrence of errors.
An error matrix provides a more precise, and thus more reliable, form of direct estimation, since it can play multiple roles in assessing whether spatial data quality is fit for purpose. The matrix can adequately describe the quality of a single-date data set and improve land cover change estimates. It can also be used directly by anyone at their discretion, since it is easily understood and offers room for compromise between competing requirements. Lush, Bastin & Lumsden (2012) postulated that error matrices have posted very few challenging cases of misclassification, since they achieve high accuracy levels in comparison with other, traditional methods.
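As a sketch of these multiple roles, the example below derives overall, user's, and producer's accuracy from a hypothetical three-class error matrix; the counts are invented for the illustration.

```python
# Hypothetical error matrix: rows are classes in the tested map, columns are
# classes in the reference data, and the values are sample counts.
classes = ["Maize", "Yams", "Oranges"]
matrix  = [[40,  3,  2],
           [ 5, 30,  4],
           [ 1,  2, 13]]

total   = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))
print(f"Overall accuracy: {correct / total:.0%}")

for i, c in enumerate(classes):
    users     = matrix[i][i] / sum(matrix[i])                # row-wise: commission
    producers = matrix[i][i] / sum(row[i] for row in matrix) # column-wise: omission
    print(f"{c:>7}: user's accuracy {users:.0%}, producer's accuracy {producers:.0%}")
```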
References
Larrivée, S., Bédard, Y., Gervais, M., and Roy, T., 2011, October. New horizons for spatial data quality research. In 7th International Symposium on Spatial Data Quality (ISSDQ 2011). ISSDQ. Coimbra, Portugal.
Li, D., Zhang, J., and Wu, H., 2012. Spatial data quality and beyond. International Journal of Geographical Information Science, 26(12), pp.2277-2290.
Lush, V., Bastin, L., and Lumsden, J., 2012. Geospatial data quality indicators.
Sanderson, M., Stickler, G., and Ramage, S., 2007. 1Spatial: Spatial Data Quality and the Open Source Community. OSGeo Journal, 2(1).
van Oort, P.A., 2006. Spatial data quality: from description to application. Wageningen Universiteit.
Vullings, W., Bulens, J.D., Rip, F.I., Boss, M., Meijer, M., Hazeu, G., and Storm, M., 2015, June. Spatial data quality: What do you mean. In Proceedings of 18th AGILE International Conference on Geographic Information Science, Lisbon, Portugal (pp. 9-12).
Wu, C.V., and Buttenfield, B.P., 1994. Spatial data quality and its evaluation. Computers, Environment and Urban Systems, 18(3), pp.153-165.