Data Quality Standard and Checklist

Open Data Policy Appendix D

The following provides Tempe’s standards for data quality and for completing a data quality review. A checklist for completing a data quality review is provided at the end of this document.

1. The importance of good data quality

1.1 The availability of complete, accurate, relevant, accessible, and timely data is important in supporting decision-making, planning, resource allocation, accountability, and the delivery of service outcomes and priorities. For example:

Strategic planning

High quality data and information is used to plan the City’s vision and goals and informs the City’s decision-making process.

Financial planning

Financial data must be reliable to enable the City to set budgets and forecasts to support service planning.

Service planning

Accurate data about the volume and type of services delivered and activities undertaken is essential to ensure appropriate allocation of resources and future service delivery.

Performance management

Accurate data enables the identification and resolution of any shortfalls against standards and targets.

Service improvement

Accurate data enables the analysis of service provision against user needs and overall efficiency and effectiveness.

Customer support

Accurate data enables the delivery of relevant and timely services and ensures that the customer and other parties involved can be kept informed where appropriate.

Efficient administration

Data needs to be provided to an appropriate standard and in such a way that the full range of stakeholders, partners and agencies can access the information they need easily and quickly.

Adherence to audit processes

Data needs to be available for timely, reliable, and accurate reporting to support the City’s internal and external audit regimes.

Accountability, transparency, and Open Data

High quality data is essential in delivering the City’s transparency and open data agenda.

Table 1.1

1.2 Data quality is particularly important in operations and decision support for local government activities. A local government’s ability to provide services and meet the needs of the community relies on access to high quality data. Successfully increasing the quality of an organization’s data requires a shared definition of data quality, clear processes for identifying data quality issues, defined steps for addressing those issues, and a process for identifying possible solutions to address the quality issue.

1.3 Poor data quality can have significant legal, professional, and financial impacts and can damage a city’s reputation. Examples of risks associated with poor data quality include:

  • decisions are based on inaccurate or out of date information

  • at risk individuals in the community are not identified or provided resources

  • missed opportunities for data integration

  • poor performance is not identified and addressed, and opportunities for improvement are missed

  • published information is misleading or incorrect

  • good performance is not recognized and rewarded

  • poor use of resources and inefficiency

  • policies are ill-founded and impacts are not properly identified.

1.4 Bias refers to systematic errors or tendencies that influence analysis results, lead to incorrect or misleading conclusions, and affect the quality and credibility of decisions. Data quality affects bias in several ways, including:

  • Incomplete data can lead to bias because it can skew the results of analysis. For example, if a study is conducted on a sample of people but only half of them provide their income, the results may be skewed toward the group that was more willing to respond, such as people with higher incomes.

  • Inaccurate data can lead to incorrect conclusions. For example, if a study is conducted on a sample of people but the data on their age is inaccurate, the results of the study may be biased towards older or younger people.

  • Biased data collection can lead to bias in the data itself. For example, if a survey is conducted but only people who are likely to agree with a particular viewpoint are asked to participate or choose to respond, the results of the survey will be biased towards that viewpoint.

  • Biased data analysis can also lead to bias in the results of analysis. For example, if a researcher is looking for a particular result, they may be more likely to interpret the data in a way that supports their hypothesis.

Ensuring that data is high quality can help to reduce the risk of bias in data. This can be done by following good data collection practices, using reliable data sources, and conducting unbiased data analysis.
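The incomplete-data example above can be made concrete with a small sketch. All figures are invented for illustration and are not City data, and the function names `mean_of_reported` and `response_rate` are hypothetical. The sketch assumes that the three lowest earners in a six-person survey skip the income question, and compares the average of the reported values with the average that would have been seen with full responses.

```python
from statistics import mean

# Hypothetical survey: true incomes for six respondents (illustrative values only).
true_incomes = [20_000, 30_000, 40_000, 80_000, 100_000, 120_000]

# Assumption for this sketch: the three lowest earners skip the income question.
# None marks a respondent who did not report income.
reported = [None, None, None, 80_000, 100_000, 120_000]


def mean_of_reported(values):
    """Average the non-missing values; return None if nothing was reported."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def response_rate(values):
    """Share of respondents who answered the question."""
    return sum(v is not None for v in values) / len(values)


true_mean = mean(true_incomes)
observed_mean = mean_of_reported(reported)
rate = response_rate(reported)

print(f"response rate: {rate:.0%}")
print(f"mean with full responses: {true_mean}")
print(f"mean of reported values only: {observed_mean}")
```

In this invented case the average of the reported values is well above the average of all six incomes. Reporting the response rate alongside any summary figure is one simple way to make that limitation visible to users.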

2. Fitness for Use

There is no single definition of data quality that provides one agreed-upon set of standards that must all be met for data to be considered high quality. While there should be set standards for what makes good quality data, data quality should be defined in the context of whether the data are fit for a particular purpose or use. Whether data are fit for use depends on many factors, including the application of the data, the level of quality required for a specific purpose, and the expectations of the users and what they define as useful information.

If the data are used by different people in different contexts, there may be several fitness for use reviews, and they may not have the same results. When data are used by multiple departments or different units within departments, fitness for use discussions should include both data producers and consumers, with the goal of reaching an agreed-upon definition of what fitness for use means for that data.

Suggested improvements related to the eight standards should be reviewed based on fitness for use, the cost of making data improvements, and the impact of adjusting one data quality element on another. The goal of the discussion is not to create “perfect” data, but to produce data that provides the information needed for a given purpose or context.

For this policy, data quality is defined as fitness for use based on the eight standards of good data quality presented in Section 3.

3. What makes for good quality data and information?

3.1 There are eight standards of good quality data:

Accuracy

Data should be sufficiently accurate (error free) for its intended purposes and at the appropriate level of detail. Data should be captured once, although it may have multiple uses. Higher levels of accuracy are more likely to be achieved when data is captured as close to the event as possible.

Sometimes, the need for accuracy must be balanced with the importance of the uses of the data and the timeliness, cost, and effort of collection. Where compromises are made on accuracy, limitations must be clear to users.

Validity

Data should be recorded and used in compliance with relevant requirements, rules, and definitions, ensuring consistency with similar organizations. Data sources should be provided to document where the data come from (source system, report, website, etc.).

In the absence of actual data, proxy data may be used, although consideration must be given to how well this data is able to satisfy the intended purpose.

Reliability

Whether using manual or computer-based systems or a combination, data should be collected using stable and consistent methods, ensuring that when used for comparison or to monitor progress over time, variations in collection processes do not impact analysis or performance evaluation.

Data definitions and processes should be documented.

Timeliness

Data should be created as close to the event occurrence as possible. Data should be available frequently and promptly enough for it to be of value to decision making and service delivery.

Relevance

Data should be relevant to the intended use. It should be defined, collected, and analyzed with its intended use and audience in mind. Requirements may change over time, so relevance should be considered during quality reviews.

Completeness

Data requirements and collection processes should be clearly specified based on the needs of the organization. All relevant elements should be captured by completing all relevant fields (for both electronic and paper sources).

Missing, incomplete or invalid records provide an indication of data quality and can also point to problems in the recording of data. Applications should require completion of fields for mandatory data.

Appropriateness

Collection and recording of data must fulfill a legitimate and clearly defined business purpose. It should be relevant to the purpose of the application/system.

Representativeness

Data should accurately reflect the population from which it is drawn. A representative dataset is one that is not biased towards any single group or subgroup. This is important for data quality because it impacts our ability to make informed inferences about the population.

Table 3.1
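Several of the standards in Table 3.1 can be checked with simple measurements. The following is a minimal sketch, not a prescribed City method. The field names (`request_id`, `area`, `category`), the sample records, and the population shares are all hypothetical and chosen only for illustration. It measures completeness as the share of filled-in values per field, timeliness as the number of days between an event and its recording, and representativeness as the gap between each group's share of the dataset and its share of the population.

```python
from collections import Counter
from datetime import date


def completeness_by_field(records, fields):
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


def days_to_record(event_date, recorded_date):
    """Return the number of days between an event and when it was recorded."""
    return (recorded_date - event_date).days


def representation_gap(records, group_field, population_shares):
    """Return dataset share minus population share for each group.

    A positive value means the group is over-represented in the dataset.
    """
    counts = Counter(r.get(group_field) for r in records)
    total = sum(counts.values())
    return {
        group: (counts.get(group, 0) / total if total else 0.0) - share
        for group, share in population_shares.items()
    }


# Hypothetical records (illustrative only, not City data).
records = [
    {"request_id": 1, "area": "North", "category": "Streets"},
    {"request_id": 2, "area": "North", "category": ""},
    {"request_id": 3, "area": "South", "category": "Parks"},
    {"request_id": 4, "area": "North", "category": "Streets"},
]

# Assumed population shares for the two hypothetical areas.
population_shares = {"North": 0.5, "South": 0.5}

print(completeness_by_field(records, ["request_id", "category"]))
print(days_to_record(date(2024, 1, 1), date(2024, 1, 3)))
print(representation_gap(records, "area", population_shares))
```

What counts as an acceptable value for any of these measures depends on fitness for use, as described in Section 2. The sketch only produces the numbers that such a discussion could start from.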

4. Achieving good quality data in City of Tempe

4.1 In order to achieve high data quality, the Data Team will collaborate with the Data Governance Team, departments, data stewards and other IT units on the following:

Governance and leadership

All staff are aware of their responsibilities relating to data quality through onboarding and, where appropriate, job descriptions.

Systems and processes

  • Data is stored, used, and shared in accordance with the law, including laws governing data protection and freedom of information.

  • All data collection, analysis and reporting processes by the city will be covered by clear procedures, which are easily available to all relevant staff, and regularly reviewed and updated.

  • Data Quality is a core component when specifying / procuring IT Systems.

  • We work constructively with partners and external organizations to provide assurance of data quality.

Skills

  • All staff are aware of the legal and statutory requirements, as well as the importance of good data quality and their own contribution to it.

  • All staff receive appropriate training in relation to data quality aspects of their work.

Table 4.1

4.2 Contributing to data quality.

All staff

Data quality is an integral part of all city data systems or processes. All staff engage with data creation at some point, whether paper or electronic. As a result, all staff are responsible for maintaining data quality.

Data Steward

Accountable for the data quality within their work group and responsible for working with appropriate managers to address any quality issues.

Data Governance Committee

Responsible for drafting and amending this policy. Shares best practices around data quality.

Data Team

Provide advice and guidance about evaluating data quality and fitness for use of data. Where applicable, they will include data quality reviews when helping to develop or providing input on new or existing business processes. If issues are identified, the Data Team will work with the data steward/department and/or IT teams on possible solutions.

Solutions Architects

Provide advice and guidance about evaluating data quality and fitness for use of data. Where applicable, they will include data quality reviews when evaluating line-of-business applications and systems or when providing input on business process enhancements. If issues are identified, the review outcomes will be shared with the Data Team, who will work with the data steward/department on possible solutions.

Table 4.2

5. Data Quality Reviews

5.1 There are several activities that may trigger a data quality review, including, but not limited to:

  • New open data is being prepared for publication, including updates to existing data

  • New software integrations are being prepared, or existing data is being migrated to a new system or software application

  • Data will be shared internally within or between city departments

  • Data will be shared with trusted partners or other government entities

  • New performance measures or indicators are created

  • Business processes are being reviewed

  • New business processes are being defined and implemented

  • Someone (an internal or external user) identifies possible data quality issues

5.2 Identification of data quality issues and resolution

If a dataset is identified as having quality issues, the Data Team will work with the data steward and other involved parties to review the specific issues and proposed changes and work toward resolution, where appropriate. If changes are recommended, the Data Team or another IT team will work with the data steward to develop a plan to make the changes, review the plan with the parties identified above, and then implement the changes.

Examples of possible solutions are:

  • Changing standard operating procedures to ensure that all data are entered to improve completeness.

  • Collecting/recording data as close to the event as possible to ensure timeliness.

  • Making fields required to improve completeness.

  • Using dropdown menus with predefined options to improve accuracy and reliability.

  • Improving data automation processes so that data are refreshed more frequently to ensure timeliness.
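Two of the solutions above, required fields and dropdown menus with predefined options, amount to validation rules applied when data is entered. The following is a minimal sketch of such rules, assuming a hypothetical service request record. The field names, the list of categories, and the function `validate_record` are illustrative assumptions and do not describe any existing City system.

```python
# Hypothetical required fields for a service request record.
REQUIRED_FIELDS = ("request_id", "category", "received_date")

# Hypothetical predefined options, as a dropdown menu would offer them.
ALLOWED_CATEGORIES = {"Streets", "Parks", "Water"}


def validate_record(record):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []

    # Required fields: flag any that are absent or left blank.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")

    # Predefined options: flag a non-blank category that is not on the list.
    category = record.get("category")
    if category not in (None, "") and category not in ALLOWED_CATEGORIES:
        problems.append(f"category not in predefined options: {category}")

    return problems


good = {"request_id": 1, "category": "Streets", "received_date": "2024-01-05"}
free_text = {"request_id": 2, "category": "streets", "received_date": "2024-01-05"}
incomplete = {"request_id": 3, "category": "Parks", "received_date": ""}

for rec in (good, free_text, incomplete):
    print(rec["request_id"], validate_record(rec))
```

Applying rules like these at the point of entry prevents the problem, while running them over existing records helps show how much cleanup a dataset needs.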

Checklist for reviewing data quality in practice:

The following questions provide guidance for conducting a data quality review. Documentation of the review should include written responses to these questions along with other relevant information.

Submit completed review documents as a ticket through the IT Service Desk. Include the name of the dataset being reviewed and ask for it to be sent to the GIS Queue.

For checklists completed as part of the Open Data Process, follow the submission instructions provided on the SharePoint site.

Documenting data quality (metadata, reports, etc.):

Last updated