Basic Glossary to Avoid Getting Lost Amongst the Data: Part 1, from “A” to “D”
Glossary - 1st part: from "A" to "D"
The aim of this post and its second part is to help you understand some of the most common technical terms that frequently appear in the field of data science.
Artificial intelligence; API; beacons; big data; blockchain; business intelligence; CSV; data anonymisation; data dictionary; data governance; data horizontality; data lake; data mining; data model; data space; data sovereignty; deep learning and digital twins.
ARTIFICIAL INTELLIGENCE: the ability of a machine to imitate the functioning of the human mind through actions such as reasoning, learning, creativity or planning. Its most advanced developments are built around the massive processing of information and the application of algorithms to automate decision making. This is referred to as machine learning. As with the human mind, the more practice and study it gets (that is, the greater the amount of data processed), the more the system learns and the more effective its calculations become for decision making, problem solving or predicting behaviour. Artificial intelligence is often referred to by its acronym (AI).
API (APPLICATION PROGRAMMING INTERFACE): a mechanism that allows two systems that are in principle unrelated to communicate. It works as a connector between two independent platforms or pieces of software, allowing them to share data and functionality. For example, a tourist route planning app could use an API to obtain AEMET weather data and visitor flow data generated by flow meters, combine them with user preferences and recommend a route adapted to the weather, the volume of tourists and the user's tastes. Or, if you wanted to develop an app that displayed information similar to Dataestur's dashboards, you could use its API service to obtain the information available in its databases.
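As a rough illustration of the route-planning example, the sketch below shows the two halves of a typical web API call: building the request and reading the reply. The address, the parameter names and the fields in the reply are all hypothetical (they are not the actual AEMET or Dataestur API), and the reply is hard-coded so that the example runs without an internet connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real AEMET or Dataestur URL).
BASE_URL = "https://api.example.org/v1/weather"


def build_request_url(city: str, date: str) -> str:
    """Build the URL an app would call to ask the API for a forecast."""
    return f"{BASE_URL}?{urlencode({'city': city, 'date': date})}"


def is_good_day_for_walking(response_body: str, max_rain_probability: int = 30) -> bool:
    """Read the JSON reply and decide whether to suggest an outdoor route."""
    forecast = json.loads(response_body)
    return forecast["rain_probability"] <= max_rain_probability


# Made-up reply, standing in for what a weather service might send back.
sample_reply = '{"city": "Sevilla", "date": "2024-05-01", "rain_probability": 10}'

print(build_request_url("Sevilla", "2024-05-01"))
print(is_good_day_for_walking(sample_reply))
```

In a real app, the URL would be sent over the network and the reply would come from the provider, whose documentation defines the exact parameters and fields.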
BEACONS: small devices (measuring approximately one or two centimetres) with unique identification, GPS and Bluetooth technology. They make it possible, for example, for tourists to use their mobile phones to establish where they are, or to be shown tailor-made proposals. To this end, the user must have an app installed on their mobile phone, since the beacon only sends the signal that activates the app. The app is triggered by that signal and displays the customised notification, bearing in mind the location and the user's settings.
BIG DATA: a large amount of data that requires computer technology to be managed and analysed. The term is sometimes also used generically to refer to the set of analytics and technologies used to analyse these large data sets. Also known as macro data, mass data or large-scale data. For example, a database containing all tourist flights to Spain would be mass data which, although it can provide relevant information, requires computer technology to manage it, analyse it and extract knowledge from it.
BLOCKCHAIN: frequently associated with cryptocurrencies, although it is in fact a technology that allows data to be shared securely between computers by storing information in decentralised and encrypted databases.
Its name is key to understanding how it works. A network is made up of multiple nodes (computers). Blockchain technology transforms each exchange of information into an encrypted block and sends that block to all the nodes. Each node validates the block and adds it to the chain. Each block contains its own information as well as a reference to the previous block, which is what links the blocks together. Thus, the information is decentralised (because it is stored on all the nodes) and security is reinforced (if there were an attempt to modify a block on one computer, the remaining nodes would flag the change; and if a node were deleted, the information would remain on the other computers). As a result, the records cannot be changed or deleted. If information needs to be amended, a new record containing the change must be stored. In short, blockchain technology removes intermediaries, facilitates direct access to information amongst those exchanging it, decentralises the process, reinforces security and optimises the traceability of information. This technology is used in cryptocurrency transactions. In tourism, it is mainly applied to provide security and reliability to transactions and reservations, eliminating intermediaries and offering alternative payment methods or rewards (through cryptocurrencies or tokens).
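To see why a stored record is so hard to alter, here is a minimal sketch of the "chain" idea on a single computer. Each block stores a fingerprint (a SHA-256 hash) of the block before it, so changing an old record breaks the link. This is a simplification made for illustration: it leaves out the network of nodes, the validation between them and any encryption, and the reservation texts are made up.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Fingerprint of a block: SHA-256 over its data and the previous block's hash."""
    payload = json.dumps(
        {"data": block["data"], "previous_hash": block["previous_hash"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain: list, data: str) -> None:
    """Append a new record linked to the last block in the chain."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"data": data, "previous_hash": previous_hash}
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain: list) -> bool:
    """Check every block's own hash and its link to the block before it."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = []
add_block(chain, "Reservation 1: hotel, 2 nights")
add_block(chain, "Reservation 2: guided tour")
print(is_valid(chain))  # True

chain[0]["data"] = "Reservation 1: hotel, 20 nights"  # tampering with a stored record
print(is_valid(chain))  # False
```

In a real blockchain, every node would run a check of this kind on its own copy, which is how a change made on one computer gets noticed by the rest.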
BUSINESS INTELLIGENCE: also abbreviated to BI. This refers to the transformation of the data that already exists at a company or business into knowledge by analysing it. It is essential for making the data useful in decision making. Dashboards and the tracking of KPIs or indicators are BI tools.
CSV (comma-separated values): the most widespread open file format for sharing large volumes of data represented in tables. It offers two main advantages: CSV files are compatible with a variety of programs, and they can be less than half the size of other widely used files, such as Excel documents. Furthermore, they are easily converted to a traditional table format with spreadsheet programs such as Excel or similar.
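As a small example of that compatibility, the sketch below reads a CSV table with Python's built-in csv module and adds up one of its columns. The data is made up, and it is read from a piece of text rather than from a file so that the example is self-contained.

```python
import csv
import io

# Made-up sample data; a real file would be opened with open("file.csv", newline="").
csv_text = """destination,month,visitors
Madrid,2024-01,1200
Sevilla,2024-01,800
Madrid,2024-02,1500
"""

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value is read as text, so numbers must be converted explicitly.
total_by_destination = {}
for row in rows:
    name = row["destination"]
    total_by_destination[name] = total_by_destination.get(name, 0) + int(row["visitors"])

print(total_by_destination)  # {'Madrid': 2700, 'Sevilla': 800}
```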
DATA ANONYMISATION: a key process for the protection of privacy, which consists of making data anonymous using techniques that reduce the risk of people being identified. This is a complex process that requires advanced profiles and methodologies, and it includes performing a reidentification risk assessment and maintaining a management plan over time. Reidentification is usually performed by linking the data with data from other complementary sources to search for connections that make it possible to indirectly link the data to a natural person. As a result, pseudonymisation techniques, such as encryption or replacing key identification data with other values not included in the registry, are not enough.
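One simple way to get a feel for reidentification risk is to count how many records share the same combination of indirect identifiers. This measure is commonly known as k-anonymity; it is brought in here only as an illustration and is just one of several possible checks. The records and the choice of indirect identifiers below are made up.

```python
from collections import Counter

# Made-up records with names already removed; "age_range" and "postcode"
# are the indirect identifiers an attacker could match against other sources.
records = [
    {"age_range": "30-39", "postcode": "28001", "spend": 120},
    {"age_range": "30-39", "postcode": "28001", "spend": 95},
    {"age_range": "30-39", "postcode": "28001", "spend": 210},
    {"age_range": "60-69", "postcode": "41004", "spend": 310},
]


def smallest_group_size(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


k = smallest_group_size(records, ["age_range", "postcode"])
print(k)  # 1: one person is unique in the data set and therefore easy to single out
```

A result of 1 means that at least one person can be singled out even though no names are stored, which is why removing or replacing direct identifiers is not enough on its own.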
DATA DICTIONARY: a necessary document in any database as it lists the metadata of the information, including its origin, format, use, field definitions, possible transformations and the values it can take on. This must not be confused with a data catalogue (directory that makes it easy to locate the information) or a business glossary (functional definitions of the field of study to which the data belongs).
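As a rough sketch, a data dictionary can be thought of as a table with one entry per field. The example below writes one as a Python structure and uses it to check two records. The table of hotel bookings, the field names and the allowed values are all hypothetical.

```python
# Hypothetical data dictionary for a small table of hotel bookings.
data_dictionary = {
    "booking_id": {"type": int, "description": "Unique booking number", "source": "booking engine"},
    "nights": {"type": int, "description": "Length of stay in nights", "source": "booking engine"},
    "board": {
        "type": str,
        "description": "Meal plan",
        "source": "booking engine",
        "allowed_values": ["room_only", "breakfast", "half_board"],
    },
}


def find_problems(record: dict) -> list:
    """List the ways a record departs from the data dictionary."""
    problems = []
    for field, spec in data_dictionary.items():
        if field not in record:
            problems.append(f"{field}: missing")
        elif not isinstance(record[field], spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
        elif "allowed_values" in spec and record[field] not in spec["allowed_values"]:
            problems.append(f"{field}: value not allowed")
    return problems


print(find_problems({"booking_id": 1, "nights": 3, "board": "breakfast"}))  # []
print(find_problems({"booking_id": 2, "nights": "three", "board": "all_inclusive"}))
# ['nights: expected int', 'board: value not allowed']
```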
DATA GOVERNANCE: the set of rules used to manage data from the moment it is acquired, during its use and management, and up until its elimination. Data governance involves setting standards to govern the entire data lifecycle, identifying who can access the data and who is responsible for its accuracy and reliability. The ultimate goal is to ensure data quality by making it more reliable and complying with regulatory, legal and industry standards, while guaranteeing a single version of the data. To this end, in 2022, the European Union approved the Data Governance Act, which will encourage the more extensive reuse of protected public sector data.
DATA HORIZONTALITY: the capacity of data to be of equal interest to a variety of stakeholders from different sectors. It rests on the idea that data is a non-rival resource and that using and sharing it helps to increase the capillarity of the business. A piece of information may be of interest to several companies in the same sector, but also to those in related branches and to others engaged in lines of business with no apparent initial relationship. For example, flight reservation data at origin is important for hotels at the destination, but also for the destination's entire service sector, for security forces and for the local administration.
DATA LAKE: the repository where an organisation's raw data is saved. Although the information is not necessarily structured, nor the data prepared for use, it is important to implement a catalogue of the information collected, data traceability standards, a security strategy for the data and connections to the tools that will later be used to process it, analyse it or apply artificial intelligence to it.
DATA MINING: this entails exploring large data sets to detect patterns, anomalies and correlations that help predict outcomes. It combines statistical, artificial intelligence and machine learning techniques, which current technology allows to be applied automatically or semi-automatically. Data mining removes noise and cleans information, identifies relevant data and facilitates its assessment. It is the step prior to business intelligence and is aimed at detecting patterns and trends to be taken into account as part of the decision-making process.
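As a very small taste of anomaly detection, the sketch below flags the days on which the number of visitors is far from the usual level. The figures are made up, and the rule used (more than two standard deviations from the mean) is just one simple statistical choice among many; real data mining tools combine far more elaborate techniques.

```python
from statistics import mean, stdev

# Made-up daily visitor counts for one attraction.
daily_visitors = [510, 495, 530, 505, 520, 1480, 515, 500]


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    average = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - average) / spread > threshold]


print(find_anomalies(daily_visitors))  # [1480]
```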
DATA MODEL: the representation of the elements of a data set and their relationships and connections to one another, usually in the form of diagrams. Smart data models must ensure the compatibility and interoperability of data while eliminating redundancies, facilitating retrieval and reducing storage needs. The data model therefore covers everything from the harmonised data structure that makes data compatible and interoperable between systems, to the level of detail of the data displayed, the semantic and standardisation conventions applied, the identification of attributes, the entities involved and the relationships between the data.
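A data model can also be written down in code. The sketch below describes three hypothetical entities (destinations, hotels and bookings), their attributes and the relationships between them, and then follows those relationships to answer a question. All the names and figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    destination_id: int
    name: str


@dataclass
class Hotel:
    hotel_id: int
    name: str
    destination_id: int  # relationship: each hotel belongs to one destination


@dataclass
class Booking:
    booking_id: int
    hotel_id: int  # relationship: each booking refers to one hotel
    nights: int


destinations = {1: Destination(1, "Sevilla")}
hotels = {10: Hotel(10, "Hotel A", destination_id=1)}
bookings = [Booking(100, hotel_id=10, nights=2), Booking(101, hotel_id=10, nights=3)]


def nights_per_destination(bookings, hotels, destinations):
    """Follow booking -> hotel -> destination and add up the nights."""
    totals = {}
    for booking in bookings:
        destination = destinations[hotels[booking.hotel_id].destination_id]
        totals[destination.name] = totals.get(destination.name, 0) + booking.nights
    return totals


print(nights_per_destination(bookings, hotels, destinations))  # {'Sevilla': 5}
```

Because the destination's name is stored only once and the other entities refer to it by identifier, there is no redundancy to keep in sync.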
DATA SPACE: a decentralised ecosystem for the voluntary and secure exchange of data. It is built around common components (building blocks) that must guarantee, in addition to interoperability, data sovereignty and trust. In other words, it must have the capacity to identify and verify the participants and the correct application of the access and use rules. Furthermore, each owner retains control over their data and determines the requirements and modes of use. When it comes to tourism, the Data Office, in collaboration with Segittur, is working on the creation of a data space for the tourism sector.
DATA SOVEREIGNTY: the legal side of data governance, referring to the rules or legislation for the protection of data throughout its life cycle and the regulation of data possession by companies and governments. In this respect, the Organic Law on Data Protection, in Spain, and the GDPR (General Data Protection Regulation) and the Cybersecurity Act, at a European level, are key aspects of data sovereignty. This legislation answers questions about who owns the data, who can store it, how it can be used, how it must be protected and the consequences of its misuse. Along with the IT infrastructure, data sovereignty plays an essential role when it comes to guaranteeing data security.
DEEP LEARNING: an evolution of machine learning towards a more scalable system. The machine structures its data processing in a similar way to human neural processing and, although it also makes use of structured data, some of its "neural" layers are responsible for analysing raw data to detect the characteristics that distinguish one piece of data from another. Technological advances such as image interpretation and speech recognition and understanding are based on this type of machine learning.
DIGITAL TWIN: an exact digital replica of something in the real world. It can refer to almost anything: an object, a process or a space, for example. Unlike other virtual representations, it also recreates the operation of the original in real time, by processing the data it receives from the original through smart sensors. As it processes more data, it is able to predict and reproduce situations by employing simulation and machine learning techniques. Applying this technology represents a major challenge within the digitalisation of processes and data management. For example, to apply it to a tourist attraction, sensors would have to be installed to obtain real-time information on the variables to be taken into account, and the system would need the capacity to manage all that information.
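To make the idea more tangible, here is a toy sketch of a digital twin for a tourist attraction that tracks how many people are inside. The door sensors, the readings and the capacity are all hypothetical, and the forecast is a deliberately naive average of recent changes, standing in for the simulation and machine learning techniques a real twin would use.

```python
class AttractionTwin:
    """Toy digital twin of a tourist attraction that tracks occupancy."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.occupancy = 0
        self.history = []  # net change in visitors for each sensor reading

    def update(self, entries: int, exits: int) -> None:
        """Apply one reading from the (hypothetical) door sensors."""
        change = entries - exits
        self.occupancy = max(0, self.occupancy + change)
        self.history.append(change)

    def predict_occupancy(self, steps_ahead: int = 1) -> int:
        """Naive forecast: assume the average of the last three changes continues."""
        if not self.history:
            return self.occupancy
        recent = self.history[-3:]
        trend = sum(recent) / len(recent)
        return max(0, round(self.occupancy + trend * steps_ahead))

    def will_exceed_capacity(self, steps_ahead: int = 1) -> bool:
        """Warn if the forecast occupancy is above the attraction's capacity."""
        return self.predict_occupancy(steps_ahead) > self.capacity


twin = AttractionTwin(capacity=100)
for entries, exits in [(40, 0), (30, 5), (25, 10)]:
    twin.update(entries, exits)

print(twin.occupancy)                # 80
print(twin.predict_occupancy(1))     # 107
print(twin.will_exceed_capacity(1))  # True
```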
- If you would like to consult the rest of the terms of this basic glossary, check out the second part of the post. It includes the following definitions:
Federated data; GIS (Geographic Information System); interoperability; IoT; KPIs; machine learning; metaverse; NFC; UNE standards; use cases; sandbox; SQL and query; token.