Experimental use case: generating a sentiment-based index to measure travellers' opinions on tourism in Spain

16-10-2024

An index is an indicator that helps assess how a specific area performs over time. It makes it easier to compare different elements with one another, or to compare the same element across periods.

For the proposed use case, the index aims to measure travellers' satisfaction with tourism in Spain. The proposal is to build the indicator from a database. This requires collecting user opinions from different online channels and classifying them by sentiment: positive, negative or neutral.

Developing the index is a process that runs from validating the approach, through collecting and transforming the data, to constructing and interpreting the result. It therefore involves setting goals and requirements for the data, the assessment criteria and the data processing. We used R, which is free software, to carry out these processes, although other software could be chosen.

The process is outlined below and described in full in the Methodology included at the end of the publication.

2. Preparing and weighting the strata

Opinions are grouped into "strata", or homogeneous groups, according to channel and language. The purpose of these groupings is to ensure a representative sample in each stratum and to simplify the model. It may be necessary to adjust the level of grouping several times until the strata suit both the sample and the needs of the index.
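To make the grouping step concrete, here is a minimal sketch in Python (not the R code used in the study). It counts opinions per stratum defined by channel and language, and flags strata that fall under a minimum size as candidates for merging into a coarser level. The field names, the sample records and the `min_size` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical records: the field names and values are illustrative only.
opinions = [
    {"channel": "forum", "language": "es", "sentiment": "positive"},
    {"channel": "forum", "language": "es", "sentiment": "negative"},
    {"channel": "forum", "language": "en", "sentiment": "neutral"},
    {"channel": "reviews", "language": "es", "sentiment": "positive"},
    {"channel": "reviews", "language": "es", "sentiment": "positive"},
    {"channel": "reviews", "language": "gl", "sentiment": "neutral"},
]


def stratum_sizes(records):
    """Count opinions in each (channel, language) stratum."""
    return Counter((r["channel"], r["language"]) for r in records)


def small_strata(sizes, min_size):
    """Return the strata with fewer than min_size opinions, sorted by name.

    These are candidates for merging with a neighbouring stratum.
    """
    return sorted(s for s, n in sizes.items() if n < min_size)


sizes = stratum_sizes(opinions)
to_merge = small_strata(sizes, min_size=2)  # assumed threshold
```

With this toy sample, the strata with a single opinion are flagged. In practice the threshold, and the decision on how to merge, would depend on the sample and on what the index needs to represent.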

In the methodology developed, each mention is weighted twice: by the weight of its stratum in the sample and by the representativeness of that stratum in the population. This approach has two objectives: first, to ensure that strata with more opinions have greater influence on the index and, second, to incorporate a value external to the sample, such as the population or a similar parameter. For example, if the database contained a similar number of reviews in Galician and in Castilian, the strata with reviews in Galician would be assigned a lower weight than those with reviews in Castilian, so that the index reflects how frequently each language is used in Spain.
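The double weighting can be sketched as follows, again in Python rather than the R used in the study. The text does not state how the two weights are combined, so this sketch assumes they are multiplied and then normalised to sum to 1. The sample counts, the population shares, the per-stratum scores and the coding of sentiment as +1, 0 and -1 are illustrative assumptions, not figures or definitions from the methodology.

```python
def double_weights(sample_counts, population_shares):
    """Combine each stratum's sample share with its population share.

    Assumption: the two weights are multiplied, then normalised to sum to 1.
    """
    total = sum(sample_counts.values())
    raw = {
        stratum: (count / total) * population_shares[stratum]
        for stratum, count in sample_counts.items()
    }
    norm = sum(raw.values())
    return {stratum: w / norm for stratum, w in raw.items()}


def weighted_index(mean_scores, weights):
    """Weighted average of the per-stratum mean sentiment scores."""
    return sum(weights[s] * mean_scores[s] for s in weights)


# Illustrative values only, not real statistics.
sample_counts = {"castilian": 500, "galician": 500}      # similar sample sizes
population_shares = {"castilian": 0.9, "galician": 0.1}  # assumed external parameter

# Assumed coding: positive = +1, neutral = 0, negative = -1, averaged per stratum.
mean_scores = {"castilian": 0.2, "galician": 0.6}

weights = double_weights(sample_counts, population_shares)
index = weighted_index(mean_scores, weights)
```

Because both strata have the same number of reviews, the sample weights are equal and the difference comes entirely from the external parameter: the Galician stratum ends up with the lower weight, as in the example above.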