Experimental use case: generating a sentiment-based index to measure travellers’ opinions on tourism in Spain

In this post, we tell you how SEGITTUR has developed a methodology to create a sentiment-based index that measures the general opinion of travellers about tourism in Spain.
Building an index to measure tourism results
An index is an indicator that helps us assess how a specific area performs over time. It makes it easier to compare different elements with one another, or the same element across successive periods.
For the proposed use case, the index aims to measure travellers' satisfaction with tourism in Spain. The proposal is to build this indicator from a database of opinions. To do this, user opinions must be collected from different online channels and classified according to their sentiment: positive, negative or neutral.
The development of the index involves a process that covers everything from the validation of the approach, the collection of data and its transformation, to the construction and interpretation of the result. Therefore, it involves setting goals and requirements regarding the data, assessment criteria and the processing of data. We used the free R software to carry out these specific processes, although other types of software could be chosen.
The process is outlined schematically below and described in full in the methodology document linked at the end of this post.
Process diagram for developing a global sentiment-based index
1. Data processing
We gather opinions of Internet users about tourism in Spain. These opinions are classified by channel (blogs, social networks, news, etc.), by language and by sentiment.
It is essential to cleanse the data, removing duplicate or invalid opinions and coding the variables to facilitate analysis. For more information about how data has been cleansed in this use case, see the documentation at the end of this post.
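As an illustration of this cleansing step, here is a minimal sketch in Python (the original work used R). The field names, the rule for spotting duplicates and the numeric coding of sentiment are all assumptions made for the example, not the rules applied in the use case.

```python
from dataclasses import dataclass

# Hypothetical numeric coding of the sentiment variable (an assumption).
SENTIMENT_CODES = {"positive": 1, "neutral": 0, "negative": -1}


@dataclass(frozen=True)
class Opinion:
    text: str
    channel: str
    language: str
    sentiment: str


def clean_opinions(opinions):
    """Drop invalid and duplicate opinions, then code the sentiment variable."""
    seen = set()
    cleaned = []
    for op in opinions:
        text = op.text.strip()
        sentiment = op.sentiment.strip().lower()
        # Invalid: empty text, or a sentiment label outside the three classes.
        if not text or sentiment not in SENTIMENT_CODES:
            continue
        # Duplicate (assumed rule): same normalised text, channel and language.
        key = (text.lower(), op.channel, op.language)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "text": text,
            "channel": op.channel,
            "language": op.language,
            "sentiment": sentiment,
            "sentiment_code": SENTIMENT_CODES[sentiment],
        })
    return cleaned


# Made-up records, for illustration only.
raw = [
    Opinion("Loved the beaches in Cádiz", "blogs", "en", "Positive"),
    Opinion("loved the beaches in Cádiz ", "blogs", "en", "positive"),  # duplicate
    Opinion("", "news", "es", "neutral"),                               # empty text
    Opinion("Too crowded in August", "social", "en", "angry"),          # unknown label
    Opinion("El museo estaba bien", "social", "es", "neutral"),
]
cleaned = clean_opinions(raw)
```

In practice the definition of a duplicate or an invalid opinion depends on the data source, so these checks would need to be adapted to each channel.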
2. Preparing and weighting the strata
Opinions are grouped into "strata" or homogeneous groups, according to channel and language. The purpose of these groupings is to ensure representative samples in each stratum and to simplify the model. It may be necessary to define several levels of grouping until the strata suit both the sample and the needs of the index.
In the methodology developed, each mention is weighted twice: by the weight of its stratum in the sample and by the representativeness of that stratum in the population. This approach has two objectives: first, to ensure that the strata with more opinions have greater influence on the index and, second, to incorporate a value external to the sample, such as the population or a similar parameter. For example, if the database contained a similar number of reviews in Galician as in Castilian, the strata with reviews in Galician would be assigned a lower weight than those with reviews in Castilian, so that the index reflects how frequently each language is used in Spain.
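The double weighting can be sketched as follows. This is only one possible reading: it assumes the two weights are multiplied and then normalised to sum to 1, which the post does not state, and the function name and language shares are hypothetical.

```python
from collections import Counter


def stratum_weights(opinions, population_share):
    """Combine each stratum's share of the sample with an external share.

    opinions: list of (channel, language) pairs, one per opinion.
    population_share: assumed external weight per language, for example
        the relative frequency of use of each language.
    Returns a dict mapping stratum -> weight, normalised to sum to 1.
    """
    counts = Counter(opinions)
    total = sum(counts.values())
    # Sample weight (n / total) times external weight (assumed product form).
    raw = {
        stratum: (n / total) * population_share[stratum[1]]
        for stratum, n in counts.items()
    }
    norm = sum(raw.values())
    return {stratum: w / norm for stratum, w in raw.items()}


# Same number of opinions in Castilian ("es") and Galician ("gl") on one channel.
sample = (
    [("social", "es")] * 40
    + [("social", "gl")] * 40
    + [("blogs", "es")] * 20
)
# Illustrative figures only, not real language-use statistics.
shares = {"es": 0.9, "gl": 0.1}
weights = stratum_weights(sample, shares)
```

With these made-up figures, the Galician stratum receives a lower weight than the Castilian stratum of the same size, and the larger Castilian stratum outweighs the smaller one.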
Summary of key variables
Diagram of developed strata levels
3. Obtaining a Global Index
Each review is scored based on sentiment: 100 for positive, 50 for neutral and 0 for negative.
This score is then multiplied by the weight of the stratum to which the opinion belongs.
The sum of all these weighted scores gives us the value of the index for the period evaluated, which in our case is one month.
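The calculation can be illustrated with a short Python sketch. It assumes that the stratum weights sum to 1 and that each stratum's weight is shared equally among its opinions, so that the result stays on the 0 to 100 scale. Both assumptions, along with the stratum names and weights, are made up for the example.

```python
from collections import defaultdict

# Scores per sentiment, as given in the text.
SCORES = {"positive": 100, "neutral": 50, "negative": 0}


def global_index(opinions, weights):
    """Weighted sum of sentiment scores for one period.

    opinions: list of (stratum, sentiment) pairs.
    weights: dict mapping stratum -> weight, assumed to sum to 1.
    """
    # Number of opinions per stratum, used to share out each stratum's weight.
    sizes = defaultdict(int)
    for stratum, _ in opinions:
        sizes[stratum] += 1
    return sum(
        SCORES[sentiment] * weights[stratum] / sizes[stratum]
        for stratum, sentiment in opinions
    )


# Hypothetical opinions for one month.
month = [
    ("social_es", "positive"),
    ("social_es", "positive"),
    ("social_es", "negative"),
    ("social_es", "neutral"),
    ("blogs_es", "neutral"),
    ("blogs_es", "positive"),
]
month_weights = {"social_es": 0.75, "blogs_es": 0.25}
index_value = global_index(month, month_weights)
```

Under these assumptions the index equals the weighted average of the mean score in each stratum: 100 if every opinion is positive and 0 if every opinion is negative.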
In short, this methodology allows for the creation of an index based on online opinions, offering a clear and representative view of traveller satisfaction. In addition, it can be applied to other areas such as brand reputation or satisfaction with a specific service.
For more details on the methodology and its application, see the document attached to this post.
Experimental use case: Generating sentiment-based indices from digital listening