Risk assessment for handling hazardous substances within the European industry: Available methodologies and research streams

Since the Seveso disaster occurred more than 40 years ago, there has been increasing awareness of the potential impacts that similar accident events can have in a wide range of process establishments, where the handling and production of hazardous substances pose a real threat to society and the environment. In these industrial sites, denominated “Seveso sites,” the urgent need emerged for an effective strategy to handle hazardous activities and to ensure safe conditions. Since then, the main research challenges have concerned how to prevent such accident events and how to mitigate their consequences, leading to the development of many risk assessment methodologies. In recent years, researchers and practitioners have proposed several reviews to provide useful overviews of the existing risk assessment methodologies. However, these reviews are not exhaustive because they are either dated or focus only on one specific topic (e.g., liquefied natural gas, domino effect, etc.). This work aims to overcome the limitations of the current reviews by providing an up-to-date and comprehensive overview of the risk assessment methodologies for handling hazardous substances within the European industry. In particular, we have focused on the current techniques for the identification of hazards and accident scenarios, as well as on probability and consequence analyses for both onshore and offshore installations. We have identified the research streams that have characterized the activities of researchers and practitioners over the years, and we have then presented and discussed the different risk assessment methodologies available according to the research stream that they belong to.


INTRODUCTION
categorized in lower- and upper-tier establishments) (Paltrinieri & Reniers, 2017). The directive defines hazardous substances and quantity thresholds for lower and upper tiers (European Parliament and Council, 2012). But how did we arrive at the definition of a directive for the regulation of major chemical accident hazards at the EU level? The catastrophic accident that occurred in a chemical plant located in Seveso (Italy) in 1976 led to the urgent adoption of legislation on the prevention and control of such accidents. Therefore, the first industrial safety Directive, the so-called Seveso Directive (Directive 82/501/EEC), was adopted in 1982 and subsequently replaced first by Seveso-II (Directive 96/82/EC) and then, in 2012, by the current Seveso-III (Directive 2012/18/EU), with the aim of providing a proper regulatory framework based on strategic process safety and risk management in light of the lessons learned from accidents that have occurred over time. It is worth mentioning that the Seveso Directive is implemented and adopted through national legislation; nevertheless, each EU Member State may maintain or develop its own regulations to strengthen its application. Pey et al. (2009) and Laurent et al. (2021) investigated the differences in how the regulations are applied, highlighting the nonuniform approach to defining and assessing major hazards (methodologies, inspection practices, etc.) as well as the lack of a coherent definition of thresholds for risk and/or consequence tolerance or acceptance. Therefore, efforts should be directed at harmonizing legislation to ensure a uniform Europe-wide standard.
Whereas the Seveso Directive deals specifically with onshore major accident hazards involving dangerous substances, the safety of the offshore oil and gas industry is regulated by parallel directives, such as the one establishing minimum requirements for preventing major accidents in offshore oil and gas operations and limiting the consequences of such accidents (European Council, 2013). The likelihood and the consequences of accidents can be reduced by appropriate risk management processes. As reported in ISO 31000:2018 (International Standard Organization (ISO), 2018), a risk management process "involves the systematic application of policies, procedures, and practices to the activities of communicating and consulting, establishing the context and assessing, treating, monitoring, reviewing, recording and reporting risk." Hence, a risk management process should dynamically pursue the aims of (i) identifying, analyzing, and assessing potential hazards in a system or related to an activity, and (ii) defining and introducing control measures to eliminate or reduce potential risks to people, assets, or the environment (Rausand & Haugen, 2020). This process comprises four main phases, that is, risk assessment, risk treatment, risk communication, and risk monitoring and review. Specifically, the risk assessment phase is fundamental for the prevention of accident events and the mitigation of their consequences (Dunjó et al., 2010). The classic "triplet definition of risk" by Kaplan and Garrick (1981) states that risk can be expressed by what can go wrong (scenario), how likely it is to happen (probability), and how severe the consequences will be (consequence).
Thus, with "risk assessment," we also refer to the techniques for the identification of hazards and accident scenarios (e.g., hazard and operability study [HAZOP], failure mode and effect analysis [FMEA], or layers of protection analysis [LOPA]) and for probability and consequence analyses, considering the term from a synecdochic perspective.
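The triplet definition recalled above can be illustrated with a small data structure. The following is a minimal sketch, not taken from Kaplan and Garrick (1981): the class name, the scenario names, and all numerical values are hypothetical, and the expected-loss aggregate is only one possible summary of a set of triplets rather than part of the definition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskTriplet:
    scenario: str       # what can go wrong
    probability: float  # how likely it is (hypothetical, per year)
    consequence: float  # how severe it is (hypothetical loss units)


def expected_loss(triplets):
    """Sum of probability * consequence over all scenarios (one possible aggregate)."""
    return sum(t.probability * t.consequence for t in triplets)


# Hypothetical set of triplets describing the risk of an installation.
risk = [
    RiskTriplet("toxic release", 1e-4, 500.0),
    RiskTriplet("pool fire", 1e-3, 40.0),
]

print(expected_loss(risk))
```

Keeping the full list of triplets, rather than only the aggregate, preserves the information on which scenarios dominate the risk.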
To tackle these problems, a multidisciplinary and interdisciplinary approach is required; thus, methods and approaches are combined with knowledge from different disciplines and fields such as engineering, statistics, psychology, social sciences, medicine, and so forth (Society for Risk Analysis (SRA), 2018b). Hence, risk assessment is considered a scientific field based on the combination of these methods and knowledge to: (i) develop risk principles and research aiming at conceptualizing, assessing, managing, communicating, and governing risk (Aven, 2018a; Society for Risk Analysis (SRA), 2018a) and (ii) study and treat the risk of specific activities (the design and operations of industrial sites, investment, natural phenomena, etc.) aiming at preventing accident events and mitigating their consequences (Christou & Papadakis, 1998; Zio, 2007). Indeed, efforts have been made to define control measures that reduce the likelihood of undesired outcomes through the development of proper regulatory environments within which all stakeholders make a significant contribution to the mitigation and control of the risks related to industrial accidents (Christou & Papadakis, 1998). Hence, due to its great importance, the interest in risk assessment has increased exponentially since the first studies in the early 1970s (Pasman & Reniers, 2014), leading to the development of such a high number of methodologies that researchers and practitioners have tried to review them to provide useful overviews and classifications. Siu (1994), for example, provided an overview of the risk assessment methodologies related to dynamic systems, whereas Khan and Abbasi (1998) presented a state-of-the-art review of risk assessment methodologies applied to chemical process industries. Tixier et al. (2002) then identified 62 different risk assessment methodologies for industrial plants and grouped them into three different categories, that is, identification, evaluation, and hierarchization. The work of Marhavilas et al. (2011) had a similar scope; they aimed to classify, categorize, and analyze the main risk assessment methodologies published in the period 2000-2009. Pitblado and Woodward (2011) and Animah and Shafiee (2020) then focused on risk assessment methodologies for liquefied natural gas (LNG). The former investigated the lessons learned and historical progress, the different prediction methods adopted, and the technical issues still unresolved, whereas the latter proposed a categorization of the state-of-the-art publications on the topic. Moreover, Reniers and Cozzani (2013) and Necci et al. (2015) provided critical reviews on quantitative risk assessment (QRA) methodologies focusing on the domino effect in chemical and energy industrial sectors. Similarly, Villa et al. (2016) also provided an overview of the risk assessment methodologies related to the chemical process industry, but they focused on dynamic methodologies.
However, although the just-mentioned reviews significantly contribute to providing an overview of the risk assessment methodologies available, they cannot be considered thoroughly exhaustive because they are either dated (Khan & Abbasi, 1998; Marhavilas et al., 2011; Siu, 1994; Tixier et al., 2002) or limited in scope because they focus only on one specific topic (e.g., LNG, domino effect, etc.) (Animah & Shafiee, 2020; Necci et al., 2015; Pitblado & Woodward, 2011; Reniers & Cozzani, 2013; Villa et al., 2016). There is, hence, the need for an up-to-date work that does not focus only on a specific topic but provides a comprehensive and complete overview of the risk assessment methodologies for EU industrial sites. The aim is also to aid researchers and practitioners in making more effective knowledge-based decisions by reviewing trends and advances in risk assessment methodologies and by highlighting the main challenges in developing effective prevention strategies. In this work, we aim to fulfil this need. In particular, thanks to a novel and automated unsupervised machine learning-based clustering technique followed by a manual cleansing, we were able to identify five main research streams that have characterized the activities of researchers and practitioners over the years. We have presented and discussed the different risk assessment methodologies available in relation to the research stream that they belong to. The five research streams identified are (i) risk assessment methods for Seveso sites, (ii) bow-tie diagrams and safety barriers, (iii) process safety management, (iv) data-based risk assessment, and (v) health and environmental analysis. In each of them, we have discussed the different risk assessment methodologies proposed and studied by researchers and practitioners, their evolutions and modifications over the years, and their potential future developments.
The article is organized as follows: Section 2 presents the methodology adopted, whereas Section 3 illustrates the five main topics identified and summarizes the works that constitute these topics. Section 4 then deals with the discussions, whereas in Section 5, conclusions and future research perspectives are depicted.

METHODOLOGY
To develop the up-to-date and comprehensive overview of the risk assessment methodologies for Seveso sites that represents the main goal of this work, we have carried out an extensive literature analysis. Moreover, we have decided to identify the research streams that have characterized the activities of researchers and practitioners over the years and to present and discuss the different risk assessment methodologies available according to the research stream that they belong to. To do so, we have leveraged an automated clustering followed by a manual cleansing of the results of the literature analysis. The use of an automated clustering based on unsupervised machine learning techniques represents a novel approach, and it allows us to reduce the inevitable bias of literature review works. Such bias, however, cannot be fully avoided, and manual cleansing of the obtained clusters is also required. The adoption of machine learning algorithms in systematic reviews can be considered a valuable tool to assist researchers in the screening of the set of collected literature. To the best of the authors' knowledge, such techniques have not yet been applied to literature reviews on risk assessment, despite their increasingly frequent use elsewhere. In recent years, several studies have explored automated classification techniques to assess the relevance of the literature on specific topics. For instance, automated classification has been applied in the medical (Cohen et al., 2006; Timsina et al., 2016) and biomedical (Wallace et al., 2010) sectors, cloud manufacturing (Ellwein et al., 2020; Lolli et al., 2022), food safety (van den Bulk et al., 2022), and product returns (Duong et al., 2022). Recently, several authors presented different methodologies (Jaspers et al., 2018; Weißer et al., 2020), optimization techniques (Adinugroho et al., 2017), and frameworks (Simonetto et al., 2022; Tauchert et al., 2020) aiming at performing an objective, transparent, and reproducible document-based search and at enhancing the quality of the literature review.

TABLE 1 Search keywords used in the systematic literature review

Group A: "risk assessment", "risk analysis"
Group B: "Seveso", "offshore directive", "industr*"
In the following, we will provide the details of the extensive literature analysis (Section 2.1) and of how the results of the literature analysis have been analyzed (Section 2.2), that is, automated clustering (Section 2.2.1) and manual cleansing (Section 2.2.2).

Literature analysis
The proposed literature analysis is performed according to the principles of systematic reviews as described by Tranfield et al. (2003), with the purpose of ensuring proper reproducibility and improving reporting quality. The basic idea is to develop a methodical approach involving the following steps: (i) document identification through database searching, (ii) automated clustering of the selected papers in accordance with their representativity and suitability for the purpose of the study, and (iii) manual screening and eligibility assessment of the clustered papers, aiming at excluding the ones deemed irrelevant or at reassigning them to more adequate clusters after the full-text reading by the authors of this study. The last two steps are presented in Section 2.2. The first step consists of document identification through the definition of the specific keywords employed for paper selection, as reported in Table 1. The procedure adopts a two-level keyword structure to coherently cover the published works related to the topic under investigation. This structure, based on a combination of keywords, allows a broad set of search terms to be collected: Group A contains the main keywords that delineate the core topic of the review, that is, "risk assessment" or "risk analysis," whereas Group B contains the subordinate keywords that explicitly define the search scope. In this case, the keywords "Seveso" and "offshore directive" are used to include regulations for both onshore and offshore facilities, whereas "industr*" is employed to focus the search on the industrial domain. The logical operators "AND" and "OR" are then applied to generate Boolean keyword combinations of the form "(keyword of Group A) AND (keyword of Group B OR another keyword of Group B)," as reported in Hosseini et al. (2019).
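The Boolean pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the exact strings submitted to the database are not reported in the text, and enumerating every pair of Group B keywords, as well as the function name `build_queries`, is a hypothetical choice.

```python
from itertools import combinations

# Keywords as listed in Table 1.
GROUP_A = ['"risk assessment"', '"risk analysis"']
GROUP_B = ['"Seveso"', '"offshore directive"', '"industr*"']


def build_queries(group_a, group_b):
    """Build strings of the form (A) AND (B1 OR B2) for every A and every pair of B keywords."""
    queries = []
    for a in group_a:
        for b1, b2 in combinations(group_b, 2):
            queries.append(f"({a}) AND ({b1} OR {b2})")
    return queries


queries = build_queries(GROUP_A, GROUP_B)
for q in queries:
    print(q)
```

With two Group A keywords and three Group B keywords, this assumption yields six combinations.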
Elsevier's Scopus database was used for the literature analysis, and the search was performed automatically. The following restrictive criteria were then set for papers to be included in this study:
• search string limited to "Title, Abstract and Keywords";
• papers limited to journal articles, reviews, conference papers, and book chapters (technical reports, theses, etc., were excluded);
• papers published in the English language;
• papers were only considered once.
Hence, a total of 559 articles were obtained from the identification step of the proposed literature analysis. Then, starting from this start set, the goal is to classify and investigate the main research streams in the field of risk assessment and analysis aiming at specifying the contributions and at identifying gaps and future research directions. To do so, a subject matter-oriented categorization procedure performed through an automated unsupervised machine learning-based clustering technique (step 2) as well as the inclusion/exclusion criteria introduced for the manual cleansing (step 3) are presented in the following section. Finally, the achieved results of the literature analysis were then analyzed by considering the final set of documents and the related clusters as described in Section 3.

Automated clustering
The second step involves the classification of the papers collected in the first step to facilitate and enhance the detection of the main research streams that have characterized the activities of researchers and practitioners over the years. To do so, the start set resulting from the literature analysis was analyzed by means of the Orange software for machine learning (Demšar et al., 2013). Only the article titles and abstracts were considered as a corpus of text documents to preprocess. The keywords were excluded because they are not consistently reported throughout the articles. Preprocessing of the corpus was crucial for achieving better-quality results (Cantini et al., 2022; Denny & Spirling, 2017; Simonetto et al., 2021; Uysal & Gunal, 2014). It included:
• transformation to lower case, removal of accents, detection of HTML tags to parse out text only, and removal of URLs;
• tokenization to break the text into smaller components;
• normalization of words through stemming and lemmatization;
• filtering to remove a selection of stop-words.
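The preprocessing steps listed above can be sketched with the Python standard library. This is not the Orange pipeline used in the study: the stop-word list is a hypothetical sample, and `naive_stem` is a deliberately crude stand-in for a real stemmer or lemmatizer.

```python
import re
import unicodedata

# Hypothetical, tiny stop-word list; the selection used in the study is not reported.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "are"}


def strip_accents(text):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token):
    """Crude plural stripping, used only as a placeholder for stemming/lemmatization."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "is")):
        return token[:-1]
    return token


def preprocess(text):
    text = strip_accents(text.lower())                   # lower case, no accents
    text = re.sub(r"<[^>]+>", " ", text)                 # drop HTML tags, keep text
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop URLs
    tokens = re.findall(r"[a-z]+", text)                 # tokenization
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


sample = "<p>Domino Effects in Seveso sites: see https://example.org/x for détails</p>"
print(preprocess(sample))
```

HTML tags are removed before URLs so that a URL inside a tag attribute does not swallow the visible text that follows it.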
In addition, a word count for each article (considering only title and abstract) was carried out. In this way, each article was characterized by a vector whose value in each dimension corresponds to the number of times the associated term appears in the article (Singhal, 2001). The cosine distance (i.e., one minus the cosine of the angle between two vectors of an inner product space) gives a useful measure of how different two documents are likely to be in terms of their subject matter. For this reason, a matrix of distances between all the articles in the corpus was computed. The matrix of distances allowed hierarchical clustering to be performed. Hierarchical clustering is an unsupervised technique for exploratory data analysis, which seeks to build a hierarchy of clusters. It produces a binary merge tree, starting from the leaves (the articles) and proceeding by merging two by two the "closest" subsets (stored at nodes) until the root of the tree containing all the elements is reached (Nielsen, 2016). The graphical representation of this binary merge tree is a dendrogram. The automated hierarchical clustering resulted in seven different clusters (Figure 1). The results of the hierarchical clustering were also validated by two other techniques, that is, Louvain clustering and K-means clustering. The algorithm of Louvain clustering provided by Orange is based on the Louvain method for community detection, which is a method to extract communities from large networks (Blondel et al., 2008; Lambiotte et al., 2008). K-means clustering is an unsupervised learning algorithm that partitions a given data set into a number of clusters, denoted by the letter "K," which is fixed beforehand (Demšar et al., 2013). Orange runs the K-means clustering algorithm for several values of "K" and selects the one with the highest silhouette. The silhouette value contrasts the average distance to elements in the same cluster with the average distance to elements in other clusters.
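The chain from word counts to a merge tree can be sketched as follows. This is a minimal pure-Python illustration, not the Orange implementation: the four token lists are invented, and average linkage is an assumption, since the linkage criterion used in the study is not stated.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """One minus the cosine of the angle between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def agglomerative(dist, n_clusters):
    """Bottom-up merging of the two closest clusters (average linkage, an assumption)."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical preprocessed articles (token lists from title and abstract).
docs = [
    ["domino", "effect", "tank", "fire"],
    ["domino", "effect", "escalation", "fire"],
    ["land", "use", "planning", "risk"],
    ["land", "use", "planning", "zone"],
]
vectors = [Counter(tokens) for tokens in docs]
dist = distance_matrix(vectors)
clusters = agglomerative(dist, n_clusters=2)
print(clusters)
```

Recording the sequence of merges instead of stopping at a fixed number of clusters would give the binary merge tree that a dendrogram displays.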
As a result of this validation, the Louvain clustering confirmed the optimal number of seven clusters, whereas K-means clustering provided the silhouette values in Table 2, showing a high silhouette value for the same configuration (i.e., for a number of clusters equal to seven). The K-means analysis was carried out considering only values between five and eight for the parameter "K" to avoid clusters that are excessively large (or small) and might not be representative for the purpose of this study.
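The silhouette criterion used for this validation can be written out explicitly. The sketch below uses the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the same cluster and b is the lowest mean distance to any other cluster; the distance matrix and the two candidate labelings are hypothetical, and at least two clusters are assumed.

```python
def silhouette_values(dist, labels):
    """Per-item silhouette from a distance matrix and a list of cluster labels."""
    n = len(labels)
    cluster_ids = set(labels)
    values = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in cluster_ids
            if c != labels[i]
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(dist, labels):
    values = silhouette_values(dist, labels)
    return sum(values) / len(values)


# Hypothetical distances between four articles.
DIST = [
    [0.0, 0.25, 1.0, 1.0],
    [0.25, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.25],
    [1.0, 1.0, 0.25, 0.0],
]
candidates = {"good": [0, 0, 1, 1], "bad": [0, 1, 0, 1]}
best = max(candidates, key=lambda name: mean_silhouette(DIST, candidates[name]))
print(best, mean_silhouette(DIST, candidates[best]))
```

The same comparison, repeated over labelings obtained for different values of "K," is what selecting the configuration with the highest silhouette amounts to.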

Manual cleansing
Once the number of clusters was validated, the third step of the search-based approach involved the manual cleansing of the obtained clusters. An overall research stream is identified for each cluster through a screening procedure based on the full-text reading of the articles included. Moreover, documents that proved not to fit their preassigned clusters are reassigned to other clusters, whereas the ones deemed irrelevant or outside our scope are excluded from the study. Concerning the exclusion criteria, the papers were read by all the authors to confirm adequacy with the topics, and any discrepancies or conflicts were resolved by discussion and final consensus among all the authors. The silhouette values associated with each article (measuring how similar the article is to its own cluster compared to other clusters) are considered both for defining the overall research stream (high-silhouette articles generally provide a clear outline of the cluster topic) and for reassigning or excluding articles. If required, this substep may also lead to a slight rearrangement of the cluster configuration. This substep must be carried out manually, as the automated process cannot avoid misinterpretation of article topics. After this refinement, a total of 200 documents were retained. The reassignment and exclusion actions led to a reduction of the number of clusters from seven to five. One cluster (cluster C1 in Figure 1) was, in fact, not considered relevant for this study, as it mainly focused on laboratory safety in the case of exposure to toxic substances, whereas two clusters (clusters C2 and C5 in Figure 1) were merged due to the vicinity of their topics. The different steps involved in this literature analysis process are reported in Figure 2. Finally, the research streams corresponding to each cluster are the following:
1. risk assessment methods for Seveso sites (deriving from clusters C2 and C5 in Figure 1);
2. bow-tie diagrams and safety barriers (deriving from cluster C3 in Figure 1);
3. process safety management (deriving from cluster C4 in Figure 1);
4. data-based risk assessment (deriving from cluster C6 in Figure 1);
5. health and environmental analysis (deriving from cluster C7 in Figure 1).

FIGURE 2 Schematic of the literature analysis process
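The cluster-level part of this cleansing (excluding C1 and merging C2 with C5) can be expressed as a simple mapping. The sketch below is illustrative only: the article identifiers are hypothetical, and the article-by-article reassignments and exclusions decided after full-text reading are not modeled.

```python
# Mapping from the seven automated clusters to the five final research streams.
STREAM_OF_CLUSTER = {
    "C1": None,  # excluded: laboratory safety, outside the scope of the study
    "C2": "risk assessment methods for Seveso sites",
    "C3": "bow-tie diagrams and safety barriers",
    "C4": "process safety management",
    "C5": "risk assessment methods for Seveso sites",
    "C6": "data-based risk assessment",
    "C7": "health and environmental analysis",
}


def final_streams(assignments):
    """Group article ids by research stream, dropping articles of excluded clusters."""
    streams = {}
    for article, cluster in assignments.items():
        stream = STREAM_OF_CLUSTER[cluster]
        if stream is not None:
            streams.setdefault(stream, []).append(article)
    return streams


# Hypothetical article-to-cluster assignments produced by the automated step.
assignments = {"A1": "C1", "A2": "C2", "A3": "C5", "A4": "C3", "A5": "C7"}
streams = final_streams(assignments)
print(streams)
```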
In the following, we will describe separately each of these research streams, discussing the related risk assessment methodologies proposed and studied by researchers and practitioners, their evolutions and modifications over the years, and their potential future developments.

CLUSTERS ANALYSIS AND DISCUSSION
As said, the final set of documents achieved by the systematic literature review was grouped in different topic clusters aiming at providing comprehensive coverage of the specific subjects analyzed. In the following subsections, we will focus on the content analysis of the most relevant selected papers drawing a clear picture of the research domain with the purpose of providing an integrated and synthesized overview of the current state of knowledge. Nevertheless, a summary of the collected studies proposed in this literature review is given in Appendix A. Hence, the main research findings in the field of risk assessment and analysis were discussed for each cluster highlighting the different methodologies and approaches adopted as well as the recent trends and future perspectives. Moreover, the word clouds for each cluster have been developed to depict immediate insights into the most prominent themes on which the selected papers are focused.
This word-frequency analysis displays the diverse research concerns characterizing the topic clusters because the size of each word in the cloud is proportional to its probability within the topic. Finally, the distribution of the collected papers has been provided according to (i) a temporal axis, (ii) citations and journal perspective, and (iii) type of articles published (original research, case study, or review article). Figure 3 exhibits the temporal distribution of published papers from 1984 to 2020 (up to May 2020), the percentage over time, and the cumulative trend, aiming at providing an index of the popularity and the relevance of risk assessment in Seveso sites. In this figure, two different trends can be distinguished. It can be observed that the trend of publications gradually increases during the years from 1984 [...]. On the other hand, the collected documents are distributed over a total of 52 different journals, 4 books, and 12 conference proceedings. Table 3 summarizes the top 15 sources, reported by both number of documents and percentage of the total collected. It is worth mentioning that the risk assessment topic in Seveso sites seems to be discussed mainly in high-quality peer-reviewed journals rather than in conference proceedings. As can be seen, Journal of Loss Prevention in the Process Industries and Journal of Hazardous Materials are the leading journals in this field, with a total of 60 documents and 2476 citations, corresponding to 30% of the collected dataset and approximately 58% of the total number of citations, respectively. In Table 4, the general structure of the citations for the entire dataset and each specific cluster has been reported. This structure shows that fewer than 4% of the collected documents have at least 100 citations, whereas most of them fall within the class characterized by fewer than 20 citations.
Concerning the different clusters, cluster C3, that is, bow-tie diagrams and safety barriers, and cluster C6, that is, data-based risk assessment, are the ones that show the highest percentage of documents that have received at least 100 citations, whereas cluster C7, that is, health and environmental analysis, only covers the classes referring to the documents that have received fewer than 50 citations. Finally, Tables 5 and 6 report a classification of the basic structure and the individual characteristics of the collected documents by journal and by defined cluster, respectively. The number of documents published in each journal and cluster has been reported, and four different categories have been defined according to their type of content: framework, mathematical model, review, or descriptive analysis. In these tables, books have not been included. The descriptive analysis category comprises all the documents that cannot be included in the other categories but deal with surveys, managerial and organizational aspects, qualitative analysis, human factors in risk assessment, comparative evaluation of regulations, and so on. It is worth mentioning that a large portion of the studies covered more than one type of content, for example, both mathematical models and case studies. Table 5 shows that the Journal of Loss Prevention in the Process Industries and the Journal of Hazardous Materials are the leading journals in the field examined. It can be observed that, among the selected categories, mathematical models and case studies are the most frequently addressed. Indeed, most international regulations require the adoption of QRA techniques to support the decision-making process for industrial sites where failures could have catastrophic consequences.
Therefore, over the years, various methodologies based on mathematical and computational models have been developed and applied by researchers and practitioners, and they now play a key role in the field of risk analysis for identifying and quantifying potential accident probabilities and consequences.
The same considerations can be made by analyzing the results reported in Table 6, where the characteristics of the collected documents are categorized within the defined clusters. It is worth mentioning that mathematical models are the most frequently discussed topic for four clusters out of five (bow-tie diagrams and safety barriers, process safety management, data-based risk assessment, and health and environmental analysis), whereas for the risk assessment methods for Seveso sites cluster, most of the documents focus on case studies. However, one notable finding is the limited number of literature reviews; in particular, no specific review was found concerning bow-tie diagrams and safety barriers.

Risk assessment methods for Seveso sites
The cluster "Risk assessment methods for Seveso sites" consists of 68 articles, namely, 67 journal papers and 1 conference proceedings. By analyzing the word cloud reported in Figure 4, the cluster has a focus on risk assessment methodologies for Seveso industrial sites and specifically focuses on major accident scenarios. However, different subdomain topics have been identified and analyzed in the following. Indeed, research streams based on QRA techniques, land-use planning (LUP) policy, and domino effects provide a high contribution emerging in terms of popularity and importance within this cluster.
This cluster represents the core of the current study, and as described in the introduction and confirmed by the high number of review studies herein included (Li et al., 2017; Pasman et al., 2017; Tixier et al., 2002), it has also represented a major area of interest for other researchers and practitioners.
At first, to a large extent, the ideas and principles of risk assessment and management methodologies were rooted in the nuclear sector (Pasman & Reniers, 2014), where quantified risk analysis was conceptualized. However, over time, they emerged as a suitable solution for managing and controlling major hazards; thus, increasingly effective practices, methods, and techniques have been developed, covering most of the industrial, economic, and societal fields. This is highlighted by Campos , who compared how risk is managed across the chemical and the nuclear industrial sectors in Italy, and recalled by Pasman and Reniers (2014) and Taveau (2010), who described how the lessons learned from the nuclear industry are nowadays employed within the chemical industry. Most of these lessons learned have been transferred to the Seveso regulations and policies, which were discussed by several authors from the points of view of application, national specificities, gaps, improvement, and validation by the regulator (Bottelberghs, 2000; Hawksley, 1992; Lindhout & Reniers, 2017; Naime & Andrey, 2013; Pasman et al., 2009; Siddiqui et al., 2012). For example, Fabbri and Contini (2009) analyzed the inspection criteria and practices adopted by the different national authorities for the implementation of the Seveso Directive. Over the years, researchers and practitioners have widely focused on QRA, either providing frameworks that assist the risk assessment process or developing new methodologies and adapting existing ones. Dealing with the former, Filippin and Dreher (2004) developed a framework that supports the integration of risk assessment methodologies with normal business management processes. Arunraj and Maiti (2009) and Basheer et al. (2019) then provided a template and a framework for QRA of Seveso sites by estimating and aggregating the major losses. Another consistent framework is illustrated in , where both process risks from hazards and operator risks are taken into consideration with the aim of optimizing risk management. These works identified the definition of the accident scenarios and the adoption of proper mitigation measures as key aspects of a correct risk assessment. Specifically, once the plausible threats and their impacts/effects are identified, the system's weaknesses need to be determined in order to select the proper mitigation measures. Einarsson and Rausand (1998) proposed a framework for vulnerability analysis of complex industrial systems.

FIGURE 4 Word cloud of "Risk assessment methods for Seveso sites" cluster

Health and environmental analysis 24 7 7 6 5 4
a The number of documents reported in this column might be lower than the sum of the other columns as some works deal with more than one content.
It is worth mentioning that the vulnerability of industrial sites is highly influenced by the occurrence of extreme weather events (Cruz & Krausmann, 2013; Nivolianitou et al., 2004). Regarding the development of new risk assessment methodologies and/or the adaptation of existing ones, Felegeanu et al. (2016) combined the strengths of already existing methods (Mosar, ARAMIS, Checklist, Octave, and Mehari) to develop a new risk assessment methodology called combined analysis risk method and industries security/dangerous substances (CARMIS/DS), which aims to determine, both quantitatively and qualitatively, the risk or safety level of the installations/technologies used in the manufacturing process. Planas et al. (2006) then presented a methodology to develop a specific risk severity index. Furthermore,  proposed a quantitative methodology called ExSys-LOPA for the risk assessment of a conversion unit within a petroleum refinery, which was developed by combining experts' knowledge in accident scenario identification with layer of protection analysis (LOPA). Qureshi (1988) then adopted the HAZOP methodology, similar to Bernatik and Libisova (2004) and Fuentes-Bargues et al. (2017), who presented the integration of HAZOP and fault tree analysis for the unloading/storage of dangerous substances in maritime port facilities and for gasholders in industrial facilities, respectively. Antonioni et al. (2009) then proposed a QRA methodology focusing on the domino effect and aimed at avoiding escalation hazards in the chemical and process industry. Also within QRA, Bonvicini et al. (2012) developed a methodology based on cutoff criteria that allows QRA to account for the offsite population by evaluating the probability of its presence in potential impact areas when a major accident occurs. D'alessandro et al. (2016) then presented a QRA methodology consisting of hazardous zones definition, loss-of-containment evaluation, and vulnerability models for an LNG facility, whereas Hatzisymeon et al. (2019) applied QRA based on operational risk management (ORM) to the life cycle of biodiesel production from used cooking oil by investigating its impact on the entire supply chain. Furthermore, Landucci et al. (2017) studied QRA in the field of hazardous materials transportation, focusing on the analysis of current procedures and tools as well as on the different methodologies available. Krausmann et al. (2011) and Gheorghiu et al. (2014) studied QRA for the assessment of the risks related to another type of threat for Seveso sites, that is, threats deriving from natural agents (a.k.a. Natech events); the former provided an innovative nine-step approach framed into QRA, whereas the latter compared the Individual Risk and Societal Risk obtained for conventional technological risk and for Natech risk, focusing on specific events such as an earthquake affecting two petroleum product tanks. Moreno et al. (2019) then applied a QRA based on a consequence-based approach to a biomass power plant, focusing on the Biofine process to identify relevant accident scenarios (RAS). Furthermore, Cozzani and Zanelli (2015) discussed the available tools for QRA and the related open problems, focusing on the quantitative area risk analysis (QARA) techniques for LUP, which has been a theme of high research interest in the last few years. LUP allows the interaction between sources of risk and vulnerabilities in surrounding areas to be addressed. Christou et al. (2011), in fact, discussing the activities of the European Working Group operating under the coordination of the European Commission's Joint Research Centre, emphasized the importance of LUP to guarantee a high level of security and well-being to the population.
Moreover, the authors provided an overview of methodologies, tools, and approaches as guidance to support EU Member States. As already mentioned, a significant amount of research on LUP has been undertaken over the last decades due to its strategic importance in accident prevention. Some frameworks have also been developed in this perspective (Hauptmanns, 2005), but most of the works deal with the development of new methodologies. Ale et al. (1999), for example, presented a methodology based on performance criteria to evaluate the factors influencing the societal risks when a new LUP is implemented, whereas Laheij et al. (2000) dealt with the societal risks by adopting a distance density figures (DDFs) approach. The methodology proposed by Ma et al. (2015) was based on the ARAMIS project and consisted of the simultaneous analysis of the severity of the major accidents and the vulnerability of the surrounding environment; hence, both severity and vulnerability indexes were used in the LUP practice. Török et al. (2011) focused instead on the safety distance in LUP when fires and flammable hazardous materials are involved, whereas Papazoglou et al. (2000) presented a multicriteria methodology for supporting the decision-making process in LUP. The geographic information system (GIS) tool, in particular, has been widely used in LUP. For example, a GIS-based maps threat analysis was used by  to integrate the consequence analysis, the results of QRA, and the environmental vulnerability analysis, whereas Török et al. (2020) combined the GIS tool with consequence and risk modeling software to include territorial compatibility assessments in the analysis of accident scenarios. Moreover, another topic that has been widely investigated in connection with LUP is the domino effect.
Khakzad and Reniers (2015), for example, aiming to optimize the plant layout around a major hazard installation, considered the impact of domino effects on the LUP requirements through a Bayesian network methodology. The Seveso II [96/82/EC] and III [2012/18/EU] Directives, in fact, place particular emphasis on domino effects by pointing out the need to identify which establishments are potentially affected by scenarios where the consequences of a major accident may be increased because of them. However, addressing domino effects constitutes a significant challenge to risk management due to the complexity of predicting and modeling the evolution of the accident scenario (Reniers & Cozzani, 2013). A number of studies addressing this subject are available in the literature. Reniers et al. (2005b) and Antonioni et al. (2009) analyzed the domino effect framed into QRA. The former proposed an approach based on damage probability estimation, vulnerability assessment, and risk recomposition, whereas the latter provided an outlook on the current practices focusing on the chemical industry. Cozzani et al. (2006) then investigated the equipment damage models to identify the proper criteria characterizing the escalation of accident scenarios. Moreover, different frameworks have been developed to properly deal with domino effects. Reniers et al. (2005a), for example, proposed an external domino accident prevention framework by combining risk analysis with risk evaluation, whereas Ghasemi and Nourai (2017) referred to the spacing optimization in LUP analysis by focusing on thermal radiation accidents in the presence of storage tanks. Furthermore, Jia et al. (2017) developed a five-level hierarchical framework in which the domino effect is analyzed by considering the equipment vulnerability assessment approach. Further works on domino effects can be found in the comprehensive review presented by Necci et al. (2015).
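
To make the idea of damage probability estimation for escalation more concrete, the following minimal Python sketch assumes a probit-type equipment damage model, which is one form such models can take. The coefficients `A` and `B`, the function names, and the use of a time to failure in minutes as the dose variable are all assumptions made for illustration; they are not taken from the works cited above.

```python
from math import log
from statistics import NormalDist

# Placeholder probit coefficients (illustrative assumptions only,
# not values reported in the cited studies).
A = 12.5
B = -1.85


def probit_value(time_to_failure_min: float) -> float:
    """Probit value Y = A + B * ln(ttf) for a target unit exposed to fire."""
    return A + B * log(time_to_failure_min)


def escalation_probability(time_to_failure_min: float) -> float:
    """Convert the probit value into a damage probability: P = Phi(Y - 5)."""
    return NormalDist().cdf(probit_value(time_to_failure_min) - 5.0)


def domino_frequency(primary_freq_per_year: float, time_to_failure_min: float) -> float:
    """Frequency of the secondary scenario = primary frequency x escalation probability."""
    return primary_freq_per_year * escalation_probability(time_to_failure_min)


# Hypothetical case: primary fire at 1e-4 per year, target fails after about 20 min.
secondary_freq = domino_frequency(1e-4, 20.0)
```

Under these assumptions, a shorter time to failure gives a higher escalation probability, which is the qualitative behaviour an escalation criterion is expected to capture.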

Bow-tie diagrams and safety barriers
FIGURE 5 Word cloud of "Bow-tie diagrams and safety barriers" cluster

The "Bow-tie diagrams and safety barriers" cluster consists of 26 articles, of which 15 are journal papers and 11 are conference papers. From the analysis of the word cloud reported in Figure 5, it can be seen that these documents deal with risk analysis (which, together with risk identification, is a part of risk assessment). Within risk analysis, bow-tie diagrams and safety barriers emerge as very important; thus, this cluster reports their implementation, focusing on their strengths and limitations within the risk assessment. Their importance is due to the fact that they represent the two main features of the ARAMIS methodology. The ARAMIS methodology was developed from 2002 to 2005 to answer the growing interest of industries in a clear methodology for risk analysis, and since then, it has become the "supportive tool to speed up the harmonized implementation of SEVESO II Directive in Europe" (Salvi & Debray, 2006). The ARAMIS methodology consists of six main steps (identification of major accident hazards; identification of the safety barriers; assessment of their performances; evaluation of safety management efficiency to barrier reliability; identification of reference accident scenarios; assessment and mapping of the risk severity of reference scenarios and of the vulnerability of the plant surroundings), with bow-tie diagrams belonging to the first step and safety barriers to the second and third steps. Bow-tie diagrams are used in the identification of major accident hazards, and they are centered on a critical event with the causes leading to the critical event on the left (i.e., fault tree) and the consequences of the critical event on the right (i.e., event tree) (De Dianous & Fiévez, 2006).
Moreover, bow-tie diagrams also capture the influence of safety systems. Gowland (2006) reported that the identification of major accident hazards can also be carried out in combination with other methods, such as HAZOP study, "What if," and FMEA. In particular, Afefy (2015) was the first to integrate FMEA with bow tie and to apply it in the process industry (Emisal company in Fayoum city), showing the great potential of this approach. Similarly, Delvosalle et al. (2006) and Tugnoli et al. (2013) applied bow-tie diagrams to two complementary approaches for the identification of a reference accident scenario, that is, the methodology for the identification of reference accident scenarios (MIRAS) and the methodology for the identification of major accident hazards (MIMAH). It is worth mentioning that Wilday et al. (2011) presented an innovative methodology named dynamic procedure for atypical scenarios identification (DyPASI) as a complementary part of the MIMAH bow-tie methodology. Furthermore, a bow-tie approach based on adopting a fault tree and an event tree for each critical event is also used in the methodology developed by Zhang et al. (2017) for accident scenario identification, that is, worst maximum credible accident scenarios (WMCAS).
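
The bow-tie structure described above (a fault tree on the left of the critical event and an event tree on the right) can be sketched numerically as follows. This is a minimal illustration assuming independent basic events; the scenario, the event names, and all probabilities are hypothetical and are not taken from ARAMIS or from the cited works.

```python
from math import prod


def or_gate(probs):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(probs):
    """Probability that all of several independent events occur."""
    return prod(probs)


# Left side (fault tree): hypothetical causes of a loss-of-containment critical event.
p_overfill = and_gate([0.05, 0.1])          # level control fails AND alarm is missed
p_critical = or_gate([p_overfill, 0.002])   # overfill OR corrosion leak

# Right side (event tree): hypothetical branch probability after the critical event.
p_ignition = 0.1
outcomes = {
    "fire": p_critical * p_ignition,
    "toxic_dispersion": p_critical * (1.0 - p_ignition),
}
```

Because the event-tree branches are mutually exclusive and exhaustive, the outcome probabilities sum back to the probability of the critical event, which is a convenient consistency check on any bow-tie quantification.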
FIGURE 6 Word cloud of "Process safety management" cluster

In addition, bow ties can also be used for safety management and organizational learning. In fact, due to the relative simplicity of their graphical representation, particularly compared to other risk-analysis tools such as fault trees, event trees, and Bayesian networks (Duijm, 2009), bow-tie diagrams are easily understandable by nonexperts, which can enhance safety management and contribute to organizational learning for safety (Chevreau et al., 2006; Duijm, 2009). However, although their graphic representation is the main advantage of bow-tie diagrams, it is also their main drawback because it does not allow the dynamic aspect of real systems to be considered. Badreddine and Amor (2013) overcame this drawback by proposing a new approach where bow ties are mapped onto a Bayesian network and are learned from real data; in this way, classical bow ties are enriched with a probabilistic numerical component that can be used for the dynamic implementation of preventive and protective barriers. The adoption of a bow-tie model mapped onto a Bayesian network is also presented by van Staalduinen et al. (2017), where a quantitative security risk analysis (QSRA) is performed to protect the critical process infrastructure. A similar limitation can also be found for safety barriers. During the design phase, all barriers are assumed to be fully functioning, and the assessment of their performances in the third step of the ARAMIS methodology (which can also be carried out using LOPA according to Gowland, 2006) is carried out considering this status. This, of course, leads to the determination of a certain risk. However, barriers degrade in service and so do their performances, thus increasing the risk. Therefore, to keep the risk level as low as possible, barrier management needs to account for the dynamic nature of safety barriers.
An example can be found in Pitblado et al. (2016a).
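
The effect of barrier degradation on risk can be illustrated with a LOPA-style calculation, in which the frequency of the mitigated scenario is the initiating event frequency multiplied by the probability of failure on demand (PFD) of each independent barrier. The sketch below is only indicative: the linear degradation law, the function names, and all numbers are assumptions made for illustration, not a model proposed in the cited works.

```python
from math import prod


def mitigated_frequency(initiating_freq_per_year, pfds):
    """LOPA-style estimate: initiating frequency times the PFD of each barrier."""
    return initiating_freq_per_year * prod(pfds)


def degraded_pfd(pfd_design, degradation_rate_per_year, years_in_service):
    """Hypothetical linear growth of a barrier's PFD in service, capped at 1."""
    return min(1.0, pfd_design * (1.0 + degradation_rate_per_year * years_in_service))


design_pfds = [0.1, 0.01]  # two independent barriers, assumed fully functioning
freq_design = mitigated_frequency(0.1, design_pfds)

# Same barriers after 5 years with an assumed 20% relative PFD increase per year.
aged_pfds = [degraded_pfd(p, 0.2, 5) for p in design_pfds]
freq_aged = mitigated_frequency(0.1, aged_pfds)
```

With these illustrative figures, the scenario frequency grows by a factor of four after five years, which shows why a risk figure computed at the design stage cannot be assumed to hold throughout the life of the plant.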

Process safety management
The "Process safety management" cluster consists of 51 articles, of which 32 are journal papers, 17 are conference papers, and 2 are book chapters. From the analysis of the word cloud reported in Figure 6, it can be seen that these documents deal with risk assessment and the safety management system (SMS), with a special focus on their interconnections and integration. In addition, an emerging trend concerning the development of new advanced dynamic methodologies able to handle significant issues such as the aging of equipment and the optimization of emergency planning has been identified and analyzed in this cluster. Although new applications of the risk assessment methodologies have recently been shown, such as that reported by Bernechea and Arnaldos (2014), where QRA has been combined with inherently safer design (ISD) to optimize the design of storage facilities, their main use is related to the development of an SMS. Specifically, Demichela and coauthors stated that "a correct and careful risk analysis is necessary to design and implement a SMS able to pursue its objectives" (Demichela et al., 2004). As seen before, different types of risk assessment methodologies have been developed over the years, both quantitative and qualitative (the latter mainly for unquantifiable known hazards and unknown hazards) (Lindhout, 2019). The most common risk assessment methodologies have been briefly reviewed by Maroño et al. (2006), who highlighted their pros and cons. In their work, Maroño et al. (2006) pointed out the criticalities of some methodologies with respect to the accuracy of the results of their risk analysis phase, also taking into account the uncertainties of input data such as frequencies and failure rates (in this perspective, Jain et al., 2018, proposed the use of Bayesian analysis), because these would affect the SMSs.
To overcome the criticalities that they highlighted, they proposed a new methodology called "PROCESO" that "attempts to contribute towards the development of a more comprehensive safety assessment method" by combining the advantages of the reviewed methodologies.
In turn, the SMS also affects the results of the risk analysis. In fact, it has been reported that management factors represent a frequent underlying cause of many accidents that have occurred in the past, especially the release of hazardous substances in chemical installations (Hurst et al., 1991; Paté-Cornell & Bea, 1992). It is thus clear that different SMSs lead to different risks. Therefore, in the last few years, many researchers have focused on integrating SMSs with risk analysis approaches, particularly QRA. The first attempts consisted mainly of judgmental modifications of the frequencies of releases according to the results of audits of the SMS (Pitblado et al., 1990). The main drawback of these methodologies was that the modification of the frequency was subjective. Papazoglou and coauthors tried to overcome this limitation by developing a methodology (called I-Risk) for the estimation of the frequencies of release of hazardous substances . The methodology was then further developed by Demichela and Piccinini (2006). Specifically, they modified the last two steps of the I-Risk methodology, that is, the modeling of the SMS and the modification of the frequency of loss of containment according to the SMS. However, these methods did not receive great acceptance in practice, mainly because the assessment of the management influence on risk was done in the form of an audit, which is complex to apply. Acikalin (2009b) tried to overcome this limitation by developing a new methodology that introduces a scoring system and a management factor. From the above, it emerges that the adoption of a proper SMS is a key aspect because it affects the results of the risk assessment. Several SMSs have been developed over the years, as shown by the existence of several reviews (Kirchsteiger, 2002; Li & Guldenmund, 2018).
Specifically, Li and Guldenmund (2018) reviewed more than 40 SMSs, reported that 86% of them use an audit as an assessment tool for the safety management, and reported Hale's SMS as the most complete because it contains both risk control elements and learning elements (Hale, 2005).
Given the importance of SMSs described above, researchers are continuously focusing on their improvement. Specifically, two main research streams have emerged, that is, the necessity (i) to consider the aging of equipment (Bragatto & Milazzo, 2016) and (ii) to optimize the emergency planning (Fabiano et al., 2016). Regarding the former, the aging of plants and equipment due to corrosion and other phenomena is a serious issue, and therefore, the SMS should ensure that each piece of critical equipment is subjected to a schedule of checks adequately planned to guarantee the attainment of safety requirements . In this perspective, Bragatto and Milazzo (2019) suggested using a system dynamic concept to improve safety management, where the general focus of safety studies needs to shift from the analysis of previous failures to the prevention of critical events, considering accidents as changes in performance, whereas Bevilacqua et al. (2020) developed a Digital Twin reference model for risk prediction and prevention by enabling predictive maintenance applications. Similarly, Paltrinieri et al. (2019) proposed a machine learning algorithm based on deep neural networks (DNN) for risk prediction in response to a system's operating condition, using data provided by the World Offshore Accident Database (WOAD). Regarding the optimization of emergency planning, instead, Antonionia and Moreno (2019) suggested using a continuous improvement approach and developed a procedure for this. Fundamental to a successful application of a continuous improvement approach is learning from previous experience (both previous incidents and "near misses"), and hence, effective communication is needed (Ramsay, 1999).

Data-based risk assessment
FIGURE 7 Word cloud of "Data-based risk assessment" cluster

The clustering algorithm has identified a group of 31 articles in total for the "Data-based risk assessment" cluster, consisting of 23 journal papers, 7 conference papers, and 1 book contribution. Figure 7 shows the word cloud resulting from the clustering process, from which it can be seen that these contributions are mainly focused on data-based methods for risk analysis. This cluster is organized as follows: first, an overview of data-based risk assessment methods and their evolution over time is provided, aiming at depicting their importance for the identification and prioritization of proper safety improvements. Second, the most suitable methodologies adopted by researchers and practitioners are reported. Finally, the documents dealing with shared datasets, case studies, and specific data-oriented tools are analyzed. Due to the great consequences that accident events have on safety and operations performance, it is crucial to properly estimate their occurrence with accurate quantitative risk analysis methods. In the past 10 years, there has been an evolution in the risk analysis methodologies, moving from more traditional probabilistic ones like fault tree, event tree, and bow tie to more advanced dynamic ones, such as Bayesian approaches, because the traditional ones have limited capabilities to handle evolving conditions and data unavailability (a comprehensive state-of-the-art analysis of these methods for chemical industrial applications can be found in Roy et al., 2014). Zavadskas and Vaidogas (2008) demonstrated that the uncertainty in failure probabilities due to evolving conditions and unavailable data can be reduced by applying a Bayesian updating procedure when new data on equipment failures are obtained. A relevant example of a dynamic approach has been developed by Khakzad et al. (2012), where the failure probabilities of the primary events are estimated and constantly updated as the physical parameters of the system vary. Moreover, this is integrated with a Bayesian approach for the dynamic estimation of the failure probability of the safety barriers. In subsequent research, the same authors applied bow-tie and Bayesian network methods in the quantitative risk analysis of blowouts in drilling operations . First, they built the bow-tie model combining a fault tree and an event tree for potential accident scenarios. Then, individual Bayesian networks and an object-oriented Bayesian network were developed, considering common cause failures and conditional dependencies along with performing probability updating and sequential learning using accident precursors. Similarly, Babaleye et al. (2019) and Chang et al. (2019) applied the Bayesian network for a dynamic safety analysis of the plugging and abandonment of oil and gas wells and of the deep-water drilling riser, respectively. Furthermore, Bayesian networks have also been used for the evaluation of the domino effect, where Khakzad et al. (2018) and Zeng et al. (2020) implemented a dynamic model of wildfire spread by using the Bayesian network. Noret et al. (2012), instead, presented a dynamic risk assessment model based on uncertainty propagation methods to be applied when accidental explosions occur. Berdouzi et al. (2018) then used dynamic simulation to answer this need for dynamicity, and they combined it with HAZOP analysis and decision matrix risk assessment to assess the risk of an exothermic reaction in a semibatch reactor, whereas Paltrinieri and Reniers (2017) introduced three complementary methods to address dynamic risk assessment of high impact low probability (HILP) events on different levels, that is, dynamic hazard identification, dynamic analysis of initiating events, and dynamic analysis of consequences.
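
The principle of Bayesian updating of a failure probability when new equipment failure data are obtained can be illustrated with a conjugate Beta-Binomial model. This is a minimal sketch under that modeling assumption; it is not the specific procedure of any of the cited works, and the prior parameters and observed counts are hypothetical.

```python
def beta_update(alpha, beta, failures, demands):
    """Posterior Beta parameters after observing `failures` in `demands` trials."""
    return alpha + failures, beta + (demands - failures)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Hypothetical prior belief: failure probability on demand of about 1%.
prior = (1.0, 99.0)

# Hypothetical new plant data: 2 failures observed in 100 demands.
posterior = beta_update(*prior, failures=2, demands=100)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
```

In this example the posterior variance is lower than the prior variance, so the update both shifts the estimate toward the observed data and narrows the uncertainty around it; repeating the update as new records arrive gives a simple form of dynamic estimation.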
The Bayesian network and its simulation can also capture very well the relationships between causes and effects: Unnikrishnan et al. (2015), in fact, reported that the ability to simulate the network in the forward direction in predictive mode (causes to effects) as well as in diagnostic mode (effects to causes) is the most important advantage of the Bayesian network. Bayesian networks are often integrated with a bow-tie model to identify the risk factors of the failure event and the potential consequences. An example can be found in Wu et al. (2019), who modeled H2S leakage in sour gas fields during pressure drilling phases with a bow-tie approach for the accident cause-consequence analysis and integrated dynamic characteristics with probability estimation by means of dynamic Bayesian networks. A similar work can be found in Zhang et al. (2018).
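
The predictive and diagnostic modes mentioned above can be illustrated on the smallest possible network, a single cause node with a single effect node. The node meanings and all probabilities below are hypothetical and chosen only to show the two directions of inference.

```python
# Hypothetical two-node network: barrier failure (cause) -> gas leak (effect).
p_cause = 0.02                  # prior probability that the barrier has failed
p_effect_given_cause = 0.9      # probability of a leak if the barrier has failed
p_effect_given_no_cause = 0.01  # probability of a leak if the barrier is working

# Predictive mode (causes to effects): marginal probability of the leak.
p_effect = (p_effect_given_cause * p_cause
            + p_effect_given_no_cause * (1.0 - p_cause))

# Diagnostic mode (effects to causes): Bayes' theorem after a leak is observed.
p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
```

With these numbers, observing the leak raises the probability of barrier failure from 2% to roughly 65%, which is the kind of evidence-driven revision that diagnostic reasoning provides.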
When data about major accidents are absent or scarce, several approaches have been developed that rely on data related to near accidents (accident precursor data). Khakzad et al. (2014) proposed a methodology that applies hierarchical Bayesian analysis to accident precursor data for the risk analysis of major accidents. A multinomial likelihood function has been applied to model the dependency and interaction between data related to accidents and near accidents. The methodology has been applied to drilling operations. A similar approach has been developed for the Bhopal disaster and accident releases of hazardous chemicals from process facilities . They proposed to use a precursor-based Bayesian network approach for probability estimation and loss functions for consequence assessment. The dynamicity of the problem is taken into consideration by updating the risk profile with real-time operational data.  then proposed a holistic precursor-based risk assessment framework for rare events, implementing a hierarchical Bayesian approach (HBA) for the probability estimation, whereas accident precursor data have been utilized to find the most informative precursor upon which the consequence of a rare event is estimated. In this case as well, the risk profile is updated as soon as new information is available.
In addition to the new methodologies, this cluster includes a group of papers related to datasets, available case studies, and tools developed to support the analysis of these datasets. Ditali et al. (2000) developed a software tool called Atlantide for the consequence analysis in liquefied petroleum gas (LPG) installations. The software is based on models and equations developed in the literature that take into account many physical parameters and external factors. Jacobsson et al. (2010) then presented the major accident reporting system (MARS), which was established and is maintained by the European Commission in order to collect information related to major industrial accidents in the EU Member States in the context of the Seveso II Directive. The main goal was to collect information for a better understanding of the accidents through the determination of the causes, particularly the underlying causes. An interesting contribution to case studies is Pasman (2011), where a historical overview of the Dutch process equipment failures is given and analyzed by explaining the policy backgrounds and comparing it with other data.  then analyzed different datasets available, and they concluded that the UK HSE Hydrocarbon Release Database (HCRD) provides the basis for the best leak frequency data. Furthermore, Moura et al. (2017) developed a framework to verify whether tendencies and patterns observed in major accidents were appropriately contemplated by risk studies. They also developed an attribute list to validate risk assessment studies and to ensure that the influence of human factors, technological issues, and organizational aspects was properly taken into account.

Health and environmental analysis
The "Health and environmental analysis" cluster is the last one identified by the algorithm, and it contains 24 documents in total, of which 23 are journal papers and 1 is a book contribution. Figure 8 depicts the word cloud resulting from the clustering process, where it is clear that this cluster contains contributions that are mainly focused on the analysis of the environmental and health impact of accidents. Hence, this cluster first reviews the documents that focus separately on one of these two critical aspects, and then presents the integrated approaches that include both environmental and health risk assessments. Environmental and health risk assessment is a complex and interdisciplinary research area that is used to estimate potential risks in order to protect ecosystems and human health from accidents. In the last decades, many articles have been published in this area, with most of them treating environmental risk assessment (ERA) and health risk assessment separately; few of them present integrated approaches.
The first methods were developed decades ago. Concerning the ERA methodologies, an example is the work presented by Stefanis and Pistikopoulos (1997), where a quantitative methodology has been developed linking process reliability considerations to environmental impact analysis within a process optimization framework. Recently, driven by the implementation of the Seveso III Directive,  collected and analyzed experiences, knowledge, as well as new approaches for the prevention of major accidents with impacts on the environment. After presenting a statistical analysis of environmental accidents in Czech Republic and Italy, a methodological approach has been illustrated where selected methodologies are used for ERA. Moreover, researchers and practitioners have not only focused on the plant but have also considered other sectors that represent a hazard to the environment, such as the transportation by pipeline of hazardous materials like oil and oil products. In fact, although extremely rare, loss of containment events can occur in such systems. An ERA model has been developed by Bonvicini et al. (2015) for spills from pipelines, estimating the risk of contamination both physically and economically. Moreover, natural disasters can also be critical causes of environmental accidents, and therefore, they have recently received much attention. In Han et al. (2019), for example, an ERA based on an analytic hierarchy process and fuzzy evaluation has been implemented for an industrial site considering typhoon disasters. As reported many years ago by , accidents in the chemical industry have serious effects not only on the environment but also on the well-being of many persons. Moreover, the authors also highlighted that human factors may influence process safety, and they developed a preliminary technique to include them in the risk assessment.
Similarly, Schlechter (1996) attributed a relevant role to the management of change to ensure the safety of the chemical industry, and he proposed to add the Major Hazardous Installation Regulation to the Occupational Safety and Health Administration (Act 85 of 1993). Roy et al. (2003) then applied QRA based on the fault tree analysis technique and consequence analysis to the storage and purification sections within a process plant involving hazardous substances, aiming to assess the risk to the surrounding population, whereas O'Mahony et al. (2008) reported a method to evaluate the emergency plan in case of the spreading of a toxic cloud, considering a safety report's worst-case scenario in terms of potential loss of life or serious health effects. Similar works were also carried out by Heinälä et al. (2013) and Gnoni and Bragatto (2013). Another interesting study on accidents and health hazards has been carried out by Pałaszewska-Tkacz et al. (2017), who analyzed data concerning chemical incidents in Poland collected in 1999-2009. Given the high number of cases and fatalities, they concluded that a systematic analysis of hazards and their proper identification, such as a health risk assessment, both qualitative and quantitative, is needed. Hoek et al. (2018) studied the risks related to living close to industrially contaminated sites, and using a multimedia framework, they discussed models applied in numerous sites in Europe and identified 10 methodologies helpful for the health risk assessment (i.e., CSOIL, CLEA, Atlantic RBCA, HOUGH Model, RISKNET, S-Risk, POPs Toolkit, RAIS, MERLIM, and INTEGRA). Finally, they recommended refining the exposure assessment in epidemiological studies by using more sophisticated exposure metrics instead of simple proximity indicators. They also highlighted that more studies are needed to validate the models.
Furthermore, legislation plays an important role in the protection of communities near major-hazard installations. Niemand et al. (2016) analyzed and compared the legislation in South Africa to literature and the legislation of other countries, suggesting the inclusion of vulnerability studies and the refinement of appropriate decision-making instruments such as risk assessment.
Integrated approaches have recently been developed to include both environmental and health risk assessments. Jiang et al. (2012) introduced a framework able to define a warning area and the impact on the functional area, the societal impact, and the impact on human health and the ecological system. The framework was then implemented in a decision tool (software on a GIS platform) for real-time risk assessment within an emergency environmental decision support system for the response to chemical spills in a river basin. Other models, such as ecological models, allow the ecological risks of pollutants to ecosystems, communities, and populations to be assessed. Tavakoly , for example, gave a guideline on short-term ecological risk assessment schemes involving dioxin chemicals and their effects on ecosystems. Frattini and Manning (2015) then defined an integrated environment and health risk assessment methodology (REHRA) based on the dramatic cyanide spill of Baia Mare (Romania). The methodology aims to help decision makers and authorities implement an emergency plan in order to reduce the integrated risk.

DISCUSSION
Although the main interests of researchers and practitioners in the field of industrial risk assessment for hazardous substances can be classified into five different clusters, the clusters show close relationships and partial overlap with each other. This proximity of the cluster themes is represented in Figure 9, where the clusters are depicted by irregular shapes intruding into each other. The "risk assessment methods for Seveso sites" cluster represents the core of the literature analysis and the main research stream, whereas the other four clusters can be considered "satellite" clusters, that is, clusters that descend from the "risk assessment methods for Seveso sites" cluster but that have become stand-alone research streams due to the high interest they have attracted. For this reason, Figure 9 is represented as a spiral revolving around the mentioned core cluster. While "process safety management" and "health and environmental analysis" focus on topics that are historically closer and complementary to the core cluster (for this reason, they are represented next to it), the remaining clusters ("bow-tie diagrams and safety barriers" and "data-based risk assessment") span areas that are relatively farther from the center and represent an evolution toward topics that progressively extend the edge of industrial risk assessment for hazardous substances. Therefore, Figure 9 provides an overview of the evolution of these methodologies and research streams and may serve as a practical basis for beginners who are starting to address such a multifaceted topic.

FIGURE 9 Schematic representation of the five different clusters and their relationship to each other
Among the different papers included in the core cluster, there is also one of the most important reviews on the topic of risk assessment. It is interesting to note that the main findings of this work by Tixier et al. (2002) still hold true: the risk assessment methodologies can still be described as deterministic or probabilistic and as qualitative or quantitative methodologies, and the risk assessment methodologies include at least one of the three main phases identified by Tixier et al. (2002) (i.e., identification phase, evaluation phase, and hierarchization phase). Moreover, the relevance of the work by Tixier et al. (2002) is even more evident if we consider that the developments in the field since their work have corresponded to the crucial aspects and shortcomings that they had depicted. Tixier et al. (2002), for example, stressed the importance of the identification phase, stating that it "is essential because it establishes the bases of the risk analysis." Since then, new methodologies have been developed to fill this gap and to properly identify the different factors that might affect the risk analysis, for example, DyPASI, WMCAS, and MIMAH (Delvosalle et al., 2006; Tugnoli et al., 2013; Wilday et al., 2011; Zhang et al., 2017). Interestingly, in the risk assessment methodologies developed after the work by Tixier et al. (2002), the identification of "environment" (meant as site environment, topographical data, and population density), as well as its evaluation, has become increasingly relevant, whereas this was depicted as a lacking aspect by Tixier et al. (2002) ("few methods take into account the environment").
Although Section 3.1 underlined the importance of developing a control strategy for major accident hazards involving toxic, explosive, and flammable substances, health and environmental aspects should also be considered. Indeed, in the core cluster, "environment" and "health" or "human" represent inputs of the risk assessment adopted for the examination of accident scenarios on the basis of local constraints such as, for instance, the environment composition in the proximity of the hazardous installations, the risk evaluation for workers or exposed population, or the socioeconomic context assessment. However, the concepts of "environment" and "health" also represent an output of the risk assessment. This is emphasized within the "health and environmental analysis" cluster, where they are defined as specific risk indexes referring to the safety of ecosystems and of human beings. Despite this different perspective, the proposed analysis showed a close correlation between these two clusters. Indeed, it is worth mentioning that both industry and public authorities currently recognize, as a critical priority, the need to reduce the potentially catastrophic health and environmental consequences of major accidents. In this scenario, researchers and practitioners have thus focused on developing specific risk assessment methodologies that help protect ecosystems and human health from accidents, leading to the formation of ERA methodologies and health risk assessment methodologies, respectively (see Section 3.5).
Moreover, the analysis of the core cluster also evidenced its close correlation with the "process safety management" cluster. As a matter of fact, the Seveso Directive requires putting into effect an SMS for establishments handling hazardous substances in the EU. As said, an SMS consists of a systematic and proactive approach to safety risk management, including organizational structures, policies, and procedures. Thus, it attends to the implementation of methods and practices for hazard identification that may affect both safety and mitigation measures on the basis of the occurrence and consequence magnitude of accident events. In particular, from the analysis of the cluster, it has emerged that researchers and practitioners have mainly focused on the development of new SMSs (more than 40 SMSs have been identified by Li & Guldenmund, 2018, in their review) and on their interconnections and integrations with risk assessment methodologies. Of particular interest is the latter aspect. Indeed, risk assessment and SMSs influence each other: on the one hand, a correct and precise risk assessment is fundamental to designing and implementing the correct SMSs, whereas, on the other hand, SMSs have been reported to represent a frequent cause of accidents. Hence, this explains the interest that researchers and practitioners have attributed to the integration of SMSs with risk analysis approaches (an example of this is the development of the I-Risk methodology; Papazoglou et al., 2002, 2003), considering them as complementary aspects for a proper safety decision-making process.

As already mentioned above, some topics of the main cluster have been so widely investigated as to constitute stand-alone research streams. For example, we have already shown that the works developed after that of Tixier et al. (2002) have highly focused on the identification phase (i.e., risk analysis), but in a specific case, the interest has been so strong that a separate cluster could be identified, that is, the "Bow-tie diagrams and safety barriers" cluster. This cluster groups the risk analysis methodologies using bow-tie diagrams as input data (Tixier et al., 2002, classified them as "Plans or diagrams" input data) to identify the different factors affecting the risk analysis. In fact, bow-tie diagrams allow the major accident hazards to be identified. Moreover, they also allow the influence of managerial actions, like the introduction of safety systems, to be captured. Specifically, the SMSs considered in this cluster are the safety barriers, indicating that the combination of bow-tie diagrams and safety barriers has represented one of the main interests of researchers and practitioners in the last few years. Based on the analysis of the papers of this cluster, it has emerged that researchers and practitioners have focused on investigating the possibilities of combining bow-tie diagrams with other methods (e.g., "HAZOP," "What if," "FMEA") to identify major accident hazards (Gowland, 2006), and the results have been promising (Afefy, 2015).
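As a purely illustrative sketch of how a bow-tie diagram can capture the influence of safety barriers, the snippet below estimates the frequency of an accident outcome as the frequency of the initiating event multiplied by the probability of failure on demand (PFD) of each barrier placed along that path. The function name, the assumption of mutually independent barriers, and all numerical values are hypothetical and are not taken from any of the reviewed works.

```python
from math import prod


def outcome_frequency(initiating_frequency, barrier_pfds):
    """Frequency (per year) of the outcome reached when every barrier fails.

    Assumption: barriers fail independently, so their PFDs can be multiplied.
    """
    if initiating_frequency < 0.0:
        raise ValueError("the initiating frequency must be non-negative")
    for pfd in barrier_pfds:
        if not 0.0 <= pfd <= 1.0:
            raise ValueError("each PFD must lie in [0, 1]")
    return initiating_frequency * prod(barrier_pfds)


# Hypothetical bow-tie path: a loss of containment occurring 0.1 times per year,
# followed by two barriers (e.g., gas detection and emergency shutdown).
unmitigated = outcome_frequency(0.1, [])
mitigated = outcome_frequency(0.1, [0.1, 0.01])
print(f"without barriers: {unmitigated:.1e} per year")
print(f"with barriers:    {mitigated:.1e} per year")
```

Comparing the two figures shows, under the stated independence assumption, how adding or degrading a barrier changes the frequency of the outcome on the right-hand side of the diagram.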
In some of the works contained in the just-mentioned clusters, it was possible to identify some hints of the necessity to consider the dynamic aspects of the systems, where the changes in performance are taken into account. In the last few years, this has attracted the attention of researchers and practitioners so much as to constitute a separate cluster, that is, the "data-based risk assessment" cluster. In fact, especially since the advent of Industry 4.0 technologies that allow huge amounts of real-time data to be accessed and managed, the risk analysis methodologies have experienced an evolution, moving from traditional methodologies (e.g., fault tree and event tree) to more advanced dynamic methodologies able to handle evolving conditions. In particular, from the analysis of this cluster, it has emerged that Bayesian approaches represent the most used techniques (alone or in combination with other methods, such as the bow-tie approach) to consider the dynamicity of the systems, that is, to address time-dependent effects in risk assessment, aiming at providing a precise estimation of emerging and increasing risks throughout the process lifetime.
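To make the idea of Bayesian updating in dynamic risk assessment more concrete, the following minimal sketch revises the estimated failure probability of a safety barrier as new operational data arrive, using the standard Beta-Binomial conjugate update. It is one simple instance of a Bayesian approach and not the method of any specific reviewed paper; the prior parameters and the observation batches are hypothetical values chosen for illustration only.

```python
def update_beta(alpha, beta, failures, demands):
    """Beta-Binomial conjugate update of a failure-on-demand probability.

    A Beta(alpha, beta) prior combined with `failures` failures observed in
    `demands` demands yields a Beta(alpha + failures, beta + successes) posterior.
    """
    if failures < 0 or demands < failures:
        raise ValueError("need 0 <= failures <= demands")
    return alpha + failures, beta + (demands - failures)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: Beta(1, 99), i.e., a prior mean failure probability of 0.01.
posterior = (1.0, 99.0)

# Hypothetical batches of plant data: (failures, demands) per inspection period.
observations = [(0, 50), (2, 50), (1, 100)]

history = []
for failures, demands in observations:
    posterior = update_beta(*posterior, failures, demands)
    history.append(beta_mean(*posterior))

for period, estimate in enumerate(history, start=1):
    print(f"period {period}: estimated failure probability = {estimate:.4f}")
```

The estimate moves with each batch of evidence, which is the basic mechanism that lets such approaches track emerging or increasing risks over the process lifetime.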
Finally, it is worth mentioning that a significant aspect has emerged as a common cross-cutting thread for all the identified clusters. The analysis of the different documents collected in this literature review revealed that the risk-based decision-making process has to face situations characterized by a state of emergence and, especially, large uncertainty. If, on the one hand, this has led researchers and practitioners to develop suitable approaches, techniques, and methods for this purpose, aiming at boosting the risk-oriented discipline, as pointed out in this work, on the other hand, characterizing knowledge and lack of knowledge in risk estimation is still a challenging aspect. Indeed, this issue has been strongly outlined by several authors over the years (Aven, 2018a; Lathrop & Ezell, 2017; Rae et al., 2012; Suokas & Rouhiainen, 1989), creating a general consensus regarding the potential implications of the lack of a critical attitude toward assumptions, data, or model uncertainties for risk coping. This could result in imprecise risk characterization and conceptualization and a significant misassessment of the risk, leading to a severe practical impact on the decision-making process. Therefore, the knowledge that is gained through both theoretical and practical ways (for instance, through experience, informed strategy, testing, etc.) is crucial in narrowing the unavoidable uncertainties in the risk field and in conceptualizing and governing risk. Thus, although these issues are not the main focus of this work, from the perspective of the present authors, the awareness of a proper knowledge-oriented view is a key point to meeting the current and future challenges for risk assessment and management.

CONCLUSIONS AND FUTURE RESEARCH
In the last decades, researchers and practitioners have focused increasingly on risk assessment, and this has led to the development of an extensive number of different methodologies to systematically identify, analyze, and mitigate failure risks, aiming at preventing major industrial accident hazards. As things stand today, review works trying to summarize and categorize the different risk assessment methodologies available are either dated or limited in scope because they focus only on one specific topic. The current work aims to overcome these drawbacks by providing an up-to-date and comprehensive overview of all risk assessment methodologies. Moreover, it categorizes these methodologies into five different clusters, which correspond to the five research streams that have characterized the activities of researchers and practitioners over the years, and identifies research advancements, emerged methodologies, recent trends, and perspectives in each of these clusters. Specifically, the five research streams are "Risk Assessment methods for Seveso sites," "Bow-tie diagrams and safety barriers," "Process safety management," "Data-based risk assessment," and "Health and environmental analysis," and they have been identified by first automatically clustering, through a novel unsupervised machine learning technique, the 559 articles resulting from the literature analysis and then manually cleansing the results.
First, building upon this investigation, the bibliometric analysis of the collected documents has been carried out, aiming at providing the overall picture of the available methodologies and research streams on risk assessment for handling hazardous substances within the EU industry. By investigating the distribution over time, as well as the publications and citation structures, an increasing trend in the documents published on the topic in the last 10 years emerged. This is related to the growing attention paid worldwide to risk and safety issues as well as to specific legislation requirements such as the Seveso Directives. Concerning the content of the collected documents, it is shown that most of them are currently focused on mathematical models and their application in real industrial cases. Hence, the continuous efforts toward increasingly effective and efficient approaches and techniques are of fundamental interest both for researchers and practitioners and also represent a future direction in the field. Then, the different clusters have been examined by discussing the content of the most relevant selected papers, aiming at drawing a clear picture of the research domain. The analysis of the reviewed articles underlined that the cluster "Risk Assessment methods for Seveso sites" is the most extended one, representing the research stream that catches the interest of researchers and practitioners the most. This cluster deals with the evaluation, modification, and development of risk assessment methodologies, and the analysis of the works contained in this cluster demonstrates how some critical aspects highlighted by Tixier et al. (2002) in their review are still current.
Specifically, risk identification, together with the identification and evaluation of such factors as site environment, topographical data, and population density, still plays a central role, and researchers and practitioners have developed new approaches to deal with these critical aspects, especially focusing on approaches that integrate risk assessments into the LUP process or GIS. Moreover, due to its relevance underlined by the Seveso II [96/82/EC] and III [2012/18/EU] Directives, a relevant research challenge that emerged in this cluster is the domino effect.
The other clusters can be considered subclusters or satellites of the first one. In the "Bow-tie diagrams and safety barriers" cluster, the identified recent trends are mainly focused on the integration of the SMSs, considered as the safety barriers, with bow-tie diagrams or with other methods, such as HAZOP, the What-if approach, and FMEA, to identify major accident hazards. In the "Process safety management" cluster, the most interesting topic concerns the combination of SMSs and risk assessment approaches. Another cluster is the "Health and environmental analysis," where the concept of environment refers to the ecosystem and constitutes an output of the risk analysis. The recent progress on this topic led to the development of context-specific research streams such as ERA methodologies and health risk assessment methodologies. Finally, the last identified cluster is the "Data-based risk assessment," which includes the recent advances in risk assessment mainly related to the new digitalization processes and systems, leading to dynamic approaches able to manage evolving conditions. In this regard, we consider that future research on risk assessment may be particularly focused on the subjects related to this cluster.
The advances in digitalization processes and systems have, in fact, led to the development of smart factories that rely more and more on key enabling technologies to optimize the management of operations. In this scenario, exploiting cyber-physical systems (CPSs) and real-time data might increase the predictability of risk identification and help decision makers adopt proper intervention measures. The applications of artificial intelligence and machine learning hold great promise for enhancing the risk analysis discipline by means of the development of data-driven approaches. These approaches allow the analysis of big datasets in real time to identify patterns or leading indicators related to normal operations and past unwanted events with the purpose of predicting future scenarios and, consequently, recommending effective actions. Another emerging technology is the Digital Twin, that is, the digital representation of a physical system. A Digital Twin may be adopted to simulate asset performance in live operations, improving operational awareness through the assessment of degradation processes, control strategies, maintenance activities, isolations, or the potential consequences of failures on asset health. Thus, this knowledge is useful to develop prognostic models for anticipating the likelihood of hazards and implementing proactive approaches to reduce future risk.
Finally, future research may aim to address some limitations of this study. Indeed, the aspects that require more attention are related to the uncertainty encompassing all the process phases and metrics that characterize the risk assessment. As a matter of fact, the use of conventional hazard identification techniques is bound to be conditioned by subjective judgments depending on which method is used. This inherent limitation affects risk assessment, resulting in inadequate hazard identification and in issues concerning lack of knowledge and rigor, insufficient data, complexity, completeness, reproducibility, and relevance of experience. These issues represent critical points within the risk-based discipline and have increased the awareness of researchers and practitioners toward the study and the development of suitable methodologies, approaches, and procedures to tackle them. Nevertheless, also in this light, this work could be a suitable agenda that substantially contributes to outlining the considerable progress of risk assessment methodologies during the last decades by discussing and categorizing the recent advancements within the most relevant research stream-based clusters.