Smart contracts categorization with topic modeling techniques

Ibba, Giacomo; Ortu, Marco; Tonelli, Roberto
2021-01-01

Abstract

One of the main advantages of the Ethereum blockchain is the ability to develop smart contracts in a Turing-complete environment. These general-purpose programs provide a higher level of security than traditional contracts and reduce the transaction costs associated with bargaining. Developers use smart contracts to build their own tokens and to set up gambling games, crowdsales, ICOs, and many other applications. Because the Ethereum blockchain hosts several million smart contracts, checking every program manually to understand its functionality is infeasible. At the same time, grouping smart contracts according to their purposes and functionalities is of primary importance. One way to group Ethereum’s smart contracts is to use topic modeling techniques, taking advantage of the fact that many programs representing a specific topic share a similar program structure. Starting from a dataset of 130k smart contracts, we built a Latent Dirichlet Allocation (LDA) model to identify the topics within our sample. By computing the coherence values for different numbers of topics, we found that the optimal number was 15. As we expected, most programs are tokens, games, crowdfunding platforms, and ICOs.
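
The model-selection step described in the abstract (fit LDA for several topic counts, then keep the count with the best coherence) can be illustrated with a minimal, dependency-free sketch. The abstract does not say which library, tokenizer, or coherence measure the authors used, so everything here is an assumption made for illustration: the camelCase tokenizer, the small collapsed Gibbs sampler standing in for a production LDA implementation, the UMass coherence measure, the hyperparameters, and all function names. The toy contract strings are invented and are not taken from the paper's dataset.

```python
import math
import random
import re
from collections import Counter


def tokenize(source):
    """Lowercase word tokens from source text, splitting camelCase identifiers."""
    pieces = []
    for word in re.findall(r"[A-Za-z]+", source):
        pieces += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
    return [p.lower() for p in pieces]


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the current assignment before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic, n=5):
    """Most frequent words of a topic, ignoring words with a zero count."""
    return [w for w, count in topic.most_common(n) if count > 0]


def umass_coherence(words, docs):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over word pairs."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*ws):
        return sum(all(w in s for w in ws) for s in doc_sets)

    score = 0.0
    for m in range(1, len(words)):
        for l in range(m):
            score += math.log((doc_freq(words[m], words[l]) + 1) / doc_freq(words[l]))
    return score


def choose_num_topics(docs, candidates, seed=0):
    """Fit one model per candidate topic count; return the best count and all scores."""
    scores = {}
    for k in candidates:
        topics = fit_lda(docs, k, seed=seed)
        scores[k] = sum(umass_coherence(top_words(t), docs) for t in topics) / k
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Invented snippets standing in for contract source code.
    toy_contracts = [
        "function transfer(address to) balanceOf totalSupply tokenName",
        "function approve(address spender) allowance balanceOf tokenSymbol",
        "function placeBet(uint amount) payWinner rollDice jackpot",
        "function rollDice() payWinner jackpot placeBet",
        "function buyTokens() crowdsale weiRaised fundingGoal",
        "function refund() crowdsale fundingGoal weiRaised",
    ]
    docs = [tokenize(s) for s in toy_contracts]
    best, scores = choose_num_topics(docs, candidates=[2, 3, 4])
    print(best, scores)
```

On a corpus the size of the one in the paper, a library implementation of LDA and coherence scoring would be the practical choice; the sampler above is only meant to make the selection loop concrete.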
Year: 2021
ISBN: 9788883171086
Keywords: Blockchain; Smart Contract; Ethereum; LDA; Topic Modeling; Ponzi Scheme; Token; ICO; Smart Contracts Trends
Files in this item:
File: paper_6.pdf
Access: open access
Type: publisher's version
Size: 283.31 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/346773
Citations
  • Scopus 1