How does it work
The prediction modules estimate the likelihood that an Ethereum address is involved in scam activities. The result is the average of two sub-models (modules) that analyze the same data from different perspectives.
Each module is trained on the same input data, on the assumption that scammers leave on-chain footprints that are recognizably different from those of genuine users.
The entire blockchain history of an address (transactions, traces and ERC20 transfers) is provided as input to the model. We relied on external scam reports and other manual observations to label our training set.
Each module has a prediction accuracy above 95%. You can customize the predicted class of the model by changing the prediction threshold for scam/no-scam (the default threshold is 0.50).
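As an illustration of how the averaged score and the threshold interact, here is a minimal sketch. The function names, the "scam"/"no-scam" labels as strings, and the choice to treat a score exactly equal to the threshold as a scam are assumptions made for the example, not the service's actual API.

```python
from statistics import mean

DEFAULT_THRESHOLD = 0.50  # default threshold stated in the documentation


def combine_scores(module_scores):
    """Average the per-module scam probabilities into one score (hypothetical helper)."""
    if not module_scores:
        raise ValueError("at least one module score is required")
    for score in module_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {score}")
    return mean(module_scores)


def classify(module_scores, threshold=DEFAULT_THRESHOLD):
    """Return (averaged score, label) for one address.

    Assumption: a score greater than or equal to the threshold is labeled "scam".
    """
    score = combine_scores(module_scores)
    label = "scam" if score >= threshold else "no-scam"
    return score, label


if __name__ == "__main__":
    # Made-up outputs of the two modules for a single address.
    print(classify([0.30, 0.50]))                  # default threshold
    print(classify([0.30, 0.50], threshold=0.39))  # lower, recall-oriented threshold
```

The same address can move from one class to the other purely because the threshold changed, which is why the threshold should be chosen with the use case in mind.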
Our research has shown that accuracy tends to be maximized for threshold values between 0.39 and 0.59.
Below 0.39: recall is maximized at the expense of accuracy. There are fewer false negatives but many more false positives;
Above 0.59: precision is maximized at the expense of accuracy. There are fewer false positives but many more false negatives.
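The trade-off between precision, recall and accuracy can be reproduced on a small labeled sample. The sketch below uses made-up scores and labels purely for illustration (they are not taken from the model or its training set), and the helper names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives, where "positive" means predicted scam."""
    tp = fp = tn = fn = 0
    for score, is_scam in zip(scores, labels):
        predicted_scam = score >= threshold
        if predicted_scam and is_scam:
            tp += 1
        elif predicted_scam and not is_scam:
            fp += 1
        elif not predicted_scam and is_scam:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(scores, labels, threshold):
    """Precision, recall and accuracy at a given threshold."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Toy data: six known scam addresses followed by six legitimate ones.
SCORES = [0.95, 0.85, 0.70, 0.55, 0.52, 0.30,
          0.65, 0.45, 0.35, 0.20, 0.10, 0.05]
LABELS = [True] * 6 + [False] * 6

if __name__ == "__main__":
    for threshold in (0.25, 0.50, 0.75):
        print(threshold, metrics(SCORES, LABELS, threshold))
```

On this toy sample, the low threshold catches every scam but flags three legitimate addresses, the high threshold flags no legitimate address but misses four scams, and accuracy is highest at 0.50.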
These thresholds can be adjusted to suit your use case. For example:
Case 1: Flag scammers with very high confidence and give the benefit of the doubt to addresses that are not clear-cut: in this case you'll want to set a higher threshold and thus get a higher precision;
Case 2: Err on the side of safety, even if it means wrongly flagging a few legitimate users as scammers: in this case you can lower the threshold and thus get a higher recall and minimize the false negatives generated by the model.
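One possible way to turn these two cases into a concrete threshold is to scan candidate values against a labeled sample of your own and keep the one that meets a target. The sketch below assumes you have such a sample; the function names, the target values and the toy data are all hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall); precision is None when nothing is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_precision(scores, labels, min_precision, candidates):
    """Case 1: lowest candidate threshold whose precision meets the target."""
    for threshold in sorted(candidates):
        precision, _ = precision_recall(scores, labels, threshold)
        if precision is not None and precision >= min_precision:
            return threshold
    return None  # no candidate reaches the target


def threshold_for_recall(scores, labels, min_recall, candidates):
    """Case 2: highest candidate threshold whose recall meets the target."""
    for threshold in sorted(candidates, reverse=True):
        _, recall = precision_recall(scores, labels, threshold)
        if recall >= min_recall:
            return threshold
    return None  # no candidate reaches the target


# Toy labeled sample: six scam addresses followed by six legitimate ones.
SAMPLE_SCORES = [0.95, 0.85, 0.70, 0.55, 0.52, 0.30,
                 0.65, 0.45, 0.35, 0.20, 0.10, 0.05]
SAMPLE_LABELS = [True] * 6 + [False] * 6
CANDIDATES = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95

if __name__ == "__main__":
    print(threshold_for_precision(SAMPLE_SCORES, SAMPLE_LABELS, 0.95, CANDIDATES))
    print(threshold_for_recall(SAMPLE_SCORES, SAMPLE_LABELS, 0.95, CANDIDATES))
```

Picking the lowest threshold that satisfies a precision target (or the highest that satisfies a recall target) keeps the other metric as high as the constraint allows.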