
How Can I Check If A String Is Likely To Be Generated By A Bot?

- 1 answer

I have a spam issue. Some bot (I believe) is getting around Google reCAPTCHA and inserting strings like the following into forms on my site:

dtbNPRpfcz

VvAJEXqueSKscY

Does anyone know of any JS or C# code I can use that would give a high probability of indicating that the above string is randomly generated?

If I could check the fields being filled and know that several of them were likely to be bot generated then I could block the submission.

The above strings seem to have more uppercase characters than normal, for example.
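One rough way to turn that observation into a check is sketched below. It is only a heuristic, not a reliable detector: the function names (`caseAndVowelStats`, `looksRandom`, `shouldBlockSubmission`) and every threshold are assumptions made up for illustration, and they are tuned to nothing more than the two sample strings above. It flags a value when uppercase letters appear in the middle of words unusually often, or when there are very few vowels, and it blocks only when several fields are flagged.

```javascript
// Heuristic sketch only: names and thresholds are illustrative assumptions.

// Count letters, vowels, lowercase letters, and uppercase letters that do
// not start a word (an uppercase letter directly preceded by another letter).
function caseAndVowelStats(text) {
  let letters = 0;
  let vowels = 0;
  let lower = 0;
  let upperMidWord = 0;
  let prevWasLetter = false;
  for (const ch of text) {
    const isLetter = /[A-Za-z]/.test(ch);
    if (isLetter) {
      letters += 1;
      if (/[aeiouAEIOU]/.test(ch)) vowels += 1;
      if (/[a-z]/.test(ch)) lower += 1;
      else if (prevWasLetter) upperMidWord += 1;
    }
    prevWasLetter = isLetter;
  }
  return { letters, vowels, lower, upperMidWord };
}

// True when the value has odd mixed casing or almost no vowels.
function looksRandom(text) {
  const { letters, vowels, lower, upperMidWord } = caseAndVowelStats(text);
  if (letters < 8) return false; // too short to judge either way
  // All-caps input (no lowercase) is not treated as odd casing.
  const oddCasing = lower > 0 && upperMidWord / letters > 0.25;
  const fewVowels = vowels / letters < 0.15;
  return oddCasing || fewVowels;
}

// Block only when several fields look random, to limit false positives.
function shouldBlockSubmission(fields, minSuspicious = 2) {
  const suspicious = Object.values(fields).filter(
    (value) => typeof value === "string" && looksRandom(value)
  );
  return suspicious.length >= minSuspicious;
}
```

For example, `shouldBlockSubmission({ name: "dtbNPRpfcz", city: "VvAJEXqueSKscY" })` would block, while a form with `"John Smith"` and `"McDonald"` would pass. Real names, product codes, and non-English text can still trip a check like this, so it is better used as one signal among several than as the only gate.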

Update: Currently looking at running a password strength checker against some of the strings. If a string scores above "weak", it's likely to be spam. My web host said "try another recaptcha".

Update:

Well. I've learned a lot from this and gained some useful code, so thank you very much for your input and answers. However, after ignoring the problem for the weekend I looked at it again and noticed that the spam bot was getting around ALL the form validation. Then the penny dropped: the bot was posting directly to the route. I had not set up CSRF (Cross-Site Request Forgery) protection properly, which meant an agent could post to the URL from outside the site's domain. Doh!

I had added this to the forms:

    @Html.AntiForgeryToken()

But some of my routes were missing the code to check it:

    try
    {
        this.ValidateCsrfToken();
    }
    catch (CsrfValidationException)
    {
        return Response.AsText("Csrf Token not valid.").WithStatusCode(403);
    }

So. Apologies for wasting your time. That fixed it immediately.


Answer

Random string detection is complicated and overlaps with machine learning. I don't recommend implementing it on your own; spell-checking libraries for JS or C# may help.

Apart from that, here are a few suggestions for bot prevention:

  • Make sure you have implemented Google reCAPTCHA correctly. Use reCAPTCHA v3 if possible, and make sure you verify g-recaptcha-response on the backend. Google reCAPTCHA is not 100% reliable and can be bypassed by some anti-captcha services, but a correct implementation is the baseline.

  • Filter out suspicious IP addresses. Block the IP addresses that send the randomly generated strings.
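As a minimal sketch of the backend check in the first suggestion, the function below posts the submitted g-recaptcha-response token to Google's siteverify endpoint and inspects the JSON reply. The function name `verifyRecaptcha`, the `minScore` default of 0.5, and the injectable `fetchImpl` option are assumptions for illustration; it also assumes a runtime with a global `fetch` (for example Node 18+).

```javascript
// Sketch of server-side reCAPTCHA verification; names and defaults are assumptions.
const VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// token:  the g-recaptcha-response value posted with the form
// secret: the site's secret key (keep it on the server only)
async function verifyRecaptcha(
  token,
  secret,
  { minScore = 0.5, fetchImpl = fetch } = {}
) {
  if (!token) return false; // nothing to verify, treat as failed

  // siteverify expects form-encoded "secret" and "response" parameters.
  const body = new URLSearchParams({ secret, response: token });
  const res = await fetchImpl(VERIFY_URL, { method: "POST", body });
  if (!res.ok) return false;

  const data = await res.json();
  if (!data.success) return false;

  // v3 replies include a score between 0.0 and 1.0; v2 replies have none,
  // so a missing score is accepted once "success" is true.
  return typeof data.score !== "number" || data.score >= minScore;
}
```

The route handler would call this before any other processing and reject the request when it resolves to `false`. The score cut-off is a judgment call: a stricter value blocks more bots but also more real users.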

source: stackoverflow.com