Gmail has received its biggest spam-filter upgrade in years, one designed to understand the meaning and characteristics of text even when spammers deliberately disguise it.
Google recently published a post on its Security Blog describing the change to Gmail's spam filter, which the company calls one of its biggest defense upgrades in recent years. At its core is a new text vectorizer called RETVec (Resilient & Efficient Text Vectorizer). Google says it helps the filter make sense of emails stuffed with special characters, emoji, misspellings, and other junk that humans can still read but machines struggle to interpret. Previously, spam padded with such characters could easily slip past Gmail's defenses.
Any spam filter can catch an email that says, “Congratulations! A $1,000 balance is available for your jackpot account,” but the same message gets through when most of its letters are swapped for lookalikes. The Unicode standard is practically bottomless, and anyone can find characters that look like letters of the regular Latin alphabet.
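To illustrate the trick, here is a minimal Python sketch of a naive keyword filter. It is not Gmail's code; the keyword list and the function name `naive_filter` are hypothetical. Swapping four Latin letters in "jackpot" for Cyrillic lookalikes is enough to defeat it, and standard Unicode NFKC normalization does not undo the swap.

```python
import unicodedata

# Hypothetical keyword list for a naive filter (illustration only).
SPAM_KEYWORDS = {"jackpot", "congratulations"}

def naive_filter(text: str) -> bool:
    """Flag text if any keyword appears as an exact, case-insensitive substring."""
    lowered = text.lower()
    return any(word in lowered for word in SPAM_KEYWORDS)

plain = "Your jackpot account"
# Looks identical, but a, c, p, o are Cyrillic U+0430, U+0441, U+0440, U+043E.
spoofed = "Your j\u0430\u0441k\u0440\u043et account"

print(naive_filter(plain))    # True
print(naive_filter(spoofed))  # False: the keyword no longer matches

# NFKC normalization does not fold Cyrillic lookalikes into Latin letters.
print(unicodedata.normalize("NFKC", spoofed) == plain)  # False

# Show what each character of the spoofed word really is,
# e.g. "CYRILLIC SMALL LETTER A" for the second character.
for ch in "j\u0430\u0441k\u0440\u043et":
    print(ch, unicodedata.name(ch))
```

Catching this with a hand-built table of confusable characters is possible, but the table has to cover every lookalike in every script, which is the kind of resource-intensive approach Google contrasts with RETVec.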
Google says RETVec is trained to be resilient to character-level manipulations, including insertions, deletions, misspellings, homoglyphs, LEET substitutions… The model is built on a novel character encoder that can efficiently encode every character and word in UTF-8. As a result, RETVec works out of the box in more than 100 languages, with no lookup tables or fixed vocabulary size.
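The idea of encoding characters without a vocabulary can be sketched in a few lines of Python. This is a simplified illustration, not RETVec's actual encoder: the function `encode_word` and the widths of 24 bits per character and 16 characters per word are assumptions chosen for the example. Each character is mapped to the binary digits of its Unicode code point, so any character in any script gets a representation without a lookup table.

```python
# Simplified sketch, not RETVec's real encoder. Widths are assumptions.
BITS_PER_CHAR = 24  # every Unicode code point (max U+10FFFF) fits in 21 bits
MAX_CHARS = 16      # words are truncated or zero-padded to this length

def encode_word(word: str) -> list[list[int]]:
    """Return a MAX_CHARS x BITS_PER_CHAR matrix of 0/1 values for one word."""
    rows = []
    for ch in word[:MAX_CHARS]:
        cp = ord(ch)  # Unicode code point, no vocabulary lookup needed
        # Emit the code point's bits, most significant bit first.
        rows.append([(cp >> (BITS_PER_CHAR - 1 - i)) & 1
                     for i in range(BITS_PER_CHAR)])
    # Pad short words with all-zero rows so every word has the same shape.
    while len(rows) < MAX_CHARS:
        rows.append([0] * BITS_PER_CHAR)
    return rows

# Latin, Cyrillic, Japanese and emoji are all handled the same way.
for w in ["jackpot", "j\u0430\u0441k\u0440\u043et", "ジャックポット", "\U0001F600"]:
    m = encode_word(w)
    print(w, len(m), len(m[0]))
```

Note that the raw encoding alone does not make a Latin "a" and a Cyrillic "а" look alike; their bit patterns differ. Resilience to such swaps has to come from training, which is why Google stresses that RETVec is trained against character-level manipulations.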
Google says the efficiency gains are substantial. Approaches that rely on a fixed vocabulary or on lookup tables for homoglyphs are resource-intensive, whereas RETVec has only 200,000 parameters instead of millions. That makes it small enough to run on a local device, not just on Google's large cloud-based spam-filtering platform. RETVec is open source, and Google hopes it will help stamp out homoglyph attacks.
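A rough back-of-the-envelope comparison shows why parameter count matters for on-device use. The vocabulary size of 100,000 tokens and embedding width of 256 below are hypothetical figures chosen for illustration, not numbers from Google; only the 200,000-parameter figure for RETVec comes from the article. A vocabulary-based model needs one vector per token in its embedding table alone.

```python
BYTES_PER_FLOAT32 = 4

def embedding_params(vocab_size: int, dim: int) -> int:
    """Parameters in a vocabulary-based embedding table: one dim-sized vector per token."""
    return vocab_size * dim

def size_mb(params: int) -> float:
    """Approximate size in megabytes if every parameter is stored as float32."""
    return params * BYTES_PER_FLOAT32 / 1_000_000

# Hypothetical vocabulary-based model (numbers chosen only for illustration).
vocab_params = embedding_params(vocab_size=100_000, dim=256)
# Parameter count reported for RETVec.
retvec_params = 200_000

print(vocab_params)                  # 25600000
print(size_mb(vocab_params))         # 102.4 (MB, embedding table only)
print(size_mb(retvec_params))        # 0.8 (MB)
print(vocab_params / retvec_params)  # 128.0
```

Under these assumptions the embedding table alone is about 128 times larger than the whole of RETVec, before counting the rest of the model.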
RETVec is a TensorFlow machine learning model that uses visual similarity, rather than the exact characters, to determine the meaning of words. This approach has paid off: Google says switching Gmail's spam classifier to RETVec improved the spam detection rate over the baseline by 38% and reduced the false positive rate by 19.4%. Using RETVec also cut the model's TPU usage by 83%, making its deployment one of the largest upgrades to Gmail's defenses in recent years. The company has been testing RETVec internally for the past year and has now rolled it out to all Gmail users.
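Assuming the reported 83% TPU reduction and 19.4% false-positive reduction are relative to the baseline (this is an assumption; the figures are not broken down further), they translate as follows. The 100 TPU-hour workload is a hypothetical example.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline left after a relative reduction of reduction_pct percent."""
    return 1 - reduction_pct / 100

# Reported figures, treated here as relative reductions (assumption).
TPU_REDUCTION_PCT = 83
FALSE_POSITIVE_REDUCTION_PCT = 19.4

tpu_left = remaining_fraction(TPU_REDUCTION_PCT)
fp_left = remaining_fraction(FALSE_POSITIVE_REDUCTION_PCT)

print(round(tpu_left, 2))        # 0.17: about 17% of the baseline TPU usage
print(round(1 / tpu_left, 1))    # 5.9: the old model used roughly 5.9x as much TPU
print(round(fp_left, 3))         # 0.806: about 80.6% of baseline false positives

# Hypothetical workload that took 100 TPU-hours on the old model.
print(round(100 * tpu_left, 1))  # 17.0 TPU-hours with RETVec
```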