How Google Finds App Store Spam

Posted @ Oct 31 2017 by

Google is hunting for app store spam among the applications available in the Play Store.

Patenting Finding App Store Spam

I like looking at patents involving search and the Web because they explain the problems search engines are trying to solve. These patents show us inventions intended as solutions to those problems. Sometimes, they show us how someone might learn to solve their own problems by studying how other companies tackled similar ones.

This month, Google was granted a patent on finding app store spam. Because Google created and runs the Android operating system, it developed a marketplace for apps that run on Android devices. As a competitor of Apple, it has a lot to learn from Apple. The patent cites a couple of articles that possibly influenced the applicants' writing. Reading those is a good introduction to the patent.

Interestingly, those articles focus on the Apple App Store rather than the one run by Google. It makes sense that Google would look at the problems the Apple App Store has had, to help it run its own app store.

The first of these articles is called Identifying spam in the iOS app store.

The abstract from this paper tells us about apps and problems related to people spamming the Apple App Store:

Popular apps on the Apple iOS App Store can generate millions of dollars in profit and collect valuable personal user information. Fraudulent reviews could deceive users into downloading potentially harmful spam apps or unfairly ignoring apps that are victims of review spam. Thus, automatically identifying spam in the App Store is an important problem. This paper aims to introduce and characterize novel datasets acquired through crawling the iOS App Store, compare a baseline Decision Tree model with a novel Latent Class graphical model for classification of app spam, and analyze preliminary results for clustering reviews.

In the introduction to the paper, we are told more about problems with spam apps:

Developers of spam apps (malicious developers) are primarily interested in gaining monetary profit or leaching valuable user data, such as address book contacts. Popular, seemingly legitimate apps can leak user data quietly [2, 4], so it is feasible that spam apps would attempt to do the same.

It is possible to learn a lot from looking at the problems that others have experienced:

A malicious developer could post spam reviews by using several throwaway iTunes user accounts i.e. “sockpuppets”. Apple has attempted to decrease the frequency of spam by requiring users to purchase and download an app before being able to review it. However, sockpuppet user accounts can still be created using iTunes Gift Cards, and the potential for profit and stolen user data could justify the cost.

This paper tells us that they worked to identify spam. They do this by looking at review patterns involving spammy behavior in the App Store.

The other article tells us about some other things that Apple tried to do to identify spam behavior:

Apple May Have Tweaked App Store Ranking Algorithm, Making Downloads Matter Less

Interestingly, this article also discusses apps in the Android store:

Google, which constantly tweaks its Android Market rankings, may have begun weighting an app’s ratio of daily active users to monthly active users — a measure of stickiness — more heavily in recent weeks, according to teen-focused social network MyYearbook. The company had noticed suspicious ranking fluctuations across its entire portfolio of apps. Google did not comment on this.

We are told that downloads used to be very important to rankings in the Apple App Store, so moving away from them is a significant change:

The changes are a big deal because Apple app store rankings have to date relied heavily on an app’s download rate. This has allowed an entire cottage industry to flourish. Networks like Flurry, Tapjoy and W3i allow developers to pay for downloads, which bump their apps into the top of the charts where they can get even more downloads from having the extra visibility. If they’re good, they stick at the top of the charts. If they’re bad, they fall quickly.

The Google patent is:

Detecting application store ranking spam
Inventors: Kaihua Zhu and Ping Wu
Assignee: GOOGLE INC.
US Patent: 9,794,106
Granted: October 17, 2017
Filed: March 4, 2013

Abstract

A server, which may be configured to manage distribution of content to users, may receive content related information associated with a particular user, and analyze the content related information. Such analysis may comprise comparing parameters in the content related information with corresponding predefined parameters in the server for determining acceptable content related activities, and classifying users based on the analysis of the content related information. The content related information may comprise one or more of content usage related data, content download related metrics, or user session related metrics relating to one or more sessions utilized by users in conjunction with use of content managed via the server.

It helps to read those articles before reading through this patent. They provide a sense of what is at risk, what has changed, and why the patent focuses on the things it does.

We are told that content usage related data are important metrics that are made up of:

  1. Data generated in electronic devices during use of content by the user,
  2. Market data relating to number of purchases or updates of particular content, and
  3. Third party data of content use activities.

The patent tells us about download related metrics that include such things as:

  1. A percentage of a particular type of content from all content downloaded by the user
  2. A maximum number of content downloaded in a single day
  3. A total number of content downloaded
  4. A maximum number of content downloaded in a single week
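As a rough illustration, the four download metrics above could be computed from a single user's download history. This is a minimal sketch and not the patent's implementation: the function name `download_metrics`, the `(day, content_type)` record layout, and the use of ISO calendar weeks for "a single week" are all assumptions made for the example.

```python
from collections import Counter
from datetime import date


def download_metrics(downloads, content_type):
    """Summarize one user's downloads (hypothetical record layout).

    downloads: list of (day, content_type) tuples, where day is a datetime.date.
    """
    total = len(downloads)
    if total == 0:
        return {"total": 0, "type_share": 0.0, "max_per_day": 0, "max_per_week": 0}

    per_day = Counter(day for day, _ in downloads)
    # Assumption: a "week" is an ISO calendar week; keying on (year, week)
    # keeps weeks from different years separate.
    per_week = Counter(tuple(day.isocalendar())[:2] for day, _ in downloads)
    of_type = sum(1 for _, kind in downloads if kind == content_type)

    return {
        "total": total,                          # total content downloaded
        "type_share": of_type / total,           # share of one content type
        "max_per_day": max(per_day.values()),    # busiest single day
        "max_per_week": max(per_week.values()),  # busiest single week
    }


# Made-up history for one user.
history = [
    (date(2013, 3, 4), "games"),
    (date(2013, 3, 4), "games"),
    (date(2013, 3, 5), "tools"),
    (date(2013, 3, 12), "games"),
]
print(download_metrics(history, "games"))
```

An account that downloads hundreds of apps in a day, nearly all of one type, would stand out on these numbers compared with an ordinary user.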

A store might also look at how people searching for apps behave in the store:

User session related metrics could include:

  1. A percentage of content downloaded from search by the user
  2. A number of queries issued by the user
  3. A percentage of content downloaded from browsing and/or clickthrough
  4. An average session duration and/or a delay from search to download
  5. A percentage of content downloaded from direct inbound traffic.
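The session metrics listed above might be derived from logged sessions along these lines. This is a sketch under stated assumptions: the session record layout, the source labels (`"search"`, `"browse"`, `"inbound"`), and the function name `session_metrics` are hypothetical and do not come from the patent.

```python
from statistics import mean


def session_metrics(sessions):
    """Aggregate one user's sessions into the metrics listed above.

    Each session is a dict with "queries", "duration_s", and "downloads";
    each download is a dict with "source" and "delay_s" (hypothetical layout).
    """
    downloads = [d for s in sessions for d in s["downloads"]]
    n = len(downloads)

    def share(source):
        # Fraction of all downloads that arrived through the given source.
        return sum(1 for d in downloads if d["source"] == source) / n if n else 0.0

    search_delays = [d["delay_s"] for d in downloads if d["source"] == "search"]
    return {
        "queries": sum(s["queries"] for s in sessions),
        "from_search": share("search"),
        "from_browse": share("browse"),
        "from_inbound": share("inbound"),
        "avg_session_s": mean(s["duration_s"] for s in sessions) if sessions else 0.0,
        "avg_search_delay_s": mean(search_delays) if search_delays else None,
    }


# Made-up sessions for one user.
sessions = [
    {"queries": 3, "duration_s": 120,
     "downloads": [{"source": "search", "delay_s": 40}]},
    {"queries": 0, "duration_s": 60,
     "downloads": [{"source": "inbound", "delay_s": None},
                   {"source": "search", "delay_s": 20}]},
]
print(session_metrics(sessions))
```

A user whose downloads come almost entirely from direct inbound traffic, with no queries and very short sessions, would look quite different here from someone who searches and browses before downloading.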

Application-Usage Based Metrics to Find App Store Spam

In addition to looking at how people behave in an app store, the devices running apps may collect data about how those apps are used by people who install them.

We are given details of the motivations behind such an approach and how it works, in the description to the patent:

For example, with application usage-based ranking and/or spam detection, applications may be ranked based on usage instead of the total download number. In this regard, usage may be far more expensive to generate than download, and thus making the cost of generating application download spam too costly to be sustainable. Accordingly, client devices (e.g., electronic devices, such as device) may collect and/or obtain usage related metrics. Examples of usage related metrics may comprise operating system (OS) related metrics and/or other API related information, such as a number of times a particular application starts and how long users use it; market metrics, such as number of times the application gets updated and how many times in-application purchases (as application markets handle the payment); and third party data. In this regard, specialized third party application entities may collect and/or obtain application usage of hundreds of thousands application usage across hundreds of millions of devices, and/or to provide that data. The application management server may initially determine the trustworthiness of the usage related metrics. Once determined trustworthy, the application management server may combine and/or analyze all different usage related information, which allow determining more optimally how an average user would use a particular application (e.g., how much time using the application), and thus the application management server may rank applications (or adjust any existing ranking) accordingly. In addition, the application management server may use the ranking and/or adjustment to ranking in making a determination regarding user classification.
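To make the idea in that passage concrete, here is a minimal sketch of ranking by usage rather than by download count. The patent does not say how trustworthiness is decided or how the sources are combined, so the fixed set of trusted sources, the minutes-per-user score, and every name and figure below are assumptions for illustration only.

```python
def rank_by_usage(apps, trusted_sources):
    """Return app names ordered by average minutes of use per user."""
    scores = {}
    for name, info in apps.items():
        # Assumption: trust is a simple allow-list of reporting sources.
        trusted = [r for r in info["reports"] if r["source"] in trusted_sources]
        users = sum(r["users"] for r in trusted)
        minutes = sum(r["minutes"] for r in trusted)
        scores[name] = minutes / users if users else 0.0
    return sorted(scores, key=scores.get, reverse=True)


# Made-up data: one app with many downloads but little trusted usage.
apps = {
    "boosted_app": {"downloads": 500000, "reports": [
        {"source": "os", "users": 1000, "minutes": 2000},
        {"source": "unverified", "users": 1000, "minutes": 900000},
    ]},
    "steady_app": {"downloads": 20000, "reports": [
        {"source": "os", "users": 500, "minutes": 15000},
        {"source": "third_party", "users": 500, "minutes": 10000},
    ]},
}

by_downloads = sorted(apps, key=lambda a: apps[a]["downloads"], reverse=True)
by_usage = rank_by_usage(apps, {"os", "third_party"})
print(by_downloads)
print(by_usage)
```

In this example the app with paid-for downloads leads on download count but falls behind once only trusted usage reports are counted, which is the effect the patent describes.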

Conclusion: Identifying Abnormalities

The patent also provides details on how download interaction and user session data can be used to identify spammers. These details also help identify apps that might not be legitimate. This is the kind of user data that might be reviewed when an app is selected and downloaded:

The application management server may, for example, obtain, collect, or receive data relating to user search queries made through the market search box;
links clicked on the market pages;
market page user views;
time spent on each page;
and/or the download event application user (bought) downloaded, installed.

The patent tells us that information is also collected when users go through a discovery phase and find an application, and that the store watches carefully for unusual activity:

For each application downloaded, the application management server may identify the reason why the application is downloaded and may generate a set of user session related metrics corresponding to that download. The application data analyzer may then determine the percentage across the overall user population to identify abnormality. Example of session metrics may pertain to such things as number of queries the user issued during a particular sessions (and/or total query in particular period–e.g., per day); percentage of application downloaded from search; percentage application downloaded from browsing and clickthrough; percentage application downloaded from direct inbound traffic; average session duration; and/or delay from search to download. Accordingly, obtaining user session related metrics for the overall user population may allow for determining applicable expected session related criteria (e.g., threshold(s)), which would in turn be used (e.g., comparison) in determining where session metrics corresponding to particular user’s application(s) fall (i.e., in comparison to the overall user population), and thus allow for classification of the user.
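The comparison against the overall user population could look something like the sketch below. The patent only says that population-wide metrics yield expected criteria such as thresholds; it does not say how they are computed. Using the median plus a multiple of the median absolute deviation (MAD) is an assumption chosen here because a few extreme accounts do not distort it much, and the function name, the factor `k`, and the sample values are hypothetical.

```python
from statistics import median


def flag_abnormal(metric_by_user, k=5.0):
    """Return users whose metric is far above the population's typical value.

    Assumption: threshold = median + k * MAD. The patent does not specify
    how its thresholds are derived.
    """
    values = list(metric_by_user.values())
    if not values:
        return []
    mid = median(values)
    mad = median(abs(v - mid) for v in values)
    if mad == 0:
        # No measurable spread, so this sketch flags no one.
        return []
    threshold = mid + k * mad
    return sorted(u for u, v in metric_by_user.items() if v > threshold)


# Made-up share of downloads from direct inbound traffic, per user.
inbound_share = {"u1": 0.10, "u2": 0.15, "u3": 0.12, "u4": 0.08, "u5": 0.95}
print(flag_abnormal(inbound_share))
```

Here only the account that gets nearly all of its downloads from direct inbound traffic lands above the threshold. A real system would presumably combine several metrics before classifying a user.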

Unusual behavior in an app store can help point out apps that people might not really want to use or download. We saw the success that both Google and Apple had with the very popular Pokemon game last year. Providing people with a popular app can be worth the effort of fighting app store spam.
