
App Moderation of Mature Content

Project Details

Purpose of this project

Write an API capable of checking whether an app has mature content, with an error rate of at most 6%.

Project Description

When uploading a new app, the user can check a box to flag it as explicit content. But what if the user marks explicit content as non-explicit? Our engineers would then have to change the setting manually, minutes after the app has been uploaded.

This project tries to automate part of that process by giving more feedback about the probability that an app contains explicit content.

The main goal was to develop a system that detects explicit content in apps with an error rate of at most 20%.

The goal was achieved, with an error rate of at most 6%.
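
For concreteness, the error percentage quoted above can be read as the share of apps whose predicted label differs from the hand-assigned one. The sketch below assumes that definition, since the page does not spell out the metric; the function name `error_rate` and the sample labels are purely illustrative.

```python
def error_rate(true_labels, predicted_labels):
    """Return the fraction of predictions that differ from the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    mistakes = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return mistakes / len(true_labels)


# Hypothetical labels: True means "explicit", False means "non-explicit".
hand_labels = [True, False, False, True, False, False, False, True, False, False]
predictions = [True, False, False, False, False, False, False, True, False, False]

rate = error_rate(hand_labels, predictions)
meets_initial_goal = rate <= 0.20  # the project's initial 20% target
```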

Project Source Code

Source Code

Road-map

  1. Phase 1 (July 11 to July 15)
  • Tutor meeting to discuss the project
  • Research to understand the state of the art
  • Goals definition
  • Download of a small database of images from the Aptoide store, hand-labeled as explicit/non-explicit
  • Automated tests on open-source software to decide which one to rely on
  2. Phase 2 (July 18 to July 22)
  • Finding the best possible configuration of Illustration2Vec
  • Research of other possible open-source software for text processing
  • Creation of a dataset of descriptions and images of explicit applications
  3. Phase 3 (July 25 to August 5)
  • Optimizing the software previously researched
  • Possible integration of other algorithms that could improve the results
  • Automated tests
  4. Phase 4 (August 8 to August 12)
  • Fixing possible bugs
  • Writing automated tests
  • Comparing the final results with the initial tests
  • Final presentation

Weekly Reports

Week 1

  • Tutor meeting and choice of project methodology
  • Understanding the state-of-the-art technology
  • Downloading a dataset of images from the Aptoide store for testing
  • Writing automated tests
  • Recording the test results in a spreadsheet
  • Road-map planning
  • Initial Presentation planning
  • Optimizing Illustration2Vec for specific usage
  • Beginning of research for open-source software for text-mining

Week 2

  • Research of open-source algorithms for text processing
  • Writing a script that tests some algorithms for text classification
  • Writing scripts to download Aptoide app information and save it in a database
  • Creation of a database with approximately 4000 apps (explicit and non-explicit), with information about icons, screenshots, id, title, description, minimum age, repository, category and wurl
  • Manually labeling the screenshots, icons and descriptions as explicit/non-explicit
  • Beginning of the tests with the images and descriptions from the Aptoide apps
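
A minimal sketch of how the app records listed above could be stored and hand-labeled, assuming SQLite and a single table. The table name, column names, helper functions, and sample row are hypothetical; the page lists only the fields collected, not the actual schema or database engine.

```python
import json
import sqlite3

# In-memory database for illustration; a real run would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE apps (
        id          INTEGER PRIMARY KEY,
        title       TEXT,
        description TEXT,
        min_age     INTEGER,
        repository  TEXT,
        category    TEXT,
        wurl        TEXT,     -- field name taken from the page; its meaning is not stated there
        icon        TEXT,     -- URL or path of the icon
        screenshots TEXT,     -- JSON-encoded list of URLs or paths
        is_explicit INTEGER   -- hand label: 1 explicit, 0 non-explicit, NULL unlabeled
    )
    """
)


def save_app(conn, app):
    """Insert or overwrite one app record (a dict keyed by column name)."""
    row = dict(app, screenshots=json.dumps(app["screenshots"]))
    conn.execute(
        """
        INSERT OR REPLACE INTO apps
            (id, title, description, min_age, repository, category,
             wurl, icon, screenshots, is_explicit)
        VALUES
            (:id, :title, :description, :min_age, :repository, :category,
             :wurl, :icon, :screenshots, :is_explicit)
        """,
        row,
    )
    conn.commit()


def set_label(conn, app_id, is_explicit):
    """Store the manual explicit/non-explicit label for one app."""
    conn.execute(
        "UPDATE apps SET is_explicit = ? WHERE id = ?",
        (int(is_explicit), app_id),
    )
    conn.commit()


# Placeholder record, not real Aptoide data.
save_app(conn, {
    "id": 1, "title": "Example app", "description": "A placeholder description.",
    "min_age": 3, "repository": "example-repo", "category": "Tools",
    "wurl": "", "icon": "icon.png", "screenshots": ["s1.png", "s2.png"],
    "is_explicit": None,
})
set_label(conn, 1, False)
label, shots = conn.execute(
    "SELECT is_explicit, screenshots FROM apps WHERE id = 1"
).fetchone()
```

Keeping the label in a nullable column makes it easy to query which apps still need manual review.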

Week 3

  • Tests to find the right features for the Machine Learning model.
  • 5-fold cross-validation tests for choosing the best classifier for the Machine Learning model.
  • Parameter selection tests for the previous classifiers
  • Creation of a script using Machine Learning to get the information both from text and images and decide if it is explicit content or not
  • 5-fold cross-validation tests for choosing the most suitable classifier, and parameter selection tests
  • Creation of a bigger dataset, with 1772 explicit apps and 8153 non-explicit apps (with id, description, category, curl, minimum age, icons and screenshots), and hand-labeling of the apps as explicit/non-explicit
  • Confirmation and correction of the previously chosen classifiers, and rerun of all the previous tests, now with the bigger dataset
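
The classifier selection described above can be illustrated with a small 5-fold cross-validation sketch. It does not reproduce the project's classifiers or features: the two candidates (a majority-class baseline and a nearest-centroid rule), the two synthetic features standing in for an image score and a text score, and all function names are assumptions made for illustration. Only the standard library is used; a library such as scikit-learn offers equivalent cross-validation utilities.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_val_error(fit, X, y, k=5):
    """Mean error rate of a classifier over k folds.

    `fit(X_train, y_train)` must return a function mapping one sample to a label.
    """
    errors = []
    for train, test in k_fold_splits(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(X[i]) != y[i] for i in test)
        errors.append(wrong / len(test))
    return sum(errors) / k


def fit_majority(X_train, y_train):
    """Baseline: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda sample: majority


def fit_nearest_centroid(X_train, y_train):
    """Predict the label whose mean feature vector is closest to the sample."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(sample):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(sample, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


# Synthetic data: [image_score, text_score]; label 1 = explicit, 0 = non-explicit.
rng = random.Random(42)
X, y = [], []
for _ in range(40):
    X.append([rng.gauss(0.8, 0.1), rng.gauss(0.7, 0.1)])
    y.append(1)
for _ in range(160):
    X.append([rng.gauss(0.2, 0.1), rng.gauss(0.3, 0.1)])
    y.append(0)

candidates = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
scores = {name: cross_val_error(fit, X, y) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest mean error
```

Because the classes are imbalanced, as in the dataset above, the majority baseline is a useful reference: a candidate is only worth keeping if its cross-validated error is clearly lower.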

Week 4

  • Finalizing "App Moderation for Mature Content". Final Tests
  • Creation of a Django-based API for interacting with the software built
  • Beginning of the Documentation
  • Beginning of the Internship Report
  • Final Internship Presentation

Week 5

  • Reading some configurations from a config file
  • Server installation and Troubleshooting
  • Adding synchronous and asynchronous requests to the API
  • Changing the JSON of the request responses
  • Implementing a cache to store the analysis results
  • Adding a URL option to force reloading the cache
  • Optimization of the response times of the server
  • Finalizing the documentation and the report
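
The caching behaviour listed above can be sketched as follows. This is an assumption-laden illustration, not the project's code: the class `AnalysisCache`, the query parameter name `reload`, the example URLs, and the placeholder analysis function are all hypothetical, and the sketch is framework-independent rather than tied to Django.

```python
from urllib.parse import urlparse, parse_qs


class AnalysisCache:
    """In-memory cache of analysis results keyed by app id."""

    def __init__(self, analyze):
        self._analyze = analyze  # function: app_id -> result dict
        self._results = {}

    def get(self, app_id, force_reload=False):
        """Return the cached result, re-running the analysis if forced or missing."""
        if force_reload or app_id not in self._results:
            self._results[app_id] = self._analyze(app_id)
        return self._results[app_id]


def wants_reload(url):
    """True if the (hypothetical) `reload` query parameter is set to 1."""
    query = parse_qs(urlparse(url).query)
    return query.get("reload", ["0"])[0] == "1"


calls = []


def fake_analysis(app_id):
    """Stand-in for the real (slow) analysis; records each invocation."""
    calls.append(app_id)
    return {"app_id": app_id, "explicit_probability": 0.5}  # placeholder value


cache = AnalysisCache(fake_analysis)
first = cache.get(123)    # analysis runs
second = cache.get(123)   # served from the cache
third = cache.get(
    123, force_reload=wants_reload("http://example.com/app/123?reload=1")
)                         # analysis runs again
```

Serving repeated requests from the cache avoids re-running the analysis, while the reload option gives a way to refresh a result when an app's content has changed.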

Trainee details

Trainee Name

Diogo Daniel Soares Ferreira

Past Experience

Academic projects.

Current Situation

Just finished the 2nd year of Computer Engineering and Telematics at the University of Aveiro.

Motivation for the Project

Gaining coding work experience over the summer while working in a business environment.

Mentor

Ehsan Nia

Last modified on Aug 12, 2016, 2:43:19 PM
