App Moderation of Mature Content
Project Details
Purpose of this project
Write an API that can check whether an app has mature content, with an error rate of at most 6%.
Project Description
When uploading a new app, the user can tick a box to mark it as explicit content. But what if the user marks explicit content as non-explicit? Our engineers would then have to correct the setting by hand, minutes after the app has been uploaded.
This project aims to automate part of that process by estimating the probability that an app has explicit content.
The main goal was to develop a system that detects explicit content in apps with an error rate of at most 20%.
This goal was exceeded: the final error rate was at most 6%.
Project Source Code
Road-map
- Phase 1 (July 11 to July 15)
- Meeting with the tutor to discuss the project
- Research to understand the state of the art
- Definition of goals
- Download of a small database of images from the Aptoide store, hand-labeled as explicit/non-explicit
- Automated tests on open-source software to decide which one to rely on
- Phase 2 (July 18 to July 22)
- Finding the best possible configuration of Illustration2Vec
- Research into other open-source software for text processing
- Creation of a dataset of descriptions and images of explicit applications
- Phase 3 (July 25 to August 5)
- Optimizing the software previously researched
- Possible integration of other algorithms that could improve the results
- Automated tests
- Phase 4 (August 8 to August 12)
- Fixing possible bugs
- Writing automated tests
- Comparing the final results with the initial tests
- Final presentation
Weekly Reports
Week 1
- Meeting with the tutor and choice of project methodology
- Understanding the state-of-the-art technology
- Download of a dataset of images from the Aptoide store for testing
- Writing of automated tests
- Writing a spreadsheet with the test results
- Road-map planning
- Initial presentation planning
- Optimizing Illustration2Vec for this specific use
- Beginning of research into open-source software for text mining
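One plausible way to tune Illustration2Vec for this use is to pick the cutoff on its "explicit" rating score that gives the lowest error on the hand-labeled images. A minimal sketch, assuming each image already has such a score in [0, 1] and that "error" means the fraction of misclassified images (the function names and the scores below are hypothetical, not the project's code):

```python
from typing import Sequence, Tuple


def error_rate(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Fraction of images whose prediction (score >= threshold) disagrees with the hand label."""
    wrong = sum((s >= threshold) != y for s, y in zip(scores, labels))
    return wrong / len(labels)


def best_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Try every observed score as a cutoff (plus one above the max) and keep the lowest error."""
    candidates = sorted(set(scores)) + [max(scores) + 1e-9]
    best = min(candidates, key=lambda t: error_rate(scores, labels, t))
    return best, error_rate(scores, labels, best)


if __name__ == "__main__":
    # Hypothetical "explicit" scores for six hand-labeled images.
    scores = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
    labels = [False, False, True, True, True, True]
    threshold, err = best_threshold(scores, labels)
    print(f"threshold={threshold:.2f} error={err:.0%}")
```

Trying only the observed scores is enough, because the error can only change when the cutoff crosses one of them; the extra candidate above the maximum covers the "never explicit" case.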
Week 2
- Research into open-source algorithms for text processing
- Writing of a script that tests several algorithms for text classification
- Writing of scripts to download Aptoide app information and save it in a database
- Creation of a database with about 4000 apps (explicit and non-explicit), with information about icons, screenshots, id, title, description, minimum age, repository, category and wurl
- Manual labeling of the screenshots, icons and descriptions as explicit/non-explicit
- Beginning of the tests with the images and descriptions of the Aptoide apps
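The app records gathered in Week 2 fit naturally in a single table. A minimal sketch using Python's built-in `sqlite3` module, assuming one row per app, JSON-encoded lists of icon and screenshot URLs, and a nullable hand label (the table name, column types, helper functions and sample row are hypothetical; only the field names come from the list above):

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS apps (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    min_age     INTEGER,
    repository  TEXT,
    category    TEXT,
    wurl        TEXT,
    icons       TEXT,   -- JSON list of icon URLs
    screenshots TEXT,   -- JSON list of screenshot URLs
    explicit    INTEGER -- hand label: 1 explicit, 0 non-explicit, NULL not labeled yet
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the apps table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_app(conn: sqlite3.Connection, app: dict) -> None:
    """Insert or replace one app record; list fields are stored as JSON text."""
    conn.execute(
        "INSERT OR REPLACE INTO apps VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            app["id"], app["title"], app.get("description"), app.get("min_age"),
            app.get("repository"), app.get("category"), app.get("wurl"),
            json.dumps(app.get("icons", [])), json.dumps(app.get("screenshots", [])),
            app.get("explicit"),
        ),
    )
    conn.commit()


def label_app(conn: sqlite3.Connection, app_id: int, explicit: bool) -> None:
    """Record the manual explicit/non-explicit label for one app."""
    conn.execute("UPDATE apps SET explicit = ? WHERE id = ?", (int(explicit), app_id))
    conn.commit()


if __name__ == "__main__":
    db = open_db()
    save_app(db, {"id": 1, "title": "Example App", "category": "Games",
                  "icons": ["https://example.com/icon.png"]})
    label_app(db, 1, explicit=False)
    print(db.execute("SELECT title, explicit FROM apps").fetchall())
```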
Week 3
- Tests to find the right features for the machine learning model
- 5-fold cross-validation tests to choose the best classifier for the machine learning model
- Parameter-selection tests for those classifiers
- Creation of a machine learning script that combines information from both text and images to decide whether an app has explicit content
- 5-fold cross-validation tests to choose the most suitable classifier, and parameter-selection tests
- Creation of a bigger dataset with 1772 explicit apps and 8153 non-explicit apps (with id, description, category, curl, minimum age, icons and screenshots), hand-labeled as explicit/non-explicit
- Confirmation and correction of the previously chosen classifiers, and rerun of all previous tests on the bigger database
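5-fold cross-validation splits the labeled apps into five folds, trains on four of them, tests on the fifth, and averages the error over the five rounds, so every app is used for testing exactly once. A dependency-free sketch of that procedure, where the `nearest_mean` classifier and the one-dimensional toy data are hypothetical stand-ins for the real text and image features and for the classifiers actually compared:

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, bool]  # (feature value, is_explicit)
Classifier = Callable[[Sequence[Sample]], Callable[[float], bool]]


def k_fold_error(data: Sequence[Sample], train: Classifier, k: int = 5, seed: int = 0) -> float:
    """Mean error rate over k train/test splits of a shuffled copy of the data."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    errors = []
    for i, test in enumerate(folds):
        train_set = [s for j, fold in enumerate(folds) if j != i for s in fold]
        predict = train(train_set)
        wrong = sum(predict(x) != y for x, y in test)
        errors.append(wrong / len(test))
    return mean(errors)


def nearest_mean(train_set: Sequence[Sample]) -> Callable[[float], bool]:
    """Toy classifier: predict the class whose training mean is closer to x."""
    pos = mean(x for x, y in train_set if y)
    neg = mean(x for x, y in train_set if not y)
    return lambda x: abs(x - pos) < abs(x - neg)


if __name__ == "__main__":
    # Hypothetical, cleanly separable data: explicit apps score high.
    data = [(i / 10, False) for i in range(5)] * 2 + [(0.6 + i / 10, True) for i in range(4)] * 2
    print(f"5-fold error: {k_fold_error(data, nearest_mean):.0%}")
```

Comparing classifiers then means calling `k_fold_error` once per candidate and keeping the lowest. With a dataset as imbalanced as 1772 explicit against 8153 non-explicit apps, stratified folds, which keep the class ratio the same in every fold, are a common refinement.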
Week 4
- Finalizing "App Moderation of Mature Content" and final tests
- Creation of a Django-based API to interact with the software built
- Beginning of the documentation
- Beginning of the internship report
- Final internship presentation
Week 5
- Reading settings from a config file
- Server installation and troubleshooting
- Adding synchronous and asynchronous requests to the API
- Changing the request responses to JSON
- Implementing a cache to save the analysis results
- Adding an option to the URL to force reloading the cache
- Optimizing the server response times
- Finalizing the documentation and the report
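The result cache and the option to force a reload can be sketched as a small wrapper around the analysis function: a cached result is returned unless it is older than a configurable time-to-live or the caller asks for a forced reload. The class, the `analyse` callable, the `[cache] ttl_seconds` setting and the fake analysis below are all hypothetical; in a Django API the `force` flag would presumably come from a URL query parameter, and the TTL could be read from the config file with Python's `configparser`:

```python
import configparser
import time
from typing import Callable, Dict, Tuple

# Hypothetical config file contents; the real file layout is not documented here.
CONFIG_TEXT = """
[cache]
ttl_seconds = 3600
"""


class AnalysisCache:
    """Caches analysis results per app id, with an optional forced reload."""

    def __init__(self, analyse: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._analyse = analyse
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}  # app id -> (time computed, result)

    def get(self, app_id: str, force: bool = False) -> dict:
        """Return the cached result, recomputing it if missing, expired, or force=True."""
        now = self._clock()
        entry = self._store.get(app_id)
        if force or entry is None or now - entry[0] > self._ttl:
            entry = (now, self._analyse(app_id))
            self._store[app_id] = entry
        return entry[1]


def load_ttl(text: str) -> float:
    """Read the cache TTL from INI-style text, defaulting to one hour."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getfloat("cache", "ttl_seconds", fallback=3600.0)


if __name__ == "__main__":
    calls = []

    def fake_analyse(app_id: str) -> dict:
        # Stand-in for the real text and image classifiers.
        calls.append(app_id)
        return {"app_id": app_id, "explicit_probability": 0.12}

    cache = AnalysisCache(fake_analyse, load_ttl(CONFIG_TEXT))
    cache.get("123")              # computed
    cache.get("123")              # served from the cache
    cache.get("123", force=True)  # recomputed, like the force-reload URL option
    print(len(calls))  # 2
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.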
Trainee details
Trainee Name
Diogo Daniel Soares Ferreira
Past Experience
Academic projects.
Current Situation
Just finished the 2nd year of Computer Engineering and Telematics at the University of Aveiro.
Motivation for the Project
Obtaining work experience coding in the summer while working in a business environment.
Mentor
Ehsan Nia
Attachments (2)
- report.pdf (171.7 KB), added by dferreira: Final Report
- Documentation.pdf (137.5 KB), added by dferreira: Documentation