Algorithms to predict age and gender for device IDs


Axonix provides a transparent programmatic marketplace that enables mobile publishers to achieve the highest possible prices for their inventory whilst adhering to any controls and selling rules they define. Axonix operates as an ad exchange in several markets where it has access to anonymised Mobile Network Operator (MNO) customer data that it is permitted to use for targeting purposes. Axonix extends this data through modelling, predicting the likely values of those attributes for the rest of the population, where they are not known.

The scope of this project was to develop a large-scale distributed artificial intelligence system that predicts the likelihood of a device attribute (gender, age group) from the device's behaviour.

  • Technologies: Apache Spark, Scala, Amazon Web Services
  • Delivery: ~500 hours
  • The payback time for the project investment was measured in months rather than years
  • Axonix achieved savings by replacing the previous solution with the in-house system designed by Ciklum

Initial data

  • 2.5+ PB of raw real-time bidding (RTB) logs
  • anonymised information about user behaviour
  • application should run on an AWS EMR cluster


  • minimize processing costs
  • update models every week without rerunning the pipeline with historical data
  • full control and ownership over the process
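
One way to meet the weekly-update requirement is to keep compact per-device aggregates and fold each new week into them, so the raw history never has to be read again. The sketch below illustrates that idea in plain Python. It is an assumption about the approach, not the actual Spark/Scala pipeline, and the function names and the `(device_id, feature)` record shape are hypothetical.

```python
from collections import Counter


def aggregate_week(log_records):
    """Collapse one week of raw records into per-device feature counts.

    log_records: iterable of (device_id, feature) pairs (hypothetical shape).
    """
    aggregates = {}
    for device_id, feature in log_records:
        aggregates.setdefault(device_id, Counter())[feature] += 1
    return aggregates


def merge_aggregates(stored, new_week):
    """Fold a new week's counts into the stored aggregates.

    Returns a new dict; neither input is modified.
    """
    merged = {device: Counter(counts) for device, counts in stored.items()}
    for device, counts in new_week.items():
        merged.setdefault(device, Counter()).update(counts)
    return merged
```

Only the new week's logs pass through `aggregate_week`; the stored aggregates stand in for everything processed earlier.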

Ciklum Data and Analytics engineers developed a Spark/Scala application to predict attributes (gender, age group) based on the behaviour similarities with anonymised seed data where these attributes were known. The application can be run on an AWS EMR cluster that has access to the RTB data. The predictions, models and evaluation reports are saved to Amazon S3 storage.

  • Data processing pipeline efficiently aggregated RTB records into the most appropriate format for the models to run.
  • Anomaly detection algorithm removed outliers from the modelling dataset. This improved modelling accuracy by
    removing excessive noise in the data.
  • Each model outputs a set of reports that can be used for accuracy evaluation and confidence threshold selection.
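
The outlier removal and threshold reports described above can be sketched in a few lines of plain Python. The source does not say which anomaly detection algorithm was used, so the median absolute deviation (MAD) rule here is an assumption, as are the function names, the `max_deviations` parameter and the report fields.

```python
from statistics import median


def remove_outliers(event_counts, max_deviations=5.0):
    """Drop devices whose event count lies far from the median (MAD rule).

    event_counts: dict mapping device ID to number of events (hypothetical shape).
    """
    counts = list(event_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # No spread to measure against, so keep every device.
        return dict(event_counts)
    return {
        device: count
        for device, count in event_counts.items()
        if abs(count - med) / mad <= max_deviations
    }


def threshold_report(scored, thresholds):
    """Report coverage and precision on labelled seed devices per threshold.

    scored: list of (confidence, predicted_label, true_label) tuples.
    """
    rows = []
    for t in thresholds:
        kept = [(pred, true) for conf, pred, true in scored if conf >= t]
        correct = sum(1 for pred, true in kept if pred == true)
        rows.append({
            "threshold": t,
            "coverage": len(kept) / len(scored) if scored else 0.0,
            "precision": correct / len(kept) if kept else None,
        })
    return rows
```

A report like this makes the trade-off visible: raising the threshold typically improves precision while reducing the share of devices that receive a prediction.
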

"Ciklum took the time to understand our requirements, took care to involve us closely in the development of the project, delivered high-quality, well-documented code and made sure to help us deploy it correctly in our environment. Ciklum provided Axonix with a high-quality product which was delivered on time and on budget."
Donal Kane
CTO at Axonix

The Apache Spark framework made the preprocessing and modelling stages fast and computationally efficient.

  • The final solution supports the training of multiple models in parallel for a given country data set, with a total of 9 models.
  • Each model supports adjustable confidence levels, so thresholds can be retuned when requirements or seed data change.
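
As a rough illustration of these two points, the sketch below trains several independent models concurrently and applies an adjustable confidence threshold at prediction time. It is plain Python rather than the Spark/Scala application. The frequency-count "model", the model names and every function name are hypothetical stand-ins, and the thread pool only shows the structure (in Spark the parallelism would come from the cluster).

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def train_model(seed_rows):
    """Toy stand-in for a real model: per-feature label counts from seed data.

    seed_rows: list of (feature, label) pairs (hypothetical shape).
    """
    counts = defaultdict(Counter)
    for feature, label in seed_rows:
        counts[feature][label] += 1
    return {feature: dict(c) for feature, c in counts.items()}


def predict(model, feature, min_confidence):
    """Return (label, confidence), or None if unseen or below the threshold."""
    label_counts = model.get(feature)
    if not label_counts:
        return None
    label, hits = max(label_counts.items(), key=lambda kv: kv[1])
    confidence = hits / sum(label_counts.values())
    return (label, confidence) if confidence >= min_confidence else None


def train_all(seed_by_model, max_workers=4):
    """Train every model for one country concurrently.

    seed_by_model: dict mapping model name to its seed rows.
    Returns a dict mapping model name to the trained model.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(train_model, rows)
            for name, rows in seed_by_model.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Because the threshold is an argument to `predict` rather than part of the trained model, it can be changed without retraining.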
