This page contains the model coefficients obtained in the paper Characterizing and Modeling Session-Level Mobile Traffic Demands from Large-Scale Measurements, presented at ACM IMC 2023, together with an example of how to generate samples from them. We provide models for 30 different services observed in the production network of a large mobile operator in France; all values represent transport-layer session-level statistics.
DOI link: https://doi.org/10.1145/3618257.3624825
For questions, please contact andrefelipe.zanella@telefonica.com
If you use this data in your study, please cite our work as follows:
@inproceedings{10.1145/3618257.3624825,
title={Characterizing and Modeling Session-Level Mobile Traffic Demands from Large-Scale Measurements},
author={Zanella, André Felipe and Bazco-Nogueras, Antonio and Ziemlicki, Cezary and Fiore, Marco},
year={2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi={10.1145/3618257.3624825},
url = {https://doi.org/10.1145/3618257.3624825},
booktitle = {Proceedings of the 23rd ACM Internet Measurement Conference},
location = {Montreal, QC, Canada},
series = {IMC '23}
}
numpy==2.3.2
pandas==2.3.1
pomegranate==1.1.2
scipy==1.16.1
Here's a brief description of what's included on this page. Please note that this work was developed in Python 3.9, and most of the statistical work was done using the SciPy and Pomegranate libraries. We strongly suggest using those for better compatibility. All of them are used in the sample_generator.ipynb file, a usage example generating samples on
For every app, what was the percentage of sessions and traffic
Considering all antennas included in our study, we present the average of each decile ranked by their traffic load, with
Please note that peak hours are represented by a Gaussian distribution (with
If you are not using SciPy to generate these distributions (we use scipy.stats.norm and scipy.stats.pareto), double-check how the loc and scale parameters map to the parameterization of your library.
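As an illustration of the loc/scale conversion mentioned above, the sketch below shows how SciPy's parameters relate to the textbook ones and to NumPy's samplers. All numeric values (`mu`, `sigma`, `shape_b`, `x_min`) are placeholders chosen for the example, not coefficients from the data files.

```python
import numpy as np
from scipy import stats

# Placeholder parameters for illustration only (not values from the data files).
mu, sigma = 18.0, 2.5        # Gaussian mean and standard deviation
shape_b, x_min = 1.8, 4.0    # Pareto shape and minimum value (classical Pareto)

# scipy.stats.norm: loc is the mean, scale is the standard deviation (not the variance).
gaussian = stats.norm(loc=mu, scale=sigma)

# scipy.stats.pareto: with loc=0, scale is the minimum value x_min of a classical
# (Type I) Pareto; a non-zero loc shifts the whole distribution by that amount.
pareto = stats.pareto(shape_b, loc=0.0, scale=x_min)

rng = np.random.default_rng(0)

# NumPy equivalents: Generator.normal takes the same (mean, std) pair, while
# Generator.pareto draws from a Lomax (Pareto II) distribution, so it must be
# shifted by 1 and multiplied by x_min to match scipy.stats.pareto.
gaussian_samples = rng.normal(mu, sigma, size=10_000)
pareto_samples = (rng.pareto(shape_b, size=10_000) + 1.0) * x_min

# Closed-form quantile of the classical Pareto, used to check the mapping.
q = 0.9
closed_form_q = x_min * (1.0 - q) ** (-1.0 / shape_b)
```

Comparing `pareto.ppf(q)` with `closed_form_q` is a quick way to confirm that a different library uses the same convention before plugging in the published coefficients.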
For every application, we present the probability distribution function modeled as a general mixture model (composed purely of Gaussian distributions). We suggest using pomegranate.distributions.Normal() for each individual Gaussian, and pomegranate.gmm.GeneralMixtureModel() to create the mixture.
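To make the mixture structure concrete, here is a minimal, library-independent sketch of drawing samples from a one-dimensional Gaussian mixture with NumPy. It is not the Pomegranate workflow of sample_generator.ipynb; the helper name `sample_gaussian_mixture` and the weights, means, and standard deviations are hypothetical placeholders, not values from the data files.

```python
import numpy as np

def sample_gaussian_mixture(weights, means, stds, n, rng):
    """Draw n samples from a 1-D Gaussian mixture (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize in case of rounding
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Step 1: pick which Gaussian component generates each sample.
    component = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw from the selected component's normal distribution.
    return rng.normal(means[component], stds[component])

# Placeholder mixture, not taken from the data files.
weights = [0.6, 0.3, 0.1]
means = [2.0, 4.5, 7.0]
stds = [0.5, 0.8, 0.3]

rng = np.random.default_rng(42)
samples = sample_gaussian_mixture(weights, means, stds, n=50_000, rng=rng)

# The mixture mean is the weighted average of the component means.
expected_mean = float(np.dot(weights, means))
```

With enough samples, `samples.mean()` should approach `expected_mean`, which gives a simple sanity check after loading the per-application coefficients.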
Please note all applications will have columns u_g representing the mean
Also note that the generated random traffic samples
We model for each app the relation between the traffic consumed by a specific transport layer session and its expected duration. For a service
Note that in our example workflow, we first generate the traffic value and then obtain the duration through the inverse of this power law (computed by func_invpowerlaw in the example). In the data files, coefficient
In some cases, the traffic of a sampled session falls in the extreme right tail of the distribution, and for some apps this leads to sessions with unrealistically long durations (e.g. days). We recommend filtering out all sessions above a certain duration limit (e.g. 1 h). This is a shortcoming of the way the data is collected and of how we chose to model the relation between session traffic and duration.
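The traffic-to-duration step and the duration cap can be sketched as follows. This is a hedged illustration, not the code of func_invpowerlaw: it assumes a power law of the form traffic = a * duration ** b, durations in seconds, and placeholder values for `a`, `b`, and the traffic samples. Check the paper and data files for the actual form, coefficient names, and units.

```python
import numpy as np

def power_law(duration, a, b):
    """Assumed form: traffic = a * duration ** b."""
    return a * np.power(duration, b)

def inverse_power_law(traffic, a, b):
    """Inverse of the assumed form: duration = (traffic / a) ** (1 / b)."""
    return np.power(np.asarray(traffic, dtype=float) / a, 1.0 / b)

# Placeholder coefficients and traffic samples, for illustration only.
a, b = 150.0, 0.8
traffic = np.array([1e3, 5e4, 2e6, 9e9])

duration_s = inverse_power_law(traffic, a, b)   # durations, assumed in seconds

# Drop sessions whose derived duration exceeds a chosen cap (here 1 hour).
MAX_DURATION_S = 3600.0
keep = duration_s <= MAX_DURATION_S
traffic_kept, duration_kept = traffic[keep], duration_s[keep]
```

Filtering on the derived duration rather than on traffic keeps the cap independent of each app's coefficients, so the same limit can be applied across services.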
Be aware of the Pomegranate version. The 1.0 update substantially changed how the library works and which classes it exposes. The current example has been updated to work with this latest release.
This work was supported by the BANYAN project, which received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 860239; by NetSense, grant no. 2019-T1/TIC-16037 funded by Comunidad de Madrid; by the research project CoCo5G (Traffic Collection, Contextual Analysis, Data-driven Optimization for 5G), grant no. ANR-22-CE25-0016, funded by the French National Research Agency (ANR); and by the Regional Government of Madrid through the grant 2020-T2/TIC-20710 for Talent Attraction.
