Be careful with Intel Turbo Boost! It can screw up your benchmarking! And make parallel programs run slower!

Several months ago I wrote a little application called machPQ.py (I’ll open the code soon…) which calculates the active, reactive and apparent power at a machine’s terminals in the time domain, for electromagnetic transients analysis. The files this program has to crunch often have 1.E6 lines or more.



Fig. 1: All this work to generate this kind of image.


Because of those large files, the application was taking a long time to finish its calculations (1 h to 3 h), so I started rewriting it in a parallel form, also in Python.

The problem began when I tried to benchmark the parallel version against the single-threaded one. In some runs the single-threaded version was faster than the parallel one! That was driving me crazy! I don’t know why, but something told me to take a look at the processor state (my laptop is a Dell XPS 15 L502x with an i7 processor).

And damn! I was right! With turbo boost enabled [3], the computer heated up faster when running multiple threads and then lowered its clock speed, so the parallel version ended up slower than the single-threaded one, or only slightly faster (depending on how hot the day was).
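One way to watch for this kind of throttling is to sample the clock speed of each core while the benchmark runs. The following is only a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo reports one "cpu MHz" line per logical CPU; the function names are hypothetical and are not part of machPQ.py.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the clock speed in MHz of each logical CPU found in cpuinfo_text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the current per-CPU clock speeds (assumes the Linux x86 cpuinfo layout)."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())
```

Calling read_cpu_mhz() every second or so from another terminal during a run should show whether the clocks drop as the machine heats up.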

So, to disable turbo boost, I ran the following as root, from [1]:

# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

And then the magic happened! With the parallel version competing on fair terms with the single-threaded version, the expected results came up.
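Before timing anything, it may be worth confirming that the setting really took effect. Here is a small sketch that reads the same sysfs file written by the command above; it assumes the intel_pstate driver is in use (otherwise the file does not exist), and the function name is hypothetical.

```python
from pathlib import Path

# Same file the echo command writes to; only present with the intel_pstate driver.
NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo"


def turbo_boost_disabled(path=NO_TURBO):
    """Return True if no_turbo is 1, False if it is 0, None if the file cannot be read."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value == "1"
```

A benchmark script could call turbo_boost_disabled() at start-up and record the answer next to the timings, so that runs made under different settings are not mixed up later.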

I won’t go through the data at length, but to summarize:

  • With turbo boost enabled, the parallel version needed only 4.13 % less time than the single-threaded version to complete the simulation (Simulation 1);
  • Now, with turbo boost disabled, the single-threaded version took 42.02 % more time than the parallel version – Oh, yeah! – (Simulation 3);
  • Running the code on n-crap-vidia, with optimus [2], again gave a nice speed-up: the single-threaded version took 48.34 % more time than the parallel one (Simulation 4);
  • The parallel code running directly on the CPU (Simulation 3) took 2.61 % more time than on n-crap-vidia (Simulation 4). However, this difference is so small, and I performed only a single run, that it is not possible to identify any trend here;
  • The single-threaded version with turbo boost disabled (Simulation 3) took 24.21 % more time than the single-threaded version with turbo boost enabled (Simulation 1);
  • The parallel version with turbo boost disabled (Simulation 3) took 8.77 % less time than the parallel version with turbo boost enabled (Simulation 1).
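The percentages in the list above use two different bases ("more time" relative to the faster run, "less time" relative to the slower run), so it helps to spell out the arithmetic. This is a short sketch with hypothetical helper names; the input values are the "Tempo de processamento" figures (in seconds) from the logs of Simulations 1 and 3 in this post.

```python
def pct_more_time(slower, faster):
    """How much more time the slower run took, as a percentage of the faster run."""
    return (slower / faster - 1.0) * 100.0


def pct_less_time(slower, faster):
    """How much less time the faster run took, as a percentage of the slower run."""
    return (1.0 - faster / slower) * 100.0


# Processing times in seconds, copied from the logs of this post.
sim1_single, sim1_parallel = 186.784282, 179.066593  # turbo boost enabled
sim3_single, sim3_parallel = 232.012078, 163.367201  # turbo boost disabled

print(f"Simulation 1: parallel took {pct_less_time(sim1_single, sim1_parallel):.2f} % less time")
print(f"Simulation 3: single took {pct_more_time(sim3_single, sim3_parallel):.2f} % more time")
```

These two lines reproduce the 4.13 % and 42.02 % figures; the other bullets follow from the same two formulas.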

From the above analysis we can conclude:

  • This variable clock speed is a pain in the ass when doing benchmarks!!! Even with turbo boost disabled, the clock can still be reduced if the temperature is high;
  • As most programs are still single-threaded, leaving turbo boost enabled is a good idea;
  • For very demanding multi-process or multi-threaded programs, disabling turbo boost is a good idea;
  • Using the GPU through bumblebee seems interesting and deserves further tests.
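Since the clock speed can still vary with temperature, and each configuration here was timed only once, repeating every measurement and looking at the spread would make the comparison more trustworthy. The following is a minimal sketch of such a harness, not what was used for the numbers in this post; time_runs and summarize are hypothetical names, and func stands for whatever callable runs one complete calculation.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Call func() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, sample standard deviation) of the measured durations."""
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return statistics.median(durations), spread
```

If the standard deviation is comparable to the difference between two configurations (as with the 2.61 % gap between Simulations 3 and 4), that difference is probably just noise.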

All the data used to analyze the performance and the speed-up from parallelizing the code are shown below. The program’s messages are in Portuguese; “Tempo de processamento” is the processing time in seconds.

Simulation 1: With turbo boost enabled

Fig. 2: Simulation 1 – Single thread version.


Fig. 3: Simulation 1 – Parallel threads version.


leonardo@AL:~/projects/machPQ$ time python machPQ_s_plot.py teste.adf 1.E-6 63. ; time python machPQ_teste_parallel.py teste.adf 1.E-6 63.
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 186.784282

real 3m7.005s
user 3m7.135s
sys 0m0.079s
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 179.066593

real 2m59.248s
user 5m14.349s
sys 1m11.704s


Simulation 2: With turbo boost enabled and running on n-crap-vidia

leonardo@AL:~/projects/machPQ$ time optirun python machPQ_s_plot.py teste.adf 1.E-6 63. ; time optirun python machPQ_teste_parallel.py teste.adf 1.E-6 63.
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 203.956192

real 3m52.100s
user 3m24.476s
sys 0m0.228s
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 183.662801

real 3m20.575s
user 5m29.300s
sys 0m58.710s


Simulation 3: With turbo boost disabled

Fig. 4: Simulation 3 – Single thread version.


Fig. 5: Simulation 3 – Parallel threads version.


leonardo@AL:~/projects/machPQ$ time python machPQ_s_plot.py teste.adf 1.E-6 63. ; time python machPQ_teste_parallel.py teste.adf 1.E-6 63.
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 232.012078

real 3m52.313s
user 3m52.349s
sys 0m0.107s
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 163.367201

real 2m43.608s
user 4m36.375s
sys 1m22.991s


Simulation 4: With turbo boost disabled and running on n-crap-vidia

leonardo@AL:~/projects/machPQ$ time optirun python machPQ_s_plot.py teste.adf 1.E-6 63. ; time optirun python machPQ_teste_parallel.py teste.adf 1.E-6 63.
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 236.172199

real 3m59.293s
user 3m56.594s
sys 0m0.125s
Abrindo arquivo de dados…
…OK!
Iniciando processamento…
…OK!
Gravando resultados em disco…
…OK!

Tempo de processamento: 159.207409

real 2m41.755s
user 4m35.806s
sys 1m18.623s


References:

[1] – http://luisjdominguezp.tumblr.com/post/19610447111/disabling-turbo-boost-in-linux
[2] – http://bumblebee-project.org/
[3] – http://en.wikipedia.org/wiki/Intel_Turbo_Boost

Acknowledgements

English revised by my love @anielampm =) Thank you!!
