Misunderstood Mobile Benchmarks Are Hurting The Industry and Consumers

I have been in and around the benchmarking and benchmarketing scene for 25 years in the PC, server, and now smartphone and tablet markets. Benchmarks have moved in cycles for years, and the cycle is fairly predictable: they swing between manufacturer-led, consortium-led, benchmark-company-led and industry-standard-led formations. There are hybrids as well, like manufacturer-led consortiums.

Over the course of the past few years, there has been a proliferation of inappropriate or misunderstood benchmarks in the mobile world, and those benchmarks serve to do nothing other than help users generate a single number, a benchmark score, that is supposed to quantify the performance and, by proxy, the experience of that device. This impacts chipmakers and chip designers like Apple, ARM Holdings, Huawei, Intel, MediaTek, NVIDIA, Qualcomm and Samsung Electronics. It also impacts handset makers like Apple, HTC, Lenovo-Motorola, LG, Sony and Samsung Electronics and the decisions they make. Most importantly, it impacts consumers, and I'll give examples why.

The problem with some of these mobile benchmarks and the scores they generate is that they don't accurately reflect a user's experience once the device is in hand and in use. Simply put, the numbers generated do not directly correlate to the user's experience with the device, and the device manufacturers, press and reviewers using them are unfortunately misleading consumers. I don't think the misleading is intentional; I just think there's a lack of understanding and maybe a lack of desire to do the extra work.

Some of these benchmarks, and the people who use them in reviews, have been responsible for proliferating the 8-core myth, too. These benchmarks are simply run to see the fastest theoretical performance the system could achieve, without regard to battery life, operating systems, applications or real-world use cases. Many of them simply load up all of the cores to their maximum frequency, a state phones never operate in outside of benchmarks. As a result, these benchmarks have earned the label "inaccurate or inappropriate benchmarks" across the industry because they don't accurately represent a user's experience.
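To make that concrete, here is a minimal sketch in Kotlin of what "loading up all of the cores" amounts to. It is a toy of my own, not code from any actual benchmark: it pins every available core at 100% with busy floating-point work for a fixed window, which is exactly the state these tests measure and that real apps, which are bursty and mostly idle, almost never sustain.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Toy illustration, not any real benchmark's code: spin a busy
// floating-point loop on every core for a fixed window.
fun saturateAllCores(seconds: Long = 10) {
    val cores = Runtime.getRuntime().availableProcessors()
    val pool = Executors.newFixedThreadPool(cores)
    val deadline = System.nanoTime() + seconds * 1_000_000_000L
    repeat(cores) {
        pool.execute {
            var x = 1.0
            while (System.nanoTime() < deadline) {
                x = x * 1.0000001 + 1e-9   // pure ALU work, cache-resident
            }
            println(x)                     // defeat dead-code elimination
        }
    }
    pool.shutdown()
    pool.awaitTermination(seconds + 5, TimeUnit.SECONDS)
}
```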

Why should we even care about this?

You may be asking, "Why should I even care?" First of all, if you look at the history of microprocessor or SoC pricing, you will find a direct correlation between perceived performance and pricing. Don't even think of invoking the "Apple rule", as Apple has dominated the mobile SoC benchmarks for most of the past five years.

Then there are consumers. Consider, as a first example, something I read this morning in the DailyMail:

"£99 Tesco tablet beats £300 Apple rival in speed test: Consumer study shows price and brand is not guarantee to finding best performing device"

In this example, the DailyMail used Geekbench to justify the article and headline. We all know in the industry that a £99 Tesco tablet doesn't outperform a £300 Apple iPad mini 3 on real benchmarks or in the actual experience. Admittedly, the DailyMail example is the worst I have seen, but I see this kind of thing every time I read reviews of a new smartphone or tablet. And I cringe. You should, too.

So if you are an SoC manufacturer like Huawei, MediaTek, Qualcomm or Samsung Electronics, you take a "can't beat them, join them" approach and add more processor cores to your SoC. Thus we have the 8-core myth: eight cores so that you look better on inappropriate benchmarks. Some have done this to get the "64-bitness", too. So how is that 64-bit Android thing working out? Apple and Intel have not taken this approach of wantonly adding meaningless CPU cores, and I applaud them for taking the high road. Qualcomm, I believe, will move back to a different approach with their future Kryo core. NVIDIA took an approach in the middle.

Why do some use inaccurate or inappropriate mobile benchmarks?

The reason people use inaccurate benchmarks is that these benchmarks make it really easy to simply download an app, press a button and get a number telling you how fast or slow your smartphone is, in theory. It takes a lot longer to run a benchmark that reflects real-world usage. Part of the reason for this is that press and device manufacturers have been publishing their scores from these inaccurate or inappropriate benchmarks, lending those scores credibility. But the reality is that these benchmarks don't even remotely test what a normal user would be doing on their smartphone. These benchmarks are known as synthetic benchmarks. They generally test the components of a computer, or in this case a smartphone, to see their highest performance in an absolute best-case scenario, usually without much context about how those components are being used.
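As a rough illustration of what a synthetic component test boils down to (a hypothetical toy of my own, not AnTuTu's or Geekbench's actual code), consider timing a tight, cache-resident arithmetic loop and emitting one number. Note everything that never enters the score: app launch, scrolling, network, storage, battery, or any real application at all.

```kotlin
// Hypothetical synthetic integer test -- a toy, not any real
// benchmark's code: time best-case ALU work and emit one "score".
fun syntheticIntScore(iterations: Int = 50_000_000): Long {
    var acc = 0L
    val start = System.nanoTime()
    for (i in 1..iterations) {
        acc += (i.toLong() * 31) xor (i shr 3).toLong()  // cache-resident work
    }
    val elapsedMs = ((System.nanoTime() - start) / 1_000_000).coerceAtLeast(1L)
    println("checksum=$acc")        // keep the compiler from dropping the loop
    return iterations / elapsedMs   // "ops per ms" -- the headline number
}
```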

AnTuTu and Geekbench, the most commonly used and misunderstood mobile benchmarks

From my experience, along with that of many other experts I have talked to within the industry on this topic, the general agreement is that AnTuTu and Geekbench are the two mobile benchmarks that are the most used and the most misunderstood. Both are commonly used by device manufacturers and reviewers to quantify the performance of a smartphone in order to show how it performs against others. The problem with these benchmarks is that they do not actually test the smartphone's performance as a system, but rather components of the SoC or some other part of the whole system. In some scenarios, some of these benchmarks may be useful for pointing to certain capabilities of the CPU, but they are not a useful representation of the whole system's performance or experience.

These benchmarks are also easily manipulated and tricked by device OEMs, as we saw last year when AnandTech exposed a multitude of smartphone manufacturers 'boosting' their benchmark performance when running some of these tests. They were flat-out cheating in benchmarks in order to look better in reviews done by the press. Unsurprisingly, the most cheating was found in AnTuTu, where LG, HTC, ASUS and Samsung Electronics were all caught cheating in the benchmark.

Some lessons could be learned from the PC world; from my experience, I have found similar problems in smartphone benchmarking in the past and suggested a list of remedies. Let's dive into AnTuTu and Geekbench.

AnTuTu

AnTuTu originally started out as a single system test where you pressed one button and let the test generate a score based on a bunch of individual system tests. Now the app is at version 5.7 and still has the single-button test, but it also includes separate tests for HTML5, video, display, stability and battery.

The standard test is still a combination of single-thread floating point, single-thread integer, full-CPU integer and full-CPU floating point performance tests. These don't do anything other than tell us exactly what the maximum single-thread and multi-core performance of the CPU could be if applications could fully utilize all of the cores and battery consumption weren't a factor.

The benchmark also tests RAM performance, multitasking performance, storage I/O and database I/O. These tests, once again, operate mostly as silos and don't do a particularly good job of telling the user how the phone will perform in real-world scenarios. The last two tests of this benchmark are a VERY simple 2D test with tons of bouncing shapes and a very low-quality 3D test that looks nothing like any game I've seen on any decent phone in the last five years, meaning it doesn't really test the gaming capability of the phone. Then the benchmark takes all of these individual scores and combines them to give you your AnTuTu score, an aggregated number that doesn't really amount to anything.
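To see why that aggregated number is so uninformative, here is a sketch of the aggregation problem; the subscores and weights are invented for illustration and are not AnTuTu's actual formula. A weighted sum collapses unrelated subscores, so two phones with very different strengths can land on exactly the same score.

```kotlin
// Illustration only: invented weights, not AnTuTu's real formula.
fun compositeScore(sub: Map<String, Int>, weights: Map<String, Double>): Int =
    sub.entries.sumOf { (name, score) -> score * (weights[name] ?: 0.0) }.toInt()

fun main() {
    val weights = mapOf("cpu" to 0.4, "ram" to 0.2, "storage" to 0.2, "gpu" to 0.2)
    // Fast CPU, sluggish storage...
    val phoneA = mapOf("cpu" to 9000, "ram" to 5000, "storage" to 2000, "gpu" to 5000)
    // ...versus slower CPU, fast storage: very different daily experiences.
    val phoneB = mapOf("cpu" to 5000, "ram" to 5000, "storage" to 10000, "gpu" to 5000)
    println(compositeScore(phoneA, weights))  // 6000
    println(compositeScore(phoneB, weights))  // 6000 -- identical score
}
```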

The problem with using or reporting on AnTuTu is that it purports to be, or is used as, a full system benchmark when it really doesn't do anything other than test different components of the system, fairly poorly, and then generate a composite score from those individual scores. AnTuTu's main test doesn't incorporate any of the other tests the app offers, like HTML5, battery life, video, stability or the screen test, which could provide better insights into system performance. Presumably that is because incorporating them would make the test take too long to run, and it wouldn't be as popular a test as it is today.

Geekbench

Geekbench is a cross-platform benchmark that started out on the Mac and iOS, popularized by its ability to run on both iOS and Android and to provide some feedback about certain aspects of a CPU's architecture and theoretical capabilities. It is currently at version 3 and now supports testing on Windows, Mac, Linux, Android and iOS. Geekbench isn't as bad an offender as AnTuTu when it comes to being a misleading or misunderstood benchmark, but it tests only two components of a smartphone, the CPU and memory, and doesn't do so in any real-world scenarios. This leaves out really important components like storage and the GPU, which is fast becoming a compute workhorse in heterogeneous compute environments.

The tests it runs are CPU integer and floating point calculations in both single-core and multi-core modes, as well as memory tests in single-core and multi-core modes. As a result, the benchmark may provide some insight when comparing the CPU architectures of two systems, but in the case of two different smartphones with the same SoC it provides limited to no value. This benchmark should really only be used to compare how CPUs across different operating systems and platforms stack up against each other; it is not a benchmark for comparing phones, especially not for a review.

Geekbench certainly has its place in benchmarking, but it doesn't make sense to include it the way smartphone reviewers have been doing in their reviews. As a result, it becomes an inappropriate mobile benchmark because of the way the press utilizes it. Geekbench should be commended for its transparency about how and what exactly it tests, but that doesn't change the fact that it's being misused in mobile testing.

Reviews using balanced benchmarks

Although there are plenty of reviewers out there using benchmarks in their reviews, there are a select few experienced reviewers who use the right benchmarks to compare smartphones. Their editorial around those benchmarks makes technical and experiential sense, too.

Reviews using no benchmarks

In addition to reviewers who use good benchmarks, we also have reviewers who simply don't use benchmarks at all. Some chalk it up to being all about the experience: if the experience is okay, there's no need for a benchmark. However, this is a dangerous notion, because without any numbers reviewers may miss something critical in a phone's performance. It also leaves the "experience" to be judged by someone whose apps, tech usage history and even hand size may be very different from yours. Having benchmarks also lends credibility to a review when there are issues and the culprit needs to be figured out.

These examples exclude the countless forums around the internet, and the non-English-speaking publications across the globe, that also quote AnTuTu and Geekbench performance.

Using the wrong benchmarks pulls the industry in the wrong direction

Because of the creation, use and promotion of these inaccurate, misunderstood, and/or gameable benchmarks, we are seeing smartphone manufacturers and SoC vendors dedicating time and engineering resources to ensuring that their performance in these benchmarks is up to expectations. After all, if so many people are using or mischaracterizing AnTuTu and Geekbench, it lends them credibility even when it shouldn’t. It seems like those same resources could be working on further improvements to issues we all have, things like battery life.

Additionally, vendors are adding features that make the misrepresentative benchmarks look better, like adding more CPU cores than any piece of software can actually use to improve the experience, with battery life suffering for it. This propagates the 8-core myth.

Finally, because so many reputable tech blogs don't run ANY benchmarks at all, they are essentially giving more credibility to the ones that do show AnTuTu and other benchmarks. While it is understandable that some reviewers either don't have the time to run benchmarks or aren't satisfied with the quality of the available benchmarks, that still isn't an excuse to not have ANY at all. Benchmarks are supposed to lend credibility to your experience and explain any performance differences between devices, be they good or bad.

What needs to be done?

I stand by the column I wrote in 2013 talking about what needs to be done. Let me recap what I said:

  • The best benchmarks reflect real world usage models
  • Never rely on one benchmark
  • Benchmark shipping devices
  • Application-based benchmarks are the most reliable
  • Look for transparency
  • Look for consistency

Reviewers need to be using a suite of benchmarks that best exemplify real world usage. That means benchmarks that:

  • utilize real game engines for their 3D benchmarks, like 3DMark
  • best mirror real applications, like Basemark X
  • use real applications, or reflect them, in their testing, like PCMark

At a minimum, the benchmark results must reflect real-world experiences. I do think the industry could do a better job in the future by creating a consortium-led approach to benchmarking.
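As a rough sketch of what suite-based reporting could look like in a review (the benchmark names and numbers below are placeholders, not real results), the idea is to keep each scenario's result separate and in context rather than collapsing everything into a single composite number.

```kotlin
// Placeholder data, not real results: report each scenario separately
// instead of collapsing a suite into one composite score.
data class Result(val benchmark: String, val scenario: String, val value: Double, val unit: String)

fun report(results: List<Result>) {
    for ((scenario, group) in results.groupBy { it.scenario }) {
        println("== $scenario ==")
        group.forEach { println("  ${it.benchmark}: ${it.value} ${it.unit}") }
    }
}

fun main() {
    report(listOf(
        Result("3DMark (real game engine)", "Gaming", 2710.0, "points"),
        Result("PCMark (application-based)", "Productivity", 4200.0, "points"),
        Result("Video-loop rundown", "Battery", 9.5, "hours"),
    ))
}
```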

Vendors like Apple, ARM Holdings, HTC, Huawei, Intel, Lenovo, LG, MediaTek, Qualcomm, Sony and Samsung Electronics need to do more to move this forward, because it's the right thing to do. And they know it.

Let me know what you think below.

You can find Patrick Moorhead, President & Principal Analyst of Moor Insights & Strategy, on the web, Twitter, LinkedIn and Google+.

Note: This blog contains contributions from Anshel Sag, technologist and staff writer for Moor Insights & Strategy.

Disclosure: My firm, Moor Insights & Strategy, like all research and analyst firms, provides or has provided research, analysis, advising, and/or consulting to many high-tech companies in the industry, including ARM Holdings, Huawei, Intel, Lenovo, NVIDIA, Qualcomm, and Samsung Electronics, cited in this article. No employees at the firm hold any equity positions with any companies cited in this column.


