CPU cache misses and branch mispredictions waste CPU cycles and degrade program performance. Such software inefficiencies can be neither eliminated by existing compilers nor avoided by developers. In this paper, we propose a novel approach, named PERDICE, that automatically discovers such performance bugs by leveraging concolic execution. PERDICE adopts a new path exploration algorithm designed for this purpose. In particular, we measure performance losses at the granularity of program locations (e.g., instructions or source code lines) rather than paths, so that the search does not get stuck in code that contains no software inefficiencies. Moreover, when scoring test inputs, PERDICE prefers inputs that increase the measured performance losses. This strategy keeps PERDICE from getting stuck on software inefficiencies it has already found. We have implemented PERDICE for both PCs (x86 instructions) and Android smartphones (ARM instructions). Experimental results on real-world desktop software and Android native code show that PERDICE outperforms four other popular algorithms and PROFs (a multi-path performance profiler) both in how quickly it discovers software inefficiencies and in the severity (i.e., the amount of wasted CPU cycles) of the inefficiencies it finds.
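The two ideas above, tracking performance losses per program location and preferring test inputs that increase those losses, can be illustrated with a small scoring sketch. This is not PERDICE's implementation: the names `LocationLossTracker` and `pick_next`, the `"file:line"` location format, and the choice to keep the maximum loss seen at each location are hypothetical assumptions made for illustration. The per-location cycle counts are assumed to come from profiling an executed input (e.g., a cache or branch-predictor model), which is not shown.

```python
from typing import Dict, List, Tuple

# A program location, e.g. "file.c:line" (hypothetical format chosen for this sketch).
Location = str
# Wasted CPU cycles attributed to each location for one executed test input.
LossProfile = Dict[Location, int]


class LocationLossTracker:
    """Hypothetical sketch of location-granular loss bookkeeping.

    Keeps, per program location, the largest loss observed so far, and scores
    a new input by how much it raises those per-location maxima. Inputs that
    only reproduce already-known losses score 0.
    """

    def __init__(self) -> None:
        self.best: Dict[Location, int] = {}

    def score(self, losses: LossProfile) -> int:
        """Sum of per-location increments over the recorded maxima."""
        gain = 0
        for loc, cycles in losses.items():
            previous = self.best.get(loc, 0)
            if cycles > previous:
                gain += cycles - previous
        return gain

    def record(self, losses: LossProfile) -> None:
        """Fold an executed input's losses into the per-location maxima."""
        for loc, cycles in losses.items():
            if cycles > self.best.get(loc, 0):
                self.best[loc] = cycles


def pick_next(tracker: LocationLossTracker,
              candidates: List[Tuple[str, LossProfile]]) -> Tuple[str, LossProfile]:
    """Choose the candidate whose losses increase the most (first one on ties)."""
    return max(candidates, key=lambda c: tracker.score(c[1]))


if __name__ == "__main__":
    tracker = LocationLossTracker()
    first = {"loop.c:12": 400, "parse.c:30": 50}
    print(tracker.score(first))   # 450: every location is new
    tracker.record(first)
    print(tracker.score(first))   # 0: nothing beyond what is already known
    print(tracker.score({"loop.c:12": 380, "hash.c:7": 120}))  # 120: only hash.c:7 adds loss
```

Because the score is computed per location, an input that merely re-triggers a known hotspot contributes nothing, while one that exposes loss at a new location ranks higher. A path-level score would instead count each distinct path through the same hotspot as new.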