Contenido del Curso
Optimization Techniques in Python
Optimization Techniques in Python
Maximizing Sorting Efficiency
When it comes to sorting in Python, choosing the right technique or approach is crucial for optimizing performance, especially when dealing with large datasets or specific requirements.
Built-in Sorting
Whenever you need to sort a list in Python, except for some rare special cases, it's almost always best to use one of its two highly optimized sorting tools: the sorted()
function or the sort()
method. Both are implemented in C and use Timsort, a hybrid algorithm that combines merge sort and insertion sort for efficiency.
When to Use:
- Use
sorted()
for general-purpose sorting on any iterable without modifying the original data; - Use
sort()
when working specifically with a list and in-place modification is acceptable.
Both methods are efficient, but list.sort()
may be only slightly faster for very large lists as it avoids creating a new list. However, use sorted()
if you need to keep the original list intact.
Partial Sorting with heapq
For cases where you only need the smallest or largest elements of a dataset, sorting the entire data is unnecessary. Python's heapq
module provides efficient methods like heapq.nsmallest()
and heapq.nlargest()
to extract these elements without fully sorting the iterable, making it faster and more memory-efficient.
Let's compare the performance of the sorted()
function and the heapq.nsmallest()
function for retrieving the 10
smallest numbers** from a list:
import heapq import os decorators = os.system('wget https://codefinity-content-media-v2.s3.eu-west-1.amazonaws.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null') from decorators import timeit_decorator import random # Generate a large list of random integers numbers = [random.randint(1, 1000000) for _ in range(1000000)] @timeit_decorator(number=10) def partial_sort_heapq(): return heapq.nsmallest(10, numbers) @timeit_decorator(number=10) def partial_sort_sorted(): return sorted(numbers)[:10] # Compare performance heapq_result = partial_sort_heapq() sorted_result = partial_sort_sorted() # Ensure both methods give the same result print(heapq_result == sorted_result)
As you can see, in our particular example heapq.nsmallest()
is roughly 10 times faster.
However, if the number of largest or smallest elements (n
) you want to retrieve is close to the total number of elements in the list, heapq
is often slower than using the sorted()
function or the .sort()
method.
For example, let's now retrieve 100000
smallest elements of the list:
import heapq import os decorators = os.system('wget https://codefinity-content-media-v2.s3.eu-west-1.amazonaws.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null') from decorators import timeit_decorator import random # Generate a large list of random integers numbers = [random.randint(1, 1000000) for _ in range(1000000)] @timeit_decorator(number=10) def partial_sort_heapq(): return heapq.nsmallest(100000, numbers) @timeit_decorator(number=10) def partial_sort_sorted(): return sorted(numbers)[:100000] # Compare performance heapq_result = partial_sort_heapq() sorted_result = partial_sort_sorted() # Ensure both methods give the same result print(heapq_result == sorted_result)
The sorted()
function in this case clearly outperforms heapq
.
¡Gracias por tus comentarios!