Contenido del Curso

Optimization Techniques in Python

1. Understanding and Measuring Performance

Introduction to Python Performance Timing and Benchmarking Basics Measuring Function Performance Challenge: Implementing Benchmarking

2. Efficient Use of Data Structures

Lists and NumPy Arrays Sets and Tuples Challenge: Choosing Optimal Data Structures Using the collections Module Challenge: Handling Customer Requests

3. Enhancing Performance with Built-in Tools

Leveraging map() and List Comprehensions Handling Large Files Maximizing Sorting Efficiency Efficient String Operations

Sets and Tuples

Before we proceed with sets and tuples, it is important to mention that we won't discuss dictionaries here.

Set

A set offers an average O(1) time complexity for insertions, deletions, and lookups, meaning these operations are performed in constant time, regardless of the set’s size. This makes sets much faster than lists for membership testing and operations like adding or removing elements, where lists require O(n) time complexity (time grows linearly with the size of the list) in the worst case.

When to use:

You need unique elements, ensuring that there are no duplicates in your collection;
Fast membership testing is required, making sets ideal for tasks like checking for the existence of an item;
You are performing operations such as set unions, intersections, or differences, which sets support with optimized methods;
Order does not matter, as sets are inherently unordered, and there’s no need to maintain insertion order.


              1234567891011121314151617181920
            
# Removing duplicates from a list using a set
numbers = [1, 3, 2, 3, 5, 4, 5]
unique_numbers = set(numbers)
print(f'Unique Numbers: {unique_numbers}')

# Fast membership testing
names = {'Alice', 'Bob', 'Charlie'}
print(f'Is Alice in the set? {"Alice" in names}')
print(f'Is Eve in the set? {"Eve" in names}')

# Set operations: union, intersection, and difference
set_a = {1, 4, 3, 2}
set_b = {3, 5, 4, 6}
print(f'Union: {set_a.union(set_b)}')
print(f'Intersection: {set_a.intersection(set_b)}')
print(f'Difference: {set_a.difference(set_b)}')

# Removing elements from a set safely with discard
names.discard('Alice')  # Safe removal, no error if the element doesn't exist
print(f'Names after removal: {names}')

Let's now compare the performance of a set to a list in membership testing:


              12345678910111213141516171819
            
import os
os.system('wget https://staging-content-media-cdn.codefinity.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

# Create a large list and set with the same elements
large_list = list(range(10000000))
large_set = set(large_list)

# Test membership for an element at the end
element_to_find = 9999999

@timeit_decorator(number=50)
def test_membership(element, collection):
    return element in collection

print('List:')
print(test_membership(element_to_find, large_list))
print('Set:')
print(test_membership(element_to_find, large_set))

Tuple

Tuples are typically used when you need to ensure that the data cannot be changed or as a key in a dict or element in set (because tuples are hashable).

Better than lists: when you need immutable data, want to use the collection as a dictionary key or set element, or when you need memory-efficient storage for a fixed-size collection of items;
Better than NumPy arrays: when your data is non-numerical or when immutability is crucial. While NumPy arrays are designed for numerical computations and are mutable by default, tuples provide safety for non-numerical data or small, structured collections that must remain constant.


              12345678
            
# Each tuple in the list represents an immutable student record
students = [
    (1834, 'James', 'Johnson'),
    (2749, 'Alice', 'Smith'),
    (4923, 'Bob', 'Brown')
]
# Attempting to modify a tuple will raise a TypeError
students[0][1] = 'Fred'

Since the students list contains student records (ID, first name, last name) that need to be read-only, it is better to use tuples for each record instead of lists. Additionally, as mentioned above, tuples are slightly more memory-efficient compared to lists.

¿Todo estuvo claro?

¡Gracias por tus comentarios!

Sección 2. Capítulo 2

Optimization Techniques in Python

Sets and Tuples

Set

Tuple

1. Which of the following scenarios is best suited for using a set instead of a list?

2. You have a dataset with millions of records and need to frequently check if specific values exist within it. Which data structure is the most efficient for this purpose?

3. You are creating a record for each student that includes a unique ID, first name, and last name. The data should not be altered once created. Which data structure would be the most appropriate?