Optimization Techniques in Python
Efficient String Operations
String operations are a common task in Python programming, and optimizing them can significantly enhance performance, especially when working with large datasets or repetitive tasks.
Efficient String Concatenation
When working with many strings, it's essential to use the most efficient method for concatenation. Using the + (or +=) operator repeatedly is inefficient for large datasets, as it creates a brand-new string object on every operation. Instead, str.join() is much faster and more memory-efficient, since it builds the final string in a single pass.
Let's compare the performance of two approaches for concatenating lines of text, each followed by a newline character, into a single string. The first uses a for loop with the += operator, while the second leverages the more efficient str.join() method.
import os

# Download the timing decorator used throughout this course
os.system('wget https://codefinity-content-media-v2.s3.eu-west-1.amazonaws.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

# Simulated lines of a report
lines = [f"Line {i}" for i in range(1, 1000001)]

# Inefficient concatenation
@timeit_decorator(number=50)
def concat_with_plus():
    result = ""
    for line in lines:
        result += line + "\n"
    return result

# Efficient concatenation
@timeit_decorator(number=50)
def concat_with_join():
    return "\n".join(lines) + "\n"  # Add final newline for consistency

result_plus = concat_with_plus()
result_join = concat_with_join()

print(result_plus == result_join)
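If you'd like to reproduce a comparison like this without downloading decorators.py, a minimal sketch using only the standard library's timeit module could look as follows (the dataset size and repetition count here are arbitrary choices, not the lesson's exact benchmark):

import timeit

lines = [f"Line {i}" for i in range(1, 100001)]

def concat_with_plus():
    result = ""
    for line in lines:
        result += line + "\n"
    return result

def concat_with_join():
    return "\n".join(lines) + "\n"

# timeit runs each callable the given number of times and returns the total elapsed seconds
plus_time = timeit.timeit(concat_with_plus, number=10)
join_time = timeit.timeit(concat_with_join, number=10)

print(f"+= loop:    {plus_time:.3f} s")
print(f"str.join(): {join_time:.3f} s")

The exact numbers depend on your interpreter and machine, but str.join() generally comes out faster, and its advantage grows with the amount of data.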
Precompiling Regular Expressions
When working with regular expressions in Python, performance can become a concern, especially when dealing with large datasets or repetitive pattern matching. In such cases, precompiling the pattern is a useful optimization technique.
Precompiling ensures that the regex engine doesn't recompile the pattern every time it's used, which can significantly improve performance when the same pattern is applied multiple times across a dataset. This approach is particularly beneficial in scenarios like filtering, validation, or searching in large text files.
Let's compare the performance of two approaches for validating usernames with regular expressions. The first approach uses the re.match function with the pattern defined inline each time it's called. The second, more efficient approach precompiles the regex pattern using re.compile and reuses it for all validations.
import os
import re

# Download the timing decorator used throughout this course
os.system('wget https://codefinity-content-media-v2.s3.eu-west-1.amazonaws.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null')
from decorators import timeit_decorator

# Simulated usernames
usernames = ["user123", "admin!@#", "test_user", "invalid!"] * 100000

# Naive approach
@timeit_decorator(number=10)
def validate_with_re():
    pattern = r"^\w+$"
    return [bool(re.match(pattern, username)) for username in usernames]

# Optimized approach
@timeit_decorator(number=10)
def validate_with_compiled_re():
    compiled_pattern = re.compile(r"^\w+$")
    return [bool(compiled_pattern.match(username)) for username in usernames]

result_without_precompiling = validate_with_re()
result_with_precompiling = validate_with_compiled_re()

print(result_without_precompiling == result_with_precompiling)
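In real projects, a precompiled pattern is usually created once at module level and then reused wherever it's needed. Here is a minimal sketch of that structure; the names USERNAME_PATTERN and is_valid_username are illustrative, not part of the lesson's code:

import re

# Compiled once, when the module is imported
USERNAME_PATTERN = re.compile(r"^\w+$")

def is_valid_username(username):
    # Reuses the already-compiled pattern on every call
    return bool(USERNAME_PATTERN.match(username))

print(is_valid_username("user123"))   # True
print(is_valid_username("admin!@#"))  # False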
The pattern r"^\w+$" is a regular expression used to validate usernames. Here's what each part of the pattern means:

^: asserts the start of the string;
\w: matches any alphanumeric character (letters, digits) or an underscore;
+: matches one or more of the preceding character class (\w);
$: asserts the end of the string.
Together, this pattern ensures that the username consists of only alphanumeric characters or underscores and doesn't include any special characters, spaces, or invalid symbols.
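As a quick illustration (using the same sample usernames as in the benchmark above), here's how the pattern classifies each one:

import re

pattern = re.compile(r"^\w+$")

for username in ["user123", "admin!@#", "test_user", "invalid!"]:
    print(f"{username}: {bool(pattern.match(username))}")

# user123: True
# admin!@#: False
# test_user: True
# invalid!: False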