PyArrow Arrays: Creation and Inspection
PyArrow is the official Python interface to the Arrow ecosystem, allowing you to work directly with Arrow arrays and data structures in your Python code. In the previous section, you explored Arrow arrays and their data types at a conceptual level. With PyArrow, you can now create and manipulate these arrays programmatically, opening up efficient, columnar data processing in your Python workflows.
To create a simple Arrow array with PyArrow, you first import the pyarrow library. The most common entry point is the pyarrow.array() function, which takes a Python sequence (such as a list) and, optionally, a specific Arrow data type. If you do not specify a type, PyArrow will infer it from your data. For example, to create an array of integers, you can write:
    import pyarrow as pa

    # Create an Arrow array from a Python list of integers
    data = [1, 2, 3, 4, 5]
    arr = pa.array(data, type=pa.int32())

    print(arr)
    print("Type:", arr.type)
A pyarrow.Array is an immutable, fixed-length sequence representing a column of data in Arrow format. Key attributes and methods for inspection include:
- type: the Arrow data type of the array (an attribute, accessed without parentheses);
- len(arr): the number of elements, with null entries included in the count;
- null_count: the number of null (missing) values (also an attribute);
- to_pylist(): converts the array to a standard Python list.
Once you have created an Arrow array, you can inspect its properties to better understand your data. For instance, you might want to know the array's length, its data type, or how many null values it contains. Building on the previous creation example, you can access these properties directly:
    import pyarrow as pa

    arr = pa.array([1, 2, None, 4], type=pa.int32())

    print("Array:", arr)
    print("Length:", len(arr))  # number of elements, nulls included
    print("Type:", arr.type)
    print("Null count:", arr.null_count)