Learn PyArrow Arrays: Creation and Inspection | Working with PyArrow
Apache Arrow and PyArrow for Data Scientists

PyArrow Arrays: Creation and Inspection

PyArrow is the official Python interface to the Arrow ecosystem, allowing you to work directly with Arrow arrays and data structures in your Python code. In the previous section, you explored Arrow arrays and their data types at a conceptual level. With PyArrow, you can now create and manipulate these arrays programmatically, opening up efficient, columnar data processing in your Python workflows.

To create a simple Arrow array with PyArrow, you first import the pyarrow library. The most common entry point is the pyarrow.array() function, which takes a Python sequence (such as a list) and, optionally, a specific Arrow data type. If you do not specify a type, PyArrow will infer it from your data. For example, to create an array of integers, you can write:

```python
import pyarrow as pa

# Create an Arrow array from a Python list of integers
data = [1, 2, 3, 4, 5]
arr = pa.array(data, type=pa.int32())

print(arr)
print("Type:", arr.type)
```
Definition

A pyarrow.Array is an immutable, fixed-length sequence representing a column of data in Arrow format. Key attributes and methods for inspection include:

  • type: attribute holding the Arrow data type of the array;
  • len(arr): returns the number of elements, nulls included;
  • null_count: attribute holding the number of null (missing) values;
  • to_pylist(): converts the array to a standard Python list.

Once you have created an Arrow array, you can inspect its properties to better understand your data. For instance, you might want to know the array's length, its data type, or how many null values it contains. Building on the previous creation example, you can access these properties directly:

```python
import pyarrow as pa

arr = pa.array([1, 2, None, 4], type=pa.int32())

print("Array:", arr)
# len() gives the number of elements, nulls included
print("Length:", len(arr))
print("Type:", arr.type)
print("Null count:", arr.null_count)
```
What is the main purpose of PyArrow as described in this chapter?

Section 3. Chapter 1
