Learn PyArrow Arrays: Creation and Inspection | Working with PyArrow
Apache Arrow and PyArrow for Data Scientists

PyArrow Arrays: Creation and Inspection

PyArrow is the official Python interface to the Arrow ecosystem, allowing you to work directly with Arrow arrays and data structures in your Python code. In the previous section, you explored Arrow arrays and their data types at a conceptual level. With PyArrow, you can now create and manipulate these arrays programmatically, opening up efficient, columnar data processing in your Python workflows.

To create a simple Arrow array with PyArrow, you first import the pyarrow library. The most common entry point is the pyarrow.array() function, which takes a Python sequence (such as a list) and, optionally, a specific Arrow data type. If you do not specify a type, PyArrow will infer it from your data. For example, to create an array of integers, you can write:

    import pyarrow as pa

    # Create an Arrow array from a Python list of integers
    data = [1, 2, 3, 4, 5]
    arr = pa.array(data, type=pa.int32())

    print(arr)
    print("Type:", arr.type)
Definition

A pyarrow.Array is an immutable, fixed-length sequence representing a column of data in Arrow format. Key attributes and methods for inspection include:

  • type: attribute holding the Arrow data type of the array;
  • len(arr): returns the number of elements;
  • null_count: attribute holding the number of null (missing) values;
  • to_pylist(): converts the array to a standard Python list, with nulls returned as None.

Once you have created an Arrow array, you can inspect its properties to better understand your data. For instance, you might want to know the array's length, its data type, or how many null values it contains. Building on the previous creation example, you can access these properties directly:

    import pyarrow as pa

    arr = pa.array([1, 2, None, 4], type=pa.int32())

    print("Array:", arr)
    print("Length:", len(arr))  # Array supports len(); it has no length() method
    print("Type:", arr.type)
    print("Null count:", arr.null_count)
