Learn PyArrow Arrays: Creation and Inspection | Working with PyArrow
Apache Arrow and PyArrow for Data Scientists

PyArrow Arrays: Creation and Inspection

PyArrow is the official Python interface to the Arrow ecosystem, allowing you to work directly with Arrow arrays and data structures in your Python code. In the previous section, you explored Arrow arrays and their data types at a conceptual level. With PyArrow, you can now create and manipulate these arrays programmatically, opening up efficient, columnar data processing in your Python workflows.

To create a simple Arrow array with PyArrow, you first import the pyarrow library. The most common entry point is the pyarrow.array() function, which takes a Python sequence (such as a list) and, optionally, a specific Arrow data type. If you do not specify a type, PyArrow will infer it from your data. For example, to create an array of integers, you can write:

    import pyarrow as pa

    # Create an Arrow array from a Python list of integers
    data = [1, 2, 3, 4, 5]
    arr = pa.array(data, type=pa.int32())

    print(arr)
    print("Type:", arr.type)
Definition

A pyarrow.Array is an immutable, fixed-length sequence representing a column of data in Arrow format. Key attributes and methods for inspection include:

  • type: attribute holding the Arrow data type of the array;
  • len(arr): returns the number of elements (pyarrow.Array supports Python's built-in len() and has no length() method);
  • null_count: attribute holding the number of null (missing) values;
  • to_pylist(): method that converts the array to a standard Python list.

Once you have created an Arrow array, you can inspect its properties to better understand your data. For instance, you might want to know the array's length, its data type, or how many null values it contains. Building on the previous creation example, you can access these properties directly:

    import pyarrow as pa

    arr = pa.array([1, 2, None, 4], type=pa.int32())

    print("Array:", arr)
    print("Length:", len(arr))  # pyarrow.Array has no length() method; use len()
    print("Type:", arr.type)
    print("Null count:", arr.null_count)