Feather: A Fast On-Disk Format for Data Frames for R and Python, powered by Apache Arrow

Wes McKinney, Software Engineer, Cloudera
Hadley Wickham, Chief Scientist, RStudio

This past January, we (Hadley and Wes) met and discussed some of the systems challenges facing the Python and R open source communities. In particular, we wanted to see if there were some opportunities to collaborate on tools for improving interoperability between Python, R, and external compute and storage systems.

One thing that struck us was that while R’s data frames and Python’s pandas data frames utilize very different internal memory representations, they share a very similar semantic model. In both R and Panda’s, data frames are lists of named, equal-length columns, which can be numeric, boolean, and date-and-time, categorical (factors), or string. Every column can have missing values.

Around this time, the open source community had just started the new Apache Arrow project, designed to improve data interoperability for systems dealing with columnar tabular data.

In discussing Apache Arrow in the context of Python and R, we wanted to see if we could use the insights from feather to design a very fast file format for storing data frames that could be used by both languages. Thus, the Feather format was born.

What is Feather?

Feather is a fast, lightweight, and easy-to-use binary file format for storing data frames. It has a few specific design goals:

Lightweight, minimal API: make pushing data frames in and out of memory as simple as possible
Language agnostic: Feather files are the same whether written by Python or R code. Other languages can read and write Feather files, too.
High read and write performance. When possible, Feather operations should be bound by local disk performance.

Code examples

The Feather API is designed to make reading and writing data frames as easy as possible. In R, the code might look like:

library(feather)
path <- "my_data.feather"
write_feather(df, path)
df <- read_feather(path)

Analogously, in Python, we have:

import  ...read more
Source:: http://blog.rstudio.org

Feather: A Fast On-Disk Format for Data Frames for R and Python, powered by Apache Arrow

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112