Accessing Parquet Files From Spark SQL Applications
Spark SQL supports loading and saving DataFrames from and to a variety of data sources and has native support for Parquet. For information about Parquet, see Using Apache Parquet Data Files with CDH.
To read Parquet files in Spark SQL, use the SQLContext.read.parquet("path") method.
To write Parquet files in Spark SQL, use the DataFrame.write.parquet("path") method.
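For example, the following Scala sketch reads a Parquet file into a DataFrame and writes it back out. The application name and HDFS paths are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("ParquetExample")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Read a Parquet file (or a directory of Parquet files) into a DataFrame.
val df = sqlContext.read.parquet("/user/example/input.parquet")

// Write the DataFrame back out as Parquet.
df.write.parquet("/user/example/output.parquet")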
To set the compression type, configure the spark.sql.parquet.compression.codec property:
sqlContext.setConf("spark.sql.parquet.compression.codec", "codec")

The supported codec values are uncompressed, gzip, lzo, and snappy. The default is gzip.
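For example, to write a DataFrame using Snappy compression (a sketch continuing from the example above; df and the output path are placeholders):

// Set the codec before writing; subsequent Parquet writes use it.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
df.write.parquet("/user/example/output_snappy.parquet")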
For an example of writing Parquet files to Amazon S3, see Examples of Accessing S3 Data from Spark.