How to dump data from Redshift to JSON

Extracting data from Redshift is nearly as common as loading data into it. Sometimes the results of heavy computations done in Redshift are needed by serving systems; other times a large export is needed for analysis in Excel or other tools. The UNLOAD command can be used to extract data from Redshift to S3 in formats such as delimited or fixed-width text.

UNLOAD ('select * from my_table')
  TO 's3://bucket_name/path/to/my_filename_prefix'
  WITH CREDENTIALS
    'aws_access_key_id=<my_access_key>;
    aws_secret_access_key=<my_secret_key>'
  MANIFEST
  GZIP
  ALLOWOVERWRITE
  ESCAPE
  NULL AS '\\N'

Although it is quite easy to extract data from Redshift to S3 in delimited or fixed-width formats, there is no direct way to export the data in JSON format.

In this blog I describe a workaround for extracting the data in JSON format. UNLOAD dumps the output of a SELECT query, so with some CASE expressions the SELECT can be rewritten to emit one JSON document per row, which UNLOAD can then export.

SELECT '{ ' || CASE WHEN "column1" IS NULL THEN '' ELSE '"column1": "' || "column1" END ||
               CASE WHEN "column2" IS NULL THEN '' ELSE '", "column2": "' || "column2" END ||
       '" }'
FROM target_table;
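To see what the concatenation produces, here is a small Python sketch that mirrors the CASE logic above (this is only an illustration of the string-building, not how Redshift executes it). Note two caveats it exposes: the output is malformed when "column1" is NULL but "column2" is not, and values containing double quotes are not escaped.

```python
import json

def row_to_json(column1, column2):
    """Mirror the CASE-based concatenation from the Redshift query above."""
    out = "{ "
    if column1 is not None:            # CASE WHEN "column1" IS NULL THEN '' ELSE ...
        out += '"column1": "' + column1
    if column2 is not None:            # CASE WHEN "column2" IS NULL THEN '' ELSE ...
        out += '", "column2": "' + column2
    out += '" }'
    return out

print(row_to_json("foo", "bar"))   # { "column1": "foo", "column2": "bar" }
print(row_to_json("foo", None))    # { "column1": "foo" } -- still valid
print(row_to_json(None, "bar"))    # { ", "column2": "bar" } -- malformed!

# For values with embedded quotes, real JSON escaping (as json.dumps does)
# would be needed; the plain concatenation does not provide it.
print(json.dumps({"column1": 'say "hi"'}))
```

So the trick works best when the leading column is NOT NULL, or when NULLs are coalesced to a default before concatenation.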

By putting this query inside UNLOAD we can dump the data to S3 in JSON format.

UNLOAD ('SELECT \'{ \' || CASE WHEN "column1" IS NULL THEN \'\' ELSE \'"column1": "\' || "column1" END ||
               CASE WHEN "column2" IS NULL THEN \'\' ELSE \'", "column2": "\' || "column2" END ||
       \'" }\' 
FROM target_table')
  TO 's3://bucket_name/path/to/my_filename_prefix'
  WITH CREDENTIALS
    'aws_access_key_id=<my_access_key>;
    aws_secret_access_key=<my_secret_key>'
  MANIFEST
  GZIP
  ALLOWOVERWRITE
  ESCAPE
  NULL AS '\\N'
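Because of the GZIP option, each object UNLOAD writes to S3 is a gzip-compressed file containing one JSON document per line (and MANIFEST adds a manifest file listing the parts). After downloading a part from S3, parsing it is straightforward. A minimal Python sketch, using an in-memory sample in place of a real S3 download:

```python
import gzip
import io
import json

# Simulate one gzipped output file as UNLOAD would write it:
# one JSON document per line.
raw = b'{ "column1": "foo", "column2": "bar" }\n' \
      b'{ "column1": "baz", "column2": "qux" }\n'
buf = io.BytesIO(gzip.compress(raw))

rows = []
with gzip.open(buf, "rt") as fh:
    for line in fh:
        line = line.strip()
        if line:                      # skip any blank trailing line
            rows.append(json.loads(line))

print(rows[0]["column1"])  # foo
```

With real data the bytes would come from an S3 client instead of the in-memory buffer, but the gzip-then-parse-per-line shape stays the same.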
