- Check out the PostgreSQL wiki page
- Check out the Redshift developer's guide
[List databases](https://gist.github.com/patsancu/50ea9416c0c7ba5746d045e9b047955d)
Load data into Redshift from S3
Data can be loaded recursively. If there's a folder with this structure:
a/
b/
p.json.gz
c/
q.json.gz
r.json.gz
running the COPY command with the folder `a` as the source will load all files under that folder recursively:
copy schema.table_name
from 's3://bucket-name/some_folder/some_file.json.gz'
-- from 's3://bucket-name/a'
with credentials 'aws_access_key_id=some_access_key;aws_secret_access_key=supersupersecretkey'
[gzip]
format as json 'auto'
[region 'us-east-1']
-- If the Amazon S3 buckets that hold the data files do not reside in the same region as your cluster, you must use the REGION parameter to specify the region in which the data is located.
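To control exactly which files get loaded, instead of everything under a prefix, COPY also accepts a manifest file. A sketch, reusing the folder layout above (bucket name and credentials are placeholders):

```sql
-- contents of s3://bucket-name/a.manifest (a JSON file uploaded beforehand):
-- {
--   "entries": [
--     {"url": "s3://bucket-name/a/b/p.json.gz", "mandatory": true},
--     {"url": "s3://bucket-name/a/c/q.json.gz", "mandatory": true}
--   ]
-- }
copy schema.table_name
from 's3://bucket-name/a.manifest'
with credentials 'aws_access_key_id=some_access_key;aws_secret_access_key=supersupersecretkey'
gzip
format as json 'auto'
manifest
```

With `"mandatory": true`, the load fails if that file is missing; with `false`, missing files are skipped.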
Escape strings with ESCAPE
Remove the `format as csv` clause (ESCAPE is not supported together with CSV format).
copy analyticsmodel.playback
from :sql:source
with credentials :sql:redshift-credentials
delimiter '~'
ignoreheader as 1
[escape]
[truncatecolumns] -- truncates data in columns to the target column's defined length
Load into redshift with custom field ordering
You can specify a comma-separated list of column names to load source data fields into specific target columns. Docs
copy schema.table (column_a, column_b, column_c, column_d, column_e, column_f)
from 's3://some_file.csv'
with credentials 'aws_access_key_id=some_access_key;aws_secret_access_key=supersupersecretkey'
format as csv
ignoreheader as 1
TRUNCATECOLUMNS
Check errors
select * from stl_load_errors
order by starttime desc;
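STL_LOAD_ERRORS has quite a few columns; selecting just the most useful ones makes the output easier to read. A sketch using columns documented for that system table:

```sql
select starttime,
       filename,
       line_number,
       colname,
       err_code,
       err_reason,
       raw_line
from stl_load_errors
order by starttime desc
limit 20;
```

`err_reason` and `raw_line` together usually point straight at the offending row.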
Check info about permanent tables
The STV_TBL_PERM table contains information about the permanent tables in Amazon Redshift. More info here
select * from stv_tbl_perm;
Size info about tables
First method
select "database",
"schema" || '.' || "table", "size" as "size in Mb", "tbl_rows" as "rows"
from svv_table_info
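SVV_TABLE_INFO also exposes maintenance stats; to surface the biggest tables first, sort by size (a sketch using documented columns of that view):

```sql
select "schema" || '.' || "table" as table_name,
       size as size_mb,
       tbl_rows,
       unsorted,
       pct_used
from svv_table_info
order by size desc
limit 20;
```

A high `unsorted` percentage on a large table is a hint that it could benefit from a VACUUM.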
Second method
The information returned by the query below is neither exhaustive nor complete.
SELECT TRIM(pgdb.datname) AS "database",
TRIM(pgn.nspname) AS "schema",
TRIM(a.name) AS "table",
b.mbytes,
a.rows
FROM (SELECT db_id,
id,
name,
SUM(ROWS) AS ROWS
FROM stv_tbl_perm a
GROUP BY db_id,
id,
name) AS a
JOIN pg_class AS pgc ON pgc.oid = a.id
JOIN pg_namespace AS pgn ON pgn.oid = pgc.relnamespace
JOIN pg_database AS pgdb ON pgdb.oid = a.db_id
JOIN (SELECT tbl, COUNT(*) AS mbytes FROM stv_blocklist GROUP BY tbl) b ON a.id = b.tbl
ORDER BY a.db_id,
a.name;
Generate date series
- Generate a series using any sufficiently large table as a dummy table.
create table fechas as
select (
date('2017-06-30') + row_number() over (order by true)
)::date as fecha
from ibc.playback limit 40
Warning: generate_series is a leader-node-only function on Redshift, so it doesn't work well when joined against regular tables. For more info, check Redshift's developer guide.
WITH date_series_bounds AS (
SELECT date('2012-12-21') as start_date, date('2013-08-23') as end_date
), date_series AS (
select date(days.start_date + days.day_offset)
from (
select bounds.start_date, generate_series(0, bounds.end_date - bounds.start_date) AS day_offset from date_series_bounds bounds
) as days
)
select * from date_series
or, for a timestamp series at 30-minute steps:
select current_date - (n*30 || ' minutes')::interval
from generate_series(0, 100*5-1) n
or
select substr(to_date('2017-06-22', 'YYYY-MM-DD') + (n || ' days')::interval, 1,10)
from generate_series (0, 6) n
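As an alternative that avoids generate_series entirely, newer Redshift versions (since early 2021) support recursive CTEs, which do run on the compute nodes. A sketch, with the date bounds as placeholders:

```sql
with recursive dates (fecha) as (
    select date('2017-06-22') as fecha
    union all
    select (fecha + 1)::date
    from dates
    where fecha < date('2017-06-28')
)
select fecha from dates;
```

Note that Redshift requires the explicit column list (`dates (fecha)`) in a recursive CTE.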
List schemas
select *
from information_schema.schemata
Create schema
CREATE SCHEMA IF NOT EXISTS schema_name
Check data load
select * from STL_FILE_SCAN
where name like '%screentime%'
order by curtime desc