Pulling segments from a large table by passing parameters from python
I have a large table which is a union of several small selects as below:
create_sql = str("""
drop table if exists public.segments_table;
create table public.segments_table as
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201804' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201805' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201806' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12;
""")
#Now for each dt ,I pull the relevant records as below in python:
target_column = 'disco_status'
table_name = 'public.segments_table'
# Running for segments
query_params = {'month_date' : '201803',
'churn_date' : '201804',
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
sql_data_sample = str("""select * from {1}
where dt = %(month_date)s
and churn_date = %(churn_date)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
All these are part of a script called sql_data_prep.py which is imported in the main script:
from sql_data_prep import sql_data_sample,query_params,create_sql
#then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date
database = 'Redshift Test'
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
try:
cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
curs = cnxn.cursor()
print ('*'*10 + "nConnected to %s!"%(database))
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
##############################################################################
#Parameterized Query: table_name and target_var to be also made as parameters.
##############################################################################
curs.execute(create_sql)
cnxn.commit()
print('*'*10 + "nDone creating table with data")
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
Now as you can see , I need to change the parameters thrice in query_params for three different segments :
1. Where dt = 201803 and churn_date = 201804
2. Where dt = 201803 and churn_date = 201805
3. Where dt = 201803 and churn_date = 201806
But I want to pass these as parameters and pull each segment sequentially in a for loop.
So my new query_params dict will look like:
query_params = {'month_date' : '201803',
'churn_date' : ['201804','201805,'201806'],
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
Once I connect to the database, I can:
for month in query_params['month_date]:
for churn in query_params['churn_date']:
sql_data_sample = str("""select * from {1}
where dt = %(month)s
and churn_date = %(churn)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
# Some data processing functions:
This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
and move on to the next segment(which is defined by month_date and churn_date here.
I hope my question is clear and can someone please help me with this?
python parameterized-query
add a comment |
I have a large table which is a union of several small selects as below:
create_sql = str("""
drop table if exists public.segments_table;
create table public.segments_table as
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201804' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201805' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201806' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12;
""")
#Now for each dt ,I pull the relevant records as below in python:
target_column = 'disco_status'
table_name = 'public.segments_table'
# Running for segments
query_params = {'month_date' : '201803',
'churn_date' : '201804',
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
sql_data_sample = str("""select * from {1}
where dt = %(month_date)s
and churn_date = %(churn_date)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
All these are part of a script called sql_data_prep.py which is imported in the main script:
from sql_data_prep import sql_data_sample,query_params,create_sql
#then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date
database = 'Redshift Test'
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
try:
cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
curs = cnxn.cursor()
print ('*'*10 + "nConnected to %s!"%(database))
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
##############################################################################
#Parameterized Query: table_name and target_var to be also made as parameters.
##############################################################################
curs.execute(create_sql)
cnxn.commit()
print('*'*10 + "nDone creating table with data")
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
Now as you can see , I need to change the parameters thrice in query_params for three different segments :
1. Where dt = 201803 and churn_date = 201804
2. Where dt = 201803 and churn_date = 201805
3. Where dt = 201803 and churn_date = 201806
But I want to pass these as parameters and pull each segment sequentially in a for loop.
So my new query_params dict will look like:
query_params = {'month_date' : '201803',
'churn_date' : ['201804','201805,'201806'],
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
Once I connect to the database, I can:
for month in query_params['month_date]:
for churn in query_params['churn_date']:
sql_data_sample = str("""select * from {1}
where dt = %(month)s
and churn_date = %(churn)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
# Some data processing functions:
This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
and move on to the next segment(which is defined by month_date and churn_date here.
I hope my question is clear and can someone please help me with this?
python parameterized-query
add a comment |
I have a large table which is a union of several small selects as below:
create_sql = str("""
drop table if exists public.segments_table;
create table public.segments_table as
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201804' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201805' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201806' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12;
""")
#Now for each dt ,I pull the relevant records as below in python:
target_column = 'disco_status'
table_name = 'public.segments_table'
# Running for segments
query_params = {'month_date' : '201803',
'churn_date' : '201804',
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
sql_data_sample = str("""select * from {1}
where dt = %(month_date)s
and churn_date = %(churn_date)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
All these are part of a script called sql_data_prep.py which is imported in the main script:
from sql_data_prep import sql_data_sample,query_params,create_sql
#then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date
database = 'Redshift Test'
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
try:
cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
curs = cnxn.cursor()
print ('*'*10 + "nConnected to %s!"%(database))
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
##############################################################################
#Parameterized Query: table_name and target_var to be also made as parameters.
##############################################################################
curs.execute(create_sql)
cnxn.commit()
print('*'*10 + "nDone creating table with data")
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
Now as you can see , I need to change the parameters thrice in query_params for three different segments :
1. Where dt = 201803 and churn_date = 201804
2. Where dt = 201803 and churn_date = 201805
3. Where dt = 201803 and churn_date = 201806
But I want to pass these as parameters and pull each segment sequentially in a for loop.
So my new query_params dict will look like:
query_params = {'month_date' : '201803',
'churn_date' : ['201804','201805,'201806'],
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
Once I connect to the database, I can:
for month in query_params['month_date]:
for churn in query_params['churn_date']:
sql_data_sample = str("""select * from {1}
where dt = %(month)s
and churn_date = %(churn)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
# Some data processing functions:
This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
and move on to the next segment(which is defined by month_date and churn_date here.
I hope my question is clear and can someone please help me with this?
python parameterized-query
I have a large table which is a union of several small selects as below:
create_sql = str("""
drop table if exists public.segments_table;
create table public.segments_table as
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201804' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201805' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12
union all
select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201806' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12;
""")
#Now for each dt ,I pull the relevant records as below in python:
target_column = 'disco_status'
table_name = 'public.segments_table'
# Running for segments
query_params = {'month_date' : '201803',
'churn_date' : '201804',
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
sql_data_sample = str("""select * from {1}
where dt = %(month_date)s
and churn_date = %(churn_date)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
All these are part of a script called sql_data_prep.py which is imported in the main script:
from sql_data_prep import sql_data_sample,query_params,create_sql
#then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date
database = 'Redshift Test'
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
try:
cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
curs = cnxn.cursor()
print ('*'*10 + "nConnected to %s!"%(database))
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
##############################################################################
#Parameterized Query: table_name and target_var to be also made as parameters.
##############################################################################
curs.execute(create_sql)
cnxn.commit()
print('*'*10 + "nDone creating table with data")
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
Now as you can see , I need to change the parameters thrice in query_params for three different segments :
1. Where dt = 201803 and churn_date = 201804
2. Where dt = 201803 and churn_date = 201805
3. Where dt = 201803 and churn_date = 201806
But I want to pass these as parameters and pull each segment sequentially in a for loop.
So my new query_params dict will look like:
query_params = {'month_date' : '201803',
'churn_date' : ['201804','201805,'201806'],
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}
Once I connect to the database, I can:
for month in query_params['month_date]:
for churn in query_params['churn_date']:
sql_data_sample = str("""select * from {1}
where dt = %(month)s
and churn_date = %(churn)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)
# Some data processing functions:
This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
and move on to the next segment(which is defined by month_date and churn_date here.
I hope my question is clear and can someone please help me with this?
python parameterized-query
python parameterized-query
asked Nov 12 at 4:22
Shuvayan Das
424514
424514
add a comment |
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53255933%2fpulling-segments-from-a-large-table-by-passing-parameters-from-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53255933%2fpulling-segments-from-a-large-table-by-passing-parameters-from-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown