Pulling segments from a large table by passing parameters from python












0














I have a large table which is a union of several small selects as below:



create_sql = str("""
drop table if exists public.segments_table;
create table public.segments_table as

select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201804' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12

union all

select
household_id,
case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201805' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12

union all

select
household_id,

case
when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
then '1-3 Month Disco'
else 'No Disco'
end as disco_status,'201806' as churn_date,
b.*
from
public.churn_table b
where drform = 'ACTIVE'
and dt = 201803
and tenure > 12;
""")

#Now for each dt ,I pull the relevant records as below in python:

target_column = 'disco_status'
table_name = 'public.segments_table'

# Running for segments
query_params = {'month_date' : '201803',
'churn_date' : '201804',
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}

sql_data_sample = str("""select * from {1}
where dt = %(month_date)s
and churn_date = %(churn_date)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)


All these are part of a script called sql_data_prep.py which is imported in the main script:



from sql_data_prep import sql_data_sample,query_params,create_sql

#then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date

database = 'Redshift Test'

print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
try:
cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
curs = cnxn.cursor()
print ('*'*10 + "nConnected to %s!"%(database))
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
##############################################################################
#Parameterized Query: table_name and target_var to be also made as parameters.
##############################################################################

curs.execute(create_sql)
cnxn.commit()
print('*'*10 + "nDone creating table with data")
print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))

df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)


Now as you can see , I need to change the parameters thrice in query_params for three different segments :



1. Where dt = 201803 and churn_date = 201804
2. Where dt = 201803 and churn_date = 201805
3. Where dt = 201803 and churn_date = 201806


But I want to pass these as parameters and pull each segment sequentially in a for loop.
So my new query_params dict will look like:



query_params = {'month_date' : '201803',
'churn_date' : ['201804','201805,'201806'],
'disco_target' : '1-3 Month Disco',
'Y0' : 'No Disco'
}


Once I connect to the database, I can:



for month in query_params['month_date]:
for churn in query_params['churn_date']:
sql_data_sample = str("""select * from {1}
where dt = %(month)s
and churn_date = %(churn)s
and {0} IN (%(disco_target)s,'No Disco')
;""").format(target_column,table_name)

# Some data processing functions:


This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
and move on to the next segment(which is defined by month_date and churn_date here.



I hope my question is clear and can someone please help me with this?










share|improve this question



























    0














    I have a large table which is a union of several small selects as below:



    create_sql = str("""
    drop table if exists public.segments_table;
    create table public.segments_table as

    select
    household_id,
    case
    when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
    then '1-3 Month Disco'
    else 'No Disco'
    end as disco_status,'201804' as churn_date,
    b.*
    from
    public.churn_table b
    where drform = 'ACTIVE'
    and dt = 201803
    and tenure > 12

    union all

    select
    household_id,
    case
    when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
    then '1-3 Month Disco'
    else 'No Disco'
    end as disco_status,'201805' as churn_date,
    b.*
    from
    public.churn_table b
    where drform = 'ACTIVE'
    and dt = 201803
    and tenure > 12

    union all

    select
    household_id,

    case
    when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
    then '1-3 Month Disco'
    else 'No Disco'
    end as disco_status,'201806' as churn_date,
    b.*
    from
    public.churn_table b
    where drform = 'ACTIVE'
    and dt = 201803
    and tenure > 12;
    """)

    #Now for each dt ,I pull the relevant records as below in python:

    target_column = 'disco_status'
    table_name = 'public.segments_table'

    # Running for segments
    query_params = {'month_date' : '201803',
    'churn_date' : '201804',
    'disco_target' : '1-3 Month Disco',
    'Y0' : 'No Disco'
    }

    sql_data_sample = str("""select * from {1}
    where dt = %(month_date)s
    and churn_date = %(churn_date)s
    and {0} IN (%(disco_target)s,'No Disco')
    ;""").format(target_column,table_name)


    All these are part of a script called sql_data_prep.py which is imported in the main script:



    from sql_data_prep import sql_data_sample,query_params,create_sql

    #then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date

    database = 'Redshift Test'

    print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
    try:
    cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
    curs = cnxn.cursor()
    print ('*'*10 + "nConnected to %s!"%(database))
    print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
    ##############################################################################
    #Parameterized Query: table_name and target_var to be also made as parameters.
    ##############################################################################

    curs.execute(create_sql)
    cnxn.commit()
    print('*'*10 + "nDone creating table with data")
    print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))

    df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)


    Now as you can see , I need to change the parameters thrice in query_params for three different segments :



    1. Where dt = 201803 and churn_date = 201804
    2. Where dt = 201803 and churn_date = 201805
    3. Where dt = 201803 and churn_date = 201806


    But I want to pass these as parameters and pull each segment sequentially in a for loop.
    So my new query_params dict will look like:



    query_params = {'month_date' : '201803',
    'churn_date' : ['201804','201805,'201806'],
    'disco_target' : '1-3 Month Disco',
    'Y0' : 'No Disco'
    }


    Once I connect to the database, I can:



    for month in query_params['month_date]:
    for churn in query_params['churn_date']:
    sql_data_sample = str("""select * from {1}
    where dt = %(month)s
    and churn_date = %(churn)s
    and {0} IN (%(disco_target)s,'No Disco')
    ;""").format(target_column,table_name)

    # Some data processing functions:


    This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
    and move on to the next segment(which is defined by month_date and churn_date here.



    I hope my question is clear and can someone please help me with this?










    share|improve this question

























      0












      0








      0







      I have a large table which is a union of several small selects as below:



      create_sql = str("""
      drop table if exists public.segments_table;
      create table public.segments_table as

      select
      household_id,
      case
      when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
      then '1-3 Month Disco'
      else 'No Disco'
      end as disco_status,'201804' as churn_date,
      b.*
      from
      public.churn_table b
      where drform = 'ACTIVE'
      and dt = 201803
      and tenure > 12

      union all

      select
      household_id,
      case
      when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
      then '1-3 Month Disco'
      else 'No Disco'
      end as disco_status,'201805' as churn_date,
      b.*
      from
      public.churn_table b
      where drform = 'ACTIVE'
      and dt = 201803
      and tenure > 12

      union all

      select
      household_id,

      case
      when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
      then '1-3 Month Disco'
      else 'No Disco'
      end as disco_status,'201806' as churn_date,
      b.*
      from
      public.churn_table b
      where drform = 'ACTIVE'
      and dt = 201803
      and tenure > 12;
      """)

      #Now for each dt ,I pull the relevant records as below in python:

      target_column = 'disco_status'
      table_name = 'public.segments_table'

      # Running for segments
      query_params = {'month_date' : '201803',
      'churn_date' : '201804',
      'disco_target' : '1-3 Month Disco',
      'Y0' : 'No Disco'
      }

      sql_data_sample = str("""select * from {1}
      where dt = %(month_date)s
      and churn_date = %(churn_date)s
      and {0} IN (%(disco_target)s,'No Disco')
      ;""").format(target_column,table_name)


      All these are part of a script called sql_data_prep.py which is imported in the main script:



      from sql_data_prep import sql_data_sample,query_params,create_sql

      #then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date

      database = 'Redshift Test'

      print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
      try:
      cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
      curs = cnxn.cursor()
      print ('*'*10 + "nConnected to %s!"%(database))
      print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
      ##############################################################################
      #Parameterized Query: table_name and target_var to be also made as parameters.
      ##############################################################################

      curs.execute(create_sql)
      cnxn.commit()
      print('*'*10 + "nDone creating table with data")
      print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))

      df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)


      Now as you can see , I need to change the parameters thrice in query_params for three different segments :



      1. Where dt = 201803 and churn_date = 201804
      2. Where dt = 201803 and churn_date = 201805
      3. Where dt = 201803 and churn_date = 201806


      But I want to pass these as parameters and pull each segment sequentially in a for loop.
      So my new query_params dict will look like:



      query_params = {'month_date' : '201803',
      'churn_date' : ['201804','201805,'201806'],
      'disco_target' : '1-3 Month Disco',
      'Y0' : 'No Disco'
      }


      Once I connect to the database, I can:



      for month in query_params['month_date]:
      for churn in query_params['churn_date']:
      sql_data_sample = str("""select * from {1}
      where dt = %(month)s
      and churn_date = %(churn)s
      and {0} IN (%(disco_target)s,'No Disco')
      ;""").format(target_column,table_name)

      # Some data processing functions:


      This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
      and move on to the next segment(which is defined by month_date and churn_date here.



      I hope my question is clear and can someone please help me with this?










      share|improve this question













      I have a large table which is a union of several small selects as below:



      create_sql = str("""
      drop table if exists public.segments_table;
      create table public.segments_table as

      select
      household_id,
      case
      when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201804))
      then '1-3 Month Disco'
      else 'No Disco'
      end as disco_status,'201804' as churn_date,
      b.*
      from
      public.churn_table b
      where drform = 'ACTIVE'
      and dt = 201803
      and tenure > 12

      union all

      select
      household_id,
      case
      when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201805))
      then '1-3 Month Disco'
      else 'No Disco'
      end as disco_status,'201805' as churn_date,
      b.*
      from
      public.churn_table b
      where drform = 'ACTIVE'
      and dt = 201803
      and tenure > 12

      union all

      select
      household_id,

      case
      when household_id in (select distinct household_id) from workmsmgr.ps_churn where drform = 'VOLUNTARY' and dt in (201806))
      then '1-3 Month Disco'
      else 'No Disco'
      end as disco_status,'201806' as churn_date,
      b.*
      from
      public.churn_table b
      where drform = 'ACTIVE'
      and dt = 201803
      and tenure > 12;
      """)

      #Now for each dt ,I pull the relevant records as below in python:

      target_column = 'disco_status'
      table_name = 'public.segments_table'

      # Running for segments
      query_params = {'month_date' : '201803',
      'churn_date' : '201804',
      'disco_target' : '1-3 Month Disco',
      'Y0' : 'No Disco'
      }

      sql_data_sample = str("""select * from {1}
      where dt = %(month_date)s
      and churn_date = %(churn_date)s
      and {0} IN (%(disco_target)s,'No Disco')
      ;""").format(target_column,table_name)


      All these are part of a script called sql_data_prep.py which is imported in the main script:



      from sql_data_prep import sql_data_sample,query_params,create_sql

      #then a connection to the server is established and the query sql_data_sample is used to fetch various segments based on month_date and churn_date

      database = 'Redshift Test'

      print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
      try:
      cnxn = psycopg2.connect(database='xxxx',host='localhost',port='8880',user='dbo',password='xxxx')
      curs = cnxn.cursor()
      print ('*'*10 + "nConnected to %s!"%(database))
      print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))
      ##############################################################################
      #Parameterized Query: table_name and target_var to be also made as parameters.
      ##############################################################################

      curs.execute(create_sql)
      cnxn.commit()
      print('*'*10 + "nDone creating table with data")
      print(datetime.datetime.strftime(datetime.datetime.now(), '%Y-%m-%d %H:%M:%S'))

      df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)


      Now as you can see , I need to change the parameters thrice in query_params for three different segments :



      1. Where dt = 201803 and churn_date = 201804
      2. Where dt = 201803 and churn_date = 201805
      3. Where dt = 201803 and churn_date = 201806


      But I want to pass these as parameters and pull each segment sequentially in a for loop.
      So my new query_params dict will look like:



      query_params = {'month_date' : '201803',
      'churn_date' : ['201804','201805,'201806'],
      'disco_target' : '1-3 Month Disco',
      'Y0' : 'No Disco'
      }


      Once I connect to the database, I can:



      for month in query_params['month_date]:
      for churn in query_params['churn_date']:
      sql_data_sample = str("""select * from {1}
      where dt = %(month)s
      and churn_date = %(churn)s
      and {0} IN (%(disco_target)s,'No Disco')
      ;""").format(target_column,table_name)

      # Some data processing functions:


      This is being done to automate the process of creating tables from unions of several selects and then pull the segments sequentially, do some data processing
      and move on to the next segment(which is defined by month_date and churn_date here.



      I hope my question is clear and can someone please help me with this?







      python parameterized-query






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 12 at 4:22









      Shuvayan Das

      424514




      424514





























          active

          oldest

          votes











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53255933%2fpulling-segments-from-a-large-table-by-passing-parameters-from-python%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown






























          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53255933%2fpulling-segments-from-a-large-table-by-passing-parameters-from-python%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Florida Star v. B. J. F.

          Danny Elfman

          Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues