Least Squares: Is there a way to improve the performance of this code? packages used: sm.OLS and pd












The following function calculates the slope and intercept of a least-squares line from a pandas DataFrame. For large data sets the calculation takes too long.



import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({'signal': [270400, 270320, 270278.2609, 270195.8333, 270207.1429, 270081.8182, 269891.3043, 269991.6667, 270153.3333, 270285, 270357.1429, 270485.7143, 270526.3158, 270422.7273, 270310.7143, 270486.3636, 270593.9394, 270655.5556, 270669.2308, 270800, 270911.1111, 271083.3333, 271180, 271084.6154, 270800, 270966.6667, 271100, 271030.7692, 271053.8462, 271123.8095, 271050, 271165.5172, 271384, 271486.6667, 271554.5455, 271600, 271629.1667, 271526.6667, 271612.5, 271663.6364, 271553.5714, 271384, 271400, 271676.9231, 271666.6667, 271624, 271540.7407, 271503.125, 271600, 271620.8333, 271426.087, 271492.8571, 271482.1429, 271426.3158, 271207.6923, 271146.6667, 271057.8947, 271000, 271005.5556, 271044, 270930.4348, 270707.6923, 270833.3333, 270950, 270912.5, 270850, 270711.1111, 270925.9259, 271270, 271220, 271050, 271069.2308, 271037.5, 270954.5455, 270720, 270631.25, 270450, 270387.5, 270196, 270271.4286, 269920, 269760, 269562.5, 269500, 269867.8571, 269792.8571, 269854.1667, 269666.6667, 269417.3913, 269183.3333, 268966.6667, 268859.0909, 268707.6923, 268634.4828, 268516.6667, 268500, 268753.3333, 269175, 269309.5238, 269440.9091, 269394.4444, 269147.3684, 269133.3333, 269083.3333, 269126.087, 269145, 269164.7059, 269175, 269247.3684, 269540.9091, 270039.2857, 270150, 270283.3333, 270480.9524, 270692.8571, 270350, 270414.2857, 270277.7778, 270359.2593, 270148.1481, 269905, 269800, 269731.8182, 270186.6667, 270271.4286, 269983.3333, 269808, 269900, 269652.381, 269434.6154, 269659.2593, 269770.9677, 269831.0345, 269960, 270000, 270021.7391, 269868.4211, 269421.0526, 269106.25, 268813.3333, 268685.7143, 269421.4286, 269712.5, 269546.1538, 269332.2581, 269103.3333, 269289.2857, 269493.5484, 269650, 269710, 269444.4444, 269727.2727, 269833.3333, 269844.4444, 269985.7143, 270445.4545, 270320, 270092.5926, 269978.9474, 269866.6667, 269666.6667, 269468.75, 269582.3529, 269300, 269336.3636, 269309.5238, 269233.3333, 269208, 269100] })

def OLS_slope_intercept(df, select_colname='open'):
    # Regress the selected column on the row index, with an intercept term.
    model = sm.OLS(df[select_colname], sm.add_constant(df.index)).fit()
    intercept = model.params[0]  # coefficient of the constant column
    slope = model.params[1]      # coefficient of the index
    return slope, intercept

import time
start = time.time()
slope, intercept = OLS_slope_intercept(df, 'signal')
end = time.time()
print('running time of code:' + str(end - start))
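
For reference, the fit that the function above asks for can be written as a plain least-squares problem. The following is only a minimal sketch with NumPy, not the statsmodels implementation: the small arrays are hypothetical stand-in data (not the question's signal column), and the column of ones is built by hand to play a role similar to sm.add_constant.

```python
import numpy as np

# Hypothetical example data; the question's 'signal' column would play the role of y.
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
x = np.arange(y.size, dtype=float)  # plays the role of df.index

# Design matrix with a leading column of ones (assumed analogue of sm.add_constant).
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ beta ~ y; beta[0] is the intercept, beta[1] the slope.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(slope, intercept)
```

Only the two coefficients are computed here; none of the summary statistics that a fitted statsmodels result provides are produced.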









  • Both answers improve the performance significantly.
    – mqx
    Nov 11 at 2:14















python-3.x pandas performance statsmodels least-squares






asked Nov 10 at 15:10 by mqx, edited Nov 10 at 17:30













2 Answers
You can try scipy.stats.linregress, which is faster than statsmodels:



import pandas as pd
from scipy.stats import linregress
import statsmodels.api as sm
import numpy as np

def OLS_slope_intercept(df, select_colname='open'):
    model = sm.OLS(df[select_colname], sm.add_constant(df.index)).fit()
    intercept = model.params[0]
    slope = model.params[1]
    return slope, intercept

import time
# Repeat the sample frame i times to benchmark on growing inputs.
for i in [10, 100, 1000, 10000, 100000]:
    df1 = pd.concat([df]*i)  # note: the index 0..n-1 repeats in each copy
    start = time.time()
    slope, intercept = OLS_slope_intercept(df1, 'signal')
    end = time.time()
    print('running time of for STATSMODEL Library: ' + str(end - start))

    start = time.time()
    slope, intercept, r_value, p_value, std_err = linregress(df1.index, df1.signal)
    end = time.time()
    print('running time of for SCIPY Library code: ' + str(end - start))

    start = time.time()
    coeffs = np.polyfit(df1.index, df1.signal, 1)  # 1=linear
    end = time.time()
    print('running time of for Numpy Library code: ' + str(end - start), '\n')




running time of for STATSMODEL Library: 0.018347978591918945
running time of for SCIPY Library code: 0.0010001659393310547
running time of for Numpy Library code: 0.0009999275207519531

running time of for STATSMODEL Library: 0.005003690719604492
running time of for SCIPY Library code: 0.0010006427764892578
running time of for Numpy Library code: 0.0010006427764892578

running time of for STATSMODEL Library: 0.033023834228515625
running time of for SCIPY Library code: 0.0010905265808105469
running time of for Numpy Library code: 0.0

running time of for STATSMODEL Library: 0.2552676200866699
running time of for SCIPY Library code: 0.050981998443603516
running time of for Numpy Library code: 0.11161375045776367

running time of for STATSMODEL Library: 2.6753437519073486
running time of for SCIPY Library code: 0.3832666873931885
running time of for Numpy Library code: 1.2618811130523682
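
The 0.0 reading in the output above suggests that a single time.time() measurement can fall below the timer's resolution, so small differences between libraries are hard to judge from one run. A possible way to get steadier numbers is the standard-library timeit module. This is only a sketch: best_time is a hypothetical helper, and the data is a synthetic noisy line rather than the question's signal.

```python
import timeit

import numpy as np


def best_time(func, repeat=5, number=10):
    """Return the best per-call wall time in seconds over several repeats."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


# Hypothetical stand-in data: a noisy straight line, not the question's signal.
rng = np.random.default_rng(0)
x = np.arange(10_000, dtype=float)
y = 3.0 * x + 5.0 + rng.normal(scale=1.0, size=x.size)

polyfit_seconds = best_time(lambda: np.polyfit(x, y, 1))
print(f"np.polyfit: {polyfit_seconds:.6f} s per call")
```

The same helper could be pointed at any of the other fitting calls by passing a different lambda. Taking the minimum over repeats is a common convention for reducing noise from other processes.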





  • My conclusion: On average np.polyfit is 5% faster than linregress from scipy.stats.
    – mqx
    Nov 13 at 0:05










  • @mqx check the update: linregress is faster than np.polyfit.
    – Sandeep Kadapa
    Nov 13 at 3:54










  • Thank you for your answer. Regarding the topic np.polyfit VS scipy.stats I opened a new question. stackoverflow.com/questions/53285554/…
    – mqx
    2 days ago


















You can just use numpy.polyfit()



import numpy as np
import time

start = time.time()
coeffs = np.polyfit(df.index, df.signal, 1) # 1=linear
end= time.time()

slope, intercept = coeffs

print(f'Run Time: {end-start}\nSlope: {slope}\nIntercept: {intercept}')

Run Time: 0.000385284423828125
Slope: -10.83867485480565
Intercept: 271144.8357256735


For comparison, when I run your example I get: running time of code:0.002992868423461914
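
If only the slope and intercept of a straight line are needed, the closed-form formulas for simple linear regression are another option. This is a different technique from numpy.polyfit, shown here only as a sketch: slope_intercept is a hypothetical helper name, and the check data is made up so that the expected result is obvious.

```python
import numpy as np


def slope_intercept(x, y):
    """Closed-form simple linear regression of y on x (ordinary least squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    slope = np.dot(x_centered, y - y.mean()) / np.dot(x_centered, x_centered)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


# Hypothetical check data: points lying exactly on y = 2x + 1.
x = np.arange(5)
y = 2.0 * x + 1.0
slope, intercept = slope_intercept(x, y)
print(slope, intercept)
```

With a DataFrame like the one in the question, the call would presumably be slope_intercept(df.index, df.signal). Whether this is faster than the library routines would need to be measured on the actual data.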






Answer using scipy.stats.linregress: answered Nov 10 at 16:01 by Sandeep Kadapa, edited 2 days ago

Answer using numpy.polyfit(): answered Nov 10 at 15:45 by Chris
