Error when trying to pass a custom metric in the caret package






























I have a dataset like so:



> head(training_data)
year month channelGrouping visitStartTime visitNumber timeSinceLastVisit browser
1 2016 October Social 1477775021 1 0 Chrome
2 2016 September Social 1473037945 1 0 Safari
3 2017 July Organic Search 1500305542 1 0 Chrome
4 2017 July Organic Search 1500322111 2 16569 Chrome
5 2016 August Social 1471890172 1 0 Safari
6 2017 May Direct 1495146428 1 0 Chrome
operatingSystem isMobile continent subContinent country source medium
1 Windows 0 Americas South America Brazil youtube.com referral
2 Macintosh 0 Americas Northern America United States youtube.com referral
3 Windows 0 Americas Northern America Canada google organic
4 Windows 0 Americas Northern America Canada google organic
5 Macintosh 0 Africa Eastern Africa Zambia youtube.com referral
6 Android 1 Americas Northern America United States (direct)
isTrueDirect hits pageviews positiveTransaction
1 0 1 1 No
2 0 1 1 No
3 0 5 5 No
4 1 3 3 No
5 0 1 1 No
6 1 6 6 No

> str(training_data)
'data.frame': 1000 obs. of 18 variables:
$ year : int 2016 2016 2017 2017 2016 2017 2016 2017 2017 2016 ...
$ month : Factor w/ 12 levels "January","February",..: 10 9 7 7 8 5 10 3 3 12 ...
$ channelGrouping : chr "Social" "Social" "Organic Search" "Organic Search" ...
$ visitStartTime : int 1477775021 1473037945 1500305542 1500322111 1471890172 1495146428 1476003570 1488556031 1490323225 1480696262 ...
$ visitNumber : int 1 1 1 2 1 1 1 1 1 1 ...
$ timeSinceLastVisit : int 0 0 0 16569 0 0 0 0 0 0 ...
$ browser : chr "Chrome" "Safari" "Chrome" "Chrome" ...
$ operatingSystem : chr "Windows" "Macintosh" "Windows" "Windows" ...
$ isMobile : int 0 0 0 0 0 1 0 1 0 0 ...
$ continent : Factor w/ 5 levels "Africa","Americas",..: 2 2 2 2 1 2 3 3 2 4 ...
$ subContinent : chr "South America" "Northern America" "Northern America" "Northern America" ...
$ country : chr "Brazil" "United States" "Canada" "Canada" ...
$ source : chr "youtube.com" "youtube.com" "google" "google" ...
$ medium : chr "referral" "referral" "organic" "organic" ...
$ isTrueDirect : int 0 0 0 1 0 1 0 0 0 0 ...
$ hits : int 1 1 5 3 1 6 1 1 2 1 ...
$ pageviews : int 1 1 5 3 1 6 1 1 2 1 ...
$ positiveTransaction: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 …


I then define my custom RMSLE summary function using the Metrics package:



rmsleMetric <- function(data, lev = NULL, model = NULL){
  out <- Metrics::rmsle(data$obs, data$pred)
  names(out) <- c("rmsle")
  return(out)
}


Then, I define the trainControl:



tc <- trainControl(method = "repeatedcv",
                   number = 5,
                   repeats = 5,
                   summaryFunction = rmsleMetric,
                   classProbs = TRUE)


My grid search:



tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))


Finally, my model:



penalizedLogit_ridge <- train(positiveTransaction ~ .,
                              data = training_data,
                              metric = "rmsle",
                              method = "glmnet",
                              family = "binomial",
                              trControl = tc,
                              tuneGrid = tg)


When I try to run the command above, I get an error:



Something is wrong; all the rmsle metric values are missing:
rmsle
Min. : NA
1st Qu.: NA
Median : NA
Mean :NaN
3rd Qu.: NA
Max. : NA
NA's :11
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)


Looking at warnings, I find:



1: In Ops.factor(1, actual) : ‘+’ not meaningful for factors
2: In Ops.factor(1, predicted) : ‘+’ not meaningful for factors


repeated 25 times



Since the same thing works fine if I change the metric to AUC using prSummary as my summary function, I don't believe that there are any issues with my data.



So I believe my function is wrong, but I don't know how to figure out why.



Any help is highly appreciated.










r logistic-regression metrics r-caret evaluation






asked Nov 12 '18 at 20:24
Akshay Gaur
























1 Answer






Your custom metric is not defined properly. If you use classProbs = TRUE and savePredictions = "final" in trainControl, you will see that the data frame passed to the summary function contains two columns named after your target classes, which hold the predicted class probabilities, while the data$pred column holds the predicted class labels as a factor and therefore cannot be used to calculate the desired metric.
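For illustration, here is a minimal sketch (my addition, with invented values and column names taken from the question) of the data frame that caret passes to the summary function in this two-class setup:

# Hypothetical example of the `data` argument handed to summaryFunction when
# classProbs = TRUE; the values are made up for illustration.
data <- data.frame(
  obs      = factor(c("No", "Yes"), levels = c("No", "Yes")),  # observed labels (factor)
  pred     = factor(c("No", "No"),  levels = c("No", "Yes")),  # predicted labels (factor)
  No       = c(0.93, 0.61),  # predicted probability of class "No"
  Yes      = c(0.07, 0.39),  # predicted probability of class "Yes"
  rowIndex = c(1L, 2L)       # caret adds bookkeeping columns such as this one
)

# The original metric fails because both arguments are factors: Metrics::rmsle()
# computes log(1 + x) internally, and '+' is not defined for factors, hence the
# "'+' not meaningful for factors" warnings and the NA results.
Metrics::rmsle(data$obs, data$pred)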



A proper way to define the function is to get the class levels and use them to extract the predicted probabilities for one of the classes:



rmsleMetric <- function(data, lev = NULL, model = NULL){
  lvls <- levels(data$obs)
  out <- Metrics::rmsle(ifelse(data$obs == lev[2], 0, 1),
                        data[, lvls[1]])
  names(out) <- c("rmsle")
  return(out)
}
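As a side note (my addition, not part of the original answer): ifelse(data$obs == lev[2], 0, 1) encodes the first class level as 1 and the second as 0, while data[, lvls[1]] is the predicted probability of that same first level, so the metric compares a 0/1 indicator with a probability. A quick standalone sanity check with made-up values:

# Invented two-class predictions to check that the corrected metric returns a
# single named numeric value, as caret expects from a summary function.
fake <- data.frame(obs = factor(c("No", "No", "Yes"), levels = c("No", "Yes")),
                   No  = c(0.9, 0.7, 0.2),   # probability of the first level
                   Yes = c(0.1, 0.3, 0.8))

rmsleMetric(fake, lev = levels(fake$obs))
# rmsle
# ~0.14 (exact value depends on the invented numbers)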


Does it work? A quick check on the Sonar data:



library(caret)
library(mlbench)
data(Sonar)

tc <- trainControl(method = "repeatedcv",
                   number = 2,
                   repeats = 2,
                   summaryFunction = rmsleMetric,
                   classProbs = TRUE,
                   savePredictions = "final")

tg <- expand.grid(alpha = 0, lambda = seq(0, 1, by = 0.1))

penalizedLogit_ridge <- train(Class ~ .,
                              data = Sonar,
                              metric = "rmsle",
                              method = "glmnet",
                              family = "binomial",
                              trControl = tc,
                              tuneGrid = tg)

#output
glmnet

208 samples
 60 predictor
  2 classes: 'M', 'R'

No pre-processing
Resampling: Cross-Validated (2 fold, repeated 2 times)
Summary of sample sizes: 105, 103, 104, 104
Resampling results across tuning parameters:

  lambda  rmsle
  0.0     0.2835407
  0.1     0.2753197
  0.2     0.2768288
  0.3     0.2797847
  0.4     0.2827953
  0.5     0.2856088
  0.6     0.2881894
  0.7     0.2905501
  0.8     0.2927171
  0.9     0.2947169
  1.0     0.2965505

Tuning parameter 'alpha' was held constant at a value of 0
rmsle was used to select the optimal model using the largest value.
The final values used for the model were alpha = 0 and lambda = 1.


You can inspect caret::twoClassSummary - it is defined quite similarly.
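One extra caveat (my addition, not from the original answer): the output above says the largest rmsle was used to select the model, because train() maximizes custom metrics by default. RMSLE is an error measure, so you would normally want the smallest value; a sketch of the same call with maximize = FALSE:

penalizedLogit_ridge <- train(Class ~ .,
                              data = Sonar,
                              metric = "rmsle",
                              maximize = FALSE,  # pick the smallest rmsle rather than the largest
                              method = "glmnet",
                              family = "binomial",
                              trControl = tc,
                              tuneGrid = tg)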






edited Nov 13 '18 at 12:59, answered Nov 13 '18 at 9:52
missuse
• Sounds very promising. Let me test this really quick before I mark as answer. – Akshay Gaur, Nov 13 '18 at 15:27

• This works! Do I need to worry about this message though - Warning message: In grepl("(Intercept)", colnames(x)) : input string 73 is invalid in this locale? – Akshay Gaur, Nov 13 '18 at 15:47

• Glad to help. In order to help with the warning message I would require a reproducible example with the data set provided, since the warning is associated with the data set. – missuse, Nov 13 '18 at 16:41

• Thank you for offering to help. I may ask a separate question if I feel that resolving that is absolutely necessary. – Akshay Gaur, Nov 13 '18 at 21:47










