Discussion:
[erlang-patches] Supervisor shutdown reason when reaching max restarts
unknown
2013-07-04 14:44:12 UTC
Permalink
Hi,

this patch changes the behaviour of supervisors to exit with a more specific reason when exiting due to a maximum restart limit hit. This is especially useful (or even necessary) to distinguish between normal and erroneous process terminations when monitoring a supervisor from another process.

In the above case a supervisor would now exit with {shutdown, {reached_max_restart_intensity, Child}} where Child is whatever is available to describe the child, either a child id or in case of a simple_one_for_one supervisor the offending child's process id. The patch should not affect the OTP restart behaviour (also for cascaded supervisors) since a subclass of 'normal' exit reasons is used.

I'm aware that there is some potential backward incompatibility for people that do not expect {shutdown, Reason} when monitoring a supervisor. However, the feature of exiting normally with {shutdown, Reason} has been around for quite a while now and I think this could be a sensible place to use it. Let me know what you think.

The patch does include tests and updated documentation.

git fetch https://github.com/schlagert/otp.git supervisor_shutdown_reason

https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch

Regards
Tobias
unknown
2013-07-04 15:33:57 UTC
Permalink
On 07/04/2013 04:44 PM, Tobias Schlager wrote:
> Hi,
>
> this patch changes the behaviour of supervisors to exit with a more specific reason when exiting due to a maximum restart limit hit. This is especially useful (or even necessary) to distinguish between normal and erroneous process terminations when monitoring a supervisor from another process.
>
> In the above case a supervisor would now exit with {shutdown, {reached_max_restart_intensity, Child}} where Child is whatever is available to describe the child, either a child id or in case of a simple_one_for_one supervisor the offending child's process id. The patch should not affect the OTP restart behaviour (also for cascaded supervisors) since a subclass of 'normal' exit reasons is used.
>
> I'm aware that there is some potential backward incompatibility for people that do not expect {shutdown, Reason} when monitoring a supervisor. However, the feature of exiting normally with {shutdown, Reason} has been around for quite a while now and I think this could be a sensible place to use it. Let me know what you think.
>
> The patch does include tests and updated documentation.
>
> git fetch https://github.com/schlagert/otp.git supervisor_shutdown_reason
>
> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch
>
> Regards
> Tobias
> _______________________________________________
> erlang-patches mailing list
> erlang-patches
> http://erlang.org/mailman/listinfo/erlang-patches
Hello Tobias,
I've fetched your patch and assigned it to be reviewed by responsible team.
Thanks,

--

BR Fredrik Gustafsson
Erlang OTP Team
unknown
2013-08-23 07:50:08 UTC
Permalink
Hi Tobias!
Thank you for the patch. We have discussed this on OTP Technical Board, and
have come to the conclusion that some more investigation is needed of the
potential backwards incompatibility. I have written a ticket and the job
will be prioritized into our backlog. Unfortunately we won't make it before
the next release (R16B02).

In order to help us a bit on the way, could you please provide some more
information about your use case? You say that you are monitoring the
supervisor from another process, do you mean other process than the
supervisor's supervisor? If so, could you explain this architecture a bit
more?

Who else will see this exit reason? - application_master? - the parent
supervisors? - other?

Thanks again!
Regards
/siri



2013/7/4 Tobias Schlager <Tobias.Schlager>

> Hi,
>
> this patch changes the behaviour of supervisors to exit with a more
> specific reason when exiting due to a maximum restart limit hit. This is
> especially useful (or even necessary) to distinguish between normal and
> erroneous process terminations when monitoring a supervisor from another
> process.
>
> In the above case a supervisor would now exit with {shutdown,
> {reached_max_restart_intensity, Child}} where Child is whatever is
> available to describe the child, either a child id or in case of a
> simple_one_for_one supervisor the offending child's process id. The patch
> should not affect the OTP restart behaviour (also for cascaded supervisors)
> since a subclass of 'normal' exit reasons is used.
>
> I'm aware that there is some potential backward incompatibility for people
> that do not expect {shutdown, Reason} when monitoring a supervisor.
> However, the feature of exiting normally with {shutdown, Reason} has been
> around for quite a while now and I think this could be a sensible place to
> use it. Let me know what you think.
>
> The patch does include tests and updated documentation.
>
> git fetch https://github.com/schlagert/otp.gitsupervisor_shutdown_reason
>
>
> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
>
> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch
>
> Regards
> Tobias
> _______________________________________________
> erlang-patches mailing list
> erlang-patches
> http://erlang.org/mailman/listinfo/erlang-patches
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20130823/49b4beb5/attachment.html>
unknown
2013-08-23 10:26:28 UTC
Permalink
Hi Siri,

glad to hear from you, I'll try to do my best to explain the use case I have in mind.

Consider you have a one_for_one (or simple_one_for_one) supervisor A with a worker child B that dynamically adds children to A (using supervisor:start_child/2). Now consider these children also are supervisors of type C with various statically configured workers. I now would like to monitor supervisors of type C from worker B to be able to take some action when *something goes wrong* at one of the C supervisors (e.g. C crashed because of one of its subworkers). However, I can't differentiate between 'something went wrong' or a supervisor C just exited gracefully (e.g. the application was stopped) because supervisors only exit with reason normal or shutdown. It is arguable whether to use another restart type for C supervisors in order to propagate the exit. However, I don't want to crash the whole supervision just to be able to tell that something failed to restart somewhere down the supervision path.

In general, the new exit reasons are visible to all processes linked with supervisors or monitoring them (so parent supervisors as well as the application master will see these reasons). This is why I chose the '{shutdown, Reason}' format, which must be supported (according to the documentation this is considered a normal exit reason). Thus, changing the exit reasons will not affect the behaviour of supervision hierarchies (verified by the test suite) or the application master (as far as I can tell). The backward incompatibilty is located in processes depending on the undocumented behaviour of supervisors always exiting with normal or shutdown and not with '{shutdown, Reason}'.

I hope, that my explanation makes things a bit clearer (and not worse).

Regards
Tobias

________________________________
Von: Siri Hansen [erlangsiri]
Gesendet: Freitag, 23. August 2013 09:50
An: Tobias Schlager
Cc: erlang-patches
Betreff: Re: [erlang-patches] Supervisor shutdown reason when reaching max restarts

Hi Tobias!
Thank you for the patch. We have discussed this on OTP Technical Board, and have come to the conclusion that some more investigation is needed of the potential backwards incompatibility. I have written a ticket and the job will be prioritized into our backlog. Unfortunately we won't make it before the next release (R16B02).

In order to help us a bit on the way, could you please provide some more information about your use case? You say that you are monitoring the supervisor from another process, do you mean other process than the supervisor's supervisor? If so, could you explain this architecture a bit more?

Who else will see this exit reason? - application_master? - the parent supervisors? - other?

Thanks again!
Regards
/siri



2013/7/4 Tobias Schlager <Tobias.Schlager<mailto:Tobias.Schlager>>
Hi,

this patch changes the behaviour of supervisors to exit with a more specific reason when exiting due to a maximum restart limit hit. This is especially useful (or even necessary) to distinguish between normal and erroneous process terminations when monitoring a supervisor from another process.

In the above case a supervisor would now exit with {shutdown, {reached_max_restart_intensity, Child}} where Child is whatever is available to describe the child, either a child id or in case of a simple_one_for_one supervisor the offending child's process id. The patch should not affect the OTP restart behaviour (also for cascaded supervisors) since a subclass of 'normal' exit reasons is used.

I'm aware that there is some potential backward incompatibility for people that do not expect {shutdown, Reason} when monitoring a supervisor. However, the feature of exiting normally with {shutdown, Reason} has been around for quite a while now and I think this could be a sensible place to use it. Let me know what you think.

The patch does include tests and updated documentation.

git fetch https://github.com/schlagert/otp.git supervisor_shutdown_reason

https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch

Regards
Tobias
_______________________________________________
erlang-patches mailing list
erlang-patches<mailto:erlang-patches>
http://erlang.org/mailman/listinfo/erlang-patches

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20130823/c9a11671/attachment.html>
unknown
2013-08-23 12:17:21 UTC
Permalink
Tobias, thanks for a good explanation. I will post this into the ticket and
we'll take it into account when finishing the review.
Thanks again for your contribution!
/siri


2013/8/23 Tobias Schlager <Tobias.Schlager>

> Hi Siri,
>
> glad to hear from you, I'll try to do my best to explain the use case I
> have in mind.
>
> Consider you have a one_for_one (or simple_one_for_one) supervisor A with
> a worker child B that dynamically adds children to A (using
> supervisor:start_child/2). Now consider these children also are supervisors
> of type C with various statically configured workers. I now would like to
> monitor supervisors of type C from worker B to be able to take some action
> when *something goes wrong* at one of the C supervisors (e.g. C crashed
> because of one of its subworkers). However, I can't differentiate between
> 'something went wrong' or a supervisor C just exited gracefully (e.g. the
> application was stopped) because supervisors only exit with reason normal
> or shutdown. It is arguable whether to use another restart type for C
> supervisors in order to propagate the exit. However, I don't want to crash
> the whole supervision just to be able to tell that something failed to
> restart somewhere down the supervision path.
>
> In general, the new exit reasons are visible to all processes linked with
> supervisors or monitoring them (so parent supervisors as well as the
> application master will see these reasons). This is why I chose the
> '{shutdown, Reason}' format, which must be supported (according to the
> documentation this is considered a normal exit reason). Thus, changing the
> exit reasons will not affect the behaviour of supervision hierarchies
> (verified by the test suite) or the application master (as far as I can
> tell). The backward incompatibilty is located in processes depending on the
> undocumented behaviour of supervisors always exiting with normal or
> shutdown and not with '{shutdown, Reason}'.
>
> I hope, that my explanation makes things a bit clearer (and not worse).
>
> Regards
> Tobias
>
> ------------------------------
> *Von:* Siri Hansen [erlangsiri]
> *Gesendet:* Freitag, 23. August 2013 09:50
> *An:* Tobias Schlager
> *Cc:* erlang-patches
> *Betreff:* Re: [erlang-patches] Supervisor shutdown reason when reaching
> max restarts
>
> Hi Tobias!
> Thank you for the patch. We have discussed this on OTP Technical Board,
> and have come to the conclusion that some more investigation is needed of
> the potential backwards incompatibility. I have written a ticket and the
> job will be prioritized into our backlog. Unfortunately we won't make it
> before the next release (R16B02).
>
> In order to help us a bit on the way, could you please provide some more
> information about your use case? You say that you are monitoring the
> supervisor from another process, do you mean other process than the
> supervisor's supervisor? If so, could you explain this architecture a bit
> more?
>
> Who else will see this exit reason? - application_master? - the parent
> supervisors? - other?
>
> Thanks again!
> Regards
> /siri
>
>
>
> 2013/7/4 Tobias Schlager <Tobias.Schlager>
>
>> Hi,
>>
>> this patch changes the behaviour of supervisors to exit with a more
>> specific reason when exiting due to a maximum restart limit hit. This is
>> especially useful (or even necessary) to distinguish between normal and
>> erroneous process terminations when monitoring a supervisor from another
>> process.
>>
>> In the above case a supervisor would now exit with {shutdown,
>> {reached_max_restart_intensity, Child}} where Child is whatever is
>> available to describe the child, either a child id or in case of a
>> simple_one_for_one supervisor the offending child's process id. The patch
>> should not affect the OTP restart behaviour (also for cascaded supervisors)
>> since a subclass of 'normal' exit reasons is used.
>>
>> I'm aware that there is some potential backward incompatibility for
>> people that do not expect {shutdown, Reason} when monitoring a supervisor.
>> However, the feature of exiting normally with {shutdown, Reason} has been
>> around for quite a while now and I think this could be a sensible place to
>> use it. Let me know what you think.
>>
>> The patch does include tests and updated documentation.
>>
>> git fetch https://github.com/schlagert/otp.gitsupervisor_shutdown_reason
>>
>>
>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
>>
>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch
>>
>> Regards
>> Tobias
>> _______________________________________________
>> erlang-patches mailing list
>> erlang-patches
>> http://erlang.org/mailman/listinfo/erlang-patches
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20130823/c514772e/attachment.html>
unknown
2014-04-16 12:54:22 UTC
Permalink
Hello Tobias and the list!

This review has finally made its way up the backlog. Thanks for your
patience!

Is this use case still relevant?

I've started looking at the compatibility issue. As far as I can see there
shouldn't be a big problem upwards in a normal supervision tree, since this
is internal to OTP. Downwards, however, there might be problems since exit
trapping children will get {shutdown,Reason} instead of shutdown when a
supervisor terminates due to a failed restart attempt. In the case of a
gen_server or gen_fsm, this exit reason will be propagated to the callback
module via Mod:terminate/2. The value 'shutdown', used when the supervisor
orders termination, is documented for this callback function. For other
proc_lib processes the receive statement is implemented individually, so
even here the change will be exposed to the user (if matching the exit
reason).

Links and monitors across the supervision tree might also be a problem, of
course. And the case when something other than a supervisor or the
application_master is the parent of a supervisor - I don't know if this is
very common, though, and I find the incompatibility downwards in the
supervision tree most alarming.

As you know, we need to be very careful with backwards incompatible changes
in core functionality, so for the time being we would like to investigate
the general need for this functionality and if there are other possible
ways to go.

Some questions for the list:

1. Is the use case described in the Tobias' previous mail a common one? If
yes, how have you solved it?

2. Do you normally care about the exit reason in your Mod:terminate/2
function or in your proc_lib receive statement when getting an EXIT from
the parent?

3. Any other comments on this?

Thanks
/siri



2013-08-23 14:17 GMT+02:00 Siri Hansen <erlangsiri>:

> Tobias, thanks for a good explanation. I will post this into the ticket
> and we'll take it into account when finishing the review.
> Thanks again for your contribution!
> /siri
>
>
> 2013/8/23 Tobias Schlager <Tobias.Schlager>
>
>> Hi Siri,
>>
>> glad to hear from you, I'll try to do my best to explain the use case I
>> have in mind.
>>
>> Consider you have a one_for_one (or simple_one_for_one) supervisor A with
>> a worker child B that dynamically adds children to A (using
>> supervisor:start_child/2). Now consider these children also are supervisors
>> of type C with various statically configured workers. I now would like to
>> monitor supervisors of type C from worker B to be able to take some action
>> when *something goes wrong* at one of the C supervisors (e.g. C crashed
>> because of one of its subworkers). However, I can't differentiate between
>> 'something went wrong' or a supervisor C just exited gracefully (e.g. the
>> application was stopped) because supervisors only exit with reason normal
>> or shutdown. It is arguable whether to use another restart type for C
>> supervisors in order to propagate the exit. However, I don't want to crash
>> the whole supervision just to be able to tell that something failed to
>> restart somewhere down the supervision path.
>>
>> In general, the new exit reasons are visible to all processes linked with
>> supervisors or monitoring them (so parent supervisors as well as the
>> application master will see these reasons). This is why I chose the
>> '{shutdown, Reason}' format, which must be supported (according to the
>> documentation this is considered a normal exit reason). Thus, changing the
>> exit reasons will not affect the behaviour of supervision hierarchies
>> (verified by the test suite) or the application master (as far as I can
>> tell). The backward incompatibilty is located in processes depending on the
>> undocumented behaviour of supervisors always exiting with normal or
>> shutdown and not with '{shutdown, Reason}'.
>>
>> I hope, that my explanation makes things a bit clearer (and not worse).
>>
>> Regards
>> Tobias
>>
>> ------------------------------
>> *Von:* Siri Hansen [erlangsiri]
>> *Gesendet:* Freitag, 23. August 2013 09:50
>> *An:* Tobias Schlager
>> *Cc:* erlang-patches
>> *Betreff:* Re: [erlang-patches] Supervisor shutdown reason when reaching
>> max restarts
>>
>> Hi Tobias!
>> Thank you for the patch. We have discussed this on OTP Technical Board,
>> and have come to the conclusion that some more investigation is needed of
>> the potential backwards incompatibility. I have written a ticket and the
>> job will be prioritized into our backlog. Unfortunately we won't make it
>> before the next release (R16B02).
>>
>> In order to help us a bit on the way, could you please provide some
>> more information about your use case? You say that you are monitoring the
>> supervisor from another process, do you mean other process than the
>> supervisor's supervisor? If so, could you explain this architecture a bit
>> more?
>>
>> Who else will see this exit reason? - application_master? - the parent
>> supervisors? - other?
>>
>> Thanks again!
>> Regards
>> /siri
>>
>>
>>
>> 2013/7/4 Tobias Schlager <Tobias.Schlager>
>>
>>> Hi,
>>>
>>> this patch changes the behaviour of supervisors to exit with a more
>>> specific reason when exiting due to a maximum restart limit hit. This is
>>> especially useful (or even necessary) to distinguish between normal and
>>> erroneous process terminations when monitoring a supervisor from another
>>> process.
>>>
>>> In the above case a supervisor would now exit with {shutdown,
>>> {reached_max_restart_intensity, Child}} where Child is whatever is
>>> available to describe the child, either a child id or in case of a
>>> simple_one_for_one supervisor the offending child's process id. The patch
>>> should not affect the OTP restart behaviour (also for cascaded supervisors)
>>> since a subclass of 'normal' exit reasons is used.
>>>
>>> I'm aware that there is some potential backward incompatibility for
>>> people that do not expect {shutdown, Reason} when monitoring a supervisor.
>>> However, the feature of exiting normally with {shutdown, Reason} has been
>>> around for quite a while now and I think this could be a sensible place to
>>> use it. Let me know what you think.
>>>
>>> The patch does include tests and updated documentation.
>>>
>>> git fetch https://github.com/schlagert/otp.gitsupervisor_shutdown_reason
>>>
>>>
>>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
>>>
>>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch
>>>
>>> Regards
>>> Tobias
>>> _______________________________________________
>>> erlang-patches mailing list
>>> erlang-patches
>>> http://erlang.org/mailman/listinfo/erlang-patches
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20140416/da5248ca/attachment.html>
unknown
2014-04-16 13:49:20 UTC
Permalink
Hi Siri,

I already solved the problem for my use case. In my case the supervisors of type C never exit gracefully except for the case the application gets stopped, so I ended up 'disarming' the monitors of process B in an 'application:prep_stop/1' callback.

Of course, this is a debatable patch because it introduces backwards incompatible behavior and I will not argue if it gets declined. I just thought that having differentiated exit reason of supervisors would be a good thing to have in general (and that according to the documentation {shutdown, Reason} exits must be supported by callback implementations anyways).

Regards
Tobias

--

Lindenbaum GmbH
Dipl.-Inform. Tobias Schlager, Software Engineer
Erbprinzenstr. 4-12, 76133 Karlsruhe, Deutschland
Tel: +49 721 48 08 48 - 000, Fax: +49 721 48 08 48 - 801,
E-Mail: tobias.schlager

Firmensitz: Erbprinzenstr. 4-12, Eingang A, 76133 Karlsruhe
Registergericht: Amtsgericht Mannheim, HRB 706184
Gesch?ftsf?hrer: Dr. Ralf Nikolai
Steuernummer: 35007/02060, USt. ID: DE 263797265

http://www.lindenbaum.eu


________________________________
Von: Siri Hansen [erlangsiri]
Gesendet: Mittwoch, 16. April 2014 14:54
An: Tobias Schlager
Cc: erlang-patches
Betreff: Re: [erlang-patches] Supervisor shutdown reason when reaching max restarts

Hello Tobias and the list!

This review has finally made its way up the backlog. Thanks for your patience!

Is this use case still relevant?

I've started looking at the compatibility issue. As far as I can see there shouldn't be a big problem upwards in a normal supervision tree, since this is internal to OTP. Downwards, however, there might be problems since exit trapping children will get {shutdown,Reason} instead of shutdown when a supervisor terminates due to a failed restart attempt. In the case of a gen_server or gen_fsm, this exit reason will be propagated to the callback module via Mod:terminate/2. The value 'shutdown', used when the supervisor orders termination, is documented for this callback function. For other proc_lib processes the receive statement is implemented individually, so even here the change will be exposed to the user (if matching the exit reason).

Links and monitors across the supervision tree might also be a problem, of course. And the case when something other than a supervisor or the application_master is the parent of a supervisor - I don't know if this is very common, though, and I find the incompatibility downwards in the supervision tree most alarming.

As you know, we need to be very careful with backwards incompatible changes in core functionality, so for the time being we would like to investigate the general need for this functionality and if there are other possible ways to go.

Some questions for the list:

1. Is the use case described in the Tobias' previous mail a common one? If yes, how have you solved it?

2. Do you normally care about the exit reason in your Mod:terminate/2 function or in your proc_lib receive statement when getting an EXIT from the parent?

3. Any other comments on this?

Thanks
/siri



2013-08-23 14:17 GMT+02:00 Siri Hansen <erlangsiri<mailto:erlangsiri>>:
Tobias, thanks for a good explanation. I will post this into the ticket and we'll take it into account when finishing the review.
Thanks again for your contribution!
/siri


2013/8/23 Tobias Schlager <Tobias.Schlager<mailto:Tobias.Schlager>>
Hi Siri,

glad to hear from you, I'll try to do my best to explain the use case I have in mind.

Consider you have a one_for_one (or simple_one_for_one) supervisor A with a worker child B that dynamically adds children to A (using supervisor:start_child/2). Now consider these children also are supervisors of type C with various statically configured workers. I now would like to monitor supervisors of type C from worker B to be able to take some action when *something goes wrong* at one of the C supervisors (e.g. C crashed because of one of its subworkers). However, I can't differentiate between 'something went wrong' or a supervisor C just exited gracefully (e.g. the application was stopped) because supervisors only exit with reason normal or shutdown. It is arguable whether to use another restart type for C supervisors in order to propagate the exit. However, I don't want to crash the whole supervision just to be able to tell that something failed to restart somewhere down the supervision path.

In general, the new exit reasons are visible to all processes linked with supervisors or monitoring them (so parent supervisors as well as the application master will see these reasons). This is why I chose the '{shutdown, Reason}' format, which must be supported (according to the documentation this is considered a normal exit reason). Thus, changing the exit reasons will not affect the behaviour of supervision hierarchies (verified by the test suite) or the application master (as far as I can tell). The backward incompatibilty is located in processes depending on the undocumented behaviour of supervisors always exiting with normal or shutdown and not with '{shutdown, Reason}'.

I hope, that my explanation makes things a bit clearer (and not worse).

Regards
Tobias

________________________________
Von: Siri Hansen [erlangsiri<mailto:erlangsiri>]
Gesendet: Freitag, 23. August 2013 09:50
An: Tobias Schlager
Cc: erlang-patches<mailto:erlang-patches>
Betreff: Re: [erlang-patches] Supervisor shutdown reason when reaching max restarts

Hi Tobias!
Thank you for the patch. We have discussed this on OTP Technical Board, and have come to the conclusion that some more investigation is needed of the potential backwards incompatibility. I have written a ticket and the job will be prioritized into our backlog. Unfortunately we won't make it before the next release (R16B02).

In order to help us a bit on the way, could you please provide some more information about your use case? You say that you are monitoring the supervisor from another process, do you mean other process than the supervisor's supervisor? If so, could you explain this architecture a bit more?

Who else will see this exit reason? - application_master? - the parent supervisors? - other?

Thanks again!
Regards
/siri



2013/7/4 Tobias Schlager <Tobias.Schlager<mailto:Tobias.Schlager>>
Hi,

this patch changes the behaviour of supervisors to exit with a more specific reason when exiting due to a maximum restart limit hit. This is especially useful (or even necessary) to distinguish between normal and erroneous process terminations when monitoring a supervisor from another process.

In the above case a supervisor would now exit with {shutdown, {reached_max_restart_intensity, Child}} where Child is whatever is available to describe the child, either a child id or in case of a simple_one_for_one supervisor the offending child's process id. The patch should not affect the OTP restart behaviour (also for cascaded supervisors) since a subclass of 'normal' exit reasons is used.

I'm aware that there is some potential backward incompatibility for people that do not expect {shutdown, Reason} when monitoring a supervisor. However, the feature of exiting normally with {shutdown, Reason} has been around for quite a while now and I think this could be a sensible place to use it. Let me know what you think.

The patch does include tests and updated documentation.

git fetch https://github.com/schlagert/otp.git supervisor_shutdown_reason

https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch

Regards
Tobias
_______________________________________________
erlang-patches mailing list
erlang-patches<mailto:erlang-patches>
http://erlang.org/mailman/listinfo/erlang-patches



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20140416/6f53e018/attachment-0001.html>
unknown
2014-05-12 12:17:13 UTC
Permalink
Hi Tobias!

It seems this feature would be "good to have" but is not totally necessary.
Since there has been very little feeback on this thread, and given the
backwards incompatibility, we have thus decided to reject it.

Thanks for you effort!
Best Regards
/siri


2014-04-16 15:49 GMT+02:00 Tobias Schlager <Tobias.Schlager>:

> Hi Siri,
>
> I already solved the problem for my use case. In my case the supervisors
> of type C never exit gracefully except for the case the application gets
> stopped, so I ended up 'disarming' the monitors of process B in an
> 'application:prep_stop/1' callback.
>
> Of course, this is a debatable patch because it introduces backwards
> incompatible behavior and I will not argue if it gets declined. I just
> thought that having differentiated exit reason of supervisors would be a
> good thing to have in general (and that according to the documentation
> {shutdown, Reason} exits must be supported by callback implementations
> anyways).
>
> Regards
> Tobias
>
> --
>
> Lindenbaum GmbH
> Dipl.-Inform. Tobias Schlager, Software Engineer
> Erbprinzenstr. 4-12, 76133 Karlsruhe, Deutschland
> Tel: +49 721 48 08 48 - 000, Fax: +49 721 48 08 48 - 801,
> E-Mail: tobias.schlager
>
> Firmensitz: Erbprinzenstr. 4-12, Eingang A, 76133 Karlsruhe
> Registergericht: Amtsgericht Mannheim, HRB 706184
> Gesch?ftsf?hrer: Dr. Ralf Nikolai
> Steuernummer: 35007/02060, USt. ID: DE 263797265
>
> http://www.lindenbaum.eu
>
>
> ------------------------------
> *Von:* Siri Hansen [erlangsiri]
> *Gesendet:* Mittwoch, 16. April 2014 14:54
>
> *An:* Tobias Schlager
> *Cc:* erlang-patches
> *Betreff:* Re: [erlang-patches] Supervisor shutdown reason when reaching
> max restarts
>
> Hello Tobias and the list!
>
> This review has finally made its way up the backlog. Thanks for your
> patience!
>
> Is this use case still relevant?
>
> I've started looking at the compatibility issue. As far as I can see
> there shouldn't be a big problem upwards in a normal supervision tree,
> since this is internal to OTP. Downwards, however, there might be problems
> since exit trapping children will get {shutdown,Reason} instead of shutdown
> when a supervisor terminates due to a failed restart attempt. In the case
> of a gen_server or gen_fsm, this exit reason will be propagated to the
> callback module via Mod:terminate/2. The value 'shutdown', used when the
> supervisor orders termination, is documented for this callback function.
> For other proc_lib processes the receive statement is implemented
> individually, so even here the change will be exposed to the user (if
> matching the exit reason).
>
> Links and monitors across the supervision tree might also be a problem,
> of course. And the case when something other than a supervisor or the
> application_master is the parent of a supervisor - I don't know if this is
> very common, though, and I find the incompatibility downwards in the
> supervision tree most alarming.
>
> As you know, we need to be very careful with backwards incompatible
> changes in core functionality, so for the time being we would like to
> investigate the general need for this functionality and if there are other
> possible ways to go.
>
> Some questions for the list:
>
> 1. Is the use case described in the Tobias' previous mail a common one?
> If yes, how have you solved it?
>
> 2. Do you normally care about the exit reason in your Mod:terminate/2
> function or in your proc_lib receive statement when getting an EXIT from
> the parent?
>
> 3. Any other comments on this?
>
> Thanks
> /siri
>
>
>
> 2013-08-23 14:17 GMT+02:00 Siri Hansen <erlangsiri>:
>
>> Tobias, thanks for a good explanation. I will post this into the ticket
>> and we'll take it into account when finishing the review.
>> Thanks again for your contribution!
>> /siri
>>
>>
>> 2013/8/23 Tobias Schlager <Tobias.Schlager>
>>
>>> Hi Siri,
>>>
>>> glad to hear from you, I'll try to do my best to explain the use case I
>>> have in mind.
>>>
>>> Consider you have a one_for_one (or simple_one_for_one) supervisor A
>>> with a worker child B that dynamically adds children to A (using
>>> supervisor:start_child/2). Now consider these children also are supervisors
>>> of type C with various statically configured workers. I now would like to
>>> monitor supervisors of type C from worker B to be able to take some action
>>> when *something goes wrong* at one of the C supervisors (e.g. C crashed
>>> because of one of its subworkers). However, I can't differentiate between
>>> 'something went wrong' or a supervisor C just exited gracefully (e.g. the
>>> application was stopped) because supervisors only exit with reason normal
>>> or shutdown. It is arguable whether to use another restart type for C
>>> supervisors in order to propagate the exit. However, I don't want to crash
>>> the whole supervision just to be able to tell that something failed to
>>> restart somewhere down the supervision path.
>>>
>>> In general, the new exit reasons are visible to all processes linked
>>> with supervisors or monitoring them (so parent supervisors as well as the
>>> application master will see these reasons). This is why I chose the
>>> '{shutdown, Reason}' format, which must be supported (according to the
>>> documentation this is considered a normal exit reason). Thus, changing the
>>> exit reasons will not affect the behaviour of supervision hierarchies
>>> (verified by the test suite) or the application master (as far as I can
>>> tell). The backward incompatibilty is located in processes depending on the
>>> undocumented behaviour of supervisors always exiting with normal or
>>> shutdown and not with '{shutdown, Reason}'.
>>>
>>> I hope, that my explanation makes things a bit clearer (and not worse).
>>>
>>> Regards
>>> Tobias
>>>
>>> ------------------------------
>>> *Von:* Siri Hansen [erlangsiri]
>>> *Gesendet:* Freitag, 23. August 2013 09:50
>>> *An:* Tobias Schlager
>>> *Cc:* erlang-patches
>>> *Betreff:* Re: [erlang-patches] Supervisor shutdown reason when
>>> reaching max restarts
>>>
>>> Hi Tobias!
>>> Thank you for the patch. We have discussed this on OTP Technical Board,
>>> and have come to the conclusion that some more investigation is needed of
>>> the potential backwards incompatibility. I have written a ticket and the
>>> job will be prioritized into our backlog. Unfortunately we won't make it
>>> before the next release (R16B02).
>>>
>>> In order to help us a bit on the way, could you please provide some
>>> more information about your use case? You say that you are monitoring the
>>> supervisor from another process, do you mean other process than the
>>> supervisor's supervisor? If so, could you explain this architecture a bit
>>> more?
>>>
>>> Who else will see this exit reason? - application_master? - the parent
>>> supervisors? - other?
>>>
>>> Thanks again!
>>> Regards
>>> /siri
>>>
>>>
>>>
>>> 2013/7/4 Tobias Schlager <Tobias.Schlager>
>>>
>>>> Hi,
>>>>
>>>> this patch changes the behaviour of supervisors to exit with a more
>>>> specific reason when exiting due to a maximum restart limit hit. This is
>>>> especially useful (or even necessary) to distinguish between normal and
>>>> erroneous process terminations when monitoring a supervisor from another
>>>> process.
>>>>
>>>> In the above case a supervisor would now exit with {shutdown,
>>>> {reached_max_restart_intensity, Child}} where Child is whatever is
>>>> available to describe the child, either a child id or in case of a
>>>> simple_one_for_one supervisor the offending child's process id. The patch
>>>> should not affect the OTP restart behaviour (also for cascaded supervisors)
>>>> since a subclass of 'normal' exit reasons is used.
>>>>
>>>> I'm aware that there is some potential backward incompatibility for
>>>> people that do not expect {shutdown, Reason} when monitoring a supervisor.
>>>> However, the feature of exiting normally with {shutdown, Reason} has been
>>>> around for quite a while now and I think this could be a sensible place to
>>>> use it. Let me know what you think.
>>>>
>>>> The patch does include tests and updated documentation.
>>>>
>>>> git fetch https://github.com/schlagert/otp.gitsupervisor_shutdown_reason
>>>>
>>>>
>>>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason
>>>>
>>>> https://github.com/schlagert/otp/compare/erlang:master...supervisor_shutdown_reason.patch
>>>>
>>>> Regards
>>>> Tobias
>>>> _______________________________________________
>>>> erlang-patches mailing list
>>>> erlang-patches
>>>> http://erlang.org/mailman/listinfo/erlang-patches
>>>>
>>>
>>>
>>
>
> _______________________________________________
> erlang-patches mailing list
> erlang-patches
> http://erlang.org/mailman/listinfo/erlang-patches
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20140512/81a995f7/attachment.html>
Loading...