Rocksolid Light




Subject -- Author
* Predictive failures -- Don Y
+* Re:Predictive failures -- Martin Rid
|`* Re: Re:Predictive failures -- Don Y
| `* Re: Re:Predictive failures -- Edward Rawde
|  `* Re: Re:Predictive failures -- Don Y
|   +* Re: Re:Predictive failures -- Edward Rawde
|   |+* Re: Predictive failures -- Joe Gwinn
|   ||`- Re: Predictive failures -- Edward Rawde
|   |`- Re: Predictive failures -- legg
|   `* Re: Re:Predictive failures -- Edward Rawde
|    `* Re: Re:Predictive failures -- Don Y
|     +* Re: Re:Predictive failures -- Edward Rawde
|     |`* Re: Re:Predictive failures -- Don Y
|     | +* Re: Re:Predictive failures -- Edward Rawde
|     | |`* Re: Re:Predictive failures -- Don Y
|     | | `* Re: Re:Predictive failures -- Edward Rawde
|     | |  `* Re: Re:Predictive failures -- Don Y
|     | |   `* Re: Re:Predictive failures -- Edward Rawde
|     | |    `* Re: Re:Predictive failures -- Don Y
|     | |     `* Re: Re:Predictive failures -- Edward Rawde
|     | |      `* Re: Re:Predictive failures -- Don Y
|     | |       `* Re: Re:Predictive failures -- Edward Rawde
|     | |        `* Re: Re:Predictive failures -- Don Y
|     | |         `* Re: Re:Predictive failures -- Edward Rawde
|     | |          `* Re: Re:Predictive failures -- Don Y
|     | |           `- Re: Re:Predictive failures -- Edward Rawde
|     | `* Re: Predictive failures -- Jasen Betts
|     |  `- Re: Predictive failures -- Don Y
|     `* Re: Predictive failures -- Liz Tuddenham
|      `- Re: Predictive failures -- Don Y
+- Re: Predictive failures -- john larkin
+* Re: Predictive failures -- Joe Gwinn
|`* Re: Predictive failures -- john larkin
| `* Re: Predictive failures -- Joe Gwinn
|  +* Re: Predictive failures -- john larkin
|  |`* Re: Predictive failures -- Joe Gwinn
|  | `* Re: Predictive failures -- John Larkin
|  |  +* Re: Predictive failures -- Joe Gwinn
|  |  |`* Re: Predictive failures -- John Larkin
|  |  | +* Re: Predictive failures -- Edward Rawde
|  |  | |`* Re: Predictive failures -- John Larkin
|  |  | | `- Re: Predictive failures -- Edward Rawde
|  |  | `- Re: Predictive failures -- Joe Gwinn
|  |  `- Re: Predictive failures -- Glen Walpert
|  `* Re: Predictive failures -- Phil Hobbs
|   +- Re: Predictive failures -- John Larkin
|   `- Re: Predictive failures -- Joe Gwinn
+* Re: Predictive failures -- Edward Rawde
|`* Re: Predictive failures -- Don Y
| +* Re: Predictive failures -- Edward Rawde
| |+* Re: Predictive failures -- Don Y
| ||`- Re: Predictive failures -- Edward Rawde
| |`- Re: Predictive failures -- Martin Brown
| `* Re: Predictive failures -- Chris Jones
|  `* Re: Predictive failures -- Don Y
|   `- Re: Predictive failures -- Don Y
+* Re: Predictive failures -- Martin Brown
|+- Re: Predictive failures -- Don Y
|`* Re: Predictive failures -- John Larkin
| `* Re: Predictive failures -- Bill Sloman
|  `* Re: Predictive failures -- Edward Rawde
|   `* Re: Predictive failures -- John Larkin
|    `* Re: Predictive failures -- Edward Rawde
|     `* Re: Predictive failures -- John Larkin
|      `* Re: Predictive failures -- John Larkin
|       `- Re: Predictive failures -- Edward Rawde
+* Re: Predictive failures -- Don
|+* Re: Predictive failures -- Edward Rawde
||+- Re: Predictive failures -- Don
||`- Re: Predictive failures -- Don Y
|+* Re: Predictive failures -- john larkin
||`* Re: Predictive failures -- Don
|| `* Re: Predictive failures -- John Larkin
||  `- Re: Predictive failures -- Don
|`- Re: Predictive failures -- Don Y
`* Re: Predictive failures -- Buzz McCool
 `* Re: Predictive failures -- Don Y
  +* Re: Predictive failures -- Glen Walpert
  |`- Re: Predictive failures -- Don Y
  `* Re: Predictive failures -- boB
   `* Re: Predictive failures -- Don Y
    `* Re: Predictive failures -- boB
     `- Re: Predictive failures -- Don Y

Predictive failures

<uvjn74$d54b$1@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136395&group=sci.electronics.design#136395

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: blockedofcourse@foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Subject: Predictive failures
Date: Mon, 15 Apr 2024 10:13:02 -0700
Organization: A noiseless patient Spider
Lines: 11
Message-ID: <uvjn74$d54b$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 15 Apr 2024 19:13:08 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="da2971c44a50358724a18132e5903f1e";
logging-data="431243"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18/L+JXYzlLq4IdEl1bzP/L"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:X99tkglJ6uea0eykZUF4F6Kj1SU=
Content-Language: en-US
 by: Don Y - Mon, 15 Apr 2024 17:13 UTC

Is there a general rule of thumb for signalling the likelihood of
an "imminent" (for some value of "imminent") hardware failure?

I suspect most would involve *relative* changes that would be
suggestive of changing conditions in the components (and not
directly related to environmental influences).

So, perhaps, a good strategy is to just "watch" everything and
notice the sorts of changes you "typically" encounter in the hope
that something of greater magnitude would be a harbinger...
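
[A minimal sketch of the "watch everything, flag changes of greater magnitude" idea, using Welford running statistics per monitored channel. The class name, warm-up length, and 4-sigma threshold are all invented for illustration, not taken from any product:]

```python
import math

class DriftWatcher:
    """Learn each channel's 'typical' mean and spread from its own
    history, then flag readings that sit more than `k` standard
    deviations away.  Warm-up count and threshold are placeholders."""

    def __init__(self, k=4.0, warmup=20):
        self.k = k                # deviation threshold, in sigmas
        self.warmup = warmup      # samples to learn a baseline first
        self.stats = {}           # channel -> (n, mean, M2) Welford state

    def update(self, channel, value):
        """Feed one reading; return True if it looks anomalous."""
        n, mean, m2 = self.stats.get(channel, (0, 0.0, 0.0))
        if n >= self.warmup:
            sigma = math.sqrt(m2 / n)
            if sigma > 0 and abs(value - mean) > self.k * sigma:
                return True       # leave the baseline untouched
        # Fold non-anomalous readings into the running baseline
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        self.stats[channel] = (n, mean, m2)
        return False
```

A reading that merely tracks ambient temperature would stay inside its own learned band, which is one crude way of discounting environmental influences.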

Re:Predictive failures

<uvjobr$dfi2$1@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136396&group=sci.electronics.design#136396

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: martin_riddle@verison.net (Martin Rid)
Newsgroups: sci.electronics.design
Subject: Re:Predictive failures
Date: Mon, 15 Apr 2024 13:32:41 -0400 (EDT)
Organization: news.eternal-september.org
Lines: 11
Message-ID: <uvjobr$dfi2$1@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 15 Apr 2024 19:32:44 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e56ebcc5079992aa5316bd0409afecd3";
logging-data="441922"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19igk7GNms2Abj1uIL/oYBj"
Cancel-Lock: sha1:UENGkPx7P+2dMizC/j5ntazP+Tk=
X-Newsreader: PiaoHong.Usenet.Client.Free:2.02
 by: Martin Rid - Mon, 15 Apr 2024 17:32 UTC

Don Y <blockedofcourse@foo.invalid> Wrote in message:
> Is there a general rule of thumb for signalling the likelihood of
> an "imminent" (for some value of "imminent") hardware failure?
>
> I suspect most would involve *relative* changes that would be
> suggestive of changing conditions in the components (and not
> directly related to environmental influences).
>
> So, perhaps, a good strategy is to just "watch" everything and
> notice the sorts of changes you "typically" encounter in the hope
> that something of greater magnitude would be a harbinger...

Current and voltages outside of normal operation?

Cheers
--

----Android NewsGroup Reader----
https://piaohong.s3-us-west-2.amazonaws.com/usenet/index.html

Re: Predictive failures

<u1sq1j1ikum0b2vrp0vji6mu115bp2s12l@4ax.com>


https://news.novabbs.org/tech/article-flat.php?id=136397&group=sci.electronics.design#136397

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.23.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!nntp.supernews.com!news.supernews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 15 Apr 2024 18:28:13 +0000
From: jl@650pot.com (john larkin)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 11:28:13 -0700
Message-ID: <u1sq1j1ikum0b2vrp0vji6mu115bp2s12l@4ax.com>
References: <uvjn74$d54b$1@dont-email.me>
User-Agent: ForteAgent/8.00.32.1272
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 27
X-Trace: sv3-i03kBMW7OSBynXMJ8EPNL5ZJXqDYDh4qGMPgOYJdJknQ/XM1xYBhdb8Siw8Y8DrUZYSTI313LoVdA2D!N7NJ2HHsTulw7D4kQeeq/e38kXHw+nA9tNC9q1F8QxqQe7dSVb3hWt0M5SbGIcFVMceOGBoIE4iW!HZ5IwQ==
X-Complaints-To: www.supernews.com/docs/abuse.html
X-DMCA-Complaints-To: www.supernews.com/docs/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: john larkin - Mon, 15 Apr 2024 18:28 UTC

On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>Is there a general rule of thumb for signalling the likelihood of
>an "imminent" (for some value of "imminent") hardware failure?
>
>I suspect most would involve *relative* changes that would be
>suggestive of changing conditions in the components (and not
>directly related to environmental influences).
>
>So, perhaps, a good strategy is to just "watch" everything and
>notice the sorts of changes you "typically" encounter in the hope
>that something of greater magnitude would be a harbinger...

Checking temperatures is good. An overload or a fan failure can be bad
news.

We put temp sensors on most products. Some parts, like ADCs and FPGAs,
have free built-in temp sensors.

I have tried various ideas to put an air flow sensor on boards, but so
far none have worked very well. We do check fan tachs to be sure they
are still spinning.

Blocking air flow generally makes fan speed *increase*.
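
[A toy cross-check combining a board temp sensor with a fan tach, along the lines above. Every threshold here is an invented placeholder; the "full speed while hot" case reflects the observation that blocking airflow unloads the blades, so fan speed *rises*:]

```python
def airflow_health(temp_c, fan_rpm, temp_limit=70.0, rpm_min=500, rpm_max=4000):
    """Classify cooling health from one temp sensor and one fan tach.
    Thresholds are illustrative placeholders, not real product limits."""
    if fan_rpm < rpm_min:
        return "fan failed or stalled"
    if temp_c > temp_limit and fan_rpm >= rpm_max:
        # Fan spinning flat-out yet the board is hot: suspect the
        # airflow path, not the fan itself.
        return "overtemp at full fan speed: suspect blocked airflow"
    if temp_c > temp_limit:
        return "overtemp"
    return "ok"
```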

Re: Predictive failures

<jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com>


https://news.novabbs.org/tech/article-flat.php?id=136398&group=sci.electronics.design#136398

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.23.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 15 Apr 2024 19:41:58 +0000
From: joegwinn@comcast.net (Joe Gwinn)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 15:41:57 -0400
Message-ID: <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com>
References: <uvjn74$d54b$1@dont-email.me>
User-Agent: ForteAgent/8.00.32.1272
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 20
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-iGidheF8Nd7K1Xtk2VpZwTjrtOnpq3VRLC451p4L6+v3OehwMogCAFW1wsBfkITB8jD9kg/fYZpwyoK!hKFKn/YX4zWxDrd8sImpdyGn35oylrQGis6jQVXrCk0M0KcI4/pzSh9p1t/BOp/y5nen7TU=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Joe Gwinn - Mon, 15 Apr 2024 19:41 UTC

On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

>Is there a general rule of thumb for signalling the likelihood of
>an "imminent" (for some value of "imminent") hardware failure?
>
>I suspect most would involve *relative* changes that would be
>suggestive of changing conditions in the components (and not
>directly related to environmental influences).
>
>So, perhaps, a good strategy is to just "watch" everything and
>notice the sorts of changes you "typically" encounter in the hope
>that something of greater magnitude would be a harbinger...

There is a standard approach that may work: Measure the level and
trend of very low frequency (around a tenth of a Hertz) flicker noise.
When connections (perhaps within a package) start to fail, the flicker
level rises. The actual frequency monitored isn't all that critical.

Joe Gwinn
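
[A minimal sketch of the measurement described above: track the amplitude of fluctuations near 0.1 Hz by evaluating a single DFT bin (Goertzel-style) over successive windows, and watch for the level trending up. The 1 Hz sample rate, window length, and 0.1 Hz target are assumptions for illustration:]

```python
import cmath
import math

def band_level(samples, fs=1.0, f0=0.1):
    """Amplitude of the fluctuation component near f0 in a uniformly
    sampled record.  Remove the mean, then evaluate the one DFT bin
    closest to f0; a rising trend across windows is the warning sign."""
    n = len(samples)
    mean = sum(samples) / n
    k = round(f0 * n / fs)          # DFT bin nearest the target frequency
    acc = sum((s - mean) * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / n         # approximate amplitude at that bin
```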

Re: Predictive failures

<0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com>


https://news.novabbs.org/tech/article-flat.php?id=136399&group=sci.electronics.design#136399

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer02.iad!feed-me.highwinds-media.com!news.highwinds-media.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!nntp.supernews.com!news.supernews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 15 Apr 2024 20:05:41 +0000
From: jl@650pot.com (john larkin)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 13:05:40 -0700
Message-ID: <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com>
User-Agent: ForteAgent/8.00.32.1272
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 33
X-Trace: sv3-ivTQYwo+Y8HDquj7TrsXh81rUZmbIy3LuZ0ytpkSg44UkKWgEE9R3tGUMuEgCGYE/vQn2gcI45XdtBH!p+2VzcN5o6mKbl19CCZGui1gBJU1dVF45h9JREIMlqsVsBMF8orEjCWGahrCjthS91ewU5j0+ilu!SFigAQ==
X-Complaints-To: www.supernews.com/docs/abuse.html
X-DMCA-Complaints-To: www.supernews.com/docs/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 2312
 by: john larkin - Mon, 15 Apr 2024 20:05 UTC

On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
><blockedofcourse@foo.invalid> wrote:
>
>>Is there a general rule of thumb for signalling the likelihood of
>>an "imminent" (for some value of "imminent") hardware failure?
>>
>>I suspect most would involve *relative* changes that would be
>>suggestive of changing conditions in the components (and not
>>directly related to environmental influences).
>>
>>So, perhaps, a good strategy is to just "watch" everything and
>>notice the sorts of changes you "typically" encounter in the hope
>>that something of greater magnitude would be a harbinger...
>
>There is a standard approach that may work: Measure the level and
>trend of very low frequency (around a tenth of a Hertz) flicker noise.
>When connections (perhaps within a package) start to fail, the flicker
>level rises. The actual frequency monitored isn't all that critical.
>
>Joe Gwinn

Do connections "start to fail"?

I don't think I've ever owned a piece of electronic equipment that
warned me of an impending failure.

Cars do, for some failure modes, like low oil level.

Don, what does the thing do?

Re: Predictive failures

<uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com>


https://news.novabbs.org/tech/article-flat.php?id=136400&group=sci.electronics.design#136400

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: invalid@invalid.invalid (Edward Rawde)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 16:32:17 -0400
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Lines: 60
Message-ID: <uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com>
References: <uvjn74$d54b$1@dont-email.me>
Injection-Date: Mon, 15 Apr 2024 20:32:20 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="58369"; mail-complaints-to="usenet@blueworldhosting.com"
Cancel-Lock: sha1:NjIJENoDOqnP1uNXD1LQyY6WUy8= sha256:0iDJpOaGNtmGMtfzbOQ0waHGNHeP3SRNdvim17/Snr4=
sha1:rfscBOPqMpXqDv1NfY5RfX4Z6jI= sha256:qQ0fjvDsYzqXHrLwSBi9dSk4Mivu/jEDBVfXa9Fhw0E=
X-RFC2646: Format=Flowed; Response
X-Priority: 3
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-MSMail-Priority: Normal
 by: Edward Rawde - Mon, 15 Apr 2024 20:32 UTC

"Don Y" <blockedofcourse@foo.invalid> wrote in message
news:uvjn74$d54b$1@dont-email.me...
> Is there a general rule of thumb for signalling the likelihood of
> an "imminent" (for some value of "imminent") hardware failure?

My conclusion would be no.
Some of my reasons are given below.

It always puzzled me how HAL could know that the AE-35 would fail in the
near future, but maybe HAL had a motive for lying.

Back in that era I was doing a lot of repair work when I should have been
doing my homework.
So I knew that there were many unrelated kinds of hardware failure.

A component could fail suddenly, such as a short circuit diode, and
everything would work fine after replacing it.
The cause could perhaps have been a manufacturing defect, such as
insufficient cooling due to poor quality assembly, but the exact real cause
would never be known.

A component could fail suddenly as a side effect of another failure.
One short circuit output transistor and several other components could also
burn up.

A component could fail slowly and only become apparent when it got to the
stage of causing an audible or visible effect.
It would often be easy to locate the dried up electrolytic due to it having
already let go of some of its contents.

So I concluded that if I wanted to be sure that I could always watch my
favourite TV show, we would have to have at least two TVs in the house.

If it's not possible to have the equivalent of two TVs then you will want to
be in a position to get the existing TV repaired or replaced as quickly as
possible.

My home wireless Internet system doesn't care if one access point fails, and
I would not expect to be able to do anything to predict a time of failure.
Experience says a dead unit has power supply issues. Usually external but
could be internal.

I don't think it would be possible to "watch" everything because it's rare
that you can properly test a component while it's part of a working system.

These days I would expect to have fun with management asking for software to
be able to diagnose and report any hardware failure.
Not very easy if the power supply has died.

>
> I suspect most would involve *relative* changes that would be
> suggestive of changing conditions in the components (and not
> directly related to environmental influences).
>
> So, perhaps, a good strategy is to just "watch" everything and
> notice the sorts of changes you "typically" encounter in the hope
> that something of greater magnitude would be a harbinger...
>

Re: Predictive failures

<rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com>


https://news.novabbs.org/tech/article-flat.php?id=136401&group=sci.electronics.design#136401

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr3.iad1.usenetexpress.com!69.80.99.22.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!news.giganews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 15 Apr 2024 22:08:12 +0000
From: joegwinn@comcast.net (Joe Gwinn)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 18:03:23 -0400
Message-ID: <rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com> <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com>
User-Agent: ForteAgent/8.00.32.1272
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 64
X-Usenet-Provider: http://www.giganews.com
X-Trace: sv3-RR5VKbpbjMnshXszVhbh9pmP8pasD506mSeib+AC6M+KA2rAhsGMjX3FwG8RX2DRwU1gK0KMbSSMl6j!7eR8yN/2c77XDBVeCX5VGAN/swyawsRaO1f+bj8gXpu37XLfCncXUYIes6JLXq6EOUAu5/k=
X-Complaints-To: abuse@giganews.com
X-DMCA-Notifications: http://www.giganews.com/info/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: Joe Gwinn - Mon, 15 Apr 2024 22:03 UTC

On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:

>On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>>On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>><blockedofcourse@foo.invalid> wrote:
>>
>>>Is there a general rule of thumb for signalling the likelihood of
>>>an "imminent" (for some value of "imminent") hardware failure?
>>>
>>>I suspect most would involve *relative* changes that would be
>>>suggestive of changing conditions in the components (and not
>>>directly related to environmental influences).
>>>
>>>So, perhaps, a good strategy is to just "watch" everything and
>>>notice the sorts of changes you "typically" encounter in the hope
>>>that something of greater magnitude would be a harbinger...
>>
>>There is a standard approach that may work: Measure the level and
>>trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>When connections (perhaps within a package) start to fail, the flicker
>>level rises. The actual frequency monitored isn't all that critical.
>>
>>Joe Gwinn
>
>Do connections "start to fail" ?

Yes, they do, in things like vias. I went through a big drama where a
critical bit of radar logic circuitry would slowly go nuts.

It turned out that the copper plating on the walls of the vias was
suffering from low-cycle fatigue during temperature cycling and slowly
breaking, one little crack at a time, until it went open. If you
measured the resistance to parts per million (6.5 digit DMM), sampling
at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
also measure a copper line, and divide the via-chain resistance by the
no-via resistance, to correct for temperature changes.
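
[The ratio trick in the paragraph above can be sketched as follows; the function name and sample data are illustrative. Both conductors are copper and share the same temperature, so thermal drift cancels in the ratio, leaving only changes in the vias themselves:]

```python
def corrected_via_resistance(r_via_chain, r_copper_ref):
    """Divide each via-chain resistance sample by a simultaneous
    reading of a no-via copper reference trace on the same board.
    Temperature moves both readings together, so the ratio isolates
    mechanical damage in the vias from thermal drift."""
    return [rv / rc for rv, rc in zip(r_via_chain, r_copper_ref)]
```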

The solution was to redesign the vias, mainly to increase the critical
volume of copper. And modern SMD designs have less and less copper
volume.

I bet precision resistors can also be measured this way.

>I don't think I've ever owned a piece of electronic equipment that
>warned me of an impending failure.

Onset of smoke emission is a common sign.

>Cars do, for some failure modes, like low oil level.

The industrial method for big stuff is accelerometers attached near
the bearings, and listen for excessive rotation-correlated (not
necessarily harmonic) noise.

>Don, what does the thing do?

Good question.

Joe Gwinn

Re: Predictive failures

<pbdr1j11kj8sdfrtu4erc8c67s1g8dos9m@4ax.com>


https://news.novabbs.org/tech/article-flat.php?id=136402&group=sci.electronics.design#136402

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!feeder.usenetexpress.com!tr1.iad1.usenetexpress.com!69.80.99.23.MISMATCH!Xl.tags.giganews.com!local-2.nntp.ord.giganews.com!nntp.supernews.com!news.supernews.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 15 Apr 2024 23:26:36 +0000
From: jl@650pot.com (john larkin)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 16:26:35 -0700
Message-ID: <pbdr1j11kj8sdfrtu4erc8c67s1g8dos9m@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com> <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com> <rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com>
User-Agent: ForteAgent/8.00.32.1272
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 81
X-Trace: sv3-qw79UbpP5kauHZ+cPvuDMijERrZhHWfU3QBA1SmJgMi2alsRJNpReV3dTXBl8ZYwRr5MMVST5RKImZi!pU6fiNZAsBosDCaFbeGOrs6tNHyudA+tRaNxo3zda5oDAkNXYd6UZfh0tCiEp9tPfGyvGEDgQJy4!Fiv+Xg==
X-Complaints-To: www.supernews.com/docs/abuse.html
X-DMCA-Complaints-To: www.supernews.com/docs/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
 by: john larkin - Mon, 15 Apr 2024 23:26 UTC

On Mon, 15 Apr 2024 18:03:23 -0400, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:
>
>>On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>>wrote:
>>
>>>On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>><blockedofcourse@foo.invalid> wrote:
>>>
>>>>Is there a general rule of thumb for signalling the likelihood of
>>>>an "imminent" (for some value of "imminent") hardware failure?
>>>>
>>>>I suspect most would involve *relative* changes that would be
>>>>suggestive of changing conditions in the components (and not
>>>>directly related to environmental influences).
>>>>
>>>>So, perhaps, a good strategy is to just "watch" everything and
>>>>notice the sorts of changes you "typically" encounter in the hope
>>>>that something of greater magnitude would be a harbinger...
>>>
>>>There is a standard approach that may work: Measure the level and
>>>trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>>When connections (perhaps within a package) start to fail, the flicker
>>>level rises. The actual frequency monitored isn't all that critical.
>>>
>>>Joe Gwinn
>>
>>Do connections "start to fail" ?
>
>Yes, they do, in things like vias. I went through a big drama where a
>critical bit of radar logic circuitry would slowly go nuts.
>
>It turned out that the copper plating on the walls of the vias was
>suffering from low-cycle fatigue during temperature cycling and slowly
>breaking, one little crack at a time, until it went open. If you
>measured the resistance to parts per million (6.5 digit DMM), sampling
>at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
>also measure a copper line, and divide the via-chain resistance by the
>no-via resistance, to correct for temperature changes.

But nobody is going to monitor every via on a PCB, even if it were
possible.

One could instrument a PCB fab test board, I guess. But DC tests would
be fine.

We have one board with over 4000 vias, but they are mostly in
parallel.

>
>The solution was to redesign the vias, mainly to increase the critical
>volume of copper. And modern SMD designs have less and less copper
>volume.
>
>I bet precision resistors can also be measured this way.
>
>
>>I don't think I've ever owned a piece of electronic equipment that
>>warned me of an impending failure.
>
>Onset of smoke emission is a common sign.
>
>
>>Cars do, for some failure modes, like low oil level.
>
>The industrial method for big stuff is accelerometers attached near
>the bearings, and listen for excessive rotation-correlated (not
>necessarily harmonic) noise.

Big ships that I've worked on have a long propeller shaft in the shaft
alley, a long tunnel where nobody often goes. They have magnetic shaft
runout sensors and shaft bearing temperature monitors.

They measure shaft torque and SHP too, from the shaft twist.

I liked hiding out in the shaft alley. It was private and cool, that
giant shaft slowly rotating.

Re: Predictive failures

<uvki1n$iqfd$1@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136403&group=sci.electronics.design#136403

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: pcdhSpamMeSenseless@electrooptical.net (Phil Hobbs)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 00:51:03 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 72
Message-ID: <uvki1n$iqfd$1@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me>
<jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com>
<0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com>
<rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 16 Apr 2024 02:51:03 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bab2b74387cfd4690227be17d5df4ff6";
logging-data="616941"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+NXJcXU6o4li//pHxuDkaI"
User-Agent: NewsTap/5.5 (iPhone/iPod Touch)
Cancel-Lock: sha1:jWPP/inH3V7VLl/C8GOVkSHPgGA=
sha1:p9DdnAlo4Ms0o6/V2ibCXg5ym2U=
 by: Phil Hobbs - Tue, 16 Apr 2024 00:51 UTC

Joe Gwinn <joegwinn@comcast.net> wrote:
> On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:
>
>> On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>> wrote:
>>
>>> On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>> <blockedofcourse@foo.invalid> wrote:
>>>
>>>> Is there a general rule of thumb for signalling the likelihood of
>>>> an "imminent" (for some value of "imminent") hardware failure?
>>>>
>>>> I suspect most would involve *relative* changes that would be
>>>> suggestive of changing conditions in the components (and not
>>>> directly related to environmental influences).
>>>>
>>>> So, perhaps, a good strategy is to just "watch" everything and
>>>> notice the sorts of changes you "typically" encounter in the hope
>>>> that something of greater magnitude would be a harbinger...
>>>
>>> There is a standard approach that may work: Measure the level and
>>> trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>> When connections (perhaps within a package) start to fail, the flicker
>>> level rises. The actual frequency monitored isn't all that critical.
>>>
>>> Joe Gwinn
>>
>> Do connections "start to fail" ?
>
> Yes, they do, in things like vias. I went through a big drama where a
> critical bit of radar logic circuitry would slowly go nuts.
>
> It turned out that the copper plating on the walls of the vias was
> suffering from low-cycle fatigue during temperature cycling and slowly
> breaking, one little crack at a time, until it went open. If you
> measured the resistance to parts per million (6.5 digit DMM), sampling
> at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
> also measure a copper line, and divide the via-chain resistance by the
> no-via resistance, to correct for temperature changes.
>
> The solution was to redesign the vias, mainly to increase the critical
> volume of copper. And modern SMD designs have less and less copper
> volume.
>
> I bet precision resistors can also be measured this way.
>
>
>> I don't think I've ever owned a piece of electronic equipment that
>> warned me of an impending failure.
>
> Onset of smoke emission is a common sign.
>
>
>> Cars do, for some failure modes, like low oil level.
>
> The industrial method for big stuff is accelerometers attached near
> the bearings, and listen for excessive rotation-correlated (not
> necessarily harmonic) noise.

There are a number of instruments available that look for metal particles
in the lubricating oil.

Cheers

Phil Hobbs
>
>

--
Dr Philip C D Hobbs Principal Consultant ElectroOptical Innovations LLC /
Hobbs ElectroOptics Optics, Electro-optics, Photonics, Analog Electronics

Re: Re:Predictive failures

<uvkn71$ngqi$2@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136404&group=sci.electronics.design#136404

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: blockedofcourse@foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Subject: Re: Re:Predictive failures
Date: Mon, 15 Apr 2024 19:19:06 -0700
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <uvkn71$ngqi$2@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me> <uvjobr$dfi2$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 16 Apr 2024 04:19:14 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8f873457a009428ae193cacdeebfb978";
logging-data="770898"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18W3IJ/2ORji0YIiPybslms"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:0wiye+BNTthmxUy/cmntMs3YdGs=
In-Reply-To: <uvjobr$dfi2$1@dont-email.me>
Content-Language: en-US
 by: Don Y - Tue, 16 Apr 2024 02:19 UTC

On 4/15/2024 10:32 AM, Martin Rid wrote:
> Don Y <blockedofcourse@foo.invalid> Wrote in message:r
>> Is there a general rule of thumb for signalling the likelihood of
>> an "imminent" (for some value of "imminent") hardware failure? I suspect
>> most would involve *relative* changes that would be suggestive of
>> changing conditions in the components (and not directly related to
>> environmental influences). So, perhaps, a good strategy is to just
>> "watch" everything and notice the sorts of changes you "typically"
>> encounter in the hope that something of greater magnitude would be
>> a harbinger...
>
> Current and voltages outside of normal operation?

I think "outside" is (often) likely indicative of
"something is (already) broken".

But, perhaps TRENDS in either/both can be predictive.

E.g., if a (sub)circuit has always been consuming X (which
is nominal for the design) and, over time, starts to consume
1.1X, is that suggestive that something is in the process of
failing?
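That kind of trend watch takes very little code. A minimal sketch (the EWMA baseline, 5% smoothing factor, and 10% drift threshold are illustrative choices of mine, not anything specified in this thread):

```python
# Trend watcher: smooth a reading with an exponentially weighted moving
# average (EWMA) and flag a sustained drift above nominal, well before
# any hard "out of range" limit would trip.
class TrendWatcher:
    """Warn when a smoothed reading drifts a set fraction above nominal."""
    def __init__(self, nominal, alpha=0.05, drift=0.10):
        self.nominal = nominal   # design-time expected draw ("X")
        self.alpha = alpha       # EWMA smoothing factor
        self.drift = drift       # e.g. 0.10 -> warn at 1.1X
        self.ewma = nominal

    def sample(self, x):
        # fold the new reading into the running average
        self.ewma += self.alpha * (x - self.ewma)
        # warn once the smoothed value has crept past (1 + drift) * nominal
        return self.ewma > (1.0 + self.drift) * self.nominal

w = TrendWatcher(nominal=1.0)
healthy  = [w.sample(1.0 + 0.01 * (i % 3)) for i in range(100)]  # ~1.0X draw
drifting = [w.sample(1.2) for i in range(100)]   # load creeps up to 1.2X
print(any(healthy), any(drifting))               # -> False True
```

The smoothing matters: a single noisy reading at 1.1X should not trip the warning, but a *sustained* 1.1X should.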

Note that the goal is not to troubleshoot the particular design
or its components but, rather, act as an early warning that
maintenance may be required (or, that performance may not be
what you are expecting/have become accustomed to).

You can include mechanisms to verify outputs are what you
*intended* them to be (in case the output drivers have shit
the bed).

You can, also, do sanity checks that ensure values are never
what they SHOULDN'T be (this is commonly done within software
products -- if something "can't happen" then noticing that
it IS happening is a sure-fire indication that something
is broken!)

[Limit switches on mechanisms are there to ensure the impossible
is not possible -- like driving a mechanism beyond its extents]

And, where possible, notice second-hand effects of your actions
(e.g., if you switched on a load, you should see an increase
in supplied current).

But, again, these are more helpful in detecting FAILED items.

Re: Predictive failures

<2jnr1jt588m6ph7hps8qm5gbc6f457grb4@4ax.com>

https://news.novabbs.org/tech/article-flat.php?id=136405&group=sci.electronics.design#136405

Newsgroups: sci.electronics.design
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!Xl.tags.giganews.com!local-1.nntp.ord.giganews.com!nntp.supernews.com!news.supernews.com.POSTED!not-for-mail
NNTP-Posting-Date: Tue, 16 Apr 2024 02:19:20 +0000
From: jjSNIPlarkin@highNONOlandtechnology.com (John Larkin)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 19:17:39 -0700
Organization: Highland Tech
Reply-To: xx@yy.com
Message-ID: <2jnr1jt588m6ph7hps8qm5gbc6f457grb4@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com> <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com> <rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com> <uvki1n$iqfd$1@dont-email.me>
X-Newsreader: Forte Agent 3.1/32.783
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Lines: 83
X-Trace: sv3-sJOlzM0n1n4GjfJX4ednQTuQg5FInFHkgmtlFgvt+K7c6IPXZ99ZXMpCAwDwkCNV/rpBZX52wbPAVNQ!FMVCVxe2GgZ4a7o/kK9EHyEmt3O/erKxBA2i0H4mlflXAy3FXnHk3ZtaatmJuD64LSWIqLr2TK95!Faz97A==
X-Complaints-To: www.supernews.com/docs/abuse.html
X-DMCA-Complaints-To: www.supernews.com/docs/dmca.html
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
X-Received-Bytes: 4472
 by: John Larkin - Tue, 16 Apr 2024 02:17 UTC

On Tue, 16 Apr 2024 00:51:03 -0000 (UTC), Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

>Joe Gwinn <joegwinn@comcast.net> wrote:
>> On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:
>>
>>> On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>>> wrote:
>>>
>>>> On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>>> <blockedofcourse@foo.invalid> wrote:
>>>>
>>>>> Is there a general rule of thumb for signalling the likelihood of
>>>>> an "imminent" (for some value of "imminent") hardware failure?
>>>>>
>>>>> I suspect most would involve *relative* changes that would be
>>>>> suggestive of changing conditions in the components (and not
>>>>> directly related to environmental influences).
>>>>>
>>>>> So, perhaps, a good strategy is to just "watch" everything and
>>>>> notice the sorts of changes you "typically" encounter in the hope
>>>>> that something of greater magnitude would be a harbinger...
>>>>
>>>> There is a standard approach that may work: Measure the level and
>>>> trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>>> When connections (perhaps within a package) start to fail, the flicker
>>>> level rises. The actual frequency monitored isn't all that critical.
>>>>
>>>> Joe Gwinn
>>>
>>> Do connections "start to fail" ?
>>
>> Yes, they do, in things like vias. I went through a big drama where a
>> critical bit of radar logic circuitry would slowly go nuts.
>>
>> It turned out that the copper plating on the walls of the vias was
>> suffering from low-cycle fatigue during temperature cycling and slowly
>> breaking, one little crack at a time, until it went open. If you
>> measured the resistance to parts per million (6.5 digit DMM), sampling
>> at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
>> also measure a copper line, and divide the via-chain resistance by the
>> no-via resistance, to correct for temperature changes.
>>
>> The solution was to redesign the vias, mainly to increase the critical
>> volume of copper. And modern SMD designs have less and less copper
>> volume.
>>
>> I bet precision resistors can also be measured this way.
>>
>>
>>> I don't think I've ever owned a piece of electronic equipment that
>>> warned me of an impending failure.
>>
>> Onset of smoke emission is a common sign.
>>
>>
>>> Cars do, for some failure modes, like low oil level.
>>
>> The industrial method for big stuff is accelerometers attached near
>> the bearings, and listen for excessive rotation-correlated (not
>> necessarily harmonic) noise.
>
>There are a number of instruments available that look for metal particles
>in the lubricating oil.
>
>Cheers
>
>Phil Hobbs
>>
>>

And water. Some of our capacitor simulators include a parallel
resistance component.

One customer used to glue bits of metal onto a string and pull it
through the magnetic sensor. We did a simulator for that too.

Jet engines have magnetic eddy-current blade-tip sensors. For
efficiency, they want a tiny clearance between fan blades and the
casing, but not too tiny.
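The via-fatigue measurement Joe describes in the quote above (sample via-chain and reference-trace resistance at 1 Hz, ratio out the shared temperature drift, watch the noise near 0.1 Hz) can be simulated in a few lines. This is an illustrative sketch with invented noise magnitudes, not his actual instrumentation:

```python
import random, statistics
random.seed(1)

def ratio_series(n, crack_noise=0.0):
    """1 Hz samples of via-chain R divided by a no-via reference trace R.
    Both share a slow thermal drift (cancelled by the ratio); cracking
    vias add a slow random walk that the ratio does NOT cancel."""
    temp, walk, out = 0.0, 0.0, []
    for _ in range(n):
        temp += random.gauss(0, 1e-4)          # shared thermal drift
        walk += random.gauss(0, crack_noise)   # via-only degradation
        r_via = 1.000 * (1 + temp) + walk
        r_ref = 0.500 * (1 + temp)
        out.append(r_via / r_ref + random.gauss(0, 1e-6))  # DMM noise
    return out

def low_f_power(x, block=10):
    """Variance of 10 s block means: a crude proxy for noise power in
    the ~0.1 Hz region and below."""
    means = [statistics.fmean(x[i:i + block]) for i in range(0, len(x), block)]
    return statistics.pvariance(means)

good = low_f_power(ratio_series(600))                     # intact vias
bad  = low_f_power(ratio_series(600, crack_noise=1e-5))   # fatiguing vias
print(bad > 10 * good)   # rising low-frequency power flags the failing vias
```

The key trick from the post survives the simplification: the via/no-via ratio cancels temperature exactly, so what is left at low frequency is the degradation signal.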

Re: Predictive failures

<uvkqqu$o5co$1@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=136406&group=sci.electronics.design#136406

Newsgroups: sci.electronics.design
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: blockedofcourse@foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 20:20:55 -0700
Organization: A noiseless patient Spider
Lines: 177
Message-ID: <uvkqqu$o5co$1@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me>
<uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 16 Apr 2024 05:21:03 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8f873457a009428ae193cacdeebfb978";
logging-data="791960"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/8bOLFwF2hv9SY5Bf//rhu"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:o3Ublxzjbw8cxLWWj2PtneomU4M=
In-Reply-To: <uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com>
Content-Language: en-US
 by: Don Y - Tue, 16 Apr 2024 03:20 UTC

On 4/15/2024 1:32 PM, Edward Rawde wrote:
> "Don Y" <blockedofcourse@foo.invalid> wrote in message
> news:uvjn74$d54b$1@dont-email.me...
>> Is there a general rule of thumb for signalling the likelihood of
>> an "imminent" (for some value of "imminent") hardware failure?
>
> My conclusion would be no.
> Some of my reasons are given below.
>
> It always puzzled me how HAL could know that the AE-35 would fail in the
> near future, but maybe HAL had a motive for lying.

Why does your PC retry failed disk operations? If I ask the drive to give
me LBA 1234, shouldn't it ALWAYS give me LBA 1234? Without any data corruption
(CRC error) AND within the normal access time limits defined by the location
of those magnetic domains on the rotating medium?

Why should it attempt to retry this MORE than once?

Now, if you knew your disk drive was repeatedly retrying operations,
would your confidence in it be unchanged from times when it did not
exhibit such behavior?

Assuming you have properly configured an EIA232 interface, why would you
ever get a parity error? (OVERRUN errors can be the result of an i/f
that is running too fast for the system on the receiving end) How would
you even KNOW this was happening?

I suspect everyone who has owned a DVD/CD drive has encountered a
"slow tray" as the mechanism aged. Or, a tray that wouldn't
open (of its own accord) as soon/quickly as it used to.

The controller COULD be watching this (cuz it knows when it
initiated the operation and there is an "end-of-stroke"
sensor available) and KNOW that the drive belt was stretching
to the point where it was impacting operation.

[And, that a stretched belt wasn't going to suddenly decide to
unstretch to fix the problem!]
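The controller-side bookkeeping for that is tiny: timestamp the actuation, timestamp the end-of-stroke sensor, and trend the difference. A hypothetical sketch (the 1.5x-baseline service threshold is my invention):

```python
import statistics

class StrokeMonitor:
    """Trend the open-stroke time of a tray; a stretching belt shows up
    as a steadily lengthening stroke, long before the tray fails to open."""
    def __init__(self, limit_factor=1.5, baseline_n=5):
        self.early = []              # first few strokes = "as new" behavior
        self.baseline = None
        self.baseline_n = baseline_n
        self.limit_factor = limit_factor
        self.last = None

    def record(self, duration):
        # learn the early-life baseline, then just track the latest stroke
        if self.baseline is None:
            self.early.append(duration)
            if len(self.early) == self.baseline_n:
                self.baseline = statistics.median(self.early)
        self.last = duration

    def needs_service(self):
        return (self.baseline is not None
                and self.last > self.limit_factor * self.baseline)

m = StrokeMonitor()
for t in [1.00, 1.02, 0.99, 1.01, 1.00]:   # new drive: ~1 s stroke
    m.record(t)
print(m.needs_service())    # -> False (baseline ~1.0 s, stroke normal)
m.record(1.62)              # aged belt: stroke now takes 1.62 s
print(m.needs_service())    # -> True
```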

> Back in that era I was doing a lot of repair work when I should have been
> doing my homework.
> So I knew that there were many unrelated kinds of hardware failure.

The goal isn't to predict ALL failures but, rather, to anticipate
LIKELY failures and treat them before they become an inconvenience
(or worse).

One morning, the (gas) furnace repeatedly tried to light as the
thermostat called for heat. Then, a few moments later, the
safeties would kick in and shut down the gas flow. This attracted my
attention as the LIT furnace should STAY LIT!

The furnace was too stupid to notice its behavior so would repeat
this cycle, endlessly.

I stepped in and overrode the thermostat to eliminate the call
for heat as this behavior couldn't be productive (if something
truly IS wrong, then why let it continue? and, if there is nothing
wrong with the controls/mechanism, then clearly it is unable to meet
my needs so why let it persist in trying?)

[Turns out, there was a city-wide gas shortage so there was enough
gas available to light the furnace but not enough to bring it up to
temperature as quickly as the designers had expected]

> A component could fail suddenly, such as a short circuit diode, and
> everything would work fine after replacing it.
> The cause could perhaps have been a manufacturing defect, such as
> insufficient cooling due to poor quality assembly, but the exact real cause
> would never be known.

You don't care about the real cause. Or, even the failure mode.
You (as user) just don't want to be inconvenienced by the sudden
loss of the functionality/convenience that the device provided.

> A component could fail suddenly as a side effect of another failure.
> One short circuit output transistor and several other components could also
> burn up.

So, if you could predict the OTHER failure...
Or, that such a failure might occur and lead to the followup failure...

> A component could fail slowly and only become apparent when it got to the
> stage of causing an audible or visible effect.

But, likely, there was something observable *in* the circuit that
just hadn't made it to the level of human perception.

> It would often be easy to locate the dried up electrolytic due to it having
> already let go of some of its contents.
>
> So I concluded that if I wanted to be sure that I could always watch my
> favourite TV show, we would have to have at least two TVs in the house.
>
> If it's not possible to have the equivalent of two TVs then you will want to
> be in a position to get the existing TV repaired or replaced as quickly as
> possible.

Two TVs are affordable. Consider two controllers for a wire-EDM machine.

Or, the cost of having that wire-EDM machine *idle* (because you didn't
have a spare controller!)

> My home wireless Internet system doesn't care if one access point fails, and
> I would not expect to be able to do anything to predict a time of failure.
> Experience says a dead unit has power supply issues. Usually external but
> could be internal.

Again, the goal isn't to predict "time of failure". But, rather, to be
able to know that "this isn't going to end well" -- with some advance notice
that allows for preemptive action to be taken (and not TOO much advance
notice that the user ends up replacing items prematurely).

> I don't think it would be possible to "watch" everything because it's rare
> that you can properly test a component while it's part of a working system.

You don't have to -- as long as you can observe its effects on other
parts of the system. E.g., there's no easy/inexpensive way to
check to see how much the belt on that CD/DVD player has stretched.
But, you can notice that it HAS stretched (or, some less likely
change has occurred that similarly interferes with the tray's actions)
by noting how the activity that it is used for has changed.

> These days I would expect to have fun with management asking for software to
> be able to diagnose and report any hardware failure.
> Not very easy if the power supply has died.

What if the power supply HASN'T died? What if you are diagnosing the
likely upcoming failure *of* the power supply?

You have ECC memory in most (larger) machines. Do you silently
expect it to just fix all the errors? Does it have a way of telling you
how many such errors it HAS corrected? Can you infer the number of
errors that it *hasn't*?

[Why have ECC at all?]
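On Linux, the EDAC subsystem does expose exactly those corrected/uncorrected counts through sysfs. A sketch that tallies them, demonstrated against a faked tree since the real `/sys/devices/system/edac` only exists on an EDAC-capable host:

```python
import pathlib, tempfile

def ecc_counts(edac_root="/sys/devices/system/edac/mc"):
    """Sum corrected (ce) and uncorrected (ue) ECC error counts across
    all memory controllers in the Linux EDAC sysfs tree."""
    totals = {"ce": 0, "ue": 0}
    root = pathlib.Path(edac_root)
    for mc in root.glob("mc*"):          # one directory per controller
        for key in totals:
            f = mc / f"{key}_count"
            if f.exists():
                totals[key] += int(f.read_text())
    return totals

# Demo against a faked sysfs tree
fake = pathlib.Path(tempfile.mkdtemp())
(fake / "mc0").mkdir()
(fake / "mc0" / "ce_count").write_text("7\n")
(fake / "mc0" / "ue_count").write_text("0\n")
print(ecc_counts(fake))   # -> {'ce': 7, 'ue': 0}
```

A nonzero and *rising* ce_count on a previously quiet DIMM is precisely the "this isn't going to end well" signal being discussed: the errors are still being corrected, but the trend says a hard failure is coming.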

There are (and have been) many efforts to *predict* lifetimes of
components (and, systems). And, some work to examine the state
of systems /in situ/ with an eye towards anticipating their
likelihood of future failure.

[The former has met with poor results -- predicting the future
without a position in its past is difficult. And, knowing how
a device is "stored" when not powered on also plays a role
in its future survival! (is there some reason YOUR devices
can't power themselves on, periodically; notice the environmental
conditions; log them and then power back off)]

The question is one of a practical nature; how much does it cost
you to add this capability to a device and how accurately can it
make those predictions (thus avoiding some future cost/inconvenience).

For small manufacturers, the research required is likely not cost-effective;
just take your best stab at it and let the customer "buy a replacement"
when the time comes (hopefully, outside of your warranty window).

But, anything you can do to minimize this TCO issue gives your product
an edge over competitors. Given that most devices are smart, nowadays,
it seems obvious that they should undertake as much of this task as
they can (conveniently) afford.

<https://www.sciencedirect.com/science/article/abs/pii/S0026271409003667>

<https://www.researchgate.net/publication/3430090_In_Situ_Temperature_Measurement_of_a_Notebook_Computer-A_Case_Study_in_Health_and_Usage_Monitoring_of_Electronics>

<https://www.tandfonline.com/doi/abs/10.1080/16843703.2007.11673148>

<https://www.prognostics.umd.edu/calcepapers/02_V.Shetty_remaingLifeAssesShuttleRemotemanipulatorSystem_22ndSpaceSimulationConf.pdf>

<https://ieeexplore.ieee.org/document/1656125>

<https://journals.sagepub.com/doi/10.1177/0142331208092031>

[Sorry, I can't publish links to the full articles]

Re: Re:Predictive failures

<uvkrig$30nb$1@nnrp.usenet.blueworldhosting.com>

https://news.novabbs.org/tech/article-flat.php?id=136407&group=sci.electronics.design#136407

Newsgroups: sci.electronics.design
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: invalid@invalid.invalid (Edward Rawde)
Newsgroups: sci.electronics.design
Subject: Re: Re:Predictive failures
Date: Mon, 15 Apr 2024 23:33:34 -0400
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Lines: 75
Message-ID: <uvkrig$30nb$1@nnrp.usenet.blueworldhosting.com>
References: <uvjn74$d54b$1@dont-email.me> <uvjobr$dfi2$1@dont-email.me> <uvkn71$ngqi$2@dont-email.me>
Injection-Date: Tue, 16 Apr 2024 03:33:36 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="99051"; mail-complaints-to="usenet@blueworldhosting.com"
Cancel-Lock: sha1:Ss5b+WM0NCt859rELhS4c3bd0jo= sha256:RzJDaBmJ8YopT52wLREFEbUoGXyJaf/YMlgrc9igOYs=
sha1:wDPazqV49/VM6gCu9X6UJllv1Bc= sha256:O/QqDIV2uPO45pec7hGJBQmWvJJWy8vgedUGNz4dtbk=
X-RFC2646: Format=Flowed; Response
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-Priority: 3
 by: Edward Rawde - Tue, 16 Apr 2024 03:33 UTC

"Don Y" <blockedofcourse@foo.invalid> wrote in message
news:uvkn71$ngqi$2@dont-email.me...
> On 4/15/2024 10:32 AM, Martin Rid wrote:
>> Don Y <blockedofcourse@foo.invalid> Wrote in message:r
>>> Is there a general rule of thumb for signalling the likelihood of an
>>> "imminent" (for some value of "imminent") hardware failure? I suspect
>>> most would involve *relative* changes that would be suggestive of
>>> changing conditions in the components (and not directly related to
>>> environmental influences). So, perhaps, a good strategy is to just
>>> "watch" everything and notice the sorts of changes you "typically"
>>> encounter in the hope that something of greater magnitude would be a
>>> harbinger...
>>
>> Current and voltages outside of normal operation?
>
> I think "outside" is (often) likely indicative of
> "something is (already) broken".
>
> But, perhaps TRENDS in either/both can be predictive.
>
> E.g., if a (sub)circuit has always been consuming X (which
> is nominal for the design) and, over time, starts to consume
> 1.1X, is that suggestive that something is in the process of
> failing?

That depends on many other unknown factors.
Temperature sensors are common in electronics.
So is current sensing. Voltage sensing too.

>
> Note that the goal is not to troubleshoot the particular design
> or its components but, rather, act as an early warning that
> maintenance may be required (or, that performance may not be
> what you are expecting/have become accustomed to).

If the system is electronic then you can detect whether currents and/or
voltages are within expected ranges.
If they are just a little out of expected range then you might turn on a
warning LED.
If they are way out of range then you might tell the power supply to turn
off quick.
By all means tell the software what has happened, but don't put software
between the current sensor and the emergency turn off.
Be aware that components in monitoring circuits can fail too.

>
> You can include mechanisms to verify outputs are what you
> *intended* them to be (in case the output drivers have shit
> the bed).
>
> You can, also, do sanity checks that ensure values are never
> what they SHOULDN'T be (this is commonly done within software
> products -- if something "can't happen" then noticing that
> it IS happening is a sure-fire indication that something
> is broken!)
>
> [Limit switches on mechanisms are there to ensure the impossible
> is not possible -- like driving a mechanism beyond its extents]
>
> And, where possible, notice second-hand effects of your actions
> (e.g., if you switched on a load, you should see an increase
> in supplied current).
>
> But, again, these are more helpful in detecting FAILED items.

What system would you like to have early warnings for?
Are the warnings needed to indicate operation out of expected limits or to
indicate that maintenance is required, or both?
Without detailed knowledge of the specific system, only speculative answers
can be given.

>
>

Re: Predictive failures

<uvktun$2kjj$1@nnrp.usenet.blueworldhosting.com>

https://news.novabbs.org/tech/article-flat.php?id=136408&group=sci.electronics.design#136408

Newsgroups: sci.electronics.design
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: invalid@invalid.invalid (Edward Rawde)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 00:14:13 -0400
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Lines: 247
Message-ID: <uvktun$2kjj$1@nnrp.usenet.blueworldhosting.com>
References: <uvjn74$d54b$1@dont-email.me> <uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com> <uvkqqu$o5co$1@dont-email.me>
Injection-Date: Tue, 16 Apr 2024 04:14:15 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="86643"; mail-complaints-to="usenet@blueworldhosting.com"
Cancel-Lock: sha1:DI85A2uKP7OK2Ks//uDXOgXGqJ8= sha256:8IOnhv+PUTACyWcZgrk0/8VTn7hPfizXaroOnBYb80w=
sha1:4yw2yxZVv7xNAqRYzPJpvEko3Yo= sha256:9aw9cdGCz+cBdfhqJgDfGOW+XsVk140nmisHWvWCK+Q=
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-MSMail-Priority: Normal
X-Priority: 3
X-RFC2646: Format=Flowed; Response
 by: Edward Rawde - Tue, 16 Apr 2024 04:14 UTC

"Don Y" <blockedofcourse@foo.invalid> wrote in message
news:uvkqqu$o5co$1@dont-email.me...
> On 4/15/2024 1:32 PM, Edward Rawde wrote:
>> "Don Y" <blockedofcourse@foo.invalid> wrote in message
>> news:uvjn74$d54b$1@dont-email.me...
>>> Is there a general rule of thumb for signalling the likelihood of
>>> an "imminent" (for some value of "imminent") hardware failure?
>>
>> My conclusion would be no.
>> Some of my reasons are given below.
>>
>> It always puzzled me how HAL could know that the AE-35 would fail in the
>> near future, but maybe HAL had a motive for lying.
>
> Why does your PC retry failed disk operations?

Because the software designer didn't understand hardware.
The correct approach is to mark that part of the disk as unusable and, if
possible, move any data from it elsewhere quickly.

> If I ask the drive to give
> me LBA 1234, shouldn't it ALWAYS give me LBA1234? Without any data
> corruption
> (CRC error) AND within the normal access time limits defined by the
> location
> of those magnetic domains on the rotating medium?
>
> Why should it attempt to retry this MORE than once?
>
> Now, if you knew your disk drive was repeatedly retrying operations,
> would your confidence in it be unchanged from times when it did not
> exhibit such behavior?

I'd have put an SSD in by now, along with an off site backup of the same
data :)

>
> Assuming you have properly configured an EIA232 interface, why would you
> ever get a parity error? (OVERRUN errors can be the result of an i/f
> that is running too fast for the system on the receiving end) How would
> you even KNOW this was happening?
>
> I suspect everyone who has owned a DVD/CD drive has encountered a
> "slow tray" as the mechanism aged. Or, a tray that wouldn't
> open (of its own accord) as soon/quickly as it used to.

If it hasn't been used for some time then I'm ready with a tiny screwdriver
blade to help it open.
But I forget when I last used an optical drive.

>
> The controller COULD be watching this (cuz it knows when it
> initiated the operation and there is an "end-of-stroke"
> sensor available) and KNOW that the drive belt was stretching
> to the point where it was impacting operation.
>
> [And, that a stretched belt wasn't going to suddenly decide to
> unstretch to fix the problem!]
>
>> Back in that era I was doing a lot of repair work when I should have been
>> doing my homework.
>> So I knew that there were many unrelated kinds of hardware failure.
>
> The goal isn't to predict ALL failures but, rather, to anticipate
> LIKELY failures and treat them before they become an inconvenience
> (or worse).
>
> One morning, the (gas) furnace repeatedly tried to light as the
> thermostat called for heat. Then, a few moments later, the
> safeties would kick in and shut down the gas flow. This attracted my
> attention as the LIT furnace should STAY LIT!
>
> The furnace was too stupid to notice its behavior so would repeat
> this cycle, endlessly.
>
> I stepped in and overrode the thermostat to eliminate the call
> for heat as this behavior couldn't be productive (if something
> truly IS wrong, then why let it continue? and, if there is nothing
> wrong with the controls/mechanism, then clearly it is unable to meet
> my needs so why let it persist in trying?)
>
> [Turns out, there was a city-wide gas shortage so there was enough
> gas available to light the furnace but not enough to bring it up to
> temperature as quickly as the designers had expected]

That's why the furnace designers couldn't have anticipated it.
They did not know that such a condition might occur so never tested for it.

>
>> A component could fail suddenly, such as a short circuit diode, and
>> everything would work fine after replacing it.
>> The cause could perhaps have been a manufacturing defect, such as
>> insufficient cooling due to poor quality assembly, but the exact real
>> cause
>> would never be known.
>
> You don't care about the real cause. Or, even the failure mode.
> You (as user) just don't want to be inconvenienced by the sudden
> loss of the functionality/convenience that the device provided.

There will always be sudden unexpected loss of functionality for reasons
which could not easily be predicted.
People who service lawn mowers in the area where I live are very busy right
now.

>
>> A component could fail suddenly as a side effect of another failure.
>> One short circuit output transistor and several other components could
>> also
>> burn up.
>
> So, if you could predict the OTHER failure...
> Or, that such a failure might occur and lead to the followup failure...
>
>> A component could fail slowly and only become apparent when it got to the
>> stage of causing an audible or visible effect.
>
> But, likely, there was something observable *in* the circuit that
> just hadn't made it to the level of human perception.

Yes a power supply ripple detection circuit could have turned on a warning
LED but that never happened for at least two reasons.
1. The detection circuit would have increased the cost of the equipment and
thus diminished the profit of the manufacturer.
2. The user would not have understood and would have ignored the warning
anyway.

>
>> It would often be easy to locate the dried up electrolytic due to it
>> having
>> already let go of some of its contents.
>>
>> So I concluded that if I wanted to be sure that I could always watch my
>> favourite TV show, we would have to have at least two TVs in the house.
>>
>> If it's not possible to have the equivalent of two TVs then you will want
>> to
>> be in a position to get the existing TV repaired or replaced as quickly as
>> possible.
>
> Two TVs are affordable. Consider two controllers for a wire-EDM machine.
>
> Or, the cost of having that wire-EDM machine *idle* (because you didn't
> have a spare controller!)
>
>> My home wireless Internet system doesn't care if one access point fails,
>> and
>> I would not expect to be able to do anything to predict a time of
>> failure.
>> Experience says a dead unit has power supply issues. Usually external but
>> could be internal.
>
> Again, the goal isn't to predict "time of failure". But, rather, to be
> able to know that "this isn't going to end well" -- with some advance
> notice
> that allows for preemptive action to be taken (and not TOO much advance
> notice that the user ends up replacing items prematurely).

Get feedback from the people who use your equipment.

>
>> I don't think it would be possible to "watch" everything because it's
>> rare
>> that you can properly test a component while it's part of a working
>> system.
>
> You don't have to -- as long as you can observe its effects on other
> parts of the system. E.g., there's no easy/inexpensive way to
> check to see how much the belt on that CD/DVD player has stretched.
> But, you can notice that it HAS stretched (or, some less likely
> change has occurred that similarly interferes with the tray's actions)
> by noting how the activity that it is used for has changed.

Sure but you have to be the operator for that.
So you can be ready to help the tray open when needed.

>
>> These days I would expect to have fun with management asking for software
>> to
>> be able to diagnose and report any hardware failure.
>> Not very easy if the power supply has died.
>
> What if the power supply HASN'T died? What if you are diagnosing the
> likely upcoming failure *of* the power supply?

Then I probably can't, because the power supply may be just a bought in
power supply which was never designed with upcoming failure detection in
mind.

>
> You have ECC memory in most (larger) machines. Do you silently
> expect it to just fix all the errors? Does it have a way of telling you
> how many such errors it HAS corrected? Can you infer the number of
> errors that it *hasn't*?
>
> [Why have ECC at all?]

Things are sometimes done the way they've always been done.
I used to notice a missing chip in the 9th position but, now that you mention
it, the RAM I just looked at has 9 chips on each side.

>
> There are (and have been) many efforts to *predict* lifetimes of
> components (and, systems). And, some work to examine the state
> of systems /in situ/ with an eye towards anticipating their
> likelihood of future failure.


Re: Re:Predictive failures

<uvl2gr$phap$2@dont-email.me>

https://news.novabbs.org/tech/article-flat.php?id=136409&group=sci.electronics.design#136409

Newsgroups: sci.electronics.design
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: blockedofcourse@foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Subject: Re: Re:Predictive failures
Date: Mon, 15 Apr 2024 22:32:04 -0700
Organization: A noiseless patient Spider
Lines: 108
Message-ID: <uvl2gr$phap$2@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me> <uvjobr$dfi2$1@dont-email.me>
<uvkn71$ngqi$2@dont-email.me>
<uvkrig$30nb$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 16 Apr 2024 07:32:13 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8f873457a009428ae193cacdeebfb978";
logging-data="836953"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19dCPpNJf4+9Zet4plQgPUI"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
Thunderbird/102.2.2
Cancel-Lock: sha1:F0mkyhDN1u/Bu5FYxIl9+pP6Yas=
In-Reply-To: <uvkrig$30nb$1@nnrp.usenet.blueworldhosting.com>
Content-Language: en-US
 by: Don Y - Tue, 16 Apr 2024 05:32 UTC

On 4/15/2024 8:33 PM, Edward Rawde wrote:

[Shouldn't that be Edwar D rawdE?]

>>> Current and voltages outside of normal operation?
>>
>> I think "outside" is (often) likely indicative of
>> "something is (already) broken".
>>
>> But, perhaps TRENDS in either/both can be predictive.
>>
>> E.g., if a (sub)circuit has always been consuming X (which
>> is nominal for the design) and, over time, starts to consume
>> 1.1X, is that suggestive that something is in the process of
>> failing?
>
> That depends on many other unknown factors.
> Temperature sensors are common in electronics.
> So is current sensing. Voltage sensing too.

Sensors cost money. And, HAVING data but not knowing how to
USE it is a wasted activity (and cost).

Why not monitor every node in the schematic and compare
them (with dedicated hardware -- that is ALSO monitored??)
with expected operational limits?

Then, design some network to weight the individual
observations to make the prediction?
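The trend idea above (nominal consumption X drifting toward 1.1X) can be sketched in a few lines. Everything here -- the class name, the window size, the 10% threshold -- is an illustrative assumption, not something from the thread:

```python
from collections import deque

class DriftMonitor:
    """Flag drift in a monitored quantity (e.g. supply current)
    relative to its commissioning baseline."""
    def __init__(self, baseline, window=32, ratio=1.1):
        self.baseline = baseline        # nominal consumption "X"
        self.ratio = ratio              # 1.1 => warn when average exceeds 1.1X
        self.samples = deque(maxlen=window)

    def add(self, value):
        # record one periodic measurement
        self.samples.append(value)

    def drifting(self):
        # only judge once a full window of history has accumulated
        if len(self.samples) < self.samples.maxlen:
            return False
        avg = sum(self.samples) / len(self.samples)
        return avg > self.ratio * self.baseline
```

Averaging over a window is what distinguishes a trend from a single noisy sample; the window length trades warning latency against false alarms.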

>> Note that the goal is not to troubleshoot the particular design
>> or its components but, rather, act as an early warning that
>> maintenance may be required (or, that performance may not be
>> what you are expecting/have become accustomed to).
>
> If the system is electronic then you can detect whether currents and/or
> voltages are within expected ranges.
> If they are just a little out of expected range then you might turn on a
> warning LED.
> If they are way out of range then you might tell the power supply to turn
> off quick.
> By all means tell the software what has happened, but don't put software
> between the current sensor and the emergency turn off.

Again, the goal is to be an EARLY warning, not an "Oh, Shit! Kill the power!!"

As such, software is invaluable as designing PREDICTIVE hardware is
harder than designing predictive software (algorithms).

You don't want to tell the user "The battery in your smoke detector
is NOW dead (leaving you vulnerable)" but, rather, "The battery in
your smoke detector WILL cease to be able to provide the power necessary
for the smoke detector to provide the level of protection that you
desire."

And, the WAY that you inform the user has to be "productive/useful".
A smoke detector beeping every minute is likely to find itself unplugged,
leading to exactly the situation that the alert was trying to avoid!

A smoke detector that beeps once a day risks not being heard
(what if the occupant "works nights"?). A smoke detector
that beeps a month in advance of the anticipated failure (and
requires acknowledgement) risks being forgotten -- until
it is forced to beep more persistently (see above).
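One way to frame the cadence trade-off above is an escalation schedule keyed to the predicted time remaining. The thresholds below are purely illustrative assumptions:

```python
def beep_interval_hours(days_to_failure):
    """Escalate advisory cadence as the predicted failure approaches.
    Returns the hours between alerts, or None for 'too early to nag'."""
    if days_to_failure > 30:
        return None      # too far out: an alert would just be forgotten
    if days_to_failure > 7:
        return 24        # once a day; acknowledgement resets the nag
    if days_to_failure > 1:
        return 4         # several times a day, harder to sleep through
    return 0.25          # imminent: every 15 minutes, persistently
```

The point is that the schedule, not just the alert, is part of the design: too early and it is ignored, too late and it is no longer predictive.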

> Be aware that components in monitoring circuits can fail too.

Which is why hardware interlocks are physical switches -- yet
can only be used to protect against certain types of faults
(those that are most costly -- injury or loss of life)

>> But, again, these are more helpful in detecting FAILED items.
>
> What system would you like to have early warnings for?
> Are the warnings needed to indicate operation out of expected limits or to
> indicate that maintenance is required, or both?
> Without detailed knowledge of the specific system, only speculative answers
> can be given.

I'm not looking for speculation. I'm looking for folks who have DONE
such things (designing to speculation is more expensive than just letting
the devices fail when they need to fail!)

E.g., when making tablets, it is possible that a bit of air will
get trapped in the granulation during compression. This is dependent
on a lot of factors -- tablet dimensions, location in the die
where the compression event is happening, characteristics of the
granulation, geometry/condition of the tooling, etc.

But, if this happens, some tens of milliseconds later, the top will "pop"
off the tablet. It now is cosmetically damaged as well as likely out
of specification (amount of "active" present in the dose). You want
to either be able to detect this (100% of the time on 100% of the tablets)
and dynamically discard those units (and only those units!). *OR*,
identify the characteristics of the process that most affect this condition
and *monitor* for them to AVOID the problem.

If that means replacing your tooling more frequently (expensive!),
it can save money in the long run (imagine having to "sort" through
a million tablets each hour to determine if any have popped like this?)
Or, throttling down the press so the compression events are "slower"
(more gradual). Or, moving the event up in the die to provide
a better egress for the trapped air. Or...

TELLING the user that this is happening (or likely to happen, soon)
has real $$$ value. Even better if your device can LEARN which
tablets and conditions will likely lead to this -- and when!

Re: Predictive failures

<uvl30j$phap$3@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136410&group=sci.electronics.design#136410

From: blockedofcourse@foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Mon, 15 Apr 2024 22:40:29 -0700
Message-ID: <uvl30j$phap$3@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me>
<uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com>
<uvkqqu$o5co$1@dont-email.me>
<uvktun$2kjj$1@nnrp.usenet.blueworldhosting.com>
 by: Don Y - Tue, 16 Apr 2024 05:40 UTC

On 4/15/2024 9:14 PM, Edward Rawde wrote:
>>> It always puzzled me how HAL could know that the AE-35 would fail in the
>>> near future, but maybe HAL had a motive for lying.
>>
>> Why does your PC retry failed disk operations?
>
> Because the software designer didn't understand hardware.

Actually, he DID understand the hardware which is why he retried
it instead of ASSUMING every operation would proceed correctly.

[Why bother testing the result code if you never expect a failure?]

> The correct approach is to mark that part of the disk as unusable and, if
> possible, move any data from it elsewhere quick.

That only makes sense if the error is *persistent*. "Shit
happens" and you can get an occasional failed operation when
nothing is truly "broken".

(how do you know the HBA isn't the culprit?)
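A bounded-retry policy that treats one failure as transient and repeated failure as persistent might look like the sketch below; read_fn is a stand-in for a hypothetical driver call, and returning the retry count is what lets a health monitor trend "shit happens" events over time:

```python
def read_block(read_fn, lba, max_retries=1):
    """Attempt a read, retrying at most max_retries times.
    Returns (data, transient_error_count) on success so the caller
    can log how often retries were needed; raises on persistent failure.
    read_fn(lba) is assumed to return (ok, data)."""
    errors = 0
    for _attempt in range(max_retries + 1):
        ok, data = read_fn(lba)
        if ok:
            return data, errors     # errors > 0 means "worked, but trending"
        errors += 1
    raise IOError(f"persistent failure reading LBA {lba}")
```

A drive (or host) that silently retries forever hides exactly the signal -- rising transient-error counts -- that makes prediction possible.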

>> If I ask the drive to give
>> me LBA 1234, shouldn't it ALWAYS give me LBA1234? Without any data
>> corruption
>> (CRC error) AND within the normal access time limits defined by the
>> location
>> of those magnetic domains on the rotating medium?
>>
>> Why should it attempt to retry this MORE than once?
>>
>> Now, if you knew your disk drive was repeatedly retrying operations,
>> would your confidence in it be unchanged from times when it did not
>> exhibit such behavior?
>
> I'd have put an SSD in by now, along with an off site backup of the same
> data :)

So, any problems you have with your SSD, today, should be solved by using the
technology that will be invented 10 years hence! Ah, that's a sound strategy!

>> Assuming you have properly configured an EIA232 interface, why would you
>> ever get a parity error? (OVERRUN errors can be the result of an i/f
>> that is running too fast for the system on the receiving end) How would
>> you even KNOW this was happening?
>>
>> I suspect everyone who has owned a DVD/CD drive has encountered a
>> "slow tray" as the mechanism aged. Or, a tray that wouldn't
>> open (of its own accord) as soon/quickly as it used to.
>
> If it hasn't been used for some time then I'm ready with a tiny screwdriver
> blade to help it open.

Why don't they ship such drives with tiny screwdrivers to make it
easier for EVERY customer to address this problem?

> But I forget when I last used an optical drive.

When the firmware in your SSD corrupts your data, what remedy will
you use?

You're missing the forest for the trees.

>> [Turns out, there was a city-wide gas shortage so there was enough
>> gas available to light the furnace but not enough to bring it up to
>> temperature as quickly as the designers had expected]
>
> That's why the furnace designers couldn't have anticipated it.

Really? You can't anticipate the "gas shutoff" not being in the ON
position? (which would yield the same endless retry cycle)

> They did not know that such a condition might occur so never tested for it.

If they planned on ENDLESSLY retrying, then they must have imagined
some condition COULD occur that would lead to such an outcome.
Else, why not just retry *once* and then give up? Or, not
retry at all?

>>> A component could fail suddenly, such as a short circuit diode, and
>>> everything would work fine after replacing it.
>>> The cause could perhaps have been a manufacturing defect, such as
>>> insufficient cooling due to poor quality assembly, but the exact real
>>> cause
>>> would never be known.
>>
>> You don't care about the real cause. Or, even the failure mode.
>> You (as user) just don't want to be inconvenienced by the sudden
>> loss of the functionality/convenience that the device provided.
>
> There will always be sudden unexpected loss of functionality for reasons
> which could not easily be predicted.

And if they CAN'T be predicted, then they aren't germane to this
discussion, eh?

My concern is for the set of failure modes that can realistically
be anticipated.

I *know* the inverters in my monitors are going to fail. It
would be nice if I knew before I was actively using one when
it went dark!

[But, most users would only use this indication to tell them
to purchase another monitor; "You have been warned!"]

> People who service lawn mowers in the area where I live are very busy right
> now.
>
>>> A component could fail suddenly as a side effect of another failure.
>>> One short circuit output transistor and several other components could
>>> also
>>> burn up.
>>
>> So, if you could predict the OTHER failure...
>> Or, that such a failure might occur and lead to the followup failure...
>>
>>> A component could fail slowly and only become apparent when it got to the
>>> stage of causing an audible or visible effect.
>>
>> But, likely, there was something observable *in* the circuit that
>> just hadn't made it to the level of human perception.
>
> Yes a power supply ripple detection circuit could have turned on a warning
> LED but that never happened for at least two reasons.
> 1. The detection circuit would have increased the cost of the equipment and
> thus diminished the profit of the manufacturer.

That would depend on the market, right? Most of my computers have redundant
"smart" (i.e., internal monitoring and reporting) power supplies. Because
they were marketed to folks who wanted that sort of reliability. Because
a manufacturer who didn't provide that level of AVAILABILITY would quickly
lose market share. The cost of the added components and "handling" is
small compared to the cost of lost opportunity (sales).

> 2. The user would not have understood and would have ignored the warning
> anyway.

That makes assumptions about the market AND the user.

If one of my machines signals a fault, I look to see what it is complaining
about: is it a power supply failure (in which case, I'm now reliant on
a single power supply)? is it a memory failure (in which case, a bank
of memory may have been disabled which means the machine will thrash
more and throughput will drop)? is it a link aggregation error (and
network traffic will suffer)?

If I can't understand these errors, then I either don't buy a product
with that level of reliability *or* have someone on hand who CAN
understand the errors and provide remedies/advice.

Consumers will replace a PC because of malware, trashed registry,
creeping cruft, etc. That's a problem with the consumer buying the
"wrong" sort of computing equipment for his likely method of use.
(buy a Mac?)

>>> My home wireless Internet system doesn't care if one access point fails,
>>> and
>>> I would not expect to be able to do anything to predict a time of
>>> failure.
>>> Experience says a dead unit has power supply issues. Usually external but
>>> could be internal.
>>
>> Again, the goal isn't to predict "time of failure". But, rather, to be
>> able to know that "this isn't going to end well" -- with some advance
>> notice
>> that allows for preemptive action to be taken (and not TOO much advance
>> notice that the user ends up replacing items prematurely).
>
> Get feedback from the people who use your equipment.

Users often don't understand when a device is malfunctioning.
Or, how to report the conditions and symptoms in a meaningful way.

I recall a woman I worked with ~45 years ago sitting, patiently,
waiting for her computer to boot. As I walked past, she asked me how
long it takes for that to happen (floppy based systems). Alarmed
(I had designed the workstations), I asked "How long have you been
waiting?"

Turns out, she had inserted the (8") floppy rotated 90 degrees from
its proper orientation.

How much longer would she have waited had I not walked past?

>>> I don't think it would be possible to "watch" everything because it's
>>> rare
>>> that you can properly test a component while it's part of a working
>>> system.
>>
>> You don't have to -- as long as you can observe its effects on other
>> parts of the system. E.g., there's no easy/inexpensive way to
>> check to see how much the belt on that CD/DVD player has stretched.
>> But, you can notice that it HAS stretched (or, some less likely
>> change has occurred that similarly interferes with the tray's actions)
>> by noting how the activity that it is used for has changed.
>
> Sure but you have to be the operator for that.
> So you can be ready to help the tray open when needed.

One wouldn't bother with a CD/DVD player -- they are too disposable
and reporting errors won't help the user (even though you have a
big ATTACHED display at your disposal!)


Re: Predictive failures

<uvldrf$rpnh$1@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136411&group=sci.electronics.design#136411

From: '''newspam'''@nonad.co.uk (Martin Brown)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 09:45:34 +0100
Message-ID: <uvldrf$rpnh$1@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me>
 by: Martin Brown - Tue, 16 Apr 2024 08:45 UTC

On 15/04/2024 18:13, Don Y wrote:
> Is there a general rule of thumb for signalling the likelihood of
> an "imminent" (for some value of "imminent") hardware failure?

You have to be very careful that the additional complexity doesn't
itself introduce new annoying failure modes. My previous car had
filament bulb failure sensors (new one is LED) of which the one for the
parking light had itself failed - the parking light still worked.
However, the car would greet me with "parking light failure" every time
I started the engine and the main dealer refused to cancel it.

Repair of parking light sensor failure required swapping out the
*entire* front light assembly, since it was assembled with one-time hot glue.
That would be a very expensive "repair" for a trivial fault.

The parking light is not even a required feature.

> I suspect most would involve *relative* changes that would be
> suggestive of changing conditions in the components (and not
> directly related to environmental influences).
>
> So, perhaps, a good strategy is to just "watch" everything and
> notice the sorts of changes you "typically" encounter in the hope
> that something of greater magnitude would be a harbinger...

Monitoring temperature, voltage supply and current consumption isn't a
bad idea. If they get unexpectedly out of line something is wrong.
Likewise with power on self tests you can catch some latent failures
before they actually affect normal operation.
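A minimal sketch of that "unexpectedly out of line" check; the quantities and limit values are assumptions for an imaginary 3.3 V board, not anything from a real design:

```python
# Expected operating ranges (lo, hi) -- illustrative values only.
LIMITS = {
    "temp_C": (0, 85),        # ambient/board temperature
    "volts":  (3.15, 3.45),   # 3.3 V rail +/- ~5%
    "amps":   (0.0, 0.5),     # total supply current
}

def out_of_line(readings, limits=LIMITS):
    """Return the names of any monitored quantities outside
    their expected range; an empty list means all nominal."""
    return [name for name, (lo, hi) in limits.items()
            if not lo <= readings[name] <= hi]
```

The same comparison can run at power-on self test (against known quiescent values) and again periodically during operation.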

--
Martin Brown

Re: Predictive failures

<uvlktt$tad0$1@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136412&group=sci.electronics.design#136412

From: '''newspam'''@nonad.co.uk (Martin Brown)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 11:46:20 +0100
Message-ID: <uvlktt$tad0$1@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me>
<uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com>
<uvkqqu$o5co$1@dont-email.me>
<uvktun$2kjj$1@nnrp.usenet.blueworldhosting.com>
 by: Martin Brown - Tue, 16 Apr 2024 10:46 UTC

On 16/04/2024 05:14, Edward Rawde wrote:
> "Don Y" <blockedofcourse@foo.invalid> wrote in message
> news:uvkqqu$o5co$1@dont-email.me...
>> On 4/15/2024 1:32 PM, Edward Rawde wrote:
>>> "Don Y" <blockedofcourse@foo.invalid> wrote in message
>>> news:uvjn74$d54b$1@dont-email.me...
>>>> Is there a general rule of thumb for signalling the likelihood of
>>>> an "imminent" (for some value of "imminent") hardware failure?
>>>
>>> My conclusion would be no.
>>> Some of my reasons are given below.
>>>
>>> It always puzzled me how HAL could know that the AE-35 would fail in the
>>> near future, but maybe HAL had a motive for lying.
>>
>> Why does your PC retry failed disk operations?
>
> Because the software designer didn't understand hardware.
> The correct approach is to mark that part of the disk as unusable and, if
> possible, move any data from it elsewhere quick.
>
>> If I ask the drive to give
>> me LBA 1234, shouldn't it ALWAYS give me LBA1234? Without any data
>> corruption
>> (CRC error) AND within the normal access time limits defined by the
>> location
>> of those magnetic domains on the rotating medium?
>>
>> Why should it attempt to retry this MORE than once?
>>
>> Now, if you knew your disk drive was repeatedly retrying operations,
>> would your confidence in it be unchanged from times when it did not
>> exhibit such behavior?
>
> I'd have put an SSD in by now, along with an off site backup of the same
> data :)
>
>>
>> Assuming you have properly configured an EIA232 interface, why would you
>> ever get a parity error? (OVERRUN errors can be the result of an i/f
>> that is running too fast for the system on the receiving end) How would
>> you even KNOW this was happening?
>>
>> I suspect everyone who has owned a DVD/CD drive has encountered a
>> "slow tray" as the mechanism aged. Or, a tray that wouldn't
>> open (of its own accord) as soon/quickly as it used to.
>
> If it hasn't been used for some time then I'm ready with a tiny screwdriver
> blade to help it open.
> But I forget when I last used an optical drive.
>
>>
>> The controller COULD be watching this (cuz it knows when it
>> initiated the operation and there is an "end-of-stroke"
>> sensor available) and KNOW that the drive belt was stretching
>> to the point where it was impacting operation.
>>
>> [And, that a stretched belt wasn't going to suddenly decide to
>> unstretch to fix the problem!]
>>
>>> Back in that era I was doing a lot of repair work when I should have been
>>> doing my homework.
>>> So I knew that there were many unrelated kinds of hardware failure.
>>
>> The goal isn't to predict ALL failures but, rather, to anticipate
>> LIKELY failures and treat them before they become an inconvenience
>> (or worse).
>>
>> One morning, the (gas) furnace repeatedly tried to light as the
>> thermostat called for heat. Then, a few moments later, the
>> safeties would kick in and shut down the gas flow. This attracted my
>> attention as the LIT furnace should STAY LIT!
>>
>> The furnace was too stupid to notice its behavior so would repeat
>> this cycle, endlessly.
>>
>> I stepped in and overrode the thermostat to eliminate the call
>> for heat as this behavior couldn't be productive (if something
>> truly IS wrong, then why let it continue? and, if there is nothing
>> wrong with the controls/mechanism, then clearly it is unable to meet
>> my needs so why let it persist in trying?)
>>
>> [Turns out, there was a city-wide gas shortage so there was enough
>> gas available to light the furnace but not enough to bring it up to
>> temperature as quickly as the designers had expected]
>
> That's why the furnace designers couldn't have anticipated it.
> They did not know that such a condition might occur so never tested for it.
>
>>
>>> A component could fail suddenly, such as a short circuit diode, and
>>> everything would work fine after replacing it.
>>> The cause could perhaps have been a manufacturing defect, such as
>>> insufficient cooling due to poor quality assembly, but the exact real
>>> cause
>>> would never be known.
>>
>> You don't care about the real cause. Or, even the failure mode.
>> You (as user) just don't want to be inconvenienced by the sudden
>> loss of the functionality/convenience that the device provided.
>
> There will always be sudden unexpected loss of functionality for reasons
> which could not easily be predicted.
> People who service lawn mowers in the area where I live are very busy right
> now.
>
>>
>>> A component could fail suddenly as a side effect of another failure.
>>> One short circuit output transistor and several other components could
>>> also
>>> burn up.
>>
>> So, if you could predict the OTHER failure...
>> Or, that such a failure might occur and lead to the followup failure...
>>
>>> A component could fail slowly and only become apparent when it got to the
>>> stage of causing an audible or visible effect.
>>
>> But, likely, there was something observable *in* the circuit that
>> just hadn't made it to the level of human perception.
>
> Yes a power supply ripple detection circuit could have turned on a warning
> LED but that never happened for at least two reasons.
> 1. The detection circuit would have increased the cost of the equipment and
> thus diminished the profit of the manufacturer.
> 2. The user would not have understood and would have ignored the warning
> anyway.
>
>>
>>> It would often be easy to locate the dried up electrolytic due to it
>>> having
>>> already let go of some of its contents.
>>>
>>> So I concluded that if I wanted to be sure that I could always watch my
>>> favourite TV show, we would have to have at least two TVs in the house.
>>>
>>> If it's not possible to have the equivalent of two TVs then you will want
>>> to
>>> be in a position to get the existing TV repaired or replaced as quicky as
>>> possible.
>>
>> Two TVs are affordable. Consider two controllers for a wire-EDM machine.
>>
>> Or, the cost of having that wire-EDM machine *idle* (because you didn't
>> have a spare controller!)
>>
>>> My home wireless Internet system doesn't care if one access point fails,
>>> and
>>> I would not expect to be able to do anything to predict a time of
>>> failure.
>>> Experience says a dead unit has power supply issues. Usually external but
>>> could be internal.
>>
>> Again, the goal isn't to predict "time of failure". But, rather, to be
>> able to know that "this isn't going to end well" -- with some advance
>> notice
>> that allows for preemptive action to be taken (and not TOO much advance
>> notice that the user ends up replacing items prematurely).
>
> Get feedback from the people who use your equipment.
>
>>
>>> I don't think it would be possible to "watch" everything because it's
>>> rare
>>> that you can properly test a component while it's part of a working
>>> system.
>>
>> You don't have to -- as long as you can observe its effects on other
>> parts of the system. E.g., there's no easy/inexpensive way to
>> check to see how much the belt on that CD/DVD player has stretched.
>> But, you can notice that it HAS stretched (or, some less likely
>> change has occurred that similarly interferes with the tray's actions)
>> by noting how the activity that it is used for has changed.
>
> Sure but you have to be the operator for that.
> So you can be ready to help the tray open when needed.
>
>>
>>> These days I would expect to have fun with management asking for software
>>> to
>>> be able to diagnose and report any hardware failure.
>>> Not very easy if the power supply has died.
>>
>> What if the power supply HASN'T died? What if you are diagnosing the
>> likely upcoming failure *of* the power supply?
>
> Then I probably can't, because the power supply may be just a bought in
> power supply which was never designed with upcoming failure detection in
> mind.
>
>>
>> You have ECC memory in most (larger) machines. Do you silently
>> expect it to just fix all the errors? Does it have a way of telling you
>> how many such errors it HAS corrected? Can you infer the number of
>> errors that it *hasn't*?
>>
>> [Why have ECC at all?]
>
> Things are sometimes done the way they've always been done.
> I used to notice a missing chip in the 9th position but now you mention it
> the RAM I just looked at has 9 chips each side.
>
>>
>> There are (and have been) many efforts to *predict* lifetimes of
>> components (and, systems). And, some work to examine the state
>> of systems /in situ/ with an eye towards anticipating their
>> likelihood of future failure.
>
> I'm sure that's true.
>
>>
>> [The former has met with poor results -- predicting the future
>> without a position in its past is difficult. And, knowing how
>> a device is "stored" when not powered on also plays a role
>> in its future survival! (is there some reason YOUR devices
>> can't power themselves on, periodically; notice the environmental
>> conditions; log them and then power back off)]
>>
>> The question is one of a practical nature; how much does it cost
>> you to add this capability to a device and how accurately can it
>> make those predictions (thus avoiding some future cost/inconvenience).
>>
>> For small manufacturers, the research required is likely not
>> cost-effective;
>> just take your best stab at it and let the customer "buy a replacement"
>> when the time comes (hopefully, outside of your warranty window).
>>
>> But, anything you can do to minimize this TCO issue gives your product
>> an edge over competitors. Given that most devices are smart, nowadays,
>> it seems obvious that they should undertake as much of this task as
>> they can (conveniently) afford.
>>
>> <https://www.sciencedirect.com/science/article/abs/pii/S0026271409003667>
>>
>> <https://www.researchgate.net/publication/3430090_In_Situ_Temperature_Measurement_of_a_Notebook_Computer-A_Case_Study_in_Health_and_Usage_Monitoring_of_Electronics>
>>
>> <https://www.tandfonline.com/doi/abs/10.1080/16843703.2007.11673148>
>>
>> <https://www.prognostics.umd.edu/calcepapers/02_V.Shetty_remaingLifeAssesShuttleRemotemanipulatorSystem_22ndSpaceSimulationConf.pdf>
>>
>> <https://ieeexplore.ieee.org/document/1656125>
>>
>> <https://journals.sagepub.com/doi/10.1177/0142331208092031>
>>
>> [Sorry, I can't publish links to the full articles]
>>
>
>


Re: Predictive failures

<uvln9b$trln$2@dont-email.me>


https://news.novabbs.org/tech/article-flat.php?id=136413&group=sci.electronics.design#136413

From: blockedofcourse@foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 04:26:28 -0700
Message-ID: <uvln9b$trln$2@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me> <uvldrf$rpnh$1@dont-email.me>
 by: Don Y - Tue, 16 Apr 2024 11:26 UTC

On 4/16/2024 1:45 AM, Martin Brown wrote:
> On 15/04/2024 18:13, Don Y wrote:
>> Is there a general rule of thumb for signalling the likelihood of
>> an "imminent" (for some value of "imminent") hardware failure?
>
> You have to be very careful that the additional complexity doesn't itself
> introduce new annoying failure modes.

*Or*, decrease the reliability of the device, in general.

> My previous car had filament bulb failure
> sensors (new one is LED) of which the one for the parking light had itself
failed - the parking light still worked. However, the car would greet me with
> "parking light failure" every time I started the engine and the main dealer
> refused to cancel it.

My goal is to provide *advisories*. You don't want to constrain the
user.

Smoke detectors that pester you with "replace battery" alerts are nags.
A car that refuses to start unless the seat belts are fastened is a nag.

You shouldn't require a third party to enable you to ignore an
advisory. But, it's OK to require the user to acknowledge that
advisory!

> Repair of parking light sensor failure required swapping out the *entire* front
> light assembly, since it was assembled with one-time hot glue. That would be a very
> expensive "repair" for a trivial fault.
>
> The parking light is not even a required feature.
>
>> I suspect most would involve *relative* changes that would be
>> suggestive of changing conditions in the components (and not
>> directly related to environmental influences).
>>
>> So, perhaps, a good strategy is to just "watch" everything and
>> notice the sorts of changes you "typically" encounter in the hope
>> that something of greater magnitude would be a harbinger...
>
> Monitoring temperature, voltage supply and current consumption isn't a bad
> idea. If they get unexpectedly out of line something is wrong.

Extremes are easy to detect -- but often indicate failures.
E.g., a short, an open.

The problem is sorting out what magnitude changes are significant
and which are normal variation.

I think being able to track history gives you a leg up in that
it gives you a better idea of what MIGHT be normal instead of
just looking at an instant in time.
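Tracking history to separate normal variation from a significant change can be as simple as keeping a running mean and deviation (Welford's online method); the 3-sigma threshold below is an assumed choice, and the class name is illustrative:

```python
import math

class Baseline:
    """Learn what 'normal' looks like from history, then judge whether
    a new observation is outside normal variation."""
    def __init__(self, k=3.0):
        self.n, self.mean, self.m2, self.k = 0, 0.0, 0.0, k

    def update(self, x):
        # Welford's incremental update of mean and sum of squared deviations
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def significant(self, x):
        # deviation beyond k standard deviations of accumulated history
        if self.n < 2:
            return False        # no history yet: can't judge
        sd = math.sqrt(self.m2 / (self.n - 1))
        return sd > 0 and abs(x - self.mean) > self.k * sd
```

The incremental form matters for an embedded monitor: it needs only three stored numbers, not the whole sample history.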

> Likewise with
> power on self tests you can catch some latent failures before they actually
> affect normal operation.

POST is seldom executed as devices tend to run 24/7/365.
So, I have to design runtime BIST support that can, hopefully,
coax this information from a *running* system without interfering
with that operation.

This puts constraints on how you operate the hardware (unless you
want to add lots of EXTRA hardware to extract these observations).

E.g., if you can control N loads, then individually (sequentially)
activating them and noticing the delta power consumption reveals
more than just enabling ALL that need to be enabled and only seeing
the aggregate of those loads.

This can also simplify gross failure detection if part of the
normal control strategy.
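A hypothetical sketch of that sequential-activation idea (the load names,
currents, and tolerance below are invented for illustration):

```python
def survey_loads(loads, read_current, switch):
    """Walk N controllable loads one at a time, recording the delta in
    supply current each contributes.  'switch(name, on)' drives a load,
    'read_current()' returns total supply current in amps."""
    baseline = read_current()             # everything off
    deltas = {}
    for name in loads:
        switch(name, True)
        deltas[name] = read_current() - baseline
        switch(name, False)
    return deltas

def check_against_nominal(deltas, nominal, tolerance=0.20):
    """Flag loads whose measured draw strays more than 'tolerance'
    (fractional) from the expected value -- e.g. a heater element
    going open or a motor winding shorting turns."""
    return {name: amps for name, amps in deltas.items()
            if abs(amps - nominal[name]) > tolerance * nominal[name]}
```

Enabling all the loads at once would only reveal the aggregate; walking
them one at a time attributes any anomaly to a specific load.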

E.g., I designed a medical instrument many years ago that had an
external "sensor array". As that could be unplugged at any time,
I had to continually monitor for its disconnection. At the same
time, individual sensors in the array could be "spoiled" by
spilled reagents. Yet, the other sensors shouldn't be compromised
or voided just because of the failure of certain ones.

Recognizing that this sort of thing COULD happen in normal use
was the biggest part of the design; the hardware and software
to actually handle these exceptions was then straightforward.
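A minimal sketch of that per-sensor exception handling, with invented
names (not the actual instrument's code): keep a health mask per channel
so a spoiled sensor is retired without voiding its neighbors, while
array disconnection is handled as a separate, whole-array event.

```python
class SensorArray:
    """Track per-channel health so one spoiled sensor doesn't void the
    rest, and detect whole-array disconnection separately."""

    def __init__(self, n_channels):
        self.ok = [True] * n_channels

    def read(self, raw, connected):
        """'raw' is a list of channel readings (None = an implausible
        value already rejected upstream); 'connected' is the array's
        presence signal.  Returns {channel: reading} for healthy channels."""
        if not connected:
            raise ConnectionError("sensor array unplugged")
        out = {}
        for ch, value in enumerate(raw):
            if value is None:
                self.ok[ch] = False      # mark spoiled, keep others alive
            elif self.ok[ch]:
                out[ch] = value
        return out
```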

Note that some failures may not be possible to recover from
without adding significant cost (and other failure modes).
So, it's a value decision as to what you support and what
you "tolerate".

Re: Predictive failures

https://news.novabbs.org/tech/article-flat.php?id=136414&group=sci.electronics.design#136414

From: g@crcomp.net (Don)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 13:25:07 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <20240416a@crcomp.net>
References: <uvjn74$d54b$1@dont-email.me>
 by: Don - Tue, 16 Apr 2024 13:25 UTC

Don Y wrote:
> Is there a general rule of thumb for signalling the likelihood of
> an "imminent" (for some value of "imminent") hardware failure?
>
> I suspect most would involve *relative* changes that would be
> suggestive of changing conditions in the components (and not
> directly related to environmental influences).
>
> So, perhaps, a good strategy is to just "watch" everything and
> notice the sorts of changes you "typically" encounter in the hope
> that something of greater magnitude would be a harbinger...

A singular speculative spitball - the capacitive marker:

In-situ Prognostic Method of Power MOSFET Based on Miller Effect

... This paper presents a new in-situ prognosis method for
MOSFET based on miller effect. According to the theory
analysis, simulation and experiment results, the miller
platform voltage is identified as a new degradation
precursor ...

(10.1109/PHM.2017.8079139)

Danke,

--
Don, KB7RPU, https://www.qsl.net/kb7rpu
There was a young lady named Bright
Whose speed was far faster than light;
She set out one day
In a relative way
And returned on the previous night.

Re: Predictive failures

https://news.novabbs.org/tech/article-flat.php?id=136415&group=sci.electronics.design#136415

From: joegwinn@comcast.net (Joe Gwinn)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 09:54:40 -0400
Message-ID: <070t1j1ct6dhfvk48pvuk12l2g1ip9v15a@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com> <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com> <rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com> <uvki1n$iqfd$1@dont-email.me>
Lines: 82
 by: Joe Gwinn - Tue, 16 Apr 2024 13:54 UTC

On Tue, 16 Apr 2024 00:51:03 -0000 (UTC), Phil Hobbs
<pcdhSpamMeSenseless@electrooptical.net> wrote:

>Joe Gwinn <joegwinn@comcast.net> wrote:
>> On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:
>>
>>> On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>>> wrote:
>>>
>>>> On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>>> <blockedofcourse@foo.invalid> wrote:
>>>>
>>>>> Is there a general rule of thumb for signalling the likelihood of
>>>>> an "imminent" (for some value of "imminent") hardware failure?
>>>>>
>>>>> I suspect most would involve *relative* changes that would be
>>>>> suggestive of changing conditions in the components (and not
>>>>> directly related to environmental influences).
>>>>>
>>>>> So, perhaps, a good strategy is to just "watch" everything and
>>>>> notice the sorts of changes you "typically" encounter in the hope
>>>>> that something of greater magnitude would be a harbinger...
>>>>
>>>> There is a standard approach that may work: Measure the level and
>>>> trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>>> When connections (perhaps within a package) start to fail, the flicker
>>>> level rises. The actual frequency monitored isn't all that critical.
>>>>
>>>> Joe Gwinn
>>>
>>> Do connections "start to fail" ?
>>
>> Yes, they do, in things like vias. I went through a big drama where a
>> critical bit of radar logic circuitry would slowly go nuts.
>>
>> It turned out that the copper plating on the walls of the vias was
>> suffering from low-cycle fatigue during temperature cycling and slowly
>> breaking, one little crack at a time, until it went open. If you
>> measured the resistance to parts per million (6.5 digit DMM), sampling
>> at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
>> also measure a copper line, and divide the via-chain resistance by the
>> no-via resistance, to correct for temperature changes.
>>
>> The solution was to redesign the vias, mainly to increase the critical
>> volume of copper. And modern SMD designs have less and less copper
>> volume.
>>
>> I bet precision resistors can also be measured this way.
>>
>>
>>> I don't think I've ever owned a piece of electronic equipment that
>>> warned me of an impending failure.
>>
>> Onset of smoke emission is a common sign.
>>
>>
>>> Cars do, for some failure modes, like low oil level.
>>
>> The industrial method for big stuff is accelerometers attached near
>> the bearings, and listen for excessive rotation-correlated (not
>> necessarily harmonic) noise.
>
>There are a number of instruments available that look for metal particles
>in the lubricating oil.

Yes.

The old-school version was a magnetic drain plug, which one inspected
for clinging iron chips or dust, also serving to trap those chips. The
newer-school version was to send a sample of the dirty oil to the lab
for microscope and chemical analysis. There are companies that will
take your old lubrication oil and reprocess it, yielding new oil.

If there was an oil filter, inspect the filter surface.

And when one was replacing the oil in the gear case, wipe the bottom
with a white rag, and look at the rag.

Nobody did electronic testing until very recently, because even
expensive electronics were far too unreliable and fragile.

Joe Gwinn

Re: Predictive failures

https://news.novabbs.org/tech/article-flat.php?id=136416&group=sci.electronics.design#136416

From: joegwinn@comcast.net (Joe Gwinn)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 10:19:00 -0400
Message-ID: <v91t1j914boel60v78kspq9u9rd23jbsai@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com> <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com> <rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com> <pbdr1j11kj8sdfrtu4erc8c67s1g8dos9m@4ax.com>
Lines: 104
 by: Joe Gwinn - Tue, 16 Apr 2024 14:19 UTC

On Mon, 15 Apr 2024 16:26:35 -0700, john larkin <jl@650pot.com> wrote:

>On Mon, 15 Apr 2024 18:03:23 -0400, Joe Gwinn <joegwinn@comcast.net>
>wrote:
>
>>On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:
>>
>>>On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>>>wrote:
>>>
>>>>On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>>><blockedofcourse@foo.invalid> wrote:
>>>>
>>>>>Is there a general rule of thumb for signalling the likelihood of
>>>>>an "imminent" (for some value of "imminent") hardware failure?
>>>>>
>>>>>I suspect most would involve *relative* changes that would be
>>>>>suggestive of changing conditions in the components (and not
>>>>>directly related to environmental influences).
>>>>>
>>>>>So, perhaps, a good strategy is to just "watch" everything and
>>>>>notice the sorts of changes you "typically" encounter in the hope
>>>>>that something of greater magnitude would be a harbinger...
>>>>
>>>>There is a standard approach that may work: Measure the level and
>>>>trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>>>When connections (perhaps within a package) start to fail, the flicker
>>>>level rises. The actual frequency monitored isn't all that critical.
>>>>
>>>>Joe Gwinn
>>>
>>>Do connections "start to fail" ?
>>
>>Yes, they do, in things like vias. I went through a big drama where a
>>critical bit of radar logic circuitry would slowly go nuts.
>>
>>It turned out that the copper plating on the walls of the vias was
>>suffering from low-cycle fatigue during temperature cycling and slowly
>>breaking, one little crack at a time, until it went open. If you
>>measured the resistance to parts per million (6.5 digit DMM), sampling
>>at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
>>also measure a copper line, and divide the via-chain resistance by the
>>no-via resistance, to correct for temperature changes.
>
>But nobody is going to monitor every via on a PCB, even if it were
>possible.

It was not possible to test the vias on the failing logic board, but
we knew from metallurgical cut, polish, and inspect studies of failed
boards that it was the vias that were failing.

>One could instrument a PCB fab test board, I guess. But DC tests would
>be fine.

What was being tested was a fab test board that had both the series
via chain path and the no-via path of roughly the same DC resistance,
set up so we could do 4-wire Kelvin resistance measurements of each
path independent of the other path.
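The temperature-correction trick quoted above -- dividing the via-chain
resistance by the no-via copper line so the tempco cancels -- and the
0.1 Hz flicker watch might be sketched like this (a hypothetical
illustration, not the actual test code):

```python
import cmath
import math

def temperature_corrected_ratio(via_chain_ohms, no_via_ohms):
    """Divide the via-chain resistance by the via-free copper line's
    resistance; copper's tempco affects both paths equally and cancels,
    leaving via degradation."""
    return [v / c for v, c in zip(via_chain_ohms, no_via_ohms)]

def band_power(samples, sample_hz, target_hz):
    """Single-bin DFT estimate of power at 'target_hz' -- a crude
    stand-in for a proper PSD, good enough to watch the trend of the
    0.1 Hz flicker level in 1 Hz-sampled resistance ratios."""
    n = len(samples)
    mean = sum(samples) / n
    k = round(target_hz * n / sample_hz)          # nearest DFT bin
    acc = sum((s - mean) * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    return (abs(acc) / n) ** 2
```

Trending `band_power(ratio_window, 1.0, 0.1)` over successive windows,
a sustained rise would be the rising-flicker precursor described above.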

>We have one board with over 4000 vias, but they are mostly in
>parallel.

This can also be tested, using a 6.5-digit DMM intended for
measuring very low resistance values. A change of one part in 4,000
is huge to a 6.5-digit instrument. The conductance will decline
linearly as vias fail one by one.
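That one-part-in-4,000 figure is just parallel-conductance arithmetic; a
quick sketch, assuming ~4,000 identical vias:

```python
def fractional_resistance_rise(n_total, n_failed):
    """With n_total identical vias in parallel, each open via removes
    1/n_total of the conductance, so the net resistance rises by the
    reciprocal of the remaining fraction."""
    remaining = n_total - n_failed
    return n_total / remaining - 1.0

# One open via out of 4,000 raises the net resistance by 1/3999,
# i.e. about 250 ppm -- well within reach of a 6.5-digit instrument.
```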

>>The solution was to redesign the vias, mainly to increase the critical
>>volume of copper. And modern SMD designs have less and less copper
>>volume.
>>
>>I bet precision resistors can also be measured this way.
>>
>>
>>>I don't think I've ever owned a piece of electronic equipment that
>>>warned me of an impending failure.
>>
>>Onset of smoke emission is a common sign.
>>
>>
>>>Cars do, for some failure modes, like low oil level.
>>
>>The industrial method for big stuff is accelerometers attached near
>>the bearings, and listen for excessive rotation-correlated (not
>>necessarily harmonic) noise.
>
>Big ships that I've worked on have a long propeller shaft in the shaft
>alley, a long tunnel where nobody often goes. They have magnetic shaft
>runout sensors and shaft bearing temperature monitors.
>
>They measure shaft torque and SHP too, from the shaft twist.

Yep. And these kinds of things fail slowly. At first.

>I liked hiding out in the shaft alley. It was private and cool, that
>giant shaft slowly rotating.

Probably had a calming flowing water sound as well.

Joe Gwinn

Re: Predictive failures

https://news.novabbs.org/tech/article-flat.php?id=136417&group=sci.electronics.design#136417

From: invalid@invalid.invalid (Edward Rawde)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 11:07:27 -0400
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Lines: 25
Message-ID: <uvm47h$1s5b$1@nnrp.usenet.blueworldhosting.com>
References: <uvjn74$d54b$1@dont-email.me> <uvk2sk$1p01$1@nnrp.usenet.blueworldhosting.com> <uvkqqu$o5co$1@dont-email.me> <uvktun$2kjj$1@nnrp.usenet.blueworldhosting.com> <uvl30j$phap$3@dont-email.me>
 by: Edward Rawde - Tue, 16 Apr 2024 15:07 UTC

"Don Y" <blockedofcourse@foo.invalid> wrote in message
news:uvl30j$phap$3@dont-email.me...
> On 4/15/2024 9:14 PM, Edward Rawde wrote:
>>>> It always puzzled me how HAL could know that the AE-35 would fail in
>>>> the
>>>> near future, but maybe HAL had a motive for lying.
>>>
>>> Why does your PC retry failed disk operations?
>>
>> Because the software designer didn't understand hardware.
>
> Actually, he DID understand the hardware which is why he retried
> it instead of ASSUMING every operation would proceed correctly.
>
> ....
>
> When the firmware in your SSD corrupts your data, what remedy will
> you use?

Replace drive and restore backup.
It's happened a few times; a friend had one of those Amazon SSDs
that is really 16 GB but reports itself to the OS as 1 TB.

Re: Re:Predictive failures

https://news.novabbs.org/tech/article-flat.php?id=136418&group=sci.electronics.design#136418

From: invalid@invalid.invalid (Edward Rawde)
Newsgroups: sci.electronics.design
Subject: Re: Re:Predictive failures
Date: Tue, 16 Apr 2024 11:10:40 -0400
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Lines: 21
Message-ID: <uvm4di$s98$1@nnrp.usenet.blueworldhosting.com>
References: <uvjn74$d54b$1@dont-email.me> <uvjobr$dfi2$1@dont-email.me> <uvkn71$ngqi$2@dont-email.me> <uvkrig$30nb$1@nnrp.usenet.blueworldhosting.com> <uvl2gr$phap$2@dont-email.me>
 by: Edward Rawde - Tue, 16 Apr 2024 15:10 UTC

"Don Y" <blockedofcourse@foo.invalid> wrote in message
news:uvl2gr$phap$2@dont-email.me...
> On 4/15/2024 8:33 PM, Edward Rawde wrote:
>
> [Shouldn't that be Edwar D rawdE?]
>

I don't mind how you pronounce it.

>
....
>
> A smoke detector that beeps once a day risks not being heard

Reminds me of a tenant who just removed the battery to stop the annoying
beeping.

>
....

Re: Predictive failures

https://news.novabbs.org/tech/article-flat.php?id=136419&group=sci.electronics.design#136419

From: jjSNIPlarkin@highNONOlandtechnology.com (John Larkin)
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 08:16:04 -0700
Organization: Highland Tech
Reply-To: xx@yy.com
Message-ID: <5c4t1jt3er7a51macq5mnl8gfsuaipuv2b@4ax.com>
References: <uvjn74$d54b$1@dont-email.me> <jg0r1j1r2cdlnhev0v1gaogd3fj0kmdiim@4ax.com> <0s1r1jhb5vfe7lvopuvfk4ndkbt54ud3d9@4ax.com> <rh7r1jhtvqivb43vmt3u9d0snah8fu4pjn@4ax.com> <pbdr1j11kj8sdfrtu4erc8c67s1g8dos9m@4ax.com> <v91t1j914boel60v78kspq9u9rd23jbsai@4ax.com>
Lines: 146
 by: John Larkin - Tue, 16 Apr 2024 15:16 UTC

On Tue, 16 Apr 2024 10:19:00 -0400, Joe Gwinn <joegwinn@comcast.net>
wrote:

>On Mon, 15 Apr 2024 16:26:35 -0700, john larkin <jl@650pot.com> wrote:
>
>>On Mon, 15 Apr 2024 18:03:23 -0400, Joe Gwinn <joegwinn@comcast.net>
>>wrote:
>>
>>>On Mon, 15 Apr 2024 13:05:40 -0700, john larkin <jl@650pot.com> wrote:
>>>
>>>>On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn <joegwinn@comcast.net>
>>>>wrote:
>>>>
>>>>>On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>>>><blockedofcourse@foo.invalid> wrote:
>>>>>
>>>>>>Is there a general rule of thumb for signalling the likelihood of
>>>>>>an "imminent" (for some value of "imminent") hardware failure?
>>>>>>
>>>>>>I suspect most would involve *relative* changes that would be
>>>>>>suggestive of changing conditions in the components (and not
>>>>>>directly related to environmental influences).
>>>>>>
>>>>>>So, perhaps, a good strategy is to just "watch" everything and
>>>>>>notice the sorts of changes you "typically" encounter in the hope
>>>>>>that something of greater magnitude would be a harbinger...
>>>>>
>>>>>There is a standard approach that may work: Measure the level and
>>>>>trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>>>>When connections (perhaps within a package) start to fail, the flicker
>>>>>level rises. The actual frequency monitored isn't all that critical.
>>>>>
>>>>>Joe Gwinn
>>>>
>>>>Do connections "start to fail" ?
>>>
>>>Yes, they do, in things like vias. I went through a big drama where a
>>>critical bit of radar logic circuitry would slowly go nuts.
>>>
>>>It turned out that the copper plating on the walls of the vias was
>>>suffering from low-cycle fatigue during temperature cycling and slowly
>>>breaking, one little crack at a time, until it went open. If you
>>>measured the resistance to parts per million (6.5 digit DMM), sampling
>>>at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
>>>also measure a copper line, and divide the via-chain resistance by the
>>>no-via resistance, to correct for temperature changes.
>>
>>But nobody is going to monitor every via on a PCB, even if it were
>>possible.
>
>It was not possible to test the vias on the failing logic board, but
>we knew from metallurgical cut, polish, and inspect studies of failed
>boards that it was the vias that were failing.
>
>
>>One could instrument a PCB fab test board, I guess. But DC tests would
>>be fine.
>
>What was being tested was a fab test board that had both the series
>via chain path and the no-via path of roughly the same DC resistance,
>set up so we could do 4-wire Kelvin resistance measurements of each
>path independent of the other path.

Yes, but the question was whether one could predict the failure of an
operating electronic gadget. The answer is mostly NO.

We had a visit from the quality team from a giant company that you
have heard of. They wanted us to trend analyze all the power supplies
on our boards and apply a complex algorithm to predict failures. It
was total nonsense, basically predicting the future by zooming in on
random noise with a big 1/f component, just like climate prediction.

>
>
>>We have one board with over 4000 vias, but they are mostly in
>>parallel.
>
>This can also be tested, using a 6.5-digit DMM intended for
>measuring very low resistance values. A change of one part in 4,000
>is huge to a 6.5-digit instrument. The conductance will decline
>linearly as vias fail one by one.
>
>

Millikelvin temperature changes would make more signal than a failing
via.

>>>The solution was to redesign the vias, mainly to increase the critical
>>>volume of copper. And modern SMD designs have less and less copper
>>>volume.
>>>
>>>I bet precision resistors can also be measured this way.
>>>
>>>
>>>>I don't think I've ever owned a piece of electronic equipment that
>>>>warned me of an impending failure.
>>>
>>>Onset of smoke emission is a common sign.
>>>
>>>
>>>>Cars do, for some failure modes, like low oil level.
>>>
>>>The industrial method for big stuff is accelerometers attached near
>>>the bearings, and listen for excessive rotation-correlated (not
>>>necessarily harmonic) noise.
>>
>>Big ships that I've worked on have a long propeller shaft in the shaft
>>alley, a long tunnel where nobody often goes. They have magnetic shaft
>>runout sensors and shaft bearing temperature monitors.
>>
>>They measure shaft torque and SHP too, from the shaft twist.
>
>Yep. And these kinds of things fail slowly. At first.

They could repair a bearing at sea, given a heads-up about violent
failure. A serious bearing failure on a single-screw machine means
getting a seagoing tug.

The main engine gearbox had padlocks on the covers.

There was also a chem lab to analyze oil and water and such, looking
for contaminants that might suggest something going on.

>
>
>>I liked hiding out in the shaft alley. It was private and cool, that
>>giant shaft slowly rotating.
>
>Probably had a calming flowing water sound as well.

Yes, cool and beautiful and serene after the heat and noise and
vibration of the engine room. A quiet 32,000 horsepower.

It was fun being an electronic guru on sea trials of a ship full of
big hairy Popeye types. I, skinny gawky kid, got my own stateroom when
other tech reps slept in cots in the hold.

Have you noticed how many lumberjack types are afraid of electricity?
That can be funny.

>
>Joe Gwinn
