---
layout: post
title: "The problem with ML"
tags:
- software
- ml
- aws
- databricks
---

The holidays are the time of year when I typically field a lot of questions
from relatives about technology or the tech industry, and this year my favorite
questions were around **AI**. (*insert your own scary music*) Machine learning
(ML) and artificial intelligence (AI) are being widely deployed, and I have
some **Problems™** with that. Machine learning is not a new domain; the
practices commonly accepted as "ML" have been used for quite a while to support
search and recommendation use-cases. In fact, my day job includes supporting
data scientists and others who are actively creating models and deploying them
to production. _However_, many of my relatives outside of the tech industry
believe that "AI" is going to replace people, their jobs, and/or run the
future. I genuinely hope AI/ML comes nowhere close to the future imagined by
members of my family.

Like many pieces of technology, ML is not inherently good or bad, but the
problem with ML as it is applied today is that **its application is far
outpacing our understanding of its consequences**.

Brian Kernighan, co-author of _The C Programming Language_ and a longtime
contributor to UNIX, said:

> Everyone knows that debugging is twice as hard as writing a program in the
> first place. So if you're as clever as you can be when you write it, how will
> you ever debug it?

Setting aside the _mountain_ of ethical concerns around the application of ML,
which have been and should continue to be discussed in the technology industry,
there's a fundamental challenge with ML-based systems: I don't think their
creators understand how they work, how their conclusions are determined, or how
to consistently improve them over time. Imagine you are a data scientist or an
ML developer: how confident are you in what your models will predict between
experiments or evolutions of the model? Would you be willing to testify in a
court of law about the veracity of your model's output?
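
One partial answer is to pin down *how much* a model's answers change between
versions before shipping it. Below is a minimal sketch of such a regression
check; the toy models, the data, and the 5% threshold are all invented for
illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_data(n):
    """Toy binary-classification data standing in for real training sets."""
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

# Two "evolutions" of the same model, trained on different samples.
model_v1 = LogisticRegression(max_iter=1000).fit(*make_data(5_000))
model_v2 = LogisticRegression(max_iter=1000).fit(*make_data(5_000))

# A frozen evaluation set, reused across experiments for comparability.
X_eval, _ = make_data(1_000)
flipped = np.mean(model_v1.predict(X_eval) != model_v2.predict(X_eval))
print(f"{flipped:.1%} of predictions changed between model versions")

# Gate the release: refuse to promote v2 if too many answers silently flip.
assert flipped < 0.05, "v2 disagrees with v1 on more than 5% of inputs"
```

Note what a gate like this tells you: *that* predictions flipped, never *why*.
That gap is the whole problem.
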

Imagine you are a developer working on the models that Tesla's "full
self-driving" (FSD) mode relies upon. Your model has been implicated in a Tesla
killing the driver and/or pedestrians (which [has
happened](https://www.reuters.com/business/autos-transportation/us-probing-fatal-tesla-crash-that-killed-pedestrian-2021-09-03/)).
Do you think it would be possible to convince a judge and jury that your model
is _not_ programmed to mow down pedestrians outside of a crosswalk? How do you
prove what a model is, or is not, supposed to do given never-before-seen
inputs?
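
I don't think you can, in general. Models will confidently emit *an* answer
for inputs unlike anything in their training data. A toy illustration with a
two-feature scikit-learn classifier (nothing remotely like an actual FSD
stack):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Train on the only world the model has ever "seen": points near the origin.
X_train = rng.normal(size=(5_000, 2))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# A never-before-seen input, wildly outside the training distribution...
x_novel = np.array([[500.0, -500.0]])

# ...still gets an answer, delivered with near-total confidence.
print(model.predict(x_novel), model.predict_proba(x_novel).max())
# e.g. [1] 1.0 -- the model has no notion of "I've never seen this before."
```
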

Traditional software _does_ have a variation of this problem, but source code
lends itself to scrutiny far better than ML models, many of which have come
from successive evolutions of public training data, proprietary model changes,
and integrations with new data sources.

These problems may be solvable in the ML ecosystem, but the application of ML
is outpacing our ability to understand, monitor, and diagnose models when they
do harm.

Take that model your startup is working on to help accelerate home loan
approvals based on historical mortgages: how do you assert that your models are
not re-introducing racist policies like
[redlining](https://en.wikipedia.org/wiki/Redlining)? (Forms of this [have
happened](https://fortune.com/2020/02/11/a-i-fairness-eye-on-a-i/).)
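
You can audit for it, but only crudely. Here is a minimal sketch of a
demographic-parity check over entirely fabricated decisions and groups; a real
fairness audit needs far more than one aggregate gap:

```python
import numpy as np

# Illustrative model decisions (1 = approve) and a protected attribute,
# e.g. the applicant's neighborhood. Both arrays are fabricated.
approved = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0])
group    = np.array(["A"] * 6 + ["B"] * 6)

rate_a = approved[group == "A"].mean()
rate_b = approved[group == "B"].mean()

# Demographic parity difference: a large gap is a red flag that the model
# may be reproducing historical redlining baked into the training data.
print(f"approval rate A={rate_a:.0%}, B={rate_b:.0%}, gap={rate_a - rate_b:.0%}")
```

A near-zero gap would prove very little on its own, since proxies like ZIP
code can smuggle the same bias back in. Which is rather the point.
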

How about that fun image-generation (AI art!) project you have been tinkering
with? It uses a publicly available model that was trained on millions of images
from the internet, and as a result it can unintentionally output explicit
images, or even what some jurisdictions might consider bordering on child
pornography. (Forms of this [have
happened](https://www.wired.com/story/lensa-artificial-intelligence-csem/).)

Really, anything you train on data "from the internet" is asking for racist,
pornographic, or otherwise offensive results, as the [Microsoft
Tay](https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot-after-it-turned-into-racist-nazi/)
example should have taught us.

Can you imagine the human-rights nightmare that could ensue from shoddy ML
models being brought into a healthcare setting? A law-enforcement setting? Or
even a military one?

---

Machine learning encompasses a very powerful set of tools and patterns, but our
ability to predict how those models will be used, what they will output, or how
to prevent negative outcomes is _dangerously_ insufficient for use outside of
search and recommendation systems.

I understand how models are developed, how they are utilized, and what I
_think_ they're supposed to do.

Fundamentally, the challenge with AI/ML is that we understand how to "make it
work", but we don't understand _why_ it works.

Nonetheless, we keep deploying "AI" anywhere there's funding, consequences be
damned.

And that's a problem.