---
layout: post
title: "The problem with ML"
tags:
- software
- ml
- aws
- databricks
---
The holidays are the time of year when I typically field a lot of questions
from relatives about technology or the tech industry, and this year my favorite
questions were around **AI**. (*insert your own scary music*) Machine learning
(ML) and artificial intelligence (AI) are being widely deployed, and I have some
**Problems™** with that. Machine learning is not necessarily a new
domain; the practices commonly accepted as "ML" have been used for quite a
while to support search and recommendation use-cases. In fact, my day job
includes supporting data scientists and those who are actively creating models
and deploying them to production. _However_, many of my relatives outside of the tech industry believe that "AI" is going to replace people and their jobs, and/or run the future. I genuinely hope AI/ML comes nowhere close to the future imagined by members of my family.

Like many pieces of technology, it is not inherently good or bad, but the
problem with ML as it is applied today is that **its application is far
outpacing our understanding of its consequences**.

Brian Kernighan, co-author of _The C Programming Language_ and a longtime
contributor to UNIX, said:

> Everyone knows that debugging is twice as hard as writing a program in the
> first place. So if you're as clever as you can be when you write it, how will
> you ever debug it?

Setting aside the _mountain_ of ethical concerns around the application of ML,
which have been and should continue to be discussed in the technology industry,
there's a fundamental challenge with ML-based systems: I don't think their
creators understand how they work, how their conclusions are determined, or how
to consistently improve them over time. Imagine you are a data scientist or ML
developer: how confident are you in what your models will predict between
experiments or evolutions of the model? Would you be willing to testify in a
court of law about the veracity of your model's output?

Imagine you are a developer working on the models that Tesla's "full
self-driving" (FSD) mode relies upon. Your model has been implicated in a Tesla
killing the driver and/or pedestrians (which [has
happened](https://www.reuters.com/business/autos-transportation/us-probing-fatal-tesla-crash-that-killed-pedestrian-2021-09-03/)).
Do you think it would be possible to convince a judge and jury that your model
is _not_ programmed to mow down pedestrians outside of a crosswalk? How do you
prove what a model is or is not supposed to do given never-before-seen inputs?
Traditional software _does_ have a variation of this problem, but source code
lends itself to scrutiny far better than ML models, many of which are the
product of successive evolutions of public training data, proprietary model
changes, and integrations with new data sources.

These problems may be solvable in the ML ecosystem, but the problem is that the
application of ML is outpacing our ability to understand, monitor, and diagnose
models when they do harm.

Take that model your startup is working on to help accelerate home loan
approvals based on historical mortgages: how do you assert that it is not
re-introducing racist policies like
[redlining](https://en.wikipedia.org/wiki/Redlining)? (Forms of this [have happened](https://fortune.com/2020/02/11/a-i-fairness-eye-on-a-i/).)

How about that fun image generation (AI art!) project you have been tinkering
with? It uses a publicly available model that was trained on millions of images
from the internet, and as a result it sometimes unintentionally outputs
explicit images, or even what some jurisdictions might consider bordering on
child pornography. (Forms of this [have
happened](https://www.wired.com/story/lensa-artificial-intelligence-csem/).)

Really, anything you train on data "from the internet" is asking for
racist, pornographic, or otherwise offensive results, as the [Microsoft
Tay](https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot-after-it-turned-into-racist-nazi/)
example should have taught us.

Can you imagine the human-rights nightmare that could ensue from shoddy ML
models being brought into healthcare settings? Law enforcement? Or even
military settings?

---

Machine learning encompasses a very powerful set of tools and patterns, but our
ability to predict how those models will be used, what they will output, or how
to prevent negative outcomes is _dangerously_ insufficient for use outside
of search and recommendation systems.

I understand how models are developed, how they are utilized, and what I
_think_ they're supposed to do.

Fundamentally, the challenge with AI/ML is that we understand how to "make it
work", but we don't understand _why_ it works.

Nonetheless we keep deploying "AI" anywhere there's funding, consequences be
damned.

And that's a problem.