Add a brief missive on ChatGPT

This was inspired by some discussion of using ChatGPT for Serious Coding
Business(tm)
R Tyler Croy 2023-01-09 15:05:28 -08:00
parent 020301d399
commit 4cd2cc707a
1 changed files with 57 additions and 0 deletions

@@ -0,0 +1,57 @@
---
layout: post
title: "ChatGPT and your intellectual property"
tags:
- software
- ml
- opinion
---
There is an excessive number of [ChatGPT](https://en.wikipedia.org/wiki/ChatGPT)
screenshots littering social media right now, and not nearly enough critical
thinking about feeding data into this new chatbot. An anecdotal survey of
my timeline includes people asking ChatGPT to solve math equations, write
emails for them, create short story prompts, identify bugs in code, or even
generate code for them. Behold, the power of AI!

ChatGPT is created by [OpenAI](https://openai.com/blog/chatgpt/), which, despite
the name, is *not* any form of "open" organization, but rather a startup which
has been [considering funding at a pretty monstrous
valuation](https://siliconangle.com/2023/01/05/openai-startup-behind-chatgpt-discusses-tender-offer-value-29b).

In essence, ChatGPT is an AI tool trained on a large corpus of public and
proprietary information, packaged up as a kooky chatbot.

Fine. Setting aside my own annoyance with ML developers co-opting data from
"the commons", fine.

The zeal with which most people are dumping information into ChatGPT really
concerns me, however. I have seen a number of people feeding their own source
code into ChatGPT to ask it to find bugs or security holes. It would be
foolish to assume that the inputs into ChatGPT are not _also used to train
ChatGPT_, or at least the next generations of the model.

I am certainly no lawyer, but the two primary problems here are:

* Most developers are not authorized to disclose proprietary information of
their employers. Pasting source code into _any_ browser window creates a
liability, but a browser window with ChatGPT increases the likelihood that
the source code disclosed will be _reproduced_ in the future, for some other
user of the system. Uh oh!
* Can the code _generated_ by ChatGPT be considered _yours_? Who actually
owns the copyright to machine-generated code, or machine-generated anything
for that matter? Do the architects of the system own it, or the users
supplying the inputs? This particular wrinkle isn't unique to ChatGPT; it
applies to any ML tool generating data which occupies a space adjacent to
human-created, copyrighted works.

My concerns with what OpenAI is doing with this data are not tin-foil paranoia.
[Adobe is catching
grief](https://news.yahoo.com/adobe-using-photos-train-ai-001413408.html) for
opting Lightroom users _in_ to train their AI with those users' copyrighted or
proprietary works.

I am sure the legal system will catch up to the rapid evolution of these ML
robber barons, but until then I think we should all be _very_ wary of feeding
intellectual property to these systems.