Add a brief missive on ChatGPT

This was inspired by some discussion of using ChatGPT for Serious Coding Business(tm)
2023-01-09 15:05:28 -08:00 · 2023-01-09 15:05:28 -08:00 · 4cd2cc707a
parent 020301d399
commit 4cd2cc707a
1 changed files with 57 additions and 0 deletions
--- a/_posts/2023-01-09-chatgpt-and-your-ip.md
+++ b/_posts/2023-01-09-chatgpt-and-your-ip.md
@ -0,0 +1,57 @@
+---
+layout: post
+title: "ChatGPT and your intellectual property"
+tags:
+- software
+- ml
+- opinion
+---
+
+There is an excessive number [ChatGPT](https://en.wikipedia.org/wiki/ChatGPT)
+screenshots littering social media right now, and not nearly enough critical
+thinking about feeding data into this novel new chatbot. An anecdotal survey of
+my timeline includes people asking ChatGPT to solve math equations, write
+emails for them, create short story prompts, identify bugs in code, or even
+generate code for them. Behold, the power of AI!
+
+ChatGPT is created by [OpenAI](https://openai.com/blog/chatgpt/), which despite
+the name is *not* any form of "open" organization, but rather a startup which
+has been [considering funding at a pretty monstrous
+valuation](https://siliconangle.com/2023/01/05/openai-startup-behind-chatgpt-discusses-tender-offer-value-29b).
+In essence, ChatGPT is an AI tool trained on a large corpus of public and
+proprietary information, packaged up as a kooky chatbot.
+
+Fine. Setting aside my own annoyance with ML developers co-opting data from
+"the commons", fine.
+
+The zeal with which most people are dumping information into ChatGPT really
+concerns me however. I have seen a number of people feeding their own source
+code into ChatGPT to ask it to find bugs or security holes.  It would be
+foolish to assume that the inputs into ChatGPT are not _also used to train
+ChatGPT_, or at least the next generations of the model.
+
+I am certainly no lawyer, but the two primary problems here are:
+
+* Most developers are not authorized to disclose proprietary information of
+  their employers. Pasting source code into _any_ browser window creates a
+  liability, but a browser window with ChatGPT increases the likelihood that
+  the source code disclosed will be _reproduced_ in the future, for some other
+  user of the system. Uh oh!
+* Can the code _generated_ by ChatGPT could be considered _yours_? Who actually
+  owns the copyright to machine generated code, or machine generated anything
+  for that matter? Do the architects of the system own it, or the users
+  supplying the inputs? This particular wrinkle isn't unique to ChatGPT, but
+  any ML tool generating data which occupies a space adjacent to human created,
+  and copyrighted works.
+ 
+ 
+My concerns with what OpenAI is doing with this data is not tin-foil paranoia.
+[Adobe is catching
+grief](https://news.yahoo.com/adobe-using-photos-train-ai-001413408.html) for
+opting Lightroom users _in_ to train their AI with those users copyrighted or
+proprietary works.
+
+I am sure the legal system will catch up to the rapid evolution of these ML
+robber barons, but until then I think we should all be _very_ weary of feeding
+intellectual property to these systems.
+