Author: William Thompson (2017-02-20)
This is a brief summary of a concept: using captcha-based security measures, distributed through third-party websites, to automate the blind population of data sets and the training of artificial intelligence systems.
[UPDATE] – 2017-02-20: While adding links to the reference material for this article, I came across information from Google confirming that the ideas expressed herein are, at least in part, already in use by Google. The reCAPTCHA landing page states the following:
“Millions of CAPTCHAs are solved by people every day. reCAPTCHA makes positive use of this human effort by channeling the time spent solving CAPTCHAs into digitizing text, annotating images, and building machine learning datasets. This in turn helps preserve books, improve maps, and solve hard AI problems.”
With very few exceptions, the following categories of objects are deliberately obscured in the published Google Street View images:
- Human faces and identities
- License plate numbers
- House and building numbers
- Brand names, billboards, etc.
Captcha challenges (such as Google's reCAPTCHA) are often presented to website visitors in order to separate genuine users from malicious automated scripts and bots that have been directed to abuse the services those websites provide or to carry out distributed denial-of-service (DDoS) attacks. To be granted access to the sites and services they protect, these challenges require inputs such as the following:
- Selecting the images that contain a specified natural or urban object from a grid of photographs.
- Entering alphanumeric characters exactly as they appear in various low-resolution images.
- Tracing the outlines of objects such as aircraft, road signs, etc.
- After completing a challenge, users are occasionally asked to complete additional challenges before reaching their intended destination, even when they have made no discernible errors.
CONCLUSIONS AND USE CASES
New captcha challenges are constructed by snipping content from photos taken for Google Street View.
The snipped content is then divided into multiple images so as to obscure their origins; however, data about the true origin of every image (such as date, time, and geolocation) is encrypted and stored securely.
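The fragmenting step above can be illustrated with a minimal sketch, assuming a simple rectangular grid split. The `OriginRecord` fields, the 3x3 grid, and the use of random opaque IDs are illustrative assumptions, not anything Google has documented. Encryption is omitted here; the origin data is simply kept in a separate store that the public fragment IDs cannot be traced back to.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class OriginRecord:
    """Hypothetical origin metadata for one fragment (never shown in a challenge)."""
    source_photo: str
    timestamp: str
    lat: float
    lon: float
    box: tuple  # (left, top, right, bottom) pixel bounds within the source photo


def split_into_fragments(photo_id, width, height, timestamp, lat, lon, rows=3, cols=3):
    """Split a width x height photo into a rows x cols grid of fragments.

    Returns (public_ids, origin_store): the public IDs reveal nothing about the
    source photo, while origin_store maps each ID back to its OriginRecord.
    """
    origin_store = {}
    public_ids = []
    for r in range(rows):
        for c in range(cols):
            # Integer pixel bounds; together the boxes tile the whole image.
            left, right = c * width // cols, (c + 1) * width // cols
            top, bottom = r * height // rows, (r + 1) * height // rows
            frag_id = secrets.token_hex(8)  # opaque, random identifier
            origin_store[frag_id] = OriginRecord(
                photo_id, timestamp, lat, lon, (left, top, right, bottom)
            )
            public_ids.append(frag_id)
    return public_ids, origin_store
```

Only `public_ids` (and the pixel data they point to) would leave the system; `origin_store` stays behind, which is the separation the paragraph above describes.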
The image segments are then offered to webmasters as free, API-driven captcha plugins and scripts, under the pretense of improved DDoS protection and similar security benefits.
Users visiting sites that contain these captcha challenges then enter responses, which are checked (and recorded) through the API connection.
Automated user-driven validation is implemented in the following manner:
- All newly released captcha challenges are presented as part of a dual captcha challenge, in which the secondary challenge consists of older data that has already been matched against a substantial number of answers.
- A measure of input validity is derived from the historical set of input data: the more historical answers that have been collected, the greater the certainty that a given input is valid, resulting in a progression of machine learning.
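The dual-challenge scheme above can be sketched as follows, assuming each challenge pairs one control fragment whose accepted answer is already known with one new fragment whose answer is not. The class and method names are hypothetical. The user passes only if the control answer matches; only then is the answer given for the new fragment recorded as a vote, and the share of votes agreeing with the most common answer serves as the measure of input validity.

```python
from collections import Counter, defaultdict


class DualChallengeValidator:
    """Hypothetical validator for a control/unknown captcha pair."""

    def __init__(self, known_answers):
        # fragment_id -> accepted answer for older, already-validated fragments
        self.known_answers = dict(known_answers)
        # fragment_id -> Counter of answers submitted for new fragments
        self.votes = defaultdict(Counter)

    @staticmethod
    def _normalize(answer):
        # Treat case and surrounding whitespace as insignificant.
        return answer.strip().upper()

    def submit(self, control_id, control_answer, new_id, new_answer):
        """Grade the control fragment; record the new answer only if it passes."""
        expected = self.known_answers[control_id]
        passed = self._normalize(control_answer) == self._normalize(expected)
        if passed:
            self.votes[new_id][self._normalize(new_answer)] += 1
        return passed

    def certainty(self, fragment_id):
        """Return (leading answer, agreement ratio, total votes) for a fragment."""
        counts = self.votes.get(fragment_id)  # .get() does not create an entry
        if not counts:
            return None, 0.0, 0
        answer, top = counts.most_common(1)[0]
        total = sum(counts.values())
        return answer, top / total, total
```

Note that the agreement ratio alone can be misleadingly high for a fragment with only one or two votes, which is why the quantity of historical answers matters as much as their agreement.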
As historical data is compiled on each individual fragment of the images presented in captcha challenges, predetermined benchmarks of certainty are eventually reached. At that point, the validated answers are correlated with the securely stored origin data for those same fragments, which makes the images usable in further applications such as the following:
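The benchmark step might look like the following sketch. The two thresholds (a minimum number of answers and a minimum agreement ratio) and their default values are illustrative assumptions; the text says only that the benchmarks are predetermined. The origin store is a plain dictionary standing in for the encrypted storage described earlier.

```python
from collections import Counter


def resolve_fragment(fragment_id, answer_counts, origin_store,
                     min_answers=10, min_agreement=0.9):
    """Return the validated answer joined with its origin data, or None.

    answer_counts: Counter of normalized answers submitted for this fragment.
    origin_store:  mapping of fragment_id -> origin metadata (hypothetical).
    """
    total = sum(answer_counts.values())
    if total == 0 or total < min_answers:
        return None  # not enough historical data yet
    answer, top = answer_counts.most_common(1)[0]
    if top / total < min_agreement:
        return None  # answers still disagree too much
    # Benchmark reached: correlate the validated answer with its stored origin.
    return {"fragment_id": fragment_id, "answer": answer,
            "origin": origin_store.get(fragment_id)}
```

For example, with the defaults above a fragment with 19 votes for one answer and 1 for another (95% agreement over 20 answers) would be resolved, while one with 8 votes against 4 would not.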
- Precise, time-stamped geographic data is connected to every license plate inadvertently captured by the mobile Google Street View car during map-making activities. (The raw images of license plates, which could not easily be recorded into a database, were fed through the captcha system, allowing the plate numbers to be entered and validated through confirmations from massive numbers of users, none of whom were made aware of the final purpose of their inputs.) The resulting correlated data is used to provide government agencies (or other entities willing to pay the price) with a list of known places and times at which a specific license plate was spotted.
- Machine learning is used to learn the shapes of signs, vehicles, and other objects as identified by users who successfully completed captcha challenges on that same data. Artificial intelligence systems built on this learning are then able to perform tasks such as driving a vehicle equipped for autonomous operation. Additionally, the same data is used to build smart defense systems able to recognize, respond to, and even engage and neutralize vehicles matching a given description or profile.
- Street names, house numbers, and business locations can be precisely pinned onto the Google Maps product to provide more accurate GPS navigation to end users.
All of the ideas expressed in this article just popped into my head while I was brushing my teeth yesterday morning. I have no evidence that would suggest that the thoughts expressed in this document are true, although one might agree that the concepts expressed above would make for quite an extensive source of revenue if implemented properly.
Thanks for reading. Let me know your thoughts.
NOTE: Please review the UPDATE section at the top of this article prior to corresponding with me regarding this article. Thank you.