form-extractor v3.0 [17 Oct 2024] by Dominic
Description
Linux/Cygwin program to extract form tags from HTML. Handy for hacking website form responses; for instance, it can be called from another script with the -a option (or -g for wget) to get the 'magic' hidden form responses for resubmission using curl or wget.
Usage
form-extractor.sh [options] file_or_web_address
Options
-a - like -d / -F but determine automatically which is appropriate (curl only; cannot be combined with -g, which is for wget)
-c "pathfilename" - use and save any cookies in "pathfilename"
-d - return only hidden and submit tags ready for curl including -d for content type application/x-www-form-urlencoded
-f - return only the action (i.e. page to be replied to)
-F - return only hidden and submit tags ready for curl including -F for content type multipart/form-data
-g - return only hidden and submit tags ready for wget, including --post-data=
-h - show this help and exit
-i "text" - provide unique identifying text within the form tag to identify the correct form on a page with multiple forms
-j - also show javascript tags
-l - show changelog and exit
-n - no header
-q - return only non-hidden tags in -a format
-u - debug mode (implementation may vary)
-x - skip remote identity verification (--no-check-certificate)
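When -f is used to find where a form should be submitted, the value may need to be turned into a full address before it is handed to curl or wget. The following is only a sketch, assuming -f prints the action attribute as it appears in the page (which may be relative). The helper name resolve_action is hypothetical and not part of form-extractor.sh, and it does not handle "../" segments.

```shell
# Hypothetical helper, not part of form-extractor.sh:
# combine the address of the page with a form action to get a full URL.
resolve_action() {
    base=$1                      # address the page was fetched from
    action=$2                    # action value, e.g. as printed by -f (assumed)
    scheme=${base%%://*}         # e.g. https
    rest=${base#*://}            # everything after scheme://
    host=${rest%%[/?#]*}         # host part only
    case $action in
        '')                 printf '%s\n' "$base" ;;                          # empty action: post back to the page
        http://*|https://*) printf '%s\n' "$action" ;;                        # already absolute
        //*)                printf '%s:%s\n' "$scheme" "$action" ;;           # scheme-relative
        /*)                 printf '%s://%s%s\n' "$scheme" "$host" "$action" ;; # root-relative
        *)
            page=${base%%[?#]*}  # drop any query string or fragment
            case ${page#*://} in
                */*) dir=${page%/*} ;;   # strip the last path component
                *)   dir=$page ;;        # base has no path at all
            esac
            printf '%s/%s\n' "$dir" "$action" ;;
    esac
}

resolve_action "https://myloginpage.com/app/login.php?next=home" "check.php"
# prints https://myloginpage.com/app/check.php
```

The printed address can then be used as the final argument to curl in place of the page address.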
Example
# download a remote page containing a single login form, keep/re-use any cookies
curl -b /tmp/cookies -c /tmp/cookies -o /tmp/page.htm https://myloginpage.com
# use form-extractor.sh to extract all hidden tags on the form
HIDDENTAGS="$(./form-extractor.sh -a /tmp/page.htm)"
# add your own specific data (use the same flag, -d or -F, that -a returned; curl will not mix the two)
LOGINTAGS="-F username=houdini -F password=LetMeIn"
# login using any cookies, hidden tags and your specific data
curl -b /tmp/cookies -c /tmp/cookies $HIDDENTAGS $LOGINTAGS -o /tmp/page.htm https://myloginpage.com
# logged in page saved at /tmp/page.htm, enjoy...
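Because -a may return either -d or -F tags, and curl will not normally accept both kinds in one request, the calling script may want to build its own fields with whichever flag came back. This is a minimal sketch, assuming the -a output begins with the flag itself; the helper name curl_flag_of and the sample HIDDENTAGS value are made up for illustration.

```shell
# Hypothetical helper: report which curl data flag a tag string starts with.
curl_flag_of() {
    case $1 in
        -F*) printf '%s\n' "-F" ;;
        -d*) printf '%s\n' "-d" ;;
        *)   return 1 ;;          # neither flag found
    esac
}

# Stand-in for: HIDDENTAGS="$(./form-extractor.sh -a /tmp/page.htm)"
HIDDENTAGS='-d token=abc123 -d submit=Login'

# Fall back to -d if the flag cannot be determined (an arbitrary choice).
FLAG=$(curl_flag_of "$HIDDENTAGS") || FLAG='-d'

# Build the login fields with the matching flag.
LOGINTAGS="$FLAG username=houdini $FLAG password=LetMeIn"
printf '%s\n' "$LOGINTAGS"
# prints -d username=houdini -d password=LetMeIn
```

LOGINTAGS can then be passed to curl alongside HIDDENTAGS exactly as in the example above.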
Dependencies
awk sed wget (wget is only required if retrieving from internet)
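A calling script can check for these tools before it starts. The sketch below uses only the standard `command -v` builtin; the function name missing_deps is hypothetical.

```shell
# Hypothetical helper: print each named command that is not on the PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# wget is only needed when retrieving from the internet, so check it separately.
MISSING=$(missing_deps awk sed)
if [ -n "$MISSING" ]; then
    printf 'missing required tools: %s\n' "$MISSING" >&2
fi
```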
License
Copyright © 2024 Dominic Raferd. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Changelog
3.0 [17 Oct 2024]: change -q option to return non-hidden tags in -a format
2.9 [10 Oct 2024]: add -n option, make shellcheck-compatible
2.8 [26 May 2020]: add -f (action) option
2.7 [27 Mar 2018]: bugfix when there are spaces in values
2.6 [05 Jun 2017]: bugfix when there are spaces and quotes in values
2.5 [18 May 2017]: add -u (debug) option
2.4 [18 May 2016]: bugfix -i option
2.3 [15 Apr 2016]: use existing cookie file as well as saving there
2.2 [06 Nov 2015]: bugfix hidden/submit tag extraction
2.1 [28 Oct 2015]: more sophisticated hidden/submit tag extraction, better tidying up
2.0 [24 Aug 2015]: bugfixes
1.9 [08 Oct 2014]: bugfix for -a option, add example to help
1.8 [06 Oct 2014]: add submit to hidden tag responses
1.7 [02 Apr 2014]: add -a and -g options
1.6 [13 Mar 2014]: bugfix
1.5 [03 Feb 2014]: bugfix
1.4 [23 Jan 2014]: work with forms which don't use quotes
1.3 [02 Nov 2013]: first public release
1.2 [03 May 2012]: operate on remote web page as well as local files
1.1 [7 Feb 2011]
Download form-extractor.sh
Donation
I have provided this software free gratis and for nothing. If you would like to thank me with a contribution, please let me know and I will send you a link. Thank you!
My Programs
Here is a selection of some (other) programs I have written. Most run under GNU/Linux from the command line (CLI), are freely available and can be obtained by clicking on the links. Dependencies are shown. Most were written and tested on an x86-based Linux server, but they should also run on a Raspberry Pi, and many can run under Windows using Windows Subsystem for Linux (WSL) or Cygwin. Email me if you have problems or questions, or if you think I could help with a programming requirement.
Backup Utilities
- TimeDicer - Onsite/offsite data backup for Windows (uses rdiff-backup) [ GNU/Linux & MS Windows©: 2008-20 ]
- rdiff-backup-regress - GNU/Linux script to regress an rdiff-backup archive. [ GNU/Linux: 2012-24 ]
Debian/Ubuntu kernel and LVM Utilities
- kernel-remove - GNU/Linux script that lists the installed GNU/Linux kernels in a Debian-based distro (e.g. Ubuntu) and can be used to remove an unwanted kernel and related packages, updating grub appropriately. [ GNU/Linux-Debian/Ubuntu: 2010-24 ]
- lvm-usage - GNU/Linux script to show available disk space and how it is used; run as cron job to warn if usage is above a set percentage. Provides additional information if LVM is in use. [ GNU/Linux: 2012-24 ]
- lvm-delete-snapshot - GNU/Linux script to remove LVM snapshot that has been left over by another process. [ GNU/Linux: 2012-21 ]
- netnames - GNU/Linux script that shows the current name, biosdevname and 'predictable name' of a network device; helps with network device name scheme migration. [ GNU/Linux-Debian/Ubuntu: 2020-20 ]
- lv-convert2cache - GNU/Linux script to convert an existing LV into a cache LV using a smaller faster device as a cache. [ GNU/Linux: 2022-24 ]
Miscellaneous Programs
- sleepwalker - Windows© program which can be run from a remote machine to 'wake up' a Windows© machine behind a router, wait for it to start and then initiate Remote Desktop session. [MS Windows©: 2008-22]
- numliststat - GNU/Linux program giving statistical value(s) for a piped-in list of numbers. [ GNU/Linux: 2022-24 ]
- relay-enforcer - GNU/Linux program enabling a postfix-based mail server relaying to Gmail to act on reports from Gmail about blocked emails. [ GNU/Linux: 2016-24 ]
- pdf-compress - GNU/Linux program to create smaller b/w pdf file from an original large pdf file, especially when original resulted from scanning. [ GNU/Linux: 2016-23 ]
- tiny-device-monitor - GNU/Linux program to test webpages (including password-protected) or machines to check they are live; use as a cron job for your own websites, for hardware presenting a webpage, or for any machines with a presence on your local LAN or on the internet. [ GNU/Linux: 2009-24 ]
- mythic-dns-sync - GNU/Linux program to update DNS record at mythic-beasts.com to match local external ip. [ GNU/Linux: 2016-23 ]
- saynoto0870 - For UK, a GNU/Linux script which performs automated lookup of the www.saynoto0870.com database, finding cheap or free geographic number replacements for expensive non-geographic (087* or 084*) numbers. [ GNU/Linux: 2012-12 ]
- bind9-resolved-switch - GNU/Linux program for switching permanently between using bind9 or systemd-resolved as the system DNS resolver. [ GNU/Linux: 2016-24 ]
- unlock - GNU/Linux remote program for easy entering of decrypt passphrase on a remote machine which has root dm-crypt+LUKS. [ GNU/Linux: 2017-18 ]
- wifi-updown - GNU/Linux program to take down wifi interface if there is a working wired interface (or restore wifi if not). [ GNU/Linux: 2018-23 ]
- routefix - GNU/Linux program to restore a default ip traffic route if there is none such (e.g. after running wifi-updown). [ GNU/Linux: 2018-23 ]
- dutree - GNU/Linux program to show a tree-style list of files and directories at the specified location which are greater than the specified size (default 1GB). [ GNU/Linux: 2012-24 ]
- Accounts - Multi-business multi-currency accounting software, uses Access [MS Windows©: 1996-2024]
- Rents Program - Residential lettings/landlord front office program, with many special features for UK market [MS Windows©: 1991-2024]