mirror of https://github.com/mozilla/gecko-dev.git
check HTTP error codes on 1st line
deal with content type "text/html "
take stats on domain names e.g. foo.co.kr, www.bar.com
URL char stats e.g. 8-bit, escaped 8-bit, etc
hierarchical tag and attribute stats, not flat attr space
more checking in ISO 2022 code
detect UCS-2, UCS-4
deal with multiple charset parameters in one content-type
FRAME SRC URLs
IMG SRC URLs
other URLs?
NNTP robot
FTP robot
DNS robot
IP robot
parse URLs properly a la RFC
improve hashing (grow tables, prime numbers)
parse <!doctype ...> where "..." appears as attribute-name-like thing
run purify to find memory leaks
use less memory in URL hash table (value not needed, only key needed)
use less memory in URL list (use array, remove processed URLs, randomize?)
get http://www.olelo.hawaii.edu/UTF8/index.html to work
(problem in io.c's read whole stream routine)
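
For the "detect UCS-2, UCS-4" item above, one possible starting point is a
byte-order-mark check on the first bytes of the stream. This is only a sketch:
detect_bom and the bom_kind enum are hypothetical names, not existing code in
this tree, and a stream without a BOM would still need some other heuristic.

```c
#include <stddef.h>

/* Hypothetical result type; these names are assumptions for illustration. */
enum bom_kind { BOM_NONE, BOM_UCS2_BE, BOM_UCS2_LE, BOM_UCS4_BE, BOM_UCS4_LE };

/*
 * Look for a byte order mark at the start of buf.
 * The 4-byte UCS-4 marks are tested first because the UCS-4 LE mark
 * (FF FE 00 00) begins with the UCS-2 LE mark (FF FE). Note the
 * ambiguity: FF FE 00 00 could also be UCS-2 LE followed by U+0000.
 */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 4) {
        if (buf[0] == 0x00 && buf[1] == 0x00 &&
            buf[2] == 0xFE && buf[3] == 0xFF)
            return BOM_UCS4_BE;
        if (buf[0] == 0xFF && buf[1] == 0xFE &&
            buf[2] == 0x00 && buf[3] == 0x00)
            return BOM_UCS4_LE;
    }
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return BOM_UCS2_BE;
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return BOM_UCS2_LE;
    }
    return BOM_NONE;
}
```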
---
2/17/99
use nm to find all system calls, and do proper error checking on all of them
e.g. write() to catch SIGPIPE-like stuff(?)