Erle Robotics Python Networking Gitbook Free


Three Axes

Parsing HTML with Python requires three choices:

1. The parser you will use to digest the HTML and try to make sense of its tangle of opening and closing tags.
2. The API (Application Programming Interface) through which your Python program will access the tree of nested elements that the parser built from its analysis of the HTML page.
3. The kinds of selectors you will be able to write to jump directly to the part of the page that interests you, instead of having to step into the hierarchy one element at a time.

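The issue of selectors is especially important, because a well-written selector can unambiguously identify an HTML element that interests you without your having to touch any of the elements above it in the document tree. This also insulates your program from redesigns of the site: as long as the element keeps the ID, name, or other property you select it by, your selector will still find it even if the element ends up several levels deeper in the document.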
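The following is a minimal sketch of that benefit. It uses the Standard Library's xml.etree.ElementTree, which understands a small subset of XPath, so it runs without extra packages. The page content and the id "temperature" are hypothetical. Real-world HTML usually needs a forgiving parser such as lxml's before it can be queried like this. In the sketch, a hard-coded walk down the tree breaks once the element moves deeper, while the selector keeps working.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before a redesign: the reading sits directly in <body>.
BEFORE = """<html><body>
  <p id="temperature">21 C</p>
</body></html>"""

# The same hypothetical page after a redesign wraps it in two new <div>s,
# moving the element two levels deeper in the document tree.
AFTER = """<html><body>
  <div class="layout"><div class="sidebar">
    <p id="temperature">21 C</p>
  </div></div>
</body></html>"""

def step_by_step(root):
    """Walk down a fixed path one level at a time: <html> -> <body> -> <p>."""
    body = root.find("body")                  # direct children only
    p = body.find("p") if body is not None else None
    return None if p is None else p.text

def by_selector(root):
    """Jump straight to the element by its id, wherever it sits."""
    p = root.find(".//p[@id='temperature']")  # search all descendants
    return None if p is None else p.text

for label, page in [("before", BEFORE), ("after", AFTER)]:
    root = ET.fromstring(page)
    print(label, step_by_step(root), by_selector(root))
```

Against the redesigned page, the step-by-step walk returns None. The selector still finds the text.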

Now, I should pause for a second to explain terms like “deeper.” The concept will be clearest if we reconsider the unordered list quoted in the previous section. An experienced web developer looking at that list rearranges it in her head so that it looks like this:


```html
<ul>
  <li>First</li>
  <li>Second</li>
  <li>Third</li>
  <li>Fourth</li>
</ul>
```

Here the <ul> element is said to be the “parent” of the individual list items: it “wraps” them and sits one level “above” them in the document. The <li> elements are “siblings” of one another. Each is a “child” of the <ul> element that “contains” them, and they sit one level “below,” or “deeper” than, their parent in the larger document tree. This kind of spatial thinking turns out to be very important when you work your way into a document through an API.
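These relationships map directly onto a tree API. The following is a minimal sketch using the Standard Library's ElementTree. Its elements can iterate over their children but keep no pointer back to their parent, so the sketch builds a child-to-parent lookup table, a common workaround. The helper name siblings() is hypothetical. With lxml, elements offer getparent(), getprevious(), and getnext() for the same moves.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<ul><li>First</li><li>Second</li><li>Third</li><li>Fourth</li></ul>"
)

# ElementTree elements know their children but not their parent,
# so build a child -> parent lookup table once.
parent_of = {child: parent for parent in doc.iter() for child in parent}

items = list(doc)     # the <ul>'s children: four <li> elements
third = items[2]

print(parent_of[third].tag)       # the "parent" that wraps this item
print([li.text for li in items])  # the <li> elements, in document order

def siblings(elem):
    """Return the other children of elem's parent (hypothetical helper)."""
    return [e for e in parent_of[elem] if e is not elem]

print([li.text for li in siblings(third)])
```

The root <ul> has no entry in the table, which matches its position at the top of this small tree.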

In brief, here are your choices along each of the three axes that were just listed:

The most powerful, flexible, and fastest parser at the moment appears to be the HTMLParser that comes with lxml. The next most powerful is the longtime favorite BeautifulSoup. Coming in dead last are the parsing classes included with the Python Standard Library, which no one seems to use for serious screen scraping.
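The choice of parser matters because real pages are rarely well-formed. The following is a minimal sketch that uses the Standard Library's strict XML parser (xml.etree.ElementTree) as a foil rather than lxml or BeautifulSoup, so that it needs no extra packages. The parser accepts a cleanly nested list but rejects the same list written with unclosed <li> tags, which browsers render without complaint. Forgiving HTML parsers such as lxml's and BeautifulSoup exist to turn that kind of tag soup into a usable tree.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = "<ul><li>First</li><li>Second</li></ul>"
TAG_SOUP = "<ul><li>First<li>Second</ul>"  # browsers render this fine

def parses_as_xml(markup):
    """Return True if the strict Standard Library XML parser accepts markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(parses_as_xml(WELL_FORMED))  # True
print(parses_as_xml(TAG_SOUP))     # False: the <li> tags are never closed
```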

The best API for manipulating a tree of HTML elements is ElementTree, which has been brought into the Standard Library (as xml.etree.ElementTree) for use with the Standard Library parsers, and is also the API supported by lxml. BeautifulSoup supports an API peculiar to itself. Finally, a pair of ancient, ugly, event-based interfaces to HTML still exist in the Python Standard Library.
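The difference between a tree API and an event-based one is easiest to see side by side. Python 2 shipped the HTMLParser and htmllib (built on sgmllib) modules; in Python 3 only html.parser remains. The following is a minimal sketch of collecting list items both ways. The class name ListItemCollector is hypothetical. With ElementTree, the parser hands back a tree you can query. With html.parser, callbacks fire as tags stream past, so the program has to track for itself whether it is currently inside an <li>.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tree API (ElementTree): the parser returns a tree you can query directly.
tree = ET.fromstring("<ul><li>First</li><li>Second</li><li>Third</li></ul>")
tree_items = [li.text for li in tree.iter("li")]

# Event-based API (html.parser): callbacks fire as tags stream past,
# so the program must remember for itself where it is in the document.
class ListItemCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []
        self.in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
            self.items.append("")  # start collecting a new item

    def handle_endtag(self, tag):
        if tag in ("li", "ul"):
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data

collector = ListItemCollector()
collector.feed("<ul><li>First<li>Second<li>Third</ul>")  # unclosed <li> tags
collector.close()
event_items = collector.items

print(tree_items)
print(event_items)
```

Both approaches produce the same three strings. The event-based version needs a state flag and careful handling of tags that are never closed, which is exactly the bookkeeping a tree API spares you.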

The lxml library supports two major industry-standard selector languages: CSS selectors and the XPath query language. BeautifulSoup has a selector system all its own, but one that is very powerful and has powered countless web-scraping programs over the years.
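To give a feel for XPath-style selectors without installing anything, the following sketch uses the limited XPath subset built into the Standard Library's ElementTree. lxml adds full XPath through an element's .xpath() method and CSS selectors through lxml.cssselect, which requires the separate cssselect package. The page content and the ids "menu" and "footer" are hypothetical.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring("""<html><body>
  <ul id="menu">
    <li class="item">First</li>
    <li class="item current">Second</li>
    <li class="item">Third</li>
  </ul>
  <ul id="footer"><li>Fourth</li></ul>
</body></html>""")

# A few queries from ElementTree's XPath subset.
menu = page.find(".//ul[@id='menu']")               # match an attribute value
second = menu.find("li[2]")                         # position (1-based)
last = menu.find("li[last()]")                      # last matching child
exact = page.find(".//li[@class='item current']")   # whole-attribute match
third = page.find(".//li[.='Third']")               # match on text (Python 3.7+)
all_items = [li.text for li in page.findall(".//li")]

print(second.text, last.text, exact.text, third.text)
print(all_items)
```

The attribute test compares the whole class string, so it cannot pick out a single class name such as "current" on its own. Full XPath in lxml, or a CSS selector like li.current, can do that.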