streaming-form-data
streaming_form_data provides a Python parser for parsing multipart/form-data input chunks (the most commonly used encoding when submitting data through HTML forms).
Chunk size is determined by the API user, but currently there are no restrictions on what the chunk size should be, since the parser works byte-by-byte (which means that passing the entire input as a single chunk should also work).
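Since chunk boundaries are arbitrary, the same body can be cut at any size and the parser still receives identical bytes. A minimal sketch of such a splitter follows; `iter_chunks` is a hypothetical helper written for illustration and is not part of the library.

```python
from typing import Iterator


def iter_chunks(body: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `body`, each at most `chunk_size` bytes.

    Hypothetical helper: it only demonstrates that chunk boundaries can
    fall anywhere, including in the middle of a multipart delimiter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(body), chunk_size):
        yield body[start:start + chunk_size]


body = b"--boundary\r\n..."  # placeholder bytes, not a complete request body

# Re-joining the chunks gives back the original bytes for any chunk size.
one_byte_chunks = list(iter_chunks(body, 1))
single_chunk = list(iter_chunks(body, len(body)))
```

Each yielded chunk could then be handed to `parser.data_received` in turn.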
Installation
$ pip install streaming_form_data
The core parser is written in Cython, which is a superset of Python that compiles the input down to a C extension which can then be imported in normal Python code.
The compiled C parser code is included in the PyPI package, hence the installation requires a working C compiler.
Usage
>>> from streaming_form_data import StreamingFormDataParser
>>> from streaming_form_data.targets import ValueTarget, FileTarget, NullTarget
>>>
>>> headers = {'Content-Type': 'multipart/form-data; boundary=boundary'}
>>>
>>> parser = StreamingFormDataParser(headers=headers)
>>>
>>> parser.register('name', ValueTarget())
>>> parser.register('file', FileTarget('/tmp/file.txt'))
>>> parser.register('discard-me', NullTarget())
>>>
>>> for chunk in request.body:
...     parser.data_received(chunk)
...
>>>
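The loop above assumes a `request` object supplied by a web framework. For experimenting without one, a request body matching the `boundary=boundary` header can be assembled by hand. The sketch below follows the general multipart/form-data layout (a `--boundary` delimiter, part headers, a blank line, the content, and a closing `--boundary--`); `build_multipart_body` is a hypothetical helper, not something the library provides.

```python
def build_multipart_body(fields: dict, boundary: str = "boundary") -> bytes:
    """Assemble a multipart/form-data body from a {field name: bytes} mapping.

    Hypothetical helper for local experiments; it handles plain fields only
    (no filename or per-part Content-Type).
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(
            b'Content-Disposition: form-data; name="' + name.encode("utf-8") + b'"'
        )
        lines.append(b"")  # blank line separates the part headers from the content
        lines.append(value)
    lines.append(delimiter + b"--")  # closing delimiter ends the body
    lines.append(b"")
    return b"\r\n".join(lines)


body = build_multipart_body({"name": b"hello", "discard-me": b"ignored"})
```

The resulting bytes can be passed to `parser.data_received` whole or split into chunks.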
Usage can broadly be split into three stages.
1. Initialization
The StreamingFormDataParser class expects a dictionary of HTTP request headers when being instantiated. These headers are used to determine the input Content-Type and some other metadata.
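To make concrete what kind of metadata a Content-Type header carries, the sketch below pulls the media type and the boundary parameter out of a headers dictionary using only the standard library. This illustrates the header format and is not a description of how the parser reads it internally; `content_type_params` is a hypothetical name.

```python
from email.message import Message


def content_type_params(headers: dict) -> tuple:
    """Return (media type, boundary) from a headers dict.

    Hypothetical helper; expects the key to be spelled 'Content-Type'.
    """
    msg = Message()
    msg["Content-Type"] = headers["Content-Type"]
    # get_param() reads parameters of the Content-Type header by default.
    return msg.get_content_type(), msg.get_param("boundary")


headers = {"Content-Type": "multipart/form-data; boundary=boundary"}
media_type, boundary = content_type_params(headers)
```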
Optionally, you can enable strict mode in the parser by setting the strict keyword argument to True. In strict mode, the parser raises UnexpectedPartException if it starts to parse a field whose name has not been registered. When not in strict mode, unexpected parts are silently ignored.
2. Input Registration
HTML forms typically have multiple fields. For instance, a form could have a text input field called name and a file input field called file.
This needs to be communicated to the parser using the parser.register function. This function expects two arguments: the name of the input field, and an instance of the associated Target class (which determines how the input should be handled).
For instance, if you want to store the contents of the name field in an in-memory variable, and the file field in a file on disk, you can tell this to the parser as follows.
>>> name_target = ValueTarget()
>>> file_target = FileTarget('/tmp/file.dat')
>>>
>>> parser.register('name', name_target)
>>> parser.register('file', file_target)
Registering multiple targets for the same field is also supported.
>>> from streaming_form_data.targets import SHA256Target
>>>
>>> value_target = ValueTarget()
>>> sha256_target = SHA256Target()
>>>
>>> parser.register('file', value_target)
>>> parser.register('file', sha256_target)
In this case, the contents of the file field would be streamed to both the ValueTarget as well as the SHA256Target.
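The idea of one field feeding several consumers can be pictured with plain Python. In the sketch below, `BufferSink`, `HashSink` and `fan_out` are hypothetical stand-ins written for illustration; they are not the library's classes and only mirror the behavior described above (one consumer buffers the bytes, the other hashes them).

```python
import hashlib


class BufferSink:
    """Hypothetical consumer that keeps every chunk in memory."""

    def __init__(self):
        self._chunks = []

    def on_data_received(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    @property
    def value(self) -> bytes:
        return b"".join(self._chunks)


class HashSink:
    """Hypothetical consumer that keeps only a running SHA-256 digest."""

    def __init__(self):
        self._hash = hashlib.sha256()

    def on_data_received(self, chunk: bytes) -> None:
        self._hash.update(chunk)

    @property
    def value(self) -> str:
        return self._hash.hexdigest()


def fan_out(chunk: bytes, sinks) -> None:
    """Deliver the same chunk to every registered consumer."""
    for sink in sinks:
        sink.on_data_received(chunk)


buffer_sink, hash_sink = BufferSink(), HashSink()
for piece in (b"hello ", b"world"):
    fan_out(piece, [buffer_sink, hash_sink])
```

Both consumers see the chunks in the same order, so the digest matches the buffered bytes without the hashing consumer ever storing them.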
3. Streaming data
At this stage the parser has everything it needs to be able to work. Depending on what web framework you're using, just pass the actual HTTP request body to the parser, either one chunk at a time or the complete thing at once.
>>> chunk = read_next_chunk()  # depends on your web framework of choice
>>>
>>> parser.data_received(chunk)
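When the framework exposes the request body as a file-like object, reading and forwarding fixed-size chunks is one way to fill in `read_next_chunk`. The sketch below assumes such an object with a `read(n)` method; `read_chunks` and `feed_stream` are hypothetical helpers, and `parser` can be any object with a `data_received` method.

```python
import io
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield chunks read from a binary file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


def feed_stream(parser, stream: BinaryIO, chunk_size: int = 8192) -> None:
    """Forward every chunk of `stream` to parser.data_received()."""
    for chunk in read_chunks(stream, chunk_size):
        parser.data_received(chunk)


# io.BytesIO stands in for a framework-provided request stream here.
example_stream = io.BytesIO(b"0123456789")
example_chunks = list(read_chunks(example_stream, chunk_size=4))
```

Only one chunk is held in memory at a time, which is the point of streaming the body rather than reading it all up front.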
API
StreamingFormDataParser
This class is the main entry point. It expects a dictionary of HTTP request headers and has a keyword argument strict. The headers are used to determine the input Content-Type and some other metadata. The strict flag is used to enable or disable strict mode.
Target classes
When registering inputs with the parser, instances of subclasses of the Target class should be used. These target classes ultimately determine what to do with the data.
ValueTarget
ValueTarget objects hold the input in memory.
>>> target = ValueTarget()
FileTarget
FileTarget objects stream the contents to a file on disk.
>>> target = FileTarget('/tmp/file.txt')
DirectoryTarget
DirectoryTarget objects stream the contents to a directory on disk.
>>> target = DirectoryTarget('/tmp/uploads/')
SHA256Target
SHA256Target objects calculate a SHA256 hash of the given input, and hold the result in memory.
>>> target = SHA256Target()
NullTarget
NullTarget objects discard the input completely.
>>> target = NullTarget()
S3Target
S3Target objects stream the contents of a file to an S3 bucket.
>>> target = S3Target("s3://<bucket>/path/to/key", "wb")
CSVTarget
CSVTarget objects process and release CSV lines in chunks.
>>> target = CSVTarget()
Custom Target classes
It's possible to define custom targets for your specific use case by inheriting from the streaming_form_data.targets.BaseTarget class and overriding the on_data_received method.
>>> from streaming_form_data.targets import BaseTarget
>>>
>>> class CustomTarget(BaseTarget):
...     def on_data_received(self, chunk):
...         do_something(chunk)
If the Content-Disposition header included the filename directive, this value will be available as the self.multipart_filename attribute in Target classes.
Similarly, if the Content-Type header is available for the uploaded files, this value will be available as the self.multipart_content_type attribute in Target classes.
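For reference, the filename directive sits in each part's Content-Disposition header next to the field name. The standard-library sketch below shows how the two values appear in such a header; it illustrates the header format only and makes no claim about how the library extracts them. `disposition_params` is a hypothetical name.

```python
from email.message import Message


def disposition_params(header_value: str) -> dict:
    """Return the name and filename directives of a Content-Disposition value.

    Hypothetical helper; a missing directive is reported as None.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    return {
        "name": msg.get_param("name", header="content-disposition"),
        "filename": msg.get_param("filename", header="content-disposition"),
    }


with_file = disposition_params('form-data; name="file"; filename="report.txt"')
without_file = disposition_params('form-data; name="name"')
```

The filename is supplied by the client, so it is usually treated as untrusted input before being used in a path.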
Validator classes
Target classes accept a validator callable when being instantiated. Every time data_received is called with a given chunk, the target runs this chunk through the given callable.
This is useful for performing certain validation tasks like making sure the input size is not exceeding a certain value. This is shown in the following code snippet.
>>> from streaming_form_data.targets import ValueTarget
>>> from streaming_form_data.validators import MaxSizeValidator
>>>
>>> target = ValueTarget(validator=MaxSizeValidator(100))
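Because a validator is just a callable that receives each chunk, a custom one can be written as a small class that keeps state between calls. The sketch below is a hypothetical size-limiting validator, not the library's implementation; it raises ValueError as a placeholder, and the library's own validators may well signal failure with a dedicated exception type.

```python
class MaxTotalBytes:
    """Hypothetical validator: reject input once the total size exceeds a limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.seen = 0  # running total of bytes across all chunks so far

    def __call__(self, chunk: bytes) -> None:
        self.seen += len(chunk)
        if self.seen > self.limit:
            # Placeholder exception type chosen for this sketch.
            raise ValueError(f"input exceeds {self.limit} bytes")


validator = MaxTotalBytes(5)
validator(b"abc")  # 3 bytes seen, within the limit
validator(b"de")   # 5 bytes seen, still within the limit
```

Given the description above, an instance would presumably be passed the same way as the built-in validator, for example `ValueTarget(validator=MaxTotalBytes(100))`.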
Exceptions
ParseFailedException
This exception is the base class of the streaming_form_data exceptions. It can be raised during initialization, registering parts or reading chunks.
UnexpectedPartException
This exception is raised when the parser is in strict mode and starts to parse an unexpected part. It has a part_name attribute that holds the name of the unexpected part. It can only be raised from data_received.
>>> try:
...     parser.data_received(chunk)
... except streaming_form_data.parser.UnexpectedPartException as e:
...     print(e.part_name)
...     raise
Examples
- Bottle: https://git.io/vhCUy
- Flask: https://git.io/fjPoA
- Tornado: https://git.io/vhCUM
If you'd like to document usage with another web framework (which ideally allows chunked HTTP reads), please open an issue or a pull request.