Impala
Impalaistheopensource,nativeanalyticdatabaseforApacheHadoop.
|
#include <disk-io-mgr.h>
Public Member Functions | |
ScanRange (int initial_capacity=-1) | |
The initial queue capacity for this. Specify -1 to use IoMgr default. More... | |
virtual | ~ScanRange () |
void | Reset (hdfsFS fs, const char *file, int64_t len, int64_t offset, int disk_id, bool try_cache, bool expected_local, int64_t mtime, void *metadata=NULL) |
void * | meta_data () const |
bool | try_cache () const |
bool | expected_local () const |
int | ready_buffers_capacity () const |
Status | GetNext (BufferDescriptor **buffer) |
void | Cancel (const Status &status) |
std::string | DebugString () const |
return a descriptive string for debug. More... | |
int64_t | mtime () const |
hdfsFS | fs () const |
const char * | file () const |
int64_t | offset () const |
int64_t | len () const |
int | disk_id () const |
RequestType::type | request_type () const |
T * | Next () const |
Returns the Next/Prev node or NULL if this is the end/front. More... | |
T * | Prev () const |
Static Public Attributes | |
static const int64_t | NEVER_CACHE = -1 |
If the mtime is set to NEVER_CACHE, the file handle should never be cached. More... | |
Protected Attributes | |
hdfsFS | fs_ |
Hadoop filesystem that contains file_, or set to NULL for local filesystem. More... | |
std::string | file_ |
Path to file being read or written. More... | |
int64_t | offset_ |
Offset within file_ being read or written. More... | |
int64_t | len_ |
Length of data read or written. More... | |
int | disk_id_ |
Id of disk containing file_;. More... | |
RequestType::type | request_type_ |
The type of IO request, READ or WRITE. More... | |
Private Member Functions | |
void | InitInternal (DiskIoMgr *io_mgr, RequestContext *reader) |
Initialize internal fields. More... | |
bool | EnqueueBuffer (BufferDescriptor *buffer) |
void | CleanupQueuedBuffers () |
bool | Validate () |
int64_t | MaxReadChunkSize () const |
Maximum length in bytes for hdfsRead() calls. More... | |
Status | Open () |
Opens the file for this range. This function only modifies state in this range. More... | |
void | Close () |
Closes the file for this range. This function only modifies state in this range. More... | |
Status | Read (char *buffer, int64_t *bytes_read, bool *eosr) |
Status | ReadFromCache (bool *read_succeeded) |
Private Attributes | |
void * | meta_data_ |
bool | try_cache_ |
bool | expected_local_ |
DiskIoMgr * | io_mgr_ |
RequestContext * | reader_ |
Reader/owner of the scan range. More... | |
union { | |
FILE * local_file_ | |
hdfsFile hdfs_file_ | |
}; | |
File handle either to hdfs or local fs (FILE*) More... | |
struct hadoopRzBuffer * | cached_buffer_ |
boost::mutex | lock_ |
int | bytes_read_ |
Number of bytes read so far for this scan range. More... | |
Status | status_ |
bool | eosr_queued_ |
If true, the last buffer for this scan range has been queued. More... | |
bool | eosr_returned_ |
If true, the last buffer for this scan range has been returned. More... | |
bool | blocked_on_queue_ |
boost::condition_variable | buffer_ready_cv_ |
std::list< BufferDescriptor * > | ready_buffers_ |
int | ready_buffers_capacity_ |
boost::mutex | hdfs_lock_ |
bool | is_cancelled_ |
If true, this scan range has been cancelled. More... | |
int64_t | mtime_ |
Last modified time of the file associated with the scan range. More... | |
Friends | |
class | DiskIoMgr |
ScanRange description. The caller must call Reset() to initialize the fields before calling AddScanRanges(). The private fields are used internally by the IoMgr.
Definition at line 295 of file disk-io-mgr.h.
DiskIoMgr::ScanRange::ScanRange | ( | int | initial_capacity = -1 | ) |
The initial queue capacity for this. Specify -1 to use IoMgr default.
Definition at line 204 of file disk-io-mgr-scan-range.cc.
References NEVER_CACHE, impala::DiskIoMgr::RequestType::READ, impala::DiskIoMgr::RequestRange::request_type_, and Reset().
|
virtual |
Definition at line 210 of file disk-io-mgr-scan-range.cc.
void DiskIoMgr::ScanRange::Cancel | ( | const Status & | status | ) |
Cancel this scan range. This cleans up all queued buffers and wakes up any threads blocked on GetNext(). Status is the reason the range was cancelled. Must not be ok(). Status is returned to the user in GetNext().
Definition at line 143 of file disk-io-mgr-scan-range.cc.
References impala::DiskIoMgr::DebugString(), impala::lock_, impala::Status::ok(), and impala::DiskIoMgr::Validate().
Referenced by impala::DiskIoMgr::RequestContext::Cancel(), impala::DiskIoMgr::HandleReadFinished(), and impala::DiskIoMgr::ReadRange().
|
private |
Cleanup any queued buffers (i.e. due to cancellation). This cannot be called with any locks taken.
Definition at line 165 of file disk-io-mgr-scan-range.cc.
References impala::DiskIoMgr::BufferDescriptor::Return().
|
private |
Closes the file for this range. This function only modifies state in this range.
Definition at line 301 of file disk-io-mgr-scan-range.cc.
References impala::ImpaladMetrics::IO_MGR_NUM_OPEN_FILES, impala::IsDfsPath(), impala::PrettyPrinter::Print(), and VLOG_FILE.
Referenced by impala::DiskIoMgr::HandleReadFinished(), and impala::DiskIoMgr::ReturnBuffer().
string DiskIoMgr::ScanRange::DebugString | ( | ) | const |
return a descriptive string for debug.
Definition at line 179 of file disk-io-mgr-scan-range.cc.
Referenced by EnqueueBuffer().
|
inlineinherited |
Definition at line 269 of file disk-io-mgr.h.
References impala::DiskIoMgr::RequestRange::disk_id_.
Referenced by impala::DiskIoMgr::RequestContext::AddRequestRange(), impala::HdfsParquetScanner::InitColumns(), impala::HdfsTextScanner::IssueInitialRanges(), impala::HdfsParquetScanner::IssueInitialRanges(), impala::BufferedBlockMgr::PinBlock(), impala::HdfsParquetScanner::ProcessFooter(), and impala::DiskIoMgr::RequestContext::ScheduleScanRange().
|
private |
Enqueues a buffer for this range. This does not block. Returns true if this scan range has hit the queue capacity, false otherwise.
Definition at line 36 of file disk-io-mgr-scan-range.cc.
References blocked_on_queue_, impala::DiskIoMgr::BufferDescriptor::buffer_, buffer_ready_cv_, DebugString(), impala::DiskIoMgr::BufferDescriptor::eosr(), eosr_queued_, eosr_returned_, io_mgr_, is_cancelled_, lock_, MIN_QUEUE_CAPACITY, impala::DiskIoMgr::RequestContext::num_buffers_in_reader_, impala::DiskIoMgr::num_buffers_in_readers_, impala::DiskIoMgr::RequestContext::num_ready_buffers_, impala::DiskIoMgr::RequestContext::num_used_buffers_, reader_, ready_buffers_, ready_buffers_capacity_, impala::DiskIoMgr::BufferDescriptor::Return(), and Validate().
Referenced by impala::DiskIoMgr::HandleReadFinished().
|
inline |
Definition at line 314 of file disk-io-mgr.h.
References expected_local_.
Referenced by impala::HdfsParquetScanner::InitColumns(), impala::HdfsTextScanner::IssueInitialRanges(), impala::HdfsParquetScanner::IssueInitialRanges(), and impala::HdfsParquetScanner::ProcessFooter().
|
inlineinherited |
Definition at line 266 of file disk-io-mgr.h.
References impala::DiskIoMgr::RequestRange::file_.
Referenced by impala::ScannerContext::Stream::filename(), impala::HdfsParquetScanner::InitColumns(), impala::BufferedBlockMgr::PinBlock(), impala::HdfsParquetScanner::ProcessFooter(), impala::HdfsScanNode::ScannerThread(), impala::HdfsParquetScanner::ValidateColumn(), and impala::DiskIoMgr::Write().
|
inlineinherited |
Definition at line 265 of file disk-io-mgr.h.
References impala::DiskIoMgr::RequestRange::fs_.
Referenced by impala::HdfsParquetScanner::InitColumns(), and impala::HdfsParquetScanner::ProcessFooter().
Status DiskIoMgr::ScanRange::GetNext | ( | BufferDescriptor ** | buffer | ) |
Returns the next buffer for this scan range. buffer is an output parameter. This function blocks until a buffer is ready or an error occurred. If this is called when all buffers have been returned, *buffer is set to NULL and Status::OK is returned. Only one thread can be in GetNext() at any time.
Definition at line 70 of file disk-io-mgr-scan-range.cc.
References impala::Cancel(), impala::DiskIoMgr::RequestContext::Cancelled, impala::DiskIoMgr::DebugString(), impala::DiskIoMgr::BufferDescriptor::eosr(), impala::lock_, MAX_QUEUE_CAPACITY, impala::Status::OK, impala::Status::ok(), and impala::DiskIoMgr::Validate().
Referenced by impala::DiskIoMgrStress::ClientThread(), impala::BufferedBlockMgr::PinBlock(), impala::DiskIoMgr::Read(), impala::TEST_F(), and impala::DiskIoMgrTest::ValidateScanRange().
|
private |
Initialize internal fields.
Definition at line 233 of file disk-io-mgr-scan-range.cc.
References impala::DiskIoMgr::DebugString(), impala::DiskIoMgr::RequestContext::initial_scan_range_queue_capacity(), MIN_QUEUE_CAPACITY, and impala::DiskIoMgr::Validate().
|
inlineinherited |
Definition at line 268 of file disk-io-mgr.h.
References impala::DiskIoMgr::RequestRange::len_.
Referenced by impala::DiskIoMgr::AddWriteRange(), impala::ScannerContext::Stream::bytes_left(), impala::ScannerContext::Stream::eosr(), impala::BufferedBlockMgr::PinBlock(), impala::DiskIoMgr::Read(), impala::HdfsScanNode::ScannerThread(), and impala::DiskIoMgrTest::ValidateSyncRead().
|
private |
Maximum length in bytes for hdfsRead() calls.
Definition at line 343 of file disk-io-mgr-scan-range.cc.
References impala::IsS3APath().
|
inline |
Definition at line 312 of file disk-io-mgr.h.
References meta_data_.
Referenced by impala::HdfsTextScanner::IssueInitialRanges(), impala::HdfsParquetScanner::IssueInitialRanges(), and impala::HdfsScanNode::ScannerThread().
|
inline |
Definition at line 333 of file disk-io-mgr.h.
References mtime_.
Referenced by impala::DiskIoMgrTest::InitRange().
|
inlineinherited |
Returns the Next/Prev node or NULL if this is the end/front.
Definition at line 48 of file internal-queue.h.
References impala::InternalQueue< T >::lock_, impala::InternalQueue< T >::Node::next, and impala::InternalQueue< T >::Node::parent_queue.
Referenced by impala::TEST(), and impala::BufferedBlockMgr::Validate().
|
inlineinherited |
Definition at line 267 of file disk-io-mgr.h.
References impala::DiskIoMgr::RequestRange::offset_.
Referenced by impala::DiskIoMgrStress::ClientThread(), impala::ScannerContext::Stream::file_offset(), impala::HdfsTextScanner::FindFirstTuple(), impala::HdfsTextScanner::IssueInitialRanges(), impala::HdfsParquetScanner::IssueInitialRanges(), impala::BufferedBlockMgr::PinBlock(), impala::HdfsScanNode::ScannerThread(), impala::TEST_F(), impala::DiskIoMgrTest::ValidateScanRange(), and impala::DiskIoMgr::WriteRangeHelper().
|
private |
Opens the file for this range. This function only modifies state in this range.
Definition at line 251 of file disk-io-mgr-scan-range.cc.
References impala::Status::CANCELLED, impala::GetHdfsErrorMsg(), impala::GetStrErrMsg(), impala::ImpaladMetrics::IO_MGR_NUM_OPEN_FILES, impala::Status::OK, and VLOG_FILE.
Referenced by impala::DiskIoMgr::ReadRange().
|
inlineinherited |
Definition at line 52 of file internal-queue.h.
References impala::InternalQueue< T >::lock_, impala::InternalQueue< T >::Node::parent_queue, and impala::InternalQueue< T >::Node::prev.
Referenced by impala::TEST().
Reads from this range into 'buffer'. Buffer is preallocated. Returns the number of bytes read. Updates range to keep track of where in the file we are.
Definition at line 360 of file disk-io-mgr-scan-range.cc.
References impala::Status::CANCELLED, impala::GetHdfsErrorMsg(), impala::GetStrErrMsg(), and impala::Status::OK.
Referenced by impala::DiskIoMgr::ReadRange().
Reads from the DN cache. On success, sets cached_buffer_ to the DN buffer and *read_succeeded to true. If the data is not cached, returns ok() and *read_succeeded is set to false. Returns a non-ok status if it ran into a non-continuable error.
Definition at line 404 of file disk-io-mgr-scan-range.cc.
References impala::Status::CANCELLED, COUNTER_ADD, impala::DiskIoMgr::BufferDescriptor::eosr_, impala::DiskIoMgr::BufferDescriptor::len_, impala::Status::OK, impala::Status::ok(), and impala::DiskIoMgr::BufferDescriptor::scan_range_offset_.
Referenced by impala::DiskIoMgr::AddScanRanges().
|
inline |
Definition at line 315 of file disk-io-mgr.h.
References ready_buffers_capacity_.
|
inlineinherited |
Definition at line 270 of file disk-io-mgr.h.
References impala::DiskIoMgr::RequestRange::request_type_.
Referenced by impala::DiskIoMgr::RequestContext::AddRequestRange(), impala::DiskIoMgr::RequestContext::Cancel(), and impala::DiskIoMgr::WorkLoop().
void DiskIoMgr::ScanRange::Reset | ( | hdfsFS | fs, |
const char * | file, | ||
int64_t | len, | ||
int64_t | offset, | ||
int | disk_id, | ||
bool | try_cache, | ||
bool | expected_local, | ||
int64_t | mtime, | ||
void * | metadata = NULL |
||
) |
Resets this scan range object with the scan range description. The scan range must fall within the file bounds (offset >= 0 and offset + len <= file_length). Resets this scan range object with the scan range description.
Definition at line 215 of file disk-io-mgr-scan-range.cc.
References offset.
Referenced by impala::HdfsScanNode::AllocateScanRange(), impala::DiskIoMgrTest::InitRange(), impala::DiskIoMgrStress::NewClient(), impala::BufferedBlockMgr::PinBlock(), ScanRange(), and impala::DiskIoMgrTest::WriteValidateCallback().
|
inline |
Definition at line 313 of file disk-io-mgr.h.
References try_cache_.
Referenced by impala::HdfsParquetScanner::InitColumns(), impala::HdfsTextScanner::IssueInitialRanges(), impala::HdfsParquetScanner::IssueInitialRanges(), and impala::HdfsParquetScanner::ProcessFooter().
|
private |
Validates the internal state of this range. lock_ must be taken before calling this.
Definition at line 189 of file disk-io-mgr-scan-range.cc.
Referenced by EnqueueBuffer().
|
friend |
Definition at line 336 of file disk-io-mgr.h.
union { ... } |
File handle either to hdfs or local fs (FILE*)
|
private |
If true, this scan range has been removed from the reader's in_flight_ranges queue because the ready_buffers_ queue is full.
Definition at line 423 of file disk-io-mgr.h.
Referenced by EnqueueBuffer(), and impala::DiskIoMgr::ReadRange().
|
private |
IO buffers that are queued for this scan range. Condition variable for GetNext
Definition at line 427 of file disk-io-mgr.h.
Referenced by EnqueueBuffer().
|
private |
Number of bytes read so far for this scan range.
Definition at line 408 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::ReadRange().
|
private |
If non-null, this is DN cached buffer. This means the cached read succeeded and all the bytes for the range are in this buffer.
Definition at line 401 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::HandleReadFinished(), and impala::DiskIoMgr::ReturnBuffer().
|
protectedinherited |
Id of disk containing file_;.
Definition at line 286 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::RequestRange::disk_id(), impala::DiskIoMgr::HandleWriteFinished(), and impala::DiskIoMgr::ValidateScanRange().
|
private |
If true, the last buffer for this scan range has been queued.
Definition at line 416 of file disk-io-mgr.h.
Referenced by EnqueueBuffer().
|
private |
If true, the last buffer for this scan range has been returned.
Definition at line 419 of file disk-io-mgr.h.
Referenced by EnqueueBuffer().
|
private |
If true, we expect this scan range to be a local read. Note that if this is false, it does not necessarily mean we expect the read to be remote, and that we never create scan ranges where some of the range is expected to be remote and some of it local. TODO: we can do more with this
Definition at line 386 of file disk-io-mgr.h.
Referenced by expected_local().
|
protectedinherited |
Path to file being read or written.
Definition at line 277 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::RequestRange::file(), impala::DiskIoMgr::Write(), and impala::DiskIoMgr::WriteRangeHelper().
|
protectedinherited |
Hadoop filesystem that contains file_, or set to NULL for local filesystem.
Definition at line 274 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::RequestRange::fs().
hdfsFile impala::DiskIoMgr::ScanRange::hdfs_file_ |
Definition at line 396 of file disk-io-mgr.h.
|
private |
Lock that should be taken during hdfs calls. Only one thread (the disk reading thread) calls into hdfs at a time so this lock does not have performance impact. This lock only serves to coordinate cleanup. Specifically it serves to ensure that the disk threads are finished with HDFS calls before is_cancelled_ is set to true and cleanup starts. If this lock and lock_ need to be taken, lock_ must be taken first.
Definition at line 442 of file disk-io-mgr.h.
|
private |
Definition at line 388 of file disk-io-mgr.h.
Referenced by EnqueueBuffer().
|
private |
If true, this scan range has been cancelled.
Definition at line 445 of file disk-io-mgr.h.
Referenced by EnqueueBuffer(), and impala::DiskIoMgr::ReturnBuffer().
|
protectedinherited |
Length of data read or written.
Definition at line 283 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::RequestRange::len(), impala::DiskIoMgr::ReadRange(), and impala::DiskIoMgr::WriteRangeHelper().
FILE* impala::DiskIoMgr::ScanRange::local_file_ |
Definition at line 395 of file disk-io-mgr.h.
|
private |
Lock protecting fields below. This lock should not be taken during Open/Read/Close.
Definition at line 405 of file disk-io-mgr.h.
Referenced by EnqueueBuffer().
|
private |
Pointer to caller specified metadata. This is untouched by the io manager and the caller can put whatever auxiliary data in here.
Definition at line 374 of file disk-io-mgr.h.
Referenced by meta_data().
|
private |
Last modified time of the file associated with the scan range.
Definition at line 448 of file disk-io-mgr.h.
Referenced by mtime().
|
static |
If the mtime is set to NEVER_CACHE, the file handle should never be cached.
Definition at line 299 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgrStress::NewClient(), impala::BufferedBlockMgr::PinBlock(), ScanRange(), and impala::DiskIoMgrTest::WriteValidateCallback().
|
protectedinherited |
Offset within file_ being read or written.
Definition at line 280 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::RequestRange::offset().
|
private |
Reader/owner of the scan range.
Definition at line 391 of file disk-io-mgr.h.
Referenced by EnqueueBuffer().
|
private |
Definition at line 428 of file disk-io-mgr.h.
Referenced by EnqueueBuffer(), and impala::DiskIoMgr::ReadRange().
|
private |
The soft capacity limit for ready_buffers_. ready_buffers_ can exceed the limit temporarily as the capacity is adjusted dynamically. In that case, the capcity is only realized when the caller removes buffers from ready_buffers_.
Definition at line 434 of file disk-io-mgr.h.
Referenced by EnqueueBuffer(), and ready_buffers_capacity().
|
protectedinherited |
The type of IO request, READ or WRITE.
Definition at line 289 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::RequestRange::request_type(), and ScanRange().
|
private |
Status for this range. This is non-ok if is_cancelled_ is true. Note: an individual range can fail without the RequestContext being cancelled. This allows us to skip individual ranges.
Definition at line 413 of file disk-io-mgr.h.
|
private |
If true, this scan range is expected to be cached. Note that this might be wrong since the block could have been uncached. In that case, the cached path will fail and we'll just put the scan range on the normal read path.
Definition at line 379 of file disk-io-mgr.h.
Referenced by impala::DiskIoMgr::AddScanRanges(), and try_cache().