Impala
Impala is the open source, native analytic database for Apache Hadoop.
impala::DiskIoMgr::ScanRange Class Reference

#include <disk-io-mgr.h>


Public Member Functions

 ScanRange (int initial_capacity=-1)
 The initial queue capacity for this scan range. Specify -1 to use the IoMgr default.
 
virtual ~ScanRange ()
 
void Reset (hdfsFS fs, const char *file, int64_t len, int64_t offset, int disk_id, bool try_cache, bool expected_local, int64_t mtime, void *metadata=NULL)
 
void * meta_data () const
 
bool try_cache () const
 
bool expected_local () const
 
int ready_buffers_capacity () const
 
Status GetNext (BufferDescriptor **buffer)
 
void Cancel (const Status &status)
 
std::string DebugString () const
 Returns a descriptive string for debugging.
 
int64_t mtime () const
 
hdfsFS fs () const
 
const char * file () const
 
int64_t offset () const
 
int64_t len () const
 
int disk_id () const
 
RequestType::type request_type () const
 
T * Next () const
 Returns the Next/Prev node or NULL if this is the end/front.
 
T * Prev () const
 

Static Public Attributes

static const int64_t NEVER_CACHE = -1
 If the mtime is set to NEVER_CACHE, the file handle should never be cached.
 

Protected Attributes

hdfsFS fs_
 Hadoop filesystem that contains file_, or set to NULL for local filesystem.
 
std::string file_
 Path to file being read or written.
 
int64_t offset_
 Offset within file_ being read or written.
 
int64_t len_
 Length of data read or written.
 
int disk_id_
 Id of disk containing file_.
 
RequestType::type request_type_
 The type of IO request, READ or WRITE.
 

Private Member Functions

void InitInternal (DiskIoMgr *io_mgr, RequestContext *reader)
 Initialize internal fields.
 
bool EnqueueBuffer (BufferDescriptor *buffer)
 
void CleanupQueuedBuffers ()
 
bool Validate ()
 
int64_t MaxReadChunkSize () const
 Maximum length in bytes for hdfsRead() calls.
 
Status Open ()
 Opens the file for this range. This function only modifies state in this range.
 
void Close ()
 Closes the file for this range. This function only modifies state in this range.
 
Status Read (char *buffer, int64_t *bytes_read, bool *eosr)
 
Status ReadFromCache (bool *read_succeeded)
 

Private Attributes

void * meta_data_
 
bool try_cache_
 
bool expected_local_
 
DiskIoMgr * io_mgr_
 
RequestContext * reader_
 Reader/owner of the scan range.
 
union {
   FILE *   local_file_
 
   hdfsFile   hdfs_file_
 
}; 
 File handle either to hdfs or local fs (FILE*)
 
struct hadoopRzBuffer * cached_buffer_
 
boost::mutex lock_
 
int bytes_read_
 Number of bytes read so far for this scan range.
 
Status status_
 
bool eosr_queued_
 If true, the last buffer for this scan range has been queued.
 
bool eosr_returned_
 If true, the last buffer for this scan range has been returned.
 
bool blocked_on_queue_
 
boost::condition_variable buffer_ready_cv_
 
std::list< BufferDescriptor * > ready_buffers_
 
int ready_buffers_capacity_
 
boost::mutex hdfs_lock_
 
bool is_cancelled_
 If true, this scan range has been cancelled.
 
int64_t mtime_
 Last modified time of the file associated with the scan range.
 

Friends

class DiskIoMgr
 

Detailed Description

ScanRange description. The caller must call Reset() to initialize the fields before calling AddScanRanges(). The private fields are used internally by the IoMgr.

Definition at line 295 of file disk-io-mgr.h.

Constructor & Destructor Documentation

DiskIoMgr::ScanRange::ScanRange ( int  initial_capacity = -1)

The initial queue capacity for this scan range. Specify -1 to use the IoMgr default.

Definition at line 204 of file disk-io-mgr-scan-range.cc.

References NEVER_CACHE, impala::DiskIoMgr::RequestType::READ, impala::DiskIoMgr::RequestRange::request_type_, and Reset().

DiskIoMgr::ScanRange::~ScanRange ( )
virtual

Definition at line 210 of file disk-io-mgr-scan-range.cc.

Member Function Documentation

void DiskIoMgr::ScanRange::Cancel ( const Status &  status)

Cancels this scan range. This cleans up all queued buffers and wakes up any threads blocked on GetNext(). 'status' is the reason the range was cancelled and must not be ok(). It is returned to the user in GetNext().

Definition at line 143 of file disk-io-mgr-scan-range.cc.

References impala::DiskIoMgr::DebugString(), impala::lock_, impala::Status::ok(), and impala::DiskIoMgr::Validate().

Referenced by impala::DiskIoMgr::RequestContext::Cancel(), impala::DiskIoMgr::HandleReadFinished(), and impala::DiskIoMgr::ReadRange().
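
The cancellation behaviour described for Cancel() can be illustrated with a small standalone model. This is only a sketch under assumptions, not Impala code: `ToyStatus` and `CancellableQueue` are hypothetical stand-ins, strings stand in for IO buffers, and `std::mutex` / `std::condition_variable` are used in place of the boost primitives listed among the private attributes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a status object; ok == true means success.
struct ToyStatus {
  bool ok = true;
  std::string msg;
};

// Hypothetical model of a range whose queued buffers can be cancelled.
class CancellableQueue {
 public:
  // Producer side: queue a buffer unless the range is already cancelled.
  void Enqueue(std::string buf) {
    {
      std::lock_guard<std::mutex> l(lock_);
      if (cancelled_) return;
      ready_.push_back(std::move(buf));
    }
    cv_.notify_one();
  }

  // Consumer side: blocks until a buffer is ready or the range is cancelled.
  // After cancellation, the reason passed to Cancel() is returned.
  ToyStatus GetNext(std::string* out) {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return cancelled_ || !ready_.empty(); });
    if (cancelled_) return status_;
    *out = std::move(ready_.front());
    ready_.pop_front();
    return ToyStatus();
  }

  // Records the reason, discards queued buffers and wakes blocked consumers.
  void Cancel(const ToyStatus& status) {
    assert(!status.ok);  // the reason must be a non-ok status
    std::deque<std::string> doomed;
    {
      std::lock_guard<std::mutex> l(lock_);
      if (cancelled_) return;
      cancelled_ = true;
      status_ = status;
      doomed.swap(ready_);  // take the buffers out while holding the lock
    }
    cv_.notify_all();
    // 'doomed' is destroyed here, after the lock has been released.
  }

  std::size_t queued() {
    std::lock_guard<std::mutex> l(lock_);
    return ready_.size();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<std::string> ready_;
  bool cancelled_ = false;
  ToyStatus status_;
};
```

In this model a consumer that calls GetNext() after (or while blocked during) a Cancel() receives the cancellation reason instead of a buffer. Releasing the queued buffers outside the lock is a choice made for the toy, loosely following the note that CleanupQueuedBuffers() must not be called with locks held.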

void DiskIoMgr::ScanRange::CleanupQueuedBuffers ( )
private

Cleanup any queued buffers (i.e. due to cancellation). This cannot be called with any locks taken.

Definition at line 165 of file disk-io-mgr-scan-range.cc.

References impala::DiskIoMgr::BufferDescriptor::Return().

void DiskIoMgr::ScanRange::Close ( )
private

Closes the file for this range. This function only modifies state in this range.

Definition at line 301 of file disk-io-mgr-scan-range.cc.

References impala::ImpaladMetrics::IO_MGR_NUM_OPEN_FILES, impala::IsDfsPath(), impala::PrettyPrinter::Print(), and VLOG_FILE.

Referenced by impala::DiskIoMgr::HandleReadFinished(), and impala::DiskIoMgr::ReturnBuffer().

string DiskIoMgr::ScanRange::DebugString ( ) const

Returns a descriptive string for debugging.

Definition at line 179 of file disk-io-mgr-scan-range.cc.

Referenced by EnqueueBuffer().

hdfsFS impala::DiskIoMgr::RequestRange::fs ( ) const
inline inherited

Status DiskIoMgr::ScanRange::GetNext ( BufferDescriptor **  buffer)

Returns the next buffer for this scan range. buffer is an output parameter. This function blocks until a buffer is ready or an error occurs. If this is called when all buffers have been returned, *buffer is set to NULL and Status::OK is returned. Only one thread can be in GetNext() at any time.

Definition at line 70 of file disk-io-mgr-scan-range.cc.

References impala::Cancel(), impala::DiskIoMgr::RequestContext::Cancelled, impala::DiskIoMgr::DebugString(), impala::DiskIoMgr::BufferDescriptor::eosr(), impala::lock_, MAX_QUEUE_CAPACITY, impala::Status::OK, impala::Status::ok(), and impala::DiskIoMgr::Validate().

Referenced by impala::DiskIoMgrStress::ClientThread(), impala::BufferedBlockMgr::PinBlock(), impala::DiskIoMgr::Read(), impala::TEST_F(), and impala::DiskIoMgrTest::ValidateScanRange().
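
As an illustration of the GetNext() contract, the following sketch models a single consumer draining a range until the last buffer has been returned. Everything here is an assumption made for the example: `Buffer`, `ToyRange` and `Drain` are hypothetical names, and the toy signals exhaustion with a `false` return value, whereas the real function sets *buffer to NULL and returns Status::OK.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a buffer descriptor.
struct Buffer {
  std::string data;
  bool eosr;  // true on the last buffer of the range
};

// Hypothetical model of a range with a queue of ready buffers.
class ToyRange {
 public:
  // Producer side: queue a buffer and wake the consumer.
  void Enqueue(Buffer b) {
    {
      std::lock_guard<std::mutex> l(lock_);
      if (b.eosr) eosr_queued_ = true;
      ready_.push_back(std::move(b));
    }
    cv_.notify_one();
  }

  // Consumer side: blocks until a buffer is ready. Returns false without
  // touching *out once the last buffer has already been handed out.
  bool GetNext(Buffer* out) {
    std::unique_lock<std::mutex> l(lock_);
    if (eosr_returned_) return false;
    cv_.wait(l, [this] { return !ready_.empty(); });
    *out = std::move(ready_.front());
    ready_.pop_front();
    if (out->eosr) eosr_returned_ = true;
    return true;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<Buffer> ready_;
  bool eosr_queued_ = false;    // last buffer has been queued
  bool eosr_returned_ = false;  // last buffer has been returned
};

// Reads every buffer of the range and returns the total number of bytes.
int64_t Drain(ToyRange* range) {
  int64_t total = 0;
  Buffer b;
  while (range->GetNext(&b)) total += static_cast<int64_t>(b.data.size());
  return total;
}
```

The two flags mirror the roles of eosr_queued_ and eosr_returned_ in the member list: one tracks what the producer has enqueued, the other what the consumer has already seen.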

void DiskIoMgr::ScanRange::InitInternal ( DiskIoMgr *  io_mgr,
RequestContext *  reader 
)
private

int64_t DiskIoMgr::ScanRange::MaxReadChunkSize ( ) const
private

Maximum length in bytes for hdfsRead() calls.

Definition at line 343 of file disk-io-mgr-scan-range.cc.

References impala::IsS3APath().

void* impala::DiskIoMgr::ScanRange::meta_data ( ) const
inline

int64_t impala::DiskIoMgr::ScanRange::mtime ( ) const
inline

Definition at line 333 of file disk-io-mgr.h.

References mtime_.

Referenced by impala::DiskIoMgrTest::InitRange().

template<typename T>
T* impala::InternalQueue< T >::Node::Next ( ) const
inline inherited

Returns the Next/Prev node or NULL if this is the end/front.

Definition at line 48 of file internal-queue.h.

References impala::InternalQueue< T >::lock_, impala::InternalQueue< T >::Node::next, and impala::InternalQueue< T >::Node::parent_queue.

Referenced by impala::TEST(), and impala::BufferedBlockMgr::Validate().

Status DiskIoMgr::ScanRange::Open ( )
private

Opens the file for this range. This function only modifies state in this range.

Definition at line 251 of file disk-io-mgr-scan-range.cc.

References impala::Status::CANCELLED, impala::GetHdfsErrorMsg(), impala::GetStrErrMsg(), impala::ImpaladMetrics::IO_MGR_NUM_OPEN_FILES, impala::Status::OK, and VLOG_FILE.

Referenced by impala::DiskIoMgr::ReadRange().

template<typename T>
T* impala::InternalQueue< T >::Node::Prev ( ) const
inline inherited

Status DiskIoMgr::ScanRange::Read ( char *  buffer,
int64_t *  bytes_read,
bool *  eosr 
)
private

Reads from this range into 'buffer', which is preallocated. The number of bytes read is returned in *bytes_read. Updates the range to keep track of the current position in the file.

Definition at line 360 of file disk-io-mgr-scan-range.cc.

References impala::Status::CANCELLED, impala::GetHdfsErrorMsg(), impala::GetStrErrMsg(), and impala::Status::OK.

Referenced by impala::DiskIoMgr::ReadRange().

Status DiskIoMgr::ScanRange::ReadFromCache ( bool *  read_succeeded)
private

Reads from the DN cache. On success, sets cached_buffer_ to the DN buffer and *read_succeeded to true. If the data is not cached, returns ok() and *read_succeeded is set to false. Returns a non-ok status if it ran into a non-continuable error.

Definition at line 404 of file disk-io-mgr-scan-range.cc.

References impala::Status::CANCELLED, COUNTER_ADD, impala::DiskIoMgr::BufferDescriptor::eosr_, impala::DiskIoMgr::BufferDescriptor::len_, impala::Status::OK, impala::Status::ok(), and impala::DiskIoMgr::BufferDescriptor::scan_range_offset_.

Referenced by impala::DiskIoMgr::AddScanRanges().
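
The three outcomes of ReadFromCache() (cache hit, cache miss with an ok status, and a non-continuable error) suggest a simple routing decision for the caller. The sketch below is a hypothetical model of that decision, not the logic of AddScanRanges(): `ReadPath`, `CacheAttempt` and `Route` are names invented for the example.

```cpp
#include <functional>

// Hypothetical result of routing a range.
enum class ReadPath { kCached, kNormal, kFailed };

// Hypothetical stand-in for the (Status, *read_succeeded) pair.
struct CacheAttempt {
  bool status_ok;       // false only for a non-continuable error
  bool read_succeeded;  // meaningful only when status_ok is true
};

// Decides which path a range takes. The cache is only consulted when
// try_cache is set; a miss falls back to the normal read path.
ReadPath Route(bool try_cache,
               const std::function<CacheAttempt()>& read_from_cache) {
  if (!try_cache) return ReadPath::kNormal;
  CacheAttempt attempt = read_from_cache();
  if (!attempt.status_ok) return ReadPath::kFailed;
  return attempt.read_succeeded ? ReadPath::kCached : ReadPath::kNormal;
}
```

Separating "the data was not cached" from "the read failed" lets a stale try_cache hint degrade to an ordinary read instead of an error, which matches the fallback described for try_cache_.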

int impala::DiskIoMgr::ScanRange::ready_buffers_capacity ( ) const
inline

Definition at line 315 of file disk-io-mgr.h.

References ready_buffers_capacity_.

RequestType::type impala::DiskIoMgr::RequestRange::request_type ( ) const
inline inherited

void DiskIoMgr::ScanRange::Reset ( hdfsFS  fs,
const char *  file,
int64_t  len,
int64_t  offset,
int  disk_id,
bool  try_cache,
bool  expected_local,
int64_t  mtime,
void *  metadata = NULL 
)

Resets this scan range object with the scan range description. The scan range must fall within the file bounds (offset >= 0 and offset + len <= file_length).

Definition at line 215 of file disk-io-mgr-scan-range.cc.

References offset.

Referenced by impala::HdfsScanNode::AllocateScanRange(), impala::DiskIoMgrTest::InitRange(), impala::DiskIoMgrStress::NewClient(), impala::BufferedBlockMgr::PinBlock(), ScanRange(), and impala::DiskIoMgrTest::WriteValidateCallback().
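
The bounds precondition stated for Reset() can be checked before a range is constructed. The helper below is a hypothetical sketch and not part of Impala; it additionally assumes that a negative length is invalid, which the description above does not state.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if [offset, offset + len) lies within a
// file of file_length bytes, i.e. offset >= 0 and offset + len <= file_length.
bool RangeWithinFile(int64_t offset, int64_t len, int64_t file_length) {
  if (offset < 0 || len < 0) return false;  // negative len rejected by assumption
  // Written as a subtraction so that offset + len cannot overflow.
  return offset <= file_length && len <= file_length - offset;
}
```

For example, `RangeWithinFile(5, 6, 10)` is false because the range would end at byte 11 of a 10-byte file, while `RangeWithinFile(10, 0, 10)` is true for an empty range at the end of the file.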

bool DiskIoMgr::ScanRange::Validate ( )
private

Validates the internal state of this range. lock_ must be taken before calling this.

Definition at line 189 of file disk-io-mgr-scan-range.cc.

Referenced by EnqueueBuffer().

Friends And Related Function Documentation

friend class DiskIoMgr
friend

Definition at line 336 of file disk-io-mgr.h.

Member Data Documentation

union { ... }

File handle either to hdfs or local fs (FILE*)

bool impala::DiskIoMgr::ScanRange::blocked_on_queue_
private

If true, this scan range has been removed from the reader's in_flight_ranges queue because the ready_buffers_ queue is full.

Definition at line 423 of file disk-io-mgr.h.

Referenced by EnqueueBuffer(), and impala::DiskIoMgr::ReadRange().

boost::condition_variable impala::DiskIoMgr::ScanRange::buffer_ready_cv_
private

Condition variable for GetNext(). The IO buffers queued for this scan range are held in ready_buffers_.

Definition at line 427 of file disk-io-mgr.h.

Referenced by EnqueueBuffer().

int impala::DiskIoMgr::ScanRange::bytes_read_
private

Number of bytes read so far for this scan range.

Definition at line 408 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::ReadRange().

struct hadoopRzBuffer* impala::DiskIoMgr::ScanRange::cached_buffer_
private

If non-null, this is the DN cached buffer. This means the cached read succeeded and all the bytes for the range are in this buffer.

Definition at line 401 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::HandleReadFinished(), and impala::DiskIoMgr::ReturnBuffer().

int impala::DiskIoMgr::RequestRange::disk_id_
protected inherited

bool impala::DiskIoMgr::ScanRange::eosr_queued_
private

If true, the last buffer for this scan range has been queued.

Definition at line 416 of file disk-io-mgr.h.

Referenced by EnqueueBuffer().

bool impala::DiskIoMgr::ScanRange::eosr_returned_
private

If true, the last buffer for this scan range has been returned.

Definition at line 419 of file disk-io-mgr.h.

Referenced by EnqueueBuffer().

bool impala::DiskIoMgr::ScanRange::expected_local_
private

If true, we expect this scan range to be a local read. Note that a false value does not necessarily mean we expect the read to be remote. We never create scan ranges where part of the range is expected to be remote and part of it local. TODO: we can do more with this.

Definition at line 386 of file disk-io-mgr.h.

Referenced by expected_local().

std::string impala::DiskIoMgr::RequestRange::file_
protected inherited

Path to file being read or written.

Definition at line 277 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::RequestRange::file(), impala::DiskIoMgr::Write(), and impala::DiskIoMgr::WriteRangeHelper().

hdfsFS impala::DiskIoMgr::RequestRange::fs_
protected inherited

Hadoop filesystem that contains file_, or set to NULL for local filesystem.

Definition at line 274 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::RequestRange::fs().

hdfsFile impala::DiskIoMgr::ScanRange::hdfs_file_

Definition at line 396 of file disk-io-mgr.h.

boost::mutex impala::DiskIoMgr::ScanRange::hdfs_lock_
private

Lock that should be taken during hdfs calls. Only one thread (the disk reading thread) calls into hdfs at a time so this lock does not have performance impact. This lock only serves to coordinate cleanup. Specifically it serves to ensure that the disk threads are finished with HDFS calls before is_cancelled_ is set to true and cleanup starts. If this lock and lock_ need to be taken, lock_ must be taken first.

Definition at line 442 of file disk-io-mgr.h.
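
The locking rules for hdfs_lock_ (held around HDFS calls, and taken after lock_ whenever both are needed) can be sketched as follows. This is a hypothetical model, not the Impala implementation: `ToyRangeLocks`, `DoHdfsCall` and the call counter are invented for the example, and `std::mutex` stands in for `boost::mutex`.

```cpp
#include <mutex>

// Hypothetical model of the two locks of a scan range.
struct ToyRangeLocks {
  std::mutex lock_;       // protects the range's queue state
  std::mutex hdfs_lock_;  // held while a (simulated) HDFS call is running
  bool is_cancelled_ = false;
  int hdfs_calls_ = 0;

  // Disk-thread side: lock_ is not held during the call, only hdfs_lock_.
  // Returns false if the range was cancelled before the call started.
  bool DoHdfsCall() {
    std::lock_guard<std::mutex> h(hdfs_lock_);
    if (is_cancelled_) return false;
    ++hdfs_calls_;  // stands in for an open, read or close
    return true;
  }

  // Cancelling side: acquires lock_ first and hdfs_lock_ second, so it waits
  // for any in-flight HDFS call to finish before setting is_cancelled_.
  void Cancel() {
    std::lock_guard<std::mutex> l(lock_);
    std::lock_guard<std::mutex> h(hdfs_lock_);
    is_cancelled_ = true;
  }
};
```

Keeping a single acquisition order (lock_ before hdfs_lock_) rules out the deadlock that could occur if two threads took the same pair of locks in opposite orders.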

DiskIoMgr* impala::DiskIoMgr::ScanRange::io_mgr_
private

Definition at line 388 of file disk-io-mgr.h.

Referenced by EnqueueBuffer().

bool impala::DiskIoMgr::ScanRange::is_cancelled_
private

If true, this scan range has been cancelled.

Definition at line 445 of file disk-io-mgr.h.

Referenced by EnqueueBuffer(), and impala::DiskIoMgr::ReturnBuffer().

int64_t impala::DiskIoMgr::RequestRange::len_
protected inherited

Length of data read or written.

Definition at line 283 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::RequestRange::len(), impala::DiskIoMgr::ReadRange(), and impala::DiskIoMgr::WriteRangeHelper().

FILE* impala::DiskIoMgr::ScanRange::local_file_

Definition at line 395 of file disk-io-mgr.h.

boost::mutex impala::DiskIoMgr::ScanRange::lock_
private

Lock protecting fields below. This lock should not be taken during Open/Read/Close.

Definition at line 405 of file disk-io-mgr.h.

Referenced by EnqueueBuffer().

void* impala::DiskIoMgr::ScanRange::meta_data_
private

Pointer to caller specified metadata. This is untouched by the io manager and the caller can put whatever auxiliary data in here.

Definition at line 374 of file disk-io-mgr.h.

Referenced by meta_data().

int64_t impala::DiskIoMgr::ScanRange::mtime_
private

Last modified time of the file associated with the scan range.

Definition at line 448 of file disk-io-mgr.h.

Referenced by mtime().

const int64_t impala::DiskIoMgr::ScanRange::NEVER_CACHE = -1
static

If the mtime is set to NEVER_CACHE, the file handle should never be cached.

Definition at line 299 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgrStress::NewClient(), impala::BufferedBlockMgr::PinBlock(), ScanRange(), and impala::DiskIoMgrTest::WriteValidateCallback().

int64_t impala::DiskIoMgr::RequestRange::offset_
protected inherited

Offset within file_ being read or written.

Definition at line 280 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::RequestRange::offset().

RequestContext* impala::DiskIoMgr::ScanRange::reader_
private

Reader/owner of the scan range.

Definition at line 391 of file disk-io-mgr.h.

Referenced by EnqueueBuffer().

std::list<BufferDescriptor*> impala::DiskIoMgr::ScanRange::ready_buffers_
private

Definition at line 428 of file disk-io-mgr.h.

Referenced by EnqueueBuffer(), and impala::DiskIoMgr::ReadRange().

int impala::DiskIoMgr::ScanRange::ready_buffers_capacity_
private

The soft capacity limit for ready_buffers_. ready_buffers_ can exceed the limit temporarily as the capacity is adjusted dynamically. In that case, the capacity is only realized when the caller removes buffers from ready_buffers_.

Definition at line 434 of file disk-io-mgr.h.

Referenced by EnqueueBuffer(), and ready_buffers_capacity().
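
A soft capacity limit of this kind can be modelled with a queue that never drops items when its limit shrinks. The sketch below is an illustration under assumptions: `SoftBoundedQueue` and its methods are hypothetical, integers stand in for buffers, and the meaning given to the return value of `Enqueue` is a choice made for the toy rather than a statement about EnqueueBuffer().

```cpp
#include <cstddef>
#include <deque>

// Hypothetical queue with a soft capacity limit.
class SoftBoundedQueue {
 public:
  explicit SoftBoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Always accepts the item. Returns true if the queue has now reached its
  // capacity, which a producer can treat as a signal to stop adding more.
  bool Enqueue(int item) {
    items_.push_back(item);
    return AtCapacity();
  }

  // Removes the oldest item. Returns false if the queue was empty.
  bool Dequeue(int* out) {
    if (items_.empty()) return false;
    *out = items_.front();
    items_.pop_front();
    return true;
  }

  // Changing the limit never drops items, so size() may exceed capacity
  // until the consumer has removed enough of them.
  void set_capacity(std::size_t capacity) { capacity_ = capacity; }

  bool AtCapacity() const { return items_.size() >= capacity_; }
  std::size_t size() const { return items_.size(); }

 private:
  std::size_t capacity_;
  std::deque<int> items_;
};
```

After `set_capacity(1)` on a queue holding three items, the queue stays over its limit until two items have been dequeued, which is the sense in which a lowered capacity is only realized as the caller removes buffers.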

RequestType::type impala::DiskIoMgr::RequestRange::request_type_
protected inherited

The type of IO request, READ or WRITE.

Definition at line 289 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::RequestRange::request_type(), and ScanRange().

Status impala::DiskIoMgr::ScanRange::status_
private

Status for this range. This is non-ok if is_cancelled_ is true. Note: an individual range can fail without the RequestContext being cancelled. This allows us to skip individual ranges.

Definition at line 413 of file disk-io-mgr.h.

bool impala::DiskIoMgr::ScanRange::try_cache_
private

If true, this scan range is expected to be cached. Note that this might be wrong since the block could have been uncached. In that case, the cached path will fail and we'll just put the scan range on the normal read path.

Definition at line 379 of file disk-io-mgr.h.

Referenced by impala::DiskIoMgr::AddScanRanges(), and try_cache().


The documentation for this class was generated from the following files: