Analysis

Forecasting Dengue Epidemics

Predicting the next Pandemic – Dengue

Predicting Dengue – DrivenData Competition

Goal: Predict dengue outbreaks as the total number of cases by year and week for two cities (San Juan and Iquitos).

Data: U.S. Centers for Disease Control and Prevention, the Department of Defense’s Naval Medical Research Unit 6 and the Armed Forces Health Surveillance Center, in collaboration with the Peruvian...

R Ultimate Histogram

Histogram in R

After a lot of finessing, here is R code for a really great histogram:

    # load libraries
    library(ggplot2)
    library(formattable)
    library(scales)

    # font
    windowsFonts(Tahoma = windowsFont("Tahoma"))

    # keep only the rows with length 10, then inspect them
    lengthselect <- flist_widget[flist_widget$length == 10, ]
    lengthselect
    summary(lengthselect)

    # bar colors
    barfill <- "cyan3"
    barlines <- "#1F3552"

    # summary statistics for price, formatted as whole-dollar currency
    meanprice <- mean(lengthselect$price)
    medianprice <- currency(median(lengthselect$price), digits = 0L)
    sdprice <- currency(sd(lengthselect$price), digits = 0L)
    rangeprice <- currency(range(lengthselect$price), digits = 0L)
    minprice <- currency(min(lengthselect$price), digits = 0L)

Machine Learning

Machine Learning: Charity Donor Analysis

Introduction

A charitable organization wishes to develop a machine learning model to improve the cost-effectiveness of its direct marketing campaigns to previous donors. Recent mailing records show an overall 10% response rate with an average donation of $14.50. Each mailing costs $2 to produce and send.
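
The figures in the introduction already show why targeting matters: mailing every previous donor loses money on average. A rough sketch of that arithmetic in R, using only the three numbers quoted above (the variable names are illustrative, not taken from the original analysis):

```r
# Figures quoted in the introduction
response_rate <- 0.10   # overall response rate
avg_donation  <- 14.50  # average donation per responder, in dollars
cost_per_mail <- 2.00   # cost to produce and send one mailing, in dollars

# Expected net return per mailing if every previous donor is mailed
expected_net <- response_rate * avg_donation - cost_per_mail

# Response rate at which a mailing just pays for itself
break_even_rate <- cost_per_mail / avg_donation

expected_net      # -0.55 dollars per mailing
break_even_rate   # roughly 0.138, i.e. about 13.8%
```

Under these numbers, a model only improves the campaign if the donors it selects respond at a rate above roughly 13.8%, or give more than the average.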

Text Analytics Sentiment Analysis

Text Analytics in R – Internet of Things (IoT)

Internet of Things (IoT) Text Analytics in R

A small corpus of ten articles related to the Internet of Things (IoT) was collected for text analytics. Using R, each article was cleaned of unusual characters and converted to lower case; numbers, punctuation, stop words and extra white space were removed, along with any additional terms that...
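
The cleaning steps listed above can be sketched in base R. This is only an illustration: the sample sentence, the tiny stop-word list and the `clean_text` helper are hypothetical, and the original post may well have used a text-mining package instead.

```r
# Hypothetical sample text and stop-word list, for illustration only
article <- "The Internet of Things (IoT) will connect 50 billion devices!"
stop_words <- c("the", "of", "will")

clean_text <- function(text, stop_words) {
  # Replace characters that cannot be represented in ASCII (assumes UTF-8 input)
  text <- iconv(text, from = "UTF-8", to = "ASCII", sub = " ")
  text <- tolower(text)                      # lower case
  text <- gsub("[[:digit:]]+", " ", text)    # remove numbers
  text <- gsub("[[:punct:]]+", " ", text)    # remove punctuation
  words <- strsplit(text, "[[:space:]]+")[[1]]
  # Drop empty tokens and stop words, then rejoin with single spaces
  words <- words[nzchar(words) & !(words %in% stop_words)]
  paste(words, collapse = " ")
}

clean_text(article, stop_words)
```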

Factor Analysis Path Diagram

Factor Analysis to Identify Sectors

Factor Analysis

Introduction

Using a stock portfolio data set and factor analysis to identify sectors in the stock market, we will transform the variables into log values to explain the variation in the log-returns of the stocks and the market index. We will begin the factor analysis by performing a Principal Factor Analysis without a...
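
For readers unfamiliar with the log transformation mentioned above, here is a minimal sketch of how log-returns can be computed in R. The prices and column names are made up for illustration; they are not the portfolio data used in the post.

```r
# Hypothetical daily closing prices for two stocks and a market index
prices <- data.frame(
  stock_a = c(100, 102, 101, 105),
  stock_b = c(50, 49, 51, 52),
  index   = c(2000, 2010, 2005, 2030)
)

# Log-return for day t: log(P_t / P_{t-1}) = log(P_t) - log(P_{t-1})
log_returns <- as.data.frame(lapply(prices, function(p) diff(log(p))))

log_returns  # one row fewer than prices, since the first day has no prior price
```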

Principal Components Analysis - Grouped by Sector

Principal Components Analysis

This analysis uses a stock portfolio data set and applies Principal Components Analysis as a method of dimension reduction and as a remedial measure for multicollinearity in Ordinary Least Squares regression. Beginning with the data, we will transform the variables into log values to explain the variation in the log-returns of the stocks and...
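
To make the idea concrete, the sketch below runs a Principal Components Analysis with base R's `prcomp` on simulated log-returns. The data are generated here purely for illustration (a shared market factor plus noise); the stock names and parameters are assumptions, not the portfolio from the post.

```r
set.seed(1)
# Simulated log-returns: a shared market factor plus stock-specific noise
market  <- rnorm(250, sd = 0.01)
returns <- sapply(1:5, function(i) market + rnorm(250, sd = 0.005))
colnames(returns) <- paste0("stock_", 1:5)

# PCA on the standardized returns
pca <- prcomp(returns, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Component scores are uncorrelated with each other, so using the leading
# ones as regressors sidesteps multicollinearity among the original stocks
round(cor(pca$x)[1:2, 1:2], 3)
```

Because the simulated stocks share one market factor, the first component should absorb most of the variance, which is the dimension-reduction effect described above.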

Automated Variable Selection

The Ames, Iowa housing data set has been used to develop various iterative regression models to determine the mean sales price of a house based on numerous variables. The variables range from correlated, continuous variables to categorical variables. In this installment, we continue building the model using raw categories and later, the...
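
The excerpt does not say which selection procedure the post uses, so the following is only one possible illustration: forward selection by AIC with base R's `step`, run on simulated data. The variable names and coefficients are hypothetical and are not taken from the Ames data set.

```r
set.seed(42)
n <- 200
# Simulated housing-style data (hypothetical variable names)
houses <- data.frame(
  living_area = rnorm(n, 1500, 400),
  garage_area = rnorm(n, 450, 100),
  noise_var   = rnorm(n)           # unrelated to price by construction
)
houses$sale_price <- 50000 + 90 * houses$living_area +
  60 * houses$garage_area + rnorm(n, sd = 20000)

null_model <- lm(sale_price ~ 1, data = houses)
full_scope <- sale_price ~ living_area + garage_area + noise_var

# Forward selection by AIC, starting from the intercept-only model
selected <- step(null_model, scope = full_scope,
                 direction = "forward", trace = 0)
formula(selected)
```

Setting `direction = "backward"` or `"both"` gives the other common stepwise variants.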

Regression Models Using Numerous Variables

Assessing Regression Models Using Numerous Variables

This analysis of the Ames, Iowa housing data set builds regression models for the house sale price with numerous variables, ranging from highly correlated, continuous variables to categorical variables with low correlations. An assessment of each model will be...

Variable Transformations

Variable Transformations: Continuous & Categorical

The Ames, Iowa housing data set has been used to develop various regression models to determine the sales price of a house based on numerous variables. The variables range from highly correlated, continuous variables to categorical variables with low correlations. In this assessment, variable transformations and comparisons of Y versus Log(Y) will...
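
One way to set up a Y versus Log(Y) comparison is sketched below on simulated data. The predictor name, coefficients and error structure are assumptions chosen for illustration; they do not come from the Ames data set or the original models.

```r
set.seed(7)
n <- 300
living_area <- runif(n, 800, 3000)
# Simulated prices with multiplicative errors (illustrative only)
sale_price <- exp(11 + 0.0004 * living_area + rnorm(n, sd = 0.2))

fit_y    <- lm(sale_price ~ living_area)        # model for Y
fit_logy <- lm(log(sale_price) ~ living_area)   # model for Log(Y)

# R-squared values are on different response scales, so back-transform
# the log model's fitted values before comparing error in dollars
mae_y    <- mean(abs(sale_price - fitted(fit_y)))
mae_logy <- mean(abs(sale_price - exp(fitted(fit_logy))))
c(mae_y = mae_y, mae_logy = mae_logy)
```

Comparing the two models on the original dollar scale avoids the common mistake of comparing an R-squared for Y directly against an R-squared for Log(Y).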

Cluster Analysis Average Distance between Cluster

Cluster Analysis on Transformed Variables

Cluster Analysis on Transformed Predictor Variables

Cluster analysis groups a set of objects so that objects in the same group are more similar, in some sense, to each other than to those in other groups. Clusters are identified by assessing the relative distances between points, the relative homogeneity of each cluster and the degree...
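
As a small illustration of clustering by relative distance, the sketch below uses base R's `dist`, `hclust` and `cutree` on simulated points. Average linkage is an assumption made here for the example; the excerpt does not state which method the post applies to the transformed predictors.

```r
set.seed(3)
# Two simulated groups of 20 points each in two dimensions (illustrative only)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

distances <- dist(pts)                          # pairwise Euclidean distances
tree <- hclust(distances, method = "average")   # average-linkage clustering
clusters <- cutree(tree, k = 2)                 # cut the tree into two clusters

table(clusters)
```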