Different Types of Features
In this article, we discuss different types of features that can be used as a starting point for modeling. Keep in mind that the same data set may contain some features that are useful for prediction (e.g., price and size) and others that are not (e.g., noise). There is no universal method for selecting features.
Features come in many types. The first step is to determine which of the following categories each of your features falls into; a feature may belong to more than one:
1. Point Features – A point feature is defined by a single X-Y coordinate pair, often the centroid (center point) of the object it represents, and disregards elevation values. Point data typically represents things such as wells, survey markers, trailheads, and cities on a small-scale map. Points are a good choice for features where the position of the feature is more important than its physical dimensions.
Point Feature – Pushpin Symbol
2. Line Features – Lines typically represent linear features such as roads, railroads, power lines, pipelines, rivers, and coastlines, to name a few. The attribute table for lines often contains separate fields to store different types of data, such as speed limits along roads or power line voltage. Lines are not concerned with elevation; rather, their position in space is defined by an ordered sequence of X-Y coordinate pairs (vertices).
Line Feature – Line Symbol
3. Polygon Features – The third type of feature is the polygon, a closed shape defined by an ordered sequence of points (vertices) in which the last vertex connects back to the first. The simplest example is the outline of a state or country. Polygons are used to represent boundaries, hence they are concerned with area and perimeter rather than elevation. For example, one could draw a polygon around a watershed to enclose all the rivers within it.
Polygon Feature – Polygon Symbol
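To make the three vector types above concrete, here is a minimal sketch of how a point, a line, and a polygon can be stored as plain X-Y coordinates. The coordinates and the helper names `line_length` and `polygon_area` are illustrative assumptions, not part of any particular GIS package, and the geometry is assumed to be planar.

```python
import math

# Hypothetical coordinates in an arbitrary planar (projected) system.
point = (2.0, 3.0)                                           # a single X-Y pair
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                 # ordered vertices
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # ring, closed implicitly


def line_length(vertices):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line))      # 11.0
print(polygon_area(polygon))  # 12.0
```

A point has position only, a line adds length, and a polygon adds area. This is why the choice between them depends on whether the feature's dimensions matter to the model.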
4. Raster Features – Rasters are the fourth type of feature that can be considered for a model. A raster is a grid of equally sized cells; each cell is identified by its pixel location (and therefore its X-Y coordinates) and stores a value. The most common examples are satellite images, in which each cell stores the measured brightness in one or more bands, and digital elevation models, in which each cell stores an elevation value. Cell values are often displayed as the color of the pixel.
Raster Feature – Satellite Image with Color Values
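The cell lookup described above can be sketched with a nested list standing in for the raster. The grid values, origin, cell size, and the function name `cell_value` are all assumptions made up for illustration; real raster formats carry this georeferencing in their metadata.

```python
# Hypothetical 3x4 raster: each cell stores one value (here, elevation in metres).
raster = [
    [120, 125, 130, 128],
    [118, 122, 127, 126],
    [115, 119, 123, 124],
]
origin_x, origin_y = 500.0, 1030.0   # assumed top-left corner of the grid
cell_size = 10.0                     # assumed cell width and height


def cell_value(x, y):
    """Return the value of the cell containing map coordinate (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)   # rows count downward from the top
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("coordinate lies outside the raster")
    return raster[row][col]


print(cell_value(525.0, 1015.0))  # 127
```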
5. Categorical Features – The fifth type of feature is categorical. These features take one of a small number of possible values, each representing a meaningful difference in the thing being described. For example, car models or the land-cover classes in a classified satellite image (often shown as distinct colors) are categorical data, since the values are labels that can be easily identified and differentiated from one another. Categorical values usually have no associated units.
Categorical Feature – Car Models
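Because categorical values are labels rather than quantities, most models need them converted to numbers first. One common approach is one-hot encoding, sketched below in plain Python. The category names and the helper `one_hot` are hypothetical and used only for illustration.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical car categories.
models = ["sedan", "suv", "sedan", "truck"]
categories, encoded = one_hot(models)

print(categories)  # ['sedan', 'suv', 'truck']
print(encoded)     # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Each row contains exactly one 1, so the encoding does not impose an artificial order on the categories.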
6. Continuous Features – The final type of feature to consider is continuous. Continuous features are numerical values that can take any value within the range of possible measurements for that feature. For example, the floor area of a home is continuous (the total is found by adding together the floor areas of its rooms), whereas the number of bedrooms is a discrete count. Another example is travel time on road segments – if the speed limit on the
What’s Your Best Feature Set?
Feature Selection by Modeling Algorithm:
Linear Regression – Which feature set or feature type is best for predicting house prices with linear regression?
Logistic Regression – What’s the best set of features for classifying homes by value (for example, above or below a price threshold) with logistic regression, and which attribute is most important to the model’s predictions?
Random Forests – What are the best features for use with Random Forests (regression or classification)?
Naïve Bayes – What would be the best feature set to predict a house’s price band using Naïve Bayes?
K-Nearest Neighbors – Which features might be most useful for predicting house prices using kNN regression?
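The questions above have no single answer, but a simple model-agnostic starting point is to rank candidate features by the strength of their correlation with the target. The sketch below does this with Pearson correlation. It is one possible filter method and not a procedure prescribed by this article; the feature names and numbers are made-up toy data, not real housing records.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up toy data for six hypothetical homes.
features = {
    "size_m2":  [50, 70, 80, 100, 120, 150],
    "bedrooms": [1, 2, 2, 3, 3, 4],
    "noise":    [3, 1, 4, 1, 5, 2],      # deliberately uninformative column
}
price = [150, 200, 235, 290, 350, 430]   # target, in arbitrary units

# Rank features from strongest to weakest absolute correlation with price.
ranking = sorted(features, key=lambda name: abs(pearson(features[name], price)),
                 reverse=True)
print(ranking)  # ['size_m2', 'bedrooms', 'noise']
```

A ranking like this suits linear regression most directly, since it measures linear association with a numeric target. For classifiers such as logistic regression or Naïve Bayes, the target would first need to be expressed as classes.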
Conclusion
A limitation of many feature selection methods is that they evaluate each feature on its own and so generally do not consider interactions between features. Random forests and some types of decision trees are exceptions, since successive splits on different features can capture interactions.
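A small constructed example shows why this matters. In the sketch below, the target is the product of two features, so neither feature is correlated with the target on its own, while their interaction term predicts it perfectly. The data is artificial and chosen only to make the effect visible.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Artificial data: the target depends only on the product of x1 and x2.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [a * b for a, b in zip(x1, x2)]            # [1, -1, -1, 1]
interaction = [a * b for a, b in zip(x1, x2)]  # explicit interaction feature

print(pearson(x1, y))           # 0.0
print(pearson(x2, y))           # 0.0
print(pearson(interaction, y))  # 1.0
```

A method that scores x1 and x2 one at a time would discard both, even though together they determine the target exactly.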